Interface Research Between Audiovisual Translation and Written Translation: On the Translation Strategies of Audio Description (AD) Scripts

ISSN:3029-2301

EISSN:3029-2328

Language: English

Author
Chang Haomei
Abstract
Traditional manual translation of Audio Description (AD) scripts has two core problems: dynamic visual elements are difficult to align with text, and cross-cultural adaptation is not handled systematically. To address them, this study constructs a hierarchical deep learning translation framework with three layers: a Multimodal Data Preprocessing Layer, a Dynamic Attention Fusion Layer, and a Cultural Adaptation Optimization Layer. The Multimodal Data Preprocessing Layer aligns source-language AD scripts with video frames, extracts visual features via a "static + dynamic" dual-path strategy (object detection for static features, the SlowFast two-stream model for dynamic features), and enhances text features using BPE, BERT, and functional label annotation. The Dynamic Attention Fusion Layer, based on a modified Transformer, fuses the two modalities through parallel encoding and cross-attention. The Cultural Adaptation Optimization Layer adapts cultural elements using an AD Scene Cultural Knowledge Graph. Experiments on a self-built, multi-scenario, multilingual AD script-visual dataset of 120,000 entries (split 8:1:1) show that the framework outperforms Google Translate, a traditional Transformer, and a multimodal Transformer, reaching 64.3% BLEU, 59.8% METEOR, 70.1% CHRF, and 92.5% visual-semantic matching. The framework thus improves AD script translation quality and cultural adaptability in support of multi-language AD production.
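As a concrete reading of the 8:1:1 split, the sketch below divides 120,000 entries into three partitions. The abstract does not say how the split was performed, so the random shuffle, the fixed seed, the training/validation/test naming, and the helper name `split_dataset` are all assumptions made for illustration, not the author's procedure.

```python
import random


def split_dataset(entries, ratios=(8, 1, 1), seed=0):
    """Shuffle entries and split them into three parts by integer ratios.

    Hypothetical helper: the shuffle and the seed are assumptions,
    not details taken from the abstract.
    """
    items = list(entries)
    random.Random(seed).shuffle(items)  # deterministic shuffle for repeatability
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the last partition
    return train, val, test


# Stand-in indices for the 120,000 AD script-visual entries.
train, val, test = split_dataset(range(120_000))
print(len(train), len(val), len(test))  # 96000 12000 12000
```

Under these assumptions the partitions hold 96,000, 12,000, and 12,000 entries respectively.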
Keywords
Audio Description; Written Translation; Deep Learning