Abstract: Unlike previous staged approaches that use a bilingual corpus as the translation memory (Translation Memory, TM), retrieve memories by source-side similarity search, and then fuse the retrieved similar sentence pairs with a neural machine translation (Neural Machine Translation, NMT) model, a new fusion framework is proposed: an NMT model based on a cross-lingual attention memory network. The model uses a monolingual translation memory, i.e., target-language sentences, as the TM, and performs learnable retrieval in a cross-lingual manner. The framework has two advantages: first, the cross-lingual attention memory network allows monolingual sentences to serve as the TM, which suits low-resource scenarios where bilingual corpora are scarce; second, the cross-lingual attention memory network and the NMT model can be jointly optimized for the final translation objective, enabling integrated training. Experiments show that the proposed method achieves good results on four translation tasks and also demonstrates its effectiveness in low-resource settings in specialized domains where bilingual resources are scarce.
Key words: neural machine translation; monolingual translation memory; cross-lingual attention memory network; low-resource domains; Transformer model
CLC number: TP391    Document code: A
doi:10.3969/j.issn.2095-1248.2023.02.009
Neural machine translation method integrating monolingual translation memory
WANG Bing, YE Na, CAI Dong-feng
(Human-Computer Intelligence Research Center, Shenyang Aerospace University, Shenyang 110136, China)
Abstract: Different from previous studies that used a bilingual corpus as the TM and source-side similarity search for memory retrieval, a new NMT framework was proposed, which used a monolingual translation memory and performed learnable retrieval in a cross-lingual way. The monolingual translation memory consisted of target-language sentences used as the TM. This framework had certain advantages: firstly, the cross-lingual memory network allowed monolingual data to be used as the TM; secondly, the cross-lingual memory network and the NMT model were jointly optimized for the ultimate translation goal, thus realizing integrated training. Experiments show that the proposed method achieves good results on four translation tasks, and the model also shows its effectiveness in low-resource scenarios.
Key words: neural machine translation; monolingual translation memory; cross-language attention memory network; low-resource scenarios; Transformer model
Machine translation refers to the process of using a computer to convert one natural language (the source language) into another natural language (the target language) [1]. In recent years, end-to-end neural machine translation (NMT) has made great progress [2-4]. In particular, since the Transformer model [4] was proposed, translation quality has improved significantly, and Transformer has gradually become the mainstream model in machine translation. A translation memory is a tool that assists translators in completing translation tasks; it stores previously translated sentence pairs, paragraphs, or text segments [5-7]. When a translator encounters a source sentence to be translated, the TM sentence pairs most similar to the current sentence are first retrieved from the translation memory database; the translator can then reuse the matched parts to avoid redundant translation and ensure translation quality. ……
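The source-side similarity search described above can be sketched as a simple fuzzy-match lookup over a bilingual TM. This is a minimal illustration of the staged retrieval the paper contrasts with, not the paper's own method; the function names (`fuzzy_score`, `tm_lookup`), the token-level `SequenceMatcher` similarity, and the threshold value are all illustrative assumptions.

```python
# Minimal sketch of source-side fuzzy-match TM retrieval.
# A real TM system would use a tuned similarity metric and an index
# for fast search; this sketch uses difflib's ratio over tokens.
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Similarity in [0, 1] between two whitespace-tokenized sentences."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def tm_lookup(source, memory, threshold=0.5):
    """Return the (tm_source, tm_target) pair most similar to `source`,
    or None if no pair's score exceeds the threshold."""
    best_pair, best = None, threshold
    for tm_src, tm_tgt in memory:
        score = fuzzy_score(source, tm_src)
        if score > best:
            best_pair, best = (tm_src, tm_tgt), score
    return best_pair


# Toy bilingual TM (illustrative English-German pairs).
memory = [
    ("the cat sat on the mat", "die Katze sass auf der Matte"),
    ("open the door slowly", "oeffne die Tuer langsam"),
]

# A near-match retrieves its stored translation for reuse.
print(tm_lookup("the cat sat on a mat", memory))
```

The retrieved target sentence would then be fed to the NMT model as auxiliary input in the staged approach; the proposed framework instead learns this retrieval step jointly with translation, over target-language sentences only.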