Research Paper: Word-Sense Annotation Preprocessor for Improving Neural Machine Translation

 

Abstract

Although neural machine translation (NMT) has recently achieved state-of-the-art performance, it still struggles with word-sense disambiguation (WSD). This paper proposes a Korean word-sense annotation preprocessor based on a lexical-semantic network that we built as a large-scale lexical knowledge base for Korean. We evaluated the effect of the proposed preprocessor on NMT using bidirectional Korean-Japanese and Korean-English translation. The experiments show that the preprocessor significantly improves NMT quality, as measured by the BLEU and TER metrics, both for a language pair with similar sentence structure (Korean-Japanese) and for one with different sentence structure (Korean-English).

Keywords – Lexical semantic network, lexical knowledge base, neural machine translation, parallel corpus, word sense disambiguation