Grammatical Error Correction in Indonesian-Korean Machine Translation
2. Abstract
Machine translation is the process of using a computer to automatically translate text from one language to another without human involvement. Despite the convenience and progress of current translation systems based on neural networks, Neural Machine Translation (NMT) systems still suffer from language-specific problems. One example of such a problem is the translation of honorifics, which leads to errors in grammatical form. This study fine-tunes a multilingual model, NLLB-200, on a Korean-Indonesian parallel corpus. The results of the baseline model show several problems related to honorifics in translation.
To improve the results, a beam search decoding algorithm is used. The results show an improvement in SacreBLEU scores ranging from 0.74 to 0.98 points, depending on the beam size. A data augmentation method is then applied to make use of the available monolingual corpus: back-translation is used to create synthetic source-target parallel data for training. Back-translation yields SacreBLEU scores of 24.37 points for the Indonesian-Korean model and 25.86 points for the Korean-Indonesian model. In this study, back-translation combined with beam search decoding corrects some of the errors in honorific translation.
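Beam search keeps the `beam_size` highest-scoring partial translations at each decoding step, whereas greedy decoding commits to the single most probable next token. The sketch below is a minimal, model-agnostic illustration of that idea, not the decoder used with NLLB-200 in this study. `toy_model` and its probabilities are hypothetical stand-ins for a real NMT model's next-token distribution, and refinements such as length normalization are omitted.

```python
import math
from typing import Callable, Dict, List, Tuple

# A "model" here is any function mapping a token prefix to next-token probabilities.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def beam_search(
    next_token_probs: NextTokenFn,
    beam_size: int,
    max_len: int,
    eos: str = "</s>",
) -> List[Tuple[Tuple[str, ...], float]]:
    """Return hypotheses sorted by cumulative log-probability, best first."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]  # (prefix, log-prob)
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        # Expand every active hypothesis by every possible next token.
        candidates = []
        for prefix, score in beams:
            for token, p in next_token_probs(prefix).items():
                if p > 0.0:
                    candidates.append((prefix + (token,), score + math.log(p)))
        if not candidates:
            break
        # Prune: keep only the beam_size best expansions overall.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos:
                finished.append((prefix, score))  # hypothesis is complete
            else:
                beams.append((prefix, score))
        if not beams:
            break
    # Hypotheses that hit max_len without emitting eos are kept as well.
    finished.extend(beams)
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


# Hypothetical toy distribution in which greedy decoding is misled by the first step.
TOY = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.3, "y": 0.3, "</s>": 0.4},
    ("B",): {"</s>": 0.9, "x": 0.1},
}


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    return TOY.get(prefix, {"</s>": 1.0})


if __name__ == "__main__":
    for k in (1, 2):
        tokens, logp = beam_search(toy_model, beam_size=k, max_len=3)[0]
        print(k, tokens, round(math.exp(logp), 2))
```

With `beam_size=1` (greedy) the decoder commits to the locally more probable first token and returns `("A", "</s>")` with probability 0.24; with `beam_size=2` it keeps the second prefix alive and finds `("B", "</s>")` with probability 0.36. This only illustrates why changing the beam size can change the output; the SacreBLEU differences reported above come from the actual fine-tuned model.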