多義使役動詞「讓」之二元分類

陳正賢 (Chen, Alvin Cheng-Hsien)
任賓森 (Robinson, Mark James)
2024-12-17
2024-02-05
2024
https://etds.lib.ntnu.edu.tw/thesis/detail/35c9edce5a51fb07efe2934b84bf205a/
http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/123109

Abstract: Polysemy in language is a significant challenge for language comprehension, particularly in the field of natural language processing. This has led to the development of word sense disambiguation tasks, which attempt to determine which sense of a word is being invoked in a given sentence or context. The explosion of machine learning and related computational techniques has produced significant success in this field. Word sense disambiguation methods have been useful in translation, although a variety of challenges persist. This thesis explores one such challenge. The Mandarin Chinese periphrastic causative verb ràng is polysemous and can take two causative forms: strong and weak. This thesis used translations of ràng drawn from an open-source corpus, OpenSubtitles, to produce an automatically annotated dataset. This dataset was then used to train three different machine learning models that classify the two forms of the verb. A bag-of-words model, a feature-engineered model, and a BERT transformer model achieved approximately 79%, 78%, and 84% accuracy, respectively. These results indicate a potentially useful approach to machine translation research. The models also yielded new insights into syntactic patterns that favor certain interpretations of ràng. Such insights support the claim that the methods used here have the potential to improve machine translation and can inform the methodology of word sense disambiguation tasks.

Keywords: machine translation; polysemy; word sense disambiguation; machine learning; ràng; periphrastic causatives
English title: Binary classification of polysemous ràng as a periphrastic causative verb
Type: Academic thesis (學術論文)