A Study on Combining Prosodic and Acoustic Features for Mispronunciation Detection and Diagnosis
Date
2019
Abstract
This thesis investigates how prosodic features and a multi-task deep neural network model contribute to mispronunciation detection and diagnosis (MDD). Computer assisted pronunciation training (CAPT) aims to automatically identify and correct the pronunciation problems of second-language (L2) learners; the process can be broadly divided into two stages, mispronunciation detection and mispronunciation diagnosis. The thesis focuses on three aspects: 1.) how combining prosodic features with acoustic features benefits mispronunciation detection and diagnosis; 2.) whether multi-task deep neural network models can alleviate the imbalance between positive and negative examples in the data; 3.) combining likelihood-based scoring (GOP) with classification-based scoring to achieve better detection and diagnosis results. Experimental results show that acoustic features are more helpful for the mispronunciation detection task, whereas prosodic features are more helpful for the mispronunciation diagnosis task.
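The abstract does not give the scoring formulas, so the following is only a minimal sketch of how a likelihood-based score and a classification-based score might be combined. It assumes that GOP (commonly expanded as goodness of pronunciation) is computed as the average log posterior of the canonical phone over the frames of a segment, and that the two scores are fused by linear interpolation. The function names, the interpolation weight, and the decision threshold are all hypothetical and are not taken from the thesis.

```python
import numpy as np

EPS = 1e-10  # guards against log(0)


def gop_score(posteriors: np.ndarray, canonical_phone: int) -> float:
    """Average log posterior of the canonical phone over a segment.

    posteriors: array of shape (frames, phones), each row summing to 1.
    This is one common DNN-based GOP variant, assumed here for illustration.
    """
    frame_probs = posteriors[:, canonical_phone]
    return float(np.mean(np.log(frame_probs + EPS)))


def fused_score(gop: float, classifier_prob: float, weight: float = 0.5) -> float:
    """Linearly interpolate the GOP score with a classifier's log probability
    that the phone was pronounced correctly (hypothetical fusion rule)."""
    return weight * gop + (1.0 - weight) * float(np.log(classifier_prob + EPS))


def is_mispronounced(score: float, threshold: float = -1.0) -> bool:
    """Flag a segment whose score falls below an assumed threshold."""
    return score < threshold


# Toy segment: 3 frames, 3 candidate phones.
segment = np.array([
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.90, 0.05, 0.05],
])

gop_correct = gop_score(segment, canonical_phone=0)  # learner matched phone 0
gop_wrong = gop_score(segment, canonical_phone=1)    # phone 1 was expected
```

In this toy segment the posteriors favor phone 0, so scoring against phone 0 gives a GOP close to zero, while scoring against phone 1 gives a much lower value that falls below the assumed threshold. In practice the threshold and the interpolation weight would be tuned on held-out data.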
Description
Keywords
computer assisted pronunciation training, multi-task learning, automatic speech recognition, mispronunciation detection, mispronunciation diagnosis, prosodic features, deep neural networks, acoustic models