陳柏琳 (Chen, Berlin)
蔡孟庭 (TSAI, Meng-Ting)
2024-12-17
2024-02-05
2024
https://etds.lib.ntnu.edu.tw/thesis/detail/7ebad777c171949790edc1075e56d9a6/
http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/123696
In language learning, listening and speaking training methods fall broadly into three approaches: shadowing, listen-and-repeat, and echoing. All three rest on the same core idea: the learner listens to the pronunciation of the target language, then imitates its intonation and repeats the same content. In practice, however, finding native (First Language, L1) speakers to serve as learning targets is constrained in many ways, such as a shortage of qualified instructors in rural areas, high costs, and difficulty accommodating individual learning schedules. Research also indicates that when English as a Second Language / Second Language (ESL / L2) learners listen to and imitate standard speech, pronunciation training is more effective the closer the speaker characteristics of the target speech are to the learner's own. A speech segment that combines the speaker characteristics of the L2 learner with the accent characteristics of an L1 speaker is called a "golden speaker". To address the problems above, this study chooses English as the target language for synthesis and generates golden-standard speech for L2 English learners.
In addition to attempting to improve the synthesis results and proposing a synthesized-speech evaluation framework suited to pronunciation learning scenarios, the study confirms that synthesized speech can improve erroneous pronunciations. It further applies the synthesized speech to computer-assisted pronunciation training, validating that the dynamic time warping cost between an L2 learner's original speech and the synthesized speech is an effective predictive feature for pronunciation assessment. It also uses the synthesized speech to raise the accuracy of automatic pronunciation assessment, which in turn benefits the learning and working efficiency of both learners and instructors in computer-assisted pronunciation training.
text-to-speech; computer-assisted pronunciation training; golden speaker; automatic pronunciation assessment
語音合成技術應用於電腦輔助發音訓練之研究 (A Study of Speech Synthesis Techniques for Computer-Assisted Pronunciation Training)
Academic thesis