本研究旨在改良囊括日常用語的台灣閩南語語音字典。有鑒於台灣的老化人口日益增加,建置台灣閩南語語音資料庫於未來多元應用更趨重要,如:語音科技改良與語言保存。然而,由於台灣閩南語為低資源語言(low-resource language)之一,目前可取得的台灣閩南語語料相當稀少;本研究蒐集線上台灣閩南語對話語料,並予以人工分詞與標記。本研究以蒐集的語料探究現有的台灣閩南語字典之於蒐集語料的涵蓋率,並發現尚有未被收錄於台灣閩南語字典中的台灣閩南語發音詞條與台灣閩南語新詞條。本研究將整理未被收錄於字典中的詞條,蒐集的詞條將被分類為三個類別;其分別為:發音變異(pronunciation variation, PV)、多種發音(multiple pronunciation, MP) 與新詞(new word, NW) 。本文將呈現針對蒐集語料的深入分析,並基於觀察的結果進行討論。討論重點將著眼於總括性的觀察性統整。我們期望此研究結果能夠反映部分台灣閩南語語詞在實際台灣閩南語對話中的使用情形,並協助改良現有台灣閩南語語音辨識系統。
This thesis aims to optimize a Taiwanese Southern Min (TSM) lexicon so that it covers TSM words in daily use. Given Taiwan's growing aging population, building a database of TSM words is becoming increasingly important for applications such as speech technology and language preservation. However, TSM is a low-resource language, and little data is available for TSM research. To address this scarcity, this thesis prepared TSM data by gathering online TSM conversational speech, segmenting its content into words, and annotating the data manually. The thesis then investigated how well an existing TSM dictionary covers the collected data and found that some TSM pronunciations and TSM words have yet to be included in the dictionary. Items not found in the dictionary were sorted into three categories according to their type of pronunciation variation: pronunciation variation (PV), multiple pronunciation (MP), and new word (NW). After an in-depth description of the data analysis, we discuss our observations, focusing on what can be generalized from the findings. We expect the findings to offer a glimpse of how TSM words are used in everyday conversation. We hope the results will help optimize the lexicon of a Taiwanese Southern Min speech recognition (TSMSR) system currently in development and benefit future TSM-related studies.
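The coverage check described in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual procedure: the function name `coverage_report`, the data layout (word/pronunciation pairs and a word-to-pronunciations mapping), and the toy entries are all assumptions made for illustration. The sketch only separates items whose word is absent from the dictionary (new-word candidates) from items whose word is listed under a different pronunciation (candidates for PV or MP); the abstract does not state how PV and MP are told apart, so that decision is left to manual review.

```python
from collections import Counter


def coverage_report(tokens, lexicon):
    """Measure how well a lexicon covers annotated speech tokens.

    tokens  : iterable of (word, pronunciation) pairs (hypothetical layout).
    lexicon : dict mapping each word to the set of pronunciations it lists.

    Returns (coverage, missing_pron, missing_word), where coverage is the
    fraction of tokens whose word AND pronunciation are both listed.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    missing_pron = Counter()  # word listed, pronunciation not: PV/MP candidates
    missing_word = Counter()  # word not listed at all: NW candidates

    for (word, pron), n in counts.items():
        if word not in lexicon:
            missing_word[(word, pron)] += n
        elif pron not in lexicon[word]:
            missing_pron[(word, pron)] += n
        else:
            covered += n

    coverage = covered / total if total else 0.0
    return coverage, missing_pron, missing_word


# Toy placeholder data, not taken from the thesis.
tokens = [
    ("word_a", "pron_a1"),
    ("word_a", "pron_a2"),
    ("word_b", "pron_b1"),
    ("word_c", "pron_c1"),
]
lexicon = {"word_a": {"pron_a1"}, "word_b": {"pron_b1"}}

coverage, missing_pron, missing_word = coverage_report(tokens, lexicon)
print(f"coverage: {coverage:.2f}")
print("pronunciation not listed:", dict(missing_pron))
print("word not listed:", dict(missing_word))
```

Counting tokens rather than distinct entries weights the coverage figure by how often each item occurs in conversation; counting distinct entries instead would be an equally reasonable choice.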
台灣閩南語, 發音變異, 多種發音, 新詞, 字典, Taiwanese Southern Min (TSM), pronunciation variation (PV), multiple pronunciation (MP), new word (NW), lexicon