探討聲學模型化技術與半監督鑑別式訓練於語音辨識之研究 (Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition)
dc.contributor | 陳柏琳 | zh_TW |
dc.contributor | Chen, Berlin | en_US |
dc.contributor.author | 羅天宏 | zh_TW |
dc.contributor.author | Lo, Tien-Hong | en_US |
dc.date.accessioned | 2019-09-05T11:14:54Z | |
dc.date.available | 2019-02-16 | |
dc.date.available | 2019-09-05T11:14:54Z | |
dc.date.issued | 2019 | |
dc.description.abstract | 近年來鑑別式訓練(Discriminative training)的目標函數Lattice-free maximum mutual information (LF-MMI)在自動語音辨識(Automatic speech recognition, ASR)的聲學模型(Acoustic model)訓練上取得重大的突破。儘管LF-MMI在監督式環境下斬獲最好的成果，然而在半監督式環境下的研究成果仍然有限。在常見的半監督式方法─自我訓練(Self-training)中，種子模型(Seed model)常因為語料有限而效果不佳。再者，因為LF-MMI屬於鑑別式訓練之故，較易受到標記正確與否的影響。基於上述，本論文將半監督式訓練拆解成兩個問題：1)如何提升種子模型的效能，以及2)如何利用未轉寫(無人工標記)語料。針對第一個問題，我們使用兩種方法，分別對應到是否存有額外資料的情況：其一為遷移學習(Transfer learning)，使用技術為權重遷移(Weight transfer)和多任務學習(Multitask learning)；其二為模型合併(Model combination)，使用技術為假說層級合併(Hypothesis-level combination)和音框層級合併(Frame-level combination)。針對第二個問題，基於LF-MMI目標函數，我們引入負條件熵(Negative conditional entropy, NCE)與保留更多假說空間的詞圖監督(Lattice for supervision)。在一系列於互動式會議語料(Augmented multi-party interaction, AMI)的實驗結果顯示，不論是利用領域外資料(Out-of-domain data, OOD)的遷移學習或多樣性互補的模型合併皆可提升種子模型的效能，而NCE與詞圖監督則能運用未轉寫語料改善詞錯誤率(Word error rate, WER)與詞錯誤率修復率(WER recovery rate, WRR)。 | zh_TW
dc.description.abstract | Recently, a novel objective function for discriminative acoustic model training, namely lattice-free maximum mutual information (LF-MMI), has been proposed and has achieved new state-of-the-art results in automatic speech recognition (ASR). Although LF-MMI shows excellent performance in various ASR tasks under supervised training settings, its performance often degrades significantly in semi-supervised settings. In the commonly used semi-supervised method of self-training, the seed model often performs poorly because the transcribed data is limited; moreover, LF-MMI shares a common deficiency of discriminative training criteria in that it is sensitive to the accuracy of the transcripts of the training utterances. In view of the above, this thesis decomposes semi-supervised training into two questions: first, how to improve the seed model, and second, how to use untranscribed training data. For the former, we investigate transfer learning approaches (weight transfer and multitask learning) and model combination approaches (hypothesis-level combination and frame-level combination); the distinction between the two is whether extra training data is used. For the latter, we introduce negative conditional entropy (NCE) and lattices for supervision, which retain a richer hypothesis space, in conjunction with the LF-MMI objective function. A series of experiments were conducted on the Augmented Multi-Party Interaction (AMI) benchmark corpus. The experimental results show that both transfer learning using out-of-domain data (OOD) and model combination exploiting complementary diversity can effectively improve the performance of the seed model, while the pairing of NCE and lattice supervision can leverage untranscribed data to reduce the word error rate (WER) and improve the WER recovery rate (WRR). | en_US
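For reference, the two training criteria named in the abstract can be sketched in the standard forms they take in the LF-MMI literature; the notation below is illustrative and not necessarily the exact formulation used in the thesis. The MMI objective maximizes the posterior probability of the reference transcript W_u given the acoustics X_u of each utterance u:

    \mathcal{F}_{\mathrm{MMI}} = \sum_{u} \log \frac{p(\mathbf{X}_u \mid \mathcal{M}_{W_u})^{\kappa}\, P(W_u)}{\sum_{W'} p(\mathbf{X}_u \mid \mathcal{M}_{W'})^{\kappa}\, P(W')}

where \mathcal{M}_W denotes the composite HMM for word sequence W, P(W) the language model probability, and \kappa an acoustic scaling factor; "lattice-free" refers to computing the denominator sum over a phone-level denominator graph rather than per-utterance word lattices. For untranscribed utterances, where no reference W_u exists, the NCE objective instead maximizes the negative conditional entropy of the hypothesis distribution:

    \mathcal{F}_{\mathrm{NCE}} = -H(W \mid \mathbf{X}_u) = \sum_{W} P(W \mid \mathbf{X}_u) \log P(W \mid \mathbf{X}_u)

This can be read as MMI supervision weighted by the model's own posteriors, so confidently recognized hypotheses contribute more to the gradient than uncertain ones, which is why it is less brittle than treating 1-best decoded transcripts as ground truth.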
dc.description.sponsorship | 資訊工程學系 | zh_TW |
dc.identifier | G060547047S | |
dc.identifier.uri | http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22G060547047S%22.&%22.id.& | |
dc.identifier.uri | http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106500 | |
dc.language | Chinese | |
dc.subject | 半監督式學習 | zh_TW |
dc.subject | 鑑別式訓練 | zh_TW |
dc.subject | 整體學習 | zh_TW |
dc.subject | 遷移學習 | zh_TW |
dc.subject | 自動語音辨識 | zh_TW |
dc.subject | 聲學模型 | zh_TW |
dc.subject | LF-MMI | zh_TW |
dc.subject | semi-supervised training | en_US |
dc.subject | discriminative training | en_US |
dc.subject | transfer learning | en_US |
dc.subject | ensemble learning | en_US |
dc.subject | automatic speech recognition | en_US |
dc.subject | acoustic model | en_US |
dc.subject | LF-MMI | en_US |
dc.title | 探討聲學模型化技術與半監督鑑別式訓練於語音辨識之研究 | zh_TW |
dc.title | Investigating Acoustic Modeling and Semi-supervised Discriminative Training for Speech Recognition | en_US |
Files
Original bundle
- Name: 060547047s01.pdf
- Size: 5.09 MB
- Format: Adobe Portable Document Format