改善鑑別式聲學模型訓練於中文連續語音辨識之研究

劉士弘; Shih-Hung Liu

dc.contributor	陳柏琳教授	zh_TW
dc.contributor	Berlin Chen	en_US
dc.contributor.author	劉士弘	zh_TW
dc.contributor.author	Shih-Hung Liu	en_US
dc.date.accessioned	2019-09-05T11:23:36Z
dc.date.available	2007-07-18
dc.date.available	2019-09-05T11:23:36Z
dc.date.issued	2007
dc.description.abstract	本論文探討改善鑑別式聲學模型於中文大詞彙連續語音辨識之研究。首先，本論文提出一個新的時間音框層次音素正確率函數來取代最小化音素錯誤訓練的原始音素正確率函數，此新的音素正確率函數在某種程度上能充分地懲罰刪除錯誤。其次，本論文提出一個新的以時間音框層次正規化熵值為基礎的資料選取方法來改進鑑別式訓練，其正規化熵值是由訓練語料所產生之詞圖中高斯分布之事後機率所求得。此資料選取方法可以讓鑑別式訓練更集中在那些離決定邊界較近的訓練樣本所收集的統計值，以達到較佳的鑑別力。此資料選取方法更進一步地應用到非監督鑑別式聲學模型訓練上。最後，本論文也嘗試修改鑑別式訓練的目標函數，以收集不同的統計值來改進最小化音素錯誤鑑別式訓練。所使用的實驗題材是公視新聞語料。由初步的實驗結果來看，結合時間音框層次的資料選取方法和新的音素正確率函數在前幾次的迭代訓練中確實有些微且一致的進步。	zh_TW
dc.description.abstract	This thesis investigates improved discriminative training of acoustic models for Mandarin large vocabulary continuous speech recognition (LVCSR). First, we present a new phone accuracy function based on the frame-level accuracy of hypothesized phone arcs, which replaces the raw phone accuracy function of minimum phone error (MPE) training and, to some extent, can adequately penalize deletion errors. Second, we explore a novel data selection approach for discriminative training, based on the normalized frame-level entropy of Gaussian posterior probabilities obtained from the word lattice of each training utterance. This approach makes the training algorithm concentrate on the training statistics of frame samples that lie close to the decision boundary, for better discrimination. The proposed data selection approach is further applied to unsupervised discriminative training of acoustic models. Finally, we investigate several other modifications of the training objective functions, as well as of the lattice structures, for accumulating MPE training statistics. Experiments on the Mandarin broadcast news corpus (MATBN) collected in Taiwan show that combining the frame-level data selection with the new phone accuracy function yields slight but consistent improvements over conventional MPE training in the early training iterations.	en_US
dc.description.sponsorship	資訊工程學系	zh_TW
dc.identifier	GN0693470185
dc.identifier.uri	http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN0693470185%22.&%22.id.&
dc.identifier.uri	http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106651
dc.language	Chinese
dc.subject	鑑別式聲學模型訓練	zh_TW
dc.subject	大詞彙連續語音辨識	zh_TW
dc.subject	時間音框正確率函數	zh_TW
dc.subject	資料選取	zh_TW
dc.subject	Discriminative training	en_US
dc.subject	Large vocabulary continuous speech recognition	en_US
dc.subject	time frame accuracy function	en_US
dc.subject	data selection	en_US
dc.title	改善鑑別式聲學模型訓練於中文連續語音辨識之研究	zh_TW
dc.title	Improved discriminative training for Mandarin continuous speech recognition	en_US