A Study on Improved Discriminative Acoustic Model Training for Mandarin Continuous Speech Recognition

劉士弘; Shih-Hung Liu

Files

n069347018501.pdf (262.01 KB)

n069347018502.pdf (294.5 KB)

n069347018503.pdf (191.03 KB)

n069347018504.pdf (203.14 KB)

n069347018505.pdf (357.09 KB)

Date

2007

Authors

劉士弘

Shih-Hung Liu

Abstract

This thesis investigates improved discriminative training of acoustic models for Mandarin large vocabulary continuous speech recognition (LVCSR). First, we propose a new phone accuracy function, based on the frame-level accuracy of hypothesized phone arcs, to replace the raw phone accuracy function used in minimum phone error (MPE) training; to some extent, the new function penalizes deletion errors more adequately. Second, we explore a novel data selection approach for discriminative training based on the normalized frame-level entropy of Gaussian posterior probabilities, which are obtained from the word lattices generated for the training utterances. This approach makes the training algorithm focus more on the statistics of frame samples that lie close to the decision boundary, which yields better discrimination. The proposed data selection approach is further applied to unsupervised discriminative training of acoustic models. Finally, we investigate several other modifications of the training objective function, as well as of the lattice structures, so that different statistics are accumulated for MPE training. Preliminary experiments on the Mandarin broadcast news corpus (MATBN), collected in Taiwan from Public Television Service (PTS) news, show that combining frame-level data selection with the new phone accuracy function yields slight but consistent improvements over conventional MPE training in the early training iterations.
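The two main ideas in the abstract can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the thesis's implementation. `raw_phone_accuracy` follows the approximate phone accuracy commonly used in MPE training (Povey and Woodland): the maximum over reference phones z of -1 + 2e if the labels match and -1 + e otherwise, where e is the fraction of z overlapped by the hypothesis arc. Because the abstract does not give the exact formulas, `frame_phone_accuracy` (a per-frame match rate rescaled to [-1, 1]), `normalized_entropy` (entropy divided by log N over the Gaussians with non-zero posterior), `select_frames`, and the threshold value are all hypothetical choices. The per-frame Gaussian posteriors are assumed to come from an existing lattice forward-backward pass.

```python
import math
from typing import List, Sequence, Tuple

# A phone arc: (phone label, start frame inclusive, end frame exclusive).
Arc = Tuple[str, int, int]


def raw_phone_accuracy(hyp: Arc, refs: Sequence[Arc]) -> float:
    """Approximate MPE phone accuracy: the maximum over reference phones z of
    -1 + 2e (same label) or -1 + e (different label), where e is the fraction
    of z's duration overlapped by the hypothesis arc."""
    label, start, end = hyp
    best = -1.0  # value contributed by reference phones that do not overlap
    for ref_label, ref_start, ref_end in refs:
        overlap = max(0, min(end, ref_end) - max(start, ref_start))
        if overlap == 0:
            continue
        e = overlap / (ref_end - ref_start)
        score = -1.0 + 2.0 * e if ref_label == label else -1.0 + e
        best = max(best, score)
    return best


def frame_phone_accuracy(hyp: Arc, ref_frames: Sequence[str]) -> float:
    """HYPOTHETICAL frame-level variant: the fraction of the arc's frames whose
    reference phone label matches the arc label, rescaled to [-1, 1]."""
    label, start, end = hyp
    if end <= start:
        raise ValueError("arc must span at least one frame")
    correct = sum(1 for t in range(start, end) if ref_frames[t] == label)
    return 2.0 * correct / (end - start) - 1.0


def normalized_entropy(posteriors: Sequence[float]) -> float:
    """Entropy of one frame's Gaussian posteriors divided by log(N), where N is
    the number of Gaussians with non-zero posterior (0.0 when N <= 1)."""
    probs = [p for p in posteriors if p > 0.0]
    if len(probs) <= 1:
        return 0.0
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalize after dropping zeros
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(probs))


def select_frames(frame_posteriors: Sequence[Sequence[float]],
                  threshold: float) -> List[int]:
    """Keep frames whose normalized entropy is at least `threshold`, i.e. frames
    where the posteriors are spread over several competing Gaussians."""
    return [t for t, post in enumerate(frame_posteriors)
            if normalized_entropy(post) >= threshold]


if __name__ == "__main__":
    refs = [("a", 0, 4), ("b", 4, 8)]
    ref_frames = ["a"] * 4 + ["b"] * 4
    hyp = ("a", 2, 6)  # straddles the a/b boundary
    print(raw_phone_accuracy(hyp, refs))          # 0.0
    print(frame_phone_accuracy(hyp, ref_frames))  # 0.0
    print(select_frames([[1.0], [0.5, 0.5], [0.9, 0.1]], threshold=0.5))  # [1]
```

In this sketch, a hard threshold simply discards frames that the current model already explains with a single dominant Gaussian. A soft alternative, also only an assumption here, would weight each frame's accumulated statistics by its normalized entropy instead of dropping frames outright.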

Keywords

Discriminative acoustic model training, Large vocabulary continuous speech recognition, Time-frame accuracy function, Data selection

URI

http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN0693470185%22.&%22.id.&
http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106651

Collections

Theses and Dissertations
