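A Study of Distribution-Based Speech Feature Normalization Techniques Using Temporal Structure Information for Robust Speech Recognition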
Date
2013
Abstract
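In recent years, histogram equalization (HEQ) has become a popular research topic in robust speech recognition because it is simple yet performs well. In this thesis, we propose two improved HEQ techniques: one uses polynomial regression to improve HEQ on the modulation spectrum, and the other uses spatial and temporal contextual information to relax the assumptions of conventional HEQ applied to Mel-frequency cepstral coefficient (MFCC) features. These methods have two main characteristics. First, they normalize speech features with higher-order polynomials and incorporate contextual information across time and space (that is, across feature dimensions), relaxing the conventional HEQ assumption that time and space are each independent. Second, they bring temporal difference information into feature normalization, which exploits contextual information more effectively and yields some improvement in recognition performance. We use the Aurora-2 corpus to evaluate different robust feature extraction techniques on a small-vocabulary recognition task, and the Aurora-4 corpus to further evaluate them on a large-vocabulary recognition task. The experimental results confirm that the two proposed HEQ techniques effectively reduce the word error rate and also benefit more advanced features such as the ETSI advanced front end (AFE).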
Histogram equalization (HEQ) of speech features has recently received considerable attention in robust speech recognition because of its relative simplicity and good empirical performance. In this thesis, we present a polynomial variant of spectral histogram equalization (SHE) that operates on the modulation spectra of speech features, and a novel extension of the conventional HEQ approach in the cepstral domain. Our HEQ methods have at least two attractive properties. First, polynomial regression of various orders is used to perform feature normalization efficiently, building on the notion of HEQ. Second, both the contextual distributional statistics and the dynamics of feature values are taken as input to the regression functions for better normalization performance. In this way, we can to some extent relax the dimension-independence and bag-of-frames assumptions made by conventional HEQ methods. All experiments were carried out on the Aurora-2 corpus and task and further verified on the Aurora-4 corpus and task. The results demonstrate that the proposed methods achieve considerable word error rate reductions over the baseline systems and offer additional performance gains for AFE-processed features.
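To make the idea of HEQ with polynomial regression more concrete, the following is a minimal sketch and not the thesis's actual implementation. It assumes a single feature dimension, a standard normal reference distribution, and a rank-based CDF estimate; the function names (`empirical_cdf`, `heq`, `fit_poly_heq`, `apply_poly_heq`) and the polynomial order are hypothetical choices made for illustration. The contextual statistics and feature dynamics that the abstract describes as additional regression inputs are not modeled here.

```python
import numpy as np
from statistics import NormalDist


def empirical_cdf(x):
    """Rank-based CDF estimate in (0, 1) for each value of a 1-D array."""
    ranks = np.argsort(np.argsort(x))  # rank of each element, 0 .. n-1
    return (ranks + 0.5) / len(x)


def heq(x):
    """Conventional HEQ (assumed form): map each value to the
    standard-normal quantile of its empirical CDF value."""
    inv = NormalDist().inv_cdf
    return np.array([inv(p) for p in empirical_cdf(x)])


def fit_poly_heq(train, order=3):
    """Fit a polynomial that maps a CDF value to a feature value,
    using training data as the reference (illustrative assumption)."""
    return np.polyfit(empirical_cdf(train), train, order)


def apply_poly_heq(coeffs, x):
    """Normalize x by evaluating the fitted polynomial at its CDF values."""
    return np.polyval(coeffs, empirical_cdf(x))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=500)      # stand-in for one clean feature dimension
    noisy = 0.5 * clean + 2.0         # toy monotone distortion
    coeffs = fit_poly_heq(clean, order=3)
    restored = apply_poly_heq(coeffs, noisy)
    print(np.abs(restored - clean).mean())
```

Because both functions depend only on the rank of each frame within the utterance, any monotonically increasing distortion of the feature values leaves the normalized output unchanged. Replacing a lookup table with a few polynomial coefficients is one plausible reading of why the abstract describes the polynomial approach as efficient.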
Keywords
modulation spectrum, histogram equalization, context information, polynomial fitting, robust speech recognition