A Study on Applying Modulation Spectrum Factorization Techniques to Robust Speech Feature Extraction
Date
2014
Abstract
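In recent years, research on the modulation spectrum of speech features has drawn wide attention in robust automatic speech recognition, because it is simple and offers an analysis of how speech features vary over a whole utterance. This thesis focuses on two parts. The first is an extension of nonnegative matrix factorization (NMF). NMF has attracted much attention because it can effectively extract key, noise-insensitive information from the modulation spectrum. This thesis continues that line of research and proposes cluster-based NMF, which clusters the speech data before processing, and sparse NMF, which adds a sparsity constraint. The second is an extension of compressive sensing, a method that reconstructs a signal compactly from its more relevant information. This thesis proposes the novel idea of applying compressive sensing to the modulation spectrum of speech features. Cluster-based NMF uses clustering to process utterances with different characteristics separately, so that NMF can extract the important information in speech more precisely without interference from inter-utterance variability. Sparse NMF explores the effect of sparsity in NMF, with the aim of obtaining basis modulation spectra that are more concentrated and non-redundant. All experiments in this thesis are validated on the commonly used Aurora-2 corpus and further validated on the large-vocabulary corpus Aurora-4. The experimental results show that the two proposed extensions do improve the robustness of speech recognition and achieve better recognition accuracy than other modulation spectrum techniques.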
Modulation spectrum processing of acoustic features has received considerable attention in robust automatic speech recognition (ASR) because of its relative simplicity and good empirical performance. This thesis focuses on two concepts. The first is nonnegative matrix factorization (NMF). An emerging school of thought is to apply NMF in the modulation spectrum domain so as to distill intrinsic, noise-invariant temporal structure characteristics of acoustic features for better robustness. Our work extends NMF in two ways: cluster-based NMF, which clusters the training data, and sparse NMF, which imposes a sparsity constraint on the factorization. The second concept is compressive sensing: we propose the novel idea of applying compressive sensing to the modulation spectrum. In cluster-based NMF, speech utterances belonging to different clusters have their own set of cluster-specific basis vectors. As such, the utterances can retain more of their important information in the NMF-processed modulation spectra. Sparse NMF explores the notion of sparsity for NMF so as to ensure that the derived basis vectors give sparser and more localized representations of the modulation spectra. All experiments were conducted on the widely used Aurora-2 database and task, and further on the large-vocabulary continuous speech recognition (LVCSR) task Aurora-4. Empirical evidence shows that our methods offer substantial improvements and achieve performance competitive with or better than several widely used robustness methods.
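The NMF-based processing described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation; it assumes the modulation spectrum of one feature dimension is the DFT of its time trajectory, that NMF is fitted with the standard multiplicative updates for squared error, and that sparsity is encouraged by an L1 penalty on the activations. All function names, the FFT length of 64, and the penalty form are hypothetical choices made for illustration.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the updates


def modulation_magnitude(feature_stream, n_fft=64):
    """Magnitude and phase of the DFT of one feature trajectory over time.

    Assumes len(feature_stream) <= n_fft (the stream is zero-padded).
    """
    spectrum = np.fft.rfft(feature_stream, n=n_fft)
    return np.abs(spectrum), np.angle(spectrum)


def nmf(V, rank, n_iter=200, sparsity=0.0, seed=0):
    """Factor nonnegative V (bins x utterances) as W @ H.

    Uses multiplicative updates for squared error. sparsity > 0 adds an
    L1 penalty on H (an assumed form, not taken from the thesis).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
        # Rescale each basis vector to unit L2 norm; W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + EPS
        W /= norms
        H *= norms[:, None]
    return W, H


def encode(V, W, n_iter=200, sparsity=0.0):
    """Hold the basis W fixed and solve only for the activations H."""
    H = np.ones((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
    return H


def enhance(feature_stream, W, n_fft=64):
    """Replace the modulation magnitude by its NMF approximation.

    The original phase is kept and the result is transformed back to time.
    """
    mag, phase = modulation_magnitude(feature_stream, n_fft)
    H = encode(mag[:, None], W)
    new_mag = (W @ H)[:, 0]
    restored = np.fft.irfft(new_mag * np.exp(1j * phase), n=n_fft)
    return restored[: len(feature_stream)]


# Synthetic stand-in for clean training trajectories (20 utterances, 50 frames).
rng = np.random.default_rng(1)
train_streams = rng.standard_normal((20, 50))
V_train = np.stack([modulation_magnitude(s)[0] for s in train_streams], axis=1)
W_basis, _ = nmf(V_train, rank=4, sparsity=0.1)
enhanced = enhance(train_streams[0], W_basis)
```

Under the same assumptions, a cluster-based variant would call `nmf` once per cluster of training utterances and apply `enhance` with the basis of the cluster that a test utterance is assigned to.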
Keywords
modulation spectrum, robustness, automatic speech recognition, nonnegative matrix factorization, sparsity, compressive sensing