A Study on Modulation Spectrum Factorization Techniques for Robust Speech Feature Extraction

汪逸婷

Date

2014

Authors

汪逸婷

Abstract

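In recent years, research on the modulation spectrum of speech features has drawn wide attention in robust automatic speech recognition, because the approach is simple and analyzes the overall temporal variation of speech features. This thesis focuses on two parts. The first extends nonnegative matrix factorization (NMF), which has attracted much interest because it can effectively extract the key, noise-insensitive information in the modulation spectrum. Continuing this line of research, we propose cluster-based NMF, which processes speech utterances in clusters, and sparse NMF, which adds a sparsity constraint. The second extends compressive sensing, a technique that reconstructs a signal compactly from the most relevant information; we propose the novel idea of applying compressive sensing to the modulation spectrum of speech features. Cluster-based NMF uses clustering to process utterances with different characteristics separately, so that NMF can capture the important information in speech more precisely without interference from variability between utterances. Sparse NMF explores the effect of sparsity in NMF, aiming to obtain more concentrated, non-redundant basis modulation spectra. All experiments were conducted on the widely used Aurora-2 corpus and further validated on the large-vocabulary Aurora-4 corpus. The results show that both proposed extensions improve the robustness of speech recognition and achieve better recognition accuracy than other modulation spectrum techniques.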
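The NMF-on-modulation-spectrum idea described above can be sketched as follows. This is a minimal, hypothetical illustration in pure Python, not the thesis's implementation. It assumes that the modulation spectrum of one feature dimension is the DFT magnitude of its time trajectory, that NMF uses the Euclidean cost with Lee–Seung multiplicative updates, that a noisy spectrum is enhanced by projecting it onto a basis learned from clean data and resynthesized with the noisy phase, and that the sparse variant places an L1 penalty (`sparsity`) on the activations H. The thesis describes its sparsity constraint as targeting more localized basis vectors, so its formulation may differ. All function names are illustrative.

```python
import cmath
import math
import random

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def modulation_spectrum(trajectory):
    """DFT of one feature dimension over time; returns (magnitudes, phases)."""
    n = len(trajectory)
    coeffs = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(trajectory))
        for k in range(n)
    ]
    return [abs(c) for c in coeffs], [cmath.phase(c) for c in coeffs]


def resynthesize(magnitudes, phases):
    """Inverse DFT of a (possibly modified) magnitude combined with a phase."""
    n = len(magnitudes)
    coeffs = [m * cmath.exp(1j * p) for m, p in zip(magnitudes, phases)]
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(coeffs)) / n).real
        for t in range(n)
    ]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(row) for row in zip(*a)]


def nmf(v, rank, iters=300, sparsity=0.0, w=None, seed=0):
    """Euclidean NMF, V ~ W H, with multiplicative updates.

    Columns of V are nonnegative modulation-spectrum magnitudes, so columns
    of W are basis modulation spectra. If `w` is given it stays fixed and
    only H is estimated (projection of new data onto a learned basis).
    `sparsity` > 0 adds an L1 penalty on H (one common sparse-NMF variant).
    """
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    learn_w = w is None
    if learn_w:
        w = [[rng.random() + EPS for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + EPS for _ in range(cols)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + sparsity + EPS) for j in range(cols)]
             for i in range(rank)]
        if learn_w:
            ht = transpose(h)
            num, den = matmul(v, ht), matmul(w, matmul(h, ht))
            w = [[w[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(rank)]
                 for i in range(rows)]
            # Rescale to unit-norm basis columns; W H is unchanged, and the
            # L1 penalty on H cannot be dodged by inflating W instead.
            for j in range(rank):
                norm = math.sqrt(sum(w[i][j] ** 2 for i in range(rows))) + EPS
                for i in range(rows):
                    w[i][j] /= norm
                h[j] = [x * norm for x in h[j]]
    return w, h


def reconstruction_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    return sum((a - b) ** 2 for ra, rb in zip(v, matmul(w, h)) for a, b in zip(ra, rb))


if __name__ == "__main__":
    # Toy "clean" feature trajectories: a DC offset plus one oscillation each.
    clean = [[2.0 + math.sin(2 * math.pi * f * t / 16) for t in range(16)] for f in (1, 2, 3)]
    v_clean = transpose([modulation_spectrum(x)[0] for x in clean])  # bins x examples
    w, _ = nmf(v_clean, rank=2)
    # Corrupt one trajectory, project its magnitude onto the fixed clean basis,
    # and resynthesize with the noisy phase.
    rng = random.Random(1)
    noisy = [x + 0.3 * rng.gauss(0.0, 1.0) for x in clean[0]]
    mag, phase = modulation_spectrum(noisy)
    _, h = nmf(transpose([mag]), rank=2, w=w)
    enhanced = resynthesize([row[0] for row in matmul(w, h)], phase)
    print("enhanced trajectory:", [round(x, 3) for x in enhanced])
```

Because the basis W is learned only from clean spectra and then held fixed, a noisy spectrum can only be rebuilt from clean-looking components. Under the same assumptions, a cluster-based variant would keep one such W per utterance cluster and project each utterance onto the basis of its own cluster.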
Modulation spectrum processing of acoustic features has received considerable attention in robust automatic speech recognition (ASR) because of its relative simplicity and good empirical performance. This thesis focuses on two concepts. The first is nonnegative matrix factorization (NMF). An emerging school of thought is to conduct NMF in the modulation spectrum domain so as to distill intrinsic, noise-invariant temporal structure characteristics of acoustic features for better robustness. We extend NMF in two ways: cluster-based NMF, which clusters the training data, and sparse NMF, which takes the sparsity of NMF into account. The second concept is compressive sensing: we propose the novel idea of applying compressive sensing to the modulation spectrum. Cluster-based NMF is an alternative processing scheme in which speech utterances belonging to different clusters have their own sets of cluster-specific basis vectors. As such, the NMF-processed modulation spectra retain more of the important speech information and are less affected by variability between utterances. Sparse NMF explores the notion of sparsity for NMF so as to ensure that the derived basis vectors have sparser and more localized representations of the modulation spectra. All experiments were conducted on the widely used Aurora-2 database and task, and further on the Aurora-4 large-vocabulary continuous speech recognition (LVCSR) task. Empirical evidence reveals that our methods offer substantial improvements and achieve performance competitive with or better than several widely used robustness methods.
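The abstract does not say which recovery algorithm the compressive-sensing extension uses. As an assumption only, the sketch below uses orthogonal matching pursuit (OMP), a standard greedy sparse-recovery method, to approximate a modulation spectrum `y` as a combination of at most `sparsity` dictionary atoms (for example, clean basis modulation spectra). The names `omp`, `solve`, and `atoms` are hypothetical, and a full compressive-sensing setup would also involve a measurement matrix, which is omitted here.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def omp(atoms, y, sparsity, tol=1e-12):
    """Orthogonal matching pursuit.

    Approximates y (e.g. a noisy modulation spectrum) by at most `sparsity`
    atoms (e.g. clean basis modulation spectra, all assumed nonzero).
    Returns one coefficient per atom; unselected atoms get 0.0.
    """
    support, coef = [], []
    residual = list(y)
    for _ in range(min(sparsity, len(atoms))):
        if dot(residual, residual) <= tol:
            break
        # Greedy step: the unused atom best correlated with the residual.
        best = max(
            (i for i in range(len(atoms)) if i not in support),
            key=lambda i: abs(dot(atoms[i], residual)) / math.sqrt(dot(atoms[i], atoms[i])),
        )
        support.append(best)
        # Orthogonal step: least-squares refit of y on all selected atoms
        # via the normal equations G c = A^T y.
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        coef = solve(gram, [dot(atoms[i], y) for i in support])
        approx = [sum(c * atoms[i][t] for c, i in zip(coef, support)) for t in range(len(y))]
        residual = [a - b for a, b in zip(y, approx)]
    full = [0.0] * len(atoms)
    for c, i in zip(coef, support):
        full[i] = c
    return full
```

For example, with the atoms `[1, 0, 0]`, `[0, 1, 0]`, `[0, 0, 1]`, `[1, 1, 0]`, `[1, 1, 1]` and `y = [1, 0, 3]`, `omp(atoms, y, 2)` selects the third atom and then the first, returning `[1.0, 0.0, 3.0, 0.0, 0.0]` up to rounding. Unlike the NMF projection, OMP does not enforce nonnegative coefficients.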

Keywords

modulation spectrum, robustness, automatic speech recognition, nonnegative matrix factorization, sparsity, compressive sensing

URI

http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN060047062S%22.&%22.id.&
http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106570

Collections

Theses and Dissertations
