Exploring Low-dimensional Structures of Modulation Spectrum Features for Robust Speech Recognition

Date

2017

Abstract

Noise robustness techniques are vital to the success of automatic speech recognition (ASR) systems, especially in the face of varying sources of environmental interference. Recent studies have shown that exploring the low-dimensional structure of speech features helps extract more robust features. Along this vein, research on low-dimensional structures, which considers the intrinsic structure of speech features residing in low-dimensional subspaces, has gained considerable interest from the ASR community. In this thesis, we explore a family of low-dimensional structure methods applied to the modulation spectra of speech features, in the hope of obtaining more noise-robust speech features. The research is divided into three parts. First, sparse representation based methods are used to analyze the high-dimensional speech features and to identify and remove residual (redundant) bases from their modulation spectra. Second, we propose a novel use of a low-rank representation (LRR) based method to discover the subspace structures of modulation spectra, thereby alleviating the negative effects of noise interference. Third, we explore the intrinsic geometric low-dimensional manifold structures inherent in the modulation spectra of speech features, so that noisy speech can be projected onto the manifold to obtain more noise-robust features. Furthermore, we extensively compare our approaches with several widely used feature normalization methods and also combine them with these methods, which yields good recognition performance. All experiments were conducted and verified on the Aurora-4 database and task.
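
The abstract does not give implementation details, so the following is only a minimal sketch of what processing in the modulation domain can look like. It assumes the modulation spectrum is taken as the DFT of each feature dimension's trajectory over time, and it uses a plain truncated SVD of the magnitude spectra as a simplified stand-in for a low-dimensional subspace constraint. It is not the LRR, sparse representation, or manifold method of the thesis, and all function names are hypothetical.

```python
import numpy as np


def modulation_spectrum(features):
    """DFT of each feature dimension's time trajectory.

    features: real array of shape (num_frames, num_dims).
    Returns a complex array of shape (num_frames // 2 + 1, num_dims).
    """
    return np.fft.rfft(features, axis=0)


def low_rank_magnitude(magnitude, rank):
    """Best rank-`rank` approximation (in the least-squares sense) via truncated SVD.

    Assumption: this is only a stand-in for a subspace constraint,
    not the low-rank representation method described in the abstract.
    """
    u, s, vt = np.linalg.svd(magnitude, full_matrices=False)
    s = s.copy()
    s[rank:] = 0.0  # discard the weakest singular directions
    return (u * s) @ vt


def enhance(features, rank):
    """Replace the modulation magnitude with a low-rank version, keep the phase."""
    spec = modulation_spectrum(features)
    magnitude, phase = np.abs(spec), np.angle(spec)
    # Clip at zero because a truncated SVD can produce small negative entries.
    new_magnitude = np.maximum(low_rank_magnitude(magnitude, rank), 0.0)
    new_spec = new_magnitude * np.exp(1j * phase)
    # Back to the time domain with the original number of frames.
    return np.fft.irfft(new_spec, n=features.shape[0], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.standard_normal((64, 13))  # e.g. 64 frames of 13-dim features
    cleaned = enhance(noisy, rank=3)
    print(cleaned.shape)
```

Keeping the original phase and modifying only the magnitude is one common design choice for modulation-domain processing; whether the thesis does the same is not stated in the abstract.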

Keywords

modulation spectrum, robust speech recognition, manifold learning, sparse representation, low-rank representation
