A Study of Various Discriminative Language Models for Speech Recognition (多種鑑別式語言模型應用於語音辨識之研究)

Date

2010

Abstract

The N-gram language model plays a crucial role in a speech recognizer, since it helps the recognizer distinguish the correct hypothesis from the many incorrect ones in the recognizer's extremely large output space. However, N-gram language models are trained to maximize the likelihood of a large amount of training text rather than to optimize the final performance measure of speech recognition, which limits the recognition performance they can deliver. In this thesis, we first investigate a wide variety of discriminative language models (DLMs), which stem from different training objectives but share the common goal of directly improving recognition performance, and we compare their utilities on large-vocabulary speech recognition both theoretically and empirically. We further propose an utterance-driven discriminative language model (UDLM), which takes the characteristics of each test utterance into account and infers its model parameters on the fly. Finally, we combine UDLM with maximum a posteriori (MAP) language model adaptation, in the expectation that the recognition results produced by MAP adaptation can help UDLM achieve a more pronounced improvement in recognition accuracy. All experiments are conducted on a Mandarin broadcast news corpus compiled in Taiwan, and the results show that the proposed methods yield appreciable gains in recognition accuracy and demonstrate their feasibility.
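The abstract centers on reranking a recognizer's N-best hypotheses with a discriminatively trained language model. The short Python sketch below illustrates that general idea only; it is not the thesis implementation, and the rerank_nbest helper, the per-bigram feature weights, and the toy hypotheses are all hypothetical.

# Minimal sketch of discriminative N-best reranking (illustrative only, not the
# thesis code). Each hypothesis from the recognizer's N-best list is rescored by
# adding a weighted sum of discriminatively trained feature weights (here,
# hypothetical per-bigram weights) to its baseline recognizer score.

def rerank_nbest(nbest, dlm_weights, alpha=1.0):
    """nbest: list of (hypothesis_words, baseline_score) pairs.
    dlm_weights: dict mapping bigram tuples to discriminative weights.
    Returns hypotheses sorted by combined score, best first."""
    def dlm_score(words):
        # Sum the discriminative weights of every bigram in the hypothesis.
        return sum(dlm_weights.get(bg, 0.0) for bg in zip(words, words[1:]))

    rescored = [(words, base + alpha * dlm_score(words)) for words, base in nbest]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy usage: the discriminative weight lifts the correct hypothesis to the top.
nbest = [(["新聞", "報導"], -10.2), (["新聞", "導報"], -10.0)]
weights = {("新聞", "報導"): 0.5}
print(rerank_nbest(nbest, weights)[0][0])  # -> ['新聞', '報導']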

Keywords

speech recognition, language model, discriminative language model, reranking
