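A Preliminary Study of Language Model Training and Adaptation Techniques for Mandarin Large-Vocabulary Continuous Speech Recognition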

蔡文鴻; Wen-Hung Tsai


Files

701301.pdf (34.34 KB)

701302.pdf (118.19 KB)

701303.pdf (184.89 KB)

701304.pdf (58.11 KB)

701305.pdf (101.34 KB)

Date

2005

Authors

蔡文鴻

Wen-Hung Tsai

Abstract

Statistical language modeling aims to capture the regularities of natural language, such as contextual and semantic information, and to quantify how likely a given word sequence is. Over the past three decades it has been an important research issue in a wide variety of natural language processing (NLP) applications. In speech recognition, for example, the principal role of the language model is to help resolve acoustic confusion and thus pick the correct hypothesis out of the competing candidate word sequences. Speech recognition technology now has a growing number of everyday applications, such as voice dictation and call routing systems. However, recognition performance is often seriously affected by the differing lexical and semantic characteristics of different application tasks. This motivates language model adaptation, which aims to exploit the lexical and semantic information inherent in the recognition domain so as to compensate for the mismatch between training and testing conditions. In this thesis, a topical mixture model (TMM), previously proposed for probabilistic information retrieval, was investigated as a way to dynamically exploit long-span latent topical information for language model adaptation, and it obtained good results in that role.
We also studied the use of the maximum entropy (ME) principle for language modeling. ME is a principle for efficiently combining a variety of information sources. Under the ME criterion, each information source gives rise to a set of constraints that the resulting language model must satisfy. The intersection of these constraints is the set of probability distributions that satisfy all of them, and the distribution with the highest entropy in that set is the solution under the ME principle. In our experiments, ME was used to combine unigram, bigram, and trigram information into a single language model.
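The ME combination of unigram, bigram, and trigram constraints can be illustrated with a minimal sketch. This is a toy illustration, not the thesis's implementation. The model takes the log-linear form p(w|h) = exp(Σ λ_i f_i(h, w)) / Z(h), which is the form of the maximum-entropy solution. There is one binary feature per n-gram observed in training, and each feature's empirical count is the constraint the model's expected count must match. Several assumptions apply: the class and function names (`MaxEntTrigramLM`, `ngram_features`, `to_events`) and the toy corpus are hypothetical, and the weights are fit by plain gradient ascent on the log-likelihood rather than the generalized iterative scaling often used for ME training. That optimizer is a swapped-in choice.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


def ngram_features(history, word):
    """Hypothetical feature set: the unigram, bigram and trigram keys that fire
    when predicting `word` after the two-word history (u, v)."""
    u, v = history
    return [("uni", word), ("bi", v, word), ("tri", u, v, word)]


def to_events(sentences):
    """Turn tokenized sentences into (history, next_word) training events."""
    events = []
    for sent in sentences:
        padded = [BOS, BOS] + list(sent) + [EOS]
        for i in range(2, len(padded)):
            events.append(((padded[i - 2], padded[i - 1]), padded[i]))
    return events


class MaxEntTrigramLM:
    """Log-linear model p(w|h) = exp(sum_i lambda_i f_i(h, w)) / Z(h)."""

    def __init__(self, sentences):
        self.events = to_events(sentences)
        self.history_counts = Counter(h for h, _ in self.events)
        self.vocab = sorted({w for _, w in self.events})
        # Empirical feature counts: the constraint targets the model must match.
        self.empirical = Counter(
            f for h, w in self.events for f in ngram_features(h, w))
        self.weights = {f: 0.0 for f in self.empirical}  # lambda = 0: uniform

    def prob_dist(self, history):
        scores = {w: sum(self.weights.get(f, 0.0) for f in ngram_features(history, w))
                  for w in self.vocab}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {w: math.exp(s - top) for w, s in scores.items()}
        z = sum(exps.values())
        return {w: e / z for w, e in exps.items()}

    def expected_counts(self):
        """Feature counts expected under the model on the training histories."""
        expected = Counter()
        for h, count in self.history_counts.items():
            for w, p in self.prob_dist(h).items():
                for f in ngram_features(h, w):
                    if f in self.weights:
                        expected[f] += count * p
        return expected

    def constraint_violation(self):
        """L2 norm of (empirical - expected) counts; 0 means all constraints hold."""
        expected = self.expected_counts()
        return math.sqrt(sum((self.empirical[f] - expected[f]) ** 2
                             for f in self.weights))

    def log_likelihood(self):
        return sum(math.log(self.prob_dist(h)[w]) for h, w in self.events)

    def train(self, iterations=100, lr=0.3):
        """Gradient ascent on the average log-likelihood. Its gradient is
        (empirical - expected) / N, so the updates push toward the constraints."""
        n = len(self.events)
        for _ in range(iterations):
            expected = self.expected_counts()
            for f in self.weights:
                self.weights[f] += lr * (self.empirical[f] - expected[f]) / n


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
    lm = MaxEntTrigramLM(corpus)
    print("before:", lm.log_likelihood(), lm.constraint_violation())
    lm.train()
    print("after: ", lm.log_likelihood(), lm.constraint_violation())
```

Each event fires at most three weighted features, so the average log-likelihood is 3-smooth. A step size of 0.3 (below 1/3) therefore keeps every update an ascent step and shrinks the constraint violation. On a corpus this small, some histories are always followed by the same word. The matching constraints ask for a probability of exactly 1, so they can only be approached, never met with finite weights.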
The preliminary experimental results on the Mandarin broadcast news transcription task show that the ME-based language model achieves both a lower character error rate (CER) and a lower perplexity than a language model trained with the conventional maximum likelihood (ML) approach.
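The two evaluation measures reported above have standard definitions, and a short sketch may help readers compare the numbers. Perplexity is the exponentiated average negative log-probability the language model assigns to the test words. CER is the character-level Levenshtein distance between the reference and the recognized transcript, divided by the number of reference characters. The thesis's exact scoring setup is not given here, so this sketch makes two assumptions: whitespace is ignored when counting Mandarin characters, and natural logarithms are used. The function names are hypothetical.

```python
import math


def perplexity(word_log_probs):
    """PP = exp(-(1/N) * sum_i ln p(w_i | h_i)) over N scored test words."""
    return math.exp(-sum(word_log_probs) / len(word_log_probs))


def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # delete r
                           cur[j - 1] + 1,            # insert h
                           prev[j - 1] + (r != h)))   # substitute or match
        prev = cur
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = character-level edit distance / number of reference characters.
    Assumption: whitespace is not counted as a character."""
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substitution (氣 -> 汽) over six reference characters: CER = 1/6.
    print(character_error_rate("今天天氣很好", "今天天汽很好"))
    # A model assigning p = 0.25 to every word has perplexity 4.
    print(perplexity([math.log(0.25)] * 4))
```

Because Mandarin text is not space-delimited, scoring at the character level avoids dependence on a particular word segmentation. This is a common reason CER rather than word error rate is reported for Mandarin tasks.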

Keywords

language model, language model adaptation, topic mixture model, maximum entropy

URI

http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22G0069147013%22.&%22.id.&
http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106315

Collections

Theses and Dissertations
