A Study on Using Proximity and Concept Information for Language Model Adaptation (運用鄰近與概念資訊於語言模型調適之研究)

郝柏翰

Files

n060047082s01.pdf (1.73 MB)

Date

2014

Authors

郝柏翰

Abstract

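This thesis studies language model adaptation techniques for Mandarin large vocabulary continuous speech recognition. Its main contributions fall into two parts. The first part explores extensions and improvements to topic models: in addition to relaxing the bag-of-words assumption, it incorporates proximity information with the aim of giving topic models better predictive performance. The second part proposes the concept language model (CLM), whose main purpose is to approximate the concept the user has in mind and thereby identify the more relevant words; the thesis also tries several different ways of estimating the concept model. The experiments are evaluated by character error rate (CER) and perplexity, and the results show that the proposed methods clearly help improve recognition performance.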
This thesis investigates and develops language model adaptation techniques for Mandarin large vocabulary continuous speech recognition (LVCSR), and its main contributions are twofold. First, the so-called “bag-of-words” assumption of conventional topic models is relaxed by additionally incorporating word proximity cues into the model formulation, so that the resulting topic models achieve better prediction capabilities for use in LVCSR. Second, we propose a novel concept language modeling (CLM) approach to rendering the relationships between a search history and an upcoming word. Instantiations of CLM can be constructed at different levels of lexical granularity, such as words and document clusters. A series of experiments on an LVCSR task demonstrates that the proposed language models offer substantial improvements over the baseline N-gram system and achieve performance competitive with, or better than, some state-of-the-art language models.
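The two evaluation measures named in the abstract, character error rate (CER) and perplexity, can be illustrated with a minimal Python sketch. This is not the thesis's evaluation code: the function names `character_error_rate` and `perplexity` are hypothetical, CER is assumed to be the character-level edit distance divided by the reference length, and perplexity is assumed to be computed from per-word probabilities that some language model has already assigned.

```python
import math


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level Levenshtein distance divided by the reference length.

    Assumption: insertions, deletions, and substitutions all cost 1.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ch in enumerate(hypothesis, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr.append(min(
                prev[j] + 1,         # reference character missing from hypothesis
                curr[j - 1] + 1,     # extra character in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


def perplexity(word_probs: list[float]) -> float:
    """Exponential of the negative mean log-probability of the test words.

    Assumption: word_probs holds the model probability of each test word
    given its history, and every probability is strictly positive.
    """
    if not word_probs:
        raise ValueError("word_probs must be non-empty")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))
```

For example, `character_error_rate("語音辨識", "語音辯識")` counts one substituted character out of four, and a model that assigns probability 0.25 to every test word has a perplexity of 4. Lower values are better for both measures.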

Keywords

Automatic Speech Recognition, Language Modeling, Proximity Cues, Concept Information

URI

http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN060047082S%22.&%22.id.&
http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106586

Collections

Theses and Dissertations (學位論文)
