Advisor: Berlin Chen (陳柏琳)
Author: Bang-Xuan Huang (黃邦烜)
Date: 2012-8-27 (made available online 2019-09-05)
URL: http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN0699470204%22.&%22.id.&
URL: http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106897

Abstract (translated from the Chinese):
A language model, trained on large amounts of text, captures the regularities of natural language and predicts what the next word should be given the history of preceding words; it therefore plays an indispensable role in automatic speech recognition (ASR) systems. The traditional statistical N-gram language model is the most common kind: it predicts the likelihood of the next word from the known preceding N-1 words. When N is small, long-span information is lost; when N is large, the limited training corpus causes a data sparseness problem. In recent years the rise of neural networks has spawned much related research, and neural network language models are one example. Notably, a neural network language model can alleviate the data sparseness problem: it estimates the probability of the next word by mapping word sequences into a continuous space, so it is not defeated by word-sequence combinations that never appeared in the training corpus. Besides the traditional feed-forward neural network language model, researchers have recently also built language models with recurrent neural networks, which store history information recursively and can thereby capture long-span information.

This thesis studies the use of recurrent neural network language models in Mandarin large vocabulary continuous speech recognition, explores the use of additional relevance information to capture long-span information more effectively, and dynamically adapts the language model to the characteristics of each utterance. Experimental results show that incorporating relevance information into the recurrent neural network language model yields a considerable improvement in large vocabulary continuous speech recognition performance.

Abstract (English):
The goal of language modeling (LM) is to capture the regularities of natural language. A language model is trained on large amounts of text so as to help predict the most likely upcoming word given a word history; it therefore plays an indispensable role in automatic speech recognition (ASR). The N-gram language model, which determines the probability of an upcoming word given its preceding N-1 words, is the most prominently used. When N is small, a typical N-gram language model lacks the ability to render long-span lexical information; on the other hand, when N becomes larger, it suffers from the data sparseness problem because of insufficient training data. With this acknowledged, research on the neural network-based language model (NNLM), or more specifically the feed-forward NNLM, has attracted considerable attention from researchers and practitioners in recent years. This is attributed to the fact that the feed-forward NNLM can mitigate the data sparseness problem when estimating the probability of an upcoming word given its word history, by mapping both into a continuous space.
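The data sparseness problem described above can be seen in a few lines of code: a maximum-likelihood N-gram model assigns probability zero to any word sequence absent from its training text. This is a minimal sketch under invented data; the toy corpus and the function name are illustrative and not taken from the thesis.

```python
from collections import Counter

def mle_ngram_prob(corpus_tokens, n, context, word):
    """Maximum-likelihood N-gram estimate: count(context + word) / count(context)."""
    ngrams = Counter(tuple(corpus_tokens[i:i + n])
                     for i in range(len(corpus_tokens) - n + 1))
    contexts = Counter(tuple(corpus_tokens[i:i + n - 1])
                       for i in range(len(corpus_tokens) - n + 2))
    c = contexts[tuple(context)]
    return ngrams[tuple(context) + (word,)] / c if c else 0.0

corpus = "the cat sat on the mat the cat ate".split()
# Seen trigram: "the cat" is followed by "sat" in 1 of its 2 occurrences.
print(mle_ngram_prob(corpus, 3, ("the", "cat"), "sat"))    # 0.5
# Unseen trigram gets probability zero -- the sparseness problem:
print(mle_ngram_prob(corpus, 3, ("the", "cat"), "slept"))  # 0.0
```

A continuous-space model avoids these hard zeros because similar histories receive similar representations, so an unseen combination still gets a sensible probability.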
In addition to the feed-forward NNLM, a recent trend is to use the recurrent neural network-based language model (RNNLM) for ASR, which can make efficient use of the long-span lexical information inherent in the word history in a recursive fashion. In this thesis, we not only investigate leveraging extra information relevant to the word history for the RNNLM, but also devise a dynamic model estimation method to obtain an utterance-specific RNNLM. We experimentally observe that our proposed methods show promise and perform well compared to existing LM methods on a large vocabulary continuous speech recognition (LVCSR) task.

Keywords: automatic speech recognition; language modeling; feed-forward neural network; recurrent neural networks
Title: Recurrent Neural Network-based Language Modeling with Extra Information Cues for Speech Recognition (遞迴式類神經網路語言模型使用額外資訊於語音辨識之研究)
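The recurrent architecture both abstracts refer to can be sketched as an Elman-style network: at each step the hidden state folds in the new word and the previous hidden state, so it summarizes the entire word history. Everything below — the vocabulary size, hidden size, random initialization, and toy word sequence — is invented for illustration and is not the thesis's model or data.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 5, 8  # toy vocabulary size and hidden size (illustrative only)

# Parameters of a minimal Elman-style RNN language model.
U = rng.normal(0.0, 0.1, (H, V))      # one-hot input word -> hidden
W = rng.normal(0.0, 0.1, (H, H))      # previous hidden -> hidden (the recurrence)
Vout = rng.normal(0.0, 0.1, (V, H))   # hidden -> output scores

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnnlm_step(word_id, h_prev):
    """One time step: fold the new word into the hidden state, predict the next word."""
    x = np.zeros(V)
    x[word_id] = 1.0
    h = np.tanh(U @ x + W @ h_prev)   # hidden state summarizes the whole history
    p = softmax(Vout @ h)             # distribution over the next word
    return h, p

h = np.zeros(H)
for w in [0, 3, 1]:                   # a toy word-id sequence
    h, p = rnnlm_step(w, h)
print(p)  # a valid probability distribution over the V-word vocabulary
```

Because the history is carried in a fixed-size hidden state rather than an explicit N-1 word window, the model is not limited to a fixed context length, which is the long-span advantage the thesis builds on.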