探究新穎深度學習方法於中英文混語語音辨識之使用

dc.contributor: 陳柏琳 (zh_TW)
dc.contributor: Chen, Berlin (en_US)
dc.contributor.author: 林韋廷 (zh_TW)
dc.contributor.author: Lin, Wei-Ting (en_US)
dc.date.accessioned: 2022-06-08T02:43:25Z
dc.date.available: 2021-10-03
dc.date.available: 2022-06-08T02:43:25Z
dc.date.issued: 2021
dc.description.abstract: 在多語言社會中易有一段對話中包含了多種語言的情形發生,不僅是多語言社會,甚至是單語言社會也受全球化的影響,對話中常參雜一些其他語言,這種現象稱為語碼轉換(Code-Switching, CS)。在CS自動語音辨識(Automatic Speech Recognition, ASR)中,需要同時辨識出兩種或更多種的語言,但又與多語言語音辨識不同,語者除了在句子間轉換語言外,更常在句子內進行轉換,所以也在最近被視為一個難題而被關注。本論文的研究分為兩個方面,分別為端對端和DNN-HMM混語語音辨識之改進方法,前者著重於增強中英文混語語料庫SEAME。我們採用了前陣子提出的模型Conformer,並設計語言遮罩(language-masked) multi-head attention架構應用到解碼器(decoder)端,希望讓各自的語言能學到其獨立的語言特性並強化其單語言的辨識能力。另外,為了防止模型學出的中文和英文特徵向量相近,而將三元組損失(Triplet loss)用於訓練模型。後者我們提出多種不同階段的語言模型合併策略以用於企業應用領域的多種語料。在本篇論文的實驗設定中,會有兩種中英文CS語言模型和一種中文的單語言模型,其中CS語言模型使用的訓練資料與測試集同一領域(Domain),而單語言模型是用大量一般中文語料訓練而成。我們透過多種不同階段的語言模型合併策略以探究ASR是否能結合不同的語言模型其各自的優勢以在不同任務上都有好的表現。在本篇論文中有三種語言模型合併策略,分別為N-gram語言模型合併、解碼圖 (Decoding Graph) 合併和詞圖 (Word Lattice) 合併。經由一系列的實驗結果證實,透過語言模型的合併的確能讓CS ASR對不同的測試集都有好的表現。而端到端混語語音辨識之方法於測試集上的字符錯誤率(Token Error Rate, TER)並沒有顯著的進步,但透過其他數據分析發現我們的研究方法仍有些微效果。 (zh_TW)
dc.description.abstract: In multilingual societies, a single conversation often contains multiple languages. Owing to globalization, switching between distinct languages is also pervasive in the daily dialogue of monolingual societies. This phenomenon is called code-switching (CS). In CS automatic speech recognition (ASR), two or more languages must be recognized at the same time; yet, unlike in multilingual speech recognition, speakers switch languages not only between sentences but, more often, within them, so CS ASR has recently attracted attention as a challenging problem. The research in this thesis is divided into two aspects, namely improvements to end-to-end and to DNN-HMM CS ASR, with the former focusing on the Mandarin-English CS corpus SEAME. We adopt the recently proposed Conformer model and design a language-masked multi-head attention architecture applied to the decoder, aiming to let each language learn its own linguistic characteristics and to strengthen monolingual recognition ability. Moreover, to prevent the Mandarin and English embeddings learned by the model from becoming too similar to each other, triplet loss is used in training. For the latter aspect, we put forward several strategies that combine various language models at different stages, applied to CS speech corpora compiled from different industrial application scenarios. Our experimental configuration consists of two CS (i.e., mixed Mandarin Chinese and English) language models and one monolingual (Mandarin Chinese) language model, where the two CS language models are domain-specific and the monolingual language model is trained on a large general text collection. Through language model combination at different stages of the ASR process, we investigate whether the ASR system can integrate the strengths of the various language models to achieve improved performance across different tasks.
More specifically, three strategies for combining language models are investigated: simple N-gram language model combination, decoding graph combination, and word lattice combination. A series of ASR experiments confirms the utility of the aforementioned LM combination strategies. The end-to-end CS ASR method yields no significant Token Error Rate (TER) reduction on the test sets, but further data analysis shows that it still has some minor positive effects. (en_US)
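The language-masked multi-head attention mentioned in the abstract restricts each decoder position to attending only to tokens of its own language, so each language can model its own characteristics. The thesis itself contains no code; the following is a minimal sketch of how such an attention mask could be constructed, where the function name and per-token language tags are illustrative assumptions:

```python
def language_mask(lang_ids):
    """Build a boolean self-attention mask: position i may attend to
    position j only when tokens i and j carry the same language tag."""
    n = len(lang_ids)
    return [[lang_ids[i] == lang_ids[j] for j in range(n)] for i in range(n)]

# A code-switched utterance "我 要 open 那 個 file", tagged per token:
tags = ["zh", "zh", "en", "zh", "zh", "en"]
mask = language_mask(tags)
# Mandarin positions attend only to Mandarin tokens, English only to English;
# in a real decoder this mask would zero out the blocked attention logits.
```

In an actual Conformer decoder, a mask like this would be applied per attention head before the softmax; here only the mask construction is shown.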
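Triplet loss, as used above to keep Mandarin and English embeddings apart, penalizes cases where an anchor embedding is not at least a margin closer to a same-language (positive) example than to an other-language (negative) example. A self-contained sketch of the standard hinge form; the example vectors and margin values are illustrative, not taken from the thesis:

```python
import math

def l2(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a,p) - d(a,n) + margin).
    Pulls same-language embeddings together and pushes cross-language
    embeddings at least `margin` apart."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)

# anchor/positive: two Mandarin token embeddings; negative: an English one.
zh_a, zh_p, en_n = [0.1, 0.2], [0.1, 0.25], [0.9, 0.8]
loss = triplet_loss(zh_a, zh_p, en_n, margin=0.5)  # → 0.0, already separated
```

When the negative sits closer to the anchor than the margin allows, the loss becomes positive and training gradients push the two language embeddings apart.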
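Of the three combination strategies, simple N-gram language model combination can be pictured as interpolating a domain-specific CS model with a general monolingual model. The sketch below interpolates two toy unigram distributions; the vocabularies, probabilities, and weight `lam` are illustrative assumptions, not the thesis's actual N-gram configuration:

```python
def interpolate_lms(p_cs, p_general, lam=0.7):
    """Linear interpolation of two word distributions:
    P(w) = lam * P_cs(w) + (1 - lam) * P_general(w).
    Words absent from one model get probability 0.0 from it."""
    vocab = set(p_cs) | set(p_general)
    return {w: lam * p_cs.get(w, 0.0) + (1 - lam) * p_general.get(w, 0.0)
            for w in vocab}

# Toy distributions: a domain-specific CS LM and a general Mandarin LM.
cs = {"file": 0.6, "檔案": 0.4}
gen = {"檔案": 0.9, "文件": 0.1}
mixed = interpolate_lms(cs, gen, lam=0.5)
# The merged model still covers the English term "file" from the CS LM
# while gaining mass on general Mandarin vocabulary from the monolingual LM.
```

Decoding graph and word lattice combination operate later in the ASR pipeline (on the compiled search graph and on recognition lattices, respectively) and are not sketched here.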
dc.description.sponsorship: Department of Computer Science and Information Engineering (資訊工程學系) (zh_TW)
dc.identifier: 60747014S-40330
dc.identifier.uri: https://etds.lib.ntnu.edu.tw/thesis/detail/cd04a820b1c2ac79f2f0810eea1411c0/
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/117293
dc.language: Chinese (中文)
dc.subject: 語碼轉換 (zh_TW)
dc.subject: 中英文混語語音辨識 (zh_TW)
dc.subject: 三元組損失 (zh_TW)
dc.subject: Conformer (zh_TW)
dc.subject: 語言模型 (zh_TW)
dc.subject: code-switching (en_US)
dc.subject: Mandarin-English automatic speech recognition (en_US)
dc.subject: triplet loss (en_US)
dc.subject: Conformer (en_US)
dc.subject: language model (en_US)
dc.title: 探究新穎深度學習方法於中英文混語語音辨識之使用 (zh_TW)
dc.title: Several Novel Deep Learning Approaches to Mandarin-English Code-switching Automatic Speech Recognition (en_US)
dc.type: Academic thesis (學術論文)

Files

Name: 60747014S-40330.pdf
Size: 4.02 MB
Format: Adobe Portable Document Format
Description: Academic thesis (學術論文)
