A Study on Contextualized Language Model Reranking for Meeting Speech Recognition
dc.contributor | 陳柏琳 | zh_TW |
dc.contributor | Chen, Berlin | en_US |
dc.contributor.author | 王泓壬 | zh_TW |
dc.contributor.author | Wang, Hung-Ren | en_US |
dc.date.accessioned | 2023-12-08T08:02:51Z | |
dc.date.available | 2023-08-11 | |
dc.date.available | 2023-12-08T08:02:51Z | |
dc.date.issued | 2023 | |
dc.description.abstract | ASR N-best reranking is a technique used in automatic speech recognition (ASR) systems to improve the accuracy of transcription output. For each input audio segment, the system generates multiple candidate hypotheses, known as an N-best list. BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art language model that has shown outstanding performance on a variety of natural language processing (NLP) tasks, such as text classification, named entity recognition, and question answering. Because BERT captures contextual information and produces high-quality representations of input text, it has been adopted for ASR N-best reranking. To further strengthen the BERT model's predictions, we explore enriched semantic information and training objectives in four parts: (1) effective methods for injecting information about the grammatical quality of a text into the model; (2) effective methods for indirectly injecting information about the whole N-best list into the model; (3) the feasibility of classification, ranking, and multi-task training objectives for model training; and (4) strengthening the textual information the model extracts. Large generative language models (LLMs) have demonstrated excellent generalization across a variety of language-related tasks, and in this study we assess the feasibility of applying LLMs such as ChatGPT to ASR N-best reranking. A series of experiments on the AMI meeting corpus shows that the proposed methods are effective in reducing the word error rate (WER), achieving up to a 1.37% absolute WER reduction compared with the baseline ASR system. | zh_TW |
dc.description.abstract | ASR (Automatic Speech Recognition) N-best reranking aims to improve the accuracy of ASR systems by re-ranking the system's candidate outputs, known as N-best lists: the most likely transcription is selected from the list with the help of additional contextual information. BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art language model that has shown remarkable performance on various natural language processing (NLP) tasks, and it is used for ASR N-best reranking because of its ability to capture contextual information and generate high-quality representations of input text. We explore enhanced semantic information and training objectives in four parts: (1) effective methods for incorporating information about the grammatical quality of a text into the model; (2) effective methods for indirectly incorporating whole N-best list information into the model; (3) the feasibility of classification, ranking, and multi-task training objectives for model training; and (4) strengthening the textual information extracted by the model. Large generative language models (LLMs) have demonstrated excellent generalization on a variety of language-related tasks, and we evaluate the feasibility of utilizing LLMs such as ChatGPT for the ASR N-best reranking task. We conduct a series of experiments on the AMI meeting corpus; the results show the effectiveness of the proposed methods in reducing the word error rate, with up to a 1.37% absolute WER reduction over the baseline ASR system. (Illustrative sketches of the BERT-based and LLM-based reranking ideas follow the record below.) | en_US |
dc.description.sponsorship | Department of Computer Science and Information Engineering | zh_TW |
dc.identifier | 61047091S-44066 | |
dc.identifier.uri | https://etds.lib.ntnu.edu.tw/thesis/detail/3199aeaac3f4fb62bb17b7436150f2f6/ | |
dc.identifier.uri | http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/121640 | |
dc.language | Chinese | |
dc.subject | Automatic Speech Recognition | zh_TW |
dc.subject | Language Modeling | zh_TW |
dc.subject | Conversational Speech | zh_TW |
dc.subject | N-Best Lists | zh_TW |
dc.subject | List Information | zh_TW |
dc.subject | Reranking | zh_TW |
dc.subject | Cross-sentence Information | zh_TW |
dc.subject | Large Generative Language Models | zh_TW |
dc.subject | ChatGPT | zh_TW |
dc.subject | Automatic Speech Recognition | en_US |
dc.subject | Language Modeling | en_US |
dc.subject | Conversational Speech | en_US |
dc.subject | N-Best Lists | en_US |
dc.subject | List Information | en_US |
dc.subject | Large Generative Language Models | en_US |
dc.subject | ChatGPT | en_US |
dc.title | A Study on Contextualized Language Model Reranking for Meeting Speech Recognition | zh_TW |
dc.title | Contextualized Language Model Reranking for Meeting Speech Recognition | en_US |
dc.type | etd |
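
A minimal sketch of the BERT-based reranking idea described in the abstract, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint: each hypothesis in the N-best list is scored by its pseudo-log-likelihood (masking and predicting one token at a time), and the highest-scoring candidate is kept. This is a generic illustration of the technique, not the thesis's model, which further incorporates grammatical-quality cues, whole-list information, and classification/ranking/multi-task training objectives.

import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pll_score(sentence: str) -> float:
    """Sum log P(token | rest of sentence), masking one position at a time."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    score = 0.0
    for i in range(1, ids.size(0) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, i]
        score += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return score

# Rerank a toy N-best list: the hypothesis with the highest PLL wins.
n_best = ["i went to the meeting", "eye went too the meeting"]
print(max(n_best, key=pll_score))

In practice such a BERT score is usually interpolated with the ASR system's acoustic and language-model scores rather than used alone.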
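
A similarly minimal sketch of LLM-based reranking by prompting, assuming the official openai Python client (v1 interface) and a hypothetical prompt format; the thesis evaluates ChatGPT on this task, but its exact prompts and experimental protocol are not reproduced here.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rerank_with_llm(n_best: list[str]) -> str:
    """Ask the model to return the most plausible candidate verbatim."""
    numbered = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(n_best))
    prompt = (
        "The following are candidate transcriptions of the same utterance "
        "from a speech recognizer. Reply with the single most plausible "
        "candidate, verbatim, and nothing else.\n" + numbered
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

print(rerank_with_llm(["i went to the meeting", "eye went too the meeting"]))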
Files
Original bundle
- Name: 202300044066-106364.pdf
- Size: 2.17 MB
- Format: Adobe Portable Document Format
- Description: etd