Contextualized Language Model Reranking for Meeting Speech Recognition

dc.contributor陳柏琳zh_TW
dc.contributorChen, Berlinen_US
dc.contributor.author王泓壬zh_TW
dc.contributor.authorWang, Hung-Renen_US
dc.date.accessioned2023-12-08T08:02:51Z
dc.date.available2023-08-11
dc.date.available2023-12-08T08:02:51Z
dc.date.issued2023
dc.description.abstractASR (Automatic Speech Recognition) N-Best reranking is a technique for improving the transcription accuracy of ASR systems by re-ranking the candidate hypotheses that the system generates for an input audio segment, known as the N-Best list. The task is to select the most likely transcription from the N-Best list on the basis of additional contextual information. BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art language model that has shown remarkable performance on a variety of natural language processing (NLP) tasks, such as text classification, named entity recognition, and question answering. Because BERT captures contextual information and generates high-quality representations of input text, it has been applied to ASR N-Best reranking. To further strengthen BERT's predictions, we explore enriched semantic information and training objectives along four lines: (1) effective methods for incorporating information about the grammatical quality of a hypothesis into the model; (2) effective methods for indirectly incorporating information from the entire N-Best list into the model; (3) the feasibility of classification, ranking, and multi-task objectives for model training; and (4) strengthening the textual information the model extracts. Large generative language models (LLMs) have also demonstrated excellent generalization ability on a wide range of language-related tasks, so we additionally evaluate the feasibility of applying LLMs such as ChatGPT to the ASR N-Best reranking task. We conduct a series of experiments on the AMI meeting corpus; the results show that the proposed methods are effective in reducing the word error rate (WER), achieving an absolute WER reduction of up to 1.37% over the baseline ASR system.en_US
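A minimal sketch of the baseline idea described in the abstract (scoring each hypothesis in an ASR N-Best list with a BERT masked-language-model pseudo-log-likelihood and keeping the highest-scoring one) might look as follows. This is a generic illustration of the technique, not the thesis's proposed method; the model name and the example 3-best list are assumptions made for the sketch.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Illustrative model choice; any BERT-style masked LM would do.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log P(token | context) with each token masked in turn."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, ids.size(0) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# Hypothetical 3-best list from a first-pass ASR decoder.
nbest = [
    "the meeting will start at ten",
    "the meeting will star at ten",
    "a meeting will start at hen",
]
print(max(nbest, key=pseudo_log_likelihood))

In practice such an LM score is typically interpolated with the first-pass ASR score before the final ranking, rather than used on its own.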
dc.description.sponsorshipDepartment of Computer Science and Information Engineeringen_US
dc.identifier61047091S-44066
dc.identifier.urihttps://etds.lib.ntnu.edu.tw/thesis/detail/3199aeaac3f4fb62bb17b7436150f2f6/
dc.identifier.urihttp://rportal.lib.ntnu.edu.tw/handle/20.500.12235/121640
dc.languageChinese
dc.subjectAutomatic Speech Recognitionen_US
dc.subjectLanguage Modelingen_US
dc.subjectConversational Speechen_US
dc.subjectN-Best Listsen_US
dc.subjectList Informationen_US
dc.subjectRerankingen_US
dc.subjectCross-sentence Informationen_US
dc.subjectLarge Generative Language Modelsen_US
dc.subjectChatGPTen_US
dc.titleContextualized Language Model Reranking for Meeting Speech Recognitionen_US
dc.typeetd

Files

202300044066-106364.pdf (2.17 MB, Adobe Portable Document Format; description: etd)