A Study on Contextualized Language Model Reranking for Meeting Speech Recognition
dc.contributor | 陳柏琳 | zh_TW |
dc.contributor | Chen, Berlin | en_US |
dc.contributor.author | 王泓壬 | zh_TW |
dc.contributor.author | Wang, Hung-Ren | en_US |
dc.date.accessioned | 2023-12-08T08:02:51Z | |
dc.date.available | 2023-08-11 | |
dc.date.available | 2023-12-08T08:02:51Z | |
dc.date.issued | 2023 | |
dc.description.abstract | ASR N-best reranking is a technique used in automatic speech recognition (ASR) systems to improve the accuracy of transcription output. For each input audio segment, the system generates multiple candidate hypotheses, known as an N-best list. BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art language model that has shown outstanding performance on a variety of natural language processing (NLP) tasks, such as text classification, named entity recognition, and question answering. Because BERT captures contextual information and produces high-quality representations of input text, it has been adopted for ASR N-best reranking. To further strengthen the BERT model's predictions, we explore enriched semantic information and training objectives in four parts: (1) effective methods for injecting information about the grammatical quality of a text into the model; (2) effective methods for indirectly injecting information about the whole N-best list into the model; (3) the feasibility of classification, ranking, and multi-task training objectives for model training; and (4) strengthening the textual information the model extracts. Large generative language models (LLMs) have demonstrated excellent generalization across a variety of language-related tasks, and in this study we assess the feasibility of applying LLMs such as ChatGPT to ASR N-best reranking. A series of experiments on the AMI meeting corpus shows that the proposed methods are effective in reducing the word error rate (WER), achieving up to a 1.37% absolute WER reduction compared with the baseline ASR system. | zh_TW |
dc.description.abstract | ASR (Automatic Speech Recognition) N-best reranking aims to improve the accuracy of ASR systems by re-ranking the system's candidate outputs, known as N-best lists: the most likely transcription is selected from the list with the help of additional contextual information. BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art language model that has shown remarkable performance on various natural language processing (NLP) tasks, and it is used for ASR N-best reranking because of its ability to capture contextual information and generate high-quality representations of input text. We explore enhanced semantic information and training objectives in four parts: (1) effective methods for incorporating information about the grammatical quality of a text into the model; (2) effective methods for indirectly incorporating whole N-best list information into the model; (3) the feasibility of classification, ranking, and multi-task training objectives for model training; and (4) strengthening the textual information extracted by the model. Large generative language models (LLMs) have demonstrated excellent generalization on a variety of language-related tasks, and we evaluate the feasibility of utilizing LLMs such as ChatGPT for the ASR N-best reranking task. We conduct a series of experiments on the AMI meeting corpus; the results show the effectiveness of the proposed methods in reducing the word error rate, with up to a 1.37% absolute WER reduction over the baseline ASR system. (Illustrative sketches of the BERT-based and LLM-based reranking ideas follow the record below.) | en_US |
dc.description.sponsorship | Department of Computer Science and Information Engineering | zh_TW |
dc.identifier | 61047091S-44066 | |
dc.identifier.uri | https://etds.lib.ntnu.edu.tw/thesis/detail/3199aeaac3f4fb62bb17b7436150f2f6/ | |
dc.identifier.uri | http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/121640 | |
dc.language | Chinese | |
dc.subject | Automatic Speech Recognition | zh_TW |
dc.subject | Language Modeling | zh_TW |
dc.subject | Conversational Speech | zh_TW |
dc.subject | N-Best Lists | zh_TW |
dc.subject | List Information | zh_TW |
dc.subject | Reranking | zh_TW |
dc.subject | Cross-sentence Information | zh_TW |
dc.subject | Large Generative Language Models | zh_TW |
dc.subject | ChatGPT | zh_TW |
dc.subject | Automatic Speech Recognition | en_US |
dc.subject | Language Modeling | en_US |
dc.subject | Conversational Speech | en_US |
dc.subject | N-Best Lists | en_US |
dc.subject | List Information | en_US |
dc.subject | Large Generative Language Models | en_US |
dc.subject | ChatGPT | en_US |
dc.title | A Study on Contextualized Language Model Reranking for Meeting Speech Recognition | zh_TW |
dc.title | Contextualized Language Model Reranking for Meeting Speech Recognition | en_US |
dc.type | etd |
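
A minimal sketch of the BERT-based reranking idea described in the abstract, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint: each hypothesis in the N-best list is scored by its pseudo-log-likelihood (masking and predicting one token at a time), and the highest-scoring candidate is kept. This is a generic illustration of the technique, not the thesis's model, which further incorporates grammatical-quality cues, whole-list information, and classification/ranking/multi-task training objectives.

import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pll_score(sentence: str) -> float:
    """Sum log P(token | rest of sentence), masking one position at a time."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    score = 0.0
    for i in range(1, ids.size(0) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, i]
        score += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return score

# Rerank a toy N-best list: the hypothesis with the highest PLL wins.
n_best = ["i went to the meeting", "eye went too the meeting"]
print(max(n_best, key=pll_score))

In practice such a BERT score is usually interpolated with the ASR system's acoustic and language-model scores rather than used alone.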
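
A similarly minimal sketch of LLM-based reranking by prompting, assuming the official openai Python client (v1 interface) and a hypothetical prompt format; the thesis evaluates ChatGPT on this task, but its exact prompts and experimental protocol are not reproduced here.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rerank_with_llm(n_best: list[str]) -> str:
    """Ask the model to return the most plausible candidate verbatim."""
    numbered = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(n_best))
    prompt = (
        "The following are candidate transcriptions of the same utterance "
        "from a speech recognizer. Reply with the single most plausible "
        "candidate, verbatim, and nothing else.\n" + numbered
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

print(rerank_with_llm(["i went to the meeting", "eye went too the meeting"]))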
Files
Original bundle
- Name: 202300044066-106364.pdf
- Size: 2.17 MB
- Format: Adobe Portable Document Format
- Description: etd