A Study on Contextual Biasing Adaptation in End-to-End Speech Recognition

白立亭; Pai, Li-Ting

dc.contributor	陳柏琳	zh_TW
dc.contributor	Chen, Berlin	en_US
dc.contributor.author	白立亭	zh_TW
dc.contributor.author	Pai, Li-Ting	en_US
dc.date.accessioned	2025-12-09T08:19:17Z
dc.date.available	2025-02-06
dc.date.issued	2025
dc.description.abstract	隨著後疫情時代的到來，線上會議成為主流，使得對語音轉錄技術的需求日益增加。然而，在這些會議場景中，語音辨識系統面臨專業術語、人名、關鍵詞等辨識不準確的挑戰，影響了轉錄結果的完整性和精確度。這些問題尤其常見於涉及特定行業術語或專業背景的會議，如醫療、法律、金融等領域。在此情境下，準確地轉錄關鍵詞和專有名詞不僅是為了提升會議紀錄的可讀性，也有助於在後續的資訊檢索和分析中更有效地處理和提取重要內容。針對此需求，語音辨識技術逐漸引入語境化偏移及文字提示功能，通過整合特定語境清單和專業術語庫，使系統能更精確地辨識會議中的重要內容，進一步提高會議資料的品質與實用性。本研究聚焦於增強語音辨識模型的上下文敏銳度，旨在透過引入不同類型的語義特徵以及特定的提示訊息來提升模型對領域特定詞彙的辨識能力。研究結果顯示，利用提示訓練，在 AISHELL-1 資料集上可達到 13.8% 的相對詞錯誤率降低，以及 7.5% 的相對實體錯誤率降低。此結果表明本研究的方法能有效喚醒模型對於專業術語或重要詞彙的敏感性，降低偏移詞錯誤率，並提升轉錄結果的精確度。透過提供詞彙的語境線索，此方法幫助模型在專業場景下更準確地辨識並正確轉錄相應內容，從而減少因上下文缺乏而導致的誤差。	zh_TW
dc.description.abstract	In the post-pandemic era, online meetings have become the norm, and demand for speech transcription technology has grown accordingly. In these meeting scenarios, however, speech recognition systems struggle to recognize specialized terminology, names, and keywords, which reduces the completeness and precision of the transcripts. These problems are especially common in meetings that involve industry-specific or specialized knowledge, such as healthcare, law, and finance. In such contexts, accurately transcribing keywords and proper nouns not only improves the readability of meeting minutes but also makes later information retrieval and analysis more effective at processing and extracting important content. To address this need, speech recognition technology has gradually adopted contextual biasing and text prompting. By integrating context-specific word lists and specialized terminology databases, a system can recognize important meeting content more accurately and improve the quality and utility of meeting data. This study focuses on enhancing the contextual sensitivity of speech recognition models by introducing different types of semantic features and specific prompts to improve the recognition of domain-specific vocabulary. With prompt-based training, the approach achieves a 13.8% relative word error rate reduction and a 7.5% relative entity error rate reduction on the AISHELL-1 dataset. These findings indicate that the approach heightens the model’s sensitivity to specialized terminology and critical vocabulary, reduces the error rate on biasing words, and improves transcription accuracy. By providing contextual clues for the vocabulary, it helps the model recognize and transcribe the relevant content more accurately in professional settings, reducing errors caused by a lack of context.	en_US
dc.description.sponsorship	資訊工程學系	zh_TW
dc.identifier	61147095S-46730
dc.identifier.uri	https://etds.lib.ntnu.edu.tw/thesis/detail/12cb184e4843bab7b028615214c245ce/
dc.identifier.uri	http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/125821
dc.language	Chinese
dc.subject	語音辨識	zh_TW
dc.subject	語境偏移	zh_TW
dc.subject	關鍵詞辨識	zh_TW
dc.subject	提示微調	zh_TW
dc.subject	Speech Recognition	en_US
dc.subject	Contextual Biasing	en_US
dc.subject	Keyword Recognition	en_US
dc.subject	Prompt-Tuning	en_US
dc.title	針對端到端語音辨識中語境偏移之適應性研究	zh_TW
dc.title	A Study on Contextual Biasing Adaptation in End-to-End Speech Recognition	en_US
dc.type	Academic thesis

Files

Original bundle

Name: 202500046730-109105.pdf
Size: 2.53 MB
Format: Adobe Portable Document Format
Description: Academic thesis

Collections

Theses and Dissertations