A Study on Contextual Biasing Adaptation in End-to-End Speech Recognition (針對端到端語音辨識中語境偏移之適應性研究)

dc.contributor: 陳柏琳 [zh_TW]
dc.contributor: Chen, Berlin [en_US]
dc.contributor.author: 白立亭 [zh_TW]
dc.contributor.author: Pai, Li-Ting [en_US]
dc.date.accessioned: 2025-12-09T08:19:17Z
dc.date.available: 2025-02-06
dc.date.issued: 2025
dc.description.abstract: 隨著後疫情時代的到來,線上會議成為主流,使得對語音轉錄技術的需求日益增加。然而,在這些會議場景中,語音辨識系統面臨專業術語、人名、關鍵詞等辨識不準確的挑戰,影響了轉錄結果的完整性和精確度。這些問題尤其常見於涉及特定行業術語或專業背景的會議,如醫療、法律、金融等領域。在此情境下,準確地轉錄關鍵詞和專有名詞不僅是為了提升會議紀錄的可讀性,也有助於在後續的資訊檢索和分析中更有效地處理和提取重要內容。針對此需求,語音辨識技術逐漸引入語境化偏移及文字提示功能,通過整合特定語境清單和專業術語庫,使系統能更精確地辨識會議中的重要內容,進一步提高會議資料的品質與實用性。本研究聚焦於增強語音辨識模型的上下文敏銳度,旨在透過引入不同類型的語義特徵以及特定的提示訊息來提升模型對領域特定詞彙的辨識能力。研究結果顯示,利用提示訓練,在 AISHELL-1 資料集上可達到 13.8% 的相對詞錯誤率下降,以及 7.5% 的相對實體錯誤率下降。此結果表明本研究的方法能有效喚醒模型對於專業術語或重要詞彙的敏感性,降低偏移詞錯誤率,並提升轉錄結果的精確度。透過提供詞彙的語境線索,本方法幫助模型在專業場景下更準確地辨識並正確轉錄相應內容,從而減少因上下文缺乏而導致的誤差。 [zh_TW]
dc.description.abstract: With the arrival of the post-pandemic era, online meetings have become the norm, increasing the demand for speech transcription technology. In these meetings, however, speech recognition systems struggle to recognize specialized terminology, names, and keywords accurately, which degrades the completeness and precision of the transcripts. The problem is especially common in meetings that involve industry-specific or specialized knowledge, such as healthcare, law, and finance. In such contexts, accurately transcribing keywords and proper nouns not only improves the readability of meeting minutes but also makes later retrieval and extraction of important information more effective. To address this need, speech recognition technology has gradually adopted contextual biasing and text prompting. By integrating context-specific word lists and specialized terminology databases, a system can recognize important meeting content more accurately and thereby improve the quality and utility of meeting data. This study focuses on enhancing the contextual sensitivity of speech recognition models by introducing different types of semantic features and specific prompts to improve the recognition of domain-specific vocabulary. With prompt-based training, the approach achieves a 13.8% relative word error rate reduction and a 7.5% relative entity error rate reduction on the AISHELL-1 dataset. These findings indicate that the approach effectively heightens the model's sensitivity to specialized terminology and critical vocabulary, reduces errors on biasing words, and improves transcription accuracy. By providing contextual clues for the vocabulary, it helps the model recognize and transcribe the relevant content correctly in professional settings, reducing errors caused by a lack of context. [en_US]
dc.description.sponsorship: 資訊工程學系 (Department of Computer Science and Information Engineering) [zh_TW]
dc.identifier: 61147095S-46730
dc.identifier.uri: https://etds.lib.ntnu.edu.tw/thesis/detail/12cb184e4843bab7b028615214c245ce/
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/125821
dc.language: Chinese (中文)
dc.subject: 語音辨識 [zh_TW]
dc.subject: 語境偏移 [zh_TW]
dc.subject: 關鍵詞辨識 [zh_TW]
dc.subject: 提示微調 [zh_TW]
dc.subject: Speech Recognition [en_US]
dc.subject: Contextual Biasing [en_US]
dc.subject: Keyword Recognition [en_US]
dc.subject: Prompt-Tuning [en_US]
dc.title: 針對端到端語音辨識中語境偏移之適應性研究 [zh_TW]
dc.title: A Study on Contextual Biasing Adaptation in End-to-End Speech Recognition [en_US]
dc.type: Academic thesis (學術論文)

Files

Original bundle

Name: 202500046730-109105.pdf
Size: 2.53 MB
Format: Adobe Portable Document Format
Description: Academic thesis (學術論文)