Optimization and Adaptation of Contextual Biasing in Automatic Speech Recognition

Date

2025

Abstract

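As speech recognition is widely deployed for meeting transcription, automatic subtitle generation, voice search, and similar scenarios, the ability of these systems to recognize rare words and domain-specific terms has drawn growing attention. Such biasing words typically carry high information content and are closely tied to the user's intent, so omitting or misrecognizing them seriously degrades both the completeness of the transcript and the accuracy of downstream tasks. Yet biasing words usually sit in the long tail of the corpus distribution: they occur infrequently and have too few training samples, which makes them hard for existing speech models to learn and recognize. To address this problem, contextualized speech recognition has become a research focus in recent years; by introducing a biasing word list or external prompt information, it steers the model toward the more semantically critical words and thereby improves sensitivity and precision in recognizing biasing words. This thesis proposes two new methods for biasing word recognition. First, it designs an architecture named the Gate-Balanced Contextual Adapter, which dynamically adjusts how strongly the model weights its output at each time step, strengthening biasing word recognition while preserving overall computational efficiency. Second, it proposes the PRIME framework, applied to the large speech model Whisper, which introduces contextual prompts at the syntactic, semantic, and phonetic levels and integrates them into the decoder input through a lightweight architecture, flexibly improving the prediction of biasing words without modifying the model's main structure. Experiments on the AISHELL-1 and SlideSpeech datasets show that both proposed methods effectively reduce the biasing word error rate and deliver consistent improvements across languages and application scenarios. Overall, the study verifies the feasibility and potential of context-aware mechanisms and prompt-learning strategies in speech recognition systems, offering a design direction for domain-specific speech data that balances accuracy and efficiency.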
With the widespread deployment of automatic speech recognition (ASR) systems in real-world scenarios such as meeting transcription and subtitle generation, the accurate recognition of rare words and domain-specific terms has become a critical challenge. These words often lie in the long-tail distribution of vocabulary, and their omission or misrecognition can significantly impair transcription quality and downstream task performance. To address this issue, contextualized ASR has emerged as a promising direction, enhancing biasing word recognition by incorporating external word lists or prompting strategies. This thesis proposes two novel approaches to improve contextual biasing in ASR. First, we introduce the Gate-Balanced Contextual Adapter, a dynamic gating mechanism that selectively enhances the influence of contextual information at each decoding timestep, effectively improving biasing word recognition with minimal computational overhead. Second, we present PRIME, a prompt-based contextual ASR framework built upon the Whisper model, which integrates syntactic, semantic, and phonetic cues into the decoder input through lightweight prompting strategies without altering the original model architecture. Extensive experiments on the AISHELL-1 and SlideSpeech datasets demonstrate that both proposed methods significantly reduce biasing word error rates and improve overall transcription accuracy. These findings highlight the effectiveness of context-aware adaptation and prompt-based learning, providing a practical and efficient solution for domain-adaptive ASR applications.
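
The per-timestep gating idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's Gate-Balanced Contextual Adapter, whose actual architecture is not given in this record. It only assumes a generic design in which a decoder state attends over embeddings of the biasing words and a scalar sigmoid gate decides how much of the resulting context vector is added back. The function name `gated_context_fusion` and the parameters `gate_weights` and `gate_bias` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def gated_context_fusion(hidden, bias_embeddings, gate_weights, gate_bias=0.0):
    """Fuse one decoder state with a biasing-list context vector.

    Hypothetical illustration only: the gate is a single scalar in (0, 1)
    computed from the concatenation [hidden; context].
    """
    # Attention of the current decoder state over the biasing-word embeddings.
    attn = softmax([dot(hidden, e) for e in bias_embeddings])
    dim = len(hidden)
    context = [sum(a * e[i] for a, e in zip(attn, bias_embeddings))
               for i in range(dim)]
    # Scalar sigmoid gate; gate_weights must have length 2 * dim.
    logit = dot(gate_weights, hidden + context) + gate_bias
    gate = 1.0 / (1.0 + math.exp(-logit))
    # Residual fusion: the gate scales how much context reaches the output.
    fused = [h + gate * c for h, c in zip(hidden, context)]
    return fused, gate
```

Calling the function once per decoding timestep gives a gate value per step. A strongly negative `gate_bias` pushes the gate toward zero, so the output falls back to the unbiased decoder state, which is one simple way to keep contextual information from dominating timesteps where no biasing word is present.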

Keywords

automatic speech recognition, contextual biasing, gating mechanism, biasing word recognition, prompt tuning, Whisper
