Multi-Accent English Speech Recognition (多口音英語語音辨識)

Date

2024

Abstract

As English has become the international lingua franca, the variety of English accents has grown with it, shaped by speakers' native languages, regions, and cultures. This variety makes accented speech difficult for speech recognition systems. This thesis investigates how to improve a Conformer model for multi-accent English speech recognition when accented training data is limited, by strengthening the model's ability to discriminate between accents. The proposed method adds an accent classification task to the speech recognition model so that the model becomes more sensitive to accent differences. In the experiments, this method lowers the word error rate on accented English speech relative to conventional speech recognition methods. The thesis also visualizes the accent features at different layers of the model's encoder and analyzes what information each layer represents. In addition, it evaluates Whisper, a model trained on a large amount of data, on the same multi-accent task across its English-only and multilingual versions and across model sizes, and it compares LoRA training against full fine-tuning, with the aim of providing clearer guidance for model selection.
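The abstract reports its results as word error rate (WER). For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the thesis does not state which scoring tool or text normalization it used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, with no
    case folding or punctuation handling (an assumption, not the thesis setup).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.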

Keywords

Speech Recognition, Accent, Multi-task Learning, Data Visualization, Model Probing, Adapter
