Improving Compatibility and Interpretability in Speech Enhancement

dc.contributor: 陳柏琳 (zh_TW)
dc.contributor: Chen, Berlin (en_US)
dc.contributor.author: 何冠勳 (zh_TW)
dc.contributor.author: Ho, Kuan-Hsun (en_US)
dc.date.accessioned: 2024-12-17T03:37:22Z
dc.date.available: 2024-01-29
dc.date.issued: 2024
dc.description.abstract: 本論文深入探討語音增益(SE)領域,這是一個通過減少噪音和失真來精煉語音信號的關鍵過程。借助深度神經網絡(DNNs),本研究解決了兩個基本挑戰:1)探索SE和自動語音辨識(ASR)系統之間的兼容性,以及2)增強基於DNN的SE模型的可解釋性。動機來源於SE模型可能在運作中引入的偽影(Artifacts),可能危及ASR性能,因此需要重新評估學習目標。為應對這一問題,提出了一種新穎的噪聲和偽影感知損失函數(NAaLoss),它在保持SE質量的同時,顯著提高了ASR性能。另外,在基於DNN的SE方法中,我們探索了一種新穎的設計,即基於Sinc的卷積(Sinc-conv),以在解釋性和時域方法的學習自由之間取得平衡。基於此,我們設計了重塑的Sinc卷積(rSinc-conv),不僅提升了SE的最新技術水平,還揭示了神經網絡在SE期間優先考慮的特定頻率組合。這項研究做出了實質性的貢獻,包括:1)定義SE中的處理偽影,展示NAaLoss的有效性,通過視覺化偽影獲取洞見,並填補SE和ASR目標之間的差距;2)為SE量身定制的rSinc-conv的開發,在訓練效率、濾波器多樣性和可解釋性方面提供了優勢;3)解析神經網絡的優先關注,對不同形狀濾波器的探索以及對各種SE模型的評估,進一步促進了我們對SE網絡的理解和改進。總的來說,這項研究旨在為SE領域的討論做出貢獻,並為在現實情境中實現更強大和高效的SE鋪平技術道路。 (zh_TW)
dc.description.abstract: This work delves into the domain of Speech Enhancement (SE), a critical process for refining speech signals by reducing noise and distortions. Leveraging deep neural networks (DNNs), this study addresses two fundamental challenges: 1) exploring the compatibility between SE and Automatic Speech Recognition (ASR) systems, and 2) enhancing the interpretability of DNN-based SE models. The motivation stems from the artifacts that SE models may introduce, which can compromise ASR performance and necessitate a re-evaluation of the learning objectives. To tackle this, a novel Noise- and Artifact-aware loss function (NAaLoss) is proposed, significantly improving ASR performance while preserving SE quality. Within DNN-based SE methods, a novel design, Sinc-based convolution (Sinc-conv), is explored to strike a balance between the interpretability of spectral approaches and the learning freedom of time-domain methods. Building on this, we devise the reformed Sinc-conv (rSinc-conv), which not only advances the state of the art in SE but also sheds light on the specific frequency components prioritized by neural networks during enhancement. This research makes substantial contributions: it defines processing artifacts in SE, demonstrates the effectiveness of NAaLoss, visualizes artifacts for insight, and bridges the gap between SE and ASR objectives; the development of rSinc-conv tailored for SE offers advantages in training efficiency, filter diversity, and interpretability; and insights into what the networks attend to, the exploration of differently shaped filters, and the evaluation of various SE models further advance the understanding and improvement of SE networks. Overall, this work aims to contribute to the discourse in SE and pave the way for more robust and efficient SE techniques with broader applications in real-world scenarios. (en_US)
(See the Sinc-conv sketch after the metadata listing below.)
dc.description.sponsorship: 資訊工程學系 (zh_TW)
dc.identifier: 61047017S-44671
dc.identifier.uri: https://etds.lib.ntnu.edu.tw/thesis/detail/cdb31cb1b13a45d84a86199e64c51f0c/
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/123699
dc.language: 英文 (English)
dc.subject: 語音增益 (zh_TW)
dc.subject: 兼容性 (zh_TW)
dc.subject: 強健性語音辨識 (zh_TW)
dc.subject: 處理偽影 (zh_TW)
dc.subject: 可解釋性 (zh_TW)
dc.subject: Sinc卷積 (zh_TW)
dc.subject: 關鍵頻帶 (zh_TW)
dc.subject: Speech Enhancement (en_US)
dc.subject: Compatibility (en_US)
dc.subject: Noise-robust Speech Recognition (en_US)
dc.subject: Processing Artifacts (en_US)
dc.subject: Interpretability (en_US)
dc.subject: Sinc-convolution (en_US)
dc.subject: Crucial Bands (en_US)
dc.title: 語音增益之研究 — 適應性與可解釋性 (zh_TW)
dc.title: Improving Compatibility and Interpretability in Speech Enhancement (en_US)
dc.type: 學術論文 (academic thesis)
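
The abstract mentions Sinc-based convolution (Sinc-conv), a layer in which each learned filter is constrained to an ideal band-pass shape so that only its cutoff frequencies are trainable. This record does not describe the thesis's reformed variant (rSinc-conv), so the sketch below is only a minimal, SincNet-style illustration in PyTorch under assumed settings; the class name SincConv1d, the filter count, kernel size, and initialization are illustrative, not the author's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SincConv1d(nn.Module):
    """Hypothetical SincNet-style layer: each filter is a band-pass whose two
    cutoff frequencies are the only learnable parameters."""
    def __init__(self, out_channels=64, kernel_size=129, sample_rate=16000):
        super().__init__()
        self.kernel_size = kernel_size
        self.sample_rate = sample_rate
        # Illustrative initialization: lower cutoffs spread linearly, 100 Hz bands.
        low_hz = torch.linspace(30.0, sample_rate / 2 - 200.0, out_channels).unsqueeze(1)
        band_hz = torch.full((out_channels, 1), 100.0)
        self.low_hz_ = nn.Parameter(low_hz)    # learnable lower cutoff per filter (Hz)
        self.band_hz_ = nn.Parameter(band_hz)  # learnable bandwidth per filter (Hz)
        n = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
        self.register_buffer("t_", (n / sample_rate).unsqueeze(0))       # time axis (s)
        self.register_buffer("window_", torch.hamming_window(kernel_size))

    def forward(self, x):  # x: (batch, 1, time) raw waveform
        low = torch.abs(self.low_hz_)
        high = torch.clamp(low + torch.abs(self.band_hz_), max=self.sample_rate / 2)
        # Ideal band-pass impulse response = difference of two sinc low-pass responses,
        # tapered by a Hamming window to reduce ripple.
        filters = (2 * high * torch.sinc(2 * high * self.t_)
                   - 2 * low * torch.sinc(2 * low * self.t_)) * self.window_
        filters = filters / (2 * (high - low) + 1e-8)          # rough peak normalization
        return F.conv1d(x, filters.unsqueeze(1), padding=self.kernel_size // 2)

Because each filter is described by just a lower cutoff and a bandwidth, the learned filterbank can be read off directly as center frequencies and bandwidths, which is where the interpretability discussed in the abstract comes from; for example, SincConv1d()(torch.randn(1, 1, 16000)) returns a (1, 64, 16000) tensor of band-limited feature maps.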

Files

Original bundle

Name: 202400044671-106999.pdf
Size: 1.89 MB
Format: Adobe Portable Document Format
Description: 學術論文 (academic thesis)
