Improving Compatibility and Interpretability in Speech Enhancement

dc.contributor: 陳柏琳 (zh_TW)
dc.contributor: Chen, Berlin (en_US)
dc.contributor.author: 何冠勳 (zh_TW)
dc.contributor.author: Ho, Kuan-Hsun (en_US)
dc.date.accessioned: 2024-12-17T03:37:22Z
dc.date.available: 2024-01-29
dc.date.issued: 2024
dc.description.abstract: 本論文深入探討語音增益(SE)領域,這是一個通過減少噪音和失真來精煉語音信號的關鍵過程。借助深度神經網絡(DNNs),本研究解決了兩個基本挑戰:1)探索SE和自動語音辨識(ASR)系統之間的兼容性,以及2)增強基於DNN的SE模型的可解釋性。動機來源於SE模型可能在運作中引入的偽影(Artifacts),可能危及ASR性能,因此需要重新評估學習目標。為應對這一問題,提出了一種新穎的噪聲和偽影感知損失函數(NAaLoss),它在保持SE質量的同時,顯著提高了ASR性能。另外,在基於DNN的SE方法中,我們探索了一種新穎的設計,即基於Sinc的卷積(Sinc-conv),以在解釋性和時域方法的學習自由之間取得平衡。基於此,我們設計了重塑的Sinc卷積(rSinc-conv),不僅提升了SE的最新技術水平,還揭示了神經網絡在SE期間優先考慮的特定頻率組合。這項研究做出了實質性的貢獻,包括:1)定義SE中的處理偽影,展示NAaLoss的有效性,通過視覺化偽影獲取洞見,並填補SE和ASR目標之間的差距;2)為SE量身定制的rSinc-conv的開發,在訓練效率、濾波器多樣性和可解釋性方面提供了優勢;3)解析神經網絡的優先關注,對不同形狀濾波器的探索以及對各種SE模型的評估,進一步促進了我們對SE網絡的理解和改進。總的來說,這項研究旨在為SE領域的討論做出貢獻,並為在現實情境中實現更強大和高效的SE鋪平技術道路。 (zh_TW)
dc.description.abstract: This work delves into the domain of Speech Enhancement (SE), a critical process for refining speech signals by reducing noise and distortions. Leveraging deep neural networks (DNNs), this study addresses two fundamental challenges: 1) exploring the compatibility between SE and Automatic Speech Recognition (ASR) systems, and 2) enhancing the interpretability of DNN-based SE models. The motivation stems from the artifacts that SE models may introduce, which can compromise ASR performance and necessitate a re-evaluation of the learning objectives. To tackle this, a novel Noise- and Artifact-aware loss function (NAaLoss) is proposed, significantly improving ASR performance while preserving SE quality. Within DNN-based SE methods, a novel design, Sinc-based convolution (Sinc-conv), is explored to strike a balance between the interpretability of spectral approaches and the learning freedom of time-domain methods. Building on this, we devise the reformed Sinc-conv (rSinc-conv), which not only advances the state of the art in SE but also sheds light on the specific frequency components prioritized by neural networks during enhancement. This research makes substantial contributions: it defines processing artifacts in SE, demonstrates the effectiveness of NAaLoss, visualizes artifacts for insight, and bridges the gap between SE and ASR objectives; the development of rSinc-conv tailored for SE offers advantages in training efficiency, filter diversity, and interpretability; and insights into what the networks attend to, the exploration of differently shaped filters, and the evaluation of various SE models further advance the understanding and improvement of SE networks. Overall, this work aims to contribute to the discourse in SE and pave the way for more robust and efficient SE techniques with broader applications in real-world scenarios. (en_US)
(See the Sinc-conv sketch after the metadata listing below.)
dc.description.sponsorship: 資訊工程學系 (zh_TW)
dc.identifier: 61047017S-44671
dc.identifier.uri: https://etds.lib.ntnu.edu.tw/thesis/detail/cdb31cb1b13a45d84a86199e64c51f0c/
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/123699
dc.language: 英文 (English)
dc.subject: 語音增益 (zh_TW)
dc.subject: 兼容性 (zh_TW)
dc.subject: 強健性語音辨識 (zh_TW)
dc.subject: 處理偽影 (zh_TW)
dc.subject: 可解釋性 (zh_TW)
dc.subject: Sinc卷積 (zh_TW)
dc.subject: 關鍵頻帶 (zh_TW)
dc.subject: Speech Enhancement (en_US)
dc.subject: Compatibility (en_US)
dc.subject: Noise-robust Speech Recognition (en_US)
dc.subject: Processing Artifacts (en_US)
dc.subject: Interpretability (en_US)
dc.subject: Sinc-convolution (en_US)
dc.subject: Crucial Bands (en_US)
dc.title: 語音增益之研究 — 適應性與可解釋性 (zh_TW)
dc.title: Improving Compatibility and Interpretability in Speech Enhancement (en_US)
dc.type: 學術論文 (academic thesis)
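
The abstract mentions Sinc-based convolution (Sinc-conv), a layer in which each learned filter is constrained to an ideal band-pass shape so that only its cutoff frequencies are trainable. This record does not describe the thesis's reformed variant (rSinc-conv), so the sketch below is only a minimal, SincNet-style illustration in PyTorch under assumed settings; the class name SincConv1d, the filter count, kernel size, and initialization are illustrative, not the author's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SincConv1d(nn.Module):
    """Hypothetical SincNet-style layer: each filter is a band-pass whose two
    cutoff frequencies are the only learnable parameters."""
    def __init__(self, out_channels=64, kernel_size=129, sample_rate=16000):
        super().__init__()
        self.kernel_size = kernel_size
        self.sample_rate = sample_rate
        # Illustrative initialization: lower cutoffs spread linearly, 100 Hz bands.
        low_hz = torch.linspace(30.0, sample_rate / 2 - 200.0, out_channels).unsqueeze(1)
        band_hz = torch.full((out_channels, 1), 100.0)
        self.low_hz_ = nn.Parameter(low_hz)    # learnable lower cutoff per filter (Hz)
        self.band_hz_ = nn.Parameter(band_hz)  # learnable bandwidth per filter (Hz)
        n = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
        self.register_buffer("t_", (n / sample_rate).unsqueeze(0))       # time axis (s)
        self.register_buffer("window_", torch.hamming_window(kernel_size))

    def forward(self, x):  # x: (batch, 1, time) raw waveform
        low = torch.abs(self.low_hz_)
        high = torch.clamp(low + torch.abs(self.band_hz_), max=self.sample_rate / 2)
        # Ideal band-pass impulse response = difference of two sinc low-pass responses,
        # tapered by a Hamming window to reduce ripple.
        filters = (2 * high * torch.sinc(2 * high * self.t_)
                   - 2 * low * torch.sinc(2 * low * self.t_)) * self.window_
        filters = filters / (2 * (high - low) + 1e-8)          # rough peak normalization
        return F.conv1d(x, filters.unsqueeze(1), padding=self.kernel_size // 2)

Because each filter is described by just a lower cutoff and a bandwidth, the learned filterbank can be read off directly as center frequencies and bandwidths, which is where the interpretability discussed in the abstract comes from; for example, SincConv1d()(torch.randn(1, 1, 16000)) returns a (1, 64, 16000) tensor of band-limited feature maps.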

Files

Original bundle

Name: 202400044671-106999.pdf
Size: 1.89 MB
Format: Adobe Portable Document Format
Description: 學術論文 (academic thesis)
