A Study on Applying Hierarchical Semantic and Acoustic Feature Representations to Spoken Document Summarization
Date
2019
Abstract
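With the rapid spread of massive amounts of information, browsing data efficiently has become an important problem. In multimedia documents, speech is one of the main semantically meaningful elements of the content and can convey an entire multimedia document fairly completely. In recent years, many studies have examined the understanding and retrieval of multimedia documents in depth and have produced strong results and contributions, for example in image summarization, audio summarization, and video summarization.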
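Document summarization can be broadly divided into extractive and abstractive summarization. Extractive summarization selects representative sentences from the document, according to a fixed ratio, to form the summary. Abstractive summarization first seeks to fully understand the underlying meaning of the whole document and then, based on that meaning and using different wording, produces a short description of the document as the summary. Because abstractive summarization is considerably harder for automatic speech summarization, most current research follows the extractive approach.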
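The ratio-based selection step of extractive summarization can be illustrated with a short sketch. This is not the thesis's model: it assumes each sentence already has a relevance score from some scoring model, and the function name `select_extractive_summary` and its `ratio` parameter are hypothetical names chosen for illustration.

```python
def select_extractive_summary(sentences, scores, ratio=0.1):
    """Pick the top-scoring sentences up to a fixed ratio, in document order.

    Illustrative only: `scores` is assumed to come from some sentence
    scoring model, which is not shown here.
    """
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    if not sentences:
        return []
    # Keep at least one sentence so the summary is never empty.
    k = max(1, int(len(sentences) * ratio))
    # Rank sentence indices by score, highest first.
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    # Restore document order among the selected sentences.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    doc = ["s1", "s2", "s3", "s4"]
    print(select_extractive_summary(doc, [0.1, 0.9, 0.3, 0.8], ratio=0.5))
```

Restoring document order after ranking is a design choice of this sketch; a system that also ranks the selected sentences could return them in score order instead.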
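This thesis investigates the application of novel extractive summarization methods to spoken document summarization and studies how to improve summarization performance. To this end, we propose a neural-network-based summarization model that uses a hierarchical architecture and an attention mechanism to capture the main theme of a document in depth, with reinforcement learning assisting training so that the model selects and ranks representative sentences according to the document's theme to form the summary. To keep speech recognition errors from degrading the summary, we also add relevant acoustic features of the spoken document to model training and use subword embeddings as input. Finally, we conduct a series of experiments and analyses on the Mandarin broadcast news corpus (MATBN); the results support the hypotheses proposed in this thesis and show a significant improvement in summarization performance.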
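The idea of a hierarchical structure with attention can be sketched in a few lines. This is a toy illustration under stated assumptions, not the model proposed in the thesis: the query vectors are fixed inputs rather than learned parameters, sentences are scored by a plain dot product with the document vector, and reinforcement learning and acoustic features are left out. All function names (`attention_pool`, `score_sentences`) are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(vectors, query):
    """Weighted average of `vectors`, with weights softmax(vector . query)."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def score_sentences(doc, word_query, sent_query):
    """Score each sentence against an attention-pooled document vector.

    `doc` is a list of sentences, each a list of word (or subword) vectors.
    Level 1 pools words into sentence vectors; level 2 pools sentence
    vectors into a document vector. The queries are assumed inputs here.
    """
    sent_vecs = [attention_pool(words, word_query)[0] for words in doc]
    doc_vec, _ = attention_pool(sent_vecs, sent_query)
    return [dot(s, doc_vec) for s in sent_vecs]


if __name__ == "__main__":
    toy_doc = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0]]]
    print(score_sentences(toy_doc, [0.0, 0.0], [10.0, 0.0]))
```

In this sketch, a sentence query that favors the first dimension pulls the document vector toward the first sentence, so that sentence receives the higher score.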
With the rapid spread of tremendous amounts of multimedia information, how to browse the associated content efficiently has become an important issue. Speech is one of the primary sources of semantics in multimedia content, and by listening to it we can digest the content more completely. In recent years, many studies have investigated the understanding and retrieval of multimedia documents in depth, achieving excellent performance and making substantial contributions on a wide array of tasks, such as image captioning, audio summarization, and video captioning. Document summarization methods can be broadly divided into two categories: extraction-based and abstraction-based methods. The former select a representative set of sentences from the document to produce a summary according to a predefined summarization ratio while preserving its important information. The latter attempt to understand the whole document and then produce a short version of it based on its main theme. Because abstractive summarization is still far from satisfactory for either text or spoken documents, most current studies focus exclusively on the development of extraction-based summarization methods. This thesis sets out to explore novel and effective extractive methods for spoken document summarization. To this end, we propose a neural summarization approach that leverages a hierarchical modeling structure with an attention mechanism to understand a document deeply and, in turn, to select representative sentences as its summary. Meanwhile, to alleviate the negative effect of speech recognition errors, we make use of acoustic features and subword-level input representations in the proposed approach. Finally, we conduct a series of experiments on the Mandarin Broadcast News (MATBN) corpus. The experimental results confirm the utility of our approach, which improves on the performance of state-of-the-art methods.
Keywords
Spoken Documents, Extractive Summarization, Deep Neural Networks, Hierarchical Semantic Representations, Attention Mechanism, Acoustic Features, Subword Embedding, Reinforcement Learning