A Study on Using Lexical and Semantic Information in
Date
2005
Abstract
Spoken document segmentation automatically locates the boundaries between the topics mentioned in long streams of audio signals, dividing a spoken document into cohesive segments whose sentences share a common central topic. Spoken document organization, in turn, automatically analyzes the topics of the short segments produced by segmentation, clusters them into groups with topic labels, and arranges the groups in a hierarchical visual presentation that is easier for users to browse. Both tasks have gained growing attention in the past few years.
In this thesis, we first explore how the Hidden Markov Model (HMM), which has been widely applied to and proven effective for speech recognition and information retrieval, can be extended to spoken document segmentation. To identify segment boundaries, we exploit not only the lexical information inherent in the spoken document, such as statistical features and language model probabilities, but also acoustic information, such as the pause distribution and recognition confidence measures. The semantic information conveyed in the spoken document is also integrated into the HMM segmenter to model the state observation distributions more accurately. In addition, we investigate two unsupervised, data-driven organization approaches for analyzing spoken news documents: the Self-Organizing Map (SOM) and the Probabilistic Latent Semantic Analysis Map (ProbMap). We also propose the Topical Mixture Model Map (TMMmap), which views the latent topics from an alternative perspective, as an improvement on ProbMap. A series of experiments on the Topic Detection and Tracking (TDT) Chinese spoken document collections analyzes the performance of these approaches and the differences between them.
Finally, we incorporate the topic distributions and the topological constraints obtained from spoken document organization into the HMM segmenter. Preliminary results are very promising and leave room for further improvement.
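As a rough illustration of the HMM segmentation idea summarized in the abstract, the sketch below treats each topic as a hidden state, scores each sentence with a per-topic unigram language model, and places a boundary wherever the Viterbi-decoded topic changes. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function name `viterbi_segment`, the `switch_prob` and `floor` parameters, and the toy word probabilities are all hypothetical, and the acoustic and semantic cues mentioned in the abstract are not modeled.

```python
import math


def viterbi_segment(sentences, topic_models, switch_prob=0.1, floor=1e-6):
    """Decode the most likely topic per sentence and list the boundaries.

    sentences: list of word lists (e.g. recognized transcripts).
    topic_models: list of dicts mapping word -> unigram probability
                  (hypothetical toy models; at least two topics required).
    switch_prob: assumed probability of changing topic between sentences.
    floor: assumed probability for words missing from a topic model.
    """
    n_topics = len(topic_models)
    stay = math.log(1.0 - switch_prob)
    switch = math.log(switch_prob / (n_topics - 1))

    def emit(words, model):
        # Log-likelihood of a sentence under one topic's unigram model.
        return sum(math.log(model.get(w, floor)) for w in words)

    # scores[k]: best log-probability of any path ending in topic k.
    scores = [-math.log(n_topics) + emit(sentences[0], m) for m in topic_models]
    backpointers = []
    for words in sentences[1:]:
        new_scores, pointers = [], []
        for k, model in enumerate(topic_models):
            best_prev = max(
                range(n_topics),
                key=lambda j: scores[j] + (stay if j == k else switch),
            )
            transition = stay if best_prev == k else switch
            new_scores.append(scores[best_prev] + transition + emit(words, model))
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path back from the final sentence.
    state = max(range(n_topics), key=lambda k: scores[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()

    # A boundary at index i means sentence i starts a new topic segment.
    boundaries = [i for i in range(1, len(path)) if path[i] != path[i - 1]]
    return path, boundaries


# Toy data: two hand-made topic models and four short "sentences".
topic_models = [
    {"game": 0.4, "team": 0.4, "score": 0.2},
    {"stock": 0.4, "market": 0.4, "bank": 0.2},
]
sentences = [
    ["game", "team"],
    ["team", "score"],
    ["stock", "market"],
    ["bank", "market"],
]
path, boundaries = viterbi_segment(sentences, topic_models)
print(path, boundaries)
```

In this toy setting the transition penalty discourages spurious topic switches, so a boundary appears only where the word evidence clearly favors another topic. Pause and confidence features could, in principle, enter as additional terms in the emission or transition scores.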
Keywords
Spoken Document Segmentation, Spoken Document Organization, Self-Organizing Map, Topic Mixture Model Map