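A Study on Automatically Analyzing and Organizing the Semantic Relations among Concept-Explaining Sentences of Multiple Domain-Specific Terms (original title: 多個專有詞彙概念解釋句語意關連自動分析組織之研究)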
Date
2010
Abstract
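This thesis uses e-books as its content source and automatically extracts, clusters, and organizes the concept-explaining sentences of two domain-specific terms. Feature vectors built from term frequencies ignore the implicit semantic relations between words. To overcome this shortcoming, we compute the semantic similarity between every word in a sentence and a set of selected feature words, build a mutual information (MI) feature vector for each sentence, and cluster the sentences using these vectors. From the clustering results we select labels that represent the concept of each cluster, use those labels to reorganize the concept structure, and pick from each cluster the comparative sentences that represent the two terms.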
In this thesis, we use PDF textbooks as the data source and focus on comparing the concept-explaining sentences of two domain-specific terms. We first calculate the mutual information (MI) between every word in a sentence and a set of selected feature words to build an MI vector space model. The vector space model is then used to evaluate the similarity of two sentences for a hierarchical clustering algorithm. After clustering, we choose representative labels and a comparative sentence pair for every cluster. Clusters that share the same representative labels are then grouped into a new concept hierarchy.
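The pipeline described in the abstract (MI-based sentence vectors, sentence similarity, hierarchical clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the thesis's actual method: pointwise mutual information estimated from sentence-level co-occurrence, averaging positive scores per feature word, cosine similarity, average-link merging with a stopping threshold, and the names `pmi_table`, `mi_vector`, `cosine`, and `cluster` are all hypothetical. The toy sentences and feature words are invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Pointwise MI of word pairs, estimated from sentence-level co-occurrence (assumed estimator)."""
    n = len(sentences)
    word_df, pair_df = Counter(), Counter()
    for tokens in sentences:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    table = {}
    for (a, b), c in pair_df.items():
        score = math.log((c / n) / ((word_df[a] / n) * (word_df[b] / n)))
        table[(a, b)] = table[(b, a)] = score
    for w, c in word_df.items():
        table[(w, w)] = math.log(n / c)  # a word paired with itself
    return table


def mi_vector(tokens, features, table):
    """One dimension per feature word: mean positive PMI between it and the sentence's words."""
    if not tokens:
        return [0.0] * len(features)
    return [
        sum(max(table.get((w, f), 0.0), 0.0) for w in tokens) / len(tokens)
        for f in features
    ]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold):
    """Average-link agglomerative clustering; stops when the best pair falls below threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def link(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: link(clusters[p[0]], clusters[p[1]]),
        )
        if link(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Toy corpus for two terms ("tcp", "udp"); the feature words are hand-picked assumptions.
sentences = [
    "tcp provides reliable delivery".split(),
    "udp provides unreliable delivery".split(),
    "tcp uses connection handshake".split(),
    "udp needs no connection handshake".split(),
]
features = ["delivery", "connection"]

table = pmi_table(sentences)
vectors = [mi_vector(s, features, table) for s in sentences]
groups = cluster(vectors, threshold=0.5)
print(groups)
```

On this toy corpus the two sentences about delivery and the two about connections end up in separate clusters, each containing one sentence per term, which is the kind of grouping from which a comparative sentence pair could then be chosen. Label selection and the final concept hierarchy are not sketched here.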
Keywords
Data Mining, Information Retrieval, Sentence Clustering, Automatic Summarization