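A Study of Techniques for Automatically Extracting Concept-Related Sentences for Two Domain-Specific Terms (兩個專有詞彙概念關聯句自動擷取技術之研究)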
Date
2011
Abstract
This thesis studies techniques for automatically extracting concept-related sentence pairs for two domain-specific terms, using the eBooks of a specific professional field as the document collection. Given two domain-specific terms entered by a reader as query terms, the goal is to extract sentence pairs that help the reader understand how the two terms are similar and how they differ under each concept they share. First, the sentences containing either query term are retrieved from the eBooks. The semantic relatedness degree of each candidate common concept term is then evaluated from two factors: the term relatedness between the candidate and the two query terms, and the semantic consistency of the candidate's corresponding sentence sets. The common concept terms with the top-k highest semantic relatedness degrees are extracted. Next, for each extracted common concept term, one sentence is selected from each query term's sentence set such that the two sentences together have the highest semantic relatedness with their respective query term and with the common concept term; these form the concept-related sentence pair of the two query terms under that concept. Because a single sentence can convey only limited content, we also propose a method for finding a semantically related expanded paragraph in the book for each selected sentence. The experimental results show that the proposed method effectively extracts common concept terms related to the two query terms. Moreover, after the sentence pairs are filtered by their relatedness scores, most of the remaining concept-related sentence pairs help users clarify the similarities and differences between the two query terms. In particular, users' understanding of the two terms improves further when the expanded paragraphs are provided.
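The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's method: the abstract gives no formulas, so the scoring here is a simple stand-in. A candidate concept term is scored by the smaller of the two fractions of each query term's sentences in which it appears, and a representative sentence is chosen by the smallest token distance between the query term and the concept term. The function names, the sample sentences, and the stopword list are all hypothetical, and the sketch assumes single-token English terms (Chinese text would first need word segmentation).

```python
import re
from collections import Counter

def tokenize(sentence):
    # Lowercased alphanumeric tokens; a stand-in for real word segmentation.
    return re.findall(r"[a-z0-9]+", sentence.lower())

def sentences_with(term, sentences):
    # Sentences whose tokens include the (single-token) term.
    return [s for s in sentences if term in tokenize(s)]

def common_concepts(q1, q2, sentences, k=3, stopwords=frozenset()):
    # Rank terms that co-occur with BOTH query terms.
    # Stand-in score: min over the two sides of the fraction of that
    # query term's sentences containing the candidate term.
    groups = [sentences_with(q, sentences) for q in (q1, q2)]
    if not all(groups):
        return []
    counts = []
    for group in groups:
        c = Counter()
        for s in group:
            c.update(set(tokenize(s)) - {q1, q2} - stopwords)
        counts.append(c)
    shared = counts[0].keys() & counts[1].keys()
    score = {t: min(counts[0][t] / len(groups[0]),
                    counts[1][t] / len(groups[1])) for t in shared}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(score, key=lambda t: (-score[t], t))[:k]

def best_sentence(query, concept, sentences):
    # Stand-in relatedness: prefer the sentence in which the first
    # occurrences of the query term and the concept term are closest.
    best, best_gap = None, None
    for s in sentences:
        toks = tokenize(s)
        if query in toks and concept in toks:
            gap = abs(toks.index(query) - toks.index(concept))
            if best_gap is None or gap < best_gap:
                best, best_gap = s, gap
    return best

def sentence_pairs(q1, q2, sentences, k=3, stopwords=frozenset()):
    # One (sentence for q1, sentence for q2) pair per top-k common concept.
    return {c: (best_sentence(q1, c, sentences),
                best_sentence(q2, c, sentences))
            for c in common_concepts(q1, q2, sentences, k, stopwords)}

# Hypothetical sample data, for illustration only.
SAMPLE = [
    "Stack order is LIFO.",
    "Queue order is FIFO.",
    "Items leave a stack in reverse order of arrival.",
    "Items leave a queue in the same order as arrival.",
    "The stack supports push and pop.",
    "The queue supports enqueue and dequeue.",
]
STOPWORDS = frozenset({"a", "the", "in", "of", "as", "and", "is"})

if __name__ == "__main__":
    print(sentence_pairs("stack", "queue", SAMPLE, k=1, stopwords=STOPWORDS))
```

A faithful implementation would replace both stand-in scores with the semantic relatedness and consistency measures defined in the thesis, and would add the expanded-paragraph step, which this sketch omits.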
Keywords
common concepts, concept-related sentence pairs, expanded paragraph, domain-specific term, semantic relatedness, e-Books