A Study on Automatic Extraction and Integration of Facet Facts from News (新聞面向事實自動擷取與整合之研究)
Date
2016
Abstract
網路資訊流通快速,新聞媒體已經從傳統報章雜誌,改以網路平台傳播新聞資訊,但對同一新聞事件,不同媒體報導內容會有部分相似或相異情況,使用者需耗費時間和精力去統整新聞事實資訊。因此,本論文提出自動擷取新聞事實資訊方法,透過擷取報導內文中的主題關鍵詞,挑選出候選主題相關事實句,並以分類方式,判斷出主題相關事實句。在擷取新聞事實方面,基於主題事實句,使用自然語言分析結果,設計擷取面向詞、關聯詞、描述詞的事實三元詞組方法。而在資訊整合方面,同時考慮三元詞組間相似面向和相似描述語意,使用階層式分群對不同面向事實資訊進行分群,並以漸進式合併方法對相似面向或描述語意的事實三元詞組進行合併。實驗結果顯示事實句擷取、詞組擷取與合併都達到良好效果。因此本論文提供的方法能有效自動整合相關報導中的不同面向資訊,讓使用者對某一新聞事件能有效率獲得各方面事實資訊的瞭解。
The Internet has accelerated the flow of information, and news media have moved from traditional newspapers and magazines to online platforms. However, reports on the same news event from different media overlap in part and differ in part, so users must spend considerable time and effort consolidating the facts. To address this problem, we propose a method that automatically extracts and integrates fact information from news documents. Topic keywords are extracted from the news content and used to select candidate fact sentences; a classifier then uses various features of the candidates to identify the topic-related fact sentences. From these fact sentences, triples consisting of a facet term, a relation term, and a description term are extracted using the output of a natural language analysis tool. The extracted triples are then grouped by agglomerative hierarchical clustering based on the similarity of their facet terms. Within each cluster, an incremental method merges pairs of triples that have similar facet or description terms, yielding integrated fact information. The performance evaluation shows that fact sentence extraction, triple extraction, and triple merging all perform well. The proposed approach can therefore effectively integrate facet information from different news documents and give users an efficient, comprehensive understanding of a news event.
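The clustering and merging steps described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the `Triple` class, the token-level Jaccard similarity, the single-linkage rule, the 0.5 thresholds, and the sample triples are all hypothetical stand-ins for whatever measures and data the author actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A fact triple: facet term, relation term, description term."""
    facet: str
    relation: str
    description: str


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (assumed stand-in for the thesis's measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cluster_by_facet(triples, threshold=0.5):
    """Single-linkage agglomerative clustering on facet-term similarity."""
    clusters = [[t] for t in triples]
    while True:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(jaccard(x.facet, y.facet)
                          for x in clusters[i] for y in clusters[j])
                if sim >= best:
                    best, pair = sim, (i, j)
        if pair is None:  # no pair of clusters is similar enough to merge
            return clusters
        i, j = pair
        absorbed = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(absorbed)


def merge_incrementally(cluster, threshold=0.5):
    """Fold each triple into the first merged entry whose facet or
    description is similar; otherwise start a new merged entry."""
    merged = []
    for t in cluster:
        for m in merged:
            facet_sim = max(jaccard(t.facet, f) for f in m["facets"])
            desc_sim = max(jaccard(t.description, d) for d in m["descriptions"])
            if facet_sim >= threshold or desc_sim >= threshold:
                m["facets"].add(t.facet)
                m["descriptions"].add(t.description)
                break
        else:
            merged.append({"facets": {t.facet},
                           "relation": t.relation,
                           "descriptions": {t.description}})
    return merged


# Hypothetical triples, invented for illustration only.
triples = [
    Triple("earthquake magnitude", "is", "6.4"),
    Triple("magnitude", "reached", "6.4 on the Richter scale"),
    Triple("death toll", "rose to", "117"),
]
clusters = cluster_by_facet(triples)
merged_first = merge_incrementally(clusters[0])
```

In this sketch the two magnitude triples share a facet token and fall into one cluster, which the incremental pass then folds into a single merged entry, while the death-toll triple remains in its own cluster. A real system would likely replace the token overlap with a semantic similarity measure.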
Keywords
事實句擷取, 新聞事實擷取, 資訊整合, fact sentence extraction, news fact extraction, information integration