Extracting Named Entities from News and Their Applications
Date
2021
Abstract
Reading the news every day yields a great deal of information, and the newest of it is information about the "present." As this "present" information accumulates day after day, it builds up into a large body of "historical" information that is no longer the latest but is still important. This research extracts named entities from news and applies them, with the aim of uncovering meaningful and valuable information. The research has three stages. The first stage analyzes the types of text and classifies the text according to the results, converting unstructured data into structured data. The second stage imports the structured data into a relational database for future analysis. The third stage builds a system for searching and presenting the data, so that one can view the people and positions held at a given time, or follow how a given person's positions changed over time. The system currently provides four search functions: search by position, search by name, search by place name and position, and search by agency and position. The greatest difficulty encountered lies in the quality and quantity of the data. Better named entity recognition, especially for personal names and positions, would make the system more accurate.
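The second and third stages can be pictured with a small sketch. The abstract does not give the actual schema or query code, so everything here is an assumption made for illustration: the table name `mentions`, its columns, the helper function names, and the placeholder rows are all hypothetical. The sketch uses Python's built-in `sqlite3` module to hold already-extracted entities in a relational table and to expose the four kinds of search the abstract lists.

```python
import sqlite3

# Hypothetical schema: one row per (person, position) mention taken from a news item.
SCHEMA = """
CREATE TABLE mentions (
    person    TEXT NOT NULL,
    position  TEXT NOT NULL,
    agency    TEXT,
    place     TEXT,
    news_date TEXT NOT NULL  -- ISO 8601 date of the news item
);
"""


def build_db(rows):
    """Create an in-memory database and load the structured rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO mentions VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def _query(conn, where, params):
    # Results are ordered by date so that changes over time are easy to follow.
    sql = (
        "SELECT person, position, agency, place, news_date "
        "FROM mentions WHERE " + where + " ORDER BY news_date"
    )
    return conn.execute(sql, params).fetchall()


def search_by_position(conn, position):
    return _query(conn, "position = ?", (position,))


def search_by_name(conn, person):
    return _query(conn, "person = ?", (person,))


def search_by_place_position(conn, place, position):
    return _query(conn, "place = ? AND position = ?", (place, position))


def search_by_agency_position(conn, agency, position):
    return _query(conn, "agency = ? AND position = ?", (agency, position))


# Placeholder rows invented for illustration only; they are not data from the study.
SAMPLE_ROWS = [
    ("Person A", "Mayor", "City Government", "City X", "2019-01-05"),
    ("Person B", "Mayor", "City Government", "City X", "2020-03-10"),
    ("Person A", "Minister", "Ministry Y", "City Z", "2020-06-01"),
]

if __name__ == "__main__":
    conn = build_db(SAMPLE_ROWS)
    # Follow one person's positions over time.
    for row in search_by_name(conn, "Person A"):
        print(row)
```

Searching by name returns one person's positions in date order, while searching by position returns everyone who held it, which corresponds to the two views described in the abstract: by person and by time.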
Keywords
named entity, news, query system, information extraction, relational database