A Study on Polarity Analysis of Sentiment Vocabulary Using E-HowNet
Date
2015
Abstract
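In recent years, with the rapid growth of the Internet, people can easily find all kinds of information relevant to their needs. Before making a purchase, people habitually gather reviews and analyses for reference, and the sentiment words that appear in those reviews are key indicators that shape a reader's opinion. Identifying opinion words manually is accurate, but it consumes considerable time and labor and can never keep pace with the rate at which new information is produced.
This thesis therefore proposes an unsupervised method that requires no human intervention. The main goal is to analyze review articles in the movie domain, extract the sentiment-bearing words, and assign each a polarity. The problem is handled in two parts. The first part uses Chinese syntactic rules to locate positions where sentiment words are likely to occur, collects the words found at those positions as seeds, and then expands the seed set through E-HowNet. The study counts the positive and negative sentiment tags that E-HowNet carries for some of its words and assigns the same polarity to every member of a class.
In the second part, the entries of the National Taiwan University Sentiment Dictionary, NTUSD (old version), are word-segmented and again expanded through E-HowNet to find further candidate sentiment words. Words whose polarity cannot be derived from E-HowNet's partial sentiment tags are matched exactly against NTUSD (old version) in an attempt to bring in more of the expanded vocabulary. Finally, the class-level polarities obtained in the preceding steps are used to classify the polarity of words with complex concept structures.
Combining the two parts, the study was tested on 980 manually selected sentiment words, and the experimental results show an accuracy of 92.48%.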
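The class-level labeling described above can be illustrated with a small majority-vote sketch. This is only an illustration under stated assumptions, not the thesis's implementation: the `classes` mapping stands in for E-HowNet semantic classes, `known_labels` stands in for the partial sentiment tags, the function names are hypothetical, and leaving a class unlabeled when the counts tie is a choice made here because the abstract does not say how ties are handled.

```python
from collections import Counter


def class_polarity(members, known_labels):
    """Return 'positive', 'negative', or None for one semantic class.

    Counts the positive and negative tags among the members that already
    carry a label. A tie (including no labeled members at all) returns
    None; this tie rule is an assumption, not taken from the thesis.
    """
    counts = Counter(known_labels[w] for w in members if w in known_labels)
    pos, neg = counts["positive"], counts["negative"]
    if pos == neg:
        return None
    return "positive" if pos > neg else "negative"


def propagate(classes, known_labels):
    """Give every member of a class the polarity chosen for that class."""
    result = {}
    for members in classes.values():
        polarity = class_polarity(members, known_labels)
        if polarity is None:
            continue  # class stays unlabeled
        for word in members:
            result[word] = polarity
    return result


# Toy data, invented for illustration only (not real E-HowNet content).
classes = {
    "class_a": ["精彩", "好看", "出色"],
    "class_b": ["無聊", "乏味"],
}
known_labels = {"精彩": "positive", "好看": "positive", "無聊": "negative"}
labels = propagate(classes, known_labels)
```

In this toy run, the untagged words 出色 and 乏味 inherit the polarity of their classes. If a word belonged to several classes, this sketch would simply keep the label from the last class processed.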
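The fallback step described above, where a word that received no polarity from the E-HowNet tags is matched exactly against NTUSD, might look like the following sketch. The function name, the two word sets, and the decision to leave a word unlabeled when it appears in both lists are assumptions made for illustration; the sample words are invented and are not claimed to be actual NTUSD entries.

```python
def assign_polarity(word, class_labels, ntusd_positive, ntusd_negative):
    """Return a polarity for `word`, or None if none can be determined.

    Order of preference (an assumed reading of the abstract):
      1. a label already derived from the E-HowNet class vote;
      2. an exact match in the dictionary's positive or negative list.
    """
    if word in class_labels:
        return class_labels[word]
    in_pos = word in ntusd_positive
    in_neg = word in ntusd_negative
    if in_pos and not in_neg:
        return "positive"
    if in_neg and not in_pos:
        return "negative"
    return None  # absent from both lists, or ambiguous


# Toy stand-ins for the dictionary lists (illustrative only).
ntusd_positive = {"感人", "精彩"}
ntusd_negative = {"沉悶"}
class_labels = {"精彩": "positive"}

results = {
    w: assign_polarity(w, class_labels, ntusd_positive, ntusd_negative)
    for w in ["精彩", "感人", "沉悶", "普通"]
}
```

Here 精彩 keeps its class-derived label, 感人 and 沉悶 are resolved by exact lookup, and 普通 remains unlabeled.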
Sentiment words are among the strongest factors shaping a reader's opinion in reviews. Classifying their polarity by hand is time-consuming and labor-intensive, and manual work cannot keep pace with the rate at which information is produced on the World Wide Web. This thesis proposes an unsupervised method for polarity classification. The goal is to analyze reviews in the movie domain, find the sentiment words, and classify their polarity. The research consists of two main parts. In the first part, Chinese syntactic rules are built to find the positions where sentiment words may appear. The words at those positions are collected as seeds, and E-HowNet is then used to expand the sentiment vocabulary. In the second part, the terms in NTUSD are segmented and used as seeds, and E-HowNet is again applied for expansion. The terms in NTUSD are also used to determine the polarity of words that could not be classified in the preceding steps. Finally, the polarity of each class is used to classify the structurally complex words in E-HowNet. With the two parts combined, 980 manually selected sentiment words were used as test data, and the method achieved an accuracy of 92.48%.
Keywords
natural language processing (NLP), sentiment analysis, Chinese language processing, Chinese parser, E-HowNet, sentiment lexicon, semantic dictionary