Exploring Feature and Opinion Word Pairing in Forum Reviews Using Parse Tree Structure

Date

2017

Abstract

With the rapid growth of the Internet, consumers increasingly shop online. Without seeing the physical product, however, buyers are easily misled by the seller's flattering product photos and descriptions, because sellers write with a promotional aim and do not state a product's real strengths and weaknesses. Reviews written by other users therefore carry considerable reference value, which is the main reason this study analyzes reviews to support product recommendation.

This study collects reviews of a product from the Bahamut forum and parses each one with the CKIP (Chinese Knowledge Information Processing) parser of Academia Sinica. Words tagged in the Head Na family are extracted as feature words, and words tagged in the VH or A families are extracted as opinion words. Because forum reviews are mostly written in informal Chinese and sentences are often incomplete, a sentence is kept in the corpus as long as it contains at least one feature word or one opinion word. The feature word database is built by majority vote. The opinion word database is built by matching against the NTU Sentiment Dictionary (NTUSD) and is supplemented with a "birds of a feather" method based on the distance between positive and negative words, the Revised Dictionary of the Ministry of Education, and manual labeling. The resulting databases are used for grouping and scoring. Following the core idea of Aspect Based Semantic Analysis (ABSA), feature words and opinion words are then paired through the parse tree. For each review of a product, the system outputs the feature words, the opinion words, the sentiment score of each opinion word, and the feature-opinion pairs, together with an overall score for the product, so that users can see the key information in the reviews.

In the experiments, feature word grouping reaches an accuracy of 81.8%, opinion word grouping reaches 87.71%, and feature-opinion pairing reaches 87.13%. Compared with the recommendations on Amazon Japan, the system's results are 90% similar in terms of star rating and 70% similar in terms of IDF value.
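The pairing step can be illustrated with a minimal sketch. The abstract does not state the exact pairing rule, so this example assumes a simple one: each opinion word is paired with the feature word that is closest to it in the parse tree. The nested-tuple tree format, the function names, and the sample sentence are all hypothetical and are not the CKIP parser's actual output format.

```python
# Hypothetical toy tree: (label, [children]) for phrases, (pos_tag, "word") for leaves.
# This is an illustrative format, not the real CKIP parser output.

def leaves_with_paths(tree, path=()):
    """Yield (path, pos_tag, word) for every leaf, where path is the child-index route from the root."""
    label, body = tree
    if isinstance(body, str):
        yield path, label, body
    else:
        for i, child in enumerate(body):
            yield from leaves_with_paths(child, path + (i,))

def tree_distance(p, q):
    """Number of edges between two leaves, measured through their lowest common ancestor."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return (len(p) - common) + (len(q) - common)

def pair_words(tree):
    """Pair each opinion word with the nearest feature word (assumed rule, see lead-in)."""
    leaves = list(leaves_with_paths(tree))
    # Assumption: feature words carry tags starting with "Na";
    # opinion words carry tags starting with "VH" or the tag "A".
    features = [(p, w) for p, tag, w in leaves if tag.startswith("Na")]
    opinions = [(p, w) for p, tag, w in leaves if tag.startswith("VH") or tag == "A"]
    pairs = []
    for op_path, op_word in opinions:
        if not features:
            break
        _, feat_word = min(features, key=lambda f: tree_distance(f[0], op_path))
        pairs.append((feat_word, op_word))
    return pairs

# "The paint job is very pretty, but the price is expensive."
tree = ("S", [
    ("S", [
        ("NP", [("Na", "塗裝")]),                   # paint job
        ("VP", [("Dfa", "很"), ("VH", "漂亮")]),    # very pretty
    ]),
    ("Cbb", "但是"),                                # but
    ("S", [
        ("NP", [("Na", "價格")]),                   # price
        ("VP", [("VH", "貴")]),                     # expensive
    ]),
])

pairs = pair_words(tree)
print(pairs)
```

Measuring distance in the tree rather than in the flat word sequence keeps an opinion word attached to the feature word in its own clause, which is the motivation usually given for pairing over a parse tree instead of a fixed word window.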

Keywords

opinion mining, parse tree structure, forum reviews, PVC figure model
