A Study on Aspects of Interest and Inconsistency Analysis in Travel Reviews
Date
2018
Abstract
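The convenience of the Internet has changed both consumers' buying habits and the way businesses operate. Many people look up reviews online before deciding whether to buy, hoping the purchase will meet their expectations. Businesses, in turn, hope that consumers will post reviews after their shopping experience, since reviews attract attention and show where quality should be maintained or improved. A review usually consists of a star rating and a written comment. As reviews accumulate, it becomes apparent that in some of them the star rating does not match the comment: for example, a user gives a positive five-star rating but writes a comment made up largely of shortcomings and complaints. This is the inconsistency phenomenon.
The data used in this thesis come from TripAdvisor, an international travel review website; the experimental data were taken from seven well-known hotels in Taipei. The research has two objectives. The first is to expand the vocabulary of the sentiment dictionary: using a self-built, expanded sentiment lexicon and the proposed sentiment calculation module, each word is automatically assigned a sentiment score, and the scores are used to analyze the inconsistency of reviews so that travelers can be offered useful reviews for reference. The second is to identify the aspect terms in reviews, map all aspect terms into a vector space, and group them with clustering algorithms, so that terms with similar meanings fall into the same cluster and a representative term can be found for each cluster. A user who wants to read reviews about a particular aspect then does not need to go through every review, but can use the representative terms to find the relevant reviews quickly and to examine the positive and negative evaluations of each aspect in finer detail.
This study proposes three statistical algorithms, each based on a different rule, for identifying inconsistent reviews; with the rule that drops the lowest score and takes the arithmetic mean, the system reaches an accuracy of 85.7%. For aspects of interest, Word2vec is used to produce word vectors, K-Means and Fuzzy C-Means are used to cluster the aspect terms, and a representative term is found for each cluster. The results show that the representative terms found with Fuzzy C-Means clustering distinguish the different aspects better.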
The convenience of the Internet has changed people's buying behavior and business models. On the one hand, many people research reviews online before making a decision, hoping that the goods they purchase will meet their expectations. On the other hand, retailers hope that consumers will leave reviews online, both to draw more attention and to give the shops direction for improving product and service quality. A review generally includes a rating score and a comment. However, once the number of reviews grows large enough, some rating scores turn out not to match their comments. For example, a user gives a five-star rating, but the comment consists of complaints about service and product quality. In this study, such cases are called inconsistencies.
This thesis uses review data for seven well-known hotels in Taipei, Taiwan, downloaded from TripAdvisor, an international travel review site. The research has two key objectives. The first is to expand the emotional vocabulary list and to present a formula that takes the emotional vocabulary as its parameter and assigns a score to each word. The study uses these scores to analyze each comment for inconsistency and, in turn, to provide travelers with reliable opinions. The second objective is to find the aspect terms in the reviews and project them into a vector space, where a clustering algorithm groups them. The aim of this step is to find a core term that represents each group of similar words. When users want to check the reviews on the topics they care about most, they then do not need to read every review thoroughly: they can use the core terms to find the reviews about a given aspect and analyze the positive and negative reviews of each aspect in more detail.
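The aspect-term grouping described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the 2-D vectors are made-up stand-ins for Word2vec embeddings, only plain K-Means is shown (not Fuzzy C-Means), the seed terms are chosen by hand, and picking the term nearest the centroid as the core term is an assumed rule, since the abstract does not say how core terms are selected.

```python
import math

# Hypothetical 2-D vectors standing in for Word2vec embeddings of aspect terms.
VECTORS = {
    "room": (0.9, 0.1),
    "bed": (1.0, 0.2),
    "bathroom": (0.8, 0.0),
    "staff": (0.1, 0.9),
    "service": (0.0, 1.0),
    "reception": (0.2, 0.8),
}


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(vectors, seeds, iterations=10):
    """Plain K-Means; `seeds` names the terms used as initial centroids."""
    centroids = [vectors[s] for s in seeds]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for term, vec in vectors.items():
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(vec, centroids[i]))
            clusters[nearest].append(term)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean([vectors[t] for t in c]) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters, centroids


def representative(cluster, centroid, vectors):
    """Assumed rule: the core term is the one closest to the centroid."""
    return min(cluster, key=lambda t: math.dist(vectors[t], centroid))


clusters, centroids = kmeans(VECTORS, seeds=["room", "staff"])
core_terms = [representative(c, m, VECTORS) for c, m in zip(clusters, centroids)]
```

With these toy vectors the room-related and staff-related terms separate into two clusters, and a reader interested in one aspect could filter reviews by the terms in that cluster rather than reading every review.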
The research offers three methods for recognizing inconsistency in reviews. The third method takes the average score after removing the lowest score, with which the system reaches an accuracy of 85.7%. For the aspect part, the study uses Word2vec to produce word vectors and then applies K-Means and Fuzzy C-Means to group the terms and find the core term of each group. The results show that Fuzzy C-Means yields core terms that distinguish the different aspects better than those from K-Means.
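The rule of averaging after removing the lowest score might look like the sketch below. Everything beyond that rule is an assumption made for illustration: the score range of [-1, 1], the zero threshold, the mapping of star ratings to polarity, the function names, and the example scores are hypothetical and are not taken from the thesis.

```python
def trimmed_mean(scores):
    """Arithmetic mean after dropping one occurrence of the lowest score."""
    if not scores:
        raise ValueError("need at least one score")
    if len(scores) == 1:
        return scores[0]
    kept = sorted(scores)[1:]
    return sum(kept) / len(kept)


def is_inconsistent(stars, word_scores, threshold=0.0):
    """Flag a review whose star polarity disagrees with its text polarity.

    Assumptions (not from the thesis): scores lie in [-1, 1], 4-5 stars
    count as positive, 1-2 stars as negative, and 3 stars is never flagged.
    """
    text_positive = trimmed_mean(word_scores) > threshold
    if stars >= 4:
        return not text_positive
    if stars <= 2:
        return text_positive
    return False


# A five-star review whose sentiment words are mostly negative.
five_star_complaint = is_inconsistent(5, [-0.8, -0.6, -0.5, 0.2])
# A five-star review whose sentiment words are mostly positive.
five_star_praise = is_inconsistent(5, [0.7, 0.9, -0.1, 0.6])
```

Dropping the lowest score keeps a single harsh word from dominating an otherwise positive comment, which is one plausible reason such a rule could outperform a plain mean.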
Keywords
Inconsistency, Travel reviews, Aspect opinion mining, Unsupervised learning, Natural language processing (NLP)