A Study of Automatic Aspect-Based Summarization of Online Store Review Text
Date
2018
Abstract
Reviews in online stores usually describe either the product or the seller. A listing often has hundreds of reviews, so users cannot easily read them all, and a summary of this content would help them choose products efficiently. This thesis builds a summarization system for online store reviews. The system first classifies review snippets as describing the seller, the product, or neither, using classification models built on three types of features. The first type is unigram frequency: the TF-IDF value of every word in each snippet serves as a feature. The second type is topic-model features: each snippet's degree of association with each LDA topic, computed for several different numbers of topics, forms its topical feature values. The third type is keyword features, built from snippets manually annotated by aspect. Keywords are either words with high chi-square values within each aspect or the words under LDA topics, and Word2Vec is then used to compute the similarity between a snippet and each keyword as the feature values. After classification, the snippets in each class are grouped into aspects according to the LDA analysis. Semantically similar snippets within each aspect are then merged based on their Word2Vec representations to produce the summary. Experimental results show that keyword features achieve higher precision in classifying seller snippets, while combining topic-model features with keyword features gives better results for product snippets. The system effectively separates seller snippets from product snippets, and the resulting summaries help users browse a store's information more efficiently.
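The first feature type, unigram TF-IDF, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's code. It assumes each snippet has already been segmented into words, since the abstract does not name a segmenter. It uses length-normalized term frequency and the plain idf = log(N / df). The function name `tfidf_features` is hypothetical.

```python
import math
from collections import Counter


def tfidf_features(snippets):
    """Return (vocab, vectors): one TF-IDF vector per snippet.

    snippets: list of token lists; Chinese word segmentation is assumed
    to have been done beforehand.
    """
    n_docs = len(snippets)
    # Document frequency: number of snippets that contain each word.
    df = Counter()
    for tokens in snippets:
        df.update(set(tokens))
    vocab = sorted(df)
    # Plain inverse document frequency, idf = log(N / df).
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    vectors = []
    for tokens in snippets:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty snippets
        # Length-normalized term frequency times idf, in vocab order.
        vectors.append([counts[w] / total * idf[w] for w in vocab])
    return vocab, vectors
```

In practice a library class such as scikit-learn's `TfidfVectorizer` would typically be used instead. It applies idf smoothing and L2 normalization by default, so its values differ from this plain formula.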
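Keyword selection uses chi-square values computed over the manually annotated aspects. The sketch below uses the standard 2x2 statistic χ² = N(AD − BC)² / ((A+C)(B+D)(A+B)(C+D)). Here A counts target-aspect snippets containing the word, B other snippets containing it, C target-aspect snippets without it, and D the rest. Two choices are assumptions rather than details from the thesis: zeroing out words that are negatively associated with the aspect, and the hypothetical names `chi_square_scores` and `top_keywords`.

```python
def chi_square_scores(snippets, labels, target):
    """Chi-square score of every word for one aspect label `target`.

    snippets: list of token lists (already word-segmented).
    labels:   manual aspect label of each snippet.
    """
    n = len(snippets)
    doc_sets = [set(tokens) for tokens in snippets]
    vocab = set().union(*doc_sets)
    scores = {}
    for w in vocab:
        # 2x2 contingency table: word present/absent x target/other aspect.
        a = sum(1 for s, y in zip(doc_sets, labels) if w in s and y == target)
        b = sum(1 for s, y in zip(doc_sets, labels) if w in s and y != target)
        c = sum(1 for s, y in zip(doc_sets, labels) if w not in s and y == target)
        d = n - a - b - c
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        if denom == 0 or a * d <= b * c:
            # Degenerate tables and (by assumption) words that are not
            # positively associated with the target aspect score 0.
            scores[w] = 0.0
        else:
            scores[w] = n * (a * d - b * c) ** 2 / denom
    return scores


def top_keywords(snippets, labels, target, k):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    scores = chi_square_scores(snippets, labels, target)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Consider a word that appears in every seller snippet and in no product snippet. It receives the maximum score N, so it is the first keyword chosen for the seller aspect.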
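Once keywords are selected, Word2Vec turns each one into a similarity feature. The abstract does not say how a snippet vector is formed. This sketch assumes the common choice of averaging the snippet's word vectors. A tiny hand-made embedding table stands in for a trained Word2Vec model; with gensim, the vectors would come from a trained model's `wv` attribute. All function names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def snippet_vector(tokens, embeddings):
    """Average the vectors of in-vocabulary tokens (assumed composition)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def keyword_features(tokens, keywords, embeddings):
    """One feature per keyword: cosine(snippet vector, keyword vector).

    Snippets with no known words, or keywords missing from the
    embedding table, get a feature value of 0.0.
    """
    sv = snippet_vector(tokens, embeddings)
    return [
        cosine(sv, embeddings[k]) if sv is not None and k in embeddings else 0.0
        for k in keywords
    ]
```

The resulting vector has one value per keyword. It can be used on its own or concatenated with the topic-model features, the combination the abstract reports as working better for product snippets.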
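In the final step, semantically similar snippets within each LDA-derived aspect are merged. The abstract does not specify the merging rule, so the following is a sketch under an assumed greedy scheme. A snippet joins the first group whose first member lies within a cosine-similarity threshold, and each group is reported as its first snippet plus a count. The threshold of 0.9 is an arbitrary illustrative value. Snippet vectors are assumed to be precomputed, for example as averaged Word2Vec vectors. The names `merge_similar` and `summarize` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def merge_similar(snippet_vecs, threshold=0.9):
    """Greedily group snippets by similarity to each group's first member.

    Returns a list of groups of snippet indices, in order of first appearance.
    """
    groups = []
    for i, vec in enumerate(snippet_vecs):
        for group in groups:
            if cosine(vec, snippet_vecs[group[0]]) >= threshold:
                group.append(i)
                break
        else:  # no sufficiently similar group: start a new one
            groups.append([i])
    return groups


def summarize(texts, groups):
    """One summary line per group: its first snippet and the group size."""
    return [(texts[g[0]], len(g)) for g in groups]
```

The greedy pass is order-dependent, so a clustering method such as agglomerative clustering would be a reasonable alternative. The sketch only shows the general idea of collapsing near-duplicate opinions into one representative line with a count.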
Keywords
Review summarization, TF-IDF features, topic model features, keyword features