Sentiment Analysis Classifiers with Label-Based Term Weighting (基於標籤類別的權重之情感分析分類器)
Date
2024
Abstract
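Sentiment analysis is a subfield of natural language processing that aims to classify documents according to the positive or negative sentiment they express. Multinomial Naive Bayes, Complement Naive Bayes, and Support Vector Machines are three commonly used methods in sentiment analysis. Many supervised and unsupervised term weighting methods can be used to improve these classifiers; they assign each word a weight based on its distribution across all documents. This thesis proposes a label-based supervised term weighting scheme to further improve these classifiers. We also propose using the AFINN lexicon to map text to lower-dimensional sentiment features, which avoids the heavy computational cost of very high-dimensional representations. We use the F1 score, the ROC curve, and the area under the curve (AUC) to assess whether the proposed weighting method improves classifier performance.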
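The idea of replacing a high-dimensional bag-of-words vector with a few lexicon-based sentiment features can be sketched as follows. AFINN scores English words on an integer scale from -5 to +5; the small dictionary below is an illustrative placeholder, not an excerpt from AFINN, and the three features (positive score sum, negative score magnitude, number of lexicon hits) are an assumption for this sketch rather than the thesis's exact feature set.

```python
import re
from typing import Dict, List

# Placeholder word scores on AFINN's -5..+5 scale (hypothetical values,
# NOT copied from the real AFINN lexicon).
LEXICON: Dict[str, int] = {
    "good": 3, "great": 3, "love": 3,
    "bad": -3, "awful": -3, "boring": -2,
}

def lexicon_features(text: str, lexicon: Dict[str, int] = LEXICON) -> List[float]:
    """Map a document to three non-negative features:
    [sum of positive scores, magnitude of summed negative scores, lexicon hits].
    The feature choice is an assumption made for illustration."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    pos = float(sum(s for s in scores if s > 0))
    neg = float(-sum(s for s in scores if s < 0))
    return [pos, neg, float(len(scores))]

if __name__ == "__main__":
    print(lexicon_features("A great movie, but the ending was boring"))  # [3.0, 2.0, 2.0]
```

Whatever the vocabulary size, each document becomes a 3-dimensional vector. Keeping the features non-negative means they can still be fed to a multinomial-style classifier.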
Sentiment analysis is a subfield of natural language processing that aims to determine the sentiment expressed in text. Multinomial Naive Bayes, Complement Naive Bayes, and Support Vector Machines are three popular methods in sentiment analysis for classifying documents as positive or negative. Several supervised and unsupervised term weighting methods have been developed to adjust the weight of each word. In this thesis, we propose a label-based supervised term weighting scheme that uses the sentiment label (positive or negative) of each document both when computing the adjusted term weights and when applying these weights to the data. This lets the weights capture more sentiment information. We also propose using the AFINN lexicon together with these adjusted term weights to further improve the classifiers. We apply our methods to three data sets and report the corresponding F1 scores, ROC curves, and AUC values.
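The abstract does not give the weighting formula. The sketch below therefore substitutes a relevance-frequency-style ratio computed separately for each class, w_c(t) = log2(2 + df_c(t) / max(1, df_other(t))), where df_c(t) is the number of class-c documents containing term t. This is a hypothetical stand-in, not the thesis's method. Labels enter when the weights are computed and again when they are applied: each training document's word counts are scaled by the weights of its own class before a Multinomial Naive Bayes model is fitted. The F1 score and AUC are then computed in plain Python. All function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Doc = List[str]  # a tokenized document
Model = Tuple[Dict[int, float], Dict[int, Dict[str, float]]]

def class_term_weights(docs: List[Doc], labels: List[int]) -> Dict[int, Dict[str, float]]:
    # Hypothetical per-class weight (rf-style, not the thesis's formula):
    # w_c(t) = log2(2 + df_c(t) / max(1, df_other(t)))
    df = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        df[y].update(set(doc))  # document frequency per class
    vocab = set(df[0]) | set(df[1])
    return {c: {t: math.log2(2 + df[c][t] / max(1, df[1 - c][t])) for t in vocab}
            for c in (0, 1)}

def train_weighted_mnb(docs: List[Doc], labels: List[int], alpha: float = 1.0) -> Model:
    # Assumes labels are 0 (negative) / 1 (positive) and both classes occur.
    weights = class_term_weights(docs, labels)
    vocab = set(weights[0])
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        for t, n in Counter(doc).items():
            counts[y][t] += n * weights[y][t]  # scale by the document's own label weights
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in (0, 1)}
    log_theta = {}
    for c in (0, 1):
        total = sum(counts[c].values()) + alpha * len(vocab)  # Laplace smoothing
        log_theta[c] = {t: math.log((counts[c][t] + alpha) / total) for t in vocab}
    return log_prior, log_theta

def score(model: Model, doc: Doc) -> float:
    # Log-odds log P(pos|d) - log P(neg|d) under the model, using raw counts.
    log_prior, log_theta = model
    tf = Counter(t for t in doc if t in log_theta[1])  # unseen terms are ignored
    s = {c: log_prior[c] + sum(n * log_theta[c][t] for t, n in tf.items()) for c in (0, 1)}
    return s[1] - s[0]

def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def roc_auc(y_true: List[int], scores: List[float]) -> float:
    # AUC as the probability that a positive outranks a negative (ties count 0.5).
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    docs = [["great", "fun"], ["great", "plot"], ["boring", "plot"], ["awful", "boring"]]
    labels = [1, 1, 0, 0]
    model = train_weighted_mnb(docs, labels)
    scores = [score(model, d) for d in docs]
    preds = [1 if s > 0 else 0 for s in scores]
    print(f1_score(labels, preds), roc_auc(labels, scores))  # 1.0 1.0
```

The class-specific weights are applied only during training, and prediction uses raw counts. This is a design choice for the sketch. A test document's label is unknown, and multiplying its counts by a class's own weight would scale a negative log-probability, which would penalize the class the weight is meant to favor.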
Keywords
sentiment analysis, Naive Bayes classifier, supervised term weighting