Using Ensemble Learning Methods on Social Media Rumours Detection (基於集成學習方法進行謠言偵測)

dc.contributor: 侯文娟 [zh_TW]
dc.contributor: Hou, Wen-Juan [en_US]
dc.contributor.author: 陳煒鈞 [zh_TW]
dc.contributor.author: Chen, Wei-Jiun [en_US]
dc.date.accessioned: 2023-12-08T08:02:37Z
dc.date.available: 9999-12-31
dc.date.available: 2023-12-08T08:02:37Z
dc.date.issued: 2022
dc.description.abstract: 網路社交媒體充斥著假消息,連牛津辭典在2016年都將"Post-Truth"列為一個詞彙,錯誤的資訊可能對人造成危害,所以建構一個能夠辨識網路上各種不一樣說法、消息的系統是一個重要的議題。本研究利用預訓練語言模型搭配文字以外的特徵建立出一套辨識謠言的系統,辨識在社交媒體Twitter及Reddit使用者發表內容的真偽。 本論文的資料集來自SemEval 2019 RumourEval: Determining rumour veracity and support for rumours (SemEval 2019 Task 7)的任務B,該任務將Twitter及Reddit上的句子經由人工標註分為3類,真(True)、假(False)、未驗證(Unverified),本研究先經由資料增強的方式增加資料量,接著以不同的語言模型(RoBERTa、ALBERT)及傳統分類(SVM)個別進行訓練,再將不同的模型組合進行集成學習(Ensemble Learning),訓練並給予不同的權重,最後加上後處理達到Macro F1 72%,RMSE 0.5879的成績。 [zh_TW]
dc.description.abstract: Online social media is full of false claims; Oxford Dictionaries even named “post-truth” its Word of the Year in 2016. Because misinformation can be harmful, building a system that can assess the veracity of news and statements circulating online is an important problem. In this thesis, we combine pre-trained language models with non-textual features to build a system that identifies rumours in content posted by users on the social media platforms Twitter and Reddit. The dataset for our experiments comes from Task B of SemEval 2019 RumourEval: Determining rumour veracity and support for rumours (SemEval 2019 Task 7), in which sentences from Twitter and Reddit are manually annotated as "True", "False", or "Unverified". We first enlarge the dataset through data augmentation. We then train different language models (RoBERTa, ALBERT) and a traditional classifier (SVM) individually. Next, the models are combined through ensemble learning, with different weights assigned to them through training. After post-processing, the system achieves a Macro F1 of 72% and an RMSE of 0.5879. [en_US]
dc.description.sponsorship: 資訊工程學系 [zh_TW]
dc.identifier: 60947040S-41365
dc.identifier.uri: https://etds.lib.ntnu.edu.tw/thesis/detail/da142f23c82996d1f6d1bf9b7d9708c2/
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/121577
dc.language: Chinese
dc.subject: 語言模型 [zh_TW]
dc.subject: 深度學習 [zh_TW]
dc.subject: 假新聞 [zh_TW]
dc.subject: 集成學習 [zh_TW]
dc.subject: Language Model [en_US]
dc.subject: Deep Learning [en_US]
dc.subject: Fake news [en_US]
dc.subject: Ensemble Learning [en_US]
dc.title: 基於集成學習方法進行謠言偵測 [zh_TW]
dc.title: Using Ensemble Learning Methods on Social Media Rumours Detection [en_US]
dc.type: etd
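
Illustrative note: the abstract describes combining the outputs of several classifiers (RoBERTa, ALBERT, SVM) with per-model weights and reporting a Macro F1 score. The sketch below shows one common way such a combination can work, namely a weighted average of per-model class probabilities, together with a plain Macro F1 computation. It is not the thesis's implementation: the function names, the label order, and the example probabilities and weights are all hypothetical, and the thesis's data augmentation, weight training, post-processing, and RMSE scoring are not reproduced here.

```python
from typing import List, Sequence

# Label order is an assumption made for this sketch only.
LABELS = ["true", "false", "unverified"]


def weighted_vote(probs_per_model: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model class probabilities (soft voting)."""
    total = sum(weights)
    n_classes = len(probs_per_model[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs_per_model)) / total
        for c in range(n_classes)
    ]


def predict(probs_per_model: Sequence[Sequence[float]],
            weights: Sequence[float]) -> str:
    """Return the label with the highest combined probability."""
    combined = weighted_vote(probs_per_model, weights)
    return LABELS[combined.index(max(combined))]


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores over LABELS."""
    scores = []
    for label in LABELS:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class with no gold and no predicted instances scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical class probabilities from three models for one post.
example_probs = [
    [0.6, 0.3, 0.1],  # stand-in for a RoBERTa-style model
    [0.2, 0.7, 0.1],  # stand-in for an ALBERT-style model
    [0.3, 0.4, 0.3],  # stand-in for an SVM
]
equal_weights = [1.0, 1.0, 1.0]
skewed_weights = [0.6, 0.2, 0.2]  # hypothetical, not learned weights
```

With these made-up numbers, `predict(example_probs, equal_weights)` and `predict(example_probs, skewed_weights)` select different labels, which is why the choice of weights matters in a weighted ensemble.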