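A Study on Applying Keyword Difference Analysis to Predicting Vote Shares in Legislative Elections (應用關鍵字差異分析於立法委員選舉得票率預測之研究)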
Date
2017-10-??
Publisher
Graduate Institute of Library and Information Studies, National Taiwan Normal University (NTNU)
Abstract
To examine the relationship between Taiwan's legislative elections and online news, and whether online news can be used for prediction, we designed and trained multiple regression models on news text from udn, the online edition of one of Taiwan's major daily newspapers, published between January 1, 2002 and December 31, 2009. The 2008 legislative election vote shares served as training data and the 2012 vote shares as test data; we compared predicted and actual vote shares and examined their correlation. For visualization, we present social network graphs of the difference sets and intersections of candidates' vocabularies, which give a quick view of each candidate's distinguishing features. Because the factors that influence actual vote share are highly complex, the best result in this study had a mean absolute error (MAE) of about 7% and a Pearson correlation coefficient of about 0.5. Although the predictions are not highly accurate, they remain a useful reference as one type of online opinion that can supplement telephone polls. The main contribution of this study is the application of natural language processing and machine learning to build a vote-share model for legislators, with an account of how the sparse-matrix and feature-selection problems were handled. Finally, we discuss further applications of sentiment analysis, with the hope that useful information can be extracted effectively from online text to build models for other applications.
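The two evaluation figures quoted in the abstract, mean absolute error and the Pearson correlation coefficient, can be illustrated with a minimal sketch. The function names and the vote-share values below are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted vote shares."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical vote shares expressed as fractions (assumed values, not study data).
actual = [0.52, 0.47, 0.61, 0.38, 0.55]
predicted = [0.48, 0.50, 0.53, 0.45, 0.49]

mae = mean_absolute_error(actual, predicted)
r = pearson_r(actual, predicted)
print(f"MAE: {mae:.3f}, Pearson r: {r:.3f}")
```

An MAE of about 7% would correspond to a value near 0.07 when vote shares are expressed as fractions in this way.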