Comparison and Study of Imputing Missing Values Using Machine Learning

dc.contributor: 呂翠珊 (zh_TW)
dc.contributor: Lu, Tsui-Shan (en_US)
dc.contributor.author: 陳柏瑋 (zh_TW)
dc.contributor.author: Chen, Po-Wei (en_US)
dc.date.accessioned: 2023-12-08T07:55:59Z
dc.date.available: 2027-08-10
dc.date.available: 2023-12-08T07:55:59Z
dc.date.issued: 2022
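dc.description.abstract: This study compares data with missing values after imputation by several machine-learning methods. Imputing missing values is an important step in data analysis: if missing values are deleted arbitrarily or replaced naively, subsequent statistical analyses may be seriously biased, so choosing effectively among the available imputation methods is crucial. We conducted a simulation study of three recently popular machine-learning imputation methods: K-Nearest Neighbor, Multivariate Imputation by Chained Equations, and MissForest. Under various random-missingness settings, we evaluated the results of each method on all-continuous, all-categorical, and mixed-type data sets. The results show that imputing with MissForest gives the best performance in both normalized root mean squared error (NRMSE) and proportion of falsely classified entries (PFC). We also applied the three methods to several empirical data sets, and MissForest outperformed the other two machine-learning imputation methods on all of them. (zh_TW)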
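The two evaluation criteria named in the abstract can be computed as in the minimal sketch below. It assumes the definitions commonly used in the MissForest literature: NRMSE is the square root of the mean squared imputation error divided by the variance of the true values, both taken over the originally missing continuous entries only, and PFC is the share of missing categorical entries imputed with the wrong category. This record does not state the thesis's exact normalization; the use of the population variance here is an assumption, and the function names `nrmse` and `pfc` are hypothetical.

```python
import math
from statistics import pvariance
from typing import Hashable, Sequence


def nrmse(true_values: Sequence[float], imputed_values: Sequence[float]) -> float:
    """Normalized RMSE over the cells that were missing (continuous data).

    Assumption: the squared error is normalized by the population variance
    of the held-out true values.
    """
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("need two non-empty sequences of equal length")
    variance = pvariance(true_values)
    if variance == 0:
        raise ValueError("true values have zero variance; NRMSE is undefined")
    mse = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values)) / len(true_values)
    return math.sqrt(mse / variance)


def pfc(true_labels: Sequence[Hashable], imputed_labels: Sequence[Hashable]) -> float:
    """Proportion of falsely classified entries among the missing categorical cells."""
    if len(true_labels) != len(imputed_labels) or not true_labels:
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(t != p for t, p in zip(true_labels, imputed_labels))
    return wrong / len(true_labels)


# Only the originally missing cells are compared with their held-out true values.
print(nrmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # sqrt((1/3) / (2/3)), about 0.7071
print(pfc(["a", "b", "c", "a"], ["a", "b", "a", "a"]))  # 1 wrong out of 4
```

Lower is better for both criteria, and both are 0 for a perfect imputation, which is what allows them to be reported side by side for mixed-type data.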
dc.description.abstract: This study compares several machine-learning methods for imputing missing values. Imputation is an important step in data analysis: if missing values are arbitrarily deleted or naively substituted, subsequent statistical analyses may be substantially biased, so choosing effectively among the available imputation methods is crucial. In this paper, we consider three recent machine-learning imputation methods: K-Nearest Neighbor, Multivariate Imputation by Chained Equations, and MissForest. We conduct simulation studies on all-continuous, all-categorical, and mixed data to evaluate each method under various settings of random missingness. The results show that MissForest performs best in terms of both NRMSE (normalized root mean squared error, for continuous variables) and PFC (proportion of falsely classified entries, for categorical variables). We also apply the three methods to several real data sets, where MissForest again outperforms the other two methods. (en_US)
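K-Nearest Neighbor imputation, the first of the three methods compared, fills each missing cell from the most similar records. The sketch below is a simplified, hypothetical illustration for continuous data only (missing cells stored as `None`), not the implementation used in the thesis. Assumptions: distances are Euclidean over the features observed in both rows, rescaled by total features divided by shared features (similar in spirit to scikit-learn's `nan_euclidean` metric), and each missing value is replaced by the unweighted mean of its k nearest donor rows that observe that feature. The value k = 2 in the usage lines is arbitrary.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing cell


def _distance(a: Row, b: Row) -> float:
    """Euclidean distance over coordinates observed in both rows,
    scaled up by (total features / shared features)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(data: List[Row], k: int = 2) -> List[Row]:
    """Replace each None with the mean of that feature over the k nearest donor rows."""
    result: List[Row] = []
    for i, row in enumerate(data):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: every other row that observes feature j, sorted by distance.
            donors = sorted(
                (_distance(row, other), other[j])
                for r, other in enumerate(data)
                if r != i and other[j] is not None
            )
            if not donors:
                raise ValueError(f"feature {j} is missing in every row")
            nearest = [v for dist, v in donors[:k] if math.isfinite(dist)]
            if not nearest:  # no row shares an observed feature: use the column mean
                nearest = [v for _, v in donors]
            filled[j] = sum(nearest) / len(nearest)
        result.append(filled)
    return result


data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [1.2, 1.9, 3.2],
]
print(knn_impute(data, k=2)[1])  # third value is the mean of the donors' 3.0 and 3.2
```

In practice the features are usually standardized before computing distances, since unscaled Euclidean distances are dominated by variables with large ranges.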
dc.description.sponsorship: Department of Mathematics (zh_TW)
dc.identifier: 60940022S-41863
dc.identifier.uri: https://etds.lib.ntnu.edu.tw/thesis/detail/2c4e5f415335317d824919e74b6baef1/
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/121102
dc.language: English
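dc.subject: Missing values (zh_TW)
dc.subject: Machine learning (zh_TW)
dc.subject: K-Nearest Neighbor (zh_TW)
dc.subject: Multivariate Imputation by Chained Equations (zh_TW)
dc.subject: MissForest (zh_TW)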
dc.subject: Imputation of missing values (en_US)
dc.subject: K-Nearest Neighbor (en_US)
dc.subject: Multivariate Imputation by Chained Equations (en_US)
dc.subject: MissForest (en_US)
dc.title: Comparison and Study of Imputing Missing Values Using Machine Learning (zh_TW)
dc.title: Comparison of multiple machine-learning methods of imputation (en_US)
dc.type: etd
