Application of Prediction Models Using Random Interaction Forests
Date
2023
Abstract
Predictive analysis is applied to biological, industrial, and commercial data in many fields, for example customer behavior, consumer demand, stock price fluctuations, and patient diagnosis. Exploring the interactions between important variables in such data can make model predictions more accurate. This thesis applies the random forest algorithm and accounts for interaction effects to improve the model and to gain useful insight into how explanatory variables interact. The random interaction forest (RIF) is a new algorithmic strategy derived from the random forest (RF). It is suitable for categorical, continuous, and survival outcomes, and it explicitly models qualitative and quantitative interactions between variables in the decision trees that make up the forest. In the simulation study, the R package "vivid" (Variable Importance and Variable Interactions Displays) was used to visualize variable importance and variable interactions in machine learning models. The R package "diversityForest" was also used; it relies on split sampling to carry out complex split procedures in random forests and models quantitative and qualitative interaction effects using bivariable splits.
The interaction forest (IF) comes with an effect importance measure (EIM), which can be used to identify variables involved in quantitative and qualitative interactions with high predictive relevance. IF and EIM focus on easily interpretable forms of interaction. Using the new random interaction forest structure, linear regression and logistic regression models were tested, extending the capability of machine learning prediction models. The simulation results show that, when interactions are present, RIF outperforms random forest as well as logistic and linear regression, and that in many scenarios it is more accurate than models built with traditional statistical methods. The more significant the interaction, the greater the advantage of RIF, which indicates that the method can improve the efficiency of business processes and scientific research. Moreover, the interactions that RIF identifies and exploits in predictive modeling are relatively easy to interpret. We believe it is a challenging but suitable tool for future work. Finally, the methods are evaluated on actual data on the number of deaths in Taipei City from 2012 to 2016.
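To illustrate why bivariable splits can help when a qualitative (crossover) interaction is present, the following is a minimal sketch in Python. It is not the thesis's code and does not use the "diversityForest" package; the synthetic data, the helper names `gini` and `impurity_decrease`, and the fixed split point at 0 are all assumptions made for illustration. It compares the Gini impurity decrease of two single-variable splits with that of one split defined jointly on two variables.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def impurity_decrease(labels, in_left):
    """Impurity decrease when labels are partitioned by the boolean mask in_left."""
    left = [y for y, m in zip(labels, in_left) if m]
    right = [y for y, m in zip(labels, in_left) if not m]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    return gini(labels) - weighted

random.seed(0)

# Hypothetical data with a pure qualitative interaction:
# the direction of the effect of x1 on y flips with the sign of x2.
X = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(2000)]
y = [1 if x1 * x2 > 0 else 0 for x1, x2 in X]

# Univariable splits at 0: each variable alone carries almost no information.
uni_x1 = impurity_decrease(y, [x1 < 0 for x1, _ in X])
uni_x2 = impurity_decrease(y, [x2 < 0 for _, x2 in X])

# Bivariable split: two diagonally opposite quadrants versus the other two.
biv = impurity_decrease(y, [(x1 < 0) == (x2 < 0) for x1, x2 in X])

print("univariable split on x1:", uni_x1)
print("univariable split on x2:", uni_x2)
print("bivariable split on (x1, x2):", biv)
```

In this constructed case the univariable splits barely reduce impurity, while the bivariable split separates the two classes completely. Real interaction forests sample split points and variable pairs rather than using a fixed threshold, so this only conveys the intuition.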
Keywords
interaction effect, random forests, random interaction forests, machine learning, regression analysis