A Study of Emergency Department Hospital Admission Prediction Based on Word2vec and XGBoost
Date
2021
Abstract
Overcrowding in emergency departments (EDs) lengthens patients' waiting times, and the variety of patients' conditions makes medical resources hard to allocate. Predicting hospital admission at the time of triage could therefore help direct resources to the patients who most urgently need emergency care. This study uses data from MacKay Memorial Hospital in Taipei, Taiwan, covering the eight years from 2011 to 2018 and comprising 1,065,480 ED patient records. From these records we take the information available at the triage stage, namely chief complaints (CCs), demographic data, administrative information, and clinical information, to predict the probability of a patient's hospital admission. Firstly, we apply Word2vec, a word embedding language model from natural language processing, to select terms in the CCs that are semantically related to hospital admission. We then feed the selected terms into XGBoost, an ensemble learning method, to predict admission. The research covers a series of machine learning steps: preprocessing of structured data and chief complaint data, handling of negation terms in CCs, handling of the imbalanced dataset, construction of the Word2vec and XGBoost models, and model evaluation. The results show that the proposed hybrid approach achieves an AUC of 0.77. Furthermore, when the dataset is restricted to triage level 1 and level 5 records, the AUC reaches 0.89, which is far higher than the results reported in previous related studies. We infer that a dataset drawn from the two extreme triage levels plays to the strengths of XGBoost, which is built from weak classifiers, and thus markedly improves predictive performance. The methods and findings of this study provide a reference for emergency hospital admission prediction and are intended to help hospitals allocate emergency room resources more effectively.
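For readers unfamiliar with the metric reported above, the following is a minimal sketch of how AUC can be computed from predicted admission probabilities. It is an illustration only, not the authors' code: the function name `auc` and the toy labels and scores are hypothetical and are not taken from the study's data. AUC equals the probability that a randomly chosen admitted patient receives a higher score than a randomly chosen non-admitted patient, with ties counted as one half.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (1 = admitted), scores: predicted probabilities.
    Returns the fraction of (positive, negative) pairs in which the positive
    is scored higher; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only (not from the study).
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of records, so for a dataset of the size described here a rank-based implementation from a standard library would be the practical choice; the sketch is meant only to make the definition concrete.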
Keywords
Chief complaint, Prediction of Hospital Admission, Triage level, Word2vec, XGBoost