A Study on Extracting Drug-Drug Interactions from Biomedical Literature Using a Hybrid Method
Date
2017
Abstract
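A disease is often accompanied by several different symptoms, and each symptom is usually treated with its own drug. A cold, for example, may bring coughing, nasal congestion, or headache, so several drugs are needed to treat it. When drugs taken together produce an undesirable outcome during treatment, such as an excessively strong effect or mutual antagonism that causes the treatment to fail or, in severe cases, leads to death, this is known as a drug-drug interaction (DDI). Many DDIs remain hidden in the large body of biomedical literature, waiting to be uncovered by researchers. Extraction and analysis techniques from natural language processing (NLP) could uncover large numbers of these hidden DDIs and reduce the time researchers spend searching for them.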
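The data used in this thesis is the corpus provided by SemEval 2013 Task 9, which comprises MedLine abstracts and DrugBank database texts. The task (SemEval 2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts) is to extract DDIs from biomedical literature, sorting drug pairs into five classes: Advice, Effect, Mechanism, Int (interaction), and no interaction. Performance is evaluated by computing precision, recall, and F1-measure for detection and for classification.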
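The evaluation measures named above can be sketched as follows. This is a minimal illustration, not the official SemEval scorer: the label strings, the function names, and the choice to micro-average classification over the four positive types are all assumptions made for the example.

```python
from typing import Sequence, Tuple

NEGATIVE = "none"  # hypothetical label for "no interaction"


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts, guarding against 0/0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


def detection_scores(gold: Sequence[str], pred: Sequence[str]):
    """Detection: a drug pair counts as positive if it has any interaction type."""
    pairs = list(zip(gold, pred))
    tp = sum(g != NEGATIVE and p != NEGATIVE for g, p in pairs)
    fp = sum(g == NEGATIVE and p != NEGATIVE for g, p in pairs)
    fn = sum(g != NEGATIVE and p == NEGATIVE for g, p in pairs)
    return prf(tp, fp, fn)


def classification_scores(gold: Sequence[str], pred: Sequence[str]):
    """Classification (assumed micro-averaged): the predicted type must match."""
    pairs = list(zip(gold, pred))
    tp = sum(p != NEGATIVE and p == g for g, p in pairs)
    fp = sum(p != NEGATIVE and p != g for g, p in pairs)
    fn = sum(g != NEGATIVE and p != g for g, p in pairs)
    return prf(tp, fp, fn)


# Toy gold and predicted labels for five drug pairs (illustrative only).
gold = ["effect", "none", "advice", "mechanism", "none"]
pred = ["effect", "effect", "none", "int", "none"]
det_p, det_r, det_f1 = detection_scores(gold, pred)
cls_p, cls_r, cls_f1 = classification_scores(gold, pred)
```

In this toy data the pair labelled `mechanism` but predicted as `int` counts as a correct detection and an incorrect classification, which is why a system's classification F1 cannot exceed its detection F1 under this scoring.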
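This study uses a hybrid method, combining a machine learning method with a rule-based method, for detection and classification. Because the five classes in the corpus are imbalanced, a two-stage approach is used: the first stage detects whether an interaction exists between a drug pair, obtaining an F1-measure of 70.8%, and the second stage classifies the drug pairs detected as interacting, obtaining an F1-measure of 62.5%. The FBK-irst team achieved the best performance, with F1-measures of 80.0% for detection and 65.1% for classification, and the average F1-measures across participating teams were 68.1% and 51.8%, respectively. Although the results of this study do not surpass those of FBK-irst, they are well above the average. In the future, the machine learning and rule-based methods could be applied to information extraction research in other domains.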
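The two-stage idea can be sketched roughly as below. The abstract does not say how the rules and the learned model are combined, which features are used, or which classifier is trained, so everything here is an assumption for illustration: the rule is allowed to override the model, the keyword checks are toy stand-ins for trained classifiers, and all names and label strings are hypothetical. A real system would also encode which two drug mentions in the sentence form the candidate pair.

```python
from typing import Optional

NEGATIVE = "none"  # hypothetical label for "no interaction"


def rule_detect(sentence: str) -> Optional[bool]:
    """Hypothetical rule: explicit negation cues mean no interaction.

    Returns False when the rule fires and None when it abstains.
    """
    s = sentence.lower()
    if "no interaction" in s or "did not affect" in s:
        return False
    return None


def model_detect(sentence: str) -> bool:
    """Toy stand-in for a trained binary (interaction vs. none) classifier."""
    cues = ("increase", "decrease", "should", "interact", "inhibit")
    return any(cue in sentence.lower() for cue in cues)


def model_classify(sentence: str) -> str:
    """Toy stand-in for a trained four-way classifier over positive pairs."""
    s = sentence.lower()
    if "should" in s or "avoid" in s:
        return "advice"
    if "inhibit" in s or "metabolism" in s or "clearance" in s:
        return "mechanism"
    if "increase" in s or "decrease" in s:
        return "effect"
    return "int"


def two_stage(sentence: str) -> str:
    """Stage 1 detects an interaction; stage 2 types only the positives."""
    verdict = rule_detect(sentence)
    if verdict is None:  # rule abstained, fall back to the model
        verdict = model_detect(sentence)
    if not verdict:
        return NEGATIVE
    return model_classify(sentence)


examples = [
    "DrugA may increase the sedative effect of DrugB.",
    "DrugA did not affect the pharmacokinetics of DrugB.",
    "DrugA should not be taken with DrugB.",
    "DrugA inhibits the metabolism of DrugB.",
    "DrugA may interact with DrugB.",
]
labels = [two_stage(s) for s in examples]
```

Splitting the problem this way means the second-stage classifier only ever sees pairs judged to interact, so it is not swamped by the much larger no-interaction class.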
A disease is often accompanied by many different symptoms, and each symptom is usually treated with a drug. For example, someone with a cold usually has symptoms such as coughing, a stuffy nose, or a headache, so several kinds of drugs are needed to treat the disease. A drug-drug interaction (DDI) occurs when drugs taken together during treatment produce unpredictable results. It may increase or decrease a drug's effect and may even cause death. At present, many DDIs are still hidden in the large body of biomedical literature, and finding them takes researchers a great deal of time. Natural language processing (NLP) extraction and analysis technologies can uncover large numbers of hidden DDIs and reduce the time researchers spend searching. The corpus used in this thesis is provided by SemEval 2013 Task 9 and includes MedLine abstracts and DrugBank database texts. SemEval 2013 Task 9 aims to extract DDIs from biomedical texts, and drug pairs are classified into five types: Advice (ADV), Effect (EFF), Mechanism (MEC), Int (INT), and non-interaction. Results are reported using the standard precision, recall, and F1-measure. This study uses a hybrid method, consisting of a machine learning method and a rule-based method, to detect and classify DDIs. Because the corpus is imbalanced, the study completes detection and classification in two stages. The first stage detects over all classes (i.e., positive and negative), and the second stage classifies the positive DDIs (i.e., ADV, EFF, MEC, INT). The experiments yield an F-score of 70.8% in detection and 62.5% in classification. Although this performance is still below that of the FBK-irst team in both DDI detection and classification, it is higher than the average performance of all teams. In the future, we hope to apply the hybrid method to other areas of information extraction research.
Keywords
Drug-Drug Interaction, Biomedical Literature, Machine Learning, Rule-based