Automated Extraction of Specific Relation Pairs from Biomedical Literature (生醫文獻中特定關係組合之自動化擷取)

Date

2018

Abstract

The objective of this study is to extract the relations between specified nouns in natural-language sentences and to apply this extraction to the biomedical literature, so that useful relations in the literature can be found quickly. Although the study is based on biomedical literature, researchers in other fields can use the same method to filter the literature and data they need more quickly and accurately when reviewing work in their own field.

Two data sets were used, and they were kept independent of each other in the experiments. The first is based on the disease-drug pairings from completed official US studies listed on the ClinicalTrials.gov website (https://clinicaltrials.gov); biomedical abstracts for the target disease-drug pairs were retrieved from the PubMed database (https://www.ncbi.nlm.nih.gov/pubmed). These data fall into two classes. Sentences from PubMed abstracts stating that a disease mentioned in ClinicalTrials.gov can be treated by the drug are regarded as positive sentences. Sentences in which the same disease cannot be treated by the drug, or in which the disease and the drug are unrelated, are regarded as negative sentences.

The second data set is provided by SemEval 2013 Task 9 and is a corpus built from MedLine abstracts and the DrugBank database. SemEval 2013 Task 9 is a competition on extracting drug-drug interactions from the biomedical literature (SemEval 2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts). It divides the interactions between drugs into five categories: Advice, Effect, Mechanism, Int (interaction), and False (no interaction).

This study uses a multi-level machine learning method, with basic word transformation and natural-language sentence analysis serving as feature extraction. In the drug-disease relation identification experiment, the best results were 75.7% Accuracy, 76.3% Precision, 74.6% Recall, and 75.5% F-score. In the drug-drug relation identification experiment, the best results were 47.8% Precision, 72.4% Recall, and 57.6% F-score.
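
The reported metrics can be related to one another with a short sketch. It assumes that "F-score" here means the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is illustrative and not taken from the thesis.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1 score: harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1 (equal weight on
    precision and recall). Inputs are fractions in [0, 1].
    """
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
drug_disease = f_score(0.763, 0.746)
drug_drug = f_score(0.478, 0.724)

print(f"drug-disease F1: {drug_disease:.3f}")
print(f"drug-drug F1:    {drug_drug:.3f}")
```

Under this assumption, the drug-drug values reproduce the reported 57.6%. The drug-disease values give roughly 75.4% rather than the reported 75.5%, a gap that could come from computing the F-score on unrounded precision and recall.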

Keywords

Disease-Drug Association, Drug-Drug Interaction, Machine Learning, Biomedical Literature
