A Hybrid Method for Discovering Disease-Gene Associations from Biomedical Texts

郭博元

dc.contributor	侯文娟	zh_TW
dc.contributor.author	郭博元	zh_TW
dc.date.accessioned	2019-09-05T11:19:32Z
dc.date.available	2019-08-14
dc.date.available	2019-09-05T11:19:32Z
dc.date.issued	2014
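dc.description.abstract	This study explores the degree of association between genes and diseases in the biomedical literature. The data are the Mendelian Inheritance in Man (MIM) documents covered by the morbid file provided on the Online Mendelian Inheritance in Man (OMIM) website. We first find sentences in the biomedical literature that contain human genetic diseases and genes, which are treated as correct sentences, and sentences that contain neither diseases nor genes, which are treated as incorrect sentences. The sentences are then tagged with the Memory-Based Shallow Parser (MBSP) to obtain the needed information, and rules are learned by simulating the ALEPH system. These rules are applied to the biomedical literature used in the experiments to capture gene-disease pairs matched within a single sentence and across adjacent sentences. A Z-score, a statistical measure derived from the observed value minus the expected value, is then used to judge whether a pair can be counted as a valid pair. Further experiments combine this with several constraints and factors such as the number of rules, and precision, recall, and F-score serve as the evaluation criteria.	zh_TW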
dc.description.abstract	This study focuses on automatically extracting relationships between human genetic diseases and genes from the biomedical literature. The experimental data are drawn from the Mendelian Inheritance in Man (MIM) literature covered by the morbid file of the Online Mendelian Inheritance in Man (OMIM) database. The corpus is collected in two steps. First, sentences that mention both a related human genetic disease and a gene listed in the morbid file are found and regarded as correct sentences. Second, sentences that mention neither a related human genetic disease nor a gene from the morbid file are randomly selected and regarded as incorrect sentences. Next, the Memory-Based Shallow Parser (MBSP) is used to tag these sentences and obtain the information needed for rule learning. Rules are then learned by simulating the ALEPH system. The generated rules are applied to capture pairs of human genetic diseases and genes that occur within a single sentence or across adjacent sentences. The thesis also proposes a statistical approach, the Z-score method, to determine whether each pair is valid. Finally, experiments are run under several constraints and with different numbers of rules. The evaluation metrics are precision, recall, and F-score.	en_US
dc.description.sponsorship	Department of Computer Science and Information Engineering	zh_TW
dc.identifier	GN060147036S
dc.identifier.uri	http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN060147036S%22.&%22.id.&
dc.identifier.uri	http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106603
dc.language	Chinese
dc.subject	規則學習	zh_TW
dc.subject	統計方法	zh_TW
dc.subject	疾病與基因關係	zh_TW
dc.subject	生物醫學文獻探勘	zh_TW
dc.subject	Rule learning	en_US
dc.subject	Statistical method	en_US
dc.subject	Gene-disease relationship	en_US
dc.subject	Biomedical text mining	en_US
dc.title	結合統計與規則探討生醫文件疾病與基因之關係	zh_TW
dc.title	A Hybrid Method for Discovering Disease-Gene Associations from Biomedical Texts	en_US
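
The abstract describes a Z-score filter over candidate gene-disease pairs, followed by evaluation with precision, recall, and F-score, but the record does not give the formulas. The following is a minimal sketch under stated assumptions, not the thesis's implementation: the expected count assumes genes and diseases occur independently across sentences (a binomial model), and the pair names, counts, corpus size, and threshold are all hypothetical placeholders.

```python
import math


def cooccurrence_z_score(n_pair, n_gene, n_disease, n_total):
    """Z-score of an observed gene-disease co-occurrence count.

    Assumption (not from the thesis): under independence, the pair count
    follows a binomial distribution with n_total trials and success
    probability p = (n_gene / n_total) * (n_disease / n_total).
    """
    p = (n_gene / n_total) * (n_disease / n_total)
    expected = n_total * p
    std = math.sqrt(n_total * p * (1.0 - p))
    if std == 0.0:
        return 0.0
    # Observed value minus expected value, scaled by the standard deviation.
    return (n_pair - expected) / std


def precision_recall_f1(predicted, gold):
    """Precision, recall, and F-score of predicted pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical sentence counts per candidate pair:
# (gene, disease) -> (sentences with both, with the gene, with the disease)
counts = {
    ("GENE_A", "DISEASE_X"): (30, 40, 50),
    ("GENE_B", "DISEASE_Y"): (1, 60, 20),
}
N_SENTENCES = 1000  # hypothetical corpus size
Z_THRESHOLD = 2.0   # hypothetical cut-off for a valid pair

valid = [pair for pair, (n_pair, n_g, n_d) in counts.items()
         if cooccurrence_z_score(n_pair, n_g, n_d, N_SENTENCES) >= Z_THRESHOLD]
gold = [("GENE_A", "DISEASE_X"), ("GENE_C", "DISEASE_Z")]
precision, recall, f1 = precision_recall_f1(valid, gold)
print(valid, precision, recall, f1)
```

In this sketch a pair counts as valid only when it co-occurs far more often than chance would predict, so raising the threshold trades recall for precision.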

Files

Original bundle

Name: n060147036s01.pdf
Size: 946.24 KB
Format: Adobe Portable Document Format

Collections

Theses