A Hybrid Method for Discovering Disease-Gene Associations from Biomedical Texts (結合統計與規則探討生醫文件疾病與基因之關係)

dc.contributor: 侯文娟 [zh_TW]
dc.contributor.author: 郭博元 [zh_TW]
dc.date.accessioned: 2019-09-05T11:19:32Z
dc.date.available: 2019-08-14
dc.date.available: 2019-09-05T11:19:32Z
dc.date.issued: 2014
dc.description.abstract: 本研究嘗試在生醫文獻中探討基因以及疾病的關聯度,所使用的資料為孟德爾遺傳學(Online Mendelian Inheritance in Man, OMIM)網站中提供的morbid中所包含的Mendelian Inheritance in Man (MIM)文獻。在本論文中,首先從生醫文獻找出含有人類遺傳疾病與基因之句子,視為正確的句子;以及不包含疾病與基因的句子,視為錯誤的句子。然後透過Memory-Based Shallow Parser (MBSP)標記句子以取得我們需要的資訊,模擬ALEPH系統進行規則的學習,並利用這些規則在本實驗的生醫文獻中,抓取單一句子以及相鄰句子配對到的基因與疾病,再使用統計方法中驗證值減期望值所得到的Z-Score值來判斷該配對是否可以列為有效配對,接著結合一些限制條件、Rule數之多寡等因素進行其他實驗,最後以Precision、Recall以及F-Score值當作評估的標準。 [zh_TW]
dc.description.abstract: This study focuses on automatically extracting relationships between human genetic diseases and genes from the biomedical literature. The experimental data are retrieved from the Mendelian Inheritance in Man (MIM) entries listed in the morbid file of the Online Mendelian Inheritance in Man (OMIM) database. To build the corpus, the first step is to find sentences that mention both a human genetic disease and a related gene from the morbid file; these are regarded as correct sentences. In the second step, sentences that mention neither the related diseases nor the genes from the morbid file are randomly selected and regarded as incorrect sentences. Next, the Memory-Based Shallow Parser (MBSP) is used to tag these sentences and extract the information needed for rule learning. Rules are then learned by simulating the ALEPH system. The generated rules are applied to capture disease-gene pairs within a single sentence or across adjacent sentences. The thesis also proposes a statistical approach, the Z-score method, to determine whether a pair is valid. Finally, experiments are conducted under several constraints and with different numbers of rules, and the results are evaluated using precision, recall, and F-score. [en_US]
dc.description.sponsorship: 資訊工程學系 (Department of Computer Science and Information Engineering) [zh_TW]
dc.identifier: GN060147036S
dc.identifier.uri: http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN060147036S%22.&%22.id.&
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106603
dc.language: 中文 (Chinese)
dc.subject: 規則學習 [zh_TW]
dc.subject: 統計方法 [zh_TW]
dc.subject: 疾病與基因關係 [zh_TW]
dc.subject: 生物醫學文獻探勘 [zh_TW]
dc.subject: Rule learning [en_US]
dc.subject: Statistical method [en_US]
dc.subject: Gene-disease relationship [en_US]
dc.subject: Biomedical text mining [en_US]
dc.title: 結合統計與規則探討生醫文件疾病與基因之關係 [zh_TW]
dc.title: A Hybrid Method for Discovering Disease-Gene Associations from Biomedical Texts [en_US]
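
The abstract describes two numeric steps: filtering candidate disease-gene pairs with a Z-score, and evaluating the result with precision, recall, and F-score. The record does not give the exact statistic used in the thesis, so the following is only a minimal sketch under stated assumptions: each candidate pair has a match count, the Z-score is the count minus the mean of all counts divided by their population standard deviation, and a pair is kept when its Z-score exceeds a hypothetical threshold. The function names, the threshold, and the placeholder pair names are illustrative and do not come from the thesis.

```python
from statistics import mean, pstdev


def z_scores(counts):
    """Standardize each pair's count as (count - mean) / population std.

    Assumption: this is the ordinary z-score over all candidate pairs;
    the thesis may define its observed and expected values differently.
    """
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All counts are identical, so no pair stands out.
        return {pair: 0.0 for pair in counts}
    return {pair: (count - mu) / sigma for pair, count in counts.items()}


def select_valid_pairs(counts, threshold=0.0):
    """Keep pairs whose z-score is strictly above a hypothetical threshold."""
    return {pair for pair, z in z_scores(counts).items() if z > threshold}


def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F-score for two sets of pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Placeholder (gene, disease) pairs with made-up match counts.
    counts = {
        ("GENE_A", "disease_x"): 9,
        ("GENE_B", "disease_x"): 1,
        ("GENE_C", "disease_y"): 5,
    }
    gold = {("GENE_A", "disease_x"), ("GENE_C", "disease_y")}
    predicted = select_valid_pairs(counts)
    print(predicted, precision_recall_f1(predicted, gold))
```

Raising the threshold trades recall for precision, which is one way to read the abstract's mention of experiments under different constraints.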

Files

Original bundle

Name: n060147036s01.pdf
Size: 946.24 KB
Format: Adobe Portable Document Format