A Hybrid Method for Discovering Disease-Gene Associations from Biomedical Texts (結合統計與規則探討生醫文件疾病與基因之關係)
dc.contributor | 侯文娟 | zh_TW |
dc.contributor.author | 郭博元 | zh_TW |
dc.date.accessioned | 2019-09-05T11:19:32Z | |
dc.date.available | 2019-08-14 | |
dc.date.available | 2019-09-05T11:19:32Z | |
dc.date.issued | 2014 | |
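dc.description.abstract | This study examines the degree of association between genes and diseases in the biomedical literature. The data are the Mendelian Inheritance in Man (MIM) documents listed in the morbid file provided by the Online Mendelian Inheritance in Man (OMIM) website. We first extract from the literature the sentences that contain both a human genetic disease and a gene, which are treated as correct sentences, and the sentences that contain neither, which are treated as incorrect sentences. The sentences are then tagged with the Memory-Based Shallow Parser (MBSP) to obtain the information we need, and rules are learned by simulating the ALEPH system. These rules are applied to the biomedical literature used in the experiment to capture gene-disease pairs matched within a single sentence or across adjacent sentences. A Z-score, a statistical measure derived from the observed value minus the expected value, is then used to judge whether each pair counts as a valid pair. Further experiments combine additional constraints and vary the number of rules. Precision, recall, and F-score serve as the evaluation criteria. | zh_TW |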
dc.description.abstract | This study focuses on automatically extracting relationships between human genetic diseases and genes from the biomedical literature. The experimental data are retrieved from the Mendelian Inheritance in Man (MIM) documents listed in the morbid file of the Online Mendelian Inheritance in Man (OMIM) database. To build the corpus, the first step finds the sentences that include both a related human genetic disease and a gene mentioned in the morbid file; these are regarded as correct sentences. In the second step, sentences that contain neither the related human genetic diseases nor the genes mentioned in the morbid file are randomly selected; these are regarded as incorrect sentences. Next, the Memory-Based Shallow Parser (MBSP) is used to analyze these sentences and obtain the information needed for rule discovery. Rules are then learned by simulating the ALEPH system. The generated rules are applied to capture pairs of human genetic diseases and genes within a single sentence or across multiple sentences. The thesis also proposes a statistical approach, the Z-score method, to determine whether the pairs are valid. Finally, experiments are conducted under several constraints and with different numbers of rules. The evaluation metrics are precision, recall, and F-score. | en_US |
dc.description.sponsorship | Department of Computer Science and Information Engineering | zh_TW |
dc.identifier | GN060147036S | |
dc.identifier.uri | http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN060147036S%22.&%22.id.& | |
dc.identifier.uri | http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106603 | |
dc.language | Chinese | |
dc.subject | 規則學習 | zh_TW |
dc.subject | 統計方法 | zh_TW |
dc.subject | 疾病與基因關係 | zh_TW |
dc.subject | 生物醫學文獻探勘 | zh_TW |
dc.subject | Rule learning | en_US |
dc.subject | Statistical method | en_US |
dc.subject | Gene-disease relationship | en_US |
dc.subject | Biomedical text mining | en_US |
dc.title | 結合統計與規則探討生醫文件疾病與基因之關係 | zh_TW |
dc.title | A Hybrid Method for Discovering Disease-Gene Associations from Biomedical Texts | en_US |
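The abstract names two quantitative steps: a Z-score computed from the observed value minus the expected value to decide whether a gene-disease pair is valid, and evaluation by precision, recall, and F-score. The record does not give the thesis's exact formulas, so the following is only a minimal sketch under stated assumptions: the expected co-occurrence count assumes that gene and disease mentions are independent across sentences, the standard deviation is the binomial one, and the function names and example pairs are hypothetical illustrations, not the author's implementation.

```python
import math


def cooccurrence_z_score(n_sentences, n_gene, n_disease, n_both):
    """Z-score of an observed gene-disease co-occurrence count.

    Assumption (not from the thesis): under independence, the chance
    that one sentence mentions both is (n_gene/n) * (n_disease/n), and
    the co-occurrence count is binomially distributed.
    """
    p = (n_gene / n_sentences) * (n_disease / n_sentences)
    expected = n_sentences * p
    std = math.sqrt(n_sentences * p * (1.0 - p))
    if std == 0.0:
        return 0.0  # degenerate case: no variance to normalize by
    return (n_both - expected) / std


def precision_recall_f1(predicted, gold):
    """Standard precision, recall, and balanced F-score over sets of pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical pairs, used purely for illustration.
predicted_pairs = [("GENE_A", "disease_x"), ("GENE_B", "disease_y"),
                   ("GENE_C", "disease_z")]
gold_pairs = [("GENE_A", "disease_x"), ("GENE_B", "disease_y"),
              ("GENE_D", "disease_w")]

z = cooccurrence_z_score(n_sentences=100, n_gene=10, n_disease=10, n_both=5)
p, r, f = precision_recall_f1(predicted_pairs, gold_pairs)
```

In a sketch like this, a pair would be kept as valid only when its Z-score exceeds some cutoff; the cutoff itself would have to come from the thesis and is not stated in this record.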
Files
Original bundle
- Name: n060147036s01.pdf
- Size: 946.24 KB
- Format: Adobe Portable Document Format