Using Adjacent Sentences Information for Finding Relationship between Diseases and Genes

dc.contributor: 侯文娟 [zh_TW]
dc.contributor: Wen-Juan Hou [en_US]
dc.contributor.author: 劉宇錚 [zh_TW]
dc.contributor.author: Yu-Jeng Liu [en_US]
dc.date.accessioned: 2019-09-05T11:17:46Z
dc.date.available: 2016-8-7
dc.date.available: 2019-09-05T11:17:46Z
dc.date.issued: 2013
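dc.description.abstract: This study attempts to determine the degree of association between human genetic diseases and genes in the biomedical literature, and to derive rules or associations between the two. If the relatedness of a disease and a gene can be predicted automatically from the literature, biomedical researchers examining literature on human genetic diseases and genes could use these associations or rules to grasp the relationship between the two quickly, enabling faster reading. Beyond saving labor cost and time, we hope this work can accelerate the development of biomedicine. The data used in this study are the Mendelian Inheritance in Man (MIM) documents covered by the morbid file provided by the Online Mendelian Inheritance in Man (OMIM) website. We first find the sentences in the literature in which a human genetic disease and a gene mentioned in morbid co-occur and treat them as correct sentences; sentences that do not contain a disease and gene mentioned in morbid are treated as incorrect sentences. The sentences in these paragraphs are analyzed with the Memory-Based Shallow Parser (MBSP), which yields grammatical information about each sentence (for example, parts of speech). The MBSP-tagged sentences are then passed to a learning system built for this study to learn rules. Three files must be prepared before learning: the first contains the rule patterns, detailed sentence information, and the elements the rules require, which in this experiment is the SVO-relation, i.e. the relation among subject, verb, and object; the second contains the ID numbers of the correct sentences used in rule learning; the third contains the incorrect sentences used in rule learning. The rules trained from these data are then combined with the multi-sentence mining algorithm proposed in this thesis, which extends the results of the original rules to obtain new relations. Finally, the human genetic diseases and genes produced by the experiments are evaluated using precision and recall, and the results for each threshold are recorded. With multi-sentence mining, the best F-score is 72.18%, with precision 72.66% and recall 71.71%; without multi-sentence mining, the best F-score is 67.32%, with precision 76.29% and recall 60.24%. [translated from zh_TW]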
dc.description.abstract: In this study, we automatically find relations between human genetic diseases and genes in the biomedical literature, and derive rules or relations between the two by mining that literature. When biomedical researchers study literature on human genetic diseases and genes, they can use the rules or correlations we propose to understand the relations between diseases and genes. Besides saving labor cost and time and making the literature faster to read, we hope that our study can speed up development in the biomedical domain. We use the Mendelian Inheritance in Man (MIM) literature covered by the morbid file from the Online Mendelian Inheritance in Man (OMIM) database. We first find the paragraphs that include both the related human genetic diseases and genes mentioned in the morbid file and regard them as correct paragraphs. We then regard the other paragraphs as incorrect paragraphs. After that, we use the Memory-Based Shallow Parser (MBSP) to analyze the sentences and obtain syntactic information such as parts of speech. Learning the rules requires three files. The first contains the rule patterns, the sentence information, and the elements of the SVO-relation, that is, the relation among subject, verb, and object. The second contains the ID numbers of the correct sentences used in learning the rules. The third contains the ID numbers of the incorrect sentences used in learning the rules. Using the learned rules, we then apply a multi-sentence mining algorithm to extend our results. Finally, we use precision and recall as the evaluation metrics in the experiments and record the results for all thresholds. With the multi-sentence mining algorithm, the best F-score is 72.18%, with a precision of 72.66% and a recall of 71.71%. Without it, the best F-score is 67.32%, with a precision of 76.29% and a recall of 60.24%. [en_US]
dc.description.sponsorship: 資訊工程學系 (Department of Computer Science and Information Engineering) [zh_TW]
dc.identifier: GN060047061S
dc.identifier.uri: http://etds.lib.ntnu.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=id=%22GN060047061S%22.&%22.id.&
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw:80/handle/20.500.12235/106569
dc.language: Chinese
dc.subject: 規則學習 [zh_TW]
dc.subject: 疾病與基因關係 [zh_TW]
dc.subject: 生物醫學文獻探勘 [zh_TW]
dc.subject: rule learning [en_US]
dc.subject: gene-disease relationship [en_US]
dc.subject: biomedical text mining [en_US]
dc.title: 利用相鄰句子資訊探討人類疾病與基因之關係 [zh_TW]
dc.title: Using Adjacent Sentences Information for Finding Relationship between Diseases and Genes [en_US]
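
The abstract reports precision, recall, and F-score for both experimental settings but does not state the F-score formula. The following minimal sketch assumes the balanced F-score (F1, the harmonic mean of precision and recall) and checks whether the reported figures are consistent with it. The function name `f_score` and the variable names are illustrative only and do not come from the thesis.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the record does not say which F-measure was used;
    F1 is assumed here.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs (in percent) as reported in the abstract.
with_mining = f_score(72.66, 71.71)
without_mining = f_score(76.29, 60.24)

print(f"with multi-sentence mining:    {with_mining:.2f}%")
print(f"without multi-sentence mining: {without_mining:.2f}%")
```

Under this assumption the computed values round to the reported 72.18% and 67.32%, so the figures in the abstract are mutually consistent with a balanced F-score.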

Files

Original bundle

Name: n060047061s01.pdf
Size: 818.3 KB
Format: Adobe Portable Document Format
