Anaphora Resolution in Biomedical Texts Based on a Probabilistic Model
Date
2013
Abstract
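Anaphora is a pervasive phenomenon in natural language, and as technology advances, biomedical documents also require anaphora resolution so that correct information can be extracted. Resolving the noun phrases that stand in anaphoric relations within the literature would greatly help biomedical researchers obtain accurate descriptions when they use that literature, and we further hope that this work can help accelerate progress in biomedicine.
In this study, we perform non-pronominal anaphora resolution on four biomedical documents about Alzheimer's disease provided by QA4MRE (Question Answering for Machine Reading Evaluation). Meaningful information is extracted through the following steps: (1) sentence splitting, to determine sentence boundaries; (2) part-of-speech tagging of the documents with GDep (GENIA Dependency parser), to obtain syntactic information; (3) extraction of the head nouns and pre-modifiers in each sentence, to gather better feature information; and (4) rule-based filtering of candidate anaphors, to obtain more accurate anaphors. Feature information is then extracted using rule sets and feature sets. This thesis uses a probabilistic model based on Bayes' theorem to resolve anaphora, with seven features used in the experiments. The experimental results show a precision of 73.83%, a recall of 67.36%, and an F-measure of 70.36%, which is a good result for anaphora resolution in biomedical documents.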
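The abstract states only that a Bayes-based probabilistic model with seven features was used; it does not list the features or the estimation details. As a rough illustration of how such a model could rank candidate antecedents, the following is a minimal naive Bayes sketch. The feature names (`head_match`, `number_agree`), the Laplace smoothing, and the toy training pairs are all hypothetical and are not taken from the thesis.

```python
import math
from collections import defaultdict


class NaiveBayesPairClassifier:
    """Scores (anaphor, candidate) feature dicts as coreferent or not."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (an assumption)
        self.class_counts = defaultdict(int)    # label -> count
        self.feature_counts = defaultdict(int)  # (label, name, value) -> count
        self.feature_values = defaultdict(set)  # name -> values seen in training

    def fit(self, pairs):
        """pairs: iterable of (feature_dict, is_coreferent). Both labels must occur."""
        for features, label in pairs:
            self.class_counts[label] += 1
            for name, value in features.items():
                self.feature_counts[(label, name, value)] += 1
                self.feature_values[name].add(value)

    def log_score(self, features, label):
        """log P(label) + sum of log P(feature value | label), smoothed."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for name, value in features.items():
            num = self.feature_counts[(label, name, value)] + self.alpha
            den = self.class_counts[label] + self.alpha * len(self.feature_values[name])
            score += math.log(num / den)
        return score

    def prob_coreferent(self, features):
        """Posterior probability of the coreferent class via Bayes' theorem."""
        pos = self.log_score(features, True)
        neg = self.log_score(features, False)
        m = max(pos, neg)  # subtract the max for numerical stability
        return math.exp(pos - m) / (math.exp(pos - m) + math.exp(neg - m))


# Hypothetical toy training pairs, for illustration only.
training_pairs = [
    ({"head_match": True, "number_agree": True}, True),
    ({"head_match": True, "number_agree": False}, True),
    ({"head_match": False, "number_agree": True}, False),
    ({"head_match": False, "number_agree": False}, False),
    ({"head_match": False, "number_agree": True}, False),
]
model = NaiveBayesPairClassifier()
model.fit(training_pairs)


def resolve(candidates):
    """Return the candidate phrase with the highest coreference probability."""
    return max(candidates, key=lambda c: model.prob_coreferent(c[1]))[0]


candidates = [
    ("the enzyme", {"head_match": False, "number_agree": True}),
    ("amyloid precursor protein", {"head_match": True, "number_agree": True}),
]
best = resolve(candidates)
print(best)
```

With these toy counts the head-matching candidate receives the higher posterior, so `resolve` selects "amyloid precursor protein". A real system would estimate the probabilities from annotated anaphor and antecedent pairs rather than from a handful of invented examples.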
Anaphora is a common phenomenon in natural language. As technology advances, anaphora resolution must be addressed in order to retrieve correct information from biomedical texts. Resolving anaphora helps biomedical researchers obtain accurate descriptions when they study the biomedical literature, and we hope that this study can help speed development in the biomedical domain. In this study, we apply a statistical model to the resolution of non-pronominal anaphora in biomedical texts. The following procedures are applied to extract the relevant information: (1) sentence splitting for boundary detection; (2) part-of-speech tagging to extract syntactic information; (3) identification of head nouns and pre-modifiers to group feature information; and (4) rules to obtain correct anaphor candidates. Finally, rule sets and feature sets are used to extract feature information. This thesis takes a statistical approach to the resolution of non-pronominal anaphora, using seven features in the experiment. The experiment achieves 73.83% precision, indicating good performance for anaphora resolution in biomedical texts.
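The preprocessing steps listed in the abstract can be sketched roughly as follows. This is a simplified illustration, not the thesis pipeline: the regular-expression sentence splitter, the Penn Treebank style tags supplied by hand (standing in for tagger output), the "last noun is the head" heuristic, and the determiner list used as a filtering rule are all assumptions.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace and a capital."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]


def head_and_premodifiers(tagged_np):
    """Treat the last noun as the head; earlier nouns/adjectives are pre-modifiers.

    tagged_np is a list of (word, pos_tag) tuples for one noun phrase.
    """
    noun_positions = [i for i, (_, tag) in enumerate(tagged_np) if tag.startswith("NN")]
    if not noun_positions:
        return None, []
    head_index = noun_positions[-1]
    head = tagged_np[head_index][0].lower()
    premodifiers = [
        word.lower()
        for word, tag in tagged_np[:head_index]
        if tag.startswith(("NN", "JJ"))
    ]
    return head, premodifiers


# Hypothetical rule: a non-pronominal anaphor opens with one of these determiners.
ANAPHOR_DETERMINERS = {"the", "this", "that", "these", "those", "such"}


def is_anaphor_candidate(tagged_np):
    """Keep noun phrases that start with a listed determiner and contain a noun."""
    if not tagged_np:
        return False
    head, _ = head_and_premodifiers(tagged_np)
    return head is not None and tagged_np[0][0].lower() in ANAPHOR_DETERMINERS


text = ("Amyloid precursor protein is cleaved by beta-secretase. "
        "This protein accumulates in plaques.")
sentences = split_sentences(text)

antecedent_np = [("Amyloid", "JJ"), ("precursor", "NN"), ("protein", "NN")]
anaphor_np = [("This", "DT"), ("protein", "NN")]

print(sentences)
print(head_and_premodifiers(antecedent_np))
print(is_anaphor_candidate(anaphor_np), is_anaphor_candidate(antecedent_np))
```

In this toy input, "This protein" passes the determiner rule while "Amyloid precursor protein" does not, and both phrases share the head noun "protein", which is the kind of evidence a head-match feature could capture.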
Keywords
anaphora resolution, natural language processing, Bayes' theorem, probabilistic model