開放領域中文問答系統之建置與評估

dc.contributor: 曾元顯 (zh_TW)
dc.contributor: Tseng, Yuen-Hsien (en_US)
dc.contributor.author: 楊平 (zh_TW)
dc.contributor.author: Yang, Ping (en_US)
dc.date.accessioned: 2022-06-08T03:01:20Z
dc.date.available: 2021-12-31
dc.date.available: 2022-06-08T03:01:20Z
dc.date.issued: 2021
dc.description.abstract: 近年來隨著人工智慧技術日新月異,答案抽取式機器閱讀理解模型在 SQuAD 等資料集上已可超出人類的表現。而基於機器閱讀理解模型,加入了文章庫以及文件檢索器的問答系統架構,亦取得良好的成績。然而這樣子的資料集測試成效於實際應用上,可以達到什麼樣的效果是本研究好奇的問題。本研究主要進行了兩個任務,第一個為開發並比較不同的問答系統實作方式,以資料集自動化測試的方式評估何種實作方式的成效最好。第二個為將自動化測試表現最好的問答系統,交由受試者進行測試,並對實驗結果進行分析。最終得到的結果有四個。第一,本研究以中文維基百科做為文章庫;以Elasticsearch作為文件檢索器;以BERT-Base Chinese作為預訓練模型,並以DRCD資料集進行訓練的Sentence Pair Classification模型作為文件重排序器;以MacBERT-large作為預訓練模型,並以DRCD加上CMRC 2018資料集進行訓練的答案抽取式機器閱讀理解模型,作為文件閱讀器。此問答系統架構可以在Top 10取得本研究實驗的所有系統當中最好的成效,以DRCD Test set加上CMRC 2018 Dev set進行測試,得到的分數為F1 = 71.355,EM = 55.17。第二,本研究招募33位受試者,總計對系統進行了289道題目的測試,最終的成果為,在Top 10的時候有70.24%的問題能被系統回答,此分數介於自動化測試的F1與EM之間,代表自動化測試與使用者測試所得到的結果是相似的。第三,針對29.76%無法得到答案的問題進行分析,得到的結論是,大部分無法回答的原因是因為無法從文件庫中檢索正確的文章。第四,Top 1可回答的問題佔所有問題中的26.3%,而Top 2 ~ 10的佔比為43.94%。代表許多問題並非系統無法得出解答,而是排序位置不正確,若能建立更好的答案排序機制,將能大幅提升問答系統的實用性。(zh_TW)
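The F1 and EM figures quoted in the abstract follow the standard span-extraction evaluation used by DRCD and CMRC 2018: EM is an exact string match, F1 is character-level token overlap, and Top-k scoring takes the best score among the k candidate answers. The sketch below is an illustrative reimplementation under those assumptions, not the exact evaluation script used in this thesis.

```python
# Minimal sketch of SQuAD-style EM / F1 scoring for Chinese span extraction
# (character-level overlap), plus Top-k scoring as used in the abstract.
# Assumed metric definitions; not the thesis's actual evaluation code.
from collections import Counter

def exact_match(prediction, gold):
    return float(prediction.strip() == gold.strip())

def f1_score(prediction, gold):
    pred_chars = list(prediction.strip())
    gold_chars = list(gold.strip())
    common = Counter(pred_chars) & Counter(gold_chars)   # per-character overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)

def best_of_top_k(candidates, gold):
    # A question is scored by its best candidate among the Top-k answers.
    return (max(exact_match(c, gold) for c in candidates),
            max(f1_score(c, gold) for c in candidates))
```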
dc.description.abstract: With the rapid development of artificial intelligence, span-extraction machine reading comprehension models have surpassed human performance on datasets such as SQuAD. Building on this achievement, question answering system architectures that combine a document collection, a document retriever, and a document reader have also achieved good results. However, whether such a system performs comparably in real-world use is the question this research explores. Our research has two tasks. The first is to develop and compare different QA system implementations, evaluating them automatically with datasets. The second is to have users test the best-performing QA system and to analyze the results. We obtained four findings. First, the best architecture uses Chinese Wikipedia as the document collection; Elasticsearch as the document retriever; a sentence-pair classification model, trained on the DRCD dataset from the BERT-Base Chinese pre-trained model, as the document re-ranker; and a span-extraction machine reading comprehension model, trained on the DRCD and CMRC 2018 datasets from the MacBERT-large pre-trained model, as the document reader. This architecture achieved the best Top 10 result among all the systems tested in our research: F1 = 71.355 and EM = 55.17, tested on the DRCD test set plus the CMRC 2018 dev set. Second, this study recruited 33 users, who tested the system with 289 questions in total; 70.24% of the questions could be answered by the system within the Top 10. This score lies between the F1 and EM scores of the dataset testing, indicating that the results of dataset testing and user testing are similar. Third, we analyzed the 29.76% of questions that went unanswered and found that in most cases the correct document could not be retrieved from the document collection. Fourth, questions answerable at Top 1 account for 26.3% of all questions, while Top 2–10 account for 43.94%. This means that for many questions the system does produce the correct answer but ranks it too low; a better answer-ranking mechanism would greatly improve the practicality of the QA system. (en_US)
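As a concrete illustration of the architecture described in the abstract (an Elasticsearch retriever over Chinese Wikipedia, a sentence-pair re-ranker, and a span-extraction reader producing a Top-k answer list), the following is a minimal sketch assuming the `elasticsearch` and HuggingFace `transformers` Python clients. The index name, field names, and model checkpoint paths are hypothetical placeholders, not artifacts of this thesis.

```python
# Sketch of a retriever -> re-ranker -> reader pipeline.
# "zhwiki", the "text" field, and the checkpoint paths are illustrative only.
from elasticsearch import Elasticsearch
from transformers import pipeline

es = Elasticsearch("http://localhost:9200")                       # document retriever (BM25)
reranker = pipeline("text-classification",
                    model="path/to/drcd-reranker")                # sentence-pair classifier (BERT-Base Chinese)
reader = pipeline("question-answering",
                  model="path/to/drcd-cmrc-reader")               # span-extraction MRC (MacBERT-large)

def relevance(question, passage):
    # Score the (question, passage) pair with the sentence-pair classifier.
    out = reranker({"text": question, "text_pair": passage})
    out = out[0] if isinstance(out, list) else out
    return out["score"]

def answer(question, top_k=10):
    # 1) Retrieve candidate passages with full-text search.
    hits = es.search(index="zhwiki",
                     query={"match": {"text": question}},
                     size=50)["hits"]["hits"]
    passages = [h["_source"]["text"] for h in hits]

    # 2) Re-rank passages by question-passage relevance.
    ranked = sorted(passages, key=lambda p: relevance(question, p), reverse=True)

    # 3) Extract an answer span from each of the Top-k passages.
    answers = [reader(question=question, context=p) for p in ranked[:top_k]]
    return sorted(answers, key=lambda a: a["score"], reverse=True)

print(answer("《哈利波特》的作者是誰?")[:3])
```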
dc.description.sponsorship: 圖書資訊學研究所 (Graduate Institute of Library and Information Studies) (zh_TW)
dc.identifier: 60815008E-39805
dc.identifier.uri: https://etds.lib.ntnu.edu.tw/thesis/detail/4dd33ea0ba5bf376dfa4eb061a2af228/
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/118327
dc.language: 中文 (Chinese)
dc.subject: 中文開放領域問答系統 (zh_TW)
dc.subject: 問答系統使用者測試 (zh_TW)
dc.subject: 機器閱讀理解 (zh_TW)
dc.subject: 深度學習 (zh_TW)
dc.subject: 人工智慧 (zh_TW)
dc.subject: Chinese Open-Domain Question Answering System (en_US)
dc.subject: User Testing of Question Answering System (en_US)
dc.subject: Machine Reading Comprehension (en_US)
dc.subject: Deep Learning (en_US)
dc.subject: Artificial Intelligence (en_US)
dc.title: 開放領域中文問答系統之建置與評估 (zh_TW)
dc.title: Development and Evaluation of Chinese Open-Domain Question Answering System (en_US)
dc.type: 學術論文 (Academic thesis)

Files

Original bundle

Name: 60815008E-39805.pdf
Size: 3.82 MB
Format: Adobe Portable Document Format
Description: 學術論文 (Academic thesis)