中文期刊論文資訊擷取之研究 — 以圖書資訊學領域為例

dc.contributor: 曾元顯 [zh_TW]
dc.contributor: Tseng, Yuen-Hsien [en_US]
dc.contributor.author: 黃冠綸 [zh_TW]
dc.contributor.author: Huang, Guan-Lun [en_US]
dc.date.accessioned: 2023-12-08T07:33:51Z
dc.date.available: 2023-07-20
dc.date.available: 2023-12-08T07:33:51Z
dc.date.issued: 2023
dc.description.abstract: 目前的科學文獻數量以相當驚人的速度在成長當中,如何將這些巨量、富含知識的科學文獻內容從 PDF 中剖析出來,是當前相當重要的課題。然而在臺灣鮮少看到有相關的研究,本研究的目的在於提出臺灣中文學術期刊資訊擷取的解決方案,並以圖書資訊學領域期刊論文為例。本研究透過重新訓練開放原始碼科學文獻剖析工具 GROBID,達成擷取中文學術期刊資訊(篇名、作者、摘要、關鍵字、具章節邏輯的內文等)的目的,並透過十倍交叉驗證法(10 Fold Cross-Validation)來評估訓練成效。本研究透過重新訓練後的模型剖析 725 篇台灣圖書資訊領域期刊論文,觀察與分析可能影響剖析成功率的原因。本研究發現,三個模型(Segmentation、Header、Fulltext)在訓練資料 n = 100 與 n = 250 時,F1 score 沒有特別明顯的成長。相同期刊的論文會因為不同年代出版而有不同的版型,這個現象對於剖析成功率有影響。本研究透過將剖析後的科學文獻內文匯入 QA 系統中,使得 QA 系統可以回答更專業的問題,作為對剖析科學文獻後的加值利用範例。 [zh_TW]
dc.description.abstract: The volume of scientific literature is growing at a remarkable rate, and extracting the knowledge-rich content of these articles from PDF files has become an important problem. However, little research on this topic has been done in Taiwan. This study proposes a solution for extracting information from Chinese-language academic journals published in Taiwan, using journal articles in library and information science as an example. The study retrains the open-source scientific literature parsing tool GROBID to extract information from Chinese academic journal articles, including titles, authors, abstracts, keywords, and full text structured by logical section, and evaluates the training with ten-fold cross-validation. The retrained models are then used to parse 725 Taiwanese journal articles in library and information science, in order to observe and analyze factors that may affect the parsing success rate. The study found that the three models (Segmentation, Header, Fulltext) showed no marked improvement in F1 score across training sets of n = 100 and n = 250. Articles from the same journal use different layouts depending on their publication year, and this variation affects the parsing success rate. Finally, the parsed full text is imported into a question-answering (QA) system so that the system can answer more specialized questions, as an example of the added value of parsed scientific literature. [en_US]
dc.description.sponsorship: 圖書資訊學研究所 [zh_TW]
dc.identifier: 60915003E-43482
dc.identifier.uri: https://etds.lib.ntnu.edu.tw/thesis/detail/ca7d50aaf50c66fd6701ca5130c9e0f7/
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/119380
dc.language: Chinese
dc.subject: 資訊擷取 [zh_TW]
dc.subject: 開放原始碼 [zh_TW]
dc.subject: GROBID [zh_TW]
dc.subject: 全文資料集 [zh_TW]
dc.subject: Information Extraction [en_US]
dc.subject: Open Source [en_US]
dc.subject: GROBID [en_US]
dc.subject: Full Text Dataset [en_US]
dc.title: 中文期刊論文資訊擷取之研究 — 以圖書資訊學領域為例 [zh_TW]
dc.title: Information Extraction From Chinese Scientific Article — A Case Study of Library and Information Science [en_US]
dc.type: etd

Files

Original bundle

Name: 202300043482-105825.pdf
Size: 5.15 MB
Format: Adobe Portable Document Format
Description: etd
