開放取用學術資源的自動化擷取系統實作：以臺灣人文社會領域期刊資料為例

余宗翰; Yu, Zong-Han

dc.contributor	曾元顯	zh_TW
dc.contributor	Tseng, Yuen-Hsien	en_US
dc.contributor.author	余宗翰	zh_TW
dc.contributor.author	Yu, Zong-Han	en_US
dc.date.accessioned	2025-12-09T07:37:20Z
dc.date.available	2025-06-25
dc.date.issued	2025
dc.description.abstract	本研究旨在建置一套可穩定運作且具高度擴展性的自動化學術資源擷取系統，針對臺灣人文社會領域開放取用期刊進行抓取。現行國內引文索引資料庫多仰賴人工建檔與維護，導致資料更新與整合流程費時費力；而開放取用平台則受限於期刊端主動上架與維運意願，造成資料時效與涵蓋範圍不足，進而影響學術資源的可用性與知識庫建構的穩定性。為此，本研究設計並實作「Social and Theoretical Academic Repository（STAR）」系統，結合 Scrapy 爬蟲框架與 Docker 容器化部署技術，整合 MySQL、Redis、Django、Playwright、FTP 等模組，建立排程式爬取、結構化檔案儲存與網頁式管理操作的自動化平台。系統具備網頁式管理介面，支援管理者透過 Django 後台調整排程或即時執行爬蟲任務，亦提供 FTP 批次下載功能供使用者取得期刊全文檔案。系統完成部署後共建置 46 支期刊爬蟲模組，成功擷取 17,865 篇 PDF 文章檔案，總容量達 79 GB。比較首次與後續爬取平均耗時，整體處理效率提升 73.4 %，顯示系統具備長期穩定運作與低維運負擔的特性。本研究驗證了以模組化容器架構整合開源爬蟲技術，能有效支援多網站資料擷取與期刊資料彙整之需求，並為後續文本生成、語意比對與知識問答等應用場景，提供可重複使用之期刊資料擷取基礎。未來可進一步結合語意嵌入與文本分析工具，拓展資料加值應用場景。	zh_TW
dc.description.abstract	This study aims to develop a stable and highly scalable automated system for extracting open-access academic resources, specifically targeting open-access journals in the humanities and social sciences in Taiwan. Existing domestic citation index databases rely heavily on manual curation and maintenance, which makes updating and integrating data time-consuming and labor-intensive. Open-access platforms, in turn, depend on the willingness of journals to actively upload and maintain content, which limits timeliness and coverage and ultimately affects the usability of academic resources and the stability of knowledge base construction. To address these challenges, this study designs and implements the “Social and Theoretical Academic Repository” (STAR) system. The system integrates the Scrapy web crawling framework with Docker-based container deployment, combining MySQL, Redis, Django, Playwright, and FTP modules to create an automated platform featuring scheduled crawling, structured file storage, and web-based administrative control. The system provides a web interface that allows administrators to adjust schedules or run crawler jobs on demand via the Django backend. It also offers FTP-based batch downloading so that users can obtain full-text journal PDFs. Upon deployment, 46 journal crawler modules were implemented, successfully harvesting 17,865 PDF articles with a total data volume of 79 GB. A comparison of the average time taken by the initial crawl and by subsequent crawls showed a 73.4% improvement in overall processing efficiency, indicating that the system is suited to long-term stable operation with a low maintenance burden. This study confirms that integrating open-source crawling technologies within a modular container architecture effectively supports multi-site data extraction and journal data aggregation. The system also provides a reusable data-extraction foundation for future applications in text generation, semantic matching, and knowledge-based question answering. Further development may incorporate semantic embedding and text analysis tools to extend value-added applications.	en_US
dc.description.sponsorship	圖書資訊學研究所圖書資訊學數位學習碩士在職專班	zh_TW
dc.identifier	012153203-47248
dc.identifier.uri	https://etds.lib.ntnu.edu.tw/thesis/detail/71588f271e3f9d6f9be2a38298691a90/
dc.identifier.uri	http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/124504
dc.language	Chinese
dc.subject	資料擷取	zh_TW
dc.subject	開放取用	zh_TW
dc.subject	網頁爬蟲	zh_TW
dc.subject	Scrapy	zh_TW
dc.subject	Docker	zh_TW
dc.subject	容器化	zh_TW
dc.subject	Data Extraction	en_US
dc.subject	Open Access	en_US
dc.subject	Web Crawling	en_US
dc.subject	Scrapy	en_US
dc.subject	Docker	en_US
dc.subject	Containerization	en_US
dc.title	開放取用學術資源的自動化擷取系統實作：以臺灣人文社會領域期刊資料為例	zh_TW
dc.title	Design and Implementation of an Automated System for Extracting Open Access Academic Resources: A Case Study of Taiwanese Journals in the Humanities and Social Sciences	en_US
dc.type	Professional Practice Report (Professional Practice Category)

Files

Original bundle

Name: 202500047248-109609.pdf
Size: 3.13 MB
Format: Adobe Portable Document Format
Description: Professional Practice Report (Professional Practice Category)

Collections

Theses and Dissertations