A Study of Data Cleaning Mechanisms Based on Borrowing Purpose: The Case of Interest-Driven Borrowing

Date

2010

Abstract

Researchers often analyze real-world data, but such data usually contain defects that can reduce the efficiency of the analysis or even lead to erroneous results. Libraries frequently analyze patrons' historical borrowing records as a basis for their services, yet these records have not been cleaned according to the patrons' borrowing purposes before analysis. A borrowing history usually reflects more than one borrowing purpose, so analyzing it without first cleaning it by purpose is likely to produce erroneous results.

This study focuses on one borrowing purpose, personal interest. It designs heuristic cleaning mechanisms that attempt to remove non-interest records from a patron's borrowing history, evaluates the cleaning results with the F-Measure, and identifies suitable cleaning methods and attributes. It also adjusts the parameters of each cleaning mechanism to attempt personalized cleaning and to clarify the steps and workflow that personalized cleaning involves.

The results show that a patron's borrowing history cannot easily be cleaned according to the interest purpose. Cleaning can nevertheless be attempted with the E-M (expectation-maximization) algorithm from cluster analysis, using the attribute combination of third-level classification number, loan date, and author. For personalized cleaning, adjusting the parameters yields better cleaning results. In addition, when the results are evaluated with the F-Measure, the higher a patron's original interest ratio, the harder the borrowing history is to clean.
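As an illustration of the F-Measure evaluation mentioned in the abstract, the following is a minimal sketch in Python and not the thesis's own implementation. It assumes that the records kept after cleaning are treated as predicted interest records and compared with a manually labelled set of true interest records; the thesis may define the positive class differently. The function name `f_measure` and the record IDs are hypothetical.

```python
def f_measure(true_interest, kept, beta=1.0):
    """Return (precision, recall, F-beta) for the records kept after cleaning.

    Assumption: "kept" records are the ones the cleaning step judged to be
    interest-driven, and "true_interest" is a labelled ground truth.
    """
    true_interest, kept = set(true_interest), set(kept)
    true_positives = len(true_interest & kept)

    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / len(true_interest) if true_interest else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


# Hypothetical record IDs, for illustration only.
labelled_interest = {"r1", "r2", "r3", "r4"}   # true interest records
kept_after_cleaning = {"r1", "r2", "r5"}       # records the cleaner kept

precision, recall, f1 = f_measure(labelled_interest, kept_after_cleaning)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

With `beta=1.0` the score is the harmonic mean of precision and recall, so a cleaning mechanism is rewarded only when it both keeps interest records and discards non-interest ones.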

Keywords

Data cleaning, Bibliomining, F-Measure
