The Effect of Social Data on Book Search System Performance (社群資料對圖書搜尋系統效能之研究)

Date

2014

Abstract

With the rise of Web 2.0, social data has come into wide use across many kinds of websites. Online bookstores such as Amazon and online book-cataloguing communities such as LibraryThing have quickly accumulated large amounts of user-generated social data. Since 2011, INEX (INitiative for the Evaluation of XML retrieval) has collected bibliographic records containing social data from Amazon and LibraryThing and used them as the test collection for its Social Book Search Track. The purpose of the track is to develop novel algorithms that use both professional metadata and user-generated metadata to retrieve books effectively.

This thesis runs book search experiments on the INEX 2013 Social Book Search Track test collection. It examines how different fields affect the search results and how re-ranking the results with social data affects them. Indexes were built from traditional bibliographic data, from social data, and from the two combined, and the search results were then re-ranked using social data. The main findings are:

1. Under the probabilistic retrieval model, book search over social data achieves better retrieval performance than search over the traditional bibliographic data currently used by libraries.
2. Indexes that include user reviews achieve the best retrieval performance under the probabilistic retrieval model.
3. Under the probabilistic retrieval model, social tag data shows no significant difference from traditional bibliographic data. When the number of times a tag has been applied is used as a weight, however, retrieval performance improves by 270%, clearly above the unweighted result and second only to the review index.
4. Re-ranking the search results with reviews gives the best retrieval results in this study, raising the nDCG score by 3.1%.
5. Re-ranking the search results with social tags does not perform as well as re-ranking with reviews, but it can raise the nDCG score by up to 25%.

These findings can inform the design of information systems, including book search and book recommendation systems, with the aim of giving readers a better user experience.
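The nDCG improvements reported in the findings can be illustrated with a small sketch. The abstract does not describe how the thesis combines scores when re-ranking, so the linear interpolation in `rerank` below is an assumption chosen for illustration, not the thesis's method. The function names, the `weight` parameter, the cutoff `k=10`, the linear-gain form of DCG, and all toy data are likewise hypothetical.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain over the first k graded relevance values.

    Uses the linear-gain variant (gain / log2(rank + 1)); other variants exist.
    """
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg(ranked_gains, all_gains, k=10):
    """nDCG@k: DCG of the ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(all_gains, reverse=True), k)
    return dcg(ranked_gains, k) / ideal if ideal > 0 else 0.0


def rerank(results, social_score, weight=0.5):
    """Re-sort (book_id, retrieval_score) pairs by an interpolated score.

    ASSUMPTION: a simple linear mix of retrieval score and a social score
    (for example, one derived from reviews or tags). Hypothetical scheme.
    """
    def mixed(item):
        book_id, retrieval_score = item
        return (1 - weight) * retrieval_score + weight * social_score.get(book_id, 0.0)

    return [book_id for book_id, _ in sorted(results, key=mixed, reverse=True)]


def relative_gain_percent(baseline, rerun):
    """Relative improvement of rerun over baseline, in percent."""
    return 100.0 * (rerun - baseline) / baseline


# Toy data (invented for illustration only).
judgments = {"b1": 0, "b2": 3, "b3": 1}            # graded relevance per book
results = [("b1", 0.9), ("b2", 0.8), ("b3", 0.7)]  # initial retrieval run
social = {"b2": 1.0, "b3": 0.4}                    # hypothetical social scores

before_ids = [book_id for book_id, _ in results]
after_ids = rerank(results, social)

ndcg_before = ndcg([judgments[b] for b in before_ids], judgments.values())
ndcg_after = ndcg([judgments[b] for b in after_ids], judgments.values())
improvement = relative_gain_percent(ndcg_before, ndcg_after)

print(before_ids, round(ndcg_before, 3))
print(after_ids, round(ndcg_after, 3))
print(f"relative nDCG change: {improvement:.1f}%")
```

A percentage such as "raising nDCG by 3.1%" is read here as a relative change against a baseline run, which is what `relative_gain_percent` computes. The abstract does not state which baseline each percentage refers to.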

Keywords

Book Search, Social Tag, Social Data, Search Engine
