對社交標籤系統提出有效率的 Top-k 近似查詢處理方法之研究
No Thumbnail Available
Date
2012
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
本論文研究目的為針對社交標籤網站提供一個有效率的標籤集近似查詢處理系統,我們使用一個多階層的標籤集索引結構來對系統中資料物件的標籤集建立索引,此索引結構內記錄的資訊能夠估算一群資料物件與使用者所給的查詢標籤集合距離的上限及下限值,能夠篩除掉多個和查詢標籤集距離較大的標籤集,使查詢更有效率。本論文根據此索引結構設計了傑卡德距離與重疊距離兩個標籤集距離的計算方法,並提出考量語義的修正計算方法及對應的距離上下限值估算方法,實驗評估結果可顯示這些距離計算方法對近似查詢結果的有效性。本論文亦針對此索引結構提出一個有效率的 Top-k 近似查詢演算法,使用者僅需設定資料結果筆數 ( k ),可找出標籤集和給定查詢標籤集合最相似的前 k 筆資料物件。實驗結果顯示本研究提出的 Top-k 搜尋處理方法和基本方法比較,在多數情況下可有效提昇處理效率。
This thesis mainly aims to provide an efficient similar search system for tag sets on the social tagging system. We use a multi-level index structure to construct an index on the tag sets of the objects. We can estimate the upper/lower bound of the distance value between a given query tag set and the tag sets of a group of objects according to the information maintained in the index structure, which can be used to improve the efficiency of searching by applying the pruning strategies. According to the property of this index structure, we design the Jaccard distance and Overlap distance measure function for evaluating the similarity distance betwen two tag sets. Then the modified distance measure functions and the upper/lower bound estimating methods corresponding to the specific distance measure functions are proposed. The experimental results show that the effectiveness on similarity searches of the proposed distance measure functions. We also proposes an efficient top-k similar search algorithm based on the index structure. Users only need to give the number of required search results, k, the system could find the top-k most similar objects with the query tag set. The experimental results show that, in most test cases, the proposed algorithm performs better in efficiency than the baseline method.
This thesis mainly aims to provide an efficient similar search system for tag sets on the social tagging system. We use a multi-level index structure to construct an index on the tag sets of the objects. We can estimate the upper/lower bound of the distance value between a given query tag set and the tag sets of a group of objects according to the information maintained in the index structure, which can be used to improve the efficiency of searching by applying the pruning strategies. According to the property of this index structure, we design the Jaccard distance and Overlap distance measure function for evaluating the similarity distance betwen two tag sets. Then the modified distance measure functions and the upper/lower bound estimating methods corresponding to the specific distance measure functions are proposed. The experimental results show that the effectiveness on similarity searches of the proposed distance measure functions. We also proposes an efficient top-k similar search algorithm based on the index structure. Users only need to give the number of required search results, k, the system could find the top-k most similar objects with the query tag set. The experimental results show that, in most test cases, the proposed algorithm performs better in efficiency than the baseline method.
Description
Keywords
社交標籤系統, 近似距離公式評估, Top-k 近似查詢處理方法, Social tagging system, Distance measure evaluation, Top-k similar search algorithm