有效率探勘社交標籤系統中前k名擴展查詢字集之研究

No Thumbnail Available

Date

2012

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

本論文考慮資料物件具有標籤集且含有評分資訊的社交標籤系統,當給定一個查詢找出標籤集中包含所有查詢字的物件資料,本論文方法將對這些物件資料所包含的標籤進行處理,找出可用性分數值最高的前k名擴展查詢字集,且每一個擴展查詢字集都能找到指定數量以上的資料物件。本論文提出的方法分成從查詢結果挑選具代表標籤,以及有效率地探勘前k名擴展查詢字集兩部分。首先,我們運用兩種挑選具代表性標籤的方法:平均差異性及新穎性,計算標籤的代表分數,選定代表分數最高的n個標籤為代表性標籤。接下來,本論文採用一個稱為UT-tree的樹狀結構,用來儲存可形成擴展查詢字集的標籤集資訊。我們提出一個稱為UT-growth的演算法,可從UT-tree中有效率找出可用性最高的前 名擴展查詢字集,且不會產生過多不必要檢查的擴展查詢字集。此外,我們運用動態估算一個擴展查詢字集可用性分數的上限值和下限值的概念,提出一個動態建立UT-tree的方法,並提出dynamic UT-growth演算法,動態更新所找到可用性前 名的擴展查詢字集,若第 名擴展查詢字集的下限值比第 名的擴展查詢字集上限值高,即可提前結束UT-tree的建立及探勘。實驗結果證實先採用代表標籤選取比未採用代表標籤選取,其找出的擴展查詢字集可達到較好的查詢效果。此外,實驗結果顯示UT-growth演算法比相關方法有較好的執行效率,且Dynamic UT-growth演算法在多數情況可提供比UT-growth演算法更有效率的處理。
In this thesis, we consider the social tagging systems in which each object contains a tag set and a corresponding score. After giving a query to find all the objects whose tagsets contain the query keywords, our goal is to find the top-k utility of query expansions from the tags of the objects such that, each query expansion can find the required number of objects. The proposed methods consist of two parts: to select representative tags from query results and to efficiently discover the top-k query expansions. Firstly, we apply two different functions, AveDiversity and Novelty, to compute the representative score of each tag. Then the tags with the top-n highest scores are selected as the representative tags. Next, we design a tree structure called UT-tree which is used to store the tag sets of objects and their corresponding information for generating query expansions. We propose an algorithm called UT-growth, which can efficiently discover out the top-k utility query expansions from the UT-tree by prevent from generating unnecessary query expansions for checking. In addition, by dynamically estimating the upper bound and lower bound of the utility for a query expansion, we provide a dynamic approach to construct UT-tree and propose the Dynamic UT-growth algorithm. This approach dynamically updates the top-k utility of the query expansions. Accordingly, when the utility lower bound of the kth query expansion is larger than the utility upper bound of the (k+1)th query expansion, the construction and mining process on the UT-tree can be early terminated. The experimental results show that finding representative tags before mining top-k query expansions can improve the searching effectiveness of the discovered top-k query expansions. Furthermore, the UT-growth algorithm has better performance on efficiency than the related method, and the Dynamic UT-growth algorithm can provide even more efficient processing than the UT-growth algorithm in most cases.

Description

Keywords

社交標籤系統, 前k名擴展查詢字集, 動態樹狀結構, social-tagging system, top-k query expansion, dynamic tree structure

Citation

Collections

Endorsement

Review

Supplemented By

Referenced By