A Study on Incorporating Document Relatedness and Query Clarity Information into Pseudo-Relevance Feedback
Date
2014
Abstract
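Pseudo-relevance feedback selects effective pseudo-relevant documents for query reformulation and is widely used in information retrieval systems. Most information retrieval systems simply select the pseudo-relevant documents used for query reformulation based on the query-document relevance scores obtained from the initial retrieval. This thesis therefore selects pseudo-relevant documents by considering both the relatedness among documents and the relevance between the query and each document. The notion of a Markov random walk (MRW) allows us to estimate these relationships and find better pseudo-relevant documents. After the pseudo-relevant documents have been selected, and building on the query models used in information retrieval, we also investigate how to effectively combine the original query model with a new query model that draws on information from the pseudo-relevant documents. The combination weight is determined by the so-called query clarity. The experiments in this thesis are conducted mainly on the Topic Detection and Tracking collections (TDT-2 and TDT-3) and the Wall Street Journal (WSJ) corpus, and the results show that the proposed improvements to pseudo-relevance feedback can improve retrieval performance.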
Pseudo-relevant document selection figures prominently in query reformulation with pseudo-relevance feedback (PRF) for an information retrieval (IR) system. Most conventional IR systems select pseudo-relevant documents for query reformulation based solely on the query-document relevance scores returned by the initial round of retrieval. In this thesis, we propose a novel method for pseudo-relevant document selection that considers not only the query-document relevance scores but also the relatedness cues among documents. To this end, we adopt and formalize the notion of Markov random walk (MRW) to glean the relatedness cues among documents. These cues can then be used in concert with the query-document relevance scores to select representative documents for PRF. Furthermore, on top of the language modeling (LM) framework for IR, we investigate how to combine the original query model with the new query model estimated from the selected pseudo-relevant documents more effectively by means of the so-called query clarity measure. A series of experiments conducted on the TDT (Topic Detection and Tracking) and WSJ (Wall Street Journal) collections appears to demonstrate the performance merits of the proposed methods.
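The abstract does not give the exact random-walk formulation. The following is one plausible sketch under stated assumptions. The initially retrieved documents form a graph whose edge weights are cosine similarities between term-frequency vectors. The walker follows those edges with probability `alpha` and otherwise jumps to a document in proportion to its initial query-document relevance score, in the style of a biased PageRank. The top-k documents by stationary probability are kept as pseudo-relevant. The names `mrw_select` and `cosine`, the damping value, and the cosine edge weights are hypothetical illustrations, not the thesis's settings.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(cnt * b[t] for t, cnt in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mrw_select(docs, rel_scores, k, alpha=0.85, iters=200, tol=1e-12):
    """Hypothetical MRW-based selection: return (top-k indices, stationary scores).

    docs: list of token lists from the initial retrieval.
    rel_scores: non-negative initial query-document relevance scores.
    Assumption: transitions are row-normalized cosine similarities; with
    probability 1 - alpha the walker jumps in proportion to rel_scores.
    """
    n = len(docs)
    vecs = [Counter(d) for d in docs]
    trans = []
    for i in range(n):
        row = [cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
        total = sum(row)
        # A document unrelated to every other one jumps uniformly.
        trans.append([x / total for x in row] if total > 0 else [1.0 / n] * n)
    z = sum(rel_scores)
    prior = [r / z for r in rel_scores]
    score = prior[:]
    # Power iteration: s_j = (1 - alpha) * prior_j + alpha * sum_i s_i * T_ij
    for _ in range(iters):
        new = [(1 - alpha) * prior[j]
               + alpha * sum(score[i] * trans[i][j] for i in range(n))
               for j in range(n)]
        delta = sum(abs(x - y) for x, y in zip(new, score))
        score = new
        if delta < tol:
            break
    ranked = sorted(range(n), key=lambda j: score[j], reverse=True)
    return ranked[:k], score

docs = [["a", "b"], ["a", "b", "c"], ["x", "y"], ["a", "c"]]
rel = [0.30, 0.30, 0.35, 0.05]   # toy initial-retrieval scores
selected, scores = mrw_select(docs, rel, k=2)
print(selected)  # → [1, 0]
```

In this toy example, document 2 has the highest initial score but shares no terms with the other documents. Ranking by initial scores alone would put it first. The walk moves it to the bottom, which illustrates how relatedness cues can change which documents are fed back.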
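Query clarity is commonly defined as the KL divergence between a query language model and the collection language model. The sketch below computes that quantity for given unigram models. It then uses the result to set the interpolation weight in θ_new(w) = (1 − λ)·θ_Q(w) + λ·θ_F(w), where θ_Q is the original query model and θ_F is the feedback model. Two parts are assumptions. First, the query model is passed in directly rather than estimated from top-ranked documents. Second, the mapping from clarity to λ in the hypothetical `clarity_to_weight` (less feedback weight as clarity grows) is an illustrative choice only, because the abstract does not say how clarity is turned into a weight.

```python
import math

def query_clarity(query_model, collection_model):
    """KL divergence (in bits) of a query language model from the collection model.

    Assumes every word with non-zero query probability also has non-zero
    probability in collection_model (e.g. after smoothing)."""
    return sum(p * math.log2(p / collection_model[w])
               for w, p in query_model.items() if p > 0)

def clarity_to_weight(clarity, max_weight=0.8, scale=1.0):
    """Hypothetical mapping: the feedback weight shrinks as clarity grows."""
    return max_weight / (1.0 + clarity / scale)

def interpolate(original, feedback, lam):
    """theta_new(w) = (1 - lam) * theta_Q(w) + lam * theta_F(w)."""
    vocab = set(original) | set(feedback)
    return {w: (1 - lam) * original.get(w, 0.0) + lam * feedback.get(w, 0.0)
            for w in vocab}

collection = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
original = {"a": 0.5, "b": 0.5}
feedback = {"a": 0.5, "c": 0.5}

clarity = query_clarity(original, collection)
lam = clarity_to_weight(clarity)
new_model = interpolate(original, feedback, lam)
print(clarity, lam)  # → 1.0 0.4
print(sorted((w, round(p, 3)) for w, p in new_model.items()))
# → [('a', 0.5), ('b', 0.3), ('c', 0.2)]
```

Because both input models sum to one and λ lies in [0, 1], the interpolated model is still a probability distribution. Only the clarity-to-weight mapping would need to change to reflect a different policy.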
Keywords
pseudo-relevance feedback, pseudo-relevant document selection, Markov random walk, query clarity, query model