Region-Based Image Retrieval: Image Representation, Matching, and Learning

Abstract

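This thesis studies region-based image retrieval. Region-based image retrieval segments an image into regions, each covering a similar subject or similar features, and then uses those regions to build the index and mechanisms for retrieval. We construct a region-based image retrieval framework that includes feature extraction (both visual and semantic features) to represent image information, an image matching mechanism, and an iterative learning mechanism that lets users feed back their opinions. The thesis first defines a new visual feature, which we call the color-size histogram; its main idea is to combine traditional color features with region-size information to increase the expressive power of the image representation. In addition, we design a visual-word image feature, which is not a purely low-level signal feature but mid-level information aggregated on top of low-level features. We also design a set of semantic image features that, through machine learning, automatically extract the semantic concepts an image contains. With these three levels of feature description, the image representation becomes more complete. Beyond the three levels of image features, the thesis also designs an iterative machine learning scheme so that the system can infer, from the information users feed back, the type of image the user wants to find. This learning scheme makes retrieval results more precise and compensates for the shortcomings of visual features alone. Thus, from representing image information with image features to designing a learning scheme for image matching, this thesis builds a complete region-based image retrieval mechanism.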
This thesis focuses on region-based image retrieval, which uses image regions (parts of an image with homogeneous subjects or features) to index and represent an image. We design a framework for region-based image retrieval that involves feature extraction (both visual and semantic) for image representation, a similarity measure for image matching and ranking, and an interactive learning scheme for estimating user requests from relevance feedback. We first propose a new type of visual feature, called the color-size feature, which embeds region-size attributes in color features. The color-size feature not only provides color information but also captures structural information in an image. We also design a visual-word-based image feature that categorizes region features in visual feature spaces. The visual-word-based image feature is intended to yield a compact region-based image representation. However, the semantic gap between visual features and human perception remains a challenge in image understanding and retrieval. Users usually recognize an image by its concepts, but only low-level feature vectors can be extracted directly from digital images. We address the semantic gap in two ways: (i) image annotation to discover the semantic contents of images and (ii) relevance feedback to interactively learn what the users' requests are. We design a hierarchical approach to image annotation so that more information with higher-level concepts can be included in the retrieval task. We use the proposed image annotation to design a semantic-based image feature that carries semantic information from a human point of view. We also propose an interactive approach to estimating the user intention from the positive examples in relevance feedback.
Our proposed approach considers not only a likelihood measure, which analyzes which representing units are appropriate to represent the user intention implicit in the positive examples of user feedback, but also a confusion measure, which records the degree of confusion between any two representing units. Either the proposed visual-word-based image feature or the semantic-based image feature serves as the representing units for user requests in our work. Based on this estimate of the user intention, we design the similarity measure used for image matching and ranking.
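
The idea of embedding region-size attributes in color features can be illustrated with a small sketch. The abstract does not give the thesis's actual definition of the color-size feature, so everything here is an assumption made for illustration: regions are taken to be already segmented and reduced to a quantized color index plus a pixel count, region size is binned linearly by its fraction of the image area, each region contributes its area fraction to its (color, size) cell, and two images are compared by histogram intersection. The names `color_size_histogram` and `histogram_intersection` are hypothetical.

```python
def color_size_histogram(regions, image_area, n_color_bins=8, n_size_bins=4):
    """Build an area-weighted histogram over (color bin, size bin) pairs.

    regions: iterable of (color_bin, pixel_count), one entry per region.
    The linear size binning and area weighting are illustrative assumptions,
    not the definition used in the thesis.
    """
    hist = [[0.0] * n_size_bins for _ in range(n_color_bins)]
    for color_bin, pixel_count in regions:
        fraction = pixel_count / image_area
        # Map the area fraction to a size bin; a full-image region
        # (fraction == 1.0) is clamped into the last bin.
        size_bin = min(int(fraction * n_size_bins), n_size_bins - 1)
        hist[color_bin][size_bin] += fraction
    return hist


def histogram_intersection(h1, h2):
    """Sum of cell-wise minima of two histograms of the same shape."""
    return sum(
        min(a, b)
        for row1, row2 in zip(h1, h2)
        for a, b in zip(row1, row2)
    )


# Two toy images of 100 pixels each, described as (color_bin, pixel_count).
image_a = [(0, 50), (0, 10), (3, 40)]
image_b = [(0, 60), (3, 40)]

hist_a = color_size_histogram(image_a, image_area=100)
hist_b = color_size_histogram(image_b, image_area=100)
similarity = histogram_intersection(hist_a, hist_b)
```

In this sketch, a plain color histogram would treat the two toy images as identical, since both contain 60 pixels of color 0 and 40 pixels of color 3. The size dimension separates one large region from a large region plus a small one, which is the kind of structural information the color-size feature is described as adding.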

Keywords

Region-Based Image Retrieval, Visual Features, Image Annotation, Semantic Gap, Relevance Feedback
