Refining Visual Representation for Generalized Zero-Shot Learning via Soft Visual-Semantic Alignment

dc.contributor: 葉梅珍 [zh_TW]
dc.contributor: Yeh, Mei-Chen [en_US]
dc.contributor.author: 陳彥合 [zh_TW]
dc.contributor.author: Chen, Yan-He [en_US]
dc.date.accessioned: 2023-12-08T08:02:42Z
dc.date.available: 2027-07-01
dc.date.available: 2023-12-08T08:02:42Z
dc.date.issued: 2022
dc.description.abstract: 我們探討廣義零樣本學習的問題,其任務是預測目標圖像的標籤,無論其標籤屬於可見類別或是未見類別。我們發現大多數方法都學習了一個聯合嵌入空間,其中圖像特徵及其相應的類原型是對齊的。由於視覺空間和語義空間之間的固有差距,這種直接對齊可能很困難。我們提出放寬對齊要求,避免在圖像和語意嵌入之間進行成對比較,來實現一個新的學習框架。我們提出的間接視覺語意對齊方法 (Soft Visual-Semantic Alignment),是通過對由精粹後的視覺特徵和目標類的類原型組成的連接特徵向量進行分類。此外我們使用圓損失(Circle Loss)來優化嵌入模型,該損失函數允許對不同的類內和類間相似性進行不同的懲罰強度。我們廣泛的實驗表明,間接對齊方式在學習區辨性和廣義視覺特徵方面更加靈活。我們證明了所提出方法的優越性,其性能與五個基準上的最新技術相當。 [zh_TW]
dc.description.abstract: We address the problem of generalized zero-shot learning, where the task is to predict the label of a target image regardless of whether that label belongs to a seen or an unseen class. We find that most methods learn a joint embedding space in which image features and their corresponding class prototypes are aligned. Such direct alignment can be difficult because of the inherent gap between the visual and semantic spaces. We propose to relax the alignment requirement with a learning framework that avoids pairwise comparisons between image and class embeddings. The soft visual-semantic alignment is performed by classifying a concatenated feature vector consisting of the refined visual features and the class prototype of the target class. Furthermore, we optimize the embedding model with circle loss, which allows different penalty strengths for different within-class and between-class similarities. Our extensive experiments show that indirect alignment is more flexible for learning discriminative and generalizable visual features. We demonstrate the superiority of the proposed method, which performs on par with the state of the art on five benchmarks. [en_US]
dc.description.sponsorship: 資訊工程學系 [zh_TW]
dc.identifier: 60947074S-41548
dc.identifier.uri: https://etds.lib.ntnu.edu.tw/thesis/detail/c42929ab495fe48d8367a6aef2440c73/
dc.identifier.uri: http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/121598
dc.language: Chinese
dc.subject: 廣義零樣本學習 [zh_TW]
dc.subject: 細粒度視覺辨識 [zh_TW]
dc.subject: 視覺語義嵌入 [zh_TW]
dc.subject: 間接對齊 [zh_TW]
dc.subject: 圓損失函數 [zh_TW]
dc.subject: Generalized Zero-Shot Learning [en_US]
dc.subject: Fine-Grained Visual Recognition [en_US]
dc.subject: Visual-Semantic Embedding [en_US]
dc.subject: Soft Alignment [en_US]
dc.subject: Circle Loss [en_US]
dc.title: 通過間接視覺語義對齊改進廣義零樣本學習的視覺表徵 [zh_TW]
dc.title: Refining Visual Representation for Generalized Zero-Shot Learning via Soft Visual-Semantic Alignment [en_US]
dc.type: etd
