葉梅珍 Yeh, Mei-Chen
陳彥合 Chen, Yan-He
2023-12-08
2027-07-01
2023-12-08
2022
https://etds.lib.ntnu.edu.tw/thesis/detail/c42929ab495fe48d8367a6aef2440c73/
http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/121598

We address the problem of generalized zero-shot learning, where the task is to predict the label of a target image whether that label belongs to a seen or an unseen class. We observe that most methods learn a joint embedding space in which image features and their corresponding class prototypes are aligned. Such direct alignment can be difficult because of the inherent gap between the visual and semantic spaces. We propose relaxing the alignment requirement through a learning framework that avoids pairwise comparisons between image and class embeddings. The soft visual-semantic alignment is performed by classifying a concatenated feature vector consisting of the refined visual features and the class prototype of the target class. Furthermore, we optimize the embedding model with circle loss, which allows different penalty strengths for different within-class and between-class similarities. Our extensive experiments show that indirect alignment offers more flexibility for learning discriminative and generalizable visual features. We demonstrate the superiority of the proposed method, which achieves performance on par with the state of the art on five benchmarks.

Generalized Zero-Shot Learning; Fine-Grained Visual Recognition; Visual-Semantic Embedding; Soft Alignment; Circle Loss

通過間接視覺語義對齊改進廣義零樣本學習的視覺表徵
Refining Visual Representation for Generalized Zero-Shot Learning via Soft Visual-Semantic Alignment
etd
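
The abstract notes that circle loss applies different penalty strengths to different within-class and between-class similarities. As a rough illustration of that idea, the sketch below follows the commonly published form of circle loss for a single anchor. The function names, the `margin` and `gamma` values, and the pure-Python setup are assumptions made for illustration; this record does not state the settings or implementation used in the thesis.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def _softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def circle_loss(pos_sims, neg_sims, margin=0.25, gamma=64.0):
    """Circle loss for one anchor.

    pos_sims: similarities to same-class samples (within-class), non-empty.
    neg_sims: similarities to other-class samples (between-class), non-empty.
    margin and gamma are illustrative values, not taken from the thesis.
    """
    # Optimum similarity and decision margin for each group.
    opt_pos, opt_neg = 1.0 + margin, -margin
    delta_pos, delta_neg = 1.0 - margin, margin

    # Each similarity gets its own non-negative weight: the farther it is
    # from its optimum, the larger its penalty.
    pos_logits = [-gamma * max(opt_pos - s, 0.0) * (s - delta_pos) for s in pos_sims]
    neg_logits = [gamma * max(s - opt_neg, 0.0) * (s - delta_neg) for s in neg_sims]

    # loss = log(1 + sum(exp(neg_logits)) * sum(exp(pos_logits)))
    return _softplus(_logsumexp(neg_logits) + _logsumexp(pos_logits))


# A well-separated anchor versus a poorly separated one.
well_separated = circle_loss(pos_sims=[0.9], neg_sims=[0.1])
poorly_separated = circle_loss(pos_sims=[0.3], neg_sims=[0.7])
```

In this sketch the poorly separated anchor receives a much larger loss than the well-separated one, because both its within-class and its between-class similarity lie far from their optima and are therefore weighted more heavily.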