Self-Supervised Learning of Generalizable Slot Representations (具泛化能力的槽位表示之自監督學習)
Date
2025
Abstract
Self-supervised learning has received a great deal of attention in recent years. In particular, contrastive learning learns representations from unlabeled data by pulling together views of the same image and pushing apart views from other images. Scene-centric image datasets have recently also been used for pre-training, where region-level learning performs better than image-level learning; most of these methods rely on dense matching mechanisms or on Selective Search to find candidate objects. More recent methods learn by clustering pixels into groups (referred to as slots), assigning pixels with the same semantics to the same slot and allowing the learned slots to adapt to the data. We find that global augmentation cannot be tailored to individual slots. We therefore propose a local feature augmentation method that augments features at the slot level, so that the slots learn more variations and patterns of the data and generalize better. We evaluate the resulting self-supervised approach on downstream tasks such as object detection, semantic segmentation, and multi-label classification. The proposed approach adds no training parameters and improves performance on each downstream task.
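The contrastive objective and the idea of augmenting each slot separately can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the loss or the augmentation, so the InfoNCE form, the Gaussian perturbation, and the names `info_nce` and `augment_slots` are all assumptions made for illustration.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.2):
    """InfoNCE loss: small when the anchor is close to the positive
    (another view of the same content) and far from the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


def augment_slots(slots, scale=0.1, rng=None):
    """Hypothetical slot-level feature augmentation (an assumption, not the
    thesis's method): perturb every slot vector independently with Gaussian
    noise. The perturbation has no learnable parameters."""
    if rng is None:
        rng = random.Random(0)
    return [[x + rng.gauss(0.0, scale) for x in slot] for slot in slots]


# Two slot vectors from one view, and an augmented copy standing in for a second view.
slots_view1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
slots_view2 = augment_slots(slots_view1, scale=0.05)

# Slot 0 is pulled toward its augmented counterpart and pushed away from slot 1.
loss = info_nce(slots_view1[0], slots_view2[0], negatives=[slots_view2[1]])
print(f"contrastive loss for slot 0: {loss:.4f}")
```

Because the perturbation here is applied per slot rather than to the whole image, each slot can in principle receive a different variation, which is the property the abstract attributes to local augmentation.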
Keywords
Self-supervised learning, Local augmentation, Object detection, Instance segmentation