Multilevel diffusion-based domain adaptation: image, pixel, and category
Abstract
In this paper, we investigate Diffusion-Based Domain Adaptation, leveraging emerging diffusion models to address domain adaptation tasks. The motivation behind our research stems from the powerful distribution-transformation capabilities of diffusion models, which we aim to harness to help AI models adapt to new data distributions. We explore this problem at three levels of granularity: the image level, the pixel level, and the category level, each progressively more challenging and targeting different application scenarios.
At the image level of domain adaptation, the model is tasked with adapting to classify images in a new domain. Traditional methods typically focus on feature alignment techniques to minimize the distribution gap between the source and target domains. However, when the domain gap is large, such direct alignment often yields limited results. In contrast, we propose the Domain-Adaptive Diffusion (DAD) module, which effectively divides the large domain gap into smaller, manageable gaps. By leveraging the powerful feature transformation capabilities of diffusion models, our DAD module enables a smooth adaptation of source domain models to the target domain, significantly outperforming prior methods.
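To make the step-wise bridging idea concrete, the sketch below shows one plausible way a chain of intermediate domains could be built with a diffusion forward process: source and target features are noised to a shared level (where their distributions are close), and the classifier is then walked down a sequence of decreasing noise levels. This is a minimal sketch under our own assumptions; `q_sample`, `progressive_adapt`, the classifier, and the entropy term are illustrative stand-ins, not the DAD module itself.

```python
# Minimal sketch, assuming a simple feature-level setup; all names here are
# illustrative assumptions rather than the paper's DAD implementation.
import torch
import torch.nn.functional as F

def q_sample(x0, t, alphas_cumprod):
    """Forward diffusion: noise clean feature vectors x0 to noise level t."""
    a_bar = alphas_cumprod[t].view(-1, 1)
    noise = torch.randn_like(x0)
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise

def progressive_adapt(classifier, src_feats, src_labels, tgt_feats,
                      alphas_cumprod, noise_levels, opt, ent_weight=0.1):
    """Walk the classifier along a chain of intermediate 'domains'.
    At high noise levels the noised source and target features are nearly
    indistinguishable; as the noise decreases, the classifier is exposed
    step by step to the real target distribution."""
    for t in noise_levels:  # e.g. [900, 700, 500, 300, 100, 0]
        ts = torch.full((src_feats.size(0),), t, dtype=torch.long)
        tt = torch.full((tgt_feats.size(0),), t, dtype=torch.long)
        src_t = q_sample(src_feats, ts, alphas_cumprod)
        tgt_t = q_sample(tgt_feats, tt, alphas_cumprod)
        logits_s = classifier(src_t)
        logits_t = classifier(tgt_t)
        # Supervised loss on noised source features plus entropy
        # minimisation on noised target features.
        entropy = -(logits_t.softmax(-1) * logits_t.log_softmax(-1)).sum(-1).mean()
        loss = F.cross_entropy(logits_s, src_labels) + ent_weight * entropy
        opt.zero_grad()
        loss.backward()
        opt.step()
```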
At the pixel level of domain adaptation, the task becomes more challenging, as it involves pixel-level classification (i.e., segmentation) on images from a new domain. Unlike image-level domain adaptation, pixel-level tasks require the model to capture fine-grained domain distribution shifts. Traditional methods often rely on GAN-based models to transfer the source domain’s style to the target domain. However, segmentation tasks require the model to preserve pixel-level semantics, and GAN-based methods have shown limited performance due to issues with detail preservation and training instability. To address this, we enhance the conventional diffusion model by introducing the Semantic Gradient Guidance (SGG) module. This module facilitates effective style transfer from source domain images to target domain images while preserving pixel-level semantic content. By innovatively incorporating gradient guidance, we ensure that each pixel’s semantic information remains consistent throughout the transformation process.
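The sketch below conveys the general flavour of gradient-guided sampling (in the spirit of classifier guidance): each reverse-diffusion step is nudged by the gradient of a pixel-wise semantic loss, so the translated image keeps the source segmentation labels. It is a sketch under assumed interfaces; `denoiser`, `seg_net`, and the schedule tensors are hypothetical stand-ins, not the actual SGG module.

```python
# Minimal sketch of gradient-guided denoising; `denoiser` and `seg_net`
# are assumed, hypothetical modules (not the paper's SGG).
import torch
import torch.nn.functional as F

@torch.no_grad()
def guided_reverse_step(denoiser, seg_net, x_t, t, src_label, betas,
                        alphas, alphas_cumprod, guidance_scale=1.0):
    """One reverse-diffusion step whose mean is shifted by the gradient of a
    pixel-wise semantic loss, keeping per-pixel semantics consistent."""
    eps = denoiser(x_t, t)                                   # predicted noise
    a_t, ab_t, b_t = alphas[t], alphas_cumprod[t], betas[t]
    # Estimate of the clean image x0 from the current noisy sample.
    x0_hat = (x_t - (1 - ab_t).sqrt() * eps) / ab_t.sqrt()

    # Gradient of the pixel-wise semantic loss on the clean-image estimate:
    # the direction that keeps the segmentation matching the source labels.
    with torch.enable_grad():
        x_in = x0_hat.detach().requires_grad_(True)
        sem_loss = F.cross_entropy(seg_net(x_in), src_label)
        grad = torch.autograd.grad(sem_loss, x_in)[0]

    mean = (x_t - b_t / (1 - ab_t).sqrt() * eps) / a_t.sqrt()
    mean = mean - guidance_scale * grad                      # steer the step
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + b_t.sqrt() * noise
```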
At the category level of domain adaptation, the difficulty increases further as the model needs to adapt to new categories with very few training samples. With the recent advances in large diffusion models, which possess vast amounts of pre-learned knowledge, we hypothesize that these models can aid in adapting to new categories. Unlike traditional methods that rely on extensive training across numerous categories, we leverage pre-trained large diffusion models and propose a novel Prompt Pose Matching (PPM) approach to exploit the rich semantic knowledge embedded within these models. By learning proper prompt-visual correspondences, our method enables effective adaptation to new categories with only a few samples, vastly reducing computation costs and significantly outperforming prior methods. Extensive experimental results validate the superiority of our approach across all three levels of domain adaptation.
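As a rough illustration of tuning prompts against a frozen backbone (in the spirit of prompt tuning and textual inversion, not the exact PPM procedure), the sketch below optimises only a few soft prompt tokens with the standard denoising loss on the few support images, while the large pre-trained diffusion model stays frozen. `frozen_unet` and `encode_image` are assumed interfaces.

```python
# Illustrative sketch only; `frozen_unet` and `encode_image` are hypothetical
# stand-ins for a frozen pre-trained diffusion backbone and its image encoder.
import torch
import torch.nn.functional as F

class CategoryPrompt(torch.nn.Module):
    def __init__(self, n_tokens=4, dim=768):
        super().__init__()
        # The only trainable parameters: a handful of "soft" prompt tokens.
        self.tokens = torch.nn.Parameter(torch.randn(n_tokens, dim) * 0.02)

def fit_prompt(frozen_unet, encode_image, support_images, alphas_cumprod,
               steps=500, lr=1e-3):
    """Few-shot adaptation: tune prompt tokens so that, conditioned on them,
    the frozen diffusion model denoises the new category's support images."""
    prompt = CategoryPrompt()
    opt = torch.optim.AdamW(prompt.parameters(), lr=lr)
    latents = encode_image(support_images).detach()    # (N, C, H, W)
    for _ in range(steps):
        t = torch.randint(0, len(alphas_cumprod), (latents.size(0),))
        ab = alphas_cumprod[t].view(-1, 1, 1, 1)
        noise = torch.randn_like(latents)
        noisy = ab.sqrt() * latents + (1 - ab).sqrt() * noise
        cond = prompt.tokens.unsqueeze(0).expand(latents.size(0), -1, -1)
        pred = frozen_unet(noisy, t, cond)             # backbone stays frozen
        loss = F.mse_loss(pred, noise)                 # standard denoising loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return prompt
```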
Speaker’s profile
Duo Peng is a PhD student at the Information Systems Technology and Design Pillar, Singapore University of Technology and Design, working under the supervision of Professor De Wen Soh. He received his Bachelor of Engineering and Master of Science degrees in electronic information engineering from Sichuan University. His main research interests include transfer learning and generative AI.