MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization

Deria, Ankan; Mahapatra, Dwarikanath; Bozorgtabar, Behzad; Chakraborty, Mohna; Chakraborty, Snehashis; Roy, Sudipta

arXiv:2508.08488 (cs)

[Submitted on 11 Aug 2025]

Abstract: Virtual try-on seeks to generate photorealistic images of individuals in desired garments, a task that must simultaneously preserve personal identity and garment fidelity for practical use in fashion retail and personalization. However, existing methods typically handle upper and lower garments separately, rely on heavy preprocessing, and often fail to preserve person-specific cues such as tattoos, accessories, and body shape, resulting in limited realism and flexibility. To this end, we introduce MuGa-VTON, a unified multi-garment diffusion framework that jointly models upper and lower garments together with person identity in a shared latent space. Specifically, we propose three key modules: the Garment Representation Module (GRM) for capturing the semantics of both garments, the Person Representation Module (PRM) for encoding identity and pose cues, and the A-DiT fusion module, which integrates garment, person, and text-prompt features through a diffusion transformer. This architecture supports prompt-based customization, allowing fine-grained garment modifications with minimal user input. Extensive experiments on the VITON-HD and DressCode benchmarks demonstrate that MuGa-VTON outperforms existing methods in both qualitative and quantitative evaluations, producing high-fidelity, identity-preserving results suitable for real-world virtual try-on applications.
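
The abstract does not say how the A-DiT module combines its inputs. As a rough illustration only, the sketch below assumes one pattern used in some diffusion transformers: the garment, person, and prompt tokens are concatenated into a single sequence and mixed by self-attention. The names `fuse`, `self_attention`, and `random_tokens` are hypothetical, the projections are left as identity (no learned weights), and nothing here is taken from the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections.

    Assumption: a real model would apply learned query/key/value
    projections; they are omitted to keep the sketch dependency-free.
    """
    dim = len(tokens[0])
    mixed = []
    for query in tokens:
        # Scaled dot-product score of this query against every token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Weighted average of all tokens (values == tokens here).
        mixed.append(
            [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        )
    return mixed


def fuse(upper_tokens, lower_tokens, person_tokens, prompt_tokens):
    """Hypothetical fusion step: concatenate all condition tokens,
    mix them with attention, and return the person positions."""
    sequence = upper_tokens + lower_tokens + person_tokens + prompt_tokens
    mixed = self_attention(sequence)
    start = len(upper_tokens) + len(lower_tokens)
    return mixed[start:start + len(person_tokens)]


def random_tokens(rng, count, dim=4):
    """Stand-in for encoder outputs (GRM, PRM, or a text encoder)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(count)]


if __name__ == "__main__":
    rng = random.Random(0)
    fused = fuse(
        random_tokens(rng, 2),  # upper-garment tokens
        random_tokens(rng, 2),  # lower-garment tokens
        random_tokens(rng, 3),  # person tokens
        random_tokens(rng, 1),  # text-prompt tokens
    )
    print(len(fused), "fused person tokens of width", len(fused[0]))
```

Because every person token attends to both garments and the prompt in the same pass, a single sequence is enough to condition on all inputs at once, which is the sense in which the toy mirrors the "shared latent space" wording of the abstract.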

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2508.08488 [cs.CV]
	(or arXiv:2508.08488v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2508.08488

Submission history

From: Ankan Deria
[v1] Mon, 11 Aug 2025 21:45:07 UTC (20,287 KB)
