Showing 1–5 of 5 results for author: Karmanov, A

  1. arXiv:2506.07706

    cs.LG

    Evaluating Robustness in Latent Diffusion Models via Embedding Level Augmentation

    Authors: Boris Martirosyan, Alexey Karmanov

    Abstract: Latent diffusion models (LDMs) achieve state-of-the-art performance across various tasks, including image generation and video synthesis. However, they generally lack robustness, a limitation that remains not fully explored in current research. In this paper, we propose several methods to address this gap. First, we hypothesize that the robustness of LDMs primarily should be measured without their…

    Submitted 9 June, 2025; originally announced June 2025.

  2. SOAP: Style-Omniscient Animatable Portraits

    Authors: Tingting Liao, Yujian Zheng, Adilbek Karmanov, Liwen Hu, Leyang Jin, Yuliang Xiu, Hao Li

    Abstract: Creating animatable 3D avatars from a single image remains challenging due to style limitations (realistic, cartoon, anime) and difficulties in handling accessories or hairstyles. While 3D diffusion models advance single-view reconstruction for general objects, outputs often lack animation controls or suffer from artifacts because of the domain gap. We propose SOAP, a style-omniscient framework to…

    Submitted 18 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Journal ref: SIGGRAPH 2025. Page: https://tingtingliao.github.io/soap/

  3. arXiv:2503.15667

    cs.CV

    DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis

    Authors: Yuming Gu, Phong Tran, Yujian Zheng, Hongyi Xu, Heyuan Li, Adilbek Karmanov, Hao Li

    Abstract: Generating high-quality 360-degree views of human heads from single-view images is essential for enabling accessible immersive telepresence applications and scalable personalized content creation. While cutting-edge methods for full head generation are limited to modeling realistic human heads, the latest diffusion-based approaches for style-omniscient head synthesis can produce only frontal views…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: Page: https://freedomgu.github.io/DiffPortrait360 Code: https://github.com/FreedomGu/DiffPortrait360/

  4. arXiv:2405.16204

    cs.CV cs.AI cs.GR

    VOODOO XP: Expressive One-Shot Head Reenactment for VR Telepresence

    Authors: Phong Tran, Egor Zakharov, Long-Nhat Ho, Liwen Hu, Adilbek Karmanov, Aviral Agarwal, McLean Goldwhite, Ariana Bermudez Venegas, Anh Tuan Tran, Hao Li

    Abstract: We introduce VOODOO XP: a 3D-aware one-shot head reenactment method that can generate highly expressive facial expressions from any input driver video and a single 2D portrait. Our solution is real-time, view-consistent, and can be instantly used without calibration or fine-tuning. We demonstrate our solution on a monocular video setting and an end-to-end VR telepresence system for two-way communi…

    Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  5. arXiv:2403.18293

    cs.CV

    Efficient Test-Time Adaptation of Vision-Language Models

    Authors: Adilbek Karmanov, Dayan Guan, Shijian Lu, Abdulmotaleb El Saddik, Eric Xing

    Abstract: Test-time adaptation with pre-trained vision-language models has attracted increasing attention for tackling distribution shifts during the test time. Though prior studies have achieved very promising performance, they involve intensive computation which is severely unaligned with test-time adaptation. We design TDA, a training-free dynamic adapter that enables effective and efficient test-time ad…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. The code has been released at https://kdiaaa.github.io/tda/