
Showing 1–50 of 104 results for author: Yi, R

Searching in archive cs.
  1. arXiv:2504.14967

    cs.CV

    3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations

    Authors: Yating Wang, Xuan Wang, Ran Yi, Yanbo Fan, Jichen Hu, Jingcheng Zhu, Lizhuang Ma

    Abstract: Recent studies have combined 3D Gaussian and 3D Morphable Models (3DMM) to construct high-quality 3D head avatars. In this line of research, existing methods either fail to capture the dynamic textures or incur significant overhead in terms of runtime speed or storage space. To this end, we propose a novel method that addresses all the aforementioned demands. Specifically, we introduce an expressiv…

    Submitted 21 April, 2025; originally announced April 2025.

  2. arXiv:2504.06982

    cs.CV

    SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets

    Authors: Yuhang Yang, Fengqi Liu, Yixing Lu, Qin Zhao, Pingyu Wu, Wei Zhai, Ran Yi, Yang Cao, Lizhuang Ma, Zheng-Jun Zha, Junting Dong

    Abstract: 3D human digitization has long been a highly pursued yet challenging task. Existing methods aim to generate high-quality 3D digital humans from single or multiple views, but remain primarily constrained by current paradigms and the scarcity of 3D human assets. Specifically, recent approaches fall into several paradigms: optimization-based and feed-forward (both single-view regression and multi-vie…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: Project page: https://yyvhang.github.io/SIGMAN_3D/

  3. arXiv:2504.01603

    cs.CV

    A$^\text{T}$A: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting

    Authors: Yizhe Tang, Zhimin Sun, Yuzhen Du, Ran Yi, Guangben Lu, Teng Hu, Luying Li, Lizhuang Ma, Fangyuan Zou

    Abstract: Image inpainting aims to fill the missing region of an image. Recently, there has been a surge of interest in foreground-conditioned background inpainting, a sub-task that fills the background of an image while the foreground subject and associated text prompt are provided. Existing background inpainting methods typically strictly preserve the subject's original position from the source image, res…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR 2025

  4. arXiv:2503.12758

    cs.CV eess.IV

    VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis

    Authors: Zhifeng Wang, Renjiao Yi, Xin Wen, Chenyang Zhu, Kai Xu

    Abstract: Angiography imaging is a medical imaging technique that enhances the visibility of blood vessels within the body by using contrast agents. Angiographic images can effectively assist in the diagnosis of vascular diseases. However, contrast agents may bring extra radiation exposure which is harmful to patients with health risks. To mitigate these concerns, in this paper, we aim to automatically gene…

    Submitted 16 March, 2025; originally announced March 2025.

  5. arXiv:2503.12035

    cs.CV

    MOS: Modeling Object-Scene Associations in Generalized Category Discovery

    Authors: Zhengyuan Peng, Jinpeng Ma, Zhimin Sun, Ran Yi, Haichuan Song, Xin Tan, Lizhuang Ma

    Abstract: Generalized Category Discovery (GCD) is a classification task that aims to classify both base and novel classes in unlabeled images, using knowledge from a labeled dataset. In GCD, previous research overlooks scene information or treats it as noise, reducing its impact during model training. However, in this paper, we argue that scene information should be viewed as a strong prior for inferring no…

    Submitted 17 March, 2025; v1 submitted 15 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025. The code is available at https://github.com/JethroPeng/MOS

  6. arXiv:2502.11974

    cs.CV

    Image Inversion: A Survey from GANs to Diffusion and Beyond

    Authors: Yinan Chen, Jiangning Zhang, Yali Bi, Xiaobin Hu, Teng Hu, Zhucun Xue, Ran Yi, Yong Liu, Ying Tai

    Abstract: Image inversion is a fundamental task in generative models, aiming to map images back to their latent representations to enable downstream applications such as editing, restoration, and style transfer. This paper provides a comprehensive review of the latest advancements in image inversion techniques, focusing on two main paradigms: Generative Adversarial Network (GAN) inversion and diffusion mode…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 10 pages, 2 figures

  7. arXiv:2502.09649

    cs.AI cs.CV cs.LG cs.RO

    Imit Diff: Semantics Guided Diffusion Transformer with Dual Resolution Fusion for Imitation Learning

    Authors: Yuhang Dong, Haizhou Ge, Yupei Zeng, Jiangning Zhang, Beiwen Tian, Guanzhong Tian, Hongrui Zhu, Yufei Jia, Ruixiang Wang, Ran Yi, Guyue Zhou, Longhua Ma

    Abstract: Visuomotor imitation learning enables embodied agents to effectively acquire manipulation skills from video demonstrations and robot proprioception. However, as scene complexity and visual distractions increase, existing methods that perform well in simple scenes tend to degrade in performance. To address this challenge, we introduce Imit Diff, a semantics-guided diffusion transformer with dual re…

    Submitted 11 February, 2025; originally announced February 2025.

  8. Adaptive Multi-Objective Bayesian Optimization for Capacity Planning of Hybrid Heat Sources in Electric-Heat Coupling Systems of Cold Regions

    Authors: Ruizhe Yang, Zhongkai Yi, Ying Xu, Guiyu Chen, Haojie Yang, Rong Yi, Tongqing Li, Miaozhe Shen, Jin Li, Haoxiang Gao, Hongyu Duan

    Abstract: The traditional heat-load generation pattern of combined heat and power generators has become a problem leading to renewable energy source (RES) power curtailment in cold regions, motivating the proposal of a planning model for alternative heat sources. The model aims to identify non-dominant capacity allocation schemes for heat pumps, thermal energy storage, electric boilers, and combined storage…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: 11 pages, 11 figures

    Journal ref: IEEE Transactions on Industry Applications 2025 (Early Access)

  9. arXiv:2502.00307

    cs.CV

    A Diffusion Model Translator for Efficient Image-to-Image Translation

    Authors: Mengfei Xia, Yu Zhou, Ran Yi, Yong-Jin Liu, Wenping Wang

    Abstract: Applying diffusion models to image-to-image translation (I2I) has recently received increasing attention due to its practical applications. Previous attempts inject information from the source image into each denoising step for an iterative refinement, thus resulting in a time-consuming implementation. We propose an efficient method that equips a diffusion model with a lightweight translator, dubb…

    Submitted 31 January, 2025; originally announced February 2025.

  10. arXiv:2501.00880

    cs.CV

    Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction

    Authors: Teng Hu, Jiangning Zhang, Ran Yi, Jieyu Weng, Yabiao Wang, Xianfang Zeng, Zhucun Xue, Lizhuang Ma

    Abstract: Employing LLMs for visual generation has recently become a research focus. However, the existing methods primarily transfer the LLM architecture to visual generation but rarely investigate the fundamental differences between language and vision. This oversight may lead to suboptimal utilization of visual generation capabilities within the LLM framework. In this paper, we explore the characteristic…

    Submitted 15 March, 2025; v1 submitted 1 January, 2025; originally announced January 2025.

    Comments: Accepted by CVPR 2025

  11. arXiv:2501.00342

    cs.CV

    SG-Splatting: Accelerating 3D Gaussian Splatting with Spherical Gaussians

    Authors: Yiwen Wang, Siyuan Chen, Ran Yi

    Abstract: 3D Gaussian Splatting is emerging as a state-of-the-art technique in novel view synthesis, recognized for its impressive balance between visual quality, speed, and rendering efficiency. However, reliance on third-degree spherical harmonics for color representation introduces significant storage demands and computational overhead, resulting in a large memory footprint and slower rendering speed. We…

    Submitted 31 December, 2024; originally announced January 2025.

  12. arXiv:2412.09177

    cs.CV cs.CG

    Weighted Poisson-disk Resampling on Large-Scale Point Clouds

    Authors: Xianhe Jiao, Chenlei Lv, Junli Zhao, Ran Yi, Yu-Hui Wen, Zhenkuan Pan, Zhongke Wu, Yong-jin Liu

    Abstract: For large-scale point cloud processing, resampling plays an important role in controlling the point number and density while keeping the geometric consistency. However, current methods cannot balance such different requirements. Particularly with large-scale point clouds, classical methods often struggle with decreased efficiency and accuracy. To address such issues, we propos…

    Submitted 16 December, 2024; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: Accepted to AAAI 2025

  13. arXiv:2412.07962

    cs.CR cs.DB

    Mayfly: Private Aggregate Insights from Ephemeral Streams of On-Device User Data

    Authors: Christopher Bian, Albert Cheu, Stanislav Chiknavaryan, Zoe Gong, Marco Gruteser, Oliver Guinan, Yannis Guzman, Peter Kairouz, Artem Lagzdin, Ryan McKenna, Grace Ni, Edo Roth, Maya Spivak, Timon Van Overveldt, Ren Yi

    Abstract: This paper introduces Mayfly, a federated analytics approach enabling aggregate queries over ephemeral on-device data streams without central persistence of sensitive user data. Mayfly minimizes data via on-device windowing and contribution bounding through SQL-programmability, anonymizes user data via streaming differential privacy (DP), and mandates immediate in-memory cross-device aggregation o…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: 22 pages, 7 figures

    ACM Class: H.2.8; K.4.1; H.4

  14. arXiv:2412.03812

    cs.CV

    Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting

    Authors: Guangben Lu, Yuzhen Du, Zhimin Sun, Ran Yi, Yifan Qi, Yizhe Tang, Tianyi Wang, Lizhuang Ma, Fangyuan Zou

    Abstract: Foreground-conditioned inpainting aims to seamlessly fill the background region of an image by utilizing the provided foreground subject and a text description. While existing T2I-based image inpainting methods can be applied to this task, they suffer from issues of subject shape expansion, distortion, or impaired ability to align with the text description, resulting in inconsistencies between the…

    Submitted 4 December, 2024; originally announced December 2024.

  15. arXiv:2412.03632

    cs.CV

    MV-Adapter: Multi-view Consistent Image Generation Made Easy

    Authors: Zehuan Huang, Yuan-Chen Guo, Haoran Wang, Ran Yi, Lizhuang Ma, Yan-Pei Cao, Lu Sheng

    Abstract: Existing multi-view image generation methods often make invasive modifications to pre-trained text-to-image (T2I) models and require full fine-tuning, leading to (1) high computational costs, especially with large base models and high-resolution images, and (2) degradation in image quality due to optimization difficulties and scarce high-quality 3D data. In this paper, we propose the first adapter…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: Project page: https://huanngzh.github.io/MV-Adapter-Page/

  16. arXiv:2412.00402

    cs.AI

    DroidCall: A Dataset for LLM-powered Android Intent Invocation

    Authors: Weikai Xie, Li Zhang, Shihe Wang, Rongjie Yi, Mengwei Xu

    Abstract: The growing capabilities of large language models in natural language understanding significantly strengthen existing agentic systems. To power performant on-device mobile agents for better data privacy, we introduce DroidCall, the first training and testing dataset for accurate Android intent invocation. With a highly flexible and reusable data generation pipeline, we constructed 10k samples in D…

    Submitted 30 November, 2024; originally announced December 2024.

  17. arXiv:2411.17515

    cs.CV

    SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates

    Authors: Yijia Hong, Yuan-Chen Guo, Ran Yi, Yulong Chen, Yan-Pei Cao, Lizhuang Ma

    Abstract: Decomposing physically-based materials from images into their constituent properties remains challenging, particularly when maintaining both computational efficiency and physical consistency. While recent diffusion-based approaches have shown promise, they face substantial computational overhead due to multiple denoising steps and separate models for different material properties. We present Super…

    Submitted 29 November, 2024; v1 submitted 26 November, 2024; originally announced November 2024.

    Comments: https://hyj542682306.github.io/SuperMat/

  18. arXiv:2411.05046

    cs.CL cs.AI cs.LG

    PhoneLM: an Efficient and Capable Small Language Model Family through Principled Pre-training

    Authors: Rongjie Yi, Xiang Li, Weikai Xie, Zhenyan Lu, Chenghua Wang, Ao Zhou, Shangguang Wang, Xiwen Zhang, Mengwei Xu

    Abstract: Interest in developing small language models (SLMs) for on-device deployment is growing fast. However, existing SLM designs hardly consider the device hardware characteristics. Instead, this work presents a simple yet effective principle for SLM design: architecture searching for (near-)optimal runtime efficiency before pre-training. Guided by this principle, we develop the PhoneLM SLM family (…

    Submitted 6 November, 2024; originally announced November 2024.

  19. arXiv:2411.04079

    cs.CV

    Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation

    Authors: Ke Fan, Jiangning Zhang, Ran Yi, Jingyu Gong, Yabiao Wang, Yating Wang, Xin Tan, Chengjie Wang, Lizhuang Ma

    Abstract: Text-to-motion generation is a crucial task in computer vision, which generates the target 3D motion from the given text. The existing annotated datasets are limited in scale, resulting in most existing methods overfitting to the small datasets and being unable to generalize to the motions of the open domain. Some methods attempt to solve the open-vocabulary motion generation problem by aligning to the CL…

    Submitted 6 November, 2024; originally announced November 2024.

    Comments: project page: https://vankouf.github.io/DSONet/

  20. CAD-NeRF: Learning NeRFs from Uncalibrated Few-view Images by CAD Model Retrieval

    Authors: Xin Wen, Xuening Zhu, Renjiao Yi, Zhifeng Wang, Chenyang Zhu, Kai Xu

    Abstract: Reconstructing from multi-view images is a longstanding problem in 3D vision, where neural radiance fields (NeRFs) have shown great potential and produce realistic rendered images of novel views. Currently, most NeRF methods either require accurate camera poses or a large number of input images, or even both. Reconstructing NeRF from few-view images without poses is challenging and highly ill-posed. T…

    Submitted 4 May, 2025; v1 submitted 5 November, 2024; originally announced November 2024.

    Comments: The article has been accepted by Frontiers of Computer Science (FCS)

  21. arXiv:2410.18737

    cs.CV

    Rectified Diffusion Guidance for Conditional Generation

    Authors: Mengfei Xia, Nan Xue, Yujun Shen, Ran Yi, Tieliang Gong, Yong-Jin Liu

    Abstract: Classifier-Free Guidance (CFG), which combines the conditional and unconditional score functions with two coefficients summing to one, serves as a practical technique for diffusion model sampling. Theoretically, however, denoising with CFG cannot be expressed as a reciprocal diffusion process, which may consequently leave some hidden risks during use. In this work, we revisit the theory behind CFG…

    Submitted 24 October, 2024; originally announced October 2024.

  22. arXiv:2410.16418

    cs.CV

    AttentionPainter: An Efficient and Adaptive Stroke Predictor for Scene Painting

    Authors: Yizhe Tang, Yue Wang, Teng Hu, Ran Yi, Xin Tan, Lizhuang Ma, Yu-Kun Lai, Paul L. Rosin

    Abstract: Stroke-based Rendering (SBR) aims to decompose an input image into a sequence of parameterized strokes, which can be rendered into a painting that resembles the input image. Recently, Neural Painting methods that utilize deep learning and reinforcement learning models to predict the stroke sequences have been developed, but suffer from longer inference time or unstable training. To address these i…

    Submitted 25 October, 2024; v1 submitted 21 October, 2024; originally announced October 2024.

  23. arXiv:2410.13786

    cs.CV

    Emphasizing Semantic Consistency of Salient Posture for Speech-Driven Gesture Generation

    Authors: Fengqi Liu, Hexiang Wang, Jingyu Gong, Ran Yi, Qianyu Zhou, Xuequan Lu, Jiangbo Lu, Lizhuang Ma

    Abstract: Speech-driven gesture generation aims at synthesizing a gesture sequence synchronized with the input speech signal. Previous methods leverage neural networks to directly map a compact audio representation to the gesture sequence, ignoring the semantic association of different modalities and failing to deal with salient gestures. In this paper, we propose a novel speech-driven gesture generation me…

    Submitted 17 October, 2024; originally announced October 2024.

  24. arXiv:2409.15790

    cs.CL cs.AI cs.LG

    Small Language Models: Survey, Measurements, and Insights

    Authors: Zhenyan Lu, Xiang Li, Dongqi Cai, Rongjie Yi, Fangming Liu, Xiwen Zhang, Nicholas D. Lane, Mengwei Xu

    Abstract: Small language models (SLMs), despite their widespread adoption in modern smart devices, have received significantly less academic attention compared to their large language model (LLM) counterparts, which are predominantly deployed in data centers and cloud environments. While researchers continue to improve the capabilities of LLMs in the pursuit of artificial general intelligence, SLM research…

    Submitted 26 February, 2025; v1 submitted 24 September, 2024; originally announced September 2024.

  25. arXiv:2409.13903

    cs.AI

    CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data

    Authors: Zhao Cheng, Diane Wan, Matthew Abueg, Sahra Ghalebikesabi, Ren Yi, Eugene Bagdasarian, Borja Balle, Stefan Mellem, Shawn O'Banion

    Abstract: Advances in generative AI point towards a new era of personalized applications that perform diverse tasks on behalf of users. While general AI assistants have yet to fully emerge, their potential to share personal data raises significant privacy challenges. This paper introduces CI-Bench, a comprehensive synthetic benchmark for evaluating the ability of AI assistants to protect personal informatio…

    Submitted 20 September, 2024; originally announced September 2024.

  26. arXiv:2409.09071

    cs.DC cs.AI

    ELMS: Elasticized Large Language Models On Mobile Devices

    Authors: Wangsong Yin, Rongjie Yi, Daliang Xu, Gang Huang, Mengwei Xu, Xuanzhe Liu

    Abstract: On-device Large Language Models (LLMs) are revolutionizing mobile AI, enabling applications such as UI automation while addressing privacy concerns. Currently, the standard approach involves deploying a single, robust LLM as a universal solution for various applications, often referred to as LLM-as-a-Service (LLMaaS). However, this approach faces a significant system challenge: existing LLMs lack…

    Submitted 8 September, 2024; originally announced September 2024.

    Comments: Technical Report

  27. AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius

    Authors: Xinzhe Wang, Ran Yi, Lizhuang Ma

    Abstract: 3D Gaussian Splatting (3DGS) is a recent explicit 3D representation that has achieved high-quality reconstruction and real-time rendering of complex scenes. However, the rasterization pipeline still suffers from unnecessary overhead resulting from avoidable serial Gaussian culling, and uneven load due to the distinct number of Gaussians to be rendered across pixels, which hinders wider promotion an…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: SIGGRAPH Asia 2024 Conference Papers (SA Conference Papers '24), December 03-06, 2024, Tokyo, Japan

  28. arXiv:2409.06633

    cs.CV

    SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

    Authors: Teng Hu, Jiangning Zhang, Ran Yi, Hongrui Huang, Yabiao Wang, Lizhuang Ma

    Abstract: In recent years, the development of diffusion models has led to significant progress in image and video generation tasks, with pre-trained models like the Stable Diffusion series playing a crucial role. Inspired by model pruning which lightens large pre-trained models by removing unimportant parameters, we propose a novel model fine-tuning method to make full use of these ineffective parameters an…

    Submitted 2 April, 2025; v1 submitted 10 September, 2024; originally announced September 2024.

    Comments: Accepted by ICLR 2025

  29. arXiv:2409.05474

    cs.CV cs.GR

    PVP-Recon: Progressive View Planning via Warping Consistency for Sparse-View Surface Reconstruction

    Authors: Sheng Ye, Yuze He, Matthieu Lin, Jenny Sheng, Ruoyu Fan, Yiheng Han, Yubin Hu, Ran Yi, Yu-Hui Wen, Yong-Jin Liu, Wenping Wang

    Abstract: Neural implicit representations have revolutionized dense multi-view surface reconstruction, yet their performance significantly diminishes with sparse input views. A few pioneering works have sought to tackle the challenge of sparse-view reconstruction by leveraging additional geometric priors or multi-scene generalizability. However, they are still hindered by the imperfect choice of input views…

    Submitted 9 September, 2024; originally announced September 2024.

  30. arXiv:2408.16690

    cs.CV

    Generic Objects as Pose Probes for Few-shot View Synthesis

    Authors: Zhirui Gao, Renjiao Yi, Chenyang Zhu, Ke Zhuang, Wei Chen, Kai Xu

    Abstract: Radiance fields including NeRFs and 3D Gaussians demonstrate great potential in high-fidelity rendering and scene reconstruction, while they require a substantial number of posed images as inputs. COLMAP is frequently employed for preprocessing to estimate poses, while it necessitates a large number of feature matches to operate effectively, and it struggles with scenes characterized by sparse fea…

    Submitted 29 April, 2025; v1 submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE TCSVT 2025. Project page: https://zhirui-gao.github.io/PoseProbe.github.io/

  31. arXiv:2408.10789

    cs.CV

    PartGS: Learning Part-aware 3D Representations by Fusing 2D Gaussians and Superquadrics

    Authors: Zhirui Gao, Renjiao Yi, Yuhang Huang, Wei Chen, Chenyang Zhu, Kai Xu

    Abstract: Low-level 3D representations, such as point clouds, meshes, NeRFs, and 3D Gaussians, are commonly used to represent 3D objects or scenes. However, human perception typically understands 3D objects at a higher level as a composition of parts or structures rather than points or voxels. Representing 3D objects or scenes as semantic parts can benefit further understanding and applications. In this pap…

    Submitted 2 December, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

  32. arXiv:2408.02373

    cs.AI

    Operationalizing Contextual Integrity in Privacy-Conscious Assistants

    Authors: Sahra Ghalebikesabi, Eugene Bagdasaryan, Ren Yi, Itay Yona, Ilia Shumailov, Aneesh Pappu, Chongyang Shi, Laura Weidinger, Robert Stanforth, Leonard Berrada, Pushmeet Kohli, Po-Sen Huang, Borja Balle

    Abstract: Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information including emails and documents, this raises privacy concerns about assistants sharing inappropriate information with third parties without user supervision. To steer information-shar…

    Submitted 13 September, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  33. arXiv:2406.19705

    cs.AI

    DISCO: Efficient Diffusion Solver for Large-Scale Combinatorial Optimization Problems

    Authors: Kexiong Yu, Hang Zhao, Yuhang Huang, Renjiao Yi, Kai Xu, Chenyang Zhu

    Abstract: Combinatorial Optimization (CO) problems are fundamentally important in numerous real-world applications across diverse industries, characterized by enormous solution spaces and demands for time-sensitive responses. Despite recent advancements in neural solvers, their limited expressiveness struggles to capture the multi-modal nature of CO landscapes. While some research has shifted towards…

    Submitted 21 October, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

  34. arXiv:2406.16710

    cs.CV

    ID-Sculpt: ID-aware 3D Head Generation from Single In-the-wild Portrait Image

    Authors: Jinkun Hao, Junshu Tang, Jiangning Zhang, Ran Yi, Yijia Hong, Moran Li, Weijian Cao, Yating Wang, Chengjie Wang, Lizhuang Ma

    Abstract: While recent works have achieved great success on image-to-3D object generation, high quality and fidelity 3D head generation from a single image remains a great challenge. Previous text-based methods for generating 3D heads were limited by text descriptions and image-based methods struggled to produce high-quality head geometry. To handle this challenging problem, we propose a novel framework, ID…

    Submitted 22 December, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted by AAAI 2025; Project page: https://jinkun-hao.github.io/ID-Sculpt/

  35. arXiv:2406.14806

    cs.CV cs.GR

    Relighting Scenes with Object Insertions in Neural Radiance Fields

    Authors: Xuening Zhu, Renjiao Yi, Xin Wen, Chenyang Zhu, Kai Xu

    Abstract: The insertion of objects into a scene and relighting are commonly utilized applications in augmented reality (AR). Previous methods focused on inserting virtual objects using CAD models or real objects from single-view images, resulting in highly limited AR application scenarios. We propose a novel NeRF-based pipeline for inserting object NeRFs into scene NeRFs, enabling novel view synthesis and r…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 14 pages

  36. arXiv:2406.09794

    cs.CV

    SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis

    Authors: Teng Hu, Ran Yi, Baihong Qian, Jiangning Zhang, Paul L. Rosin, Yu-Kun Lai

    Abstract: SVG (Scalable Vector Graphics) is a widely used graphics format that possesses excellent scalability and editability. Image vectorization, which aims to convert raster images to SVGs, is an important yet challenging problem in computer vision and graphics. Existing image vectorization methods either suffer from low reconstruction accuracy for complex images or require long computation time. To add…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  37. arXiv:2406.02263

    cs.CV

    M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising

    Authors: Chengjie Wang, Haokun Zhu, Jinlong Peng, Yue Wang, Ran Yi, Yunsheng Wu, Lizhuang Ma, Jiangning Zhang

    Abstract: Existing industrial anomaly detection methods primarily concentrate on unsupervised learning with pristine RGB images. Yet, both RGB and 3D data are crucial for anomaly detection, and the datasets are seldom completely clean in practical scenarios. To address the above challenges, this paper initially delves into the RGB-3D multi-modal noisy anomaly detection, proposing a novel noise-resistant M3DM-NR…

    Submitted 4 June, 2024; originally announced June 2024.

  38. arXiv:2405.15763

    cs.CV

    FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis

    Authors: Ke Fan, Junshu Tang, Weijian Cao, Ran Yi, Moran Li, Jingyu Gong, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Lizhuang Ma

    Abstract: Text-to-motion synthesis is a crucial task in computer vision. Existing methods are limited in their universality, as they are tailored for single-person or two-person scenarios and cannot be applied to generate motions for more individuals. To achieve number-free motion synthesis, this paper reconsiders motion generation and proposes to unify the single and multi-person motion by the conditi…

    Submitted 24 May, 2024; originally announced May 2024.

  39. arXiv:2405.05175

    cs.CR cs.CL cs.LG

    AirGapAgent: Protecting Privacy-Conscious Conversational Agents

    Authors: Eugene Bagdasarian, Ren Yi, Sahra Ghalebikesabi, Peter Kairouz, Marco Gruteser, Sewoong Oh, Borja Balle, Daniel Ramage

    Abstract: The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns. While these agents excel at understanding and acting on context, this capability can be exploited by malicious actors. We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into re…

    Submitted 18 September, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: at CCS'24

  40. arXiv:2405.00507

    cs.CV

    F2M-Reg: Unsupervised RGB-D Point Cloud Registration with Frame-to-Model Optimization

    Authors: Zhinan Yu, Zheng Qin, Yijie Tang, Yongjun Wang, Renjiao Yi, Chenyang Zhu, Kai Xu

    Abstract: This work studies the problem of unsupervised RGB-D point cloud registration, which aims at training a robust registration model without ground-truth pose supervision. Existing methods usually leverage unposed RGB-D sequences and adopt a frame-to-frame framework based on differentiable rendering to train the registration model, which enforces the photometric and geometric consistency between the…

    Submitted 1 May, 2025; v1 submitted 1 May, 2024; originally announced May 2024.

  41. arXiv:2404.19040

    cs.CV

    GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting

    Authors: Bo Chen, Shoukang Hu, Qi Chen, Chenpeng Du, Ran Yi, Yanmin Qian, Xie Chen

    Abstract: We present GSTalker, a 3D audio-driven talking face generation model with Gaussian Splatting for both fast training (40 minutes) and real-time rendering (125 FPS) with a 3$\sim$5 minute video for training material, in comparison with previous 2D and 3D NeRF-based modeling frameworks which require hours of training and seconds of rendering per frame. Specifically, GSTalker learns an audio-driven Ga…

    Submitted 29 April, 2024; originally announced April 2024.

  42. arXiv:2404.15789

    cs.CV

    MotionMaster: Training-free Camera Motion Transfer For Video Generation

    Authors: Teng Hu, Jiangning Zhang, Ran Yi, Yating Wang, Hongrui Huang, Jieyu Weng, Yabiao Wang, Lizhuang Ma

    Abstract: The emergence of diffusion models has greatly propelled the progress in image and video generation. Recently, some efforts have been made in controllable video generation, including text-to-video generation and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module, and necessitate sub… ▽ More

    Submitted 30 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  43. arXiv:2404.05606  [pdf, other

    cs.CV

    Learning Topology Uniformed Face Mesh by Volume Rendering for Multi-view Reconstruction

    Authors: Yating Wang, Ran Yi, Ke Fan, Jinkun Hao, Jiangbo Lu, Lizhuang Ma

    Abstract: Face meshes in consistent topology serve as the foundation for many face-related applications, such as 3DMM constrained face reconstruction and expression retargeting. Traditional methods commonly acquire topology uniformed face meshes by two separate steps: multi-view stereo (MVS) to reconstruct shapes followed by non-rigid registration to align topology, but struggle with handling noise and non… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  44. arXiv:2404.03518  [pdf, other

    cs.CV

    SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation

    Authors: Sichen Chen, Yingyi Zhang, Siming Huang, Ran Yi, Ke Fan, Ruixin Zhang, Peixian Chen, Jun Wang, Shouhong Ding, Lizhuang Ma

    Abstract: Recently, transformer-based methods have achieved state-of-the-art prediction quality on human pose estimation (HPE). Nonetheless, most of these top-performing transformer-based models are too computation-consuming and storage-demanding to deploy on edge computing platforms. Those transformer-based models that require fewer resources are prone to under-fitting due to their smaller scale and thus pe… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  45. arXiv:2401.09146  [pdf, other

    cs.CV

    Continuous Piecewise-Affine Based Motion Model for Image Animation

    Authors: Hexiang Wang, Fengqi Liu, Qianyu Zhou, Ran Yi, Xin Tan, Lizhuang Ma

    Abstract: Image animation aims to bring static images to life according to driving videos and create engaging visual content that can be used for various purposes such as animation, entertainment, and education. Recent unsupervised methods utilize affine and thin-plate spline transformations based on keypoints to transfer the motion in driving frames to the source image. However, limited by the expressive p… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

  46. arXiv:2401.08092  [pdf, other

    cs.LG cs.AI cs.DC

    A Survey of Resource-efficient LLM and Multimodal Foundation Models

    Authors: Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu

    Abstract: Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment. However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources. To support the growth of the… ▽ More

    Submitted 23 September, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

  47. arXiv:2401.02032  [pdf, other

    cs.CV

    DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection

    Authors: Yunfan Ye, Kai Xu, Yuhang Huang, Renjiao Yi, Zhiping Cai

    Abstract: Limited by the encoder-decoder architecture, learning-based edge detectors usually have difficulty predicting edge maps that satisfy both correctness and crispness. With the recent success of the diffusion probabilistic model (DPM), we found it is especially suitable for accurate and crisp edge detection since the denoising process is directly applied to the original image size. Therefore, we prop… ▽ More

    Submitted 9 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: AAAI 2024

  48. arXiv:2312.15139  [pdf, other

    cs.CV

    Automatic Tooth Arrangement with Joint Features of Point and Mesh Representations via Diffusion Probabilistic Models

    Authors: Changsong Lei, Mengfei Xia, Shaofeng Wang, Yaqian Liang, Ran Yi, Yuhui Wen, Yongjin Liu

    Abstract: Tooth arrangement is a crucial step in orthodontic treatment, in which aligning teeth could improve overall well-being, enhance facial aesthetics, and boost self-confidence. To improve the efficiency of tooth arrangement and minimize errors associated with unreasonable designs by inexperienced practitioners, some deep learning-based tooth arrangement methods have been proposed. Currently, most ex… ▽ More

    Submitted 22 December, 2023; originally announced December 2023.

  49. arXiv:2312.10111  [pdf, other

    cs.CV

    Plasticine3D: 3D Non-Rigid Editing with Text Guidance by Multi-View Embedding Optimization

    Authors: Yige Chen, Teng Hu, Yizhe Tang, Siyuan Chen, Ang Chen, Ran Yi

    Abstract: With the help of Score Distillation Sampling (SDS) and the rapid development of neural 3D representations, some methods have been proposed to perform 3D editing such as adding additional geometries, or overwriting textures. However, the generalized 3D non-rigid editing task, which requires changing both the structure (posture or composition) and appearance (texture) of the original object, remains to… ▽ More

    Submitted 9 July, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

  50. arXiv:2312.05767  [pdf, other

    cs.CV

    AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model

    Authors: Teng Hu, Jiangning Zhang, Ran Yi, Yuzhen Du, Xu Chen, Liang Liu, Yabiao Wang, Chengjie Wang

    Abstract: Anomaly inspection plays an important role in industrial manufacture. Existing anomaly inspection methods are limited in their performance due to insufficient anomaly data. Although anomaly generation methods have been proposed to augment the anomaly data, they either suffer from poor generation authenticity or inaccurate alignment between the generated anomalies and masks. To address the above pr… ▽ More

    Submitted 21 February, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: AAAI 2024