Showing 1–11 of 11 results for author: Lingelbach, M

Searching in archive cs.
  1. arXiv:2503.13319  [pdf, other]

    cs.CV

    MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Few-Step Synthesis

    Authors: Shitong Shao, Hongwei Yi, Hanzhong Guo, Tian Ye, Daquan Zhou, Michael Lingelbach, Zhiqiang Xu, Zeke Xie

    Abstract: Recently, open-source video diffusion models (VDMs), such as WanX, Magic141 and HunyuanVideo, have been scaled to over 10 billion parameters. These large-scale VDMs have demonstrated significant improvements over smaller-scale VDMs across multiple dimensions, including enhanced visual quality and more natural motion dynamics. However, these models face two major limitations: (1) High inference ove…

    Submitted 31 March, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

  2. arXiv:2503.05978  [pdf, other]

    cs.CV

    MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice

    Authors: Hongwei Yi, Tian Ye, Shitong Shao, Xuancheng Yang, Jiantong Zhao, Hanzhong Guo, Terrance Wang, Qingyu Yin, Zeke Xie, Lei Zhu, Wei Li, Michael Lingelbach, Daquan Zhou

    Abstract: We present MagicInfinite, a novel diffusion Transformer (DiT) framework that overcomes traditional portrait animation limitations, delivering high-fidelity results across diverse character types: realistic humans, full-body figures, and stylized anime characters. It supports varied facial poses, including back-facing views, and animates single or multiple characters with input masks for precise spe…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: MagicInfinite is publicly accessible at https://www.hedra.com/. More examples are at https://magicinfinite.github.io/

  3. arXiv:2502.07701   

    cs.CV

    Magic 1-For-1: Generating One Minute Video Clips within One Minute

    Authors: Hongwei Yi, Shitong Shao, Tian Ye, Jiantong Zhao, Qingyu Yin, Michael Lingelbach, Li Yuan, Yonghong Tian, Enze Xie, Daquan Zhou

    Abstract: In this technical report, we present Magic 1-For-1 (Magic141), an efficient video generation model with optimized memory consumption and inference latency. The key idea is simple: factorize the text-to-video generation task into two separate easier tasks for diffusion step distillation, namely text-to-image generation and image-to-video generation. We verify that with the same optimization algorit…

    Submitted 16 February, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

    Comments: Serious updates are needed

  4. arXiv:2412.13479  [pdf, other]

    cs.CV

    Real-time One-Step Diffusion-based Expressive Portrait Videos Generation

    Authors: Hanzhong Guo, Hongwei Yi, Daquan Zhou, Alexander William Bergman, Michael Lingelbach, Yizhou Yu

    Abstract: Latent diffusion models have made great strides in generating expressive portrait videos with accurate lip-sync and natural motion from a single reference image and audio input. However, these models are far from real-time, often requiring many sampling steps that take minutes to generate even one second of video, significantly limiting practical use. We introduce OSA-LCM (One-Step Avatar Latent Co…

    Submitted 17 December, 2024; originally announced December 2024.

    Comments: 14 pages

  5. arXiv:2405.18407  [pdf, other]

    cs.LG cs.CV

    Phased Consistency Models

    Authors: Fu-Yun Wang, Zhaoyang Huang, Alexander William Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, Xiaogang Wang, Hongsheng Li

    Abstract: Consistency Models (CMs) have made significant progress in accelerating the generation of diffusion models. However, their application to high-resolution, text-conditioned image generation in the latent space remains unsatisfactory. In this paper, we identify three key flaws in the current design of Latent Consistency Models (LCMs). We investigate the reasons behind these limitations and propose P…

    Submitted 4 December, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024

  6. arXiv:2403.09227  [pdf, other]

    cs.RO cs.AI

    BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

    Authors: Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Wensi Ai, Benjamin Martinez, Hang Yin, Michael Lingelbach, Minjune Hwang, Ayano Hiranaka, Sujay Garlanka, Arman Aydin, Sharon Lee, Jiankai Sun, Mona Anvari, Manasi Sharma, Dhruva Bansal, Samuel Hunter, Kyu-Young Kim, Alan Lou, Caleb R Matthews, et al. (10 additional authors not shown)

    Abstract: We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics. BEHAVIOR-1K includes two components, guided and motivated by the results of an extensive survey on "what do you want robots to do for you?". The first is the definition of 1,000 everyday activities, grounded in 50 scenes (houses, gardens, restaurants, offices, etc.) with more than 9,000 objects annotated with…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: A preliminary version was published at the 6th Conference on Robot Learning (CoRL 2022)

  7. arXiv:2306.13760  [pdf, other]

    cs.AI

    Task-Driven Graph Attention for Hierarchical Relational Object Navigation

    Authors: Michael Lingelbach, Chengshu Li, Minjune Hwang, Andrey Kurenkov, Alan Lou, Roberto Martín-Martín, Ruohan Zhang, Li Fei-Fei, Jiajun Wu

    Abstract: Embodied AI agents in large scenes often need to navigate to find objects. In this work, we study a naturally emerging variant of the object navigation task, hierarchical relational object navigation (HRON), where the goal is to find objects specified by logical predicates organized in a hierarchical structure (objects related to furniture and then to rooms), such as finding an apple on top of a…

    Submitted 23 June, 2023; originally announced June 2023.

  8. arXiv:2305.17537  [pdf, other]

    cs.LG cs.AI

    Modeling Dynamic Environments with Scene Graph Memory

    Authors: Andrey Kurenkov, Michael Lingelbach, Tanmay Agarwal, Emily Jin, Chengshu Li, Ruohan Zhang, Li Fei-Fei, Jiajun Wu, Silvio Savarese, Roberto Martín-Martín

    Abstract: Embodied AI agents that search for objects in large environments such as households often need to make efficient decisions by predicting object locations based on partial information. We pose this as a new type of link prediction problem: link prediction on partially observable dynamic graphs. Our graph is a representation of a scene in which rooms and objects are nodes, and their relationships ar…

    Submitted 12 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

  9. arXiv:2108.03332  [pdf, other]

    cs.RO cs.AI cs.CV

    BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments

    Authors: Sanjana Srivastava, Chengshu Li, Michael Lingelbach, Roberto Martín-Martín, Fei Xia, Kent Vainio, Zheng Lian, Cem Gokmen, Shyamal Buch, C. Karen Liu, Silvio Savarese, Hyowon Gweon, Jiajun Wu, Li Fei-Fei

    Abstract: We introduce BEHAVIOR, a benchmark for embodied AI with 100 activities in simulation, spanning a range of everyday household chores such as cleaning, maintenance, and food preparation. These activities are designed to be realistic, diverse, and complex, aiming to reproduce the challenges that agents must face in the real world. Building such a benchmark poses three fundamental difficulties for eac…

    Submitted 6 August, 2021; originally announced August 2021.

  10. arXiv:2108.03272  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks

    Authors: Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, Andrey Kurenkov, C. Karen Liu, Hyowon Gweon, Jiajun Wu, Li Fei-Fei, Silvio Savarese

    Abstract: Recent research in embodied AI has been boosted by the use of simulation environments to develop and train robot learning approaches. However, the use of simulation has skewed the attention to tasks that only require what robotics simulators can simulate: motion and physical contact. We present iGibson 2.0, an open-source simulation environment that supports the simulation of a more diverse set of…

    Submitted 3 November, 2021; v1 submitted 6 August, 2021; originally announced August 2021.

    Comments: Accepted at Conference on Robot Learning (CoRL) 2021. Project website: http://svl.stanford.edu/igibson/

  11. arXiv:2007.04954  [pdf, other]

    cs.CV cs.GR cs.LG cs.RO

    ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation

    Authors: Chuang Gan, Jeremy Schwartz, Seth Alter, Damian Mrowca, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Michael Lingelbach, Aidan Curtis, Kevin Feigelis, Daniel M. Bear, Dan Gutfreund, David Cox, Antonio Torralba, James J. DiCarlo, Joshua B. Tenenbaum, Josh H. McDermott, Daniel L. K. Yamins

    Abstract: We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments. Unique properties include: real-time near-photo-realistic image rendering; a library of objects and environments, and routines for their customization; generative procedu…

    Submitted 28 December, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Oral presentation at the NeurIPS 2021 Datasets and Benchmarks Track. Project page: http://www.threedworld.org