
Showing 1–30 of 30 results for author: Saragih, J

Searching in archive cs.
  1. arXiv:2504.04956

    cs.GR cs.CV

    REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning

    Authors: Jihyun Lee, Weipeng Xu, Alexander Richard, Shih-En Wei, Shunsuke Saito, Shaojie Bai, Te-Li Wang, Minhyuk Sung, Tae-Kyun Kim, Jason Saragih

    Abstract: We present REWIND (Real-Time Egocentric Whole-Body Motion Diffusion), a one-step diffusion model for real-time, high-fidelity human motion estimation from egocentric image inputs. While an existing method for egocentric whole-body (i.e., body and hands) motion estimation is non-real-time and acausal due to diffusion-based iterative motion refinement to capture correlations between body and hand po…

    Submitted 7 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR 2025, project page: https://jyunlee.github.io/projects/rewind/

  2. arXiv:2503.19207

    cs.CV

    FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images

    Authors: Rong Wang, Fabian Prada, Ziyan Wang, Zhongshi Jiang, Chengxiang Yin, Junxuan Li, Shunsuke Saito, Igor Santesteban, Javier Romero, Rohan Joshi, Hongdong Li, Jason Saragih, Yaser Sheikh

    Abstract: We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images. Due to the large variations in body shapes, poses, and cloth types, existing methods mostly require hours of per-subject optimization during inference, which limits their practical applications. In contrast, we learn a universal prior from over a thousand clothed humans to ac…

    Submitted 4 April, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: Published in CVPR 2025

  3. arXiv:2502.19739

    cs.CV

    LUCAS: Layered Universal Codec Avatars

    Authors: Di Liu, Teng Deng, Giljoo Nam, Yu Rong, Stanislav Pidhorskyi, Junxuan Li, Jason Saragih, Dimitris N. Metaxas, Chen Cao

    Abstract: Photorealistic 3D head avatar reconstruction faces critical challenges in modeling dynamic face-hair interactions and achieving cross-identity generalization, particularly during expressions and head movements. We present LUCAS, a novel Universal Prior Model (UPM) for codec avatar modeling that disentangles face and hair through a layered representation. Unlike previous UPMs that treat hair as an…

    Submitted 17 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  4. arXiv:2501.14726

    cs.CV cs.GR

    Relightable Full-Body Gaussian Codec Avatars

    Authors: Shaofei Wang, Tomas Simon, Igor Santesteban, Timur Bagautdinov, Junxuan Li, Vasu Agrawal, Fabian Prada, Shoou-I Yu, Pace Nalbone, Matt Gramlich, Roman Lubachersky, Chenglei Wu, Javier Romero, Jason Saragih, Michael Zollhoefer, Andreas Geiger, Siyu Tang, Shunsuke Saito

    Abstract: We propose Relightable Full-Body Gaussian Codec Avatars, a new approach for modeling relightable full-body avatars with fine-grained details including face and hands. The unique challenge for relighting full-body avatars lies in the large deformations caused by body articulation and the resulting impact on appearance caused by light transport. Changes in body pose can dramatically change the orien…

    Submitted 24 January, 2025; originally announced January 2025.

    Comments: 14 pages, 9 figures. Project page: https://neuralbodies.github.io/RFGCA

  5. arXiv:2407.13038

    cs.CV cs.LG

    Universal Facial Encoding of Codec Avatars from VR Headsets

    Authors: Shaojie Bai, Te-Li Wang, Chenghui Li, Akshay Venkatesh, Tomas Simon, Chen Cao, Gabriel Schwartz, Ryan Wrench, Jason Saragih, Yaser Sheikh, Shih-En Wei

    Abstract: Faithful real-time facial animation is essential for avatar-mediated telepresence in Virtual Reality (VR). To emulate authentic communication, avatar animation needs to be efficient and accurate: able to capture both extreme and subtle expressions within a few milliseconds to sustain the rhythm of natural conversations. The oblique and incomplete views of the face, variability in the donning of he…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: SIGGRAPH 2024 (ACM Transactions on Graphics (TOG))

    Journal ref: ACM Trans. Graph. 43, 4, Article 93 (July 2024), 22 pages.

  6. arXiv:2405.02508

    cs.CV cs.GR

    Rasterized Edge Gradients: Handling Discontinuities Differentiably

    Authors: Stanislav Pidhorskyi, Tomas Simon, Gabriel Schwartz, He Wen, Yaser Sheikh, Jason Saragih

    Abstract: Computing the gradients of a rendering process is paramount for diverse applications in computer vision and graphics. However, accurate computation of these gradients is challenging due to discontinuities and rendering approximations, particularly for surface-based representations and rasterization-based rendering. We present a novel method for computing gradients at visibility discontinuities for…

    Submitted 23 July, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  7. arXiv:2401.11002

    cs.CV cs.AI

    Fast Registration of Photorealistic Avatars for VR Facial Animation

    Authors: Chaitanya Patel, Shaojie Bai, Te-Li Wang, Jason Saragih, Shih-En Wei

    Abstract: Virtual Reality (VR) bears promise of social interactions that can feel more immersive than other media. Key to this is the ability to accurately animate a personalized photorealistic avatar, and hence the acquisition of the labels for headset-mounted camera (HMC) images needs to be efficient and accurate, while wearing a VR headset. This is challenging due to oblique camera views and differences i…

    Submitted 18 July, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: ECCV 2024. Project page: https://chaitanya100100.github.io/FastRegistration/

  8. arXiv:2312.08679

    cs.CV cs.AI cs.GR

    A Local Appearance Model for Volumetric Capture of Diverse Hairstyle

    Authors: Ziyan Wang, Giljoo Nam, Aljaz Bozic, Chen Cao, Jason Saragih, Michael Zollhoefer, Jessica Hodgins

    Abstract: Hair plays a significant role in personal identity and appearance, making it an essential component of high-quality, photorealistic avatars. Existing approaches either focus on modeling the facial region only or rely on personalized models, limiting their generalizability and scalability. In this paper, we present a novel method for creating high-fidelity avatars with diverse hairstyles. Our metho…

    Submitted 14 December, 2023; originally announced December 2023.

  9. arXiv:2304.11835

    cs.CV

    Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-time Mobile Telepresence

    Authors: Yonggan Fu, Yuecheng Li, Chenghui Li, Jason Saragih, Peizhao Zhang, Xiaoliang Dai, Yingyan Celine Lin

    Abstract: Real-time and robust photorealistic avatars for telepresence in AR/VR have been highly desired for enabling immersive photorealistic telepresence. However, there still exists one key bottleneck: the considerable computational expense needed to accurately infer facial expressions captured from headset-mounted cameras with a quality level that can match the realism of the avatar's human appearance.…

    Submitted 3 January, 2025; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023

  10. arXiv:2302.04868

    cs.CV cs.GR

    MEGANE: Morphable Eyeglass and Avatar Network

    Authors: Junxuan Li, Shunsuke Saito, Tomas Simon, Stephen Lombardi, Hongdong Li, Jason Saragih

    Abstract: Eyeglasses play an important role in the perception of identity. Authentic virtual representations of faces can benefit greatly from their inclusion. However, modeling the geometric and appearance interactions of glasses and the face of virtual representations of humans is challenging. Glasses and faces affect each other's geometry at their contact points, and also induce appearance changes due to…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: Project page: https://junxuan-li.github.io/megane/

  11. arXiv:2302.04866

    cs.CV cs.GR

    RelightableHands: Efficient Neural Relighting of Articulated Hand Models

    Authors: Shun Iwase, Shunsuke Saito, Tomas Simon, Stephen Lombardi, Timur Bagautdinov, Rohan Joshi, Fabian Prada, Takaaki Shiratori, Yaser Sheikh, Jason Saragih

    Abstract: We present the first neural relighting approach for rendering high-fidelity personalized hands that can be animated in real-time under novel illumination. Our approach adopts a teacher-student framework, where the teacher learns appearance under a single point light from images captured in a light-stage, allowing us to synthesize hands in arbitrary illuminations but with heavy compute. Using image…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: 8 pages, 16 figures, Website: https://sh8.io/#/relightable_hands

  12. arXiv:2212.00613

    cs.CV cs.GR

    NeuWigs: A Neural Dynamic Model for Volumetric Hair Capture and Animation

    Authors: Ziyan Wang, Giljoo Nam, Tuur Stuyck, Stephen Lombardi, Chen Cao, Jason Saragih, Michael Zollhoefer, Jessica Hodgins, Christoph Lassner

    Abstract: The capture and animation of human hair are two of the major challenges in the creation of realistic avatars for virtual reality. Both problems are highly challenging, because hair has complex geometry and appearance and exhibits challenging motion. In this paper, we present a two-stage approach that models hair independently from the head to address these challenges in a data-driven m…

    Submitted 11 October, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

  13. arXiv:2207.11243

    cs.CV cs.GR

    Multiface: A Dataset for Neural Face Rendering

    Authors: Cheng-hsin Wuu, Ningyuan Zheng, Scott Ardisson, Rohan Bali, Danielle Belko, Eric Brockmeyer, Lucas Evans, Timothy Godisart, Hyowon Ha, Xuhua Huang, Alexander Hypes, Taylor Koska, Steven Krenn, Stephen Lombardi, Xiaomin Luo, Kevyn McPhail, Laura Millerschoen, Michal Perdoch, Mark Pitts, Alexander Richard, Jason Saragih, Junko Saragih, Takaaki Shiratori, Tomas Simon, Matt Stewart, et al. (6 additional authors not shown)

    Abstract: Photorealistic avatars of human faces have come a long way in recent years, yet research along this area is limited by a lack of publicly available, high-quality datasets covering both dense multi-view camera captures and rich facial expressions of the captured subjects. In this work, we present Multiface, a new multi-view, high-resolution human face dataset collected from 13 identities at Reali…

    Submitted 26 June, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

  14. arXiv:2207.09774

    cs.CV

    Drivable Volumetric Avatars using Texel-Aligned Features

    Authors: Edoardo Remelli, Timur Bagautdinov, Shunsuke Saito, Tomas Simon, Chenglei Wu, Shih-En Wei, Kaiwen Guo, Zhe Cao, Fabian Prada, Jason Saragih, Yaser Sheikh

    Abstract: Photorealistic telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance that is indistinguishable from reality. In this work, we propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people. One challenge is driving an avatar while staying faithful to details and dynamics…

    Submitted 20 July, 2022; originally announced July 2022.

    Journal ref: SIGGRAPH 2022 Conference Proceedings

  15. arXiv:2203.07881

    cs.CV

    LiP-Flow: Learning Inference-time Priors for Codec Avatars via Normalizing Flows in Latent Space

    Authors: Emre Aksan, Shugao Ma, Akin Caliskan, Stanislav Pidhorskyi, Alexander Richard, Shih-En Wei, Jason Saragih, Otmar Hilliges

    Abstract: Neural face avatars that are trained from multi-view data captured in camera domes can produce photo-realistic 3D reconstructions. However, at inference time, they must be driven by limited inputs such as partial views recorded by headset-mounted cameras or a front-facing camera, and sparse facial landmarks. To mitigate this asymmetry, we introduce a prior model that is conditioned on the runtime…

    Submitted 15 March, 2022; originally announced March 2022.

  16. arXiv:2105.10441

    cs.CV cs.AI cs.GR

    Driving-Signal Aware Full-Body Avatars

    Authors: Timur Bagautdinov, Chenglei Wu, Tomas Simon, Fabian Prada, Takaaki Shiratori, Shih-En Wei, Weipeng Xu, Yaser Sheikh, Jason Saragih

    Abstract: We present a learning-based method for building driving-signal aware full-body avatars. Our model is a conditional variational autoencoder that can be animated with incomplete driving signals, such as human pose and facial keypoints, and produces a high-quality representation of human geometry and view-dependent appearance. The core intuition behind our method is that better drivability and genera…

    Submitted 25 June, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

  17. arXiv:2104.04794

    cs.CV

    Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality

    Authors: Amin Jourabloo, Baris Gecer, Fernando De la Torre, Jason Saragih, Shih-En Wei, Te-Li Wang, Stephen Lombardi, Danielle Belko, Autumn Trimble, Hernan Badino

    Abstract: Social presence, the feeling of being there with a real person, will fuel the next generation of communication systems driven by digital humans in virtual reality (VR). The best 3D video-realistic VR avatars that minimize the uncanny effect rely on person-specific (PS) models. However, these PS models are time-consuming to build and are typically trained with limited data variability, which result…

    Submitted 4 July, 2022; v1 submitted 10 April, 2021; originally announced April 2021.

  18. arXiv:2104.04638

    cs.CV

    Pixel Codec Avatars

    Authors: Shugao Ma, Tomas Simon, Jason Saragih, Dawei Wang, Yuecheng Li, Fernando De La Torre, Yaser Sheikh

    Abstract: Telecommunication with photorealistic avatars in virtual or augmented reality is a promising path for achieving authentic face-to-face communication in 3D over remote physical distances. In this work, we present the Pixel Codec Avatars (PiCA): a deep generative model of 3D human faces that achieves state-of-the-art reconstruction performance while being computationally efficient and adaptive to th…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 Oral

  19. arXiv:2104.00683

    cs.CV cs.LG

    SimPoE: Simulated Character Control for 3D Human Pose Estimation

    Authors: Ye Yuan, Shih-En Wei, Tomas Simon, Kris Kitani, Jason Saragih

    Abstract: Accurate estimation of 3D human motion from monocular video requires modeling both kinematics (body motion without physical forces) and dynamics (motion with physical forces). To demonstrate this, we present SimPoE, a Simulation-based approach for 3D human Pose Estimation, which integrates image-based kinematic inference and physics-based dynamics modeling. SimPoE learns a policy that takes as inp…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 (Oral). Project page: https://www.ye-yuan.com/simpoe/

  20. arXiv:2103.15876

    cs.CV eess.IV

    High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation

    Authors: Lele Chen, Chen Cao, Fernando De la Torre, Jason Saragih, Chenliang Xu, Yaser Sheikh

    Abstract: 3D video avatars can empower virtual communications by providing compression, privacy, entertainment, and a sense of presence in AR/VR. The best 3D photo-realistic AR/VR avatars driven by video, which can minimize uncanny effects, rely on person-specific models. However, existing person-specific photo-realistic 3D models are not robust to lighting, hence their results typically miss subtle facial behav…

    Submitted 29 March, 2021; originally announced March 2021.

    Comments: The paper is accepted to CVPR 2021

  21. arXiv:2103.01954

    cs.GR cs.CV

    Mixture of Volumetric Primitives for Efficient Neural Rendering

    Authors: Stephen Lombardi, Tomas Simon, Gabriel Schwartz, Michael Zollhoefer, Yaser Sheikh, Jason Saragih

    Abstract: Real-time rendering and animation of humans is a core function in games, movies, and telepresence applications. Existing methods have a number of drawbacks we aim to address with our work. Triangle meshes have difficulty modeling thin structures like hair, volumetric representations like Neural Volumes are too low-resolution given a reasonable memory budget, and high-resolution implicit representa…

    Submitted 6 May, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: 13 pages; SIGGRAPH 2021

  22. arXiv:2101.02697

    cs.CV

    PVA: Pixel-aligned Volumetric Avatars

    Authors: Amit Raj, Michael Zollhoefer, Tomas Simon, Jason Saragih, Shunsuke Saito, James Hays, Stephen Lombardi

    Abstract: Acquisition and rendering of photo-realistic human heads is a highly challenging research problem of particular importance for virtual telepresence. Currently, the highest quality is achieved by volumetric approaches trained in a person-specific manner on multi-view data. These models better represent fine structure, such as hair, compared to simpler mesh-based models. Volumetric models typically…

    Submitted 7 January, 2021; originally announced January 2021.

    Comments: Project page located at https://volumetric-avatars.github.io/

  23. arXiv:2012.09955

    cs.CV cs.GR

    Learning Compositional Radiance Fields of Dynamic Human Heads

    Authors: Ziyan Wang, Timur Bagautdinov, Stephen Lombardi, Tomas Simon, Jason Saragih, Jessica Hodgins, Michael Zollhöfer

    Abstract: Photorealistic rendering of dynamic humans is an important ability for telepresence systems, virtual shopping, synthetic data generation, and more. Recently, neural rendering methods, which combine techniques from computer graphics and machine learning, have created high-fidelity models of humans and objects. Some of these methods do not produce results with high-enough fidelity for driveable huma…

    Submitted 17 December, 2020; originally announced December 2020.

  24. arXiv:2006.04325

    cs.CV

    Fully Convolutional Mesh Autoencoder using Efficient Spatially Varying Kernels

    Authors: Yi Zhou, Chenglei Wu, Zimo Li, Chen Cao, Yuting Ye, Jason Saragih, Hao Li, Yaser Sheikh

    Abstract: Learning latent representations of registered meshes is useful for many 3D tasks. Techniques have recently shifted to neural mesh autoencoders. Although they demonstrate higher precision than traditional methods, they remain unable to capture fine-grained deformations. Furthermore, these methods can only be applied to a template-specific surface mesh, and are not applicable to more general meshes,…

    Submitted 21 October, 2020; v1 submitted 7 June, 2020; originally announced June 2020.

    Comments: 12 pages

  25. arXiv:2004.03805

    cs.CV cs.GR

    State of the Art on Neural Rendering

    Authors: Ayush Tewari, Ohad Fried, Justus Thies, Vincent Sitzmann, Stephen Lombardi, Kalyan Sunkavalli, Ricardo Martin-Brualla, Tomas Simon, Jason Saragih, Matthias Nießner, Rohit Pandey, Sean Fanello, Gordon Wetzstein, Jun-Yan Zhu, Christian Theobalt, Maneesh Agrawala, Eli Shechtman, Dan B Goldman, Michael Zollhöfer

    Abstract: Efficient rendering of photo-realistic virtual worlds is a long-standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: Eurographics 2020 survey paper

  26. arXiv:2004.00452

    cs.CV cs.GR

    PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

    Authors: Shunsuke Saito, Tomas Simon, Jason Saragih, Hanbyul Joo

    Abstract: Recent advances in image-based 3D human shape estimation have been driven by the significant improvement in representation power afforded by deep neural networks. Although current approaches have demonstrated the potential in real world settings, they still fail to produce reconstructions with the level of detail often present in the input images. We argue that this limitation stems primarily from…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: project page: https://shunsukesaito.github.io/PIFuHD

    Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020

  27. Neural Volumes: Learning Dynamic Renderable Volumes from Images

    Authors: Stephen Lombardi, Tomas Simon, Jason Saragih, Gabriel Schwartz, Andreas Lehrmann, Yaser Sheikh

    Abstract: Modeling and rendering of dynamic scenes is challenging, as natural scenes often contain complex phenomena such as thin structures, evolving topology, translucency, scattering, occlusion, and biological motion. Mesh-based reconstruction and tracking often fail in these cases, and other approaches (e.g., light field video) typically rely on constrained viewing conditions, which limit interactivity.…

    Submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted to SIGGRAPH 2019

    Journal ref: ACM Transactions on Graphics (SIGGRAPH 2019) 38, 4, Article 65

  28. arXiv:1904.10037

    cs.CV cs.LG

    LBS Autoencoder: Self-supervised Fitting of Articulated Meshes to Point Clouds

    Authors: Chun-Liang Li, Tomas Simon, Jason Saragih, Barnabás Póczos, Yaser Sheikh

    Abstract: We present LBS-AE, a self-supervised autoencoding algorithm for fitting articulated mesh models to point clouds. As input, we take a sequence of point clouds to be registered as well as an artist-rigged mesh, i.e. a template mesh equipped with a linear-blend skinning (LBS) deformation space parameterized by a skeleton hierarchy. As output, we learn an LBS-based autoencoder that produces registered…

    Submitted 22 April, 2019; originally announced April 2019.

    Comments: In the Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019)

  29. arXiv:1901.03628

    cs.CV

    Image Disentanglement and Uncooperative Re-Entanglement for High-Fidelity Image-to-Image Translation

    Authors: Adam W. Harley, Shih-En Wei, Jason Saragih, Katerina Fragkiadaki

    Abstract: Cross-domain image-to-image translation should satisfy two requirements: (1) preserve the information that is common to both domains, and (2) generate convincing images covering variations that appear in the target domain. This is challenging, especially when there are no example translations available as supervision. Adversarial cycle consistency was recently proposed as a solution, with beautifu…

    Submitted 19 October, 2019; v1 submitted 11 January, 2019; originally announced January 2019.

  30. Deep Appearance Models for Face Rendering

    Authors: Stephen Lombardi, Jason Saragih, Tomas Simon, Yaser Sheikh

    Abstract: We introduce a deep appearance model for rendering the human face. Inspired by Active Appearance Models, we develop a data-driven rendering pipeline that learns a joint representation of facial geometry and appearance from a multiview capture setup. Vertex positions and view-specific textures are modeled using a deep variational autoencoder that captures complex nonlinear effects while producing a…

    Submitted 1 August, 2018; originally announced August 2018.

    Comments: Accepted to SIGGRAPH 2018

    Journal ref: ACM Transactions on Graphics (SIGGRAPH 2018) 37, 4, Article 68