Showing 1–16 of 16 results for author: Olszewski, K

  1. arXiv:2502.07239

    cs.CV cs.AI

    Contextual Gesture: Co-Speech Gesture Video Generation through Context-aware Gesture Representation

    Authors: Pinxin Liu, Pengfei Zhang, Hyeongwoo Kim, Pablo Garrido, Ari Shapiro, Kyle Olszewski

    Abstract: Co-speech gesture generation is crucial for creating lifelike avatars and enhancing human-computer interactions by synchronizing gestures with speech. Despite recent advancements, existing methods struggle with accurately identifying the rhythmic or semantic triggers from audio for generating contextualized gesture patterns and achieving pixel-level realism. To address these challenges, we introdu…

    Submitted 10 February, 2025; originally announced February 2025.

  2. arXiv:2307.05445

    cs.CV

    AutoDecoding Latent 3D Diffusion Models

    Authors: Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc Van Gool, Sergey Tulyakov

    Abstract: We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space, which can then be decoded into a volumetric representation for rendering view-consistent appearance and geometry. We then identify the appropriate intermediate volumetric latent s…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: Project page: https://snap-research.github.io/3DVADER/

  3. arXiv:2301.11326

    cs.CV

    Unsupervised Volumetric Animation

    Authors: Aliaksandr Siarohin, Willi Menapace, Ivan Skorokhodov, Kyle Olszewski, Jian Ren, Hsin-Ying Lee, Menglei Chai, Sergey Tulyakov

    Abstract: We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects. Our method learns the 3D structure and dynamics of objects solely from single-view RGB videos, and can decompose them into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable PnP algorithm, our model learns th…

    Submitted 26 January, 2023; originally announced January 2023.

  4. arXiv:2212.06250

    cs.CV

    ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes

    Authors: Ahmed Abdelreheem, Kyle Olszewski, Hsin-Ying Lee, Peter Wonka, Panos Achlioptas

    Abstract: The two popular datasets ScanRefer [16] and ReferIt3D [3] connect natural language to real-world 3D data. In this paper, we curate a large-scale and complementary dataset extending both the aforementioned ones by associating all objects mentioned in a referential sentence to their underlying instances inside a 3D scene. Specifically, our Scan Entities in 3D (ScanEnts3D) dataset provides explicit c…

    Submitted 1 April, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: The project's webpage is https://scanents3d.github.io/

  5. arXiv:2207.11795

    cs.CV

    Cross-Modal 3D Shape Generation and Manipulation

    Authors: Zezhou Cheng, Menglei Chai, Jian Ren, Hsin-Ying Lee, Kyle Olszewski, Zeng Huang, Subhransu Maji, Sergey Tulyakov

    Abstract: Creating and editing the shape and color of 3D objects require tremendous human effort and expertise. Compared to direct manipulation in 3D interfaces, 2D interactions such as sketches and scribbles are usually much more natural and intuitive for the users. In this paper, we propose a generic multi-modal generative model that couples the 2D modalities and implicit 3D representations through shared…

    Submitted 24 July, 2022; originally announced July 2022.

    Comments: ECCV 2022. Project page: https://people.cs.umass.edu/~zezhoucheng/edit3d/

  6. arXiv:2206.07771

    cs.CV

    Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation

    Authors: Ye Zhu, Yu Wu, Kyle Olszewski, Jian Ren, Sergey Tulyakov, Yan Yan

    Abstract: Diffusion probabilistic models (DPMs) have become a popular approach to conditional generation, due to their promising results and support for cross-modal synthesis. A key desideratum in conditional synthesis is to achieve high correspondence between the conditioning input and generated output. Most existing methods learn such relationships implicitly, by incorporating the prior into the variation…

    Submitted 16 February, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: ICLR 2023. Project at https://github.com/L-YeZhu/CDCD

  7. arXiv:2204.10850

    cs.CV

    Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation

    Authors: Verica Lazova, Vladimir Guzov, Kyle Olszewski, Sergey Tulyakov, Gerard Pons-Moll

    Abstract: We present a novel method for performing flexible, 3D-aware image content manipulation while enabling high-quality novel view synthesis. While NeRF-based approaches are effective for novel view synthesis, such models memorize the radiance for every point in a scene within a neural network. Since these models are scene-specific and lack a 3D scene representation, classical editing such as shape man…

    Submitted 22 April, 2022; originally announced April 2022.

  8. arXiv:2204.00604

    cs.CV cs.SD eess.AS

    Quantized GAN for Complex Music Generation from Dance Videos

    Authors: Ye Zhu, Kyle Olszewski, Yu Wu, Panos Achlioptas, Menglei Chai, Yan Yan, Sergey Tulyakov

    Abstract: We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates complex musical samples conditioned on dance videos. Our proposed framework takes dance video frames and human body motions as input, and learns to generate music samples that plausibly accompany the corresponding input. Unlike most existing conditional music generation works that generate specific types…

    Submitted 19 July, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: Dataset and code at https://github.com/L-YeZhu/D2M-GAN

  9. arXiv:2203.17261

    cs.CV cs.AI cs.GR cs.LG

    R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis

    Authors: Huan Wang, Jian Ren, Zeng Huang, Kyle Olszewski, Menglei Chai, Yun Fu, Sergey Tulyakov

    Abstract: Recent research explosion on Neural Radiance Field (NeRF) shows the encouraging potential to represent complex scenes with neural networks. One major drawback of NeRF is its prohibitive inference time: Rendering a single pixel requires querying the NeRF network hundreds of times. To resolve it, existing efforts mainly attempt to reduce the number of required sampled points. However, the problem of…

    Submitted 22 July, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by ECCV 2022. Code: https://github.com/snap-research/R2L

  10. arXiv:2203.02573

    cs.CV cs.AI cs.LG

    Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

    Authors: Ligong Han, Jian Ren, Hsin-Ying Lee, Francesco Barbieri, Kyle Olszewski, Shervin Minaee, Dimitris Metaxas, Sergey Tulyakov

    Abstract: Most methods for conditional video synthesis use a single modality as the condition. This comes with major limitations. For example, it is problematic for a model conditioned on an image to generate a specific motion trajectory desired by the user since there is no means to provide motion information. Conversely, language information can describe the desired motion, while not precisely defining th…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

  11. arXiv:2201.02533

    cs.CV

    NeROIC: Neural Rendering of Objects from Online Image Collections

    Authors: Zhengfei Kuang, Kyle Olszewski, Menglei Chai, Zeng Huang, Panos Achlioptas, Sergey Tulyakov

    Abstract: We present a novel method to acquire object representations from online image collections, capturing high-quality geometry and material properties of arbitrary objects from photographs with varying cameras, illumination, and backgrounds. This enables various object-centric rendering applications such as novel-view synthesis, relighting, and harmonized background composition from challenging in-the…

    Submitted 1 September, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

    Comments: SIGGRAPH 2022 (Journal Track). Project page: https://formyfamily.github.io/NeROIC/ Code repository: https://github.com/snap-research/NeROIC/

  12. arXiv:2106.07771

    cs.CV

    Flow Guided Transformable Bottleneck Networks for Motion Retargeting

    Authors: Jian Ren, Menglei Chai, Oliver J. Woodford, Kyle Olszewski, Sergey Tulyakov

    Abstract: Human motion retargeting aims to transfer the motion of one person in a "driving" video or set of images to another person. Existing efforts leverage a long training video from each target person to train a subject-specific motion transfer model. However, the scalability of such methods is limited, as each model can only generate videos for the given target subject, and such training videos are la…

    Submitted 14 June, 2021; originally announced June 2021.

    Comments: CVPR 2021

  13. arXiv:2104.15069

    cs.CV

    A Good Image Generator Is What You Need for High-Resolution Video Synthesis

    Authors: Yu Tian, Jian Ren, Menglei Chai, Kyle Olszewski, Xi Peng, Dimitris N. Metaxas, Sergey Tulyakov

    Abstract: Image and video synthesis are closely related areas aiming at generating content from noise. While rapid progress has been demonstrated in improving image-based models to handle large resolutions, high-quality renderings, and wide variations in image content, achieving comparable video generation results remains problematic. We present a framework that leverages contemporary image generators to re…

    Submitted 30 April, 2021; originally announced April 2021.

    Comments: Accepted to ICLR 2021

  14. arXiv:2007.13988

    cs.CV cs.GR cs.LG cs.PF

    Monocular Real-Time Volumetric Performance Capture

    Authors: Ruilong Li, Yuliang Xiu, Shunsuke Saito, Zeng Huang, Kyle Olszewski, Hao Li

    Abstract: We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video, eliminating the need for expensive multi-view systems or cumbersome pre-acquisition of a personalized template model. Our system reconstructs a fully textured 3D human from each frame by leveraging Pixel-Aligned Implicit Function (PIFu). While PIFu achieves high-resolut…

    Submitted 28 July, 2020; originally announced July 2020.

  15. arXiv:2004.06848

    cs.CV cs.GR cs.HC cs.LG

    Intuitive, Interactive Beard and Hair Synthesis with Generative Models

    Authors: Kyle Olszewski, Duygu Ceylan, Jun Xing, Jose Echevarria, Zhili Chen, Weikai Chen, Hao Li

    Abstract: We present an interactive approach to synthesizing realistic variations in facial hair in images, ranging from subtle edits to existing hair to the addition of complex and challenging hair in images of clean-shaven subjects. To circumvent the tedious and computationally expensive tasks of modeling, rendering and compositing the 3D geometry of the target hairstyle using the traditional graphics pip…

    Submitted 14 April, 2020; originally announced April 2020.

    Comments: To be presented at the 2020 Conference on Computer Vision and Pattern Recognition (CVPR 2020, Oral Presentation). Supplementary video can be seen at: https://www.youtube.com/watch?v=v4qOtBATrvM

  16. arXiv:1904.06458

    cs.CV

    Transformable Bottleneck Networks

    Authors: Kyle Olszewski, Sergey Tulyakov, Oliver Woodford, Hao Li, Linjie Luo

    Abstract: We propose a novel approach to performing fine-grained 3D manipulation of image content via a convolutional neural network, which we call the Transformable Bottleneck Network (TBN). It applies given spatial transformations directly to a volumetric bottleneck within our encoder-bottleneck-decoder architecture. Multi-view supervision encourages the network to learn to spatially disentangle the featu…

    Submitted 26 August, 2019; v1 submitted 12 April, 2019; originally announced April 2019.

    Comments: Project Page: https://kyleolsz.github.io/TB-Networks/