Search | arXiv e-print repository

StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion

Authors: Ziyu Guo, Young Yoon Lee, Joseph Liu, Yizhak Ben-Shabat, Victor Zordan, Mubbasir Kapadia

Abstract: We present StyleMotif, a novel Stylized Motion Latent Diffusion model, generating motion conditioned on both content and style from multiple modalities. Unlike existing approaches that either focus on generating diverse motion content or transferring style from sequences, StyleMotif seamlessly synthesizes motion across a wide range of content while incorporating stylistic cues from multi-modal inp… ▽ More We present StyleMotif, a novel Stylized Motion Latent Diffusion model, generating motion conditioned on both content and style from multiple modalities. Unlike existing approaches that either focus on generating diverse motion content or transferring style from sequences, StyleMotif seamlessly synthesizes motion across a wide range of content while incorporating stylistic cues from multi-modal inputs, including motion, text, image, video, and audio. To achieve this, we introduce a style-content cross fusion mechanism and align a style encoder with a pre-trained multi-modal model, ensuring that the generated motion accurately captures the reference style while preserving realism. Extensive experiments demonstrate that our framework surpasses existing methods in stylized motion generation and exhibits emergent capabilities for multi-modal motion stylization, enabling more nuanced motion synthesis. Source code and pre-trained models will be released upon acceptance. Project Page: https://stylemotif.github.io △ Less

Submitted 27 March, 2025; originally announced March 2025.

Comments: Project Page: https://stylemotif.github.io

arXiv:2502.20990 [pdf, other]

The Impact of Navigation on Proxemics in an Immersive Virtual Environment with Conversational Agents

Authors: Rose Connolly, Lauren Buck, Victor Zordan, Rachel McDonnell

Abstract: As social VR grows in popularity, understanding how to optimise interactions becomes increasingly important. Interpersonal distance (the physical space people maintain between each other) is a key aspect of user experience. Previous work in psychology has shown that breaches of personal space cause stress and discomfort. Thus, effectively managing this distance is crucial in social VR, where socia… ▽ More As social VR grows in popularity, understanding how to optimise interactions becomes increasingly important. Interpersonal distance (the physical space people maintain between each other) is a key aspect of user experience. Previous work in psychology has shown that breaches of personal space cause stress and discomfort. Thus, effectively managing this distance is crucial in social VR, where social interactions are frequent. Teleportation, a commonly used locomotion method in these environments, involves distinct cognitive processes and requires users to rely on their ability to estimate distance. Despite its widespread use, the effect of teleportation on proximity remains unexplored. To investigate this, we measured the interpersonal distance of 70 participants during interactions with embodied conversational agents, comparing teleportation to natural walking. Our findings revealed that participants maintained closer proximity from the agents during teleportation. Female participants kept greater distances from the agents than male participants, and natural walking was associated with higher agency and body ownership, though co-presence remained unchanged. We propose that differences in spatial perception and spatial cognitive load contribute to reduced interpersonal distance with teleportation. These findings emphasise that proximity should be a key consideration when selecting locomotion methods in social VR, highlighting the need for further research on how locomotion impacts spatial perception and social dynamics in virtual environments. △ Less

Submitted 28 February, 2025; originally announced February 2025.

Comments: for the associated supplementary video, see project page https://connolr3.github.io/TeleportationIPD Accepted for presentation at IEEE VR 2025 and for publication in a special issue of the IEEE Transactions on Visualization and Computer Graphics (IEEE TVCG)

arXiv:2407.10954 [pdf, other]

doi 10.1145/3641519.3657484

A Unified Differentiable Boolean Operator with Fuzzy Logic

Authors: Hsueh-Ti Derek Liu, Maneesh Agrawala, Cem Yuksel, Tim Omernick, Vinith Misra, Stefano Corazza, Morgan McGuire, Victor Zordan

Abstract: This paper presents a unified differentiable boolean operator for implicit solid shape modeling using Constructive Solid Geometry (CSG). Traditional CSG relies on min, max operators to perform boolean operations on implicit shapes. But because these boolean operators are discontinuous and discrete in the choice of operations, this makes optimization over the CSG representation challenging. Drawing… ▽ More This paper presents a unified differentiable boolean operator for implicit solid shape modeling using Constructive Solid Geometry (CSG). Traditional CSG relies on min, max operators to perform boolean operations on implicit shapes. But because these boolean operators are discontinuous and discrete in the choice of operations, this makes optimization over the CSG representation challenging. Drawing inspiration from fuzzy logic, we present a unified boolean operator that outputs a continuous function and is differentiable with respect to operator types. This enables optimization of both the primitives and the boolean operations employed in CSG with continuous optimization techniques, such as gradient descent. We further demonstrate that such a continuous boolean operator allows modeling of both sharp mechanical objects and smooth organic shapes with the same framework. Our proposed boolean operator opens up new possibilities for future research toward fully continuous CSG optimization. △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: SIGGRAPH'24

arXiv:2405.18609 [pdf, other]

Actuators À La Mode: Modal Actuations for Soft Body Locomotion

Authors: Otman Benchekroun, Kaixiang Xie, Hsueh-Ti Derek Liu, Eitan Grinspun, Sheldon Andrews, Victor Zordan

Abstract: Traditional character animation specializes in characters with a rigidly articulated skeleton and a bipedal/quadripedal morphology. This assumption simplifies many aspects for designing physically based animations, like locomotion, but comes with the price of excluding characters of arbitrary deformable geometries. To remedy this, our framework makes use of a spatio-temporal actuation subspace bui… ▽ More Traditional character animation specializes in characters with a rigidly articulated skeleton and a bipedal/quadripedal morphology. This assumption simplifies many aspects for designing physically based animations, like locomotion, but comes with the price of excluding characters of arbitrary deformable geometries. To remedy this, our framework makes use of a spatio-temporal actuation subspace built off of the natural vibration modes of the character geometry. The resulting actuation is coupled to a reduced fast soft body simulation, allowing us to formulate a locomotion optimization problem that is tractable for a wide variety of high resolution deformable characters. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 15 pages, 14 figures

arXiv:2405.14595 [pdf, other]

Elastic Locomotion with Mixed Second-order Differentiation

Authors: Siyuan Shen, Tianjia Shao, Kun Zhou, Chenfanfu Jiang, Sheldon Andrews, Victor Zordan, Yin Yang

Abstract: We present a framework of elastic locomotion, which allows users to enliven an elastic body to produce interesting locomotion by prescribing its high-level kinematics. We formulate this problem as an inverse simulation problem and seek the optimal muscle activations to drive the body to complete the desired actions. We employ the interior-point method to model wide-area contacts between the body a… ▽ More We present a framework of elastic locomotion, which allows users to enliven an elastic body to produce interesting locomotion by prescribing its high-level kinematics. We formulate this problem as an inverse simulation problem and seek the optimal muscle activations to drive the body to complete the desired actions. We employ the interior-point method to model wide-area contacts between the body and the environment with logarithmic barrier penalties. The core of our framework is a mixed second-order differentiation algorithm. By combining both analytic differentiation and numerical differentiation modalities, a general-purpose second-order differentiation scheme is made possible. Specifically, we augment complex-step finite difference (CSFD) with reverse automatic differentiation (AD). We treat AD as a generic function, mapping a computing procedure to its derivative w.r.t. output loss, and promote CSFD along the AD computation. To this end, we carefully implement all the arithmetics used in elastic locomotion, from elementary functions to linear algebra and matrix operation for CSFD promotion. With this novel differentiation tool, elastic locomotion can directly exploit Newton's method and use its strong second-order convergence to find the needed activations at muscle fibers. This is not possible with existing first-order inverse or differentiable simulation techniques. We showcase a wide range of interesting locomotions of soft bodies and creatures to validate our method. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: 13 pages, 14 figures

arXiv:2310.00239 [pdf, other]

doi 10.1145/3618375

AdaptNet: Policy Adaptation for Physics-Based Character Control

Authors: Pei Xu, Kaixiang Xie, Sheldon Andrews, Paul G. Kry, Michael Neff, Morgan McGuire, Ioannis Karamouzas, Victor Zordan

Abstract: Motivated by humans' ability to adapt skills in the learning of new ones, this paper presents AdaptNet, an approach for modifying the latent space of existing policies to allow new behaviors to be quickly learned from like tasks in comparison to learning from scratch. Building on top of a given reinforcement learning controller, AdaptNet uses a two-tier hierarchy that augments the original state e… ▽ More Motivated by humans' ability to adapt skills in the learning of new ones, this paper presents AdaptNet, an approach for modifying the latent space of existing policies to allow new behaviors to be quickly learned from like tasks in comparison to learning from scratch. Building on top of a given reinforcement learning controller, AdaptNet uses a two-tier hierarchy that augments the original state embedding to support modest changes in a behavior and further modifies the policy network layers to make more substantive changes. The technique is shown to be effective for adapting existing physics-based controllers to a wide range of new styles for locomotion, new task targets, changes in character morphology and extensive changes in environment. Furthermore, it exhibits significant increase in learning efficiency, as indicated by greatly reduced training times when compared to training from scratch or using other approaches that modify existing policies. Code is available at https://motion-lab.github.io/AdaptNet. △ Less

Submitted 14 November, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

Comments: SIGGRAPH Asia 2023. Video: https://youtu.be/WxmJSCNFb28. Website: https://motion-lab.github.io/AdaptNet, https://pei-xu.github.io/AdaptNet

Journal ref: ACM Transactions on Graphics 42, 6, Article 112.1522 (December 2023)

arXiv:2306.01201 [pdf, other]

Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models

Authors: Liam Dugan, Anshul Wadhawan, Kyle Spence, Chris Callison-Burch, Morgan McGuire, Victor Zordan

Abstract: Recent work in speech-to-speech translation (S2ST) has focused primarily on offline settings, where the full input utterance is available before any output is given. This, however, is not reasonable in many real-world scenarios. In latency-sensitive applications, rather than waiting for the full utterance, translations should be spoken as soon as the information in the input is present. In this wo… ▽ More Recent work in speech-to-speech translation (S2ST) has focused primarily on offline settings, where the full input utterance is available before any output is given. This, however, is not reasonable in many real-world scenarios. In latency-sensitive applications, rather than waiting for the full utterance, translations should be spoken as soon as the information in the input is present. In this work, we introduce a system for simultaneous S2ST targeting real-world use cases. Our system supports translation from 57 languages to English with tunable parameters for dynamically adjusting the latency of the output -- including four policies for determining when to speak an output sequence. We show that these policies achieve offline-level accuracy with minimal increases in latency over a Greedy (wait-$k$) baseline. We open-source our evaluation code and interactive test script to aid future SimulS2ST research and application development. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: To appear at INTERSPEECH 2023

arXiv:2305.03286 [pdf, other]

doi 10.1145/3592447

Composite Motion Learning with Task Control

Authors: Pei Xu, Xiumin Shang, Victor Zordan, Ioannis Karamouzas

Abstract: We present a deep learning method for composite and task-driven motion control for physically simulated characters. In contrast to existing data-driven approaches using reinforcement learning that imitate full-body motions, we learn decoupled motions for specific body parts from multiple reference motions simultaneously and directly by leveraging the use of multiple discriminators in a GAN-like se… ▽ More We present a deep learning method for composite and task-driven motion control for physically simulated characters. In contrast to existing data-driven approaches using reinforcement learning that imitate full-body motions, we learn decoupled motions for specific body parts from multiple reference motions simultaneously and directly by leveraging the use of multiple discriminators in a GAN-like setup. In this process, there is no need of any manual work to produce composite reference motions for learning. Instead, the control policy explores by itself how the composite motions can be combined automatically. We further account for multiple task-specific rewards and train a single, multi-objective control policy. To this end, we propose a novel framework for multi-objective learning that adaptively balances the learning of disparate motions from multiple sources and multiple goal-directed control objectives. In addition, as composite motions are typically augmentations of simpler behaviors, we introduce a sample-efficient method for training composite control policies in an incremental manner, where we reuse a pre-trained policy as the meta policy and train a cooperative policy that adapts the meta one for new composite tasks. We show the applicability of our approach on a variety of challenging multi-objective tasks involving both composite motion imitation and multiple goal-directed control. △ Less

Submitted 5 May, 2023; originally announced May 2023.

Comments: SIGGRAPH 2023. Code: https://github.com/xupei0610/CompositeMotion. Video: https://youtu.be/mcRAxwoTh3E

Journal ref: ACM Transactions on Graphics (August 2023)

arXiv:2303.10551 [pdf, other]

doi 10.1109/38.851756

Combining Active and Passive Simulations for Secondary Motion

Authors: James F. O'Brien, Victor B. Zordan, Jessica K. Hodgins

Abstract: Objects that move in response to the actions of a main character often make an important contribution to the visual richness of an animated scene. We use the term "secondary motion" to refer to passive motions generated in response to the movements of characters and other objects or environmental forces. Secondary motions aren't normally the mail focus of an animated scene, yet their absence can d… ▽ More Objects that move in response to the actions of a main character often make an important contribution to the visual richness of an animated scene. We use the term "secondary motion" to refer to passive motions generated in response to the movements of characters and other objects or environmental forces. Secondary motions aren't normally the mail focus of an animated scene, yet their absence can distract or disturb the viewer, destroying the illusion of reality created by the scene. We describe how to generate secondary motion by coupling physically based simulations of passive objects to actively controlled characters. △ Less

Submitted 18 March, 2023; originally announced March 2023.

ACM Class: I.3.5

Journal ref: IEEE Computer Graphics and Applications, 20(4):86-96, 2000

arXiv:2104.04934 [pdf, other]

Velocity Skinning for Real-time Stylized Skeletal Animation

Authors: Damien Rohmer, Marco Tarini, Niranjan Kalyanasundaram, Faezeh Moshfeghifar, Marie-Paule Cani, Victor Zordan

Abstract: Secondary animation effects are essential for liveliness. We propose a simple, real-time solution for adding them on top of standard skinning, enabling artist-driven stylization of skeletal motion. Our method takes a standard skeleton animation as input, along with a skin mesh and rig weights. It then derives per-vertex deformations from the different linear and angular velocities along the skelet… ▽ More Secondary animation effects are essential for liveliness. We propose a simple, real-time solution for adding them on top of standard skinning, enabling artist-driven stylization of skeletal motion. Our method takes a standard skeleton animation as input, along with a skin mesh and rig weights. It then derives per-vertex deformations from the different linear and angular velocities along the skeletal hierarchy. We highlight two specific applications of this general framework, namely the cartoonlike "squashy" and "floppy" effects, achieved from specific combinations of velocity terms. As our results show, combining these effects enables to mimic, enhance and stylize physical-looking behaviours within a standard animation pipeline, for arbitrary skinned characters. Interactive on CPU, our method allows for GPU implementation, yielding real-time performances even on large meshes. Animator control is supported through a simple interface toolkit, enabling to refine the desired type and magnitude of deformation at relevant vertices by simply painting weights. The resulting rigged character automatically responds to new skeletal animation, without further input. △ Less

Submitted 11 April, 2021; originally announced April 2021.

Showing 1–10 of 10 results for author: Zordan, V