Search | arXiv e-print repository

Diffusion-Based Imitation Learning for Social Pose Generation

Authors: Antonio Lech Martin-Ozimek, Isuru Jayarathne, Su Larb Mon, Jouh Yeong Chew

Abstract: Intelligent agents, such as robots and virtual agents, must understand the dynamics of complex social interactions to interact with humans. Effectively representing social dynamics is challenging because understanding a scene requires multi-modal, synchronized observations. We explore how a single modality, the pose behavior of multiple individuals in a social interaction, can be used to generate nonverbal social cues for the facilitator of that interaction. The facilitator acts to make a social interaction proceed smoothly and is an essential role for intelligent agents to replicate in human-robot interactions. In this paper, we adapt an existing diffusion behavior cloning model to learn and replicate facilitator behaviors. Furthermore, we evaluate two representations of pose observations from a scene: one with pre-processing applied and one without. This paper has two purposes. The first is to introduce a new use for diffusion behavior cloning: pose generation in social interactions. The second is to understand the relationship between performance and computational load when generating social pose behavior with two different techniques for collecting scene observations. As such, we are essentially testing the effectiveness of two different types of conditioning for a diffusion model. We then evaluate the behavior generated by each technique using quantitative measures such as mean per-joint position error (MPJPE), training time, and inference time.
Additionally, we plot training and inference time against MPJPE to examine the trade-offs between efficiency and performance. Our results suggest that the more heavily pre-processed data can successfully condition diffusion models to generate realistic social behavior, with reasonable trade-offs between accuracy and processing time.
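MPJPE, the accuracy measure used above, averages the Euclidean distance between each generated joint and its ground-truth position. The sketch below computes it in plain Python under stated assumptions: poses are given as frames of (x, y, z) joint tuples in a shared coordinate frame, with no root alignment or Procrustes step (the abstract does not say which variant the authors used). The function name `mpjpe` is illustrative, not taken from the paper's code.

```python
import math
from typing import Sequence

# Assumed layout: a joint is an (x, y, z) tuple; a pose is one frame of joints.
Joint = Sequence[float]
Pose = Sequence[Joint]


def mpjpe(predicted: Sequence[Pose], ground_truth: Sequence[Pose]) -> float:
    """Mean per-joint position error: Euclidean distance between each predicted
    joint and its ground-truth counterpart, averaged over all joints in all
    frames. Units follow the input coordinates; no alignment is applied."""
    if len(predicted) != len(ground_truth):
        raise ValueError("sequences must have the same number of frames")
    total, count = 0.0, 0
    for pred_pose, true_pose in zip(predicted, ground_truth):
        if len(pred_pose) != len(true_pose):
            raise ValueError("poses must have the same number of joints")
        for pred_joint, true_joint in zip(pred_pose, true_pose):
            total += math.dist(pred_joint, true_joint)
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


# One frame, two joints: the first is off by a (3, 4, 0) vector, the second is exact.
gt = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
pred = [[(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]]
print(mpjpe(pred, gt))  # 2.5
```

Because errors are averaged rather than summed, MPJPE stays comparable across sequences of different lengths, which is what makes plotting it against training or inference time meaningful.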

Submitted 18 January, 2025; originally announced January 2025.

Comments: This paper was submitted as an LBR to HRI2025

arXiv:2501.10857 [pdf, other]

Learning Nonverbal Cues in Multiparty Social Interactions for Robotic Facilitators

Authors: Antonio Lech Martin-Ozimek, Isuru Jayarathne, Su Larb Mon, Jouhyeong Chew

Abstract: Conventional behavior cloning (BC) models often struggle to replicate the subtleties of human actions. Previous studies have attempted to address this issue by developing a new BC technique: Implicit Behavior Cloning (IBC). This technique consistently outperformed conventional Mean Squared Error (MSE) BC models in a variety of tasks. Our goal is to replicate the performance of the IBC model by Florence [in Proceedings of the 5th Conference on Robot Learning, 164:158-168, 2022] on social interaction tasks using our custom dataset. While previous studies have explored the use of large language models (LLMs) for enhancing group conversations, they often overlook the significance of nonverbal cues, which constitute a substantial part of human communication. We propose using IBC to replicate nonverbal cues such as gaze behaviors. The model is evaluated on various types of facilitator data and compared with an explicit MSE BC model. Results show that the IBC model outperforms the MSE BC model across session types on the same metrics used in the original IBC paper. Although some metrics show mixed results, which can be explained by the characteristics of our custom social-interaction dataset, we successfully replicated the IBC model to generate nonverbal cues. Our contributions are (1) the replication and extension of the IBC model, and (2) a nonverbal cue generation model for social interaction.
These advancements facilitate the integration of robots into complex human-robot interactions, e.g., in the absence of a human facilitator.
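The contrast between explicit MSE BC and IBC is easiest to see when the demonstrations are multimodal. The toy below is a hypothetical illustration, not the authors' model: for a single observation, demonstrations contain two equally valid gaze targets encoded as -1.0 and +1.0. The MSE-optimal prediction is their mean, while an implicit policy returns the argmin of an energy function. Here the energy is hand-built rather than learned, and a fixed grid of candidate actions stands in for the sampling-based optimizers used in IBC.

```python
from statistics import fmean

# Hypothetical demonstrations for one observation: two valid gaze targets.
demo_actions = [-1.0, 1.0]

# Explicit MSE behavior cloning: the constant prediction that minimises mean
# squared error over the demonstrations is their mean, which lies between modes.
mse_action = fmean(demo_actions)


def energy(action: float) -> float:
    """Hand-built stand-in for a learned energy E(o, a): zero at any demonstrated
    action and rising away from it. In IBC this would be a trained network."""
    return min((action - d) ** 2 for d in demo_actions)


# Implicit behavior cloning infers a* = argmin_a E(o, a); here the argmin is
# approximated over a fixed grid of candidate actions in [-2, 2].
candidates = [i / 100 for i in range(-200, 201)]
implicit_action = min(candidates, key=energy)

print(mse_action)       # 0.0, a gaze target nobody demonstrated
print(implicit_action)  # -1.0, one of the demonstrated modes
```

The grid search keeps the example dependency-free; with a learned energy, the same argmin step is what allows an implicit policy to commit to one mode instead of averaging them.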

Submitted 18 January, 2025; originally announced January 2025.

Comments: Submitted as a short contribution to HRI2025

arXiv:2410.09028 [pdf, other]

Anomalously extended Floquet prethermal lifetimes and applications to long-time quantum sensing

Authors: Kieren A. Harkins, Cooper Selco, Christian Bengs, David Marchiori, Leo Joon Il Moon, Zhuo-Rui Zhang, Aristotle Yang, Angad Singh, Emanuel Druga, Yi-Qiao Song, Ashok Ajoy

Abstract: Floquet prethermalization is observed in periodically driven quantum many-body systems, where the system avoids heating and maintains a stable, non-equilibrium state for extended periods. Here we introduce a novel quantum control method using off-resonance and short-angle excitation to significantly extend Floquet prethermal lifetimes. This is demonstrated on randomly positioned, dipolar-coupled $^{13}$C nuclear spins in diamond, but the methodology is broadly applicable. We achieve a lifetime $T_2' \approx 800$ s at 100 K while tracking the transition to the prethermal state quasi-continuously. This corresponds to a >533,000-fold extension over the bare spin lifetime without prethermalization, and constitutes a new record both in absolute lifetime and in the total number of Floquet pulses applied (here exceeding 7 million). Using Laplace inversion, we develop a new form of noise spectroscopy that provides insight into the origin of the lifetime extension. Finally, we demonstrate applications of these extended lifetimes in long-time, reinitialization-free quantum sensing of time-varying magnetic fields continuously for ~10 minutes at room temperature. Our work creates new opportunities for stabilizing driven quantum systems through Floquet control, and opens novel applications for continuously interrogated, long-time responsive quantum sensors.
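As a rough consistency check on the figures above, assuming the >533,000-fold extension is the ratio of $T_2'$ to the bare spin lifetime (written $T_{2,\text{bare}}$ here; this notation is ours, not the abstract's), the bare lifetime is on the order of a millisecond:

$$T_{2,\text{bare}} \lesssim \frac{T_2'}{533{,}000} \approx \frac{800\ \text{s}}{5.33 \times 10^{5}} \approx 1.5\ \text{ms}$$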

Submitted 11 October, 2024; originally announced October 2024.

Comments: 6 pages, 4 figures

arXiv:2410.05625 [pdf, other]

Discrete Time Crystal Sensing

Authors: Leo Joon Il Moon, Paul M. Schindler, Ryan J. Smith, Emanuel Druga, Zhuo-Rui Zhang, Marin Bukov, Ashok Ajoy

Abstract: Prethermal discrete time crystals (PDTCs) are a nonequilibrium state of matter characterized by long-range spatiotemporal order and a subharmonic response stabilized by many-body interactions under periodic driving. The inherent robustness of time crystalline order to perturbations in the drive protocol makes DTCs promising for applications in quantum technologies. We exploit the susceptibility of PDTC order to deviations in its order parameter to devise highly frequency-selective quantum sensors for time-varying (AC) magnetic fields in a system of strongly driven, dipolar-coupled $^{13}$C nuclear spins in diamond. Integrating a time-varying AC field into the PDTC allows us to exponentially increase its lifetime, with a measured improvement of up to three orders of magnitude (44,204 cycles), and results in a strong resonant response in the time crystalline order parameter. The linewidth of our sensor is limited by the PDTC lifetime alone, as strong interspin interactions help stabilize DTC order. The sensor operates in the 0.5–50 kHz range, a blind spot for sensors based on atomic vapor or electronic spins, and attains a competitive sensitivity. PDTC sensors are resilient to errors in the drive protocol and to sample inhomogeneities, and are agnostic to the macroscopic details of the physical platform: the underlying physical principle applies equally to superconducting qubits, neutral atoms, and trapped ions.

Submitted 7 October, 2024; originally announced October 2024.

Comments: 7+16 pages

arXiv:2406.17987 [pdf, other]

Multi-step Inference over Unstructured Data

Authors: Aditya Kalyanpur, Kailash Karthik Saravanakumar, Victor Barres, CJ McFate, Lori Moon, Nati Seifu, Maksim Eremeev, Jose Barrera, Abraham Bautista-Castillo, Eric Brown, David Ferrucci

Abstract: The advent of Large Language Models (LLMs) and Generative AI has revolutionized natural language applications across various domains. However, high-stakes decision-making tasks in fields such as medical, legal and finance require a level of precision, comprehensiveness, and logical consistency that pure LLM or Retrieval-Augmented-Generation (RAG) approaches often fail to deliver. At Elemental Cognition (EC), we have developed a neuro-symbolic AI platform to tackle these problems. The platform integrates fine-tuned LLMs for knowledge extraction and alignment with a robust symbolic reasoning engine for logical inference, planning and interactive constraint solving. We describe Cora, a Collaborative Research Assistant built on this platform, that is designed to perform complex research and discovery tasks in high-stakes domains. This paper discusses the multi-step inference challenges inherent in such domains, critiques the limitations of existing LLM-based methods, and demonstrates how Cora's neuro-symbolic approach effectively addresses these issues. We provide an overview of the system architecture, key algorithms for knowledge extraction and formal reasoning, and present preliminary evaluation results that highlight Cora's superior performance compared to well-known LLM and RAG baselines.

Submitted 24 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2404.05620 [pdf, other]

Experimental observation of a time rondeau crystal: Temporal Disorder in Spatiotemporal Order

Authors: Leo Joon Il Moon, Paul Manuel Schindler, Yizhe Sun, Emanuel Druga, Johannes Knolle, Roderich Moessner, Hongzheng Zhao, Marin Bukov, Ashok Ajoy

Abstract: Our understanding of phases of matter relies on symmetry breaking, one example being water ice, whose crystalline structure breaks the continuous translation symmetry of space. Recently, breaking of time translation symmetry was observed in systems not in thermal equilibrium. The associated notion of time crystallinity has led to a surge of interest, raising the question of the extent to which highly controllable quantum simulators can generate rich and tunable temporal orders, beyond the conventional classification of order in static systems. Here, we investigate different kinds of partial temporal orders, stabilized by non-periodic yet structured drives, which we call rondeau order. Using a $^{13}$C-nuclear-spin diamond quantum simulator, we report the first experimental observation of a tunable degree of short-time disorder in a system exhibiting long-time stroboscopic order. This is based on a novel spin control architecture that allows us to implement a family of drives ranging from structureless via structured random to quasiperiodic and periodic drives. Leveraging a high-throughput read-out scheme, we continuously observe the spin polarization over $10^5$ pulses to probe rondeau order, with controllable lifetimes exceeding 4 seconds. Using the freedom in the short-time temporal disorder of rondeau order, we show the capacity to encode information in the response of observables. Our work broadens the landscape of observed nonequilibrium temporal order, paving the way for new applications harnessing driven quantum matter.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 6+8 pages

arXiv:2403.11995 [pdf, other]

Hamiltonian-reconstruction distance as a success metric for the Variational Quantum Eigensolver

Authors: Leo Joon Il Moon, Mandar M. Sohoni, Michael A. Shimizu, Praveen Viswanathan, Kevin Zhang, Eun-Ah Kim, Peter L. McMahon

Abstract: The Variational Quantum Eigensolver (VQE) is a hybrid quantum-classical algorithm for quantum simulation that can be run on near-term quantum hardware. A challenge in VQE, as in any other heuristic algorithm for finding ground states of Hamiltonians, is to know how close the algorithm's output solution is to the true ground state when the true ground state and ground-state energy are unknown. This is especially important in iterative algorithms, such as VQE, where one wants to avoid erroneous early termination. Recent developments in Hamiltonian reconstruction (the inference of a Hamiltonian given an eigenstate) provide a metric that can be used to assess the quality of a variational solution to a Hamiltonian-eigensolving problem. This metric can assess the proximity of the variational solution to the ground state without any knowledge of the true ground state or ground-state energy. In numerical simulations and in demonstrations on a cloud-based trapped-ion quantum computer, we show that for examples of both one-dimensional transverse-field-Ising (11 qubits) and two-dimensional $J_1$-$J_2$ transverse-field-Ising (6 qubits) spin problems, the Hamiltonian-reconstruction distance gives a helpful indication of whether VQE has yet found the ground state. Our experiments included cases where the energy plateaus as a function of the VQE iteration, which could have resulted in erroneous early stopping of the VQE algorithm, but where the Hamiltonian-reconstruction distance correctly suggests continuing to iterate.
We find that the Hamiltonian-reconstruction distance has a useful correlation with the fidelity between the VQE solution and the true ground state. Our work suggests that the Hamiltonian-reconstruction distance may be a useful tool for assessing success in VQE, including on noisy quantum processors in practice.
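To make the idea of reconstructing a Hamiltonian from a single state concrete, the sketch below uses the covariance-matrix construction: for Hamiltonian terms $h_i$, build $M_{ij} = \tfrac{1}{2}\langle\{h_i, h_j\}\rangle - \langle h_i\rangle\langle h_j\rangle$; for an exact eigenstate the true coefficient vector lies in the null space of $M$, so the lowest eigenvector of $M$ is the reconstructed Hamiltonian. Several details are assumptions for illustration: the 2-qubit transverse-field Ising model, the coefficients $(J, h) = (1, 1)$, and the distance $d = 1 - |\cos\theta|$ between original and reconstructed coefficient vectors. The abstract does not give the paper's exact definition.

```python
import numpy as np

# Single-qubit Pauli matrices.
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

# Hypothetical 2-qubit transverse-field Ising model H = J*Z1Z2 + h*(X1 + X2),
# written as coefficients c_i times Hermitian terms h_i.
terms = [np.kron(Z, Z), np.kron(X, I2) + np.kron(I2, X)]
coeffs = np.array([1.0, 1.0])  # (J, h), illustrative values
hamiltonian = sum(c * t for c, t in zip(coeffs, terms))


def reconstruction_distance(state: np.ndarray) -> float:
    """Assumed metric: 1 - |cos angle| between the original coefficient vector
    and the one reconstructed from the state's covariance matrix of terms."""
    exp = [np.vdot(state, t @ state).real for t in terms]
    n = len(terms)
    cov = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # Symmetrised correlator <{h_i, h_j}>/2 - <h_i><h_j>.
            anti = terms[i] @ terms[j] + terms[j] @ terms[i]
            cov[i, j] = 0.5 * np.vdot(state, anti @ state).real - exp[i] * exp[j]
    # eigh sorts eigenvalues ascending; column 0 spans the (near-)null space.
    _, vecs = np.linalg.eigh(cov)
    reconstructed = vecs[:, 0]
    original = coeffs / np.linalg.norm(coeffs)
    return float(1.0 - abs(reconstructed @ original))


ground_state = np.linalg.eigh(hamiltonian)[1][:, 0]
product_state = np.array([1.0, 0.0, 0.0, 0.0])  # |00>, not an eigenstate

print(reconstruction_distance(ground_state))   # approximately 0
print(reconstruction_distance(product_state))  # about 0.29
```

The exact ground state yields a distance near zero, while the product state $|00\rangle$ reconstructs a pure $Z_1Z_2$ Hamiltonian and lands at $1 - 1/\sqrt{2} \approx 0.29$. Neither evaluation uses the ground-state energy, which is the property the abstract highlights.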

Submitted 18 March, 2024; originally announced March 2024.

Comments: 18 pages, 15 figures

arXiv:2105.07961 [pdf, other]

Joint Optimization of Hadamard Sensing and Reconstruction in Compressed Sensing Fluorescence Microscopy

Authors: Alan Q. Wang, Aaron K. LaViolette, Leo Moon, Chris Xu, Mert R. Sabuncu

Abstract: Compressed sensing fluorescence microscopy (CS-FM) proposes a scheme whereby fewer measurements are collected during sensing and reconstruction is performed to recover the image. Much work has gone into optimizing the sensing and reconstruction portions separately. We propose a method of jointly optimizing both sensing and reconstruction end-to-end under a total measurement constraint, enabling learning of the optimal sensing scheme concurrently with the parameters of a neural network-based reconstruction network. We train our model on a rich dataset of confocal, two-photon, and wide-field microscopy images comprising a variety of biological samples. We show that our method outperforms several baseline sensing schemes and a regularized regression reconstruction algorithm.
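The sensing side of this setup can be sketched as selecting a subset of Hadamard rows under a measurement budget. The sketch is a baseline under stated assumptions, not the paper's method: the mask is chosen by hand rather than learned, reconstruction is the linear zero-filled adjoint $\hat{x} = H_S^{\top} y / n$ rather than a neural network, and the helper names `hadamard`, `sense`, and `adjoint_reconstruct` are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def hadamard(n: int) -> Matrix:
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # H_2k = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def sense(x: List[float], rows: List[int]) -> List[float]:
    """Measure only the selected Hadamard rows; len(rows) is the budget."""
    h = hadamard(len(x))
    return [sum(h[r][k] * x[k] for k in range(len(x))) for r in rows]


def adjoint_reconstruct(y: List[float], rows: List[int], n: int) -> List[float]:
    """Zero-filled adjoint reconstruction x_hat = H_S^T y / n. This is exact when
    all n rows are measured, because H^T H = n I."""
    h = hadamard(n)
    return [sum(h[r][k] * yr for r, yr in zip(rows, y)) / n for k in range(n)]


x = [1.0, 2.0, 3.0, 4.0]
full, half = [0, 1, 2, 3], [0, 2]
print(adjoint_reconstruct(sense(x, full), full, 4))  # [1.0, 2.0, 3.0, 4.0]
print(adjoint_reconstruct(sense(x, half), half, 4))  # [1.5, 1.5, 3.5, 3.5]
```

With half the budget, the adjoint recovers only block averages of the signal. Which rows to keep, and how to undo that loss, are exactly the two pieces the abstract proposes to optimize jointly.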

Submitted 9 July, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: Accepted at MICCAI 2021

arXiv:2010.10597 [pdf, other]

SKATE: A Natural Language Interface for Encoding Structured Knowledge

Authors: Clifton McFate, Aditya Kalyanpur, Dave Ferrucci, Andrea Bradshaw, Ariel Diertani, David Melville, Lori Moon

Abstract: In Natural Language (NL) applications, there is often a mismatch between what the NL interface is capable of interpreting and what a lay user knows how to express. This work describes a novel natural language interface that reduces this mismatch by refining natural language input through successive, automatically generated semi-structured templates. In this paper we describe how our approach, called SKATE, uses a neural semantic parser to parse NL input and suggest semi-structured templates, which are recursively filled to produce fully structured interpretations. We also show how SKATE integrates with a neural rule-generation model to interactively suggest and acquire commonsense knowledge. We provide a preliminary coverage analysis of SKATE for the task of story understanding, and then describe a current business use-case of the tool in a specific domain: COVID-19 policy design.
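The phrase "recursively filled to produce fully structured interpretations" suggests a tree of templates whose slots hold either text or further templates. The sketch below is a hypothetical data structure illustrating that idea only; the abstract does not describe SKATE's internal representation, and the `Template` class, its methods, and the policy example are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional, Union

# Hypothetical: a slot filler is plain text or a nested template.
Filler = Union[str, "Template"]


@dataclass
class Template:
    """A semi-structured template: a predicate with named slots. A slot may be
    empty (None), filled with text, or filled with a nested template."""
    predicate: str
    slots: Dict[str, Optional[Filler]] = field(default_factory=dict)

    def fill(self, slot: str, value: Filler) -> None:
        if slot not in self.slots:
            raise KeyError(f"unknown slot {slot!r}")
        self.slots[slot] = value

    def is_fully_structured(self) -> bool:
        """True once every slot, recursively, has a filler."""
        for value in self.slots.values():
            if value is None:
                return False
            if isinstance(value, Template) and not value.is_fully_structured():
                return False
        return True


policy = Template("require", {"agent": None, "action": None})
policy.fill("agent", "employees")
print(policy.is_fully_structured())  # False: "action" is still empty
action = Template("wear", {"item": None})
policy.fill("action", action)
action.fill("item", "masks")
print(policy.is_fully_structured())  # True
```

Checking completeness recursively mirrors the interaction loop the abstract describes: the interface keeps proposing templates until no slot anywhere in the tree is left open.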

Submitted 10 December, 2020; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: Accepted at IAAI-21

arXiv:2009.07758 [pdf, other]

GLUCOSE: GeneraLized and COntextualized Story Explanations

Authors: Nasrin Mostafazadeh, Aditya Kalyanpur, Lori Moon, David Buchanan, Lauren Berkowitz, Or Biran, Jennifer Chu-Carroll

Abstract: When humans read or listen, they make implicit commonsense inferences that frame their understanding of what happened and why. As a step toward AI systems that can build similar mental models, we introduce GLUCOSE, a large-scale dataset of implicit commonsense causal knowledge, encoded as causal mini-theories about the world, each grounded in a narrative context. To construct GLUCOSE, we drew on cognitive psychology to identify ten dimensions of causal explanation, focusing on events, states, motivations, and emotions. Each GLUCOSE entry includes a story-specific causal statement paired with an inference rule generalized from the statement. This paper details two concrete contributions. First, we present our platform for effectively crowdsourcing GLUCOSE data at scale, which uses semi-structured templates to elicit causal explanations. Using this platform, we collected a total of ~670K specific statements and general rules that capture implicit commonsense knowledge about everyday situations. Second, we show that existing knowledge resources and pretrained language models do not include or readily predict GLUCOSE's rich inferential content. However, when state-of-the-art neural models are trained on this knowledge, they can start to make commonsense inferences on unseen stories that match humans' mental models.
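The abstract describes each entry as a story-specific causal statement paired with a generalized rule, tagged with one of ten causal dimensions. A minimal record type capturing just those stated properties might look like the following. The field names, the 1-based dimension numbering, and the example sentences are assumptions for illustration, not GLUCOSE's released schema or notation.

```python
from dataclasses import dataclass

NUM_DIMENSIONS = 10  # the ten dimensions of causal explanation in the abstract


@dataclass(frozen=True)
class GlucoseEntry:
    """Hypothetical record for one entry: a story-specific causal statement
    paired with the inference rule generalized from it."""
    story_id: str
    dimension: int           # assumed numbering 1..NUM_DIMENSIONS
    specific_statement: str  # grounded in this story's characters and events
    general_rule: str        # abstracted over people and objects

    def __post_init__(self) -> None:
        if not 1 <= self.dimension <= NUM_DIMENSIONS:
            raise ValueError(f"dimension must be in 1..{NUM_DIMENSIONS}")
        if not self.specific_statement or not self.general_rule:
            raise ValueError("both a specific statement and a general rule are required")


entry = GlucoseEntry(
    story_id="story-001",
    dimension=1,
    specific_statement="Mary was hungry, so she made a sandwich",
    general_rule="Someone is hungry, so they prepare food",
)
print(entry.dimension)  # 1
```

Keeping the specific statement and the general rule in one immutable record preserves the pairing the abstract emphasizes: every generalization stays traceable to the story context it was abstracted from.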

Submitted 29 October, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

Comments: EMNLP 2020 Camera ready version

Showing 1–10 of 10 results for author: Moon, L