-
TwinTrack: Bridging Vision and Contact Physics for Real-Time Tracking of Unknown Dynamic Objects
Authors:
Wen Yang,
Zhixian Xie,
Xuechao Zhang,
Heni Ben Amor,
Shan Lin,
Wanxin Jin
Abstract:
Real-time tracking of previously unseen, highly dynamic objects in contact-rich environments -- such as during dexterous in-hand manipulation -- remains a significant challenge. Purely vision-based tracking often suffers from heavy occlusions due to the frequent contact interactions and motion blur caused by abrupt motion during contact impacts. We propose TwinTrack, a physics-aware visual tracking framework that enables robust and real-time 6-DoF pose tracking of unknown dynamic objects in a contact-rich scene by leveraging the contact physics of the observed scene. At the core of TwinTrack is an integration of Real2Sim and Sim2Real. In Real2Sim, we combine the complementary strengths of vision and contact physics to estimate the object's collision geometry and physical properties: the object's geometry is first reconstructed from vision, then updated along with other physical parameters from contact dynamics for physical accuracy. In Sim2Real, robust pose estimation of the object is achieved by adaptive fusion between visual tracking and prediction of the learned contact physics. TwinTrack is built on a GPU-accelerated, deeply customized physics engine to ensure real-time performance. We evaluate our method on two contact-rich scenarios: object falling with rich contact impacts against the environment, and contact-rich in-hand manipulation. Experimental results demonstrate that, compared to baseline methods, TwinTrack achieves significantly more robust, accurate, and real-time 6-DoF tracking in these challenging scenarios, with tracking speed exceeding 20 Hz. Project page: https://irislab.tech/TwinTrack-webpage/
Submitted 28 May, 2025;
originally announced May 2025.
-
SAS-Prompt: Large Language Models as Numerical Optimizers for Robot Self-Improvement
Authors:
Heni Ben Amor,
Laura Graesser,
Atil Iscen,
David D'Ambrosio,
Saminda Abeyruwan,
Alex Bewley,
Yifan Zhou,
Kamalesh Kalirathinam,
Swaroop Mishra,
Pannag Sanketi
Abstract:
We demonstrate the ability of large language models (LLMs) to perform iterative self-improvement of robot policies. An important insight of this paper is that LLMs have a built-in ability to perform (stochastic) numerical optimization and that this property can be leveraged for explainable robot policy search. Based on this insight, we introduce the SAS Prompt (Summarize, Analyze, Synthesize) -- a single prompt that enables iterative learning and adaptation of robot behavior by combining the LLM's ability to retrieve, reason and optimize over previous robot traces in order to synthesize new, unseen behavior. Our approach can be regarded as an early example of a new family of explainable policy search methods that are entirely implemented within an LLM. We evaluate our approach both in simulation and on a real-robot table tennis task. Project website: sites.google.com/asu.edu/sas-llm/
Submitted 29 April, 2025;
originally announced April 2025.
-
Prot42: a Novel Family of Protein Language Models for Target-aware Protein Binder Generation
Authors:
Mohammad Amaan Sayeed,
Engin Tekin,
Maryam Nadeem,
Nancy A. ElNaker,
Aahan Singh,
Natalia Vassilieva,
Boulbaba Ben Amor
Abstract:
Unlocking the next generation of biotechnology and therapeutic innovation demands overcoming the inherent complexity and resource-intensity of conventional protein engineering methods. Recent GenAI-powered computational techniques often rely on the availability of the target protein's 3D structures and specific binding sites to generate high-affinity binders, constraints exhibited by models such as AlphaProteo and RFdiffusion. In this work, we explore the use of Protein Language Models (pLMs) for high-affinity binder generation. We introduce Prot42, a novel family of Protein Language Models (pLMs) pretrained on vast amounts of unlabeled protein sequences. By capturing deep evolutionary, structural, and functional insights through an advanced auto-regressive, decoder-only architecture inspired by breakthroughs in natural language processing, Prot42 dramatically expands the capabilities of computational protein design based on language only. Remarkably, our models handle sequences up to 8,192 amino acids, significantly surpassing standard limitations and enabling precise modeling of large proteins and complex multi-domain sequences. Demonstrating powerful practical applications, Prot42 excels in generating high-affinity protein binders and sequence-specific DNA-binding proteins. Our innovative models are publicly available, offering the scientific community an efficient and precise computational toolkit for rapid protein engineering.
Submitted 18 May, 2025; v1 submitted 6 April, 2025;
originally announced April 2025.
-
Ab initio calculation on Herbertsmithite: exchange interactions including extra-plane magnetic impurities, Dzyaloshinskii-Moriya and anisotropic coupling
Authors:
Flaurent Heully-Alary,
Nadia Ben Amor,
Nicolas Suaud,
Laura Messio,
Coen de Graaf,
Nathalie Guihéry
Abstract:
A detailed ab initio evaluation of the isotropic and anisotropic exchange interactions of Herbertsmithite is presented. This compound crystallizes in a so-called Kagome lattice and the S=1/2 spin structure is not fully resolved despite numerous experimental and theoretical studies. The present study not only focusses on the leading in-plane nearest neighbor isotropic interactions, but also other less well studied interactions such as the anisotropic exchange and the Dzyaloshinskii-Moriya (DM) interactions. The anisotropic exchange is very weak, but the DM interactions are sizeable with a strong in-plane component, typically obviated in model studies. Moreover, it is shown that the extra-plane magnetic impurities have a non-negligible interaction with the regular in-plane magnetic sites. Combined with an estimated occurrence of these magnetic impurities of ~15%, the present results indicate that two-dimensional magnetic models only describe part of the physics.
Submitted 25 March, 2025;
originally announced March 2025.
-
Gene42: Long-Range Genomic Foundation Model With Dense Attention
Authors:
Kirill Vishniakov,
Boulbaba Ben Amor,
Engin Tekin,
Nancy A. ElNaker,
Karthik Viswanathan,
Aleksandr Medvedev,
Aahan Singh,
Maryam Nadeem,
Mohammad Amaan Sayeed,
Praveenkumar Kanithi,
Tiago Magalhaes,
Natalia Vassilieva,
Dwarikanath Mahapatra,
Marco Pimentel,
Shadab Khan
Abstract:
We introduce Gene42, a novel family of Genomic Foundation Models (GFMs) designed to manage context lengths of up to 192,000 base pairs (bp) at a single-nucleotide resolution. Gene42 models utilize a decoder-only (LLaMA-style) architecture with a dense self-attention mechanism. Initially trained on fixed-length sequences of 4,096 bp, our models underwent continuous pretraining to extend the context length to 192,000 bp. This iterative extension allowed for the comprehensive processing of large-scale genomic data and the capture of intricate patterns and dependencies within the human genome. Gene42 is the first dense attention model capable of handling such extensive long context lengths in genomics, challenging state-space models that often rely on convolutional operators among other mechanisms. Our pretrained models exhibit notably low perplexity values and high reconstruction accuracy, highlighting their strong ability to model genomic data. Extensive experiments on various genomic benchmarks have demonstrated state-of-the-art performance across multiple tasks, including biotype classification, regulatory region identification, chromatin profiling prediction, variant pathogenicity prediction, and species classification. The models are publicly available at huggingface.co/inceptionai.
Submitted 20 March, 2025;
originally announced March 2025.
-
Chem42: a Family of chemical Language Models for Target-aware Ligand Generation
Authors:
Aahan Singh,
Engin Tekin,
Maryam Nadeem,
Nancy A. ElNaker,
Mohammad Amaan Sayeed,
Natalia Vassilieva,
Boulbaba Ben Amor
Abstract:
Revolutionizing drug discovery demands more than just understanding molecular interactions - it requires generative models that can design novel ligands tailored to specific biological targets. While chemical Language Models (cLMs) have made strides in learning molecular properties, most fail to incorporate target-specific insights, restricting their ability to drive de-novo ligand generation. Chem42, a cutting-edge family of generative chemical Language Models, is designed to bridge this gap. By integrating atomic-level interactions with multimodal inputs from Prot42, a complementary protein Language Model, Chem42 achieves a sophisticated cross-modal representation of molecular structures, interactions, and binding patterns. This innovative framework enables the creation of structurally valid, synthetically accessible ligands with enhanced target specificity. Evaluations across diverse protein targets confirm that Chem42 surpasses existing approaches in chemical validity, target-aware design, and predicted binding affinity. By reducing the search space of viable drug candidates, Chem42 could accelerate the drug discovery pipeline, offering a powerful generative AI tool for precision medicine. Our Chem42 models set a new benchmark in molecule property prediction, conditional molecule generation, and target-aware ligand design. The models are publicly available at huggingface.co/inceptionai.
Submitted 11 June, 2025; v1 submitted 20 March, 2025;
originally announced March 2025.
-
A Comprehensive Multi-Vocal Empirical Study of ML Cloud Service Misuses
Authors:
Hadil Ben Amor,
Manel Abdellatif,
Taher Ghaleb
Abstract:
Machine Learning (ML) models are widely used across various domains, including medical diagnostics and autonomous driving. To support this growth, cloud providers offer ML services to ease the integration of ML components in software systems. The evolving business requirements and the popularity of ML services have led practitioners of all skill levels to implement and maintain ML service-based systems. However, they may not always adhere to optimal design and usage practices for ML cloud services, resulting in common misuses that could significantly degrade the quality of ML service-based systems and adversely affect their maintenance and evolution. Though much research has been conducted on ML service misuse, a consistent terminology and specification for these misuses remain absent. In this paper, we therefore conduct a comprehensive, multi-vocal empirical study exploring the prevalence of ML cloud service misuses in practice. We propose a catalog of 20 ML cloud service misuses, most of which have not been studied in prior research. To achieve this, we conducted a) a systematic literature review of studies on ML misuses, b) a gray literature review of the official documentation provided by major cloud providers, c) an empirical analysis of a curated set of 377 ML service-based systems on GitHub, and d) a survey with 50 ML practitioners. Our results show that ML service misuses are common in both open-source projects and industry, often stemming from a lack of understanding of service capabilities and insufficient documentation. This emphasizes the importance of ongoing education in best practices for ML services, which is the focus of this paper, while also highlighting the need for tools to automatically detect and refactor ML misuses.
Submitted 12 March, 2025;
originally announced March 2025.
-
Multifractal Analysis of Physiological Signals: A Novel Approach to Optimizing Pacing Strategy in a Pilot Study
Authors:
Véronique Billat,
Wejdene Nasr Ben Hadj Amor,
Guillaume Saës,
Stéphane Jaffard,
Florent Palacin
Abstract:
Marathons are one of the ultimate challenges of human endeavor. In this paper, we apply recently introduced multifractal techniques, which yield a new classification parameter, to the processing of physiological data captured on marathon runners. Comparing the values of this parameter gives new insight into the way that runners of different levels conduct their run and, ultimately, can be used to give advice on how to improve their performance.
Submitted 10 January, 2025;
originally announced January 2025.
-
Advanced Models for Hourly Marginal CO2 Emission Factor Estimation: A Synergy between Fundamental and Statistical Approaches
Authors:
Souhir Ben Amor,
Smaranda Sgarciu,
Taimyra BatzLineiro,
Felix Muesgens
Abstract:
Global warming is caused by increasing concentrations of greenhouse gases, particularly carbon dioxide (CO2). A metric used to quantify the change in CO2 emissions is the marginal emission factor, defined as the marginal change in CO2 emissions resulting from a marginal change in electricity demand over a specified period. This paper aims to present two methodologies to estimate the marginal emission factor in a decarbonized electricity system with high temporal resolution. First, we present an energy systems model that incrementally calculates the marginal emission factors. Second, we examine a Markov Switching Dynamic Regression model, a statistical model designed to estimate marginal emission factors faster, and we use the incremental marginal emission factor as a benchmark to assess its precision. For the German electricity market, we estimate the marginal emission factor time series historically (2019, 2020) using Agora Energiewende and for the future (2025, 2030, and 2040) using estimated energy system data. The results indicate that the Markov Switching Dynamic Regression model is more accurate in estimating marginal emission factors than the Dynamic Linear Regression models, which are frequently used in the literature. Hence, the Markov Switching Dynamic Regression model is a simpler alternative to the computationally intensive incremental marginal emission factor, especially when short-term marginal emission factor estimation is needed. The results of the marginal emission factor estimation are applied to an exemplary low-emission vehicle charging scenario to estimate CO2 savings by shifting the charge hours to those corresponding to the lower marginal emission factor. By implementing this emission-minimized charging approach, an average reduction of 31% in the marginal emission factor was achieved over the 5 years.
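The charge-shifting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' model: it only assumes that an hourly marginal emission factor series is already available, and it picks the lowest-factor hours for charging. The function names and the sample values are hypothetical and chosen purely for illustration.

```python
def pick_charging_hours(mef_by_hour, hours_needed):
    """Return the indices of the hours with the lowest marginal emission
    factor (MEF), in chronological order.

    mef_by_hour  -- hypothetical list of hourly MEF values (e.g. tCO2/MWh)
    hours_needed -- number of hours the vehicle must charge
    """
    if not 0 < hours_needed <= len(mef_by_hour):
        raise ValueError("hours_needed must be between 1 and the number of hours")
    # Rank hour indices by their MEF, lowest first.
    ranked = sorted(range(len(mef_by_hour)), key=lambda h: mef_by_hour[h])
    # Keep the cheapest (in emissions) hours and restore time order.
    return sorted(ranked[:hours_needed])


def mean_mef(mef_by_hour, hours):
    """Average MEF over the selected hours."""
    return sum(mef_by_hour[h] for h in hours) / len(hours)


def relative_reduction(baseline, optimized):
    """Fractional reduction of the optimized value relative to the baseline."""
    return (baseline - optimized) / baseline


if __name__ == "__main__":
    # Made-up values, not data from the paper.
    mef = [0.8, 0.6, 0.2, 0.3, 0.9, 0.7]
    baseline_hours = [0, 1]                      # charge immediately on plug-in
    shifted_hours = pick_charging_hours(mef, 2)  # charge in the lowest-MEF hours
    saving = relative_reduction(mean_mef(mef, baseline_hours),
                                mean_mef(mef, shifted_hours))
    print(shifted_hours, round(saving, 3))
```

A realistic scheduler would also have to respect availability windows and charging power limits, which this sketch ignores.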
Submitted 23 December, 2024;
originally announced December 2024.
-
Statistical Precoder Design in Multi-User Systems via Graph Neural Networks and Generative Modeling
Authors:
Nurettin Turan,
Srikar Allaparapu,
Donia Ben Amor,
Benedikt Böck,
Michael Joham,
Wolfgang Utschick
Abstract:
This letter proposes a graph neural network (GNN)-based framework for statistical precoder design that leverages model-based insights to compactly represent statistical knowledge, resulting in efficient, lightweight architectures. The framework also supports approximate statistical information in frequency division duplex (FDD) systems obtained through a Gaussian mixture model (GMM)-based limited feedback scheme in massive multiple-input multiple-output (MIMO) systems with low pilot overhead. Simulations using a spatial channel model and measurement data demonstrate the effectiveness of the proposed framework. It outperforms baseline methods, including stochastic iterative algorithms and Discrete Fourier transform (DFT) codebook-based approaches, particularly in low pilot overhead systems.
Submitted 10 December, 2024;
originally announced December 2024.
-
From Mystery to Mastery: Failure Diagnosis for Improving Manipulation Policies
Authors:
Som Sagar,
Jiafei Duan,
Sreevishakh Vasudevan,
Yifan Zhou,
Heni Ben Amor,
Dieter Fox,
Ransalu Senanayake
Abstract:
Robot manipulation policies often fail for unknown reasons, posing significant challenges for real-world deployment. Researchers and engineers typically address these failures using heuristic approaches, which are not only labor-intensive and costly but also prone to overlooking critical failure modes (FMs). This paper introduces Robot Manipulation Diagnosis (RoboMD), a systematic framework designed to automatically identify FMs arising from unanticipated changes in the environment. Considering the vast space of potential FMs in a pre-trained manipulation policy, we leverage deep reinforcement learning (deep RL) to explore and uncover these FMs using a specially trained vision-language embedding that encodes a notion of failures. This approach enables users to probabilistically quantify and rank failures in previously unseen environmental conditions. Through extensive experiments across various manipulation tasks and algorithms, we demonstrate RoboMD's effectiveness in diagnosing unknown failures in unstructured environments, providing a systematic pathway to improve the robustness of manipulation policies.
Submitted 8 February, 2025; v1 submitted 3 December, 2024;
originally announced December 2024.
-
Bridging an energy system model with an ensemble deep-learning approach for electricity price forecasting
Authors:
Souhir Ben Amor,
Thomas Möbius,
Felix Müsgens
Abstract:
This paper combines a techno-economic energy system model with an econometric model to maximise electricity price forecasting accuracy. The proposed combination model is tested on the German day-ahead wholesale electricity market. Our paper also benchmarks the results against several econometric alternatives. Lastly, we demonstrate the economic value of improved price estimators maximising the revenue from an electric storage resource. The results demonstrate that our integrated model improves overall forecasting accuracy by 18 %, compared to available literature benchmarks. Furthermore, our robustness checks reveal that a) the Ensemble Deep Neural Network model performs best in our dataset and b) adding output from the techno-economic energy systems model as econometric model input improves the performance of all econometric models. The empirical relevance of the forecast improvement is confirmed by the results of the exemplary storage optimisation, in which the integration of the techno-economic energy system model leads to a revenue increase of up to 10 %.
Submitted 7 November, 2024;
originally announced November 2024.
-
Repairing Neural Networks for Safety in Robotic Systems using Predictive Models
Authors:
Keyvan Majd,
Geoffrey Clark,
Georgios Fainekos,
Heni Ben Amor
Abstract:
This paper introduces a new method for safety-aware robot learning, focusing on repairing policies using predictive models. Our method combines behavioral cloning with neural network repair in a two-step supervised learning framework. It first learns a policy from expert demonstrations and then applies repair subject to predictive models to enforce safety constraints. The predictive models can encompass various aspects relevant to robot learning applications, such as proprioceptive states and collision likelihood. Our experimental results demonstrate that the learned policy successfully adheres to a predefined set of safety constraints on two applications: mobile robot navigation, and real-world lower-leg prostheses. Additionally, we have shown that our method effectively reduces repeated interaction with the robot, leading to substantial time savings during the learning process.
Submitted 6 November, 2024;
originally announced November 2024.
-
Robust Precoding for FDD MISO Systems via Minorization Maximization
Authors:
Donia Ben Amor,
Michael Joham,
Wolfgang Utschick
Abstract:
In this work, we propose an approach to robust precoder design based on a minorization maximization technique that optimizes a surrogate function of the achievable spectral efficiency. The presented method accounts for channel estimation errors during the optimization process and is, hence, robust in the case of imperfect channel state information (CSI). Additionally, the design method is adapted such that the need for a line search to satisfy the power constraint is eliminated, which significantly accelerates the precoder computation. Simulation results demonstrate that the proposed robust precoding method is competitive with weighted minimum mean square error (WMMSE) precoding, in particular under imperfect CSI scenarios.
Submitted 4 November, 2024;
originally announced November 2024.
-
Bilinear Precoder Based Efficient Rate Splitting Method in FDD Systems
Authors:
Sadaf Syed,
Donia Ben Amor,
Michael Joham,
Wolfgang Utschick
Abstract:
In this work, we propose a low-cost rate splitting (RS) technique for a multi-user multiple-input single-output (MISO) system operating in frequency division duplex (FDD) mode. The proposed iterative optimisation algorithm only depends on the second-order statistical channel knowledge and the pilot training matrix. Additionally, it offers a closed-form solution in each update step. This reduces the design complexity of the system drastically as we only need to optimise the precoding filters in every coherence interval of the covariance matrices, instead of doing that in every channel state information (CSI) coherence interval. Moreover, since the algorithm is based on closed-form solutions, there is no need for interior point solvers like CVX, which are typically required in most state-of-the-art techniques.
Submitted 4 November, 2024;
originally announced November 2024.
-
Scaling up to Problem Sizes: An Environmental Life Cycle Assessment of Quantum Computing
Authors:
Sylvain Cordier,
Karl Thibault,
Marie-Luc Arpin,
Ben Amor
Abstract:
With the demonstrated ability to perform calculations in seconds that would take classical supercomputers thousands of years, quantum computers hold the promise of radically advancing sustainable IT. However, quantum computers face challenges due to the inherent noise in physical qubits, necessitating error correction for reliable operation in solving industrial-scale problems, which will require more computation time, energy, and electronic components than initial laboratory-scale experiments. Yet, while researchers have modeled and analyzed the environmental impacts of classical computers using Life Cycle Assessment (LCA), the environmental performance of quantum computing remains unknown to date. This study contributes to filling this critical gap in two ways: (1) by establishing an environmental profile for quantum computers based on superconducting qubits; and (2) by comparing it to a functionally equivalent profile of a state-of-the-art supercomputer. With the comparison based on the problem size, the paper shows how the usage time can drive an environmental advantage for quantum computers under specific scaling conditions and quantum error correcting codes. The results emphasize that quantum error correction hardware has a substantial environmental impact due to the numerous electronic components needed to achieve 100 logical qubits. This paper can serve as a basis for designing more environmentally friendly quantum computers and for establishing their environmental profiles, as well as those of the human activities that will use them.
Submitted 15 March, 2025; v1 submitted 31 October, 2024;
originally announced November 2024.
-
From English-Centric to Effective Bilingual: LLMs with Custom Tokenizers for Underrepresented Languages
Authors:
Artur Kiulian,
Anton Polishko,
Mykola Khandoga,
Yevhen Kostiuk,
Guillermo Gabrielli,
Łukasz Gagała,
Fadi Zaraket,
Qusai Abu Obaida,
Hrishikesh Garud,
Wendy Wing Yee Mak,
Dmytro Chaplynskyi,
Selma Belhadj Amor,
Grigol Peradze
Abstract:
In this paper, we propose a model-agnostic cost-effective approach to developing bilingual base large language models (LLMs) to support English and any target language. The method includes vocabulary expansion, initialization of new embeddings, model training and evaluation. We performed our experiments with three languages, each using a non-Latin script - Ukrainian, Arabic, and Georgian.
Our approach demonstrates improved language performance while reducing computational costs. It mitigates the disproportionate penalization of underrepresented languages, promoting fairness and minimizing adverse phenomena such as code-switching and broken grammar. Additionally, we introduce new metrics to evaluate language quality, revealing that vocabulary size significantly impacts the quality of generated text.
Submitted 24 October, 2024;
originally announced October 2024.
-
SiSCo: Signal Synthesis for Effective Human-Robot Communication Via Large Language Models
Authors:
Shubham Sonawani,
Fabian Weigend,
Heni Ben Amor
Abstract:
Effective human-robot collaboration hinges on robust communication channels, with visual signaling playing a pivotal role due to its intuitive appeal. Yet, the creation of visually intuitive cues often demands extensive resources and specialized knowledge. The emergence of Large Language Models (LLMs) offers promising avenues for enhancing human-robot interactions and revolutionizing the way we generate context-aware visual cues. To this end, we introduce SiSCo--a novel framework that combines the computational power of LLMs with mixed-reality technologies to streamline the creation of visual cues for human-robot collaboration. Our results show that SiSCo improves the efficiency of communication in human-robot teaming tasks, reducing task completion time by approximately 73% and increasing task success rates by 18% compared to baseline natural language signals. Additionally, SiSCo reduces cognitive load for participants by 46%, as measured by the NASA-TLX subscale, and receives above-average user ratings for on-the-fly signals generated for unseen objects. To encourage further development and broader community engagement, we provide full access to SiSCo's implementation and related materials on our GitHub repository.
Submitted 20 September, 2024;
originally announced September 2024.
-
Can language-guided unsupervised adaptation improve medical image classification using unpaired images and texts?
Authors:
Umaima Rahman,
Raza Imam,
Mohammad Yaqub,
Boulbaba Ben Amor,
Dwarikanath Mahapatra
Abstract:
In medical image classification, supervised learning is challenging due to the scarcity of labeled medical images. To address this, we leverage the visual-textual alignment within Vision-Language Models (VLMs) to enable unsupervised learning of a medical image classifier. In this work, we propose Medical Unsupervised Adaptation (MedUnA) of VLMs, where the LLM-generated descriptions for each class are encoded into text embeddings and matched with class labels via a cross-modal adapter. This adapter attaches to a visual encoder of MedCLIP and aligns the visual embeddings through unsupervised learning, driven by a contrastive entropy-based loss and prompt tuning. This improves performance in scenarios where textual information is more abundant than labeled images, particularly in the healthcare domain. Unlike traditional VLMs, MedUnA uses unpaired images and text for learning representations and enhances the potential of VLMs beyond traditional constraints. We evaluate the performance on three chest X-ray datasets and two multi-class datasets (diabetic retinopathy and skin lesions), showing significant accuracy gains over the zero-shot baseline. Our code is available at https://github.com/rumaima/meduna.
Submitted 29 March, 2025; v1 submitted 3 September, 2024;
originally announced September 2024.
-
A Comparison of Imitation Learning Algorithms for Bimanual Manipulation
Authors:
Michael Drolet,
Simon Stepputtis,
Siva Kailas,
Ajinkya Jain,
Jan Peters,
Stefan Schaal,
Heni Ben Amor
Abstract:
Amidst the wide popularity of imitation learning algorithms in robotics, their properties regarding hyperparameter sensitivity, ease of training, data efficiency, and performance have not been well-studied in high-precision industry-inspired environments. In this work, we demonstrate the limitations and benefits of prominent imitation learning approaches and analyze their capabilities regarding these properties. We evaluate each algorithm on a complex bimanual manipulation task involving an over-constrained dynamics system in a setting involving multiple contacts between the manipulated object and the environment. While we find that imitation learning is well suited to solve such complex tasks, not all algorithms are equal in terms of handling environmental and hyperparameter perturbations, training requirements, performance, and ease of use. We investigate the empirical influence of these key characteristics by employing a carefully designed experimental procedure and learning environment. Paper website: https://bimanual-imitation.github.io/
Submitted 24 August, 2024; v1 submitted 12 August, 2024;
originally announced August 2024.
-
Achieving Human Level Competitive Robot Table Tennis
Authors:
David B. D'Ambrosio,
Saminda Abeyruwan,
Laura Graesser,
Atil Iscen,
Heni Ben Amor,
Alex Bewley,
Barney J. Reed,
Krista Reymann,
Leila Takayama,
Yuval Tassa,
Krzysztof Choromanski,
Erwin Coumans,
Deepali Jain,
Navdeep Jaitly,
Natasha Jaques,
Satoshi Kataoka,
Yuheng Kuang,
Nevena Lazic,
Reza Mahjourian,
Sherry Moore,
Kenneth Oslund,
Anish Shankar,
Vikas Sindhwani,
Vincent Vanhoucke,
Grace Vesom
, et al. (2 additional authors not shown)
Abstract:
Achieving human-level speed and performance on real world tasks is a north star for the robotics research community. This work takes a step towards that goal and presents the first learned robot agent that reaches amateur human-level performance in competitive table tennis. Table tennis is a physically demanding sport which requires human players to undergo years of training to achieve an advanced level of proficiency. In this paper, we contribute (1) a hierarchical and modular policy architecture consisting of (i) low level controllers with their detailed skill descriptors which model the agent's capabilities and help to bridge the sim-to-real gap and (ii) a high level controller that chooses the low level skills, (2) techniques for enabling zero-shot sim-to-real including an iterative approach to defining the task distribution that is grounded in the real-world and defines an automatic curriculum, and (3) real time adaptation to unseen opponents. Policy performance was assessed through 29 robot vs. human matches of which the robot won 45% (13/29). All humans were unseen players and their skill level varied from beginner to tournament level. Whilst the robot lost all matches vs. the most advanced players, it won 100% of matches vs. beginners and 55% of matches vs. intermediate players, demonstrating solidly amateur human-level performance. Videos of the matches can be viewed at https://sites.google.com/view/competitive-robot-table-tennis
Submitted 1 May, 2025; v1 submitted 7 August, 2024;
originally announced August 2024.
-
Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models
Authors:
Mohammed Al-Maamari,
Mehdi Ben Amor,
Michael Granitzer
Abstract:
This research combines Knowledge Distillation (KD) and Mixture of Experts (MoE) to develop modular, efficient multilingual language models. Key objectives include evaluating adaptive versus fixed alpha methods in KD and comparing modular MoE architectures for handling multi-domain inputs and preventing catastrophic forgetting. KD compresses large language models (LLMs) into smaller, efficient models, while MoE enhances modularity with specialized tasks. Experiments showed similar performance for both KD methods, with marginal improvements from adaptive alpha. A combined loss approach provided more stable learning. The router, trained to classify input sequences into English, French, German, or Python, achieved 99.95% precision, recall, and F1 score, with Logistic Regression being the most effective classifier. Evaluations of modular MoE architectures revealed that Pre-trained Language Experts (PLE) and Joint Expert Embedding Training (JEET) performed similarly, while the MoE with Common Expert (MoE-CE) setup showed slightly lower performance. Including a common expert in MoE-CE improved its performance. Studies on catastrophic forgetting indicated that sequential training led to significant forgetting, while single-session training with balanced batches and the MoE approach mitigated this issue. The MoE architecture preserved knowledge across multiple languages effectively.
The research contributes open-sourced resources including the dataset (https://zenodo.org/doi/10.5281/zenodo.12677631), a balanced dataset creation tool (https://github.com/padas-lab-de/multi-language-dataset-creator), and the research codebase (https://github.com/ModMaamari/mixture-modular-experts).
Submitted 28 July, 2024;
originally announced July 2024.
-
Ubiquitous Robot Control Through Multimodal Motion Capture Using Smartwatch and Smartphone Data
Authors:
Fabian C Weigend,
Neelesh Kumar,
Oya Aran,
Heni Ben Amor
Abstract:
We present an open-source library for seamless robot control through motion capture using smartphones and smartwatches. Our library features three modes: Watch Only Mode, enabling control with a single smartwatch; Upper Arm Mode, offering heightened accuracy by incorporating the smartphone attached to the upper arm; and Pocket Mode, determining body orientation via the smartphone placed in any pocket. These modes are applied in two real-robot tasks, showcasing placement accuracy within 2 cm compared to a gold-standard motion capture system. WearMoCap stands as a suitable alternative to conventional motion capture systems, particularly in environments where ubiquity is essential. The library is available at: www.github.com/wearable-motion-capture.
Submitted 3 June, 2024;
originally announced June 2024.
-
Design of a Multi-User RIS-Aided System with Statistical Channel Knowledge
Authors:
Sadaf Syed,
Dominik Semmler,
Donia Ben Amor,
Michael Joham,
Wolfgang Utschick
Abstract:
Reconfigurable intelligent surface (RIS) is a promising technology to enhance the spectral and energy efficiency in a wireless communication system. The design of the phase shifts of an RIS in every channel coherence interval demands a huge training overhead, making its deployment practically infeasible. The design complexity can be significantly reduced by exploiting the second-order statistics of the channels. This paper is the extension of our previous work to the design of an RIS for the multi-user setup, where we employ maximisation of the lower bound of the achievable sum-rate of the users. Unlike for the single-user case, obtaining a closed-form expression for the update of the filters and phase shifts is more challenging in the multi-user case. We resort to the fractional programming (FP) approach and the non-convex block coordinate descent (BCD) method to solve the optimisation problem. As the phase shifts of the RIS obtained by the proposed algorithms are based on the statistical channel knowledge, they do not need to be updated in every channel coherence interval.
Submitted 14 May, 2024;
originally announced May 2024.
-
Med42 -- Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches
Authors:
Clément Christophe,
Praveen K Kanithi,
Prateek Munjal,
Tathagata Raha,
Nasir Hayat,
Ronnie Rajan,
Ahmed Al-Mahrooqi,
Avani Gupta,
Muhammad Umar Salman,
Gurpreet Gosal,
Bhargav Kanakiya,
Charles Chen,
Natalia Vassilieva,
Boulbaba Ben Amor,
Marco AF Pimentel,
Shadab Khan
Abstract:
This study presents a comprehensive analysis and comparison of two predominant fine-tuning methodologies - full-parameter fine-tuning and parameter-efficient tuning - within the context of medical Large Language Models (LLMs). We developed and refined a series of LLMs, based on the Llama-2 architecture, specifically designed to enhance medical knowledge retrieval, reasoning, and question-answering capabilities. Our experiments systematically evaluate the effectiveness of these tuning strategies across various well-known medical benchmarks. Notably, our medical LLM Med42 showed an accuracy level of 72% on the US Medical Licensing Examination (USMLE) datasets, setting a new standard in performance for openly available medical LLMs. Through this comparative analysis, we aim to identify the most effective and efficient method for fine-tuning LLMs in the medical domain, thereby contributing significantly to the advancement of AI-driven healthcare applications.
Submitted 23 April, 2024;
originally announced April 2024.
-
Modified Bergman spaces on the unit ball of $\mathbb C^n$ and applications
Authors:
Hajer Ben Amor,
Noureddine Ghiloufi
Abstract:
In this paper, we introduce new spaces of holomorphic functions on the unit ball $\mathbb{B}_{n}$ of $\mathbb{C}^{n}$ generalizing the classical Bergman spaces. The main results include the properties of some operators and integral representations such as Bergman-type projections and the Berezin transform.
Submitted 20 April, 2024;
originally announced April 2024.
-
Enabling Stateful Behaviors for Diffusion-based Policy Learning
Authors:
Xiao Liu,
Fabian Weigend,
Yifan Zhou,
Heni Ben Amor
Abstract:
While imitation learning provides a simple and effective framework for policy learning, acquiring consistent actions during robot execution remains a challenging task. Existing approaches primarily focus on either modifying the action representation at the data curation stage or altering the model itself, neither of which fully addresses the scalability of consistent action generation. To overcome this limitation, we introduce the Diff-Control policy, which utilizes a diffusion-based model to learn the action representation from a state-space modeling viewpoint. We demonstrate that we can reduce diffusion-based policies' uncertainty by making them stateful through a Bayesian formulation facilitated by ControlNet, leading to improved robustness and success rates. Our experimental results demonstrate the significance of incorporating action statefulness in policy learning, where Diff-Control shows improved performance across various tasks. Specifically, Diff-Control achieves an average success rate of 72% and 84% on stateful and dynamic tasks, respectively. Project page: https://github.com/ir-lab/Diff-Control
Submitted 22 July, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction
Authors:
Lan Feng,
Mohammadhossein Bahari,
Kaouther Messaoud Ben Amor,
Éloi Zablocki,
Matthieu Cord,
Alexandre Alahi
Abstract:
Vehicle trajectory prediction has increasingly relied on data-driven solutions, but their ability to scale to different data domains and the impact of larger dataset sizes on their generalization remain under-explored. While these questions can be studied by employing multiple datasets, it is challenging due to several discrepancies, e.g., in data formats, map resolution, and semantic annotation types. To address these challenges, we introduce UniTraj, a comprehensive framework that unifies various datasets, models, and evaluation criteria, presenting new opportunities for the vehicle trajectory prediction field. In particular, using UniTraj, we conduct extensive experiments and find that model performance significantly drops when transferred to other datasets. However, enlarging data size and diversity can substantially improve performance, leading to a new state-of-the-art result for the nuScenes dataset. We provide insights into dataset characteristics to explain these findings. The code can be found here: https://github.com/vita-epfl/UniTraj
Submitted 7 August, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
An Efficient Rate Splitting Precoding Approach in Multi-User MISO FDD Systems
Authors:
Donia Ben Amor,
Michael Joham,
Wolfgang Utschick
Abstract:
In this work, we develop an efficient precoding strategy for a multi-user multiple-input single-output (MU MISO) system operating in frequency-division-duplex (FDD) mode, where rate splitting multiple access (RSMA) is implemented. To this end, we consider one-layer RS and show its significant impact on the system performance, specifically in the case where the channel state information (CSI) is incomplete at the transmitter. Based on a lower bound on the achievable rate that takes into account the CSI errors, we establish an augmented weighted average mean squared error (AWAMSE) algorithm for the RS setup denoted by AWAMSE-RS, where even the updates for the common and the private precoders are computed via analytical expressions, hence circumventing the need for interior-point methods. Simulation results validate the efficiency of our approach in terms of computational time and its competitiveness in terms of the achievable system throughput compared to state-of-the-art methods and non-RS setups.
Submitted 21 March, 2024;
originally announced March 2024.
-
iRoCo: Intuitive Robot Control From Anywhere Using a Smartwatch
Authors:
Fabian C Weigend,
Xiao Liu,
Shubham Sonawani,
Neelesh Kumar,
Venugopal Vasudevan,
Heni Ben Amor
Abstract:
This paper introduces iRoCo (intuitive Robot Control) - a framework for ubiquitous human-robot collaboration using a single smartwatch and smartphone. By integrating probabilistic differentiable filters, iRoCo optimizes a combination of precise robot control and unrestricted user movement from ubiquitous devices. We demonstrate and evaluate the effectiveness of iRoCo in practical teleoperation and drone piloting applications. Comparative analysis shows no significant difference between task performance with iRoCo and gold-standard control systems in teleoperation tasks. Additionally, iRoCo users complete drone piloting tasks 32% faster than with a traditional remote control and report less frustration in a subjective load index questionnaire. Our findings strongly suggest that iRoCo is a promising new approach for intuitive robot control through smartwatches and smartphones from anywhere, at any time. The code is available at www.github.com/wearable-motion-capture
Submitted 11 March, 2024;
originally announced March 2024.
-
Task Success is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors
Authors:
Lin Guan,
Yifan Zhou,
Denis Liu,
Yantian Zha,
Heni Ben Amor,
Subbarao Kambhampati
Abstract:
Large-scale generative models are shown to be useful for sampling meaningful candidate solutions, yet they often overlook task constraints and user preferences. Their full power is better harnessed when the models are coupled with external verifiers and the final solutions are derived iteratively or progressively according to the verification feedback. In the context of embodied AI, verification often solely involves assessing whether goal conditions specified in the instructions have been met. Nonetheless, for these agents to be seamlessly integrated into daily life, it is crucial to account for a broader range of constraints and preferences beyond bare task success (e.g., a robot should grasp bread with care to avoid significant deformations). However, given the unbounded scope of robot tasks, it is infeasible to construct scripted verifiers akin to those used for explicit-knowledge tasks like the game of Go and theorem proving. This begs the question: when no sound verifier is available, can we use large vision and language models (VLMs), which are approximately omniscient, as scalable Behavior Critics to catch undesirable robot behaviors in videos? To answer this, we first construct a benchmark that contains diverse cases of goal-reaching yet undesirable robot policies. Then, we comprehensively evaluate VLM critics to gain a deeper understanding of their strengths and failure modes. Based on the evaluation, we provide guidelines on how to effectively utilize VLM critiques and showcase a practical way to integrate the feedback into an iterative process of policy refinement. The dataset and codebase are released at: https://guansuns.github.io/pages/vlm-critic.
Submitted 11 August, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Towards Measuring Representational Similarity of Large Language Models
Authors:
Max Klabunde,
Mehdi Ben Amor,
Michael Granitzer,
Florian Lemmerich
Abstract:
Understanding the similarity of the numerous released large language models (LLMs) has many uses, e.g., simplifying model selection, detecting illegal model reuse, and advancing our understanding of what makes LLMs perform well. In this work, we measure the similarity of representations of a set of LLMs with 7B parameters. Our results suggest that some LLMs are substantially different from others. We identify challenges of using representational similarity measures that suggest the need for careful study of similarity scores to avoid false conclusions.
Submitted 5 December, 2023;
originally announced December 2023.
-
Highly Accelerated Weighted MMSE Algorithms for Designing Precoders in FDD Systems with Incomplete CSI
Authors:
Donia Ben Amor,
Michael Joham,
Wolfgang Utschick
Abstract:
In this work, we derive a lower bound on the training-based achievable downlink (DL) sum rate (SR) of a multi-user multiple-input-single-output (MISO) system operating in frequency-division-duplex (FDD) mode. Assuming linear minimum mean square error (LMMSE) channel estimation is used, we establish a connection of the derived lower bound on the signal-to-interference-noise-ratio (SINR) to an average MSE that allows us to reformulate the SR maximization problem as the minimization of the augmented weighted average MSE (AWAMSE). We propose an iterative precoder design with three alternating steps, all given in closed form, drastically reducing the computation time. We show numerically the effectiveness of the proposed approach in challenging scenarios with limited channel knowledge, i.e., we consider scenarios with a very limited number of pilots. We additionally propose a more efficient version of the well-known stochastic iterative WMMSE (SIWMMSE) approach, where the precoder update is given in closed form.
Submitted 4 December, 2023;
originally announced December 2023.
-
Multimodal Learning of Soft Robot Dynamics using Differentiable Filters
Authors:
Xiao Liu,
Yifan Zhou,
Shuhei Ikemoto,
Heni Ben Amor
Abstract:
Differentiable Filters, as recursive Bayesian estimators, possess the ability to learn complex dynamics by deriving state transition and measurement models exclusively from data. This data-driven approach eliminates the reliance on explicit analytical models while maintaining the essential algorithmic components of the filtering process. However, the gain mechanism remains non-differentiable, limiting its adaptability to specific task requirements and contextual variations. To address this limitation, this paper introduces an innovative approach called α-MDF (Attention-based Multimodal Differentiable Filter). α-MDF leverages modern attention mechanisms to learn multimodal latent representations for accurate state estimation in soft robots. By incorporating attention mechanisms, α-MDF offers the flexibility to tailor the gain mechanism to the unique nature of the task and context. The effectiveness of α-MDF is validated through real-world state estimation tasks on soft robots. Our experimental results demonstrate significant reductions in state estimation errors, consistently surpassing differentiable filter baselines by up to 45% in the domain of soft robotics.
Submitted 12 November, 2023;
originally announced November 2023.
-
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Authors:
Open X-Embodiment Collaboration,
Abby O'Neill,
Abdul Rehman,
Abhinav Gupta,
Abhiram Maddukuri,
Abhishek Gupta,
Abhishek Padalkar,
Abraham Lee,
Acorn Pooley,
Agrim Gupta,
Ajay Mandlekar,
Ajinkya Jain,
Albert Tung,
Alex Bewley,
Alex Herzog,
Alex Irpan,
Alexander Khazatsky,
Anant Rai,
Anchit Gupta,
Andrew Wang,
Andrey Kolobov,
Anikait Singh,
Animesh Garg,
Aniruddha Kembhavi,
Annie Xie
, et al. (269 additional authors not shown)
Abstract:
Large, high-capacity models trained on diverse datasets have shown remarkable successes in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.
Submitted 14 May, 2025; v1 submitted 13 October, 2023;
originally announced October 2023.
-
Mise en œuvre d'une ingénierie didactique de développement dans le cadre d'un travail collaboratif chercheur/enseignant lors de la conceptualisation des objets de l'Analyse au début du cursus dans le supérieur
Authors:
Fatma Belhaj Amor
Abstract:
At the start of the higher education curriculum, the conceptualization of local approximation objects of a function requires the articulation of knowledge and skills from Functional Analysis and Topology. In the study of functions, a number of studies have established the existence of difficulties encountered by students, mainly as a result of the change of didactic contract during the transition from secondary to higher education. The construction of a teaching-learning project, as part of a collaborative effort with the class teacher, a priori helps students to overcome the main difficulties inherent in conceptualizing the local approximation objects of a function in the first year of preparatory classes. In the case of the design and implementation of didactic development engineering, analysis of the reasoning produced by students confronted with a situation with an adidactic dimension will enable us a priori to study the nature and origin of these difficulties. Our methodology for analyzing student work is based on a model of reasoning analysis within the framework of the theory of didactic situations in mathematics. This model has played an essential role in the development of didactic engineering and in the identification of students' conceptions and of the forms and functions of their reasoning. It also enabled us to identify epistemological, didactic and cultural obstacles to learning the concept of local approximation of a function. These obstacles result either from the paradigm shift that takes place during the transition from secondary to higher education, or from working within the paradigm of Infinitesimal Analysis during the appropriation of this mathematical concept.
Submitted 22 August, 2023;
originally announced October 2023.
-
Probabilistic Differentiable Filters Enable Ubiquitous Robot Control with Smartwatches
Authors:
Fabian C Weigend,
Xiao Liu,
Heni Ben Amor
Abstract:
Ubiquitous robot control and human-robot collaboration using smart devices pose a challenging problem, primarily due to strict accuracy requirements and sparse information. This paper presents a novel approach that incorporates a probabilistic differentiable filter, specifically the Differentiable Ensemble Kalman Filter (DEnKF), to facilitate robot control solely using Inertial Measurement Units (IMUs) from a smartwatch and a smartphone. The implemented system is cost-effective and achieves accurate estimation of the human pose state. Experimental results from human-robot handover tasks underscore that smart devices allow versatile and ubiquitous robot control. The code for this paper is available at https://github.com/ir-lab/DEnKF and https://github.com/wearable-motion-capture.
Submitted 3 October, 2023; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Design of a Single-User RIS-Aided MISO System Based on Statistical Channel Knowledge
Authors:
Sadaf Syed,
Dominik Semmler,
Donia Ben Amor,
Michael Joham,
Wolfgang Utschick
Abstract:
Reconfigurable intelligent surface (RIS) is considered a prospective technology for beyond fifth-generation (5G) networks to improve the spectral and energy efficiency at a low cost. Prior works on the RIS mainly rely on perfect channel state information (CSI), which imposes a huge computational complexity. This work considers a single-user RIS-assisted communication system, where the second-order statistical knowledge of the channels is exploited to reduce the training overhead. We present algorithms that do not require estimation of the CSI and reconfiguration of the RIS in every channel coherence interval, which constitutes one of the most critical practical issues in an RIS-aided system.
Submitted 8 September, 2023;
originally announced September 2023.
-
Projecting Robot Intentions Through Visual Cues: Static vs. Dynamic Signaling
Authors:
Shubham Sonawani,
Yifan Zhou,
Heni Ben Amor
Abstract:
Augmented and mixed-reality techniques harbor great potential for improving human-robot collaboration. Visual signals and cues may be projected to a human partner in order to explicitly communicate robot intentions and goals. However, it is unclear what type of signals support such a process and whether signals can be combined without adding additional cognitive stress to the partner. This paper focuses on identifying the effective types of visual signals and quantifying their impact through empirical evaluations. In particular, the study compares static and dynamic visual signals within a collaborative object sorting task and assesses their ability to shape human behavior. Furthermore, an information-theoretic analysis is performed to numerically quantify the degree of information transfer between visual signals and human behavior. The results of a human subject experiment show that there are significant advantages to combining multiple visual signals within a single task, i.e., increased task efficiency and reduced cognitive load.
Submitted 18 August, 2023;
originally announced August 2023.
-
Enhancing State Estimation in Robots: A Data-Driven Approach with Differentiable Ensemble Kalman Filters
Authors:
Xiao Liu,
Geoffrey Clark,
Joseph Campbell,
Yifan Zhou,
Heni Ben Amor
Abstract:
This paper introduces a novel state estimation framework for robots using differentiable ensemble Kalman filters (DEnKF). DEnKF is a reformulation of the traditional ensemble Kalman filter that employs stochastic neural networks to model the process noise implicitly. Our work is an extension of previous research on differentiable filters, which has provided a strong foundation for our modular and end-to-end differentiable framework. This framework enables each component of the system to function independently, leading to improved flexibility and versatility in implementation. Through a series of experiments, we demonstrate the flexibility of this model across a diverse set of real-world tracking tasks, including visual odometry and robot manipulation. Moreover, we show that our model effectively handles noisy observations, is robust in the absence of observations, and outperforms state-of-the-art differentiable filters in terms of error metrics. Specifically, we observe a significant improvement of at least 59% in translational error when using DEnKF with noisy observations. Our results underscore the potential of DEnKF in advancing state estimation for robotics. Code for DEnKF is available at https://github.com/ir-lab/DEnKF
Submitted 18 August, 2023;
originally announced August 2023.
-
Learning Soft Robot Dynamics using Differentiable Kalman Filters and Spatio-Temporal Embeddings
Authors:
Xiao Liu,
Shuhei Ikemoto,
Yuhei Yoshimitsu,
Heni Ben Amor
Abstract:
This paper introduces a novel approach for modeling the dynamics of soft robots, utilizing a differentiable filter architecture. The proposed approach enables end-to-end training to learn system dynamics, noise characteristics, and temporal behavior of the robot. A novel spatio-temporal embedding process is discussed to handle observations with varying sensor placements and sampling frequencies. The efficacy of this approach is demonstrated on a tensegrity robot arm by learning end-effector dynamics from demonstrations with complex bending motions. The model is proven to be robust against missing modalities, diverse sensor placement, and varying sampling rates. Additionally, the proposed framework is shown to identify physical interactions with humans during motion. The utilization of a differentiable filter presents a novel solution to the difficulties of modeling soft robot dynamics. Our approach shows substantial improvement in accuracy compared to state-of-the-art filtering methods, with at least a 24% reduction in mean absolute error (MAE) observed. Furthermore, the predicted end-effector positions show an average MAE of 25.77mm from the ground truth, highlighting the advantage of our approach. The code is available at https://github.com/ir-lab/soft_robot_DEnKF.
Submitted 18 August, 2023;
originally announced August 2023.
-
Anytime, Anywhere: Human Arm Pose from Smartwatch Data for Ubiquitous Robot Control and Teleoperation
Authors:
Fabian C Weigend,
Shubham Sonawani,
Michael Drolet,
Heni Ben Amor
Abstract:
This work devises an optimized machine learning approach for human arm pose estimation from a single smartwatch. Our approach results in a distribution of possible wrist and elbow positions, which allows for a measure of uncertainty and the detection of multiple possible arm posture solutions, i.e., multimodal pose distributions. Combining estimated arm postures with speech recognition, we turn the smartwatch into a ubiquitous, low-cost and versatile robot control interface. We demonstrate in two use-cases that this intuitive control interface enables users to swiftly intervene in robot behavior, to temporarily adjust their goal, or to train completely new control policies by imitation. Extensive experiments show that the approach results in a 40% reduction in prediction error over the current state-of-the-art and achieves a mean error of 2.56cm for wrist and elbow positions. The code is available at https://github.com/wearable-motion-capture.
Submitted 17 October, 2023; v1 submitted 22 June, 2023;
originally announced June 2023.
-
Technical Report: Impact of Position Bias on Language Models in Token Classification
Authors:
Mehdi Ben Amor,
Michael Granitzer,
Jelena Mitrović
Abstract:
Language Models (LMs) have shown state-of-the-art performance in Natural Language Processing (NLP) tasks. Downstream tasks such as Named Entity Recognition (NER) or Part-of-Speech (POS) tagging are known to suffer from data imbalance issues, particularly regarding the ratio of positive to negative examples and class disparities. This paper investigates an often-overlooked issue of encoder models, specifically the position bias of positive examples in token classification tasks. For completeness, we also include decoders in the evaluation. We evaluate the impact of position bias using different position embedding techniques, focusing on BERT with Absolute Position Embedding (APE), Relative Position Embedding (RPE), and Rotary Position Embedding (RoPE). Therefore, we conduct an in-depth evaluation of the impact of position bias on the performance of LMs when fine-tuned on token classification benchmarks. Our study includes CoNLL03 and OntoNote5.0 for NER, English Tree Bank UD_en, and TweeBank for POS tagging. We propose an evaluation approach to investigate position bias in transformer models. We show that LMs can suffer from this bias with an average drop ranging from 3% to 9% in their performance. To mitigate this effect, we propose two methods: Random Position Shifting and Context Perturbation, that we apply on batches during the training process. The results show an improvement of approximately 2% in the performance of the model on CoNLL03, UD_en, and TweeBank.
Submitted 11 April, 2024; v1 submitted 26 April, 2023;
originally announced April 2023.
-
Conditional supremum in Riesz spaces and applications
Authors:
Youssef Azouzi,
Mohamed Amine Ben Amor,
Dorsaf Cherif,
Marwa Masmoudi
Abstract:
We extend the concept of conditional supremum to the measure-free setting of Riesz spaces via the conditional expectation operator. We explore its properties and show how this tool is crucial in generalizing various results across multiple disciplines to the framework of Riesz spaces. Among other applications, we utilize this concept in finance to derive characterizations of certain financial conditions.
Submitted 17 March, 2023;
originally announced March 2023.
-
Teaching and learning in the age of artificial intelligence
Authors:
Margarida Romero,
Laurent Heiser,
Alexandre Lepage,
Alexandre Lepage,
Anne Gagnebien,
Audrey Bonjour,
Aurélie Lagarrigue,
Axel Palaude,
Caroline Boulord,
Charles-Antoine Gagneur,
Chloé Mercier,
Christelle Caucheteux,
Dominique Guidoni-Stoltz,
Florence Tressols,
Frédéric Alexandre,
Jean-François Céci,
Jean-François Metral,
Jérémy Camponovo,
Julie Henry,
Laurent Fouché,
Laurent Heiser,
Lianne-Blue Hodgkins,
Margarida Romero,
Marie-Hélène Comte,
Michel Durampart
, et al. (10 additional authors not shown)
Abstract:
As part of the Digital Working Group (GTnum) #Scol_IA "Renewal of digital practices and creative uses of digital and AI" we are pleased to present the white paper "Teaching and learning in the era of Artificial Intelligence, Acculturation, integration and creative uses of AI in education". The white paper, edited by Margarida Romero, Laurent Heiser and Alexandre Lepage, aims to provide the various educational actors with a diversified perspective both on the issues of acculturation and training in AI and on the resources and feedback from the various research teams and organisations of scientific culture in the French-speaking countries. A multidisciplinary approach makes it possible to consider the perspectives of researchers in computer science as well as those of education and training sciences, information and communication sciences and the expertise of teaching professionals and scientific mediation.
Submitted 14 March, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Certifiably-correct Control Policies for Safe Learning and Adaptation in Assistive Robotics
Authors:
Keyvan Majd,
Geoffrey Clark,
Tanmay Khandait,
Siyu Zhou,
Sriram Sankaranarayanan,
Georgios Fainekos,
Heni Ben Amor
Abstract:
Guaranteeing safety in human-centric applications is critical in robot learning, as the learned policies may demonstrate unsafe behaviors in formerly unseen scenarios. We present a framework to locally repair an erroneous policy network to satisfy a set of formal safety constraints using Mixed Integer Quadratic Programming (MIQP). Our MIQP formulation explicitly imposes the safety constraints on the learned policy while minimizing the original loss function. The policy network is then verified to be locally safe. We demonstrate the application of our framework to derive safe policies for a robotic lower-leg prosthesis.
Submitted 12 March, 2023;
originally announced March 2023.
-
Safe Robot Learning in Assistive Devices through Neural Network Repair
Authors:
Keyvan Majd,
Geoffrey Clark,
Tanmay Khandait,
Siyu Zhou,
Sriram Sankaranarayanan,
Georgios Fainekos,
Heni Ben Amor
Abstract:
Assistive robotic devices are a particularly promising field of application for neural networks (NN) due to the need for personalization and hard-to-model human-machine interaction dynamics. However, NN based estimators and controllers may produce potentially unsafe outputs over previously unseen data points. In this paper, we introduce an algorithm for updating NN control policies to satisfy a given set of formal safety constraints, while also optimizing the original loss function. Given a set of mixed-integer linear constraints, we define the NN repair problem as a Mixed Integer Quadratic Program (MIQP). In extensive experiments, we demonstrate the efficacy of our repair method in generating safe policies for a lower-leg prosthesis.
Submitted 8 March, 2023;
originally announced March 2023.
-
Asymptotic Behavior of Zero-Forcing Precoding based on Imperfect Channel Knowledge for Massive MISO FDD Systems
Authors:
Donia Ben Amor,
Michael Joham,
Wolfgang Utschick
Abstract:
In this work, we study the asymptotic behavior of the zero-forcing precoder based on the least squares (LS) and the linear minimum mean-square error (LMMSE) channel estimates for the downlink (DL) of a frequency-division-duplex (FDD) massive multiple-input-single-output (MISO) system. We show analytically the rather surprising result that zero-forcing precoding based on the LS estimate leads asymptotically to an interference-free transmission, even if the number of pilots used for DL channel training is less than the number of antennas available at the base station (BS). Although the LMMSE channel estimate exhibits a better quality in terms of the MSE due to the exploitation of the channel statistics, we show that in the case of contaminated channel observations, zero-forcing based on the LMMSE is unable to eliminate the inter-user interference in the asymptotic limit of high DL transmit powers. In order for the results to hold, mild conditions on the channel probing phase are assumed. The validity of our analytical results is demonstrated through numerical simulations for different scenarios.
Submitted 4 December, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
Capturing electron-driven chiral dynamics in UV-excited molecules
Authors:
Vincent Wanie,
Etienne Bloch,
Erik P. Månsson,
Lorenzo Colaizzi,
Sergey Ryabchuk,
Krishna Saraswathula,
Andres F. Ordonez,
David Ayuso,
Olga Smirnova,
Andrea Trabattoni,
Valérie Blanchet,
Nadia Ben Amor,
Marie-Catherine Heitz,
Yann Mairesse,
Bernard Pons,
Francesca Calegari
Abstract:
Molecular chirality is a key design property for many technologies including bioresponsive imaging, circularly polarized light detection and emission, molecular motors and switches. Imaging and manipulating the primary steps of transient chirality is therefore central for controlling numerous physical, chemical and biological properties that arise from chiral molecules in response to external stimuli. So far, the manifestation of electron-driven chiral dynamics in neutral molecules has not been demonstrated at their intrinsic timescale. Here, we use time-resolved photoelectron circular dichroism (TR-PECD) with an unprecedented instrument response function of 2.9 fs to image the dynamics of coherent electronic motion activated by prompt UV-excitation in neutral chiral molecules, disclosing its impact on the molecular chiral response. We find that electronic beatings between Rydberg states lead to periodic modulations of the chiroptical response on the few-femtosecond timescale, showing a sign inversion in less than 10 fs. Calculations including both the molecular UV-excitation and subsequent photoionization confirm this interpretation and provide further evidence that the combination of the resulting photoinduced chiral current with a circularly polarized probe pulse realizes an enantio-selective filter of molecular orientations upon photoionization, opening up a route towards enantio-selective charge-directed reactivity.
Submitted 19 January, 2024; v1 submitted 5 January, 2023;
originally announced January 2023.
-
Imitation Learning based Auto-Correction of Extrinsic Parameters for A Mixed-Reality Setup
Authors:
Shubham Sonawani,
Yifan Zhou,
Heni Ben Amor
Abstract:
In this paper, we discuss an imitation learning based method for reducing the calibration error for a mixed reality system consisting of a vision sensor and a projector. Unlike a head mounted display, in this setup, augmented information is available to a human subject via the projection of a scene into the real world. Inherently, the camera and projector need to be calibrated as a stereo setup to project accurate information in 3D space. Previous calibration processes require multiple recording and parameter tuning steps to achieve the desired calibration, which is usually a time-consuming process. In order to avoid such tedious calibration, we train a CNN model to iteratively correct the extrinsic offset given a QR code and a projected pattern. We discuss the overall system setup, data collection for training, and results of the auto-correction model.
Submitted 16 December, 2022;
originally announced December 2022.