Search | arXiv e-print repository

arXiv:2506.11829 [pdf, ps, other]

The Space Between Us: A Methodological Framework for Researching Bonding and Proxemics in Situated Group-Agent Interactions

Authors: Ana Müller, Anja Richert

Abstract: This paper introduces a multimethod framework for studying spatial and social dynamics in real-world group-agent interactions with socially interactive agents. Drawing on proxemics and bonding theories, the method combines subjective self-reports and objective spatial tracking. Applied in two field studies in a museum (N = 187) with a robot and a virtual agent, the paper addresses the challenges in aligning human perception and behavior. We focus on presenting an open source, scalable, and field-tested toolkit for future studies.

Submitted 13 June, 2025; originally announced June 2025.

Comments: Accepted for presentation at the Workshop on Advancing Group Understanding and Robots' Adaptive Behavior (GROUND), held at the Intelligent Autonomous Systems (IAS) Conference 2025, Genoa, Italy

arXiv:2506.10686 [pdf, ps, other]

doi 10.1109/LRA.2020.3044028

An $O(n)$-Algorithm for the Higher-Order Kinematics and Inverse Dynamics of Serial Manipulators using Spatial Representation of Twists

Authors: Andreas Mueller

Abstract: Optimal control in general, and flatness-based control in particular, of robotic arms necessitate to compute the first and second time derivatives of the joint torques/forces required to achieve a desired motion. In view of the required computational efficiency, recursive $O(n)$-algorithms were proposed to this end. Aiming at compact yet efficient formulations, a Lie group formulation was recently proposed, making use of body-fixed and hybrid representation of twists and wrenches. In this paper a formulation is introduced using the spatial representation. The second-order inverse dynamics algorithm is accompanied by a fourth-order forward and inverse kinematics algorithm. An advantage of all Lie group formulations is that they can be parameterized in terms of vectorial quantities that are readily available. The method is demonstrated for the 7 DOF Franka Emika Panda robot.

Submitted 12 June, 2025; originally announced June 2025.

Journal ref: IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 6, NO. 2, APRIL 2021

arXiv:2506.10462 [pdf, ps, other]

Are We Generalizing from the Exception? An In-the-Wild Study on Group-Sensitive Conversation Design in Human-Agent Interactions

Authors: Ana Müller, Sabina Jeschke, Anja Richert

Abstract: This paper investigates the impact of a group-adaptive conversation design in two socially interactive agents (SIAs) through two real-world studies. Both SIAs - Furhat, a social robot, and MetaHuman, a virtual agent - were equipped with a conversational artificial intelligence (CAI) backend combining hybrid retrieval and generative models. The studies were carried out in an in-the-wild setting with a total of $N = 188$ participants who interacted with the SIAs - in dyads, triads or larger groups - at a German museum. Although the results did not reveal a significant effect of the group-sensitive conversation design on perceived satisfaction, the findings provide valuable insights into the challenges of adapting CAI for multi-party interactions and across different embodiments (robot vs. virtual agent), highlighting the need for multimodal strategies beyond linguistic pluralization. These insights contribute to the fields of Human-Agent Interaction (HAI), Human-Robot Interaction (HRI), and broader Human-Machine Interaction (HMI), providing insights for future research on effective dialogue adaptation in group settings.

Submitted 12 June, 2025; originally announced June 2025.

Comments: Accepted as a regular paper at the 2025 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). © IEEE. This is the preprint version. The final version will appear in the IEEE proceedings

arXiv:2506.05584 [pdf, ps, other]

TabFlex: Scaling Tabular Learning to Millions with Linear Attention

Authors: Yuchen Zeng, Tuan Dinh, Wonjun Kang, Andreas C Mueller

Abstract: Leveraging the in-context learning (ICL) capability of Large Language Models (LLMs) for tabular classification has gained significant attention for its training-free adaptability across diverse datasets. Recent advancements, like TabPFN, excel in small-scale tabular datasets but struggle to scale for large and complex datasets. Our work enhances the efficiency and scalability of TabPFN for larger datasets by incorporating linear attention mechanisms as a scalable alternative to complexity-quadratic self-attention. Our model, TabFlex, efficiently handles tabular datasets with thousands of features and hundreds of classes, scaling seamlessly to millions of samples. For instance, TabFlex processes the poker-hand dataset with over a million samples in just 5 seconds. Our extensive evaluations demonstrate that TabFlex can achieve over a 2x speedup compared to TabPFN and a 1.5x speedup over XGBoost, outperforming 25 tested baselines in terms of efficiency across a diverse range of datasets. Furthermore, TabFlex remains highly effective on large-scale datasets, delivering strong performance with significantly reduced computational costs, especially when combined with data-efficient techniques such as dimensionality reduction and data sampling.

Submitted 5 June, 2025; originally announced June 2025.

Comments: 30 pages, ICML 2025
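
Editor's illustration (not taken from the paper): the abstract above contrasts linear attention with quadratic self-attention. The sketch below shows the generic kernelized form of linear attention, in which key-value statistics are summed once and reused for every query, so the cost grows linearly with the number of samples. The feature map elu(x) + 1 and all function names are assumptions chosen for illustration; the abstract does not say which formulation TabFlex uses.

```python
import math


def feature_map(x):
    # Assumed positive feature map: elu(x) + 1 (not stated in the abstract).
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(queries, keys, values):
    """Non-causal kernelized attention in O(n) over the number of keys."""
    d_k, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    k_sum = [0.0] * d_k                     # running sum of phi(k)
    for k, v in zip(keys, values):
        phi_k = [feature_map(x) for x in k]
        for a in range(d_k):
            k_sum[a] += phi_k[a]
            for b in range(d_v):
                kv[a][b] += phi_k[a] * v[b]
    outputs = []
    for q in queries:
        phi_q = [feature_map(x) for x in q]
        norm = sum(phi_q[a] * k_sum[a] for a in range(d_k))
        outputs.append(
            [sum(phi_q[a] * kv[a][b] for a in range(d_k)) / norm
             for b in range(d_v)]
        )
    return outputs


def quadratic_attention(queries, keys, values):
    """Same kernel, but with explicit pairwise weights: O(n^2) reference."""
    d_v = len(values[0])
    outputs = []
    for q in queries:
        phi_q = [feature_map(x) for x in q]
        weights = [
            sum(a * feature_map(b) for a, b in zip(phi_q, k)) for k in keys
        ]
        total = sum(weights)
        outputs.append(
            [sum(w * v[b] for w, v in zip(weights, values)) / total
             for b in range(d_v)]
        )
    return outputs
```

Both functions return the same values up to floating-point rounding; only the order of summation differs, which is what removes the pairwise query-key loop.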

arXiv:2505.20209 [pdf, other]

How to Improve the Robustness of Closed-Source Models on NLI

Authors: Joe Stacey, Lisa Alazraki, Aran Ubhi, Beyza Ermis, Aaron Mueller, Marek Rei

Abstract: Closed-source Large Language Models (LLMs) have become increasingly popular, with impressive performance across a wide range of natural language tasks. These models can be fine-tuned to further improve performance, but this often results in the models learning from dataset-specific heuristics that reduce their robustness on out-of-distribution (OOD) data. Existing methods to improve robustness either perform poorly, or are non-applicable to closed-source models because they assume access to model internals, or the ability to change the model's training procedure. In this work, we investigate strategies to improve the robustness of closed-source LLMs through data-centric methods that do not require access to model internals. We find that the optimal strategy depends on the complexity of the OOD data. For highly complex OOD datasets, upsampling more challenging training examples can improve robustness by up to 1.5%. For less complex OOD datasets, replacing a portion of the training set with LLM-generated examples can improve robustness by 3.7%. More broadly, we find that large-scale closed-source autoregressive LLMs are substantially more robust than commonly used encoder models, and are a more appropriate choice of baseline going forward.

Submitted 26 May, 2025; originally announced May 2025.

ACM Class: I.2.7

arXiv:2505.20063 [pdf, other]

SAEs Are Good for Steering -- If You Select the Right Features

Authors: Dana Arad, Aaron Mueller, Yonatan Belinkov

Abstract: Sparse Autoencoders (SAEs) have been proposed as an unsupervised approach to learn a decomposition of a model's latent space. This enables useful applications such as steering - influencing the output of a model towards a desired concept - without requiring labeled data. Current methods identify SAE features to steer by analyzing the input tokens that activate them. However, recent work has highlighted that activations alone do not fully describe the effect of a feature on the model's output. In this work, we draw a distinction between two types of features: input features, which mainly capture patterns in the model's input, and output features, which have a human-understandable effect on the model's output. We propose input and output scores to characterize and locate these types of features, and show that high values for both scores rarely co-occur in the same features. These findings have practical implications: after filtering out features with low output scores, we obtain 2-3x improvements when steering with SAEs, making them competitive with supervised methods.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2504.13151 [pdf, ps, other]

MIB: A Mechanistic Interpretability Benchmark

Authors: Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov

Abstract: How can we know whether new mechanistic interpretability methods achieve real improvements? In pursuit of lasting evaluation standards, we propose MIB, a Mechanistic Interpretability Benchmark, with two tracks spanning four tasks and five models. MIB favors methods that precisely and concisely recover relevant causal pathways or causal variables in neural language models. The circuit localization track compares methods that locate the model components - and connections between them - most important for performing a task (e.g., attribution patching or information flow routes). The causal variable localization track compares methods that featurize a hidden vector, e.g., sparse autoencoders (SAEs) or distributed alignment search (DAS), and align those features to a task-relevant causal variable. Using MIB, we find that attribution and mask optimization methods perform best on circuit localization. For causal variable localization, we find that the supervised DAS method performs best, while SAE features are not better than neurons, i.e., non-featurized hidden vectors. These findings illustrate that MIB enables meaningful comparisons, and increases our confidence that there has been real progress in the field.

Submitted 9 June, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

Comments: Accepted to ICML 2025. Project website at https://mib-bench.github.io

arXiv:2504.11011 [pdf, other]

Document Quality Scoring for Web Crawling

Authors: Francesca Pezzuti, Ariane Mueller, Sean MacAvaney, Nicola Tonellotto

Abstract: The internet contains large amounts of low-quality content, yet users expect web search engines to deliver high-quality, relevant results. The abundant presence of low-quality pages can negatively impact retrieval and crawling processes by wasting resources on these documents. Therefore, search engines can greatly benefit from techniques that leverage efficient quality estimation methods to mitigate these negative impacts. Quality scoring methods for web pages are useful for many processes typical for web search systems, including static index pruning, index tiering, and crawling. Building on work by Chang et al., who proposed using neural estimators of semantic quality for static index pruning, we extend their approach and apply their neural quality scorers to assess the semantic quality of web pages in crawling prioritisation tasks. In our experimental analysis, we found that prioritising semantically high-quality pages over low-quality ones can improve downstream search effectiveness. Our software contribution consists of a Docker container that computes an effective quality score for a given web page, allowing the quality scorer to be easily included and used in other components of web search systems.

Submitted 15 April, 2025; originally announced April 2025.

Comments: Presented at WOWS2025

arXiv:2504.08165 [pdf, other]

doi 10.18653/v1/2023.conll-babylm.1

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora

Authors: Alex Warstadt, Aaron Mueller, Leshem Choshen, Ethan Wilcox, Chengxu Zhuang, Juan Ciro, Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell

Abstract: Children can acquire language from less than 100 million words of input. Large language models are far less data-efficient: they typically require 3 or 4 orders of magnitude more data and still do not perform as well as humans on many evaluations. These intensive resource demands limit the ability of researchers to train new models and use existing models as developmentally plausible cognitive models. The BabyLM Challenge is a communal effort in which participants compete to optimize language model training on a fixed data budget. Submissions are compared on various evaluation tasks targeting grammatical ability, downstream task performance, and generalization. Participants can submit to up to three tracks with progressively looser data restrictions. From over 30 submissions, we extract concrete recommendations on how best to train data-efficient language models, and on where future efforts should (and perhaps should not) focus. The winning submissions using the LTG-BERT architecture (Samuel et al., 2023) outperformed models trained on trillions of words. Other submissions achieved strong results through training on shorter input sequences or training a student model on a pretrained teacher. Curriculum learning attempts, which accounted for a large number of submissions, were largely unsuccessful, though some showed modest improvements.

Submitted 10 April, 2025; originally announced April 2025.

Comments: Published in Proceedings of BabyLM. Please cite the published version on ACL anthology: http://aclanthology.org/2023.conll-babylm.1/

Journal ref: 2023. In Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning, pages 1-34, Singapore. Association for Computational Linguistics

arXiv:2503.23760 [pdf, other]

Towards a cognitive architecture to enable natural language interaction in co-constructive task learning

Authors: Manuel Scheibl, Birte Richter, Alissa Müller, Michael Beetz, Britta Wrede

Abstract: This research addresses the question, which characteristics a cognitive architecture must have to leverage the benefits of natural language in Co-Constructive Task Learning (CCTL). To provide context, we first discuss Interactive Task Learning (ITL), the mechanisms of the human memory system, and the significance of natural language and multi-modality. Next, we examine the current state of cognitive architectures, analyzing their capabilities to inform a concept of CCTL grounded in multiple sources. We then integrate insights from various research domains to develop a unified framework. Finally, we conclude by identifying the remaining challenges and requirements necessary to achieve CCTL in Human-Robot Interaction (HRI).

Submitted 31 March, 2025; originally announced March 2025.

Comments: 8 pages, 5 figures, submitted to: IEEE RO-MAN 2025

arXiv:2503.11404 [pdf, other]

Towards A Correct Usage of Cryptography in Semantic Watermarks for Diffusion Models

Authors: Jonas Thietke, Andreas Müller, Denis Lukovnikov, Asja Fischer, Erwin Quiring

Abstract: Semantic watermarking methods enable the direct integration of watermarks into the generation process of latent diffusion models by only modifying the initial latent noise. One line of approaches building on Gaussian Shading relies on cryptographic primitives to steer the sampling process of the latent noise. However, we identify several issues in the usage of cryptographic techniques in Gaussian Shading, particularly in its proof of lossless performance and key management, causing ambiguity in follow-up works, too. In this work, we therefore revisit the cryptographic primitives for semantic watermarking. We introduce a novel, general proof of lossless performance based on IND$-CPA security for semantic watermarks. We then discuss the configuration of the cryptographic primitives in semantic watermarks with respect to security, efficiency, and generation quality.

Submitted 14 March, 2025; originally announced March 2025.

Comments: 8 pages, 3 figures, WMark@ICLR

arXiv:2503.02922 [pdf, other]

Optimizing open-domain question answering with graph-based retrieval augmented generation

Authors: Joyce Cahoon, Prerna Singh, Nick Litombe, Jonathan Larson, Ha Trinh, Yiwen Zhu, Andreas Mueller, Fotis Psallidas, Carlo Curino

Abstract: In this work, we benchmark various graph-based retrieval-augmented generation (RAG) systems across a broad spectrum of query types, including OLTP-style (fact-based) and OLAP-style (thematic) queries, to address the complex demands of open-domain question answering (QA). Traditional RAG methods often fall short in handling nuanced, multi-document synthesis tasks. By structuring knowledge as graphs, we can facilitate the retrieval of context that captures greater semantic depth and enhances language model operations. We explore graph-based RAG methodologies and introduce TREX, a novel, cost-effective alternative that combines graph-based and vector-based retrieval techniques. Our benchmarking across four diverse datasets highlights the strengths of different RAG methodologies, demonstrates TREX's ability to handle multiple open-domain QA types, and reveals the limitations of current evaluation methods. In a real-world technical support case study, we demonstrate how TREX solutions can surpass conventional vector-based RAG in efficiently synthesizing data from heterogeneous sources. Our findings underscore the potential of augmenting large language models with advanced retrieval and orchestration capabilities, advancing scalable, graph-based AI solutions.

Submitted 4 March, 2025; originally announced March 2025.

ACM Class: H.3.3; I.2.7

arXiv:2502.11673 [pdf, ps, other]

Best of Both Worlds: Regret Minimization versus Minimax Play

Authors: Adrian Müller, Jon Schneider, Stratis Skoulakis, Luca Viano, Volkan Cevher

Abstract: In this paper, we investigate the existence of online learning algorithms with bandit feedback that simultaneously guarantee $O(1)$ regret compared to a given comparator strategy, and $\tilde{O}(\sqrt{T})$ regret compared to any fixed strategy, where $T$ is the number of rounds. We provide the first affirmative answer to this question whenever the comparator strategy supports every action. In the context of zero-sum games with min-max value zero, both in normal- and extensive form, we show that our results allow us to guarantee to risk at most $O(1)$ loss while being able to gain $\Omega(T)$ from exploitable opponents, thereby combining the benefits of both no-regret algorithms and minimax play.

Submitted 4 June, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

arXiv:2502.10645 [pdf, other]

BabyLM Turns 3: Call for papers for the 2025 BabyLM workshop

Authors: Lucas Charpentier, Leshem Choshen, Ryan Cotterell, Mustafa Omer Gul, Michael Hu, Jaap Jumelet, Tal Linzen, Jing Liu, Aaron Mueller, Candace Ross, Raj Sanjay Shah, Alex Warstadt, Ethan Wilcox, Adina Williams

Abstract: BabyLM aims to dissolve the boundaries between cognitive modeling and language modeling. We call for both workshop papers and for researchers to join the 3rd BabyLM competition. As in previous years, we call for participants in the data-efficient pretraining challenge in the general track. This year, we also offer a new track: INTERACTION. This new track encourages interactive behavior, learning from a teacher, and adapting the teaching material to the student. We also call for papers outside the competition in any relevant areas. These include training efficiency, cognitively plausible research, weak model evaluation, and more.

Submitted 24 February, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

Comments: EMNLP 2025 BabyLM Workshop. arXiv admin note: text overlap with arXiv:2404.06214

arXiv:2502.05392 [pdf, other]

Open Challenges in Time Series Anomaly Detection: An Industry Perspective

Authors: Andreas Mueller

Abstract: Current research in time-series anomaly detection is using definitions that miss critical aspects of how anomaly detection is commonly used in practice. We list several areas that are of practical relevance and that we believe are either under-investigated or missing entirely from the current discourse. Based on an investigation of systems deployed in a cloud environment, we motivate the areas of streaming algorithms, human-in-the-loop scenarios, point processes, conditional anomalies and populations analysis of time series. This paper serves as a motivation and call for action, including opportunities for theoretical and applied research, as well as for building new datasets and benchmarks.

Submitted 7 February, 2025; originally announced February 2025.

arXiv:2502.04577 [pdf, other]

Position-aware Automatic Circuit Discovery

Authors: Tal Haklay, Hadas Orgad, David Bau, Aaron Mueller, Yonatan Belinkov

Abstract: A widely used strategy to discover and understand language model mechanisms is circuit analysis. A circuit is a minimal subgraph of a model's computation graph that executes a specific task. We identify a gap in existing circuit discovery methods: they assume circuits are position-invariant, treating model components as equally relevant across input positions. This limits their ability to capture cross-positional interactions or mechanisms that vary across positions. To address this gap, we propose two improvements to incorporate positionality into circuits, even on tasks containing variable-length examples. First, we extend edge attribution patching, a gradient-based method for circuit discovery, to differentiate between token positions. Second, we introduce the concept of a dataset schema, which defines token spans with similar semantics across examples, enabling position-aware circuit discovery in datasets with variable length examples. We additionally develop an automated pipeline for schema generation and application using large language models. Our approach enables fully automated discovery of position-sensitive circuits, yielding better trade-offs between circuit size and faithfulness compared to prior work.

Submitted 6 February, 2025; originally announced February 2025.

MSC Class: 68T50; ACM Class: I.2.7

arXiv:2502.03376 [pdf]

Ethical Considerations for the Military Use of Artificial Intelligence in Visual Reconnaissance

Authors: Mathias Anneken, Nadia Burkart, Fabian Jeschke, Achim Kuwertz-Wolf, Almuth Mueller, Arne Schumann, Michael Teutsch

Abstract: This white paper underscores the critical importance of responsibly deploying Artificial Intelligence (AI) in military contexts, emphasizing a commitment to ethical and legal standards. The evolving role of AI in the military goes beyond mere technical applications, necessitating a framework grounded in ethical principles. The discussion within the paper delves into ethical AI principles, particularly focusing on the Fairness, Accountability, Transparency, and Ethics (FATE) guidelines. Noteworthy considerations encompass transparency, justice, non-maleficence, and responsibility. Importantly, the paper extends its examination to military-specific ethical considerations, drawing insights from the Just War theory and principles established by prominent entities. In addition to the identified principles, the paper introduces further ethical considerations specifically tailored for military AI applications. These include traceability, proportionality, governability, responsibility, and reliability. The application of these ethical principles is discussed on the basis of three use cases in the domains of sea, air, and land. Methods of automated sensor data analysis, eXplainable AI (XAI), and intuitive user experience are utilized to specify the use cases close to real-world scenarios. This comprehensive approach to ethical considerations in military AI reflects a commitment to aligning technological advancements with established ethical frameworks. It recognizes the need for a balance between leveraging AI's potential benefits in military operations while upholding moral and legal standards. The inclusion of these ethical principles serves as a foundation for responsible and accountable use of AI in the complex and dynamic landscape of military scenarios.

Submitted 5 February, 2025; originally announced February 2025.

Comments: White Paper, 30 pages, 7 figures

arXiv:2501.15849 [pdf, ps, other]

Gaussian Process-Based Prediction and Control of Hammerstein-Wiener Systems

Authors: Mingzhou Yin, Matthias A. Müller

Abstract: This work investigates data-driven prediction and control of Hammerstein-Wiener systems using physics-informed Gaussian process models. Data-driven prediction algorithms have been developed for structured nonlinear systems based on Willems' fundamental lemma. However, existing frameworks cannot treat output nonlinearities and require a dictionary of basis functions for Hammerstein systems. In this work, an implicit predictor structure is considered, leveraging the multi-step-ahead ARX structure for the linear part of the model. This implicit function is learned by Gaussian process regression with kernel functions designed from Gaussian process priors for the nonlinearities. The linear model parameters are estimated as hyperparameters by assuming a stable spline hyperprior. The implicit Gaussian process model provides explicit output prediction by optimizing selected optimality criteria. The model is also applied to receding horizon control with the expected control cost and chance constraint satisfaction guarantee. Numerical results demonstrate that the proposed prediction and control algorithms are superior to black-box Gaussian process models.

Submitted 27 January, 2025; originally announced January 2025.

arXiv:2501.10713 [pdf]

Human-like Nonverbal Behavior with MetaHumans in Real-World Interaction Studies: An Architecture Using Generative Methods and Motion Capture

Authors: Oliver Chojnowski, Alexander Eberhard, Michael Schiffmann, Ana Müller, Anja Richert

Abstract: Socially interactive agents are gaining prominence in domains like healthcare, education, and service contexts, particularly virtual agents due to their inherent scalability. To facilitate authentic interactions, these systems require verbal and nonverbal communication through e.g., facial expressions and gestures. While natural language processing technologies have rapidly advanced, incorporating human-like nonverbal behavior into real-world interaction contexts is crucial for enhancing the success of communication, yet this area remains underexplored. One barrier is creating autonomous systems with sophisticated conversational abilities that integrate human-like nonverbal behavior. This paper presents a distributed architecture using Epic Games MetaHuman, combined with advanced conversational AI and camera-based user management, that supports methods like motion capture, handcrafted animation, and generative approaches for nonverbal behavior. We share insights into a system architecture designed to investigate nonverbal behavior in socially interactive agents, deployed in a three-week field study in the Deutsches Museum Bonn, showcasing its potential in realistic nonverbal behavior research.

Submitted 18 January, 2025; originally announced January 2025.

Comments: Accepted for presentation at the ACM/IEEE International Conference on Human-Robot Interaction (HRI 2025) as a Late-Breaking Report

arXiv:2501.08618 [pdf, other]

Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models

Authors: Aruna Sankaranarayanan, Dylan Hadfield-Menell, Aaron Mueller

Abstract: All natural languages are structured hierarchically. In humans, this structural restriction is neurologically coded: when two grammars are presented with identical vocabularies, brain areas responsible for language processing are only sensitive to hierarchical grammars. Using large language models (LLMs), we investigate whether such functionally distinct hierarchical processing regions can arise solely from exposure to large-scale language distributions. We generate inputs using English, Italian, Japanese, or nonce words, varying the underlying grammars to conform to either hierarchical or linear/positional rules. Using these grammars, we first observe that language models show distinct behaviors on hierarchical versus linearly structured inputs. Then, we find that the components responsible for processing hierarchical grammars are distinct from those that process linear grammars; we causally verify this in ablation experiments. Finally, we observe that hierarchy-selective components are also active on nonce grammars; this suggests that hierarchy sensitivity is not tied to meaning, nor in-distribution inputs.

Submitted 15 January, 2025; originally announced January 2025.

arXiv:2501.06346 [pdf, other]

Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages

Authors: Jannik Brinkmann, Chris Wendler, Christian Bartelt, Aaron Mueller

Abstract: Human bilinguals often use similar brain regions to process multiple languages, depending on when they learned their second language and their proficiency. In large language models (LLMs), how are multiple languages learned and encoded? In this work, we explore the extent to which LLMs share representations of morphosyntactic concepts such as grammatical number, gender, and tense across languages. We train sparse autoencoders on Llama-3-8B and Aya-23-8B, and demonstrate that abstract grammatical concepts are often encoded in feature directions shared across many languages. We use causal interventions to verify the multilingual nature of these representations; specifically, we show that ablating only multilingual features decreases classifier performance to near-chance across languages. We then use these features to precisely modify model behavior in a machine translation task; this demonstrates both the generality and selectivity of these features' roles in the network. Our findings suggest that even models trained predominantly on English data can develop robust, cross-lingual abstractions of morphosyntactic concepts.

Submitted 23 May, 2025; v1 submitted 10 January, 2025; originally announced January 2025.

arXiv:2412.20409 [pdf, other]

doi 10.1007/978-3-031-64057-5_29

Analytically Informed Inverse Kinematics Solution at Singularities

Authors: Andreas Mueller

Abstract: Near kinematic singularities of a serial manipulator, the inverse kinematics (IK) problem becomes ill-conditioned, which poses computational problems for the numerical solution. Computational methods to tackle this issue are based on various forms of a pseudoinverse (PI) solution to the velocity IK problem. The damped least squares (DLS) method provides a robust solution with controllable convergence rate. However, at singularities, it may not even be possible to solve the IK problem using any PI solution when certain end-effector motions are prescribed. To overcome this problem, an analytically informed inverse kinematics (AI-IK) method is proposed. The key step of the method is an explicit description of the tangent aspect of singular motions (the analytic part) to deduce a perturbation that yields a regular configuration. The latter serves as start configuration for the iterative solution (the numeric part). Numerical results are reported for a 7-DOF Kuka iiwa.

Submitted 29 December, 2024; originally announced December 2024.

Journal ref: In: Lenarcic, J., Husty, M. (eds) Advances in Robot Kinematics 2024. ARK 2024. Springer Proceedings in Advanced Robotics, vol 31. Springer, Cham

arXiv:2412.13681 [pdf, other]

doi 10.1016/j.mechmachtheory.2021.104549

Dynamics of Parallel Manipulators with Hybrid Complex Limbs -- Modular Modeling and Parallel Computing

Authors: Andreas Mueller

Abstract: Parallel manipulators, also called parallel kinematics machines (PKM), enable robotic solutions for highly dynamic handling and machining applications. The safe and accurate design and control necessitates high-fidelity dynamics models. Such modeling approaches have already been presented for PKM with simple limbs (i.e. each limb is a serial kinematic chain). A systematic modeling approach for PKM with complex limbs (i.e. limbs that possess kinematic loops) was not yet proposed despite the fact that many successful PKM comprise complex limbs. This paper presents a systematic modular approach to the kinematics and dynamics modeling of PKM with complex limbs that are built as serial arrangement of closed loops. The latter are referred to as hybrid limbs, and can be found in almost all PKM with complex limbs, such as the Delta robot. The proposed method generalizes the formulation for PKM with simple limbs by means of local resolution of loop constraints, which is known as constraint embedding in multibody dynamics. The constituent elements of the method are the kinematic and dynamic equations of motion (EOM), and the inverse kinematics solution of the limbs, i.e. the relation of platform motion and the motion of the limbs. While the approach is conceptually independent of the used kinematics and dynamics formulation, a Lie group formulation is employed for deriving the EOM. The frame invariance of the Lie group formulation is used for devising a modular modeling method where the EOM of a representative limb are used to derive the EOM of the limbs of a particular PKM. The PKM topology is exploited in a parallel computation scheme that shall allow for computationally efficient distributed evaluation of the overall EOM of the PKM. Finally, the method is applied to the IRSBot-2 and a 3\underline{R}R[2RR]R Delta robot, which is presented in detail.

Submitted 18 December, 2024; originally announced December 2024.

Journal ref: Mechanism and Machine Theory, Volume 167, January 2022

arXiv:2412.13638 [pdf, other]

doi 10.1016/j.robot.2022.104187

A Constraint Embedding Approach for Dynamics Modeling of Parallel Kinematic Manipulators with Hybrid Limbs

Authors: Andreas Mueller

Abstract: Parallel kinematic manipulators (PKM) are characterized by closed kinematic loops, due to the parallel arrangement of limbs but also due to the existence of kinematic loops within the limbs. Moreover, many PKM are built with limbs constructed by serially combining kinematic loops. Such limbs are called hybrid, which form a particular class of complex limbs. Design and model-based control requires accurate dynamic PKM models desirably without model simplifications. Dynamics modeling then necessitates kinematic relations of all members of the PKM, in contrast to the standard kinematics modeling of PKM, where only the forward and inverse kinematics solution for the manipulator (relating input and output motions) are computed. This becomes more involved for PKM with hybrid limbs. In this paper a modular modeling approach is employed, where limbs are treated separately, and the individual dynamic equations of motion (EOM) are subsequently assembled to the overall model. Key to the kinematic modeling is the constraint resolution for the individual loops within the limbs. This local constraint resolution is a special case of the general constraint embedding technique. The proposed method finally allows for a systematic modeling of general PKM. The method is demonstrated for the IRSBot-2, where each limb comprises two independent loops.

Submitted 18 December, 2024; originally announced December 2024.

Journal ref: Robotics and Autonomous Systems, Volume 155, September 2022

arXiv:2412.05353 [pdf, other]

Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models

Authors: Michael Hanna, Aaron Mueller

Abstract: Autoregressive transformer language models (LMs) possess strong syntactic abilities, often successfully handling phenomena from agreement to NPI licensing. However, the features they use to incrementally process language inputs are not well understood. In this paper, we fill this gap by studying the mechanisms underlying garden path sentence processing in LMs. We ask: (1) Do LMs use syntactic features or shallow heuristics to perform incremental sentence processing? (2) Do LMs represent only one potential interpretation, or multiple? and (3) Do LMs reanalyze or repair their initial incorrect representations? To address these questions, we use sparse autoencoders to identify interpretable features that determine which continuation - and thus which reading - of a garden path sentence the LM prefers. We find that while many important features relate to syntactic structure, some reflect syntactically irrelevant heuristics. Moreover, while most active features correspond to one reading of the sentence, some features correspond to the other, suggesting that LMs assign weight to both possibilities simultaneously. Finally, LMs do not re-use features from garden path sentence processing to answer follow-up questions.

Submitted 6 December, 2024; originally announced December 2024.

Comments: Code and data available at https://github.com/hannamw/GP-mechanisms

arXiv:2412.05149 [pdf, other]

Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora

Authors: Michael Y. Hu, Aaron Mueller, Candace Ross, Adina Williams, Tal Linzen, Chengxu Zhuang, Ryan Cotterell, Leshem Choshen, Alex Warstadt, Ethan Gotlieb Wilcox

Abstract: The BabyLM Challenge is a community effort to close the data-efficiency gap between human and computational language learners. Participants compete to optimize language model training on a fixed language data budget of 100 million words or less. This year, we released improved text corpora, as well as a vision-and-language corpus to facilitate research into cognitively plausible vision language models. Submissions were compared on evaluation tasks targeting grammatical ability, (visual) question answering, pragmatic abilities, and grounding, among other abilities. Participants could submit to a 10M-word text-only track, a 100M-word text-only track, and/or a 100M-word and image multimodal track. From 31 submissions employing diverse methods, a hybrid causal-masked language model architecture outperformed other approaches. No submissions outperformed the baselines in the multimodal track. In follow-up analyses, we found a strong relationship between training FLOPs and average performance across tasks, and that the best-performing submissions proposed changes to the training data, training objective, and model architecture. This year's BabyLM Challenge shows that there is still significant room for innovation in this setting, in particular for image-text modeling, but community-driven research can yield actionable insights about effective strategies for small-scale language modeling.

Submitted 6 December, 2024; originally announced December 2024.

arXiv:2412.03283 [pdf, ps, other]

Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Authors: Andreas Müller, Denis Lukovnikov, Jonas Thietke, Asja Fischer, Erwin Quiring

Abstract: Integrating watermarking into the generation process of latent diffusion models (LDMs) simplifies detection and attribution of generated content. Semantic watermarks, such as Tree-Rings and Gaussian Shading, represent a novel class of watermarking techniques that are easy to implement and highly robust against various perturbations. However, our work demonstrates a fundamental security vulnerability of semantic watermarks. We show that attackers can leverage unrelated models, even with different latent spaces and architectures (UNet vs DiT), to perform powerful and realistic forgery attacks. Specifically, we design two watermark forgery attacks. The first imprints a targeted watermark into real images by manipulating the latent representation of an arbitrary image in an unrelated LDM to get closer to the latent representation of a watermarked image. We also show that this technique can be used for watermark removal. The second attack generates new images with the target watermark by inverting a watermarked image and re-generating it with an arbitrary prompt. Both attacks just need a single reference image with the target watermark. Overall, our findings question the applicability of semantic watermarks by revealing that attackers can easily forge or remove these watermarks under realistic conditions.

Submitted 7 June, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

Comments: CVPR 2025

Journal ref: Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 20937-20946

arXiv:2411.09826 [pdf, other]

Evaluating Gender Bias in Large Language Models

Authors: Michael Döll, Markus Döhring, Andreas Müller

Abstract: Gender bias in artificial intelligence has become an important issue, particularly in the context of language models used in communication-oriented applications. This study examines the extent to which Large Language Models (LLMs) exhibit gender bias in pronoun selection in occupational contexts. The analysis evaluates the models GPT-4, GPT-4o, PaLM 2 Text Bison and Gemini 1.0 Pro using a self-generated dataset. The jobs considered include a range of occupations, from those with a significant male presence to those with a notable female concentration, as well as jobs with a relatively equal gender distribution. Three different sentence processing methods were used to assess potential gender bias: masked tokens, unmasked sentences, and sentence completion. In addition, the LLMs suggested names of individuals in specific occupations, which were then examined for gender distribution. The results show a positive correlation between the models' pronoun choices and the gender distribution present in U.S. labor force data. Female pronouns were more often associated with female-dominated occupations, while male pronouns were more often associated with male-dominated occupations. Sentence completion showed the strongest correlation with actual gender distribution, while name generation resulted in a more balanced 'politically correct' gender distribution, albeit with notable variations in predominantly male or female occupations. Overall, the prompting method had a greater impact on gender distribution than the model selection itself, highlighting the complexity of addressing gender bias in LLMs. The findings highlight the importance of prompting in gender mapping.

Submitted 14 November, 2024; originally announced November 2024.

Comments: 13 pages, 12 figures, 1 table

arXiv:2410.22590 [pdf, other]

Characterizing the Role of Similarity in the Property Inferences of Language Models

Authors: Juan Diego Rodriguez, Aaron Mueller, Kanishka Misra

Abstract: Property inheritance -- a phenomenon where novel properties are projected from higher level categories (e.g., birds) to lower level ones (e.g., sparrows) -- provides a unique window into how humans organize and deploy conceptual knowledge. It is debated whether this ability arises due to explicitly stored taxonomic knowledge vs. simple computations of similarity between mental representations. How are these mechanistic hypotheses manifested in contemporary language models? In this work, we investigate how LMs perform property inheritance with behavioral and causal representational analysis experiments. We find that taxonomy and categorical similarities are not mutually exclusive in LMs' property inheritance behavior. That is, LMs are more likely to project novel properties from one category to the other when they are taxonomically related and at the same time, highly similar. Our findings provide insight into the conceptual structure of language models and may suggest new psycholinguistic experiments for human subjects.

Submitted 9 March, 2025; v1 submitted 29 October, 2024; originally announced October 2024.

Comments: Published at NAACL 2025

arXiv:2410.21272 [pdf, other]

Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics

Authors: Yaniv Nikankin, Anja Reusch, Aaron Mueller, Yonatan Belinkov

Abstract: Do large language models (LLMs) solve reasoning tasks by learning robust generalizable algorithms, or do they memorize training data? To investigate this question, we use arithmetic reasoning as a representative task. Using causal analysis, we identify a subset of the model (a circuit) that explains most of the model's behavior for basic arithmetic logic and examine its functionality. By zooming in on the level of individual circuit neurons, we discover a sparse set of important neurons that implement simple heuristics. Each heuristic identifies a numerical input pattern and outputs corresponding answers. We hypothesize that the combination of these heuristic neurons is the mechanism used to produce correct arithmetic answers. To test this, we categorize each neuron into several heuristic types (such as neurons that activate when an operand falls within a certain range) and find that the unordered combination of these heuristic types is the mechanism that explains most of the model's accuracy on arithmetic prompts. Finally, we demonstrate that this mechanism appears as the main source of arithmetic accuracy early in training. Overall, our experimental results across several LLMs show that LLMs perform arithmetic using neither robust algorithms nor memorization; rather, they rely on a "bag of heuristics".

Submitted 20 May, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

MSC Class: 68T5; ACM Class: I.2.7

arXiv:2410.14463 [pdf, ps, other]

An abstract structure determines the contextuality degree of observable-based Kochen-Specker proofs

Authors: Axel Muller, Alain Giorgetti

Abstract: This article delves into the concept of quantum contextuality, specifically focusing on proofs of the Kochen-Specker theorem obtained by assigning Pauli observables to hypergraph vertices satisfying a given commutation relation. The abstract structure composed of this hypergraph and the graph of anticommutations is named a hypergram. Its labelings with Pauli observables generalize the well-known magic sets. A first result is that all these quantum labelings satisfying the conditions of a given hypergram inherently possess the same degree of contextuality. Then we provide a necessary and sufficient algebraic condition for the existence of such quantum labelings and an efficient algorithm to find one of them. We finally attach to each assignable hypergram an abstract notion of contextuality degree. By presenting the study of observable-based Kochen-Specker proofs from the perspective of graphs and matrices, this abstraction opens the way to new methods to search for original contextual configurations.

Submitted 18 October, 2024; originally announced October 2024.

Comments: 18 pages, 3 figures, 1 table

arXiv:2410.06029 [pdf, other]

Unclonable Functional Encryption

Authors: Arthur Mehta, Anne Müller

Abstract: In a functional encryption (FE) scheme, a user that holds a ciphertext and a function key can learn the result of applying the function to the plaintext message. Security requires that the user does not learn anything beyond the function evaluation. We extend this notion to the quantum setting by providing definitions and a construction for a quantum functional encryption (QFE) scheme which allows for the evaluation of polynomially-sized circuits on arbitrary quantum messages. Our construction is built upon quantum garbled circuits [BY22]. We also investigate the relationship of QFE to the seemingly unrelated notion of unclonable encryption (UE) and find that any QFE scheme universally achieves the property of unclonable functional encryption (UFE). In particular we assume the existence of an unclonable encryption scheme with quantum decryption keys which was recently constructed by [AKY24]. Our UFE guarantees that two parties cannot simultaneously recover the correct function outputs using two independently sampled function secret keys. As an application we give the first construction for public-key UE with variable decryption keys. Lastly, we establish a connection between quantum indistinguishability obfuscation (qiO) and quantum functional encryption (QFE), showing that any multi-input indistinguishability-secure quantum functional encryption scheme unconditionally implies the existence of qiO.

Submitted 14 March, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.04560 [pdf, other]

GAMformer: In-Context Learning for Generalized Additive Models

Authors: Andreas Mueller, Julien Siems, Harsha Nori, David Salinas, Arber Zela, Rich Caruana, Frank Hutter

Abstract: Generalized Additive Models (GAMs) are widely recognized for their ability to create fully interpretable machine learning models for tabular data. Traditionally, training GAMs involves iterative learning algorithms, such as splines, boosted trees, or neural networks, which refine the additive components through repeated error reduction. In this paper, we introduce GAMformer, the first method to leverage in-context learning to estimate shape functions of a GAM in a single forward pass, representing a significant departure from the conventional iterative approaches to GAM fitting. Building on previous research applying in-context learning to tabular data, we exclusively use complex, synthetic data to train GAMformer, yet find it extrapolates well to real-world data. Our experiments show that GAMformer performs on par with other leading GAMs across various classification benchmarks while generating highly interpretable shape functions.

Submitted 6 October, 2024; originally announced October 2024.

Comments: 20 pages, 12 figures

arXiv:2409.11933 [pdf, other]

Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling

Authors: Arthur Müller, Lukas Vollenkemper

Abstract: The integration of Reinforcement Learning (RL) with heuristic methods is an emerging trend for solving optimization problems, which leverages RL's ability to learn from the data generated during the search process. One promising approach is to train an RL agent as an improvement heuristic, starting with a suboptimal solution that is iteratively improved by applying small changes. We apply this approach to a real-world multiobjective production scheduling problem. Our approach utilizes a network architecture that includes Transformer encoding to learn the relationships between jobs. Afterwards, a probability matrix is generated from which pairs of jobs are sampled and then swapped to improve the solution. We benchmarked our approach against other heuristics using real data from our industry partner, demonstrating its superior performance.

Submitted 18 September, 2024; originally announced September 2024.

Comments: This paper was accepted at the ICMLA 2024

arXiv:2408.09841 [pdf, other]

Demystifying Reinforcement Learning in Production Scheduling via Explainable AI

Authors: Daniel Fischer, Hannah M. Hüsener, Felix Grumbach, Lukas Vollenkemper, Arthur Müller, Pascal Reusch

Abstract: Deep Reinforcement Learning (DRL) is a frequently employed technique to solve scheduling problems. Although DRL agents excel at delivering viable results in short computing times, their reasoning remains opaque. We conduct a case study where we systematically apply two explainable AI (xAI) frameworks, namely SHAP (DeepSHAP) and Captum (Input x Gradient), to describe the reasoning behind scheduling decisions of a specialized DRL agent in a flow production. We find that methods in the xAI literature lack falsifiability and consistent terminology, do not adequately consider domain knowledge, the target audience or real-world scenarios, and typically provide simple input-output explanations rather than causal interpretations. To resolve this issue, we introduce a hypotheses-based workflow. This approach enables us to inspect whether explanations align with domain knowledge and match the reward hypotheses of the agent. We furthermore tackle the challenge of communicating these insights to third parties by tailoring hypotheses to the target audience, which can serve as interpretations of the agent's behavior after verification. Our proposed workflow emphasizes the repeated verification of explanations and may be applicable to various DRL-based scheduling use cases.

Submitted 30 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

arXiv:2408.01416 [pdf, other]

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability

Authors: Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, Nikhil Prakash, Can Rager, Aruna Sankaranarayanan, Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov

Abstract: Interpretability provides a toolset for understanding how and why neural networks behave in certain ways. However, there is little unity in the field: most studies employ ad-hoc evaluations and do not share theoretical foundations, making it difficult to measure progress and compare the pros and cons of different techniques. Furthermore, while mechanistic understanding is frequently discussed, the basic causal units underlying these mechanisms are often not explicitly defined. In this paper, we propose a perspective on interpretability research grounded in causal mediation analysis. Specifically, we describe the history and current state of interpretability taxonomized according to the types of causal units (mediators) employed, as well as methods used to search over mediators. We discuss the pros and cons of each mediator, providing insights as to when particular kinds of mediators and search methods are most appropriate depending on the goals of a given study. We argue that this framing yields a more cohesive narrative of the field, as well as actionable insights for future work. Specifically, we recommend a focus on discovering new mediators with better trade-offs between human-interpretability and compute-efficiency, and which can uncover more sophisticated abstractions from neural networks than the primarily linear mediators employed in current work. We also argue for more standardized evaluations that enable principled comparisons across mediator types, such that we can better understand when particular causal units are better suited to particular use cases.

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2407.19427 [pdf]

The influence of Automated Decision-Making systems in the context of street-level bureaucrats' practices

Authors: Manuel Portela, A. Paula Rodriguez Müller, Luca Tangi

Abstract: In an era of digital governance, the use of automation for individual and cooperative work is increasing in public administrations (Tangi et al., 2022). Despite the promises of efficiency and cost reduction, automation could bring new challenges to the governance schemes. Regional, national, and local governments are taking measures to regulate and measure the impact of automated decision-making systems (ADMS). This research focuses on the use and adoption of ADMS in European public administrations to understand how these systems have been transforming the roles, tasks, and duties of street-level bureaucrats. We conducted a qualitative study in which we interviewed street-level bureaucrats from three administrations who had used an ADMS for several years, which was embedded in their daily work routines. The outcome of our research is an analysis of five dimensions of how collaborative work, the organizational settings, the capacities of bureaucrats and the implementation of the ADMS enable or limit the capacities for offering better services towards the citizens.

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.18009 [pdf, other]

Egocentric Robots in a Human-Centric World? Exploring Group-Robot-Interaction in Public Spaces

Authors: Ana Müller, Anja Richert

Abstract: The deployment of social robots in real-world scenarios is increasing, supporting humans in various contexts. However, they still struggle to grasp social dynamics, especially in public spaces, sometimes resulting in violations of social norms, such as interrupting human conversations. This behavior, originating from a limited processing of social norms, might be perceived as robot-centered. Understanding social dynamics, particularly in group-robot-interactions (GRI), underscores the need for further research and development in human-robot-interaction (HRI). Enhancing the interaction abilities of social robots, especially in GRIs, can improve their effectiveness in real-world applications on a micro-level, as group interactions lead to increased motivation and comfort. In this study, we assessed the influence of the interaction condition (dyadic vs. triadic) on the perceived extraversion (ext.) of social robots in public spaces. The research involved 40 HRIs, including 24 dyadic (i.e., one human and one robot) interactions and 16 triadic interactions, which involve at least three entities, including the robot.

Submitted 25 July, 2024; originally announced July 2024.

Comments: Accepted at the workshop on advancing Group Understanding and robots' adaptive behavior (GROUND), held at the Robotics Science and Systems (RSS) Conference, 2024

arXiv:2407.14561 [pdf, other]

NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals

Authors: Jaden Fiotto-Kaufman, Alexander R. Loftus, Eric Todd, Jannik Brinkmann, Koyena Pal, Dmitrii Troitskii, Michael Ripa, Adam Belfki, Can Rager, Caden Juang, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Nikhil Prakash, Carla Brodley, Arjun Guha, Jonathan Bell, Byron C. Wallace, David Bau

Abstract: We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks. NNsight is an open-source system that extends PyTorch to introduce deferred remote execution. The National Deep Inference Fabric (NDIF) is a scalable inference service that executes NNsight requests, allowing users to share GPU resources and pretrained models. These technologies are enabled by the Intervention Graph, an architecture developed to decouple experimental design from model runtime. Together, this framework provides transparent and efficient access to the internals of deep neural networks such as very large language models (LLMs) without imposing the cost or complexity of hosting customized models individually. We conduct a quantitative survey of the machine learning literature that reveals a growing gap in the study of the internals of large-scale AI. We demonstrate the design and use of our framework to address this gap by enabling a range of research methods on huge models. Finally, we conduct benchmarks to compare performance with previous approaches. Code, documentation, and tutorials are available at https://nnsight.net/.

Submitted 1 April, 2025; v1 submitted 18 July, 2024; originally announced July 2024.

Comments: Code at https://nnsight.net

arXiv:2407.04690 [pdf, other]

Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks

Authors: Aaron Mueller

Abstract: Interpretability research takes counterfactual theories of causality for granted. Most causal methods rely on counterfactual interventions to inputs or the activations of particular model components, followed by observations of the change in models' output logits or behaviors. While this yields more faithful evidence than correlational methods, counterfactuals nonetheless have key problems that bias our findings in specific and predictable ways. Specifically, (i) counterfactual theories do not effectively capture multiple independently sufficient causes of the same effect, which leads us to miss certain causes entirely; and (ii) counterfactual dependencies in neural networks are generally not transitive, which complicates methods for extracting and interpreting causal graphs from neural networks. We discuss the implications of these challenges for interpretability researchers and propose concrete suggestions for future work.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.03353 [pdf, ps, other]

doi 10.1115/DETC2013-12151

Is there an optimal choice of configuration space for Lie group integration schemes applied to constrained MBS?

Authors: Andreas Mueller, Zdravko Terze

Abstract: Recently, various numerical integration schemes have been proposed for simulating the dynamics of constrained multibody systems (MBS). These integration schemes operate directly on the MBS configuration space considered as a Lie group. For discrete spatial mechanical systems there are two Lie groups that can be used as configuration space: $SE(3)$ and $SO(3) \times \mathbb{R}^{3}$. Since the performance of the numerical integration scheme clearly depends on the underlying configuration space, it is important to analyze the effect of using either variant. For constrained MBS a crucial aspect is the constraint satisfaction. In this paper the constraint violations observed for the two variants are investigated. It is concluded that the $SE(3)$ formulation outperforms the $SO(3) \times \mathbb{R}^{3}$ formulation if the absolute motions of the rigid bodies, as part of a constrained MBS, belong to a motion subgroup. In all other cases both formulations are equivalent. In the latter cases, however, the $SO(3) \times \mathbb{R}^{3}$ formulation should be used, since the $SE(3)$ formulation is numerically more complex.

Submitted 18 June, 2024; originally announced July 2024.

Journal ref: Proceedings of the ASME 2013 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, IDETC/CIE 2013, August 12-15, 2013, Portland, OR, USA

arXiv:2407.02928 [pdf, other]

doi 10.1088/1751-8121/add22b

A new heuristic approach for contextuality degree estimates and its four- to six-qubit portrayals

Authors: Axel Muller, Metod Saniga, Alain Giorgetti, Frédéric Holweck, Colm Kelleher

Abstract: We introduce and describe a new heuristic method for finding an upper bound on the degree of contextuality and the corresponding unsatisfied part of a quantum contextual configuration with three-element contexts (i.e., lines) located in a multi-qubit symplectic polar space of order two. While the previously used method based on a SAT solver was limited to three qubits, this new method is much faster and more versatile, enabling us to also handle four- to six-qubit cases. The four-qubit unsatisfied configurations we found are quite remarkable. That of an elliptic quadric features 315 lines and has in its core three copies of the split Cayley hexagon of order two having a Heawood-graph-underpinned geometry in common. That of a hyperbolic quadric also has 315 lines but, as a point-line incidence structure, is isomorphic to the dual $\mathcal{DW}(5,2)$ of $\mathcal{W}(5,2)$. Finally, an unsatisfied configuration with 1575 lines associated with all the lines/contexts of the four-qubit space contains a distinguished $\mathcal{DW}(5,2)$ centered on a point-plane incidence graph of PG$(3,2)$. The corresponding configurations found in the five-qubit space exhibit a considerably higher degree of complexity, except for a hyperbolic quadric, whose 6975 unsatisfied contexts are compactified around the point-hyperplane incidence graph of PG$(4,2)$. The most remarkable unsatisfied patterns discovered in the six-qubit space are a couple of disjoint split Cayley hexagons (for the full space) and a subgeometry underpinned by the complete bipartite graph $K_{7,7}$ (for a hyperbolic quadric).

Submitted 3 July, 2024; originally announced July 2024.

Comments: 35 pages, 14 figures

MSC Class: 81P13 ACM Class: J.2

Journal ref: J. Phys. A: Math. Theor. 58 (2025) 215302

arXiv:2406.12571 [pdf, ps, other]

doi 10.1016/j.mechmachtheory.2014.06.014

The significance of the configuration space Lie group for the constraint satisfaction in numerical time integration of multibody systems

Authors: Andreas Mueller, Zdravko Terze

Abstract: The dynamics simulation of multibody systems (MBS) using spatial velocities (non-holonomic velocities) requires time integration of the dynamics equations together with the kinematic reconstruction equations (relating time derivatives of configuration variables to rigid body velocities). The latter are specific to the geometry of the rigid body motion underlying a particular formulation, and thus to the used configuration space (c-space). The proper c-space of a rigid body is the Lie group $SE(3)$, and the geometry is that of the screw motions. The rigid bodies within an MBS are further subjected to geometric constraints, often due to lower kinematic pairs that define $SE(3)$ subgroups. Traditionally, however, in MBS dynamics the translations and rotations are parameterized independently, which implies the use of the direct product group $SO(3) \times \mathbb{R}^{3}$ as rigid body c-space, although this does not account for rigid body motions. Hence, its appropriateness was recently put into perspective. In this paper the significance of the c-space for the constraint satisfaction in numerical time stepping schemes is analyzed for holonomically constrained MBS modeled with the 'absolute coordinate' approach, i.e. using the Newton-Euler equations for the individual bodies subjected to geometric constraints. It is shown that the geometric constraints a body is subjected to are exactly satisfied if they constrain the motion to a subgroup of its c-space. Since only the $SE(3)$ subgroups have a practical significance it is regarded as the appropriate c-space for the constrained rigid body. Consequently the constraints imposed by lower pair joints are exactly satisfied if the joint connects a body to the ground. For a general MBS, where the motions are not constrained to a subgroup, the $SE(3)$ and $SO(3) \times \mathbb{R}^{3}$ formulations yield the same order of accuracy.

Submitted 18 June, 2024; originally announced June 2024.

Journal ref: The significance of the configuration space Lie group for the constraint satisfaction in numerical time integration of multibody systems, Mechanism and Machine Theory, Vol. 82, 2014, pp. 173-202

arXiv:2406.03348 [pdf, other]

Position: A Call to Action for a Human-Centered AutoML Paradigm

Authors: Marius Lindauer, Florian Karl, Anne Klier, Julia Moosbauer, Alexander Tornede, Andreas Mueller, Frank Hutter, Matthias Feurer, Bernd Bischl

Abstract: Automated machine learning (AutoML) was formed around the fundamental objectives of automatically and efficiently configuring machine learning (ML) workflows, aiding the research of new ML algorithms, and contributing to the democratization of ML by making it accessible to a broader audience. Over the past decade, commendable achievements in AutoML have primarily focused on optimizing predictive performance. This focused progress, while substantial, raises questions about how well AutoML has met its broader, original goals. In this position paper, we argue that a key to unlocking AutoML's full potential lies in addressing the currently underexplored aspect of user interaction with AutoML systems, including their diverse roles, expectations, and expertise. We envision a more human-centered approach in future AutoML research, promoting the collaborative design of ML systems that tightly integrates the complementary strengths of human expertise and AutoML methodologies.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.02294 [pdf, other]

Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling

Authors: Arthur Müller, Felix Grumbach, Matthia Sabatelli

Abstract: Production scheduling is an essential task in manufacturing, with Reinforcement Learning (RL) emerging as a key solution. In a previous work, RL was utilized to solve an extended permutation flow shop scheduling problem (PFSSP) for a real-world production line with two stages, linked by a central buffer. The RL agent was trained to sequence equally sized product batches to minimize setup efforts and idle times. However, the substantial impact caused by varying the size of these product batches has not yet been explored. In this follow-up study, we investigate the effects of varying batch sizes, exploring both the quality of solutions and the training dynamics of the RL agent. The results demonstrate that it is possible to methodically identify reasonable boundaries for the batch size. These boundaries are determined on one side by the increasing sample complexity associated with smaller batch sizes, and on the other side by the decreasing flexibility of the agent when dealing with larger batch sizes. This provides the practitioner the ability to make an informed decision regarding the selection of an appropriate batch size. Moreover, we introduce and investigate two new curriculum learning strategies to enable the training with small batch sizes. The findings of this work offer the potential for application in several industrial use cases with comparable scheduling problems.

Submitted 4 June, 2024; originally announced June 2024.

Comments: This paper was accepted at the ETFA 2024 conference

arXiv:2405.01813 [pdf, other]

doi 10.1145/3555041.3589674

Towards Building Autonomous Data Services on Azure

Authors: Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, Subru Krishnan, Ankita Agarwal, Rana Alotaibi, Jesús Camacho-Rodríguez, Bibin Chundatt, Andrew Chung, Niharika Dutta, Andrew Fogarty, Anja Gruenheid, Brandon Haynes, Matteo Interlandi, Minu Iyer, Nick Jurgens, Sumeet Khushalani, Brian Kroth, Manoj Kumar, Jyoti Leeka, Sergiy Matusevych, Minni Mittal, Andreas Mueller, Kartheek Muthyala, Harsha Nagulapalli, et al. (13 additional authors not shown)

Abstract: Modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to gain the most value from them. For cloud providers, managing every aspect of an ever-increasing set of data services, while meeting customer SLAs and minimizing operational cost is becoming more challenging. Cloud technology enables the collection of significant amounts of workload traces and system telemetry. With the progress in data science (DS) and machine learning (ML), it is feasible and desirable to utilize a data-driven, ML-based approach to automate various aspects of data services, resulting in the creation of autonomous data services. This paper presents our perspectives and insights on creating autonomous data services on Azure. It also covers the future endeavors we plan to undertake and unresolved issues that still need attention.

Submitted 2 May, 2024; originally announced May 2024.

Comments: SIGMOD Companion of the 2023 International Conference on Management of Data. 2023

arXiv:2404.06214 [pdf, other]

[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Authors: Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang

Abstract: After last year's successful BabyLM Challenge, the competition will be hosted again in 2024/2025. The overarching goals of the challenge remain the same; however, some of the competition rules will be different. The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-inspired benchmarks, or analysis techniques. Second, we are relaxing the rules around pretraining data, and will now allow participants to construct their own datasets provided they stay within the 100M-word or 10M-word budget. Third, we introduce a multimodal vision-and-language track, and will release a corpus of 50% text-only and 50% image-text multimodal data as a starting point for LM model training. The purpose of this CfP is to provide rules for this year's challenge, explain these rule changes and their rationale in greater detail, give a timeline of this year's competition, and provide answers to frequently asked questions from last year's challenge.

Submitted 27 July, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2403.19647 [pdf, other]

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Authors: Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller

Abstract: We introduce methods for discovering and applying sparse feature circuits. These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors. Circuits identified in prior work consist of polysemantic and difficult-to-interpret units like attention heads or neurons, rendering them unsuitable for many downstream applications. In contrast, sparse feature circuits enable detailed understanding of unanticipated mechanisms. Because they are based on fine-grained units, sparse feature circuits are useful for downstream tasks: We introduce SHIFT, where we improve the generalization of a classifier by ablating features that a human judges to be task-irrelevant. Finally, we demonstrate an entirely unsupervised and scalable interpretability pipeline by discovering thousands of sparse feature circuits for automatically discovered model behaviors.

Submitted 27 March, 2025; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: Code and data at https://github.com/saprmarks/feature-circuits. Demonstration at https://feature-circuits.xyz

Journal ref: International Conference on Learning Representations, 2025

arXiv:2403.18587 [pdf, other]

The Impact of Uniform Inputs on Activation Sparsity and Energy-Latency Attacks in Computer Vision

Authors: Andreas Müller, Erwin Quiring

Abstract: Resource efficiency plays an important role for machine learning nowadays. The energy and decision latency are two critical aspects to ensure a sustainable and practical application. Unfortunately, the energy consumption and decision latency are not robust against adversaries. Researchers have recently demonstrated that attackers can compute and submit so-called sponge examples at inference time to increase the energy consumption and decision latency of neural networks. In computer vision, the proposed strategy crafts inputs with less activation sparsity which could otherwise be used to accelerate the computation. In this paper, we analyze the mechanism by which these energy-latency attacks reduce activation sparsity. In particular, we find that input uniformity is a key enabler. A uniform image, that is, an image with mostly flat, uniformly colored surfaces, triggers more activations due to a specific interplay of convolution, batch normalization, and ReLU activation. Based on these insights, we propose two new simple, yet effective strategies for crafting sponge examples: sampling images from a probability distribution and identifying dense, yet inconspicuous inputs in natural datasets. We empirically examine our findings in a comprehensive evaluation with multiple image classification models and show that our attack achieves the same sparsity effect as prior sponge-example methods, but at a fraction of computation effort. We also show that our sponge examples transfer between different neural networks. Finally, we discuss beneficial applications of our findings, namely improving efficiency by increasing sparsity.

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted at the DLSP 2024

arXiv:2403.09988 [pdf, other]

Interactive Distance Field Mapping and Planning to Enable Human-Robot Collaboration

Authors: Usama Ali, Lan Wu, Adrian Mueller, Fouad Sukkar, Tobias Kaupp, Teresa Vidal-Calleja

Abstract: Human-robot collaborative applications require scene representations that are kept up-to-date and facilitate safe motions in dynamic scenes. In this letter, we present an interactive distance field mapping and planning (IDMP) framework that handles dynamic objects and collision avoidance through an efficient representation. We define interactive mapping and planning as the process of creating and updating the representation of the scene online while simultaneously planning and adapting the robot's actions based on that representation. The key aspect of this work is an efficient Gaussian Process field that performs incremental updates and handles dynamic objects reliably by identifying moving points via a simple and elegant formulation based on queries from a temporary latent model. In terms of mapping, IDMP is able to fuse point cloud data from single and multiple sensors, query the free space at any spatial resolution, and deal with moving objects without semantics. In terms of planning, IDMP allows seamless integration with gradient-based reactive planners facilitating dynamic obstacle avoidance for safe human-robot interactions. Our mapping performance is evaluated on both real and synthetic datasets. A comparison with similar state-of-the-art frameworks shows superior performance when handling dynamic objects and comparable or better performance in the accuracy of the computed distance and gradient field. Finally, we show how the framework can be used for fast motion planning in the presence of moving objects both in simulated and real-world scenes. An accompanying video, code, and datasets are made publicly available at https://uts-ri.github.io/IDMP.

Submitted 22 October, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Showing 1–50 of 150 results for author: Muller, A