-
Localized Uncertainty Quantification in Random Forests via Proximities
Authors:
Jake S. Rhodes,
Scott D. Brown,
J. Riley Wilkinson
Abstract:
In machine learning, uncertainty quantification helps assess the reliability of model predictions, which is important in high-stakes scenarios. Traditional approaches often emphasize predictive accuracy, but there is a growing focus on incorporating uncertainty measures. This paper addresses localized uncertainty quantification in random forests. While current methods often rely on quantile regres…
▽ More
In machine learning, uncertainty quantification helps assess the reliability of model predictions, which is important in high-stakes scenarios. Traditional approaches often emphasize predictive accuracy, but there is a growing focus on incorporating uncertainty measures. This paper addresses localized uncertainty quantification in random forests. While current methods often rely on quantile regression or Monte Carlo techniques, we propose a new approach using naturally occurring test sets and similarity measures (proximities) typically viewed as byproducts of random forests. Specifically, we form localized distributions of OOB errors around nearby points, defined using the proximities, to create prediction intervals for regression and trust scores for classification. By varying the number of nearby points, our intervals can be adjusted to achieve the desired coverage while retaining the flexibility that reflects the certainty of individual predictions. For classification, excluding points identified as unclassifiable by our method generally enhances the accuracy of the model and provides higher accuracy-rejection AUC scores than competing methods.
△ Less
Submitted 26 September, 2025;
originally announced September 2025.
-
Raw-JPEG Adapter: Efficient Raw Image Compression with JPEG
Authors:
Mahmoud Afifi,
Ran Zhang,
Michael S. Brown
Abstract:
Digital cameras digitize scene light into linear raw representations, which the image signal processor (ISP) converts into display-ready outputs. While raw data preserves full sensor information--valuable for editing and vision tasks--formats such as Digital Negative (DNG) require large storage, making them impractical in constrained scenarios. In contrast, JPEG is a widely supported format, offer…
▽ More
Digital cameras digitize scene light into linear raw representations, which the image signal processor (ISP) converts into display-ready outputs. While raw data preserves full sensor information--valuable for editing and vision tasks--formats such as Digital Negative (DNG) require large storage, making them impractical in constrained scenarios. In contrast, JPEG is a widely supported format, offering high compression efficiency and broad compatibility, but it is not well-suited for raw storage. This paper presents RawJPEG Adapter, a lightweight, learnable, and invertible preprocessing pipeline that adapts raw images for standard JPEG compression. Our method applies spatial and optional frequency-domain transforms, with compact parameters stored in the JPEG comment field, enabling accurate raw reconstruction. Experiments across multiple datasets show that our method achieves higher fidelity than direct JPEG storage, supports other codecs, and provides a favorable trade-off between compression ratio and reconstruction accuracy.
△ Less
Submitted 29 September, 2025; v1 submitted 23 September, 2025;
originally announced September 2025.
-
A Unified AI Approach for Continuous Monitoring of Human Health and Diseases from Intensive Care Unit to Home with Physiological Foundation Models (UNIPHY+)
Authors:
Minxiao Wang,
Saurabh Kataria,
Juntong Ni,
Timothy G. Buchman,
Jocelyn Grunwell,
Mark Mai,
Wei Jin,
Matthew Clark,
Stephanie Brown,
Michael Fundora,
Puneet Sharma,
Tony Pan,
Sam Khan,
Timothy Ruchti,
Naveen Muthu,
Kevin Maher,
Sivasubramanium V Bhavani,
Xiao Hu
Abstract:
We present UNIPHY+, a unified physiological foundation model (physioFM) framework designed to enable continuous human health and diseases monitoring across care settings using ubiquitously obtainable physiological data. We propose novel strategies for incorporating contextual information during pretraining, fine-tuning, and lightweight model personalization via multi-modal learning, feature fusion…
▽ More
We present UNIPHY+, a unified physiological foundation model (physioFM) framework designed to enable continuous human health and diseases monitoring across care settings using ubiquitously obtainable physiological data. We propose novel strategies for incorporating contextual information during pretraining, fine-tuning, and lightweight model personalization via multi-modal learning, feature fusion-tuning, and knowledge distillation. We advocate testing UNIPHY+ with a broad set of use cases from intensive care to ambulatory monitoring in order to demonstrate that UNIPHY+ can empower generalizable, scalable, and personalized physiological AI to support both clinical decision-making and long-term health monitoring.
△ Less
Submitted 19 September, 2025;
originally announced September 2025.
-
Investigating Location-Regularised Self-Supervised Feature Learning for Seafloor Visual Imagery
Authors:
Cailei Liang,
Adrian Bodenmann,
Emma J Curtis,
Samuel Simmons,
Kazunori Nagano,
Stan Brown,
Adam Riese,
Blair Thornton
Abstract:
High-throughput interpretation of robotically gathered seafloor visual imagery can increase the efficiency of marine monitoring and exploration. Although recent research has suggested that location metadata can enhance self-supervised feature learning (SSL), its benefits across different SSL strategies, models and seafloor image datasets are underexplored. This study evaluates the impact of locati…
▽ More
High-throughput interpretation of robotically gathered seafloor visual imagery can increase the efficiency of marine monitoring and exploration. Although recent research has suggested that location metadata can enhance self-supervised feature learning (SSL), its benefits across different SSL strategies, models and seafloor image datasets are underexplored. This study evaluates the impact of location-based regularisation on six state-of-the-art SSL frameworks, which include Convolutional Neural Network (CNN) and Vision Transformer (ViT) models with varying latent-space dimensionality. Evaluation across three diverse seafloor image datasets finds that location-regularisation consistently improves downstream classification performance over standard SSL, with average F1-score gains of $4.9 \pm 4.0%$ for CNNs and $6.3 \pm 8.9%$ for ViTs, respectively. While CNNs pretrained on generic datasets benefit from high-dimensional latent representations, dataset-optimised SSL achieves similar performance across the high (512) and low (128) dimensional latent representations. Location-regularised SSL improves CNN performance over pre-trained models by $2.7 \pm 2.7%$ and $10.1 \pm 9.4%$ for high and low-dimensional latent representations, respectively. For ViTs, high-dimensionality benefits both pre-trained and dataset-optimised SSL. Although location-regularisation improves SSL performance compared to standard SSL methods, pre-trained ViTs show strong generalisation, matching the best-performing location-regularised SSL with F1-scores of $0.795 \pm 0.075$ and $0.795 \pm 0.077$, respectively. The findings highlight the value of location metadata for SSL regularisation, particularly when using low-dimensional latent representations, and demonstrate strong generalisation of high-dimensional ViTs for seafloor image analysis.
△ Less
Submitted 8 September, 2025;
originally announced September 2025.
-
Self-Organising Memristive Networks as Physical Learning Systems
Authors:
Francesco Caravelli,
Gianluca Milano,
Adam Z. Stieg,
Carlo Ricciardi,
Simon Anthony Brown,
Zdenka Kuncic
Abstract:
Learning with physical systems is an emerging paradigm that seeks to harness the intrinsic nonlinear dynamics of physical substrates for learning. The impetus for a paradigm shift in how hardware is used for computational intelligence stems largely from the unsustainability of artificial neural network software implemented on conventional transistor-based hardware. This Perspective highlights one…
▽ More
Learning with physical systems is an emerging paradigm that seeks to harness the intrinsic nonlinear dynamics of physical substrates for learning. The impetus for a paradigm shift in how hardware is used for computational intelligence stems largely from the unsustainability of artificial neural network software implemented on conventional transistor-based hardware. This Perspective highlights one promising approach using physical networks comprised of resistive memory nanoscale components with dynamically reconfigurable, self-organising electrical circuitry. Experimental advances have revealed the non-trivial interactions within these Self-Organising Memristive Networks (SOMNs), offering insights into their collective nonlinear and adaptive dynamics, and how these properties can be harnessed for learning using different hardware implementations. Theoretical approaches, including mean-field theory, graph theory, and concepts from disordered systems, reveal deeper insights into the dynamics of SOMNs, especially during transitions between different conductance states where criticality and other dynamical phase transitions emerge in both experiments and models. Furthermore, parallels between adaptive dynamics in SOMNs and plasticity in biological neuronal networks suggest the potential for realising energy-efficient, brain-like continual learning. SOMNs thus offer a promising route toward embedded edge intelligence, unlocking real-time decision-making for autonomous systems, dynamic sensing, and personalised healthcare, by enabling embedded learning in resource-constrained environments. The overarching aim of this Perspective is to show how the convergence of nanotechnology, statistical physics, complex systems, and self-organising principles offers a unique opportunity to advance a new generation of physical intelligence technologies.
△ Less
Submitted 31 August, 2025;
originally announced September 2025.
-
Improved Mapping Between Illuminations and Sensors for RAW Images
Authors:
Abhijith Punnappurath,
Luxi Zhao,
Hoang Le,
Abdelrahman Abdelhamed,
SaiKiran Kumar Tedla,
Michael S. Brown
Abstract:
RAW images are unprocessed camera sensor output with sensor-specific RGB values based on the sensor's color filter spectral sensitivities. RAW images also incur strong color casts due to the sensor's response to the spectral properties of scene illumination. The sensor- and illumination-specific nature of RAW images makes it challenging to capture RAW datasets for deep learning methods, as scenes…
▽ More
RAW images are unprocessed camera sensor output with sensor-specific RGB values based on the sensor's color filter spectral sensitivities. RAW images also incur strong color casts due to the sensor's response to the spectral properties of scene illumination. The sensor- and illumination-specific nature of RAW images makes it challenging to capture RAW datasets for deep learning methods, as scenes need to be captured for each sensor and under a wide range of illumination. Methods for illumination augmentation for a given sensor and the ability to map RAW images between sensors are important for reducing the burden of data capture. To explore this problem, we introduce the first-of-its-kind dataset comprising carefully captured scenes under a wide range of illumination. Specifically, we use a customized lightbox with tunable illumination spectra to capture several scenes with different cameras. Our illumination and sensor mapping dataset has 390 illuminations, four cameras, and 18 scenes. Using this dataset, we introduce a lightweight neural network approach for illumination and sensor mapping that outperforms competing methods. We demonstrate the utility of our approach on the downstream task of training a neural ISP. Link to project page: https://github.com/SamsungLabs/illum-sensor-mapping.
△ Less
Submitted 20 August, 2025;
originally announced August 2025.
-
SKATE, a Scalable Tournament Eval: Weaker LLMs differentiate between stronger ones using verifiable challenges
Authors:
Dewi S. W. Gould,
Bruno Mlodozeniec,
Samuel F. Brown
Abstract:
Evaluating the capabilities and risks of foundation models is paramount, yet current methods demand extensive domain expertise, hindering their scalability as these models rapidly evolve. We introduce SKATE: a novel evaluation framework in which large language models (LLMs) compete by generating and solving verifiable tasks for one another. Our core insight is to treat evaluation as a game: models…
▽ More
Evaluating the capabilities and risks of foundation models is paramount, yet current methods demand extensive domain expertise, hindering their scalability as these models rapidly evolve. We introduce SKATE: a novel evaluation framework in which large language models (LLMs) compete by generating and solving verifiable tasks for one another. Our core insight is to treat evaluation as a game: models act as both task-setters and solvers, incentivized to create questions which highlight their own strengths while exposing others' weaknesses. SKATE offers several key advantages, balancing scalability, open-endedness, and objectivity. It is fully automated, data-free, and scalable, requiring no human input or domain expertise. By using verifiable tasks rather than LLM judges, scoring is objective. Unlike domain-limited programmatically-generated benchmarks (e.g. chess-playing or spatial reasoning), having LLMs creatively pose challenges enables open-ended and scalable evaluation. As a proof of concept, we introduce LLM-set code-output-prediction (COP) challenges as a verifiable and extensible framework in which to test our approach. Using a TrueSkill-based ranking system, we evaluate six frontier LLMs and find that: (1) weaker models can reliably differentiate and score stronger ones, (2) LLM-based systems are capable of self-preferencing behavior, generating questions that align with their own capabilities, and (3) SKATE automatically surfaces fine-grained capability differences between models. Our findings are an important step towards general, scalable evaluation frameworks which can keep pace with LLM progress.
△ Less
Submitted 8 August, 2025;
originally announced August 2025.
-
Spectral Sensitivity Estimation with an Uncalibrated Diffraction Grating
Authors:
Lilika Makabe,
Hiroaki Santo,
Fumio Okura,
Michael S. Brown,
Yasuyuki Matsushita
Abstract:
This paper introduces a practical and accurate calibration method for camera spectral sensitivity using a diffraction grating. Accurate calibration of camera spectral sensitivity is crucial for various computer vision tasks, including color correction, illumination estimation, and material analysis. Unlike existing approaches that require specialized narrow-band filters or reference targets with k…
▽ More
This paper introduces a practical and accurate calibration method for camera spectral sensitivity using a diffraction grating. Accurate calibration of camera spectral sensitivity is crucial for various computer vision tasks, including color correction, illumination estimation, and material analysis. Unlike existing approaches that require specialized narrow-band filters or reference targets with known spectral reflectances, our method only requires an uncalibrated diffraction grating sheet, readily available off-the-shelf. By capturing images of the direct illumination and its diffracted pattern through the grating sheet, our method estimates both the camera spectral sensitivity and the diffraction grating parameters in a closed-form manner. Experiments on synthetic and real-world data demonstrate that our method outperforms conventional reference target-based methods, underscoring its effectiveness and practicality.
△ Less
Submitted 1 August, 2025;
originally announced August 2025.
-
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Authors:
Gheorghe Comanici,
Eric Bieber,
Mike Schaekermann,
Ice Pasupat,
Noveen Sachdeva,
Inderjit Dhillon,
Marcel Blistein,
Ori Ram,
Dan Zhang,
Evan Rosen,
Luke Marris,
Sam Petulla,
Colin Gaffney,
Asaf Aharoni,
Nathan Lintz,
Tiago Cardal Pais,
Henrik Jacobsson,
Idan Szpektor,
Nan-Jiang Jiang,
Krishna Haridasan,
Ahmed Omran,
Nikunj Saunshi,
Dara Bahri,
Gaurav Mishra,
Eric Chu
, et al. (3284 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…
▽ More
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
△ Less
Submitted 22 July, 2025; v1 submitted 7 July, 2025;
originally announced July 2025.
-
Learning Camera-Agnostic White-Balance Preferences
Authors:
Luxi Zhao,
Mahmoud Afifi,
Michael S. Brown
Abstract:
The image signal processor (ISP) pipeline in modern cameras consists of several modules that transform raw sensor data into visually pleasing images in a display color space. Among these, the auto white balance (AWB) module is essential for compensating for scene illumination. However, commercial AWB systems often strive to compute aesthetic white-balance preferences rather than accurate neutral c…
▽ More
The image signal processor (ISP) pipeline in modern cameras consists of several modules that transform raw sensor data into visually pleasing images in a display color space. Among these, the auto white balance (AWB) module is essential for compensating for scene illumination. However, commercial AWB systems often strive to compute aesthetic white-balance preferences rather than accurate neutral color correction. While learning-based methods have improved AWB accuracy, they typically struggle to generalize across different camera sensors -- an issue for smartphones with multiple cameras. Recent work has explored cross-camera AWB, but most methods remain focused on achieving neutral white balance. In contrast, this paper is the first to address aesthetic consistency by learning a post-illuminant-estimation mapping that transforms neutral illuminant corrections into aesthetically preferred corrections in a camera-agnostic space. Once trained, our mapping can be applied after any neutral AWB module to enable consistent and stylized color rendering across unseen cameras. Our proposed model is lightweight -- containing only $\sim$500 parameters -- and runs in just 0.024 milliseconds on a typical flagship mobile CPU. Evaluated on a dataset of 771 smartphone images from three different cameras, our method achieves state-of-the-art performance while remaining fully compatible with existing cross-camera AWB techniques, introducing minimal computational and memory overhead.
△ Less
Submitted 14 August, 2025; v1 submitted 2 July, 2025;
originally announced July 2025.
-
Autonomous Computer Vision Development with Agentic AI
Authors:
Jin Kim,
Muhammad Wahi-Anwa,
Sangyun Park,
Shawn Shin,
John M. Hoffman,
Matthew S. Brown
Abstract:
Agentic Artificial Intelligence (AI) systems leveraging Large Language Models (LLMs) exhibit significant potential for complex reasoning, planning, and tool utilization. We demonstrate that a specialized computer vision system can be built autonomously from a natural language prompt using Agentic AI methods. This involved extending SimpleMind (SM), an open-source Cognitive AI environment with conf…
▽ More
Agentic Artificial Intelligence (AI) systems leveraging Large Language Models (LLMs) exhibit significant potential for complex reasoning, planning, and tool utilization. We demonstrate that a specialized computer vision system can be built autonomously from a natural language prompt using Agentic AI methods. This involved extending SimpleMind (SM), an open-source Cognitive AI environment with configurable tools for medical image analysis, with an LLM-based agent, implemented using OpenManus, to automate the planning (tool configuration) for a particular computer vision task. We provide a proof-of-concept demonstration that an agentic system can interpret a computer vision task prompt, plan a corresponding SimpleMind workflow by decomposing the task and configuring appropriate tools. From the user input prompt, "provide sm (SimpleMind) config for lungs, heart, and ribs segmentation for cxr (chest x-ray)"), the agent LLM was able to generate the plan (tool configuration file in YAML format), and execute SM-Learn (training) and SM-Think (inference) scripts autonomously. The computer vision agent automatically configured, trained, and tested itself on 50 chest x-ray images, achieving mean dice scores of 0.96, 0.82, 0.83, for lungs, heart, and ribs, respectively. This work shows the potential for autonomous planning and tool configuration that has traditionally been performed by a data scientist in the development of computer vision applications.
△ Less
Submitted 19 June, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Temporal Convolutional Autoencoder for Interference Mitigation in FMCW Radar Altimeters
Authors:
Charles E. Thornton,
Jamie Sloop,
Samuel Brown,
Aaron Orndorff,
William C. Headley,
Stephen Young
Abstract:
We investigate the end-to-end altitude estimation performance of a convolutional autoencoder-based interference mitigation approach for frequency-modulated continuous-wave (FMCW) radar altimeters. Specifically, we show that a Temporal Convolutional Network (TCN) autoencoder effectively exploits temporal correlations in the received signal, providing superior interference suppression compared to a…
▽ More
We investigate the end-to-end altitude estimation performance of a convolutional autoencoder-based interference mitigation approach for frequency-modulated continuous-wave (FMCW) radar altimeters. Specifically, we show that a Temporal Convolutional Network (TCN) autoencoder effectively exploits temporal correlations in the received signal, providing superior interference suppression compared to a Least Mean Squares (LMS) adaptive filter. Unlike existing approaches, the present method operates directly on the received FMCW signal. Additionally, we identify key challenges in applying deep learning to wideband FMCW interference mitigation and outline directions for future research to enhance real-time feasibility and generalization to arbitrary interference conditions.
△ Less
Submitted 28 May, 2025;
originally announced May 2025.
-
Amplifying Human Creativity and Problem Solving with AI Through Generative Collective Intelligence
Authors:
Thomas P. Kehler,
Scott E. Page,
Alex Pentland,
Martin Reeves,
John Seely Brown
Abstract:
We propose a general framework for human-AI collaboration that amplifies the distinct capabilities of both types of intelligence. We refer to this as Generative Collective Intelligence (GCI). GCI employs AI in dual roles: as interactive agents and as technology that accumulates, organizes, and leverages knowledge. In this second role, AI creates a cognitive bridge between human reasoning and AI mo…
▽ More
We propose a general framework for human-AI collaboration that amplifies the distinct capabilities of both types of intelligence. We refer to this as Generative Collective Intelligence (GCI). GCI employs AI in dual roles: as interactive agents and as technology that accumulates, organizes, and leverages knowledge. In this second role, AI creates a cognitive bridge between human reasoning and AI models. The AI functions as a social and cultural technology that enables groups to solve complex problems through structured collaboration that transcends traditional communication barriers. We argue that GCI can overcome limitations of purely algorithmic approaches to problem-solving and decision-making. We describe the mathematical foundations of GCI, based on the law of comparative judgment and minimum regret principles, and briefly illustrate its applications across various domains, including climate adaptation, healthcare transformation, and civic participation. By combining human creativity with AI's computational capabilities, GCI offers a promising approach to addressing complex societal challenges that neither humans nor machines can solve alone.
△ Less
Submitted 4 June, 2025; v1 submitted 25 May, 2025;
originally announced May 2025.
-
How much to Dereverberate? Low-Latency Single-Channel Speech Enhancement in Distant Microphone Scenarios
Authors:
Satvik Venkatesh,
Philip Coleman,
Arthur Benilov,
Simon Brown,
Selim Sheta,
Frederic Roskam
Abstract:
Dereverberation is an important sub-task of Speech Enhancement (SE) to improve the signal's intelligibility and quality. However, it remains challenging because the reverberation is highly correlated with the signal. Furthermore, the single-channel SE literature has predominantly focused on rooms with short reverb times (typically under 1 second), smaller rooms (under volumes of 1000 cubic meters)…
▽ More
Dereverberation is an important sub-task of Speech Enhancement (SE) to improve the signal's intelligibility and quality. However, it remains challenging because the reverberation is highly correlated with the signal. Furthermore, the single-channel SE literature has predominantly focused on rooms with short reverb times (typically under 1 second), smaller rooms (under volumes of 1000 cubic meters) and relatively short distances (up to 2 meters). In this paper, we explore real-time low-latency single-channel SE under distant microphone scenarios, such as 5 to 10 meters, and focus on conference rooms and theatres, with larger room dimensions and reverberation times. Such a setup is useful for applications such as lecture demonstrations, drama, and to enhance stage acoustics. First, we show that single-channel SE in such challenging scenarios is feasible. Second, we investigate the relationship between room volume and reverberation time, and demonstrate its importance when randomly simulating room impulse responses. Lastly, we show that for dereverberation with short decay times, preserving early reflections before decaying the transfer function of the room improves overall signal quality.
△ Less
Submitted 2 May, 2025;
originally announced May 2025.
-
CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy
Authors:
Dongyoung Kim,
Mahmoud Afifi,
Dongyun Kim,
Michael S. Brown,
Seon Joo Kim
Abstract:
Computational color constancy, or white balancing, is a key module in a camera's image signal processor (ISP) that corrects color casts from scene lighting. Because this operation occurs in the camera-specific raw color space, white balance algorithms must adapt to different cameras. This paper introduces a learning-based method for cross-camera color constancy that generalizes to new cameras with…
▽ More
Computational color constancy, or white balancing, is a key module in a camera's image signal processor (ISP) that corrects color casts from scene lighting. Because this operation occurs in the camera-specific raw color space, white balance algorithms must adapt to different cameras. This paper introduces a learning-based method for cross-camera color constancy that generalizes to new cameras without retraining. Our method leverages pre-calibrated color correction matrices (CCMs) available on ISPs that map the camera's raw color space to a standard space (e.g., CIE XYZ). Our method uses these CCMs to transform predefined illumination colors (i.e., along the Planckian locus) into the test camera's raw space. The mapped illuminants are encoded into a compact camera fingerprint embedding (CFE) that enables the network to adapt to unseen cameras. To prevent overfitting due to limited cameras and CCMs during training, we introduce a data augmentation technique that interpolates between cameras and their CCMs. Experimental results across multiple datasets and backbones show that our method achieves state-of-the-art cross-camera color constancy while remaining lightweight and relying only on data readily available in camera ISPs.
△ Less
Submitted 10 April, 2025;
originally announced April 2025.
-
Time-Aware Auto White Balance in Mobile Photography
Authors:
Mahmoud Afifi,
Luxi Zhao,
Abhijith Punnappurath,
Mohammed A. Abdelsalam,
Ran Zhang,
Michael S. Brown
Abstract:
Cameras rely on auto white balance (AWB) to correct undesirable color casts caused by scene illumination and the camera's spectral sensitivity. This is typically achieved using an illuminant estimator that determines the global color cast solely from the color information in the camera's raw sensor image. Mobile devices provide valuable additional metadata-such as capture timestamp and geolocation…
▽ More
Cameras rely on auto white balance (AWB) to correct undesirable color casts caused by scene illumination and the camera's spectral sensitivity. This is typically achieved using an illuminant estimator that determines the global color cast solely from the color information in the camera's raw sensor image. Mobile devices provide valuable additional metadata-such as capture timestamp and geolocation-that offers strong contextual clues to help narrow down the possible illumination solutions. This paper proposes a lightweight illuminant estimation method that incorporates such contextual metadata, along with additional capture information and image colors, into a compact model (~5K parameters), achieving promising results, matching or surpassing larger models. To validate our method, we introduce a dataset of 3,224 smartphone images with contextual metadata collected at various times of day and under diverse lighting conditions. The dataset includes ground-truth illuminant colors, determined using a color chart, and user-preferred illuminants validated through a user study, providing a comprehensive benchmark for AWB evaluation.
△ Less
Submitted 25 June, 2025; v1 submitted 7 April, 2025;
originally announced April 2025.
-
Multispectral Demosaicing via Dual Cameras
Authors:
SaiKiran Tedla,
Junyong Lee,
Beixuan Yang,
Mahmoud Afifi,
Michael S. Brown
Abstract:
Multispectral (MS) images capture detailed scene information across a wide range of spectral bands, making them invaluable for applications requiring rich spectral data. Integrating MS imaging into multi camera devices, such as smartphones, has the potential to enhance both spectral applications and RGB image quality. A critical step in processing MS data is demosaicing, which reconstructs color i…
▽ More
Multispectral (MS) images capture detailed scene information across a wide range of spectral bands, making them invaluable for applications requiring rich spectral data. Integrating MS imaging into multi camera devices, such as smartphones, has the potential to enhance both spectral applications and RGB image quality. A critical step in processing MS data is demosaicing, which reconstructs color information from the mosaic MS images captured by the camera. This paper proposes a method for MS image demosaicing specifically designed for dual-camera setups where both RGB and MS cameras capture the same scene. Our approach leverages co-captured RGB images, which typically have higher spatial fidelity, to guide the demosaicing of lower-fidelity MS images. We introduce the Dual-camera RGB-MS Dataset - a large collection of paired RGB and MS mosaiced images with ground-truth demosaiced outputs - that enables training and evaluation of our method. Experimental results demonstrate that our method achieves state-of-the-art accuracy compared to existing techniques.
△ Less
Submitted 25 July, 2025; v1 submitted 27 March, 2025;
originally announced March 2025.
-
Revisiting Image Fusion for Multi-Illuminant White-Balance Correction
Authors:
David Serrano-Lozano,
Aditya Arora,
Luis Herranz,
Konstantinos G. Derpanis,
Michael S. Brown,
Javier Vazquez-Corral
Abstract:
White balance (WB) correction in scenes with multiple illuminants remains a persistent challenge in computer vision. Recent methods explored fusion-based approaches, where a neural network linearly blends multiple sRGB versions of an input image, each processed with predefined WB presets. However, we demonstrate that these methods are suboptimal for common multi-illuminant scenarios. Additionally,…
▽ More
White balance (WB) correction in scenes with multiple illuminants remains a persistent challenge in computer vision. Recent methods explored fusion-based approaches, where a neural network linearly blends multiple sRGB versions of an input image, each processed with predefined WB presets. However, we demonstrate that these methods are suboptimal for common multi-illuminant scenarios. Additionally, existing fusion-based methods rely on sRGB WB datasets lacking dedicated multi-illuminant images, limiting both training and evaluation. To address these challenges, we introduce two key contributions. First, we propose an efficient transformer-based model that effectively captures spatial dependencies across sRGB WB presets, substantially improving upon linear fusion techniques. Second, we introduce a large-scale multi-illuminant dataset comprising over 16,000 sRGB images rendered with five different WB settings, along with WB-corrected images. Our method achieves up to 100\% improvement over existing techniques on our new multi-illuminant image fusion dataset.
△ Less
Submitted 18 March, 2025;
originally announced March 2025.
-
A Bi-channel Aided Stitching of Atomic Force Microscopy Images
Authors:
Huanhuan Zhao,
Ruben Millan-Solsona,
Marti Checa,
Spenser R. Brown,
Jennifer L. Morrell-Falvey,
Liam Collins,
Arpan Biswas
Abstract:
Microscopy is an essential tool in scientific research, enabling the visualization of structures at micro- and nanoscale resolutions. However, the field of microscopy often encounters limitations in field-of-view (FOV), restricting the amount of sample that can be imaged in a single capture. To overcome this limitation, image stitching techniques have been developed to seamlessly merge multiple ov…
▽ More
Microscopy is an essential tool in scientific research, enabling the visualization of structures at micro- and nanoscale resolutions. However, the field of microscopy often encounters limitations in field-of-view (FOV), restricting the amount of sample that can be imaged in a single capture. To overcome this limitation, image stitching techniques have been developed to seamlessly merge multiple overlapping images into a single, high-resolution composite. The images collected from microscope need to be optimally stitched before accurate physical information can be extracted from post analysis. However, the existing stitching tools either struggle to stitch images together when the microscopy images are feature sparse or cannot address all the transformations of images. To address these issues, we propose a bi-channel aided feature-based image stitching method and demonstrate its use on AFM generated biofilm images. The topographical channel image of AFM data captures the morphological details of the sample, and a stitched topographical image is desired for researchers. We utilize the amplitude channel of AFM data to maximize the matching features and to estimate the position of the original topographical images and show that the proposed bi-channel aided stitching method outperforms the traditional stitching approach. Furthermore, we found that the differentiation of the topographical images along the x-axis provides similar feature information to the amplitude channel image, which generalizes our approach when the amplitude images are not available. Here we demonstrated the application on AFM, but similar approaches could be employed of optical microscopy with brightfield and fluorescence channels. We believe this proposed workflow will benefit the experimentalist to avoid erroneous analysis and discovery due to incorrect stitching.
△ Less
Submitted 13 March, 2025; v1 submitted 11 March, 2025;
originally announced March 2025.
-
Discovery and Deployment of Emergent Robot Swarm Behaviors via Representation Learning and Real2Sim2Real Transfer
Authors:
Connor Mattson,
Varun Raveendra,
Ricardo Vega,
Cameron Nowzari,
Daniel S. Drew,
Daniel S. Brown
Abstract:
Given a swarm of limited-capability robots, we seek to automatically discover the set of possible emergent behaviors. Prior approaches to behavior discovery rely on human feedback or hand-crafted behavior metrics to represent and evolve behaviors and only discover behaviors in simulation, without testing or considering the deployment of these new behaviors on real robot swarms. In this work, we pr…
▽ More
Given a swarm of limited-capability robots, we seek to automatically discover the set of possible emergent behaviors. Prior approaches to behavior discovery rely on human feedback or hand-crafted behavior metrics to represent and evolve behaviors and only discover behaviors in simulation, without testing or considering the deployment of these new behaviors on real robot swarms. In this work, we present Real2Sim2Real Behavior Discovery via Self-Supervised Representation Learning, which combines representation learning and novelty search to discover possible emergent behaviors automatically in simulation and enable direct controller transfer to real robots. First, we evaluate our method in simulation and show that our proposed self-supervised representation learning approach outperforms previous hand-crafted metrics by more accurately representing the space of possible emergent behaviors. Then, we address the reality gap by incorporating recent work in sim2real transfer for swarms into our lightweight simulator design, enabling direct robot deployment of all behaviors discovered in simulation on an open-source and low-cost robot platform.
△ Less
Submitted 21 February, 2025;
originally announced February 2025.
-
Early Risk Prediction of Pediatric Cardiac Arrest from Electronic Health Records via Multimodal Fused Transformer
Authors:
Jiaying Lu,
Stephanie R. Brown,
Songyuan Liu,
Shifan Zhao,
Kejun Dong,
Del Bold,
Michael Fundora,
Alaa Aljiffry,
Alex Fedorov,
Jocelyn Grunwell,
Xiao Hu
Abstract:
Early prediction of pediatric cardiac arrest (CA) is critical for timely intervention in high-risk intensive care settings. We introduce PedCA-FT, a novel transformer-based framework that fuses tabular view of EHR with the derived textual view of EHR to fully unleash the interactions of high-dimensional risk factors and their dynamics. By employing dedicated transformer modules for each modality v…
▽ More
Early prediction of pediatric cardiac arrest (CA) is critical for timely intervention in high-risk intensive care settings. We introduce PedCA-FT, a novel transformer-based framework that fuses tabular view of EHR with the derived textual view of EHR to fully unleash the interactions of high-dimensional risk factors and their dynamics. By employing dedicated transformer modules for each modality view, PedCA-FT captures complex temporal and contextual patterns to produce robust CA risk estimates. Evaluated on a curated pediatric cohort from the CHOA-CICU database, our approach outperforms ten other artificial intelligence models across five key performance metrics and identifies clinically meaningful risk factors. These findings underscore the potential of multimodal fusion techniques to enhance early CA detection and improve patient care.
△ Less
Submitted 20 May, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
-
Wearable AR in Everyday Contexts: Insights from a Digital Ethnography of YouTube Videos
Authors:
Tram Thi Minh Tran,
Shane Brown,
Oliver Weidlich,
Soojeong Yoo,
Callum Parker
Abstract:
With growing investment in consumer augmented reality (AR) headsets and glasses, wearable AR is moving from niche applications to everyday use. However, current research primarily examines AR in controlled settings, offering limited insights into its use in real-world daily life. To address this gap, we adopt a digital ethnographic approach, analysing 27 hours of 112 YouTube videos featuring early…
▽ More
With growing investment in consumer augmented reality (AR) headsets and glasses, wearable AR is moving from niche applications to everyday use. However, current research primarily examines AR in controlled settings, offering limited insights into its use in real-world daily life. To address this gap, we adopt a digital ethnographic approach, analysing 27 hours of 112 YouTube videos featuring early adopters. These videos capture usage ranging from continuous periods of hours to intermittent use over weeks and months. Our analysis shows that currently, wearable AR is primarily used for media consumption and gaming. While productivity is a desired use case, frequent use is constrained by current hardware limitations and the nascent application ecosystem. Users seek continuity in their digital experience, desiring functionalities similar to those on smartphones, tablets, or computers. We propose implications for everyday AR development that promote adoption while ensuring safe, ethical, and socially-aware integration into daily life.
△ Less
Submitted 11 February, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
-
How vulnerable is my policy? Adversarial attacks on modern behavior cloning policies
Authors:
Basavasagar Patil,
Akansha Kalra,
Guanhong Tao,
Daniel S. Brown
Abstract:
Learning from Demonstration (LfD) algorithms have shown promising results in robotic manipulation tasks, but their vulnerability to adversarial attacks remains underexplored. This paper presents a comprehensive study of adversarial attacks on both classic and recently proposed algorithms, including Behavior Cloning (BC), LSTM-GMM, Implicit Behavior Cloning (IBC), Diffusion Policy (DP), and VQ-Beha…
▽ More
Learning from Demonstration (LfD) algorithms have shown promising results in robotic manipulation tasks, but their vulnerability to adversarial attacks remains underexplored. This paper presents a comprehensive study of adversarial attacks on both classic and recently proposed algorithms, including Behavior Cloning (BC), LSTM-GMM, Implicit Behavior Cloning (IBC), Diffusion Policy (DP), and VQ-Behavior Transformer (VQ-BET). We study the vulnerability of these methods to untargeted, targeted and universal adversarial perturbations. While explicit policies, such as BC, LSTM-GMM and VQ-BET can be attacked in the same manner as standard computer vision models, we find that attacks for implicit and denoising policy models are nuanced and require developing novel attack methods. Our experiments on several simulated robotic manipulation tasks reveal that most of the current methods are highly vulnerable to adversarial perturbations. We also show that these attacks are transferable across algorithms, architectures, and tasks, raising concerning security vulnerabilities with potentially a white-box threat model. In addition, we test the efficacy of a randomized smoothing, a widely used adversarial defense technique, and highlight its limitation in defending against attacks on complex and multi-modal action distribution common in complex control tasks. In summary, our findings highlight the vulnerabilities of modern BC algorithms, paving way for future work in addressing such limitations.
△ Less
Submitted 5 February, 2025;
originally announced February 2025.
-
Early Failure Detection in Autonomous Surgical Soft-Tissue Manipulation via Uncertainty Quantification
Authors:
Jordan Thompson,
Ronald Koe,
Anthony Le,
Gabriella Goodman,
Daniel S. Brown,
Alan Kuntz
Abstract:
Autonomous surgical robots are a promising solution to the increasing demand for surgery amid a shortage of surgeons. Recent work has proposed learning-based approaches for the autonomous manipulation of soft tissue. However, due to variability in tissue geometries and stiffnesses, these methods do not always perform optimally, especially in out-of-distribution settings. We propose, develop, and t…
▽ More
Autonomous surgical robots are a promising solution to the increasing demand for surgery amid a shortage of surgeons. Recent work has proposed learning-based approaches for the autonomous manipulation of soft tissue. However, due to variability in tissue geometries and stiffnesses, these methods do not always perform optimally, especially in out-of-distribution settings. We propose, develop, and test the first application of uncertainty quantification to learned surgical soft-tissue manipulation policies as an early identification system for task failures. We analyze two different methods of uncertainty quantification, deep ensembles and Monte Carlo dropout, and find that deep ensembles provide a stronger signal of future task success or failure. We validate our approach using the physical daVinci Research Kit (dVRK) surgical robot to perform physical soft-tissue manipulation. We show that we are able to successfully detect out-of-distribution states leading to task failure and request human intervention when necessary while still enabling autonomous manipulation when possible. Our learned tissue manipulation policy with uncertainty-based early failure detection achieves a zero-shot sim2real performance improvement of 47.5% over the prior state of the art in learned soft-tissue manipulation. We also show that our method generalizes well to new types of tissue as well as to a bimanual soft-tissue manipulation task.
△ Less
Submitted 25 August, 2025; v1 submitted 17 January, 2025;
originally announced January 2025.
-
Toward Zero-Shot User Intent Recognition in Shared Autonomy
Authors:
Atharv Belsare,
Zohre Karimi,
Connor Mattson,
Daniel S. Brown
Abstract:
A fundamental challenge of shared autonomy is to use high-DoF robots to assist, rather than hinder, humans by first inferring user intent and then empowering the user to achieve their intent. Although successful, prior methods either rely heavily on a priori knowledge of all possible human intents or require many demonstrations and interactions with the human to learn these intents before being ab…
▽ More
A fundamental challenge of shared autonomy is to use high-DoF robots to assist, rather than hinder, humans by first inferring user intent and then empowering the user to achieve their intent. Although successful, prior methods either rely heavily on a priori knowledge of all possible human intents or require many demonstrations and interactions with the human to learn these intents before being able to assist the user. We propose and study a zero-shot, vision-only shared autonomy (VOSA) framework designed to allow robots to use end-effector vision to estimate zero-shot human intents in conjunction with blended control to help humans accomplish manipulation tasks with unknown and dynamically changing object locations. To demonstrate the effectiveness of our VOSA framework, we instantiate a simple version of VOSA on a Kinova Gen3 manipulator and evaluate our system by conducting a user study on three tabletop manipulation tasks. The performance of VOSA matches that of an oracle baseline model that receives privileged knowledge of possible human intents while also requiring significantly less effort than unassisted teleoperation. In more realistic settings, where the set of possible human intents is fully or partially unknown, we demonstrate that VOSA requires less human effort and time than baseline approaches while being preferred by a majority of the participants. Our results demonstrate the efficacy and efficiency of using off-the-shelf vision algorithms to enable flexible and beneficial shared control of a robot manipulator. Code and videos available here: https://sites.google.com/view/zeroshot-sharedautonomy/home.
△ Less
Submitted 14 January, 2025;
originally announced January 2025.
-
Efficient Neural Network Encoding for 3D Color Lookup Tables
Authors:
Vahid Zehtab,
David B. Lindell,
Marcus A. Brubaker,
Michael S. Brown
Abstract:
3D color lookup tables (LUTs) enable precise color manipulation by mapping input RGB values to specific output RGB values. 3D LUTs are instrumental in various applications, including video editing, in-camera processing, photographic filters, computer graphics, and color processing for displays. While an individual LUT does not incur a high memory overhead, software and devices may need to store do…
▽ More
3D color lookup tables (LUTs) enable precise color manipulation by mapping input RGB values to specific output RGB values. 3D LUTs are instrumental in various applications, including video editing, in-camera processing, photographic filters, computer graphics, and color processing for displays. While an individual LUT does not incur a high memory overhead, software and devices may need to store dozens to hundreds of LUTs that can take over 100 MB. This work aims to develop a neural network architecture that can encode hundreds of LUTs in a single compact representation. To this end, we propose a model with a memory footprint of less than 0.25 MB that can reconstruct 512 LUTs with only minor color distortion ($\barΔE_M$ $\leq$ 2.0) over the entire color gamut. We also show that our network can weight colors to provide further quality gains on natural image colors ($\barΔ{E}_M$ $\leq$ 1.0). Finally, we show that minor modifications to the network architecture enable a bijective encoding that produces LUTs that are invertible, allowing for reverse color processing. Our code is available at https://github.com/vahidzee/ennelut.
△ Less
Submitted 19 December, 2024;
originally announced December 2024.
-
Agent-Based Emulation for Deploying Robot Swarm Behaviors
Authors:
Ricardo Vega,
Kevin Zhu,
Connor Mattson,
Daniel S. Brown,
Cameron Nowzari
Abstract:
Despite significant research, robotic swarms have yet to be useful in solving real-world problems, largely due to the difficulty of creating and controlling swarming behaviors in multi-agent systems. Traditional top-down approaches in which a desired emergent behavior is produced often require complex, resource-heavy robots, limiting their practicality. This paper introduces a bottom-up approach b…
▽ More
Despite significant research, robotic swarms have yet to be useful in solving real-world problems, largely due to the difficulty of creating and controlling swarming behaviors in multi-agent systems. Traditional top-down approaches in which a desired emergent behavior is produced often require complex, resource-heavy robots, limiting their practicality. This paper introduces a bottom-up approach by employing an Embodied Agent-Based Modeling and Simulation approach, emphasizing the use of simple robots and identifying conditions that naturally lead to self-organized collective behaviors. Using the Reality-to-Simulation-to-Reality for Swarms (RSRS) process, we tightly integrate real-world experiments with simulations to reproduce known swarm behaviors as well as discovering a novel emergent behavior without aiming to eliminate or even reduce the sim2real gap. This paper presents the development of an Agent-Based Embodiment and Emulation process that balances the importance of running physical swarming experiments and the prohibitively time-consuming process of even setting up and running a single experiment with 20+ robots by leveraging low-fidelity lightweight simulations to enable hypothesis-formation to guide physical experiments. We demonstrate the usefulness of our methods by emulating two known behaviors from the literature and show a third behavior `discovered' by accident.
△ Less
Submitted 21 October, 2024;
originally announced October 2024.
-
Spiking Neural Networks as a Controller for Emergent Swarm Agents
Authors:
Kevin Zhu,
Connor Mattson,
Shay Snyder,
Ricardo Vega,
Daniel S. Brown,
Maryam Parsa,
Cameron Nowzari
Abstract:
Drones which can swarm and loiter in a certain area cost hundreds of dollars, but mosquitos can do the same and are essentially worthless. To control swarms of low-cost robots, researchers may end up spending countless hours brainstorming robot configurations and policies to ``organically" create behaviors which do not need expensive sensors and perception. Existing research explores the possible…
▽ More
Drones which can swarm and loiter in a certain area cost hundreds of dollars, but mosquitos can do the same and are essentially worthless. To control swarms of low-cost robots, researchers may end up spending countless hours brainstorming robot configurations and policies to ``organically" create behaviors which do not need expensive sensors and perception. Existing research explores the possible emergent behaviors in swarms of robots with only a binary sensor and a simple but hand-picked controller structure. Even agents in this highly limited sensing, actuation, and computational capability class can exhibit relatively complex global behaviors such as aggregation, milling, and dispersal, but finding the local interaction rules that enable more collective behaviors remains a significant challenge. This paper investigates the feasibility of training spiking neural networks to find those local interaction rules that result in particular emergent behaviors. In this paper, we focus on simulating a specific milling behavior already known to be producible using very simple binary sensing and acting agents. To do this, we use evolutionary algorithms to evolve not only the parameters (the weights, biases, and delays) of a spiking neural network, but also its structure. To create a baseline, we also show an evolutionary search strategy over the parameters for the incumbent hand-picked binary controller structure. Our simulations show that spiking neural networks can be evolved in binary sensing agents to form a mill.
△ Less
Submitted 21 October, 2024;
originally announced October 2024.
-
Aircraft Radar Altimeter Interference Mitigation Through a CNN-Layer Only Denoising Autoencoder Architecture
Authors:
Samuel B. Brown,
Stephen Young,
Adam Wagenknecht,
Daniel Jakubisin,
Charles E. Thornton,
Aaron Orndorff,
William C. Headley
Abstract:
Denoising autoencoders for signal processing applications have been shown to experience significant difficulty in learning to reconstruct radio frequency communication signals, particularly in the large sample regime. In communication systems, this challenge is primarily due to the need to reconstruct the modulated data stream which is generally highly stochastic in nature. In this work, we take a…
▽ More
Denoising autoencoders for signal processing applications have been shown to experience significant difficulty in learning to reconstruct radio frequency communication signals, particularly in the large sample regime. In communication systems, this challenge is primarily due to the need to reconstruct the modulated data stream which is generally highly stochastic in nature. In this work, we take advantage of this limitation by using the denoising autoencoder to instead remove interfering radio frequency communication signals while reconstructing highly structured FMCW radar signals. More specifically, in this work we show that a CNN-layer only autoencoder architecture can be utilized to improve the accuracy of a radar altimeter's ranging estimate even in severe interference environments consisting of a multitude of interference signals. This is demonstrated through comprehensive performance analysis of an end-to-end FMCW radar altimeter simulation with and without the convolutional layer-only autoencoder. The proposed approach significantly improves interference mitigation in the presence of both narrow-band tone interference as well as wideband QPSK interference in terms of range RMS error, number of false altitude reports, and the peak-to-sidelobe ratio of the resulting range profile. FMCW radar signals of up to 40,000 IQ samples can be reliably reconstructed.
△ Less
Submitted 4 October, 2024;
originally announced October 2024.
-
Melody predominates over harmony in the evolution of musical scales across 96 countries
Authors:
John M McBride,
Elizabeth Phillips,
Patrick E Savage,
Steven Brown,
Tsvi Tlusty
Abstract:
The standard theory of musical scales since antiquity has been based on harmony, rather than melody. While recent analyses provide mixed support for a role of melody as well as harmony, we lack a comparative analysis based on cross-cultural data. We address this longstanding problem through a rigorous computational comparison of the main theories using 1,314 scales from 96 countries. There is near…
▽ More
The standard theory of musical scales since antiquity has been based on harmony, rather than melody. While recent analyses provide mixed support for a role of melody as well as harmony, we lack a comparative analysis based on cross-cultural data. We address this longstanding problem through a rigorous computational comparison of the main theories using 1,314 scales from 96 countries. There is near-universal support for melodic theories, which predict step-sizes of 1-3 semitones. Harmony accounts for the prevalence of certain simple-integer-ratio intervals, particularly for music-theoretic scales from Eurasian societies, which may explain their dominance amongst Western scholars. However, harmony is a poor predictor of scales measured from ethnographic recordings, particularly outside of Eurasia. Overall, we show that the historical emphasis on harmony is misguided and that melody is the primary determinant of the world's musical scales.
△ Less
Submitted 2 July, 2025; v1 submitted 22 August, 2024;
originally announced August 2024.
-
Representation Alignment from Human Feedback for Cross-Embodiment Reward Learning from Mixed-Quality Demonstrations
Authors:
Connor Mattson,
Anurag Aribandi,
Daniel S. Brown
Abstract:
We study the problem of cross-embodiment inverse reinforcement learning, where we wish to learn a reward function from video demonstrations in one or more embodiments and then transfer the learned reward to a different embodiment (e.g., different action space, dynamics, size, shape, etc.). Learning reward functions that transfer across embodiments is important in settings such as teaching a robot…
▽ More
We study the problem of cross-embodiment inverse reinforcement learning, where we wish to learn a reward function from video demonstrations in one or more embodiments and then transfer the learned reward to a different embodiment (e.g., different action space, dynamics, size, shape, etc.). Learning reward functions that transfer across embodiments is important in settings such as teaching a robot a policy via human video demonstrations or teaching a robot to imitate a policy from another robot with a different embodiment. However, prior work has only focused on cases where near-optimal demonstrations are available, which is often difficult to ensure. By contrast, we study the setting of cross-embodiment reward learning from mixed-quality demonstrations. We demonstrate that prior work struggles to learn generalizable reward representations when learning from mixed-quality data. We then analyze several techniques that leverage human feedback for representation learning and alignment to enable effective cross-embodiment learning. Our results give insight into how different representation learning techniques lead to qualitatively different reward shaping behaviors and the importance of human feedback when learning from mixed-quality, mixed-embodiment data.
△ Less
Submitted 10 August, 2024;
originally announced August 2024.
-
NamedCurves: Learned Image Enhancement via Color Naming
Authors:
David Serrano-Lozano,
Luis Herranz,
Michael S. Brown,
Javier Vazquez-Corral
Abstract:
A popular method for enhancing images involves learning the style of a professional photo editor using pairs of training images comprised of the original input with the editor-enhanced version. When manipulating images, many editing tools offer a feature that allows the user to manipulate a limited selection of familiar colors. Editing by color name allows easy adjustment of elements like the "blu…
▽ More
A popular method for enhancing images involves learning the style of a professional photo editor using pairs of training images comprised of the original input with the editor-enhanced version. When manipulating images, many editing tools offer a feature that allows the user to manipulate a limited selection of familiar colors. Editing by color name allows easy adjustment of elements like the "blue" of the sky or the "green" of trees. Inspired by this approach to color manipulation, we propose NamedCurves, a learning-based image enhancement technique that separates the image into a small set of named colors. Our method learns to globally adjust the image for each specific named color via tone curves and then combines the images using an attention-based fusion mechanism to mimic spatial editing. We demonstrate the effectiveness of our method against several competing methods on the well-known Adobe 5K dataset and the PPR10K dataset, showing notable improvements.
△ Less
Submitted 13 July, 2024;
originally announced July 2024.
-
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Authors:
Teun van der Weij,
Felix Hofstätter,
Ollie Jaffe,
Samuel F. Brown,
Francis Rhys Ward
Abstract:
Trustworthy capability evaluations are crucial for ensuring the safety of AI systems, and are becoming a key component of AI regulation. However, the developers of an AI system, or the AI system itself, may have incentives for evaluations to understate the AI's actual capability. These conflicting interests lead to the problem of sandbagging, which we define as strategic underperformance on an eva…
▽ More
Trustworthy capability evaluations are crucial for ensuring the safety of AI systems, and are becoming a key component of AI regulation. However, the developers of an AI system, or the AI system itself, may have incentives for evaluations to understate the AI's actual capability. These conflicting interests lead to the problem of sandbagging, which we define as strategic underperformance on an evaluation. In this paper we assess sandbagging capabilities in contemporary language models (LMs). We prompt frontier LMs, like GPT-4 and Claude 3 Opus, to selectively underperform on dangerous capability evaluations, while maintaining performance on general (harmless) capability evaluations. Moreover, we find that models can be fine-tuned, on a synthetic dataset, to hide specific capabilities unless given a password. This behaviour generalizes to high-quality, held-out benchmarks such as WMDP. In addition, we show that both frontier and smaller models can be prompted or password-locked to target specific scores on a capability evaluation. We have mediocre success in password-locking a model to mimic the answers a weaker model would give. Overall, our results suggest that capability evaluations are vulnerable to sandbagging. This vulnerability decreases the trustworthiness of evaluations, and thereby undermines important safety decisions regarding the development and deployment of advanced AI systems.
△ Less
Submitted 6 February, 2025; v1 submitted 11 June, 2024;
originally announced June 2024.
-
SCI 3.0: A Web-based Schema Curation Interface for Graphical Event Representations
Authors:
Reece Suchocki,
Mary Martin,
Martha Palmer,
Susan Brown
Abstract:
To understand the complexity of global events, one must navigate a web of interwoven sub-events, identifying those most impactful elements within the larger, abstract macro-event framework at play. This concept can be extended to the field of natural language processing (NLP) through the creation of structured event schemas which can serve as representations of these abstract events. Central to ou…
▽ More
To understand the complexity of global events, one must navigate a web of interwoven sub-events, identifying those most impactful elements within the larger, abstract macro-event framework at play. This concept can be extended to the field of natural language processing (NLP) through the creation of structured event schemas which can serve as representations of these abstract events. Central to our approach is the Schema Curation Interface 3.0 (SCI 3.0), a web application that facilitates real-time editing of event schema properties within a generated graph e.g., adding, removing, or editing sub-events, entities, and relations directly through an interface.
△ Less
Submitted 16 May, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
The Ethics of Advanced AI Assistants
Authors:
Iason Gabriel,
Arianna Manzini,
Geoff Keeling,
Lisa Anne Hendricks,
Verena Rieser,
Hasan Iqbal,
Nenad Tomašev,
Ira Ktena,
Zachary Kenton,
Mikel Rodriguez,
Seliem El-Sayed,
Sasha Brown,
Canfer Akbulut,
Andrew Trask,
Edward Hughes,
A. Stevie Bergman,
Renee Shelby,
Nahema Marchal,
Conor Griffin,
Juan Mateos-Garcia,
Laura Weidinger,
Winnie Street,
Benjamin Lange,
Alex Ingerman,
Alison Lentz
, et al. (32 additional authors not shown)
Abstract:
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, pro…
▽ More
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.
△ Less
Submitted 28 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI
Authors:
Seliem El-Sayed,
Canfer Akbulut,
Amanda McCroskery,
Geoff Keeling,
Zachary Kenton,
Zaria Jalan,
Nahema Marchal,
Arianna Manzini,
Toby Shevlane,
Shannon Vallor,
Daniel Susser,
Matija Franklin,
Sophie Bridgers,
Harry Law,
Matthew Rahtz,
Murray Shanahan,
Michael Henry Tessler,
Arthur Douillard,
Tom Everitt,
Sasha Brown
Abstract:
Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, high…
▽ More
Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, highlighting the need for a systematic study of AI persuasion. The current definitions of AI persuasion are unclear and related harms are insufficiently studied. Existing harm mitigation approaches prioritise harms from the outcome of persuasion over harms from the process of persuasion. In this paper, we lay the groundwork for the systematic study of AI persuasion. We first put forward definitions of persuasive generative AI. We distinguish between rationally persuasive generative AI, which relies on providing relevant facts, sound reasoning, or other forms of trustworthy evidence, and manipulative generative AI, which relies on taking advantage of cognitive biases and heuristics or misrepresenting information. We also put forward a map of harms from AI persuasion, including definitions and examples of economic, physical, environmental, psychological, sociocultural, political, privacy, and autonomy harm. We then introduce a map of mechanisms that contribute to harmful persuasion. Lastly, we provide an overview of approaches that can be used to mitigate against process harms of persuasion, including prompt engineering for manipulation classification and red teaming. Future work will operationalise these mitigations and study the interaction between different types of mechanisms of persuasion.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery
Authors:
Zohre Karimi,
Shing-Hei Ho,
Bao Thach,
Alan Kuntz,
Daniel S. Brown
Abstract:
Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decision-making processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This pap…
▽ More
Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decision-making processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This paper introduces a sample-efficient method that learns a robust reward function from a limited amount of ranked suboptimal demonstrations consisting of partial-view point cloud observations. The method then learns a policy by optimizing the learned reward function using reinforcement learning (RL). We show that using a learned reward function to obtain a policy is more robust than pure imitation learning. We apply our approach on a physical surgical electrocautery task and demonstrate that our method can perform well even when the provided demonstrations are suboptimal and the observations are high-dimensional point clouds. Code and videos available here: https://sites.google.com/view/lfdinelectrocautery
△ Less
Submitted 15 April, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Modeling Kinematic Uncertainty of Tendon-Driven Continuum Robots via Mixture Density Networks
Authors:
Jordan Thompson,
Brian Y. Cho,
Daniel S. Brown,
Alan Kuntz
Abstract:
Tendon-driven continuum robot kinematic models are frequently computationally expensive, inaccurate due to unmodeled effects, or both. In particular, unmodeled effects produce uncertainties that arise during the robot's operation that lead to variability in the resulting geometry. We propose a novel solution to these issues through the development of a Gaussian mixture kinematic model. We train a…
▽ More
Tendon-driven continuum robot kinematic models are frequently computationally expensive, inaccurate due to unmodeled effects, or both. In particular, unmodeled effects produce uncertainties that arise during the robot's operation that lead to variability in the resulting geometry. We propose a novel solution to these issues through the development of a Gaussian mixture kinematic model. We train a mixture density network to output a Gaussian mixture model representation of the robot geometry given the current tendon displacements. This model computes a probability distribution that is more representative of the true distribution of geometries at a given configuration than a model that outputs a single geometry, while also reducing the computation time. We demonstrate one use of this model through a trajectory optimization method that explicitly reasons about the workspace uncertainty to minimize the probability of collision.
△ Less
Submitted 5 April, 2024;
originally announced April 2024.
-
Exploring Correlation Patterns in the Ethereum Validator Network
Authors:
Simon Brown,
Leonardo Bautista-Gomez
Abstract:
There have been several studies into measuring the level of decentralization in Ethereum through applying various indices to indicate the relative dominance of entities in different domains in the ecosystem. However, these indices do not capture any correlation between those different entities, that could potentially make them the subject of external coercion, or covert collusion. We propose an in…
▽ More
There have been several studies into measuring the level of decentralization in Ethereum through applying various indices to indicate the relative dominance of entities in different domains in the ecosystem. However, these indices do not capture any correlation between those different entities, that could potentially make them the subject of external coercion, or covert collusion. We propose an index that measures the relative dominance of entities based on the application of correlation factors. We posit that this approach produces a more nuanced and accurate index of decentralization.
△ Less
Submitted 22 March, 2024;
originally announced April 2024.
-
Evaluating Frontier Models for Dangerous Capabilities
Authors:
Mary Phuong,
Matthew Aitchison,
Elliot Catt,
Sarah Cogan,
Alexandre Kaskasoli,
Victoria Krakovna,
David Lindner,
Matthew Rahtz,
Yannis Assael,
Sarah Hodkinson,
Heidi Howard,
Tom Lieberum,
Ramana Kumar,
Maria Abi Raad,
Albert Webson,
Lewis Ho,
Sharon Lin,
Sebastian Farquhar,
Marcus Hutter,
Gregoire Deletang,
Anian Ruoss,
Seliem El-Sayed,
Sasha Brown,
Anca Dragan,
Rohin Shah
, et al. (2 additional authors not shown)
Abstract:
To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous…
▽ More
To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous capabilities in the models we evaluated, but we flag early warning signs. Our goal is to help advance a rigorous science of dangerous capability evaluation, in preparation for future models.
△ Less
Submitted 5 April, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Bayesian Constraint Inference from User Demonstrations Based on Margin-Respecting Preference Models
Authors:
Dimitris Papadimitriou,
Daniel S. Brown
Abstract:
It is crucial for robots to be aware of the presence of constraints in order to acquire safe policies. However, explicitly specifying all constraints in an environment can be a challenging task. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but tend to be computationally expensive and prone to instability issues. In this paper, we propose a novel Bayesian…
▽ More
It is crucial for robots to be aware of the presence of constraints in order to acquire safe policies. However, explicitly specifying all constraints in an environment can be a challenging task. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but tend to be computationally expensive and prone to instability issues. In this paper, we propose a novel Bayesian method that infers constraints based on preferences over demonstrations. The main advantages of our proposed approach are that it 1) infers constraints without calculating a new policy at each iteration, 2) uses a simple and more realistic ranking of groups of demonstrations, without requiring pairwise comparisons over all demonstrations, and 3) adapts to cases where there are varying levels of constraint violation. Our empirical results demonstrate that our proposed Bayesian approach infers constraints of varying severity, more accurately than state-of-the-art constraint inference methods.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
A Framework for Assurance Audits of Algorithmic Systems
Authors:
Khoa Lam,
Benjamin Lange,
Borhane Blili-Hamelin,
Jovana Davidovic,
Shea Brown,
Ali Hasan
Abstract:
An increasing number of regulations propose AI audits as a mechanism for achieving transparency and accountability for artificial intelligence (AI) systems. Despite some converging norms around various forms of AI auditing, auditing for the purpose of compliance and assurance currently lacks agreed-upon practices, procedures, taxonomies, and standards. We propose the criterion audit as an operatio…
▽ More
An increasing number of regulations propose AI audits as a mechanism for achieving transparency and accountability for artificial intelligence (AI) systems. Despite some converging norms around various forms of AI auditing, auditing for the purpose of compliance and assurance currently lacks agreed-upon practices, procedures, taxonomies, and standards. We propose the criterion audit as an operationalizable compliance and assurance external audit framework. We model elements of this approach after financial auditing practices, and argue that AI audits should similarly provide assurance to their stakeholders about AI organizations' ability to govern their algorithms in ways that mitigate harms and uphold human values. We discuss the necessary conditions for the criterion audit and provide a procedural blueprint for performing an audit engagement in practice. We illustrate how this framework can be adapted to current regulations by deriving the criteria on which bias audits can be performed for in-scope hiring algorithms, as required by the recently effective New York City Local Law 144 of 2021. We conclude by offering a critical discussion on the benefits, inherent limitations, and implementation challenges of applying practices of the more mature financial auditing industry to AI auditing where robust guardrails against quality assurance issues are only starting to emerge. Our discussion -- informed by experiences in performing these audits in practice -- highlights the critical role that an audit ecosystem plays in ensuring the effectiveness of audits.
△ Less
Submitted 28 May, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1326 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 9 May, 2025; v1 submitted 18 December, 2023;
originally announced December 2023.
-
RESIN-EDITOR: A Schema-guided Hierarchical Event Graph Visualizer and Editor
Authors:
Khanh Duy Nguyen,
Zixuan Zhang,
Reece Suchocki,
Sha Li,
Martha Palmer,
Susan Brown,
Jiawei Han,
Heng Ji
Abstract:
In this paper, we present RESIN-EDITOR, an interactive event graph visualizer and editor designed for analyzing complex events. Our RESIN-EDITOR system allows users to render and freely edit hierarchical event graphs extracted from multimedia and multi-document news clusters with guidance from human-curated event schemas. RESIN-EDITOR's unique features include hierarchical graph visualization, com…
▽ More
In this paper, we present RESIN-EDITOR, an interactive event graph visualizer and editor designed for analyzing complex events. Our RESIN-EDITOR system allows users to render and freely edit hierarchical event graphs extracted from multimedia and multi-document news clusters with guidance from human-curated event schemas. RESIN-EDITOR's unique features include hierarchical graph visualization, comprehensive source tracing, and interactive user editing, which is more powerful and versatile than existing Information Extraction (IE) visualization tools. In our evaluation of RESIN-EDITOR, we demonstrate ways in which our tool is effective in understanding complex events and enhancing system performance. The source code, a video demonstration, and a live website for RESIN-EDITOR have been made publicly available.
△ Less
Submitted 5 December, 2023;
originally announced December 2023.
-
Creating a Discipline-specific Commons for Infectious Disease Epidemiology
Authors:
Michael M. Wagner,
William Hogan,
John Levander,
Adam Darr,
Matt Diller,
Max Sibilla,
Alexander T. Loiacono. Terence Sperringer, Jr.,
Shawn T. Brown
Abstract:
Objective: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software, but receive assistance in improving their interoperability. Materials and Methods: We represented 586 datasets, 54 software, and 24 data formats in OWL 2 and then used logical queries to infer potenti…
▽ More
Objective: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software, but receive assistance in improving their interoperability. Materials and Methods: We represented 586 datasets, 54 software, and 24 data formats in OWL 2 and then used logical queries to infer potentially interoperable combinations of software and datasets, as well as statistics about the FAIRness of the collection. We represented the objects in DATS 2.2 and a software metadata schema of our own design. We used these representations as the basis for the Content, Search, FAIR-o-meter, and Workflow pages that constitute the MIDAS Digital Commons. Results: Interoperability was limited by lack of standardization of input and output formats of software. When formats existed, they were human-readable specifications (22/24; 92%); only 3 formats (13%) had machine-readable specifications. Nevertheless, logical search of a triple store based on named data formats was able to identify scores of potentially interoperable combinations of software and datasets. Discussion: We improved the findability and availability of a sample of software and datasets and developed metrics for assessing interoperability. The barriers to interoperability included poor documentation of software input/output formats and little attention to standardization of most types of data in this field. Conclusion: Centralizing and formalizing the representation of digital objects within a commons promotes FAIRness, enables its measurement over time and the identification of potentially interoperable combinations of data and software.
△ Less
Submitted 12 November, 2023;
originally announced November 2023.
-
Exploring Behavior Discovery Methods for Heterogeneous Swarms of Limited-Capability Robots
Authors:
Connor Mattson,
Jeremy C. Clark,
Daniel S. Brown
Abstract:
We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities. Prior work has considered behavior search for homogeneous swarms and proposed the use of novelty search over either a hand-specified or learned behavior space followed by clustering to return a taxonomy of emergent behaviors to the user. In this…
▽ More
We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities. Prior work has considered behavior search for homogeneous swarms and proposed the use of novelty search over either a hand-specified or learned behavior space followed by clustering to return a taxonomy of emergent behaviors to the user. In this paper, we seek to better understand the role of novelty search and the efficacy of using clustering to discover novel emergent behaviors. Through a large set of experiments and ablations, we analyze the effect of representations, evolutionary search, and various clustering methods in the search for novel behaviors in a heterogeneous swarm. Our results indicate that prior methods fail to discover many interesting behaviors and that an iterative human-in-the-loop discovery process discovers more behaviors than random search, swarm chemistry, and automated behavior discovery. The combined discoveries of our experiments uncover 23 emergent behaviors, 18 of which are novel discoveries. To the best of our knowledge, these are the first known emergent behaviors for heterogeneous swarms of computation-free agents. Videos, code, and appendix are available at the project website: https://sites.google.com/view/heterogeneous-bd-methods
△ Less
Submitted 25 October, 2023;
originally announced October 2023.
-
Quantifying Assistive Robustness Via the Natural-Adversarial Frontier
Authors:
Jerry Zhi-Yang He,
Zackory Erickson,
Daniel S. Brown,
Anca D. Dragan
Abstract:
Our ultimate goal is to build robust policies for robots that assist people. What makes this hard is that people can behave unexpectedly at test time, potentially interacting with the robot outside its training distribution and leading to failures. Even just measuring robustness is a challenge. Adversarial perturbations are the default, but they can paint the wrong picture: they can correspond to…
▽ More
Our ultimate goal is to build robust policies for robots that assist people. What makes this hard is that people can behave unexpectedly at test time, potentially interacting with the robot outside its training distribution and leading to failures. Even just measuring robustness is a challenge. Adversarial perturbations are the default, but they can paint the wrong picture: they can correspond to human motions that are unlikely to occur during natural interactions with people. A robot policy might fail under small adversarial perturbations but work under large natural perturbations. We propose that capturing robustness in these interactive settings requires constructing and analyzing the entire natural-adversarial frontier: the Pareto-frontier of human policies that are the best trade-offs between naturalness and low robot performance. We introduce RIGID, a method for constructing this frontier by training adversarial human policies that trade off between minimizing robot reward and acting human-like (as measured by a discriminator). On an Assistive Gym task, we use RIGID to analyze the performance of standard collaborative Reinforcement Learning, as well as the performance of existing methods meant to increase robustness. We also compare the frontier RIGID identifies with the failures identified in expert adversarial interaction, and with naturally-occurring failures during user interaction. Overall, we find evidence that RIGID can provide a meaningful measure of robustness predictive of deployment performance, and uncover failure cases in human-robot interaction that are difficult to find manually. https://ood-human.github.io.
△ Less
Submitted 16 October, 2023;
originally announced October 2023.
-
Indirect Swarm Control: Characterization and Analysis of Emergent Swarm Behaviors
Authors:
Ricardo Vega,
Connor Mattson,
Daniel S. Brown,
Cameron Nowzari
Abstract:
Emergence and emergent behaviors are often defined as cases where changes in local interactions between agents at a lower level effectively changes what occurs in the higher level of the system (i.e., the whole swarm) and its properties. However, the manner in which these collective emergent behaviors self-organize is less understood. The focus of this paper is in presenting a new framework for ch…
▽ More
Emergence and emergent behaviors are often defined as cases where changes in local interactions between agents at a lower level effectively changes what occurs in the higher level of the system (i.e., the whole swarm) and its properties. However, the manner in which these collective emergent behaviors self-organize is less understood. The focus of this paper is in presenting a new framework for characterizing the conditions that lead to different macrostates and how to predict/analyze their macroscopic properties, allowing us to indirectly engineer the same behaviors from the bottom up by tuning their environmental conditions rather than local interaction rules. We then apply this framework to a simple system of binary sensing and acting agents as an example to see if a re-framing of this swarms problem can help us push the state of the art forward. By first creating some working definitions of macrostates in a particular swarm system, we show how agent-based modeling may be combined with control theory to enable a generalized understanding of controllable emergent processes without needing to simulate everything. Whereas phase diagrams can generally only be created through Monte Carlo simulations or sweeping through ranges of parameters in a simulator, we develop closed-form functions that can immediately produce them revealing an infinite set of swarm parameter combinations that can lead to a specifically chosen self-organized behavior. While the exact methods are still under development, we believe simply laying out a potential path towards solutions that have evaded our traditional methods using a novel method is worth considering. Our results are characterized through both simulations and real experiments on ground robots.
△ Less
Submitted 28 March, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Examining Autoexposure for Challenging Scenes
Authors:
SaiKiran Tedla,
Beixuan Yang,
Michael S. Brown
Abstract:
Autoexposure (AE) is a critical step applied by camera systems to ensure properly exposed images. While current AE algorithms are effective in well-lit environments with constant illumination, these algorithms still struggle in environments with bright light sources or scenes with abrupt changes in lighting. A significant hurdle in developing new AE algorithms for challenging environments, especia…
▽ More
Autoexposure (AE) is a critical step applied by camera systems to ensure properly exposed images. While current AE algorithms are effective in well-lit environments with constant illumination, these algorithms still struggle in environments with bright light sources or scenes with abrupt changes in lighting. A significant hurdle in developing new AE algorithms for challenging environments, especially those with time-varying lighting, is the lack of suitable image datasets. To address this issue, we have captured a new 4D exposure dataset that provides a large solution space (i.e., shutter speed range from (1/500 to 15 seconds) over a temporal sequence with moving objects, bright lights, and varying lighting. In addition, we have designed a software platform to allow AE algorithms to be used in a plug-and-play manner with the dataset. Our dataset and associate platform enable repeatable evaluation of different AE algorithms and provide a much-needed starting point to develop better AE methods. We examine several existing AE strategies using our dataset and show that most users prefer a simple saliency method for challenging lighting conditions.
△ Less
Submitted 8 September, 2023;
originally announced September 2023.
-
Contextual Reliability: When Different Features Matter in Different Contexts
Authors:
Gaurav Ghosal,
Amrith Setlur,
Daniel S. Brown,
Anca D. Dragan,
Aditi Raghunathan
Abstract:
Deep neural networks often fail catastrophically by relying on spurious correlations. Most prior work assumes a clear dichotomy into spurious and reliable features; however, this is often unrealistic. For example, most of the time we do not want an autonomous car to simply copy the speed of surrounding cars -- we don't want our car to run a red light if a neighboring car does so. However, we canno…
▽ More
Deep neural networks often fail catastrophically by relying on spurious correlations. Most prior work assumes a clear dichotomy into spurious and reliable features; however, this is often unrealistic. For example, most of the time we do not want an autonomous car to simply copy the speed of surrounding cars -- we don't want our car to run a red light if a neighboring car does so. However, we cannot simply enforce invariance to next-lane speed, since it could provide valuable information about an unobservable pedestrian at a crosswalk. Thus, universally ignoring features that are sometimes (but not always) reliable can lead to non-robust performance. We formalize a new setting called contextual reliability which accounts for the fact that the "right" features to use may vary depending on the context. We propose and analyze a two-stage framework called Explicit Non-spurious feature Prediction (ENP) which first identifies the relevant features to use for a given context, then trains a model to rely exclusively on these features. Our work theoretically and empirically demonstrates the advantages of ENP over existing methods and provides new benchmarks for contextual reliability.
△ Less
Submitted 19 July, 2023;
originally announced July 2023.