-
Walrus: An Efficient Decentralized Storage Network
Authors:
George Danezis,
Giacomo Giuliari,
Eleftherios Kokoris Kogias,
Markus Legner,
Jean-Pierre Smith,
Alberto Sonnino,
Karl Wüst
Abstract:
Decentralized storage systems face a fundamental trade-off between replication overhead, recovery efficiency, and security guarantees. Current approaches either rely on full replication, incurring substantial storage costs, or employ trivial erasure coding schemes that struggle with efficient recovery, especially under high storage-node churn. We present Walrus, a novel decentralized blob storage system that addresses these limitations through multiple technical innovations. At the core of Walrus is RedStuff, a two-dimensional erasure coding protocol that achieves high security with only a 4.5x replication factor, while enabling self-healing recovery that requires bandwidth proportional only to the lost data ($O(|blob|/n)$ versus $O(|blob|)$ in traditional systems). Crucially, RedStuff is the first protocol to support storage challenges in asynchronous networks, preventing adversaries from exploiting network delays to pass verification without actually storing data. Walrus also introduces a novel multi-stage epoch change protocol that efficiently handles storage-node churn while maintaining uninterrupted availability during committee transitions. Our system incorporates authenticated data structures to defend against malicious clients and ensures data consistency throughout storage and retrieval. Experimental evaluation demonstrates that Walrus achieves practical performance at scale, making it suitable for a wide range of decentralized applications requiring high-integrity, available blob storage with reasonable overhead.
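The abstract does not spell out RedStuff's encoding, so the following is only a toy illustration of why two-dimensional coding permits self-healing recovery with bandwidth proportional to the lost data rather than to the whole blob. It uses plain XOR parity over rows and columns (an assumption, far weaker than RedStuff and with no Byzantine tolerance); `encode_grid` and `recover_from_row` are hypothetical names.

```python
from functools import reduce
from operator import xor


def encode_grid(blob: bytes, cols: int) -> list[list[int]]:
    """Toy 2D encoding: split `blob` into rows of `cols` bytes (zero-padded),
    append one XOR parity byte per row, then one XOR parity row per column."""
    padded = blob + bytes(-len(blob) % cols)
    rows = [list(padded[i:i + cols]) for i in range(0, len(padded), cols)]
    for row in rows:
        row.append(reduce(xor, row))  # row parity: XOR of the whole row becomes 0
    parity_row = [reduce(xor, col) for col in zip(*rows)]  # column parity
    return rows + [parity_row]


def recover_from_row(grid: list[list[int]], r: int, c: int) -> tuple[int, int]:
    """Rebuild the lost cell (r, c) from the other cells of row r only.
    Returns (recovered_value, number_of_cells_read)."""
    others = [v for j, v in enumerate(grid[r]) if j != c]
    return reduce(xor, others), len(others)
```

With `encode_grid(b"walrus-blob-demo", cols=4)`, recovering cell (1, 2) reads 4 cells rather than all 16 data bytes, and the parity row can rebuild any single lost cell in a column the same way. Plain XOR parity only handles erasures; it cannot detect a node that returns corrupted symbols, which is one reason a real system needs authenticated commitments on top.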
Submitted 8 May, 2025;
originally announced May 2025.
-
Aliasing Reduction in Neural Amp Modeling by Smoothing Activations
Authors:
Ryota Sato,
Julius O. Smith III
Abstract:
The increasing demand for high-quality digital emulations of analog audio hardware such as vintage guitar amplifiers has led to numerous works in neural-network-based black-box modeling, with deep learning architectures like WaveNet showing promising results. However, a key limitation in all of these models is the aliasing artifacts that arise from the use of nonlinear activation functions in neural networks. In this paper, we investigate novel and modified activation functions aimed at mitigating aliasing within neural amplifier models. To support this, we introduce a novel metric, the Aliasing-to-Signal Ratio (ASR), which quantitatively assesses the level of aliasing with high accuracy. Also measuring the conventional Error-to-Signal Ratio (ESR), we studied a range of preexisting and modern activation functions with varying stretch factors. Our findings confirmed that activation functions with smoother curves tend to achieve lower ASR values, indicating a noticeable reduction in aliasing. Notably, this improvement in aliasing reduction was achievable without a substantial increase in ESR, demonstrating the potential for high modeling accuracy with reduced aliasing in neural amp models.
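The abstract does not define ASR, so the sketch below uses a simplified proxy of our own (an assumption, not the paper's metric): drive a sine whose frequency falls exactly on a DFT bin through an activation, then compare the energy that lands off the harmonic grid (harmonics above Nyquist that folded back) with the energy on it. `aliasing_ratio` and its default `f0`, `fs`, and `drive` values are hypothetical.

```python
import cmath
import math
from typing import Callable


def dft_energy(x: list[float]) -> list[float]:
    """|X[k]|^2 of a naive DFT for bins 0..N/2 (pure Python, O(N^2))."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def aliasing_ratio(activation: Callable[[float], float],
                   f0: int = 50, fs: int = 512, drive: float = 3.0) -> float:
    """Hypothetical ASR-style proxy: energy in non-harmonic bins divided by
    energy in harmonic bins, for a one-second sine at integer frequency f0."""
    # One second at fs samples/s with integer f0 gives whole cycles (no leakage),
    # and bin index k equals frequency in Hz.
    x = [activation(drive * math.sin(2 * math.pi * f0 * t / fs)) for t in range(fs)]
    energy = dft_energy(x)
    harmonic_bins = set(range(f0, fs // 2 + 1, f0))  # harmonics below Nyquist
    signal = sum(energy[k] for k in harmonic_bins)
    # Everything else (except DC) comes from harmonics folded back past Nyquist.
    aliased = sum(e for k, e in enumerate(energy) if k != 0 and k not in harmonic_bins)
    return aliased / signal
```

Passing `math.tanh` or `lambda v: max(-1.0, min(1.0, v))` as `activation` lets one compare a smooth and a hard-clipping nonlinearity at the same drive. The naive DFT keeps the sketch dependency-free; an FFT would be used in practice.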
Submitted 6 May, 2025;
originally announced May 2025.
-
Dargana: fine-tuning EarthPT for dynamic tree canopy mapping from space
Authors:
Michael J. Smith,
Luke Fleming,
James E. Geach,
Ryan J. Roberts,
Freddie Kalaitzis,
James Banister
Abstract:
We present Dargana, a fine-tuned variant of the EarthPT time-series foundation model that achieves specialisation using <3% of its pre-training data volume and 5% of its pre-training compute. Dargana is fine-tuned to generate regularly updated classification of tree canopy cover at 10m resolution, distinguishing conifer and broadleaved tree types. Using Cornwall, UK, as a test case, the model achieves a pixel-level ROC-AUC of 0.98 and a PR-AUC of 0.83 on unseen satellite imagery. Dargana can identify fine structures like hedgerows and coppice below the training sample limit, and can track temporal changes to canopy cover such as new woodland establishment. Our results demonstrate how pre-trained Large Observation Models like EarthPT can be specialised for granular, dynamic land cover monitoring from space, providing a valuable, scalable tool for natural capital management and conservation.
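Pixel-level ROC-AUC and PR-AUC treat every pixel as an independent binary prediction (canopy versus no canopy). A minimal dependency-free sketch of both metrics, assuming flattened per-pixel labels and scores; this is not the authors' evaluation code, and PR-AUC is estimated here as average precision, one common convention among several.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC-AUC as the probability that a random positive pixel outscores a
    random negative one (ties count one half), i.e. the Mann-Whitney statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: list[int], scores: list[float]) -> float:
    """PR-AUC estimated as average precision: mean precision at the rank of
    each positive when pixels are sorted by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits
```

The pairwise ROC-AUC loop is quadratic, so for full satellite tiles a rank-based or histogram-based implementation would be preferable; the definitions are the same.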
Submitted 24 April, 2025;
originally announced April 2025.
-
Fried Parameter Estimation from Single Wavefront Sensor Image with Artificial Neural Networks
Authors:
Jeffrey Smith,
Taisei Fujii,
Jesse Cranney,
Charles Gretton
Abstract:
Atmospheric turbulence degrades the quality of astronomical observations in ground-based telescopes, leading to distorted and blurry images. Adaptive Optics (AO) systems are designed to counteract these effects, using atmospheric measurements captured by a wavefront sensor to make real-time corrections to the incoming wavefront. The Fried parameter, r0, characterises the strength of atmospheric turbulence and is an essential control parameter for optimising the performance of AO systems and, more recently, for sky profiling for Free Space Optical (FSO) communication channels. In this paper, we develop a novel data-driven approach, adapting machine learning methods from computer vision for Fried parameter estimation from a single Shack-Hartmann or pyramid wavefront sensor image. We present a detailed simulation-based evaluation of these data-driven methods, using the open-source COMPASS AO simulation tool to evaluate both the Shack-Hartmann and pyramid wavefront sensors. Our evaluation covers a range of guide star magnitudes and realistic noise, atmospheric and instrument conditions. Remarkably, we are able to develop a single network-based estimator that is accurate in both open and closed-loop AO configurations. Our method estimates the Fried parameter from a single WFS image, directly from AO telemetry, to within a few millimetres. Our approach is suitable for real-time control, exhibiting 0.83 ms r0 inference times on retail NVIDIA RTX 3090 GPU hardware, and thereby demonstrating a compelling economic solution for use in real-time instrument control.
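For readers new to r0, the standard Kolmogorov-turbulence relations below show what the estimated quantity means physically: r0 grows with wavelength as λ^(6/5), sets a seeing-limited image width of roughly 0.98 λ/r0, and defines the phase structure function D(r) = 6.88 (r/r0)^(5/3). These are textbook relations, not the paper's neural estimator; the helper names are hypothetical, and all lengths are in metres.

```python
import math


def scale_r0(r0_ref: float, wavelength_ref: float, wavelength: float) -> float:
    """Fried parameter at another wavelength: r0 scales as wavelength**(6/5)."""
    return r0_ref * (wavelength / wavelength_ref) ** (6 / 5)


def seeing_fwhm_arcsec(r0: float, wavelength: float) -> float:
    """Seeing-limited FWHM of about 0.98 * wavelength / r0 radians, in arcseconds."""
    return math.degrees(0.98 * wavelength / r0) * 3600


def phase_structure_function(r: float, r0: float) -> float:
    """Kolmogorov phase structure function D(r) = 6.88 * (r / r0)**(5/3), in rad^2."""
    return 6.88 * (r / r0) ** (5 / 3)
```

For example, r0 = 0.10 m at 500 nm corresponds to about 0.23 m at 1 µm and to seeing of roughly 1.0 arcsec at 500 nm, which is why an error of a few millimetres in r0 is small relative to typical values.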
Submitted 23 April, 2025;
originally announced April 2025.
-
Optimizing Data Distribution and Kernel Performance for Efficient Training of Chemistry Foundation Models: A Case Study with MACE
Authors:
Jesun Firoz,
Franco Pellegrini,
Mario Geiger,
Darren Hsu,
Jenna A. Bilbrey,
Han-Yi Chou,
Maximilian Stadler,
Markus Hoehnerbach,
Tingyu Wang,
Dejun Lin,
Emine Kucukbenli,
Henry W. Sprueill,
Ilyes Batatia,
Sotiris S. Xantheas,
MalSoon Lee,
Chris Mundy,
Gabor Csanyi,
Justin S. Smith,
Ponnuswamy Sadayappan,
Sutanay Choudhury
Abstract:
Chemistry Foundation Models (CFMs) that leverage Graph Neural Networks (GNNs) operating on 3D molecular graph structures are becoming indispensable tools for computational chemists and materials scientists. These models facilitate the understanding of matter and the discovery of new molecules and materials. In contrast to GNNs operating on large homogeneous graphs, GNNs used by CFMs process a large number of geometric graphs of varying sizes, requiring different optimization strategies than those developed for large homogeneous GNNs. This paper presents optimizations for two critical phases of CFM training (data distribution and model training), targeting MACE, a state-of-the-art CFM. We address the challenge of load balancing in data distribution by formulating it as a multi-objective bin packing problem. We propose an iterative algorithm that provides a highly effective, fast, and practical solution, ensuring efficient data distribution. For the training phase, we identify symmetric tensor contraction as the key computational kernel in MACE and optimize this kernel to improve overall performance. Our combined approach of balanced data distribution and kernel optimization significantly enhances the training process of MACE. Experimental results demonstrate a substantial speedup, reducing per-epoch training time from 12 to 2 minutes on 740 GPUs with a 2.6M-sample dataset.
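The abstract frames data distribution as multi-objective bin packing over graphs of varying size but does not describe its iterative algorithm. As a stand-in, here is a simple greedy heuristic (our assumption, in the spirit of longest-processing-time scheduling) that balances two loads per GPU, node count and edge count, by placing each graph, largest first, where it least increases the worse of the two normalised loads. `Bin` and `balance` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    nodes: int = 0
    edges: int = 0
    items: list[int] = field(default_factory=list)


def balance(graphs: list[tuple[int, int]], num_bins: int) -> list[Bin]:
    """Greedy packing of (num_nodes, num_edges) graphs onto `num_bins` devices."""
    total_n = sum(n for n, _ in graphs) or 1
    total_e = sum(e for _, e in graphs) or 1
    bins = [Bin() for _ in range(num_bins)]
    # Largest graphs first, sized by their combined normalised footprint.
    order = sorted(range(len(graphs)),
                   key=lambda i: graphs[i][0] / total_n + graphs[i][1] / total_e,
                   reverse=True)
    for i in order:
        n, e = graphs[i]
        # Choose the bin whose worse normalised load after adding graph i is smallest.
        best = min(bins, key=lambda b: max((b.nodes + n) / total_n,
                                           (b.edges + e) / total_e))
        best.nodes += n
        best.edges += e
        best.items.append(i)
    return bins
```

For instance, `balance([(1, 1)] * 6, num_bins=3)` spreads six identical graphs two per bin. A production scheme could also account for padding or per-batch memory limits, which this sketch ignores.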
Submitted 14 April, 2025;
originally announced April 2025.
-
AstroLLaVA: towards the unification of astronomical data and natural language
Authors:
Sharaf Zaman,
Michael J. Smith,
Pranav Khetarpal,
Rishabh Chakrabarty,
Michele Ginolfi,
Marc Huertas-Company,
Maja Jabłońska,
Sandor Kruk,
Matthieu Le Lain,
Sergio José Rodríguez Méndez,
Dimitrios Tanoglidis
Abstract:
We present AstroLLaVA, a vision language model for astronomy that enables interaction with astronomical imagery through natural dialogue. By fine-tuning the LLaVA model on a diverse dataset of $\sim$30k images with captions and question-answer pairs sourced from NASA's 'Astronomy Picture of the Day', the European Southern Observatory, and the NASA/ESA Hubble Space Telescope, we create a model capable of answering open-ended questions about astronomical concepts depicted visually. Our two-stage fine-tuning process adapts the model to both image captioning and visual question answering in the astronomy domain. We demonstrate AstroLLaVA's performance on an astronomical visual question answering benchmark and release the model weights, code, and training set to encourage further open source work in this space. Finally, we suggest a roadmap towards general astronomical data alignment with pre-trained language models, and provide an open space for collaboration towards this end for interested researchers.
Submitted 11 April, 2025;
originally announced April 2025.
-
ChildlikeSHAPES: Semantic Hierarchical Region Parsing for Animating Figure Drawings
Authors:
Astitva Srivastava,
Harrison Jesse Smith,
Thu Nguyen-Phuoc,
Yuting Ye
Abstract:
Childlike human figure drawings represent one of humanity's most accessible forms of character expression, yet automatically analyzing their contents remains a significant challenge. While semantic segmentation of realistic humans has recently advanced considerably, existing models often fail when confronted with the abstract, representational nature of childlike drawings. This semantic understanding is a crucial prerequisite for animation tools that seek to modify figures while preserving their unique style. To help achieve this, we propose a novel hierarchical segmentation model, built upon the architecture of the pre-trained Segment Anything Model (SAM), to quickly and accurately obtain these semantic labels. Our model achieves higher accuracy than state-of-the-art segmentation models focused on realistic humans and cartoon figures, even after fine-tuning. We demonstrate the value of our model for semantic segmentation through multiple applications: a fully automatic facial animation pipeline, a figure relighting pipeline, improvements to an existing childlike human figure drawing animation method, and generalization to out-of-domain figures. Finally, to support future work in this area, we introduce a dataset of 16,000 childlike drawings with pixel-level annotations across 25 semantic categories. Our work can enable entirely new, easily accessible tools for hand-drawn character animation, and our dataset can enable new lines of inquiry in a variety of graphics and human-centric research fields.
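Hierarchical region parsing means a pixel labelled with a fine part (say, an eye) also belongs to every coarser region above it (face, head, body). A minimal sketch of rolling fine labels up to a coarse region mask, assuming a hypothetical part hierarchy; the paper's 25 categories and how they nest are not listed in the abstract.

```python
from typing import Optional

# Hypothetical child -> parent hierarchy (not the paper's actual category set).
PARENT = {
    "left_eye": "face", "right_eye": "face", "mouth": "face",
    "face": "head", "hair": "head",
    "head": "body", "torso": "body", "left_arm": "body", "right_arm": "body",
}


def ancestors(label: str) -> list[str]:
    """Return the label followed by all of its coarser ancestors."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def region_mask(fine_labels: list[list[Optional[str]]], region: str) -> list[list[bool]]:
    """True where the pixel's fine label is `region` or one of its descendants;
    None marks background pixels."""
    return [[lbl is not None and region in ancestors(lbl) for lbl in row]
            for row in fine_labels]
```

Storing only the finest label per pixel and deriving coarser masks on demand keeps annotations consistent across levels, since a pixel can never be "face" without also being "head".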
Submitted 10 April, 2025;
originally announced April 2025.
-
Dual Boost-Driven Graph-Level Clustering Network
Authors:
John Smith,
Wenxuan Tu,
Junlong Wu,
Wenxin Zhang,
Jingxin Liu,
Haotian Wang,
Jieren Cheng,
Huajie Lei,
Guangzhen Yao,
Lingren Wang,
Mengfei Li,
Renda Han,
Yu Li
Abstract:
Graph-level clustering remains a pivotal yet formidable challenge in graph learning. Recently, the integration of deep learning with representation learning has demonstrated notable advancements, yielding performance enhancements to a certain degree. However, existing methods suffer from at least one of the following issues: 1. the original graph structure contains noise, and 2. during feature propagation and pooling, noise is gradually aggregated into the graph-level embeddings through information propagation. Consequently, these two limitations mask clustering-friendly information, leading to suboptimal graph-level clustering performance. To this end, we propose a novel Dual Boost-Driven Graph-Level Clustering Network (DBGCN) that alternately promotes graph-level clustering and filters out interfering information in a unified framework. Specifically, in the pooling step, we evaluate the contribution of features at the global level and optimize them using a learnable transformation matrix to obtain high-quality graph-level representations, improving the model's reasoning capability. Moreover, to enable reliable graph-level clustering, we first identify and suppress information detrimental to clustering by evaluating similarities between graph-level representations, providing more accurate guidance for multi-view fusion. Extensive experiments demonstrate that DBGCN outperforms state-of-the-art graph-level clustering methods on six benchmark datasets.
Submitted 13 April, 2025; v1 submitted 8 April, 2025;
originally announced April 2025.
-
A Survey on Hypothesis Generation for Scientific Discovery in the Era of Large Language Models
Authors:
Atilla Kaan Alkan,
Shashwat Sourav,
Maja Jablonska,
Simone Astarita,
Rishabh Chakrabarty,
Nikhil Garuda,
Pranav Khetarpal,
Maciej Pióro,
Dimitrios Tanoglidis,
Kartheik G. Iyer,
Mugdha S. Polimera,
Michael J. Smith,
Tirthankar Ghosal,
Marc Huertas-Company,
Sandor Kruk,
Kevin Schawinski,
Ioana Ciucă
Abstract:
Hypothesis generation is a fundamental step in scientific discovery, yet it is increasingly challenged by information overload and disciplinary fragmentation. Recent advances in Large Language Models (LLMs) have sparked growing interest in their potential to enhance and automate this process. This paper presents a comprehensive survey of hypothesis generation with LLMs by (i) reviewing existing methods, from simple prompting techniques to more complex frameworks, and proposing a taxonomy that categorizes these approaches; (ii) analyzing techniques for improving hypothesis quality, such as novelty boosting and structured reasoning; (iii) providing an overview of evaluation strategies; and (iv) discussing key challenges and future directions, including multimodal integration and human-AI collaboration. Our survey aims to serve as a reference for researchers exploring LLMs for hypothesis generation.
Submitted 7 April, 2025;
originally announced April 2025.
-
Euclid Quick Data Release (Q1). Active galactic nuclei identification using diffusion-based inpainting of Euclid VIS images
Authors:
Euclid Collaboration,
G. Stevens,
S. Fotopoulou,
M. N. Bremer,
T. Matamoro Zatarain,
K. Jahnke,
B. Margalef-Bentabol,
M. Huertas-Company,
M. J. Smith,
M. Walmsley,
M. Salvato,
M. Mezcua,
A. Paulino-Afonso,
M. Siudek,
M. Talia,
F. Ricci,
W. Roster,
N. Aghanim,
B. Altieri,
S. Andreon,
H. Aussel,
C. Baccigalupi,
M. Baldi,
S. Bardelli,
P. Battaglia
, et al. (249 additional authors not shown)
Abstract:
Light emission from galaxies exhibits diverse brightness profiles, influenced by factors such as galaxy type, structural features and interactions with other galaxies. Elliptical galaxies feature more uniform light distributions, while spiral and irregular galaxies have complex, varied light profiles due to their structural heterogeneity and star-forming activity. In addition, galaxies with an active galactic nucleus (AGN) feature intense, concentrated emission from gas accretion around supermassive black holes, superimposed on regular galactic light, while quasi-stellar objects (QSO) are the extreme case of the AGN emission dominating the galaxy. The challenge of identifying AGN and QSO has been discussed many times in the literature, often requiring multi-wavelength observations. This paper introduces a novel approach to identify AGN and QSO from a single image. Diffusion models have recently been developed in the machine-learning literature to generate realistic-looking images of everyday objects. Utilising the spatial resolving power of the Euclid VIS images, we created a diffusion model trained on one million sources, without using any source pre-selection or labels. The model learns to reconstruct light distributions of normal galaxies, since the population is dominated by them. We condition the prediction of the central light distribution by masking the central few pixels of each source and reconstruct the light according to the diffusion model. We further use this prediction to identify sources that deviate from this profile by examining the reconstruction error of the few central pixels regenerated in each source's core. Our approach, solely using VIS imaging, features high completeness compared to traditional methods of AGN and QSO selection, including optical, near-infrared, mid-infrared, and X-rays. [abridged]
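The selection step (mask the core, inpaint it, score the mismatch) can be sketched independently of the generative model. Below, a trivial hypothetical inpainter (the mean of the unmasked pixels bordering the mask) stands in for the diffusion model, and the anomaly score is the summed squared reconstruction error over the masked core; the mask radius and all function names are assumptions.

```python
import math


def core_mask(h: int, w: int, radius: float = 1.5) -> list[list[bool]]:
    """True for pixels within `radius` of the image centre."""
    cy, cx = (h - 1) / 2, (w - 1) / 2
    return [[math.hypot(y - cy, x - cx) <= radius for x in range(w)] for y in range(h)]


def border_mean_inpaint(image, mask):
    """Toy stand-in for the diffusion model: fill masked pixels with the mean
    of unmasked pixels that touch the mask (8-neighbourhood)."""
    h, w = len(image), len(image[0])
    border = [
        image[y][x]
        for y in range(h) for x in range(w)
        if not mask[y][x] and any(
            0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    ]
    fill = sum(border) / len(border)
    return [[fill if mask[y][x] else image[y][x] for x in range(w)] for y in range(h)]


def core_anomaly_score(image, inpaint=border_mean_inpaint, radius: float = 1.5) -> float:
    """Sum of squared reconstruction errors over the masked central pixels."""
    mask = core_mask(len(image), len(image[0]), radius)
    recon = inpaint(image, mask)
    return sum((image[y][x] - recon[y][x]) ** 2
               for y in range(len(image)) for x in range(len(image[0])) if mask[y][x])
```

Because the inpainter only sees pixels outside the core, an extra central point source (as an AGN would add) raises the score while leaving the reconstruction unchanged. A smooth toy galaxy still scores above zero here because a border mean cannot reproduce a peaked profile; a model trained on normal galaxies is what would bring that baseline down.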
Submitted 19 March, 2025;
originally announced March 2025.
-
Simulating Raman Scattering Impairments with Depolarization Noise in Quantum-Classical Links
Authors:
Jake Smith,
Roberto Proietti
Abstract:
We model spontaneous Raman scattering noise in polarization-encoded quantum communication channels co-propagating with classical signals using the depolarization channel. Utilizing NetSquid simulations, we validate the model against demonstrations of qubit transmission, entanglement distribution, and teleportation.
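The depolarization model named above can be written down for a single polarization qubit: the channel maps ρ to (1 - p)ρ + p I/2, and the fidelity with the transmitted pure state works out to 1 - p/2. A minimal sketch of that channel; how the paper maps Raman photon counts to p is not given in the abstract, so p is left as a free parameter, the function names are hypothetical, and NetSquid itself is not used.

```python
Matrix = list[list[complex]]


def density(psi: list[complex]) -> Matrix:
    """|psi><psi| for a normalised single-qubit (polarization) state vector."""
    return [[psi[i] * psi[j].conjugate() for j in range(2)] for i in range(2)]


def depolarize(rho: Matrix, p: float) -> Matrix:
    """Depolarizing channel: with probability p the state is replaced by I/2."""
    return [[(1 - p) * rho[i][j] + (p / 2 if i == j else 0) for j in range(2)]
            for i in range(2)]


def fidelity(psi: list[complex], rho: Matrix) -> float:
    """Fidelity <psi|rho|psi> between a pure target state and a density matrix."""
    return sum(psi[i].conjugate() * rho[i][j] * psi[j]
               for i in range(2) for j in range(2)).real
```

Because the channel is basis-independent, the same p degrades horizontal, diagonal, and circular polarization states equally, which is what makes it a convenient single-parameter stand-in for broadband noise photons.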
Submitted 18 March, 2025; v1 submitted 17 March, 2025;
originally announced March 2025.
-
TetraGrip: Sensor-Driven Multi-Suction Reactive Object Manipulation in Cluttered Scenes
Authors:
Paolo Torrado,
Joshua Levin,
Markus Grotz,
Joshua Smith
Abstract:
Warehouse robotic systems equipped with vacuum grippers must reliably grasp a diverse range of objects from densely packed shelves. However, these environments present significant challenges, including occlusions, diverse object orientations, stacked and obstructed items, and surfaces that are difficult to suction. We introduce TetraGrip, a novel vacuum-based grasping strategy featuring four suction cups mounted on linear actuators. Each actuator is equipped with an optical time-of-flight (ToF) proximity sensor, enabling reactive grasping.
We evaluate TetraGrip in a warehouse-style setting, demonstrating its ability to manipulate objects in stacked and obstructed configurations. Our results show that our RL-based policy improves picking success in stacked-object scenarios by 22.86% compared to a single-suction gripper. Additionally, we demonstrate that TetraGrip can successfully grasp objects in scenarios where a single-suction gripper fails due to physical limitations, specifically in two cases: (1) picking an object occluded by another object and (2) retrieving an object in a complex scenario. These findings highlight the advantages of multi-actuated, suction-based grasping in unstructured warehouse environments. The project website is available at: https://tetragrip.github.io/
Submitted 11 March, 2025;
originally announced March 2025.
-
Animating Childlike Drawings with 2.5D Character Rigs
Authors:
Harrison Jesse Smith,
Nicky He,
Yuting Ye
Abstract:
Drawing is a fun and intuitive way to create a character, accessible even to small children. However, animating 2D figure drawings is a much more challenging task, requiring specialized tools and skills. Bringing 2D figures to 3D so they can be animated and consumed in immersive media poses an even greater challenge. Moreover, it is desirable to preserve the unique style and identity of the figure when it is being animated and viewed from different perspectives. In this work, we present an approachable and easy-to-create 2.5D character model and retargeting technique that can apply complex 3D skeletal motion, including rotation within the transverse plane, onto a single childlike figure drawing in a style-preserving manner in realtime. Because our solution is view-dependent, the resulting character is well-suited for animation in both 2D and 3D contexts. We also present a novel annotation study motivating our system design decisions and a pair of user studies validating the usefulness and appeal of our solution. We showcase the generality of our system in a range of 2D and 3D applications.
Submitted 25 February, 2025;
originally announced February 2025.
-
A hierarchical approach for assessing the vulnerability of tree-based classification models to membership inference attack
Authors:
Richard J. Preen,
Jim Smith
Abstract:
Machine learning models can inadvertently expose confidential properties of their training data, making them vulnerable to membership inference attacks (MIA). While numerous evaluation methods exist, many require computationally expensive processes, such as training multiple shadow models. This article presents two new complementary approaches for efficiently identifying vulnerable tree-based models: an ante-hoc analysis of hyperparameter choices and a post-hoc examination of trained model structure. While these new methods cannot certify whether a model is safe from MIA, they provide practitioners with a means to significantly reduce the number of models that need to undergo expensive MIA assessment through a hierarchical filtering approach.
More specifically, it is shown that the rank order of disclosure risk for different hyperparameter combinations remains consistent across datasets, enabling the development of simple, human-interpretable rules for identifying relatively high-risk models before training. While this ante-hoc analysis cannot determine absolute safety since this also depends on the specific dataset, it allows the elimination of unnecessarily risky configurations during hyperparameter tuning. Additionally, computationally inexpensive structural metrics serve as indicators of MIA vulnerability, providing a second filtering stage to identify risky models after training but before conducting expensive attacks. Empirical results show that hyperparameter-based risk prediction rules can achieve high accuracy in predicting the most at-risk combinations of hyperparameters across different tree-based model types, while requiring no model training. Moreover, target model accuracy is not seen to correlate with privacy risk, suggesting opportunities to optimise model configurations for both performance and privacy.
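The two-stage filter can be expressed as a small pipeline: an ante-hoc rule rejects configurations before any training, a cheap structural metric on each trained model rejects more, and only the survivors go on to a full MIA. A minimal sketch with hypothetical placeholders: the rule (`max_depth is None`), the structural metric (the smallest number of training records in any leaf), and the fake training function are assumptions for illustration, not the rules or metrics derived in the paper.

```python
from typing import Callable

Config = dict


def needs_full_mia(configs: list[Config],
                   risky_config: Callable[[Config], bool],
                   train_leaf_sizes: Callable[[Config], list[int]],
                   min_leaf: int = 5) -> list[Config]:
    """Return the configurations that survive both cheap filters."""
    # Stage 1 (ante-hoc): drop configurations flagged from hyperparameters alone,
    # without training anything.
    stage1 = [c for c in configs if not risky_config(c)]
    # Stage 2 (post-hoc): train the survivors and drop models with very small
    # leaves, since a leaf holding few records can single out individual points.
    return [c for c in stage1 if min(train_leaf_sizes(c)) >= min_leaf]
```

Only configurations that reach stage 2 incur a training run, and only the returned list would go on to shadow-model attacks, which is where the cost savings come from.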
Submitted 13 February, 2025;
originally announced February 2025.
-
FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing
Authors:
James Seale Smith,
Chi-Heng Lin,
Shikhar Tuli,
Haris Jeelani,
Shangqian Gao,
Yilin Shen,
Hongxia Jin,
Yen-Chang Hsu
Abstract:
The rapid proliferation of large language models (LLMs) in natural language processing (NLP) has created a critical need for techniques that enable efficient deployment on memory-constrained devices without compromising performance. We present a method to prune LLMs that selectively prunes model blocks based on an importance score and replaces them with a low-parameter replacement strategy. Specifically, we propose a principled metric to replace each pruned block using a weight-sharing mechanism that leverages unpruned counterparts from the model and block-specific low-rank adapters. Furthermore, we facilitate the learning of these replacement blocks with output feature normalization and an adapter initialization scheme built on low-rank SVD reconstructions. Empirical evaluations demonstrate substantial performance gains over existing methods, achieving state-of-the-art performance on 5/6 benchmarks for a compression rate of 30% and 6/6 benchmarks for a compression rate of 40%. We also demonstrate that our approach can extend smaller models, boosting performance on 6/6 benchmarks using only ~0.3% of tokens for extended training with minimal additional parameter costs.
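One plausible reading of the SVD-based adapter initialization (our assumption; the abstract gives no formulas): a pruned block's weight matrix is replaced by the weights of an unpruned block it shares with, plus a low-rank correction initialised from a truncated SVD of the difference. By the Eckart-Young theorem, that is the best rank-r approximation of the difference in Frobenius norm. `svd_adapter_init` and its arguments are hypothetical names.

```python
import numpy as np


def svd_adapter_init(w_pruned: np.ndarray, w_shared: np.ndarray, rank: int):
    """Return low-rank factors a (d_out x rank) and b (rank x d_in) such that
    w_shared + a @ b is the best rank-`rank` fit of w_pruned around w_shared."""
    u, s, vt = np.linalg.svd(w_pruned - w_shared, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    a = u[:, :rank] * sqrt_s          # scale columns of U_r by sqrt(singular values)
    b = sqrt_s[:, None] * vt[:rank]   # scale rows of V_r^T the same way
    return a, b
```

At rank r the adapter stores r·(d_out + d_in) parameters instead of d_out·d_in, which is what makes the replacement block cheap; splitting the singular values evenly between the two factors keeps their scales balanced for subsequent fine-tuning.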
Submitted 31 January, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
The Role of Generative AI in Software Student CollaborAItion
Authors:
Natalie Kiesler,
Jacqueline Smith,
Juho Leinonen,
Armando Fox,
Stephen MacNeil,
Petri Ihantola
Abstract:
Collaboration is a crucial part of computing education. The increase in AI capabilities over the last couple of years is bound to profoundly affect all aspects of systems and software engineering, including collaboration. In this position paper, we consider a scenario where AI agents would be able to take on any role in collaborative processes in computing education. We outline these roles, the activities and group dynamics that software development currently includes, and discuss whether and in what way AI could facilitate these roles and activities. The goal of our work is to envision and critically examine potential futures. We present scenarios suggesting how AI can be integrated into existing collaborations. These are contrasted by design fictions that help demonstrate the new possibilities and challenges for computing education in the AI era.
Submitted 23 January, 2025;
originally announced January 2025.
-
The Generative AI Ethics Playbook
Authors:
Jessie J. Smith,
Wesley Hanwen Deng,
William H. Smith,
Maarten Sap,
Nicole DeCario,
Jesse Dodge
Abstract:
The Generative AI Ethics Playbook provides guidance for identifying and mitigating risks of machine learning systems across various domains, including natural language processing, computer vision, and generative AI. This playbook aims to assist practitioners in diagnosing potential harms that may arise during the design, development, and deployment of datasets and models. It offers concrete strategies and resources for mitigating these risks, to help minimize negative impacts on users and society. Drawing on current best practices in both research and ethical considerations, this playbook aims to serve as a comprehensive resource for AI/ML practitioners. The intended audience of this playbook includes machine learning researchers, engineers, and practitioners who are involved in the creation and implementation of generative and multimodal models (e.g., text-to-text, image-to-image, text-to-image, text-to-video).
Specifically, we provide transparency/documentation checklists, topics of interest, common questions, examples of harms through case studies, and resources and strategies to mitigate harms throughout the Generative AI lifecycle. This playbook was made collaboratively over the course of 16 months through extensive literature review of over 100 resources and peer-reviewed articles, as well as through an initial group brainstorming session with 18 interdisciplinary AI ethics experts from industry and academia, and with additional feedback from 8 experts (5 of whom were in the initial brainstorming session).
We note that while this playbook provides examples, discussion, and harm mitigation strategies, research in this area is ongoing. Our playbook aims to be a practically useful survey, taking a high-level view rather than aiming to cover the entire existing body of research.
Submitted 17 December, 2024;
originally announced January 2025.
-
Electrostatic Clutches Enable Simultaneous Mechanical Multiplexing
Authors:
Timothy E. Amish,
Jeffrey T. Auletta,
Chad C. Kessens,
Joshua R. Smith,
Jeffrey I. Lipton
Abstract:
Actuating robotic systems with multiple degrees of freedom (DoF) traditionally requires numerous motors, leading to increased size, weight, cost, and power consumption. Mechanical multiplexing offers a solution by enabling a single actuator to control multiple DoF. However, existing multiplexers have either been limited to electrically controlled time-based multiplexing, which controls one DoF at a time, or have relied on mechanical switching to control multiple DoF simultaneously. There is a strong need for a system that can perform electrically controlled multiplexing for both time-based and simultaneous control of multiple DoF. This study introduces a novel electrostatic capstan clutch-based mechanical multiplexer that enables high-force, single-motor control of multiple DoF. Here, we show that our system achieves both single-input-single-output (SISO) and single-input-multiple-output (SIMO) actuation, allowing bidirectional control and position holding with minimal power consumption. Each output can actuate a 22.24 N load, limited by clutch performance, over a distance of up to 5 cm. The number of outputs and the actuation length are currently limited by the length of the drive shaft. We demonstrate the integration of our system into a 4-DoF commercial robotic hand using a single motor. These findings show that electrostatic clutch-based multiplexing provides a scalable and energy-efficient design solution for high-DoF robotic platforms, opening new possibilities for lightweight and power-efficient actuation in robotics.
Submitted 21 March, 2025; v1 submitted 14 January, 2025;
originally announced January 2025.
-
RadarNeXt: Real-Time and Reliable 3D Object Detector Based On 4D mmWave Imaging Radar
Authors:
Liye Jia,
Runwei Guan,
Haocheng Zhao,
Qiuchi Zhao,
Ka Lok Man,
Jeremy Smith,
Limin Yu,
Yutao Yue
Abstract:
3D object detection is crucial for Autonomous Driving (AD) and Advanced Driver Assistance Systems (ADAS). However, most 3D detectors prioritize detection accuracy, often overlooking network inference speed in practical applications. In this paper, we propose RadarNeXt, a real-time and reliable 3D object detector based on 4D mmWave radar point clouds. It leverages re-parameterizable neural networks to capture multi-scale features, reduce memory cost, and accelerate inference. Moreover, to highlight the irregular foreground features of radar point clouds and suppress background clutter, we propose a Multi-path Deformable Foreground Enhancement Network (MDFEN), which preserves detection accuracy while minimizing the loss of speed and avoiding an excessive number of parameters. Experimental results on the View-of-Delft and TJ4DRadSet datasets validate the performance and efficiency of RadarNeXt, with the MDFEN variant achieving mAP scores of 50.48 and 32.30, respectively. Notably, our RadarNeXt variants achieve inference speeds of over 67.10 FPS on the RTX A4000 GPU and 28.40 FPS on the Jetson AGX Orin. This research demonstrates that RadarNeXt offers a novel and effective paradigm for 3D perception based on 4D mmWave radar.
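Re-parameterizable networks train with several parallel branches and fold them into a single kernel for inference. The NumPy sketch below illustrates only the linear folding step in one dimension, assuming three hypothetical branches (a 3-tap convolution, a 1-tap convolution and an identity path) and omitting batch-norm folding; it is a generic illustration, not RadarNeXt's implementation.

```python
import numpy as np

def branch_outputs(x, k3, k1):
    """Training-time form (hypothetical branches): sum of three parallel paths."""
    y3 = np.convolve(x, k3, mode="same")  # 3-tap convolution branch
    y1 = np.convolve(x, k1, mode="same")  # 1-tap branch, i.e. a pointwise scale
    return y3 + y1 + x                     # identity branch

def fold_kernels(k3, k1):
    """Inference-time form: one 3-tap kernel equivalent to all three branches."""
    k1_padded = np.array([0.0, k1[0], 0.0])  # centre the 1-tap kernel
    identity = np.array([0.0, 1.0, 0.0])     # identity written as a centred delta
    return k3 + k1_padded + identity

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
k3 = rng.standard_normal(3)
k1 = rng.standard_normal(1)

merged = fold_kernels(k3, k1)
y_train = branch_outputs(x, k3, k1)
y_infer = np.convolve(x, merged, mode="same")
print(np.allclose(y_train, y_infer))  # True
```

Because every branch is linear, the summed outputs equal a single convolution with the summed (padded) kernels, which is what lets the multi-branch form be trained and the cheaper single-branch form be deployed.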
Submitted 4 January, 2025;
originally announced January 2025.
-
A Digital twin for Diesel Engines: Operator-infused PINNs with Transfer Learning for Engine Health Monitoring
Authors:
Kamaljyoti Nath,
Varun Kumar,
Daniel J. Smith,
George Em Karniadakis
Abstract:
Improving diesel engine efficiency and reducing emissions have been critical research topics. Recent government regulations have shifted this focus to another important area related to engine health and performance monitoring. Although advances in deep learning methods for system monitoring have shown promising results in this direction, designing efficient methods suitable for field systems remains an open research challenge. The objective of this study is to develop a computationally efficient neural network-based approach for identifying unknown parameters of a mean value diesel engine model to facilitate physics-based health monitoring and maintenance forecasting. We propose a hybrid method combining physics-informed neural networks (PINNs) and a deep neural operator (DeepONet) to predict unknown parameters and gas flow dynamics in a diesel engine. The operator network predicts independent actuator dynamics learnt through offline training, thereby reducing the PINNs' online computational cost. To address the PINNs' need for retraining with changing input scenarios, we propose two transfer learning (TL) strategies. The first strategy involves multi-stage transfer learning for parameter identification. While this method is computationally efficient compared with online PINN training, improvements are required to meet field requirements. The second TL strategy focuses solely on training the output weights and biases of a subset of multi-head networks pretrained on a larger dataset, substantially reducing computation time during online prediction. We also evaluate our model for epistemic and aleatoric uncertainty by incorporating dropout in pretrained networks and Gaussian noise in the training dataset. This strategy offers a tailored, computationally inexpensive, and physics-based approach for parameter identification in diesel engine subsystems.
Submitted 16 December, 2024;
originally announced December 2024.
-
Grounding Descriptions in Images informs Zero-Shot Visual Recognition
Authors:
Shaunak Halbe,
Junjiao Tian,
K J Joseph,
James Seale Smith,
Katherine Stevo,
Vineeth N Balasubramanian,
Zsolt Kira
Abstract:
Vision-language models (VLMs) like CLIP have been valued for their ability to perform zero-shot visual recognition on open-vocabulary concepts. This is achieved by selecting the object category whose textual representation bears the highest similarity with the query image. While successful in some domains, this method struggles with identifying fine-grained entities as well as generalizing to unseen concepts that are not captured by the training distribution. Recent works attempt to mitigate these challenges by integrating category descriptions at test time, albeit with modest improvements. We attribute these limited gains to a fundamental misalignment between image and description representations, which is rooted in the pretraining structure of CLIP. In this paper, we propose GRAIN, a new pretraining strategy aimed at aligning representations at both fine and coarse levels simultaneously. Our approach learns to jointly ground textual descriptions in image regions along with aligning overarching captions with global image representations. To drive this pretraining, we leverage frozen Multimodal Large Language Models (MLLMs) to derive large-scale synthetic annotations. We demonstrate the enhanced zero-shot performance of our model compared to current state-of-the-art methods across 11 diverse image classification datasets. Additionally, we introduce Products-2023, a newly curated, manually labeled dataset featuring novel concepts, and showcase our model's ability to recognize these concepts by benchmarking on it. Significant improvements achieved by our model on other downstream tasks like retrieval further highlight the superior quality of representations learned by our approach. Code is available at https://github.com/shaunak27/grain-clip.
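The zero-shot rule described above (choose the category whose text representation is most similar to the image) can be sketched in a few lines. This is a minimal NumPy illustration using made-up 3-dimensional embeddings; in practice the vectors would come from an encoder such as CLIP, and averaging several description embeddings into one class prototype is an assumed, common way to use test-time descriptions, not necessarily GRAIN's procedure.

```python
import numpy as np

def normalize(v):
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def zero_shot_predict(image_emb, class_text_embs):
    """class_text_embs maps a class name to an array (n_descriptions, dim)."""
    img = normalize(image_emb)
    scores = {}
    for name, embs in class_text_embs.items():
        # Assumed strategy: average normalized description embeddings into a prototype.
        proto = normalize(normalize(embs).mean(axis=0))
        scores[name] = float(img @ proto)  # cosine similarity
    return max(scores, key=scores.get), scores

# Hypothetical toy embeddings, for illustration only.
text_embs = {
    "cat": np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]),
    "dog": np.array([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]),
}
image = np.array([0.95, 0.05, 0.05])
label, scores = zero_shot_predict(image, text_embs)
print(label)  # cat
```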
Submitted 5 December, 2024;
originally announced December 2024.
-
OSU-Wing PIC Phase I Evaluation: Baseline Workload and Situation Awareness Results
Authors:
Julie A. Adams,
Christopher A. Sanchez,
Vivek Mallampati,
Joshua Bhagat Smith,
Emily Burgess,
Andrew Dassonville
Abstract:
The common theory is that a human pilot's performance degrades when the pilot is responsible for an increasing number of uncrewed aircraft systems (UAS). This theory was developed in the early 2010s for ground robots, not for highly autonomous UAS. It has been shown that increasing autonomy can mitigate some performance impacts associated with increasing the number of UAS. Overall, the Oregon State University-Wing collaboration seeks to understand what factors negatively impact a pilot's ability to maintain responsibility and control over an assigned set of active UAS. The Phase I evaluation establishes baseline data as the number of UAS and the number of nests increase. This evaluation focuses on nominal operations as well as crewed aircraft encounters and adverse weather changes. The results demonstrate that the pilots were actively engaged and had very good situation awareness. Manipulation of the conditions did not result in any significant differences in overall workload. The overall results debunk the theory that increasing the number of UAS is detrimental to pilots' performance.
Submitted 27 November, 2024;
originally announced November 2024.
-
Towards more efficient agricultural practices via transformer-based crop type classification
Authors:
E. Ulises Moya-Sánchez,
Yazid S. Mikail,
Daisy Nyang'anyi,
Michael J. Smith,
Isabella Smythe
Abstract:
Machine learning has great potential to increase crop production and resilience to climate change. Accurate maps of where crops are grown are a key input to a number of downstream policy and research applications. In this proposal, we present preliminary work showing that it is possible to accurately classify crops from time series derived from Sentinel-1 and Sentinel-2 satellite imagery in Mexico using a pixel-based binary crop/non-crop time series transformer model. We also find preliminary evidence that meta-learning approaches supplemented with data from similar agro-ecological zones may improve model performance. Given these promising results, we propose further development of this method with the goal of accurate multi-class crop classification in Jalisco, Mexico via meta-learning with a dataset comprising similar agro-ecological zones.
Submitted 4 November, 2024;
originally announced November 2024.
-
Rician Channel Modelling for Super Wideband MIMO Communications
Authors:
Sachitha C. Bandara,
Peter J. Smith,
Erfan Khordad,
Robin Evans,
Rajitha Senanayake
Abstract:
Recent developments in Multiple-Input-Multiple-Output (MIMO) technology include packing a large number of antenna elements into a compact array to access the bandwidth benefits provided by higher mutual coupling (MC). The resulting super-wideband (SW) systems require a circuit-theoretic framework to handle the MC, as well as channel models that span extremely large bands. Hence, in this paper, we make two key contributions. First, we develop a physically consistent Rician channel model for use with SW systems. Second, we express the circuit-theoretic models in terms of a standard MIMO model, so that insights into the effects of antenna layouts, MC, and bandwidth can be drawn using standard communication theory. For example, we show the bandwidth widening resulting from the new channel model. In addition, we show that MC distorts line-of-sight paths, which has beamforming implications. We also highlight the interaction between spatial correlation and MC and show that tight coupling reduces spatial correlations at low frequencies.
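For orientation, the textbook narrowband Rician MIMO channel mixes a deterministic line-of-sight (LoS) term with scattered Rayleigh fading, weighted by the K-factor. The sketch below assumes uniform linear arrays with half-wavelength spacing and a single LoS path; it deliberately ignores mutual coupling and frequency dependence, which are exactly what the physically consistent super-wideband model above adds, so it is a baseline illustration rather than the paper's model.

```python
import numpy as np

def ula_steering(n_antennas, angle_rad, spacing_wavelengths=0.5):
    """Steering vector of a uniform linear array (assumed half-wavelength spacing)."""
    n = np.arange(n_antennas)
    return np.exp(-2j * np.pi * spacing_wavelengths * n * np.sin(angle_rad))

def rician_channel(n_rx, n_tx, k_factor, aod, aoa, rng):
    """Standard Rician model: sqrt(K/(K+1)) * H_LoS + sqrt(1/(K+1)) * H_NLoS."""
    # Deterministic rank-one LoS component with unit-modulus entries.
    h_los = np.outer(ula_steering(n_rx, aoa), ula_steering(n_tx, aod).conj())
    # Scattered component with i.i.d. CN(0, 1) entries.
    h_nlos = (rng.standard_normal((n_rx, n_tx))
              + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
    return (np.sqrt(k_factor / (k_factor + 1)) * h_los
            + np.sqrt(1 / (k_factor + 1)) * h_nlos)

rng = np.random.default_rng(1)
H = rician_channel(4, 8, k_factor=3.0, aod=0.3, aoa=-0.2, rng=rng)
print(H.shape)  # (4, 8)
```

As the K-factor grows the channel approaches the pure LoS matrix, and at K = 0 it reduces to Rayleigh fading.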
Submitted 4 November, 2024;
originally announced November 2024.
-
Birdie: Advancing State Space Models with Reward-Driven Objectives and Curricula
Authors:
Sam Blouir,
Jimmy T. H. Smith,
Antonios Anastasopoulos,
Amarda Shehu
Abstract:
Efficient state space models (SSMs), such as linear recurrent neural networks and linear attention variants, offer computational advantages over Transformers but struggle with tasks requiring long-range in-context retrieval, such as text copying, associative recall, and question answering over long contexts. Previous efforts to address these challenges have focused on architectural modifications, often reintroducing computational inefficiencies. In this paper, we propose a novel training procedure, Birdie, that significantly enhances the in-context retrieval capabilities of SSMs without altering their architecture. Our approach combines bidirectional input processing with dynamic mixtures of specialized pre-training objectives, optimized via reinforcement learning. We introduce a new bidirectional SSM architecture that seamlessly transitions from bidirectional context processing to causal generation. Experimental evaluations demonstrate that Birdie markedly improves performance on retrieval-intensive tasks such as multi-number phone book lookup, long paragraph question-answering, and infilling. This narrows the performance gap with Transformers, while retaining computational efficiency. Our findings highlight the importance of training procedures in leveraging the fixed-state capacity of SSMs, offering a new direction to advance their capabilities. All code and pre-trained models are available at https://www.github.com/samblouir/birdie, with support for JAX and PyTorch.
Submitted 21 February, 2025; v1 submitted 1 November, 2024;
originally announced November 2024.
-
AI-Guided Codesign Framework for Novel Material and Device Design applied to MTJ-based True Random Number Generators
Authors:
Karan P. Patel,
Andrew Maicke,
Jared Arzate,
Jaesuk Kwon,
J. Darby Smith,
James B. Aimone,
Jean Anne C. Incorvia,
Suma G. Cardwell,
Catherine D. Schuman
Abstract:
Novel devices and novel computing paradigms are key for energy-efficient, performant future computing systems. However, designing devices for new applications is often time-consuming and tedious. Here, we investigate the design and optimization of spin-orbit torque and spin-transfer torque magnetic tunnel junction models as probabilistic devices for true random number generation. We leverage reinforcement learning and evolutionary optimization to vary key device and material properties of the various device models for stochastic operation. Our AI-guided codesign methods generated different candidate devices capable of generating stochastic samples for a desired probability distribution, while also minimizing the devices' energy usage.
Submitted 1 November, 2024;
originally announced November 2024.
-
Toward path-invariant embeddings for local distance source characterization
Authors:
Lisa Linville,
Chengping Chai,
Nathan Marthindale,
Jacob Smith,
Scott Stewart,
Asmeret Naugle
Abstract:
This work builds on recent advances in foundation models in the language and image domains to explore similar approaches for seismic source characterization. We rely on an architecture called Barlow Twins, which draws on an understanding of the human visual cortical system and was originally envisioned for the image domain, and adapt it for learning path invariance in seismic event time series. Our model improves performance on event characterization tasks such as source discrimination across catalogs by 10-12% and provides more reliable predictive uncertainty estimates. We suggest that dataset scale and diversity, more than architecture, may determine aspects of the current ceiling on performance. We leverage decision trees, linear models, and visualization to understand the dependencies in learned representations.
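The Barlow Twins objective mentioned above pushes the cross-correlation matrix between embeddings of two views toward the identity: diagonal terms toward 1 (invariance) and off-diagonal terms toward 0 (redundancy reduction). The NumPy sketch below follows the generic published form of the loss; the weight `lambda_offdiag = 5e-3` is an assumed default, and how the seismic work forms its two "views" (for example, recordings of one event along different paths) is not shown here.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-9):
    """z_a, z_b: (batch, dim) embeddings of two views of the same samples."""
    n = z_a.shape[0]
    # Standardize each embedding dimension over the batch.
    za = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    zb = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    c = za.T @ zb / n  # (dim, dim) cross-correlation matrix
    on_diag = np.sum((1.0 - np.diag(c)) ** 2)           # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # redundancy term
    return on_diag + lambda_offdiag * off_diag

rng = np.random.default_rng(0)
z = rng.standard_normal((256, 8))
same_view = barlow_twins_loss(z, z)
unrelated = barlow_twins_loss(z, rng.standard_normal((256, 8)))
print(same_view < unrelated)  # True
```

Perfectly matched views give a loss close to zero, while unrelated embeddings are penalized mainly through the diagonal term.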
Submitted 23 October, 2024;
originally announced October 2024.
-
Scalable Differential Privacy Mechanisms for Real-Time Machine Learning Applications
Authors:
Jessica Smith,
David Williams,
Emily Brown
Abstract:
Large language models (LLMs) are increasingly integrated into real-time machine learning applications, where safeguarding user privacy is paramount. Traditional differential privacy mechanisms often struggle to balance privacy and accuracy, particularly in fast-changing environments with continuously flowing data. To address these issues, we introduce Scalable Differential Privacy (SDP), a framework tailored for real-time machine learning that emphasizes both robust privacy guarantees and enhanced model performance. SDP employs a hierarchical architecture to facilitate efficient noise aggregation across various learning agents. By integrating adaptive noise scheduling and gradient compression methods, our approach minimizes performance degradation while ensuring significant privacy protection. Extensive experiments on diverse datasets reveal that SDP maintains high accuracy levels while applying differential privacy effectively, showcasing its suitability for deployment in sensitive domains. This advancement points towards the potential for widespread adoption of privacy-preserving techniques in machine learning workflows.
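As background for the noise-based mechanisms discussed above, the sketch below shows the widely used DP-SGD-style aggregation step: clip each per-example gradient to a maximum norm, sum, and add Gaussian noise scaled to that norm. This is a generic Gaussian-mechanism illustration, not SDP's hierarchical noise aggregation, adaptive scheduling, or gradient compression; the privacy accounting that maps `noise_multiplier` to an (epsilon, delta) guarantee is omitted.

```python
import numpy as np

def clip(g, max_norm):
    """Scale g down so that its L2 norm is at most max_norm."""
    norm = np.linalg.norm(g)
    return g * min(1.0, max_norm / (norm + 1e-12))

def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    total = np.sum(clipped, axis=0)
    # Noise standard deviation is proportional to the per-example sensitivity.
    noise = rng.normal(0.0, noise_multiplier * max_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

rng = np.random.default_rng(0)
grads = [np.array([3.0, 4.0]), np.array([0.0, 0.5])]
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds each example's influence on the sum, which is what makes the added Gaussian noise sufficient for a differential privacy guarantee.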
Submitted 16 September, 2024;
originally announced October 2024.
-
OptiGrasp: Optimized Grasp Pose Detection Using RGB Images for Warehouse Picking Robots
Authors:
Soofiyan Atar,
Yi Li,
Markus Grotz,
Michael Wolf,
Dieter Fox,
Joshua Smith
Abstract:
In warehouse environments, robots require robust picking capabilities to manage a wide variety of objects. Effective deployment demands minimal hardware, strong generalization to new products, and resilience in diverse settings. Current methods often rely on depth sensors for structural information, which suffer from high costs, complex setups, and technical limitations. Inspired by recent advancements in computer vision, we propose an innovative approach that leverages foundation models to enhance suction grasping using only RGB images. Trained solely on a synthetic dataset, our method generalizes its grasp prediction capabilities to real-world robots and a diverse range of novel objects not included in the training set. Our network achieves an 82.3% success rate in real-world applications. The project website with code and data will be available at http://optigrasp.github.io.
Submitted 28 September, 2024;
originally announced September 2024.
-
Robust estimation of the intrinsic dimension of data sets with quantum cognition machine learning
Authors:
Luca Candelori,
Alexander G. Abanov,
Jeffrey Berger,
Cameron J. Hogan,
Vahagn Kirakosyan,
Kharen Musaelian,
Ryan Samson,
James E. T. Smith,
Dario Villani,
Martin T. Wells,
Mengjia Xu
Abstract:
We propose a new data representation method based on Quantum Cognition Machine Learning and apply it to manifold learning, specifically to the estimation of the intrinsic dimension of data sets. The idea is to learn a representation of each data point as a quantum state, encoding both local properties of the point and its relation to the entire data set. Inspired by ideas from quantum geometry, we then construct from the quantum states a point cloud equipped with a quantum metric. The metric exhibits a spectral gap whose location corresponds to the intrinsic dimension of the data. The proposed estimator is based on the detection of this spectral gap. When tested on synthetic manifold benchmarks, our estimates are shown to be robust with respect to the introduction of point-wise Gaussian noise. This is in contrast to current state-of-the-art estimators, which tend to attribute artificial "shadow dimensions" to noise artifacts, leading to overestimates. This is a significant advantage when dealing with real data sets, which are inevitably affected by unknown levels of noise. We show the applicability and robustness of our method on real data by testing it on the ISOMAP face database, MNIST, and the Wisconsin Breast Cancer Dataset.
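The core idea of reading a dimension off a spectral gap can be illustrated with a much simpler, swapped-in technique: global PCA. The sketch below sorts the covariance eigenvalues in descending order and reports the position of the largest gap. It is not the quantum-metric estimator described above; it assumes roughly flat data and is used here only to make "spectral gap location equals dimension" concrete.

```python
import numpy as np

def intrinsic_dim_by_spectral_gap(points):
    """Dimension = index of the largest drop in the descending covariance spectrum.
    Plain global PCA, not the quantum-metric construction."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending order
    gaps = eigvals[:-1] - eigvals[1:]
    return int(np.argmax(gaps)) + 1

rng = np.random.default_rng(0)
latent = rng.uniform(0.0, 1.0, size=(500, 2))           # 2-d latent coordinates
basis = np.linalg.qr(rng.standard_normal((5, 2)))[0]     # orthonormal 5x2 embedding
points = latent @ basis.T + 0.01 * rng.standard_normal((500, 5))  # small noise
print(intrinsic_dim_by_spectral_gap(points))  # 2
```

A linear estimator like this cannot handle curved manifolds, which is one reason more elaborate constructions, such as the metric-based approach above, are needed for real data.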
Submitted 19 September, 2024;
originally announced September 2024.
-
Astrometric Binary Classification Via Artificial Neural Networks
Authors:
Joe Smith
Abstract:
With nearly two billion stars observed and their astrometric parameters evaluated in the recent Gaia mission, the number of astrometric binary candidates has risen significantly. Because of this surplus of astrometric data, the current methods employed to inspect these candidates are computationally expensive and cannot be executed in a reasonable time frame. In light of this, a machine learning (ML) technique is proposed that uses an artificial neural network (ANN) to automatically classify whether a pair of stars forms an astrometric binary. Using data from Gaia DR3, the ANN was trained and tested on 1.5 million highly probable true and visual binaries, considering the proper motions, parallaxes, and angular and physical separations as features. The ANN achieves high classification scores, with an accuracy of 99.3%, a precision of 0.988, a recall of 0.991, and an AUC of 0.999, indicating that this ML technique is highly effective for classifying astrometric binaries. Thus, the proposed ANN is a promising alternative to existing methods for the classification of astrometric binaries.
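The scores reported above (accuracy, precision, recall and AUC) have standard definitions, sketched below in plain Python. The labels and classifier scores in the example are hypothetical toy values, and the 0.5 decision threshold is an assumption; AUC is computed here as the probability that a randomly chosen positive outscores a randomly chosen negative, counting ties as one half.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

def roc_auc(y_true, scores):
    """AUC as P(random positive scores higher than random negative), ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.1]                  # hypothetical classifier outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(binary_metrics(y_true, y_pred))  # (0.75, 1.0, 0.5)
print(roc_auc(y_true, scores))         # 1.0
```

The toy example shows why AUC and thresholded metrics can disagree: the ranking is perfect (AUC of 1.0), yet the fixed threshold misses one positive.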
Submitted 14 September, 2024;
originally announced September 2024.
-
An Augmentation-based Model Re-adaptation Framework for Robust Image Segmentation
Authors:
Zheming Zuo,
Joseph Smith,
Jonathan Stonehouse,
Boguslaw Obara
Abstract:
Image segmentation is a crucial task in computer vision, with wide-ranging applications in industry. The Segment Anything Model (SAM) has recently attracted intensive attention; however, its application in industrial inspection, particularly for segmenting commercial anti-counterfeit codes, remains challenging. Unlike open-source datasets, industrial settings often face issues such as small sample sizes and complex textures. Additionally, computational cost is a key concern due to the varying number of trainable parameters. To address these challenges, we propose an Augmentation-based Model Re-adaptation Framework (AMRF). This framework leverages data augmentation techniques during training to enhance the generalisation of segmentation models, allowing them to adapt to newly released datasets with temporal disparity. By observing segmentation masks from conventional models (FCN and U-Net) and a pre-trained SAM model, we determine a minimal augmentation set that optimally balances training efficiency and model performance. Our results demonstrate that the fine-tuned FCN surpasses its baseline by 3.29% and 3.02% in cropping accuracy, and 5.27% and 4.04% in classification accuracy on two temporally continuous datasets. Similarly, the fine-tuned U-Net improves upon its baseline by 7.34% and 4.94% in cropping, and 8.02% and 5.52% in classification. Both models outperform the top-performing SAM models (ViT-Large and ViT-Base) by an average of 11.75% and 9.01% in cropping accuracy, and 2.93% and 4.83% in classification accuracy, respectively.
Submitted 14 September, 2024;
originally announced September 2024.
-
Dynamic Bayesian Networks, Elicitation and Data Embedding for Secure Environments
Authors:
Kieran Drury,
Jim Q. Smith
Abstract:
Serious crime modelling typically needs to be undertaken securely behind a firewall where police knowledge and capabilities can remain undisclosed. Data informing an ongoing incident is often sparse, with a large proportion of relevant data only coming to light after the incident culminates or after police intervene, by which point it is too late to use the data to aid real-time decision making for the incident in question. Much of the data available to police to support real-time decision making is highly confidential, so it cannot be shared with academics and is therefore unavailable to them. In this paper, we describe the development of a formal protocol where a graphical model is used as a framework for securely translating a model designed by an academic team to a model for use by a police team. We then show, for the first time, how libraries of these models can be built and used for real-time decision support to circumvent the challenges of data missingness and tardiness seen in such a secure environment. The parallel development described by this protocol ensures that any sensitive information collected by police, and unavailable to academics, remains secured behind a firewall. The protocol nevertheless guides police so that they are able to combine the typically incomplete open-source data streams with their more sensitive information in a formal and justifiable way. We illustrate the application of this protocol by describing how a new entry, a suspected vehicle attack, can be embedded into such a police library of criminal plots.
Submitted 11 September, 2024;
originally announced September 2024.
-
SymPAC: Scalable Symbolic Music Generation With Prompts And Constraints
Authors:
Haonan Chen,
Jordan B. L. Smith,
Janne Spijkervet,
Ju-Chiang Wang,
Pei Zou,
Bochen Li,
Qiuqiang Kong,
Xingjian Du
Abstract:
Progress in the task of symbolic music generation may be lagging behind other tasks like audio and text generation, in part because of the scarcity of symbolic training data. In this paper, we leverage the greater scale of audio music data by applying pre-trained MIR models (for transcription, beat tracking, structure analysis, etc.) to extract symbolic events and encode them into token sequences. To the best of our knowledge, this work is the first to demonstrate the feasibility of training symbolic generation models solely from auto-transcribed audio data. Furthermore, to enhance the controllability of the trained model, we introduce SymPAC (Symbolic Music Language Model with Prompting And Constrained Generation), which is distinguished by using (a) prompt bars in encoding and (b) a technique called Constrained Generation via Finite State Machines (FSMs) during inference time. We show the flexibility and controllability of this approach, which may be critical in making music AI useful to creators and users.
Submitted 9 September, 2024; v1 submitted 4 September, 2024;
originally announced September 2024.
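The SymPAC abstract above mentions Constrained Generation via Finite State Machines at inference time. The general idea can be sketched as masking every token that the automaton does not allow in its current state, then choosing among the remaining tokens. The toy vocabulary, the grammar (a bar marker followed by alternating position and pitch tokens), and the fixed scoring function standing in for the language model are all hypothetical, not SymPAC's actual encoding or model.

```python
# Hypothetical FSM over a toy token vocabulary: state -> {allowed token: next state}.
FSM = {
    "start": {"<bar>": "bar"},
    "bar": {"pos_0": "pos", "pos_8": "pos"},
    "pos": {"pitch_60": "note", "pitch_64": "note"},
    "note": {"pos_8": "pos", "<bar>": "bar", "<eos>": "end"},
}


def toy_scores(prefix):
    """Stand-in for a language model: fixed preferences over the whole vocabulary."""
    prefs = {"<eos>": 0.1, "pitch_64": 0.9, "pitch_60": 0.5,
             "pos_0": 0.4, "pos_8": 0.7, "<bar>": 0.2}
    # Hypothetical rule: prefer ending once the sequence is long enough.
    if len(prefix) >= 6:
        prefs["<eos>"] = 1.0
    return prefs


def constrained_greedy(score_fn, fsm, max_len=12):
    """Greedy decoding in which only tokens with an FSM transition are candidates."""
    state, out = "start", []
    while state != "end" and len(out) < max_len:
        allowed = fsm[state]          # the mask: tokens legal in this state
        scores = score_fn(out)
        token = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        out.append(token)
        state = allowed[token]
    return out
```

Without the mask, the first step would pick `pitch_64` (the highest score), which is not a legal opening token; with it, decoding yields `['<bar>', 'pos_8', 'pitch_64', 'pos_8', 'pitch_64', 'pos_8', 'pitch_64', '<eos>']`, a sequence that the automaton accepts.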
-
SGP-RI: A Real-Time-Trainable and Decentralized IoT Indoor Localization Model Based on Sparse Gaussian Process with Reduced-Dimensional Inputs
Authors:
Zhe Tang,
Sihao Li,
Zichen Huang,
Guandong Yang,
Kyeong Soo Kim,
Jeremy S. Smith
Abstract:
As Internet of Things (IoT) devices are deployed in the field, there is an enormous amount of untapped potential in local computing on those devices. Harnessing this potential for indoor localization is therefore an exciting research area. Conventionally, indoor localization models are trained and deployed on centralized servers with substantial computational resources. This centralized approach faces several challenges, including the database's inability to accommodate the dynamic and unpredictable nature of the indoor electromagnetic environment, the cost of retraining the model, and the susceptibility of centralized servers to security breaches. To mitigate these challenges, we aim to merge the offline and online phases of traditional indoor localization methods using a real-time-trainable and decentralized IoT indoor localization model based on Sparse Gaussian Process with Reduced-dimensional Inputs (SGP-RI), where the number and dimension of the input data are reduced through reference point filtering and wireless access point filtering, respectively. Experimental results on a multi-building, multi-floor static database and a single-building, single-floor dynamic database demonstrate that the proposed SGP-RI model, using fewer than half of the training samples as inducing inputs, can achieve localization performance comparable to that of a standard Gaussian Process model using all training samples. The SGP-RI model enables the decentralization of indoor localization, facilitating its deployment on resource-constrained IoT devices, and could thereby provide enhanced security and privacy as well as reduced costs and network dependency. In addition, the model's capability for real-time training allows it to adapt quickly to the time-varying indoor electromagnetic environment.
Submitted 24 August, 2024;
originally announced September 2024.
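The SGP-RI abstract above reduces the input dimension through wireless access point (AP) filtering. A minimal sketch of one such filter drops APs that are rarely detected across the fingerprint database. The detection-fraction criterion and the `min_fraction` threshold are hypothetical choices, not the paper's rule; the sentinel value 100 for an undetected AP follows the UJIIndoorLoc convention.

```python
NOT_DETECTED = 100  # UJIIndoorLoc convention: RSSI value for an AP that was not detected


def filter_access_points(fingerprints, min_fraction=0.5):
    """Return indices of APs detected in at least `min_fraction` of the fingerprints.

    The threshold rule is a hypothetical stand-in for the paper's AP filtering.
    """
    n = len(fingerprints)
    keep = []
    for j in range(len(fingerprints[0])):
        detected = sum(1 for fp in fingerprints if fp[j] != NOT_DETECTED)
        if detected >= min_fraction * n:
            keep.append(j)
    return keep


def reduce_dimension(fingerprints, keep):
    """Project every fingerprint onto the retained AP columns."""
    return [[fp[j] for j in keep] for fp in fingerprints]


fps = [[-60, 100, -75, 100],
       [-62, 100, 100, -90],
       [-58, -85, -77, 100]]
keep = filter_access_points(fps)    # APs 0 and 2 are seen in at least half the scans
reduced = reduce_dimension(fps, keep)
```

Here the four-dimensional RSSI vectors shrink to two dimensions, which in turn reduces the cost of each kernel evaluation in a Gaussian process model.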
-
NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar
Authors:
Runwei Guan,
Jianan Liu,
Liye Jia,
Haocheng Zhao,
Shanliang Yao,
Xiaohui Zhu,
Ka Lok Man,
Eng Gee Lim,
Jeremy Smith,
Yutao Yue
Abstract:
Recently, visual grounding and multi-sensor settings have been incorporated into perception systems for terrestrial autonomous driving and Unmanned Surface Vehicles (USVs), yet the high complexity of modern learning-based multi-sensor visual grounding models prevents them from being deployed on USVs in real life. To this end, we design a low-power multi-task model named NanoMVG for waterway embodied perception, which guides both a camera and a 4D millimeter-wave radar to locate specific object(s) through natural language. NanoMVG can perform box-level and mask-level visual grounding simultaneously. Compared to other visual grounding models, NanoMVG achieves highly competitive performance on the WaterVG dataset, particularly in harsh environments, and its ultra-low power consumption supports long endurance.
Submitted 11 February, 2025; v1 submitted 30 August, 2024;
originally announced August 2024.
-
Turbulence Strength $C_n^2$ Estimation from Video using Physics-based Deep Learning
Authors:
Ripon Kumar Saha,
Esen Salcin,
Jihoo Kim,
Joseph Smith,
Suren Jayasuriya
Abstract:
Images captured from a long distance suffer from dynamic image distortion due to the turbulent flow of air cells with random temperatures, and thus random refractive indices. This phenomenon, known as image dancing, is commonly characterized by the refractive-index structure constant $C_n^2$, a measure of turbulence strength. For many applications, such as atmospheric forecast models, long-range/astronomy imaging, aviation safety, and optical communication technology, $C_n^2$ estimation is critical for accurately sensing the turbulent environment. Previous methods for $C_n^2$ estimation include estimation from meteorological data (temperature, relative humidity, wind shear, etc.) for single-point measurements, two-ended pathlength measurements from an optical scintillometer for path-averaged $C_n^2$, and, more recently, estimation of $C_n^2$ from passive video cameras to reduce cost and hardware complexity. In this paper, we present a comparative analysis of classical image gradient methods for $C_n^2$ estimation and modern deep learning-based methods leveraging convolutional neural networks. To enable this, we collect a dataset of video captures along with reference scintillometer measurements for ground truth, and we release this unique dataset to the scientific community. We observe that deep learning methods can achieve higher accuracy when trained on similar data, but generalize worse than classical methods to other, unseen imagery. To overcome this trade-off, we present a novel physics-based network architecture that combines learned convolutional layers with a differentiable image gradient method, maintaining high accuracy while generalizing across image datasets.
Submitted 29 August, 2024;
originally announced August 2024.
-
Predictive Anchoring: A Novel Interaction to Support Contextualized Suggestions for Grid Displays
Authors:
Cynthia Zastudil,
Christine Holyfield,
June A. Smith,
Hannah Vy Nguyen,
Stephen MacNeil
Abstract:
Grid displays are the most common form of augmentative and alternative communication (AAC) device recommended by speech-language pathologists for children. Grid displays present a large variety of vocabulary, which can be beneficial for a user's language development. However, the extensive navigation and cognitive overhead that grid displays demand can negatively impact users' ability to actively participate in social interactions, which is an important factor in their language development. We present a novel interaction technique for grid displays, Predictive Anchoring, based on user interaction theory and language development theory. Our design is informed by existing literature in AAC research and is presented as a set of design goals and a preliminary design sketch. Future work on user studies and interaction design is also discussed.
Submitted 20 August, 2024;
originally announced August 2024.
-
MoDeGPT: Modular Decomposition for Large Language Model Compression
Authors:
Chi-Heng Lin,
Shangqian Gao,
James Seale Smith,
Abhishek Patel,
Shikhar Tuli,
Yilin Shen,
Hongxia Jin,
Yen-Chang Hsu
Abstract:
Large Language Models (LLMs) have reshaped the landscape of artificial intelligence by demonstrating exceptional performance across various tasks. However, their substantial computational requirements make deployment challenging on devices with limited resources. Recently, compression methods using low-rank matrix techniques have shown promise, yet these often lead to degraded accuracy or introduce significant overhead in parameters and inference latency. This paper introduces Modular Decomposition (MoDeGPT), a novel structured compression framework that does not require recovery fine-tuning while resolving the above drawbacks. MoDeGPT partitions the Transformer block into modules composed of matrix pairs and reduces the hidden dimensions by reconstructing the module-level outputs. MoDeGPT is developed based on a theoretical framework that utilizes three well-established matrix decomposition algorithms (Nyström approximation, CR decomposition, and SVD) and applies them to our redefined Transformer modules. Our comprehensive experiments show that MoDeGPT, without backward propagation, matches or surpasses previous structured compression methods that rely on gradient information, and saves 98% of compute costs when compressing a 13B model. On Llama-2/3 and OPT models, MoDeGPT maintains 90-95% of zero-shot performance at 25-30% compression rates. Moreover, the compression can be done on a single GPU within a few hours and increases inference throughput by up to 46%.
Submitted 2 May, 2025; v1 submitted 18 August, 2024;
originally announced August 2024.
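The MoDeGPT abstract above applies CR decomposition to modules made of matrix pairs. The textbook CR idea approximates a product $W_1 W_2$ by keeping only some inner column/row pairs, which directly shrinks the hidden dimension between the two matrices. The sketch below uses a deterministic rule (keep the pairs with the largest $\lVert W_1[:,k]\rVert \cdot \lVert W_2[k,:]\rVert$) as an assumption of ours; the paper selects dimensions by reconstructing module outputs on data and may proceed differently.

```python
def matmul(A, B):
    """Plain-Python matrix product of list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def col_norm(A, k):
    return sum(A[i][k] ** 2 for i in range(len(A))) ** 0.5


def row_norm(B, k):
    return sum(v ** 2 for v in B[k]) ** 0.5


def cr_compress(W1, W2, keep):
    """Keep the `keep` inner indices with the largest column-norm x row-norm product.

    Returns (W1_c, W2_c) whose product approximates W1 @ W2 with a hidden
    dimension of `keep` instead of len(W2).
    """
    ranked = sorted(range(len(W2)),
                    key=lambda k: col_norm(W1, k) * row_norm(W2, k), reverse=True)
    idx = sorted(ranked[:keep])                  # preserve original ordering
    W1_c = [[row[k] for k in idx] for row in W1]  # selected columns of W1
    W2_c = [W2[k] for k in idx]                   # matching rows of W2
    return W1_c, W2_c


W1 = [[1.0, 0.01, 2.0], [0.5, 0.02, -1.0]]
W2 = [[1.0, 0.0], [0.1, 0.3], [0.0, 1.0]]
W1c, W2c = cr_compress(W1, W2, keep=2)  # drops the weak middle pair
```

Dropping the middle inner index changes each entry of the product by at most 0.006 here, because that column/row pair contributes little to $W_1 W_2$.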
-
Cellular Plasticity Model for Bottom-Up Robotic Design
Authors:
Trevor R. Smith,
Thomas J. Smith,
Nicholas S. Szczecinski,
Sergiy Yakovenko,
Yu Gu
Abstract:
Traditional top-down robotic design often lacks the adaptability needed to handle real-world complexities, prompting the need for more flexible approaches. Therefore, this study introduces a novel cellular plasticity model tailored for bottom-up robotic design. The proposed model utilizes an activator-inhibitor reaction, a common foundation of Turing patterns, which are fundamental in morphogenesis: the emergence of form from simple interactions. Turing patterns describe how diffusion and interactions between two chemical substances, an activator and an inhibitor, can lead to complex patterns and structures, such as the formation of limbs and feathers. Our study extends this concept by modeling cellular plasticity as an activator-inhibitor reaction augmented with environmental stimuli, encapsulating the core phenomena observed across various cell types: stem cells, neurons, and muscle cells. In addition to demonstrating self-regulation and self-containment, this approach ensures that a robot's form and function are direct emergent responses to its environment without requiring a comprehensive environmental model. In the proposed model, a factory acts as the activator, producing a product that serves as the inhibitor, which is in turn influenced by environmental stimuli through consumption. These components are regulated by cellular plasticity phenomena acting as feedback loops. We calculate the equilibrium points of the model and the stability criterion. Simulations examine how varying parameters affect the system's transient behavior and how competing functions affect its functional capacity. Results show that the model converges to a single stable equilibrium tuned to the environmental stimulation. Such dynamic behavior underscores the model's utility for generating predictable responses within robotics and biological systems, showcasing its potential for navigating the complexities of adaptive systems.
Submitted 10 August, 2024;
originally announced August 2024.
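To make the factory/product feedback described in the cellular plasticity abstract above concrete, here is a toy two-variable activator-inhibitor system of our own choosing: the factory $A$ is repressed by its product $P$, and the environmental stimulus $s$ consumes the product. The functional forms and parameter values are hypothetical, not the paper's equations, and the sketch omits diffusion, so it models only local kinetics. It shows how an equilibrium can be computed in closed form and how a simulation converges to it.

```python
import math


def simulate(alpha=2.0, beta=1.0, d_a=1.0, d_p=0.5, stimulus=0.5,
             a0=0.1, p0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of a hypothetical toy system:

    dA/dt = alpha / (1 + P) - d_a * A            (factory, repressed by its product)
    dP/dt = beta * A - (d_p + stimulus) * P      (product, consumed by the stimulus)
    """
    a, p = a0, p0
    for _ in range(steps):
        da = alpha / (1.0 + p) - d_a * a
        dp = beta * a - (d_p + stimulus) * p
        a, p = a + dt * da, p + dt * dp
    return a, p


def equilibrium(alpha=2.0, beta=1.0, d_a=1.0, d_p=0.5, stimulus=0.5):
    """Unique positive equilibrium, from P*(1 + P*) = alpha*beta / (d_a*(d_p + stimulus))."""
    c = alpha * beta / (d_a * (d_p + stimulus))
    p = (-1.0 + math.sqrt(1.0 + 4.0 * c)) / 2.0
    a = alpha / (d_a * (1.0 + p))
    return a, p
```

For this toy system, the Jacobian at the equilibrium has trace $-(d_a + d_p + s) < 0$ and determinant $d_a(d_p + s) + \alpha\beta/(1+P^*)^2 > 0$, so the single equilibrium is stable for any positive parameters, and a stronger stimulus lowers the steady-state product level.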
-
pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy
Authors:
Kartheik G. Iyer,
Mikaeel Yunus,
Charles O'Neill,
Christine Ye,
Alina Hyk,
Kiera McCormick,
Ioana Ciuca,
John F. Wu,
Alberto Accomazzi,
Simone Astarita,
Rishabh Chakrabarty,
Jesse Cranney,
Anjalie Field,
Tirthankar Ghosal,
Michele Ginolfi,
Marc Huertas-Company,
Maja Jablonska,
Sandor Kruk,
Huiling Liu,
Gabriel Marchidan,
Rohit Mistry,
J. P. Naiman,
J. E. G. Peek,
Mugdha Polimera,
Sergio J. Rodriguez
, et al. (5 additional authors not shown)
Abstract:
The exponential growth of astronomical literature poses significant challenges for researchers navigating and synthesizing general insights or even domain-specific knowledge. We present Pathfinder, a machine learning framework designed to enable literature review and knowledge discovery in astronomy, focusing on semantic searching with natural language instead of syntactic searches with keywords. Utilizing state-of-the-art large language models (LLMs) and a corpus of 350,000 peer-reviewed papers from the Astrophysics Data System (ADS), Pathfinder offers an innovative approach to scientific inquiry and literature exploration. Our framework couples advanced retrieval techniques with LLM-based synthesis to search astronomical literature by semantic context as a complement to currently existing methods that use keywords or citation graphs. It addresses complexities of jargon, named entities, and temporal aspects through time-based and citation-based weighting schemes. We demonstrate the tool's versatility through case studies, showcasing its application in various research scenarios. The system's performance is evaluated using custom benchmarks, including single-paper and multi-paper tasks. Beyond literature review, Pathfinder offers unique capabilities for reformatting answers in ways that are accessible to various audiences (e.g. in a different language or as simplified text), visualizing research landscapes, and tracking the impact of observatories and methodologies. This tool represents a significant advancement in applying AI to astronomical research, aiding researchers at all career stages in navigating modern astronomy literature.
Submitted 2 August, 2024;
originally announced August 2024.
-
Towards Scalable and Stable Parallelization of Nonlinear RNNs
Authors:
Xavier Gonzalez,
Andrew Warrington,
Jimmy T. H. Smith,
Scott W. Linderman
Abstract:
Transformers and linear state space models can be evaluated in parallel on modern hardware, but evaluating nonlinear RNNs appears to be an inherently sequential problem. Recently, however, Lim et al. '24 developed an approach called DEER, which evaluates nonlinear RNNs in parallel by posing the states as the solution to a fixed-point problem. They derived a parallel form of Newton's method to solve the fixed-point problem and achieved significant speedups over sequential evaluation. However, the computational complexity of DEER is cubic in the state size, and the algorithm can suffer from numerical instability. We address these limitations with two novel contributions. To reduce the computational complexity, we apply quasi-Newton approximations and show they converge comparably to Newton, use less memory, and are faster. To stabilize DEER, we leverage a connection between the Levenberg-Marquardt algorithm and Kalman smoothing, which we call ELK. This connection allows us to stabilize Newton's method while using efficient parallelized Kalman smoothing algorithms to retain performance. Through several experiments, we show that these innovations allow for parallel evaluation of nonlinear RNNs at larger scales and with greater stability.
Submitted 15 January, 2025; v1 submitted 26 July, 2024;
originally announced July 2024.
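The DEER approach summarized above treats all hidden states of a nonlinear RNN as the unknowns of one fixed-point problem and solves it with Newton's method. Each Newton step linearizes every transition around the current guess, which yields a linear recurrence that can be solved by an associative (parallel) scan. The sketch below shows this for a hypothetical scalar tanh RNN (weights `W`, `U` are illustrative). The scan is written as a left-to-right fold for clarity, and the quasi-Newton and ELK variants proposed in the paper are not shown.

```python
import math

# Hypothetical scalar RNN: s_t = tanh(W * s_{t-1} + U * x_t).
W, U = 0.9, 0.5


def step(s_prev, x):
    return math.tanh(W * s_prev + U * x)


def step_grad(s_prev, x):
    """Derivative of step() with respect to s_prev (the scalar Jacobian)."""
    return W * (1.0 - math.tanh(W * s_prev + U * x) ** 2)


def sequential(xs, s0=0.0):
    """Reference: ordinary one-step-at-a-time evaluation."""
    out, s = [], s0
    for x in xs:
        s = step(s, x)
        out.append(s)
    return out


def compose(first, second):
    """Compose affine maps s -> A*s + B: apply `first`, then `second` (associative)."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def newton_fixed_point(xs, s0=0.0, iters=50, tol=1e-12):
    """Newton iterations on all T states at once, in the spirit of DEER."""
    guess = [0.0] * len(xs)
    for _ in range(iters):
        prev = [s0] + guess[:-1]
        # Linearize each step around the guess: s_t ~ J_t * s_{t-1} + (f(p) - J_t * p).
        maps = []
        for p, x in zip(prev, xs):
            j = step_grad(p, x)
            maps.append((j, step(p, x) - j * p))
        # Solve the linear recurrence via prefix compositions of the affine maps.
        # A parallel associative scan computes these prefixes in O(log T) depth.
        new, acc = [], None
        for m in maps:
            acc = m if acc is None else compose(acc, m)
            new.append(acc[0] * s0 + acc[1])
        delta = max(abs(a - b) for a, b in zip(new, guess))
        guess = new
        if delta < tol:
            break
    return guess
```

Because the first state is exact after one iteration, the first two after two, and so on, this Newton scheme reaches the sequential result in at most $T$ iterations, and typically far fewer thanks to quadratic convergence near the solution.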
-
Mean Teacher based SSL Framework for Indoor Localization Using Wi-Fi RSSI Fingerprinting
Authors:
Sihao Li,
Zhe Tang,
Kyeong Soo Kim,
Jeremy S. Smith
Abstract:
Wi-Fi fingerprinting is widely applied for indoor localization due to the widespread availability of Wi-Fi devices. However, traditional methods are not ideal for multi-building and multi-floor environments due to scalability issues. Therefore, more and more researchers have employed deep learning techniques to enable scalable indoor localization. This paper introduces a novel semi-supervised learning framework for neural networks based on wireless access point selection, noise injection, and the Mean Teacher model, which leverages unlabeled fingerprints to enhance localization performance. The proposed framework can manage hybrid in/outsourcing and voluntarily contributed databases and can continually expand the fingerprint database with newly submitted unlabeled fingerprints during service. The viability of the proposed framework was examined using two established deep learning models with the UJIIndoorLoc database. The experimental results suggest that the proposed framework significantly improves localization performance compared to the supervised learning-based approach in terms of floor-level coordinate estimation using the EvAAL metric. It shows improvements of up to 10.99% and 8.98% in the former scenario and 4.25% and 9.35% in the latter, respectively, and additional studies highlight the importance of the essential components of the proposed framework.
Submitted 18 July, 2024;
originally announced July 2024.
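The framework above builds on the Mean Teacher model. In its generic form, the teacher's weights are an exponential moving average (EMA) of the student's, and the student is additionally trained to match the teacher's predictions on noise-injected unlabeled inputs. A minimal sketch of those three ingredients follows; the flat list-of-floats weight representation, the decay `alpha`, and the noise level `sigma` are assumptions for illustration, not the paper's settings.

```python
import random


def ema_update(teacher, student, alpha=0.99):
    """Mean Teacher: teacher weights track an exponential moving average of the student's."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_out, teacher_out):
    """Mean squared difference between student and teacher predictions on unlabeled input."""
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(student_out)


def add_noise(rssi, sigma=2.0, rng=random):
    """Gaussian noise injection on an RSSI fingerprint in dBm (sigma is hypothetical)."""
    return [x + rng.gauss(0.0, sigma) for x in rssi]


# One EMA step with alpha = 0.9 moves the teacher 10% of the way toward the student.
teacher = ema_update([0.0, 0.0], [1.0, 2.0], alpha=0.9)   # approximately [0.1, 0.2]
```

In a training loop, the student would take a gradient step on the supervised loss plus `consistency_loss` computed on noisy unlabeled fingerprints, after which `ema_update` refreshes the teacher; the averaging makes the teacher's targets smoother than the student's own predictions.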
-
Hierarchical Stage-Wise Training of Linked Deep Neural Networks for Multi-Building and Multi-Floor Indoor Localization Based on Wi-Fi RSSI Fingerprinting
Authors:
Sihao Li,
Kyeong Soo Kim,
Zhe Tang,
Jeremy S. Smith
Abstract:
In this paper, we present a new solution to the problem of large-scale multi-building and multi-floor indoor localization based on linked neural networks, where each neural network is dedicated to a sub-problem and trained under a hierarchical stage-wise training framework. When the measured data from sensors have a hierarchical representation as in multi-building and multi-floor indoor localization, it is important to exploit the hierarchical nature in data processing to provide a scalable solution. In this regard, the hierarchical stage-wise training framework extends the original stage-wise training framework to the case of multiple linked networks by training a lower-hierarchy network based on the prior knowledge gained from the training of higher-hierarchy networks. The experimental results with the publicly-available UJIIndoorLoc multi-building and multi-floor Wi-Fi RSSI fingerprint database demonstrate that the linked neural networks trained under the proposed hierarchical stage-wise training framework can achieve a three-dimensional localization error of 8.19 m, which, to the best of the authors' knowledge, is the most accurate result ever obtained for neural network-based models trained and evaluated with the full datasets of the UJIIndoorLoc database, and that, when applied to a model based on hierarchical convolutional neural networks, the proposed training framework can also significantly reduce the three-dimensional localization error from 11.78 m to 8.71 m.
Submitted 18 July, 2024;
originally announced July 2024.
-
Towards a theory of learning dynamics in deep state space models
Authors:
Jakub Smékal,
Jimmy T. H. Smith,
Michael Kleinman,
Dan Biderman,
Scott W. Linderman
Abstract:
State space models (SSMs) have shown remarkable empirical performance on many long sequence modeling tasks, but a theoretical understanding of these models is still lacking. In this work, we study the learning dynamics of linear SSMs to understand how covariance structure in data, latent state size, and initialization affect the evolution of parameters throughout learning with gradient descent. We show that focusing on the learning dynamics in the frequency domain affords analytical solutions under mild assumptions, and we establish a link between one-dimensional SSMs and the dynamics of deep linear feed-forward networks. Finally, we analyze how latent state over-parameterization affects convergence time and describe future work in extending our results to the study of deep SSMs with nonlinear connections. This work is a step toward a theory of learning dynamics in deep state space models.
Submitted 9 July, 2024;
originally announced July 2024.
-
The diameter of a stochastic matrix: A new measure for sensitivity analysis in Bayesian networks
Authors:
Manuele Leonelli,
Jim Q. Smith,
Sophia K. Wright
Abstract:
Bayesian networks are one of the most widely used classes of probabilistic models for risk management and decision support because of their interpretability and flexibility in including heterogeneous pieces of information. In any applied modelling, it is critical to assess how robust the inferences on certain target variables are to changes in the model. In Bayesian networks, these analyses fall under the umbrella of sensitivity analysis, which is most commonly carried out by quantifying dissimilarities using Kullback-Leibler information measures. In this paper, we argue that robustness methods based instead on the familiar total variation distance provide simple and more valuable bounds on robustness to misspecification, which are both formally justifiable and transparent. We introduce a novel measure of dependence in conditional probability tables called the diameter to derive such bounds. This measure quantifies the strength of dependence between a variable and its parents. We demonstrate how such formal robustness considerations can be embedded in building a Bayesian network.
Submitted 5 July, 2024;
originally announced July 2024.
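The diameter introduced in the abstract above measures how strongly a variable depends on its parents. A minimal sketch, assuming the diameter of a conditional probability table (one row per parent configuration) is the largest total variation distance between any two of its rows: a diameter of 0 means every parent configuration induces the same distribution, while a diameter of 1 means some pair of configurations leads to disjoint distributions.

```python
from itertools import combinations


def total_variation(p, q):
    """Total variation distance between two discrete distributions on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def diameter(cpt):
    """Largest total variation distance between any two rows of a stochastic matrix."""
    return max((total_variation(p, q) for p, q in combinations(cpt, 2)), default=0.0)


# Rows: P(child | parent = 0), P(child | parent = 1), P(child | parent = 2).
cpt = [[0.9, 0.1],
       [0.6, 0.4],
       [0.2, 0.8]]
d = diameter(cpt)   # about 0.7, attained by the first and last rows
```

Because total variation distance is bounded by 1 and has a direct probabilistic reading (the largest possible difference in the probability of any event), bounds stated in terms of it are easy to interpret, which is the transparency argument made in the abstract.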
-
Use of a Multiscale Vision Transformer to predict Nursing Activities Score from Low Resolution Thermal Videos in an Intensive Care Unit
Authors:
Isaac YL Lee,
Thanh Nguyen-Duc,
Ryo Ueno,
Jesse Smith,
Peter Y Chan
Abstract:
Excessive caregiver workload in hospital nurses has been implicated in poorer patient care and increased worker burnout. Measurement of this workload in the Intensive Care Unit (ICU) is often done using the Nursing Activities Score (NAS), but this is usually recorded manually and sporadically. Previous work has made use of Ambient Intelligence (AmI) by using computer vision to passively derive caregiver-patient interaction times to monitor staff workload. In this letter, we propose using a Multiscale Vision Transformer (MViT) to passively predict the NAS from low-resolution thermal videos recorded in an ICU. A total of 458 videos were obtained from an ICU in Melbourne, Australia, and used to train an MViTv2 model using an indirect prediction method and a direct prediction method. The indirect method predicted 1 of 8 potentially identifiable NAS activities from the video before inferring the NAS. The direct method predicted the NAS directly from the video. The indirect method yielded an average 5-fold accuracy of 57.21%, an area under the receiver operating characteristic curve (ROC AUC) of 0.865, an F1 score of 0.570, and a mean squared error (MSE) of 28.16. The direct method yielded an MSE of 18.16. We also show that MViTv2 outperforms similar models such as R(2+1)D and ResNet50-LSTM under identical settings.
This study shows the feasibility of using an MViTv2 to passively predict the NAS in an ICU and monitor staff workload automatically. Our results also show improved accuracy (lower MSE) when predicting the NAS directly rather than indirectly. We hope that our study can provide a direction for future work and help further improve the accuracy of passive NAS monitoring.
Submitted 30 May, 2024;
originally announced June 2024.
-
AstroPT: Scaling Large Observation Models for Astronomy
Authors:
Michael J. Smith,
Ryan J. Roberts,
Eirini Angeloudi,
Marc Huertas-Company
Abstract:
This work presents AstroPT, an autoregressive pretrained transformer developed with astronomical use cases in mind. The AstroPT models presented here have been pretrained on 8.6 million $512 \times 512$ pixel $grz$-band galaxy postage stamp observations from the DESI Legacy Survey DR8. We train a selection of foundation models of increasing size, from 1 million to 2.1 billion parameters, and find that AstroPT follows a saturating log-log scaling law similar to that of textual models. We also find that the models' performance on downstream tasks, as measured by linear probing, improves with model size up to the model parameter saturation point. We believe that collaborative community development paves the best route towards realising an open source 'Large Observation Model', a model trained on data taken from the observational sciences at the scale seen in natural language processing. To this end, we release the source code, weights, and dataset for AstroPT under the MIT license, and invite potential collaborators to join us in collectively building and researching these models.
Submitted 23 May, 2024;
originally announced May 2024.
-
Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension
Authors:
Runwei Guan,
Ruixiao Zhang,
Ningwei Ouyang,
Jianan Liu,
Ka Lok Man,
Xiaohao Cai,
Ming Xu,
Jeremy Smith,
Eng Gee Lim,
Yutao Yue,
Hui Xiong
Abstract:
Embodied perception is essential for intelligent vehicles and robots in interactive environmental understanding. However, these advancements primarily focus on vision, with limited attention given to using 3D modeling sensors, restricting a comprehensive understanding of objects in response to prompts containing qualitative and quantitative queries. Recently, as a promising automotive sensor with affordable cost, 4D millimeter-wave radars provide denser point clouds than conventional radars and perceive both semantic and physical characteristics of objects, thereby enhancing the reliability of perception systems. To foster the development of natural language-driven context understanding in radar scenes for 3D visual grounding, we construct the first dataset, Talk2Radar, which bridges these two modalities for 3D Referring Expression Comprehension (REC). Talk2Radar contains 8,682 referring prompt samples with 20,558 referred objects. Moreover, we propose a novel model, T-RadarNet, for 3D REC on point clouds, achieving State-Of-The-Art (SOTA) performance on the Talk2Radar dataset compared to counterparts. Deformable-FPN and Gated Graph Fusion are meticulously designed for efficient point cloud feature modeling and cross-modal fusion between radar and text features, respectively. Comprehensive experiments provide deep insights into radar-based 3D REC. We release our project at https://github.com/GuanRunwei/Talk2Radar.
Submitted 9 February, 2025; v1 submitted 21 May, 2024;
originally announced May 2024.
-
State-Free Inference of State-Space Models: The Transfer Function Approach
Authors:
Rom N. Parnichkun,
Stefano Massaroli,
Alessandro Moro,
Jimmy T. H. Smith,
Ramin Hasani,
Mathias Lechner,
Qi An,
Christopher Ré,
Hajime Asama,
Stefano Ermon,
Taiji Suzuki,
Atsushi Yamashita,
Michael Poli
Abstract:
We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence-parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost as the state size increases. We achieve this using properties of the proposed frequency-domain transfer function parametrization, which enables direct computation of its corresponding convolutional kernel's spectrum via a single Fast Fourier Transform. Our experimental results across multiple sequence lengths and state sizes illustrate, on average, a 35% training speed improvement over S4 layers (parametrized in the time domain) on the Long Range Arena benchmark, while delivering state-of-the-art downstream performance over other attention-free approaches. Moreover, we report improved perplexity in language modeling over a long convolutional Hyena baseline, obtained simply by introducing our transfer function parametrization. Our code is available at https://github.com/ruke1ire/RTF.
Submitted 1 June, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
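The state-free abstract above rests on one computational idea: if the layer is parametrized directly as a rational transfer function, the convolution kernel's spectrum can be obtained by FFT instead of by unrolling a state recurrence. The following is a minimal sketch of that idea under stated assumptions, not the authors' RTF code. It assumes H(z) = b(z^-1) / a(z^-1) with coefficient lists `b` and `a` (a[0] = 1), evaluates numerator and denominator at the L-th roots of unity with one zero-padded FFT each, and applies the result as an FFT convolution. The function names (`rtf_kernel_spectrum`, `rtf_causal_conv`, `recurrence`) and the coefficient values are hypothetical, and the small radix-2 FFT is written out only to keep the example dependency-free.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(x: Sequence[complex]) -> List[complex]:
    """Inverse FFT via the identity ifft(x) = conj(fft(conj(x))) / n."""
    n = len(x)
    return [v.conjugate() / n for v in fft([complex(v).conjugate() for v in x])]


def rtf_kernel_spectrum(b: Sequence[float], a: Sequence[float], L: int) -> List[complex]:
    """Evaluate H(z) = b(z^-1) / a(z^-1) at z_k = exp(2*pi*i*k/L), k = 0..L-1.

    b = [b0, ..., bn] and a = [1, a1, ..., an] are polynomial coefficients in z^-1.
    Zero-padding to length L and taking an FFT evaluates each polynomial at every
    z_k at once, so the cost is O(L log L) for any order n < L; no state vector
    is ever materialized.
    """
    if max(len(b), len(a)) > L:
        raise ValueError("L must exceed the transfer-function order")
    B = fft(list(b) + [0.0] * (L - len(b)))
    A = fft(list(a) + [0.0] * (L - len(a)))
    return [bk / ak for bk, ak in zip(B, A)]


def rtf_causal_conv(b: Sequence[float], a: Sequence[float],
                    u: Sequence[float], L: int) -> List[float]:
    """Filter u (len(u) <= L // 2) by multiplying spectra and inverting.

    The spectrum above is the DFT of the impulse response wrapped modulo L, so
    the output matches the exact recurrence only if the response has decayed
    within L // 2 steps (assumes a stable denominator).
    """
    if 2 * len(u) > L:
        raise ValueError("L must be at least 2 * len(u) to avoid wrap-around")
    H = rtf_kernel_spectrum(b, a, L)
    U = fft(list(u) + [0.0] * (L - len(u)))
    y = ifft([hk * uk for hk, uk in zip(H, U)])
    return [v.real for v in y[: len(u)]]


def recurrence(b: Sequence[float], a: Sequence[float], u: Sequence[float]) -> List[float]:
    """Reference: the difference equation sum_i a_i y[t-i] = sum_i b_i u[t-i]."""
    y: List[float] = []
    for t in range(len(u)):
        acc = sum(b[i] * u[t - i] for i in range(len(b)) if t >= i)
        acc -= sum(a[i] * y[t - i] for i in range(1, len(a)) if t >= i)
        y.append(acc / a[0])
    return y


b = [1.0, 0.3]         # hypothetical numerator coefficients
a = [1.0, -0.5, 0.06]  # hypothetical denominator: poles at 0.3 and 0.2 (stable)
u = [float((7 * t) % 5 - 2) for t in range(32)]
y_fft = rtf_causal_conv(b, a, u, L=64)
y_ref = recurrence(b, a, u)
print(max(abs(p - q) for p, q in zip(y_fft, y_ref)))  # close to 0 (rounding only)
```

Because the spectrum is sampled on the L-th roots of unity, it describes the impulse response wrapped modulo L rather than a truncated kernel. The sketch therefore zero-pads inputs to at most L // 2 and relies on a stable denominator whose response has decayed within that window, which is the regime in which the FFT path agrees with the direct recurrence.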