-
How Does LLM Reasoning Work for Code? A Survey and a Call to Action
Authors:
Ira Ceka,
Saurabh Pujar,
Irene Manotas,
Gail Kaiser,
Baishakhi Ray,
Shyam Ramji
Abstract:
The rise of large language models (LLMs) has led to dramatic improvements across a wide range of natural language tasks. These advancements have extended into the domain of code, facilitating complex tasks such as code generation, translation, summarization, and repair. However, their utility for real-world deployment in-the-wild has only recently been studied, particularly on software engineering…
▽ More
The rise of large language models (LLMs) has led to dramatic improvements across a wide range of natural language tasks. These advancements have extended into the domain of code, facilitating complex tasks such as code generation, translation, summarization, and repair. However, their utility for real-world deployment in-the-wild has only recently been studied, particularly on software engineering (SWE) tasks such as GitHub issue resolution. In this study, we examine the code reasoning techniques that underlie the ability to perform such tasks, and examine the paradigms used to drive their performance. Our contributions in this paper are: (1) the first dedicated survey on code reasoning for code tasks, highlighting overarching strategies, hybrid and agentic approaches; (2) a taxonomy of various techniques used to drive code reasoning; (3) a comprehensive overview of performance on common benchmarks and a showcase of new, under-explored benchmarks with high potential in SWE; (4) an exploration on how core properties of code can be used to explain different reasoning techniques; and (5) gaps and potentially under-explored areas for future research.
△ Less
Submitted 16 June, 2025;
originally announced June 2025.
-
Dr. GPT Will See You Now, but Should It? Exploring the Benefits and Harms of Large Language Models in Medical Diagnosis using Crowdsourced Clinical Cases
Authors:
Bonam Mingole,
Aditya Majumdar,
Firdaus Ahmed Choudhury,
Jennifer L. Kraschnewski,
Shyam S. Sundar,
Amulya Yadav
Abstract:
The proliferation of Large Language Models (LLMs) in high-stakes applications such as medical (self-)diagnosis and preliminary triage raises significant ethical and practical concerns about the effectiveness, appropriateness, and possible harmfulness of the use of these technologies for health-related concerns and queries. Some prior work has considered the effectiveness of LLMs in answering exper…
▽ More
The proliferation of Large Language Models (LLMs) in high-stakes applications such as medical (self-)diagnosis and preliminary triage raises significant ethical and practical concerns about the effectiveness, appropriateness, and possible harmfulness of the use of these technologies for health-related concerns and queries. Some prior work has considered the effectiveness of LLMs in answering expert-written health queries/prompts, questions from medical examination banks, or queries based on pre-existing clinical cases. Unfortunately, these existing studies completely ignore an in-the-wild evaluation of the effectiveness of LLMs in answering everyday health concerns and queries typically asked by general users, which corresponds to the more prevalent use case for LLMs. To address this research gap, this paper presents the findings from a university-level competition that leveraged a novel, crowdsourced approach for evaluating the effectiveness of LLMs in answering everyday health queries. Over the course of a week, a total of 34 participants prompted four publicly accessible LLMs with 212 real (or imagined) health concerns, and the LLM generated responses were evaluated by a team of nine board-certified physicians. At a high level, our findings indicate that on average, 76% of the 212 LLM responses were deemed to be accurate by physicians. Further, with the help of medical professionals, we investigated whether RAG versions of these LLMs (powered with a comprehensive medical knowledge base) can improve the quality of responses generated by LLMs. Finally, we also derive qualitative insights to explain our quantitative findings by conducting interviews with seven medical professionals who were shown all the prompts in our competition. This paper aims to provide a more grounded understanding of how LLMs perform in real-world everyday health communication.
△ Less
Submitted 13 June, 2025;
originally announced June 2025.
-
Understanding Software Engineering Agents Through the Lens of Traceability: An Empirical Study
Authors:
Ira Ceka,
Saurabh Pujar,
Shyam Ramji,
Luca Buratti,
Gail Kaiser,
Baishakhi Ray
Abstract:
With the advent of large language models (LLMs), software engineering agents (SWE agents) have emerged as a powerful paradigm for automating a range of software tasks -- from code generation and repair to test case synthesis. These agents operate autonomously by interpreting user input and responding to environmental feedback. While various agent architectures have demonstrated strong empirical pe…
▽ More
With the advent of large language models (LLMs), software engineering agents (SWE agents) have emerged as a powerful paradigm for automating a range of software tasks -- from code generation and repair to test case synthesis. These agents operate autonomously by interpreting user input and responding to environmental feedback. While various agent architectures have demonstrated strong empirical performance, the internal decision-making worfklows that drive their behavior remain poorly understood. Deeper insight into these workflows hold promise for improving both agent reliability and efficiency. In this work, we present the first systematic study of SWE agent behavior through the lens of execution traces. Our contributions are as follows: (1) we propose the first taxonomy of decision-making pathways across five representative agents; (2) using this taxonomy, we identify three core components essential to agent success -- bug localization, patch generation, and reproduction test generation -- and study each in depth; (3) we study the impact of test generation on successful patch production; and analyze strategies that can lead to successful test generation; (4) we further conduct the first large-scale code clone analysis comparing agent-generated and developer-written patches and provide a qualitative study revealing structural and stylistic differences in patch content. Together, these findings offer novel insights into agent design and open avenues for building agents that are both more effective and more aligned with human development practices.
△ Less
Submitted 9 June, 2025;
originally announced June 2025.
-
Can Quantized Audio Language Models Perform Zero-Shot Spoofing Detection?
Authors:
Bikash Dutta,
Rishabh Ranjan,
Shyam Sathvik,
Mayank Vatsa,
Richa Singh
Abstract:
Quantization is essential for deploying large audio language models (LALMs) efficiently in resource-constrained environments. However, its impact on complex tasks, such as zero-shot audio spoofing detection, remains underexplored. This study evaluates the zero-shot capabilities of five LALMs, GAMA, LTU-AS, MERaLiON, Qwen-Audio, and SALMONN, across three distinct datasets: ASVspoof2019, In-the-Wild…
▽ More
Quantization is essential for deploying large audio language models (LALMs) efficiently in resource-constrained environments. However, its impact on complex tasks, such as zero-shot audio spoofing detection, remains underexplored. This study evaluates the zero-shot capabilities of five LALMs, GAMA, LTU-AS, MERaLiON, Qwen-Audio, and SALMONN, across three distinct datasets: ASVspoof2019, In-the-Wild, and WaveFake, and investigates their robustness to quantization (FP32, FP16, INT8). Despite high initial spoof detection accuracy, our analysis demonstrates severe predictive biases toward spoof classification across all models, rendering their practical performance equivalent to random classification. Interestingly, quantization to FP16 precision resulted in negligible performance degradation compared to FP32, effectively halving memory and computational requirements without materially impacting accuracy. However, INT8 quantization intensified model biases, significantly degrading balanced accuracy. These findings highlight critical architectural limitations and emphasize FP16 quantization as an optimal trade-off, providing guidelines for practical deployment and future model refinement.
△ Less
Submitted 7 June, 2025;
originally announced June 2025.
-
LSM-2: Learning from Incomplete Wearable Sensor Data
Authors:
Maxwell A. Xu,
Girish Narayanswamy,
Kumar Ayush,
Dimitris Spathis,
Shun Liao,
Shyam A. Tailor,
Ahmed Metwally,
A. Ali Heydari,
Yuwei Zhang,
Jake Garrison,
Samy Abdel-Ghaffar,
Xuhai Xu,
Ken Gu,
Jacob Sunshine,
Ming-Zher Poh,
Yun Liu,
Tim Althoff,
Shrikanth Narayanan,
Pushmeet Kohli,
Mark Malhotra,
Shwetak Patel,
Yuzhe Yang,
James M. Rehg,
Xin Liu,
Daniel McDuff
Abstract:
Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-…
▽ More
Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM), a novel SSL approach that learns robust representations directly from incomplete data without requiring explicit imputation. AIM's core novelty lies in its use of learnable mask tokens to model both existing ("inherited") and artificially introduced missingness, enabling it to robustly handle fragmented real-world data during inference. Pre-trained on an extensive dataset of 40M hours of day-long multimodal sensor data, our LSM-2 with AIM achieves the best performance across a diverse range of tasks, including classification, regression and generative modeling. Furthermore, LSM-2 with AIM exhibits superior scaling performance, and critically, maintains high performance even under targeted missingness scenarios, reflecting clinically coherent patterns, such as the diagnostic value of nighttime biosignals for hypertension prediction. This makes AIM a more reliable choice for real-world wearable data applications.
△ Less
Submitted 5 June, 2025;
originally announced June 2025.
-
Enhancing Experimental Efficiency in Materials Design: A Comparative Study of Taguchi and Machine Learning Methods
Authors:
Shyam Prabhu,
P Akshay Kumar,
Antov Selwinston,
Pavan Taduvai,
Shreya Bairi,
Rohit Batra
Abstract:
Materials design problems often require optimizing multiple variables, rendering full factorial exploration impractical. Design of experiment (DOE) methods, such as Taguchi technique, are commonly used to efficiently sample the design space but they inherently lack the ability to capture non-linear dependency of process variables. In this work, we demonstrate how machine learning (ML) methods can…
▽ More
Materials design problems often require optimizing multiple variables, rendering full factorial exploration impractical. Design of experiment (DOE) methods, such as Taguchi technique, are commonly used to efficiently sample the design space but they inherently lack the ability to capture non-linear dependency of process variables. In this work, we demonstrate how machine learning (ML) methods can be used to overcome these limitations. We compare the performance of Taguchi method against an active learning based Gaussian process regression (GPR) model in a wire arc additive manufacturing (WAAM) process to accurately predict aspects of bead geometry, including penetration depth, bead width, and height. While Taguchi method utilized a three-factor, five-level L25 orthogonal array to suggest weld parameters, the GPR model used an uncertainty-based exploration acquisition function coupled with latin hypercube sampling for initial training data. Accuracy and efficiency of both models was evaluated on 15 test cases, with GPR outperforming Taguchi in both metrics. This work applies to broader materials processing domain requiring efficient exploration of complex parameters.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
Federated Foundation Model for GI Endoscopy Images
Authors:
Alina Devkota,
Annahita Amireskandari,
Joel Palko,
Shyam Thakkar,
Donald Adjeroh,
Xiajun Jiang,
Binod Bhattarai,
Prashnna K. Gyawali
Abstract:
Gastrointestinal (GI) endoscopy is essential in identifying GI tract abnormalities in order to detect diseases in their early stages and improve patient outcomes. Although deep learning has shown success in supporting GI diagnostics and decision-making, these models require curated datasets with labels that are expensive to acquire. Foundation models offer a promising solution by learning general-…
▽ More
Gastrointestinal (GI) endoscopy is essential in identifying GI tract abnormalities in order to detect diseases in their early stages and improve patient outcomes. Although deep learning has shown success in supporting GI diagnostics and decision-making, these models require curated datasets with labels that are expensive to acquire. Foundation models offer a promising solution by learning general-purpose representations, which can be finetuned for specific tasks, overcoming data scarcity. Developing foundation models for medical imaging holds significant potential, but the sensitive and protected nature of medical data presents unique challenges. Foundation model training typically requires extensive datasets, and while hospitals generate large volumes of data, privacy restrictions prevent direct data sharing, making foundation model training infeasible in most scenarios. In this work, we propose a FL framework for training foundation models for gastroendoscopy imaging, enabling data to remain within local hospital environments while contributing to a shared model. We explore several established FL algorithms, assessing their suitability for training foundation models without relying on task-specific labels, conducting experiments in both homogeneous and heterogeneous settings. We evaluate the trained foundation model on three critical downstream tasks--classification, detection, and segmentation--and demonstrate that it achieves improved performance across all tasks, highlighting the effectiveness of our approach in a federated, privacy-preserving setting.
△ Less
Submitted 5 June, 2025; v1 submitted 29 May, 2025;
originally announced May 2025.
-
Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors
Authors:
Nicolas Dupuis,
Ravi Nair,
Shyam Ramji,
Sean McClintock,
Nishant Chauhan,
Priyanka Nagpal,
Bart Blaner,
Ken Valk,
Leon Stok,
Ruchir Puri
Abstract:
The use of Large Language Models (LLMs) in hardware design has taken off in recent years, principally through its incorporation in tools that increase chip designer productivity. There has been considerable discussion about the use of LLMs in RTL specifications of chip designs, for which the two most popular languages are Verilog and VHDL. LLMs and their use in Verilog design has received signific…
▽ More
The use of Large Language Models (LLMs) in hardware design has taken off in recent years, principally through its incorporation in tools that increase chip designer productivity. There has been considerable discussion about the use of LLMs in RTL specifications of chip designs, for which the two most popular languages are Verilog and VHDL. LLMs and their use in Verilog design has received significant attention due to the higher popularity of the language, but little attention so far has been given to VHDL despite its continued popularity in the industry. There has also been little discussion about the unique needs of organizations that engage in high-performance processor design, and techniques to deploy AI solutions in these settings. In this paper, we describe our journey in developing a Large Language Model (LLM) specifically for the purpose of explaining VHDL code, a task that has particular importance in an organization with decades of experience and assets in high-performance processor design. We show how we developed test sets specific to our needs and used them for evaluating models as we performed extended pretraining (EPT) of a base LLM. Expert evaluation of the code explanations produced by the EPT model increased to 69% compared to a base model rating of 43%. We further show how we developed an LLM-as-a-judge to gauge models similar to expert evaluators. This led us to deriving and evaluating a host of new models, including an instruction-tuned version of the EPT model with an expected expert evaluator rating of 71%. Our experiments also indicate that with the potential use of newer base models, this rating can be pushed to 85% and beyond. We conclude with a discussion on further improving the quality of hardware design LLMs using exciting new developments in the Generative AI world.
△ Less
Submitted 14 May, 2025;
originally announced May 2025.
-
AI Assisted Cervical Cancer Screening for Cytology Samples in Developing Countries
Authors:
Love Panta,
Suraj Prasai,
Karishma Malla Vaidya,
Shyam Shrestha,
Suresh Manandhar
Abstract:
Cervical cancer remains a significant health challenge, with high incidence and mortality rates, particularly in transitioning countries. Conventional Liquid-Based Cytology(LBC) is a labor-intensive process, requires expert pathologists and is highly prone to errors, highlighting the need for more efficient screening methods. This paper introduces an innovative approach that integrates low-cost bi…
▽ More
Cervical cancer remains a significant health challenge, with high incidence and mortality rates, particularly in transitioning countries. Conventional Liquid-Based Cytology(LBC) is a labor-intensive process, requires expert pathologists and is highly prone to errors, highlighting the need for more efficient screening methods. This paper introduces an innovative approach that integrates low-cost biological microscopes with our simple and efficient AI algorithms for automated whole-slide analysis. Our system uses a motorized microscope to capture cytology images, which are then processed through an AI pipeline involving image stitching, cell segmentation, and classification. We utilize the lightweight UNet-based model involving human-in-the-loop approach to train our segmentation model with minimal ROIs. CvT-based classification model, trained on the SIPaKMeD dataset, accurately categorizes five cell types. Our framework offers enhanced accuracy and efficiency in cervical cancer screening compared to various state-of-art methods, as demonstrated by different evaluation metrics.
△ Less
Submitted 29 April, 2025;
originally announced April 2025.
-
Spatial Speech Translation: Translating Across Space With Binaural Hearables
Authors:
Tuochao Chen,
Qirui Wang,
Runlin He,
Shyam Gollakota
Abstract:
Imagine being in a crowded space where people speak a different language and having hearables that transform the auditory space into your native language, while preserving the spatial cues for all speakers. We introduce spatial speech translation, a novel concept for hearables that translate speakers in the wearer's environment, while maintaining the direction and unique voice characteristics of e…
▽ More
Imagine being in a crowded space where people speak a different language and having hearables that transform the auditory space into your native language, while preserving the spatial cues for all speakers. We introduce spatial speech translation, a novel concept for hearables that translate speakers in the wearer's environment, while maintaining the direction and unique voice characteristics of each speaker in the binaural output. To achieve this, we tackle several technical challenges spanning blind source separation, localization, real-time expressive translation, and binaural rendering to preserve the speaker directions in the translated audio, while achieving real-time inference on the Apple M2 silicon. Our proof-of-concept evaluation with a prototype binaural headset shows that, unlike existing models, which fail in the presence of interference, we achieve a BLEU score of up to 22.01 when translating between languages, despite strong interference from other speakers in the environment. User studies further confirm the system's effectiveness in spatially rendering the translated speech in previously unseen real-world reverberant environments. Taking a step back, this work marks the first step towards integrating spatial perception into speech translation.
△ Less
Submitted 25 April, 2025;
originally announced April 2025.
-
Enhancing Leaf Disease Classification Using GAT-GCN Hybrid Model
Authors:
Shyam Sundhar,
Riya Sharma,
Priyansh Maheshwari,
Suvidha Rupesh Kumar,
T. Sunil Kumar
Abstract:
Agriculture plays a critical role in the global economy, providing livelihoods and ensuring food security for billions. As innovative agricultural practices become more widespread, the risk of crop diseases has increased, highlighting the urgent need for efficient, low-intervention disease identification methods. This research presents a hybrid model combining Graph Attention Networks (GATs) and G…
▽ More
Agriculture plays a critical role in the global economy, providing livelihoods and ensuring food security for billions. As innovative agricultural practices become more widespread, the risk of crop diseases has increased, highlighting the urgent need for efficient, low-intervention disease identification methods. This research presents a hybrid model combining Graph Attention Networks (GATs) and Graph Convolution Networks (GCNs) for leaf disease classification. GCNs have been widely used for learning from graph-structured data, and GATs enhance this by incorporating attention mechanisms to focus on the most important neighbors. The methodology integrates superpixel segmentation for efficient feature extraction, partitioning images into meaningful, homogeneous regions that better capture localized features. The authors have employed an edge augmentation technique to enhance the robustness of the model. The edge augmentation technique has introduced a significant degree of generalization in the detection capabilities of the model. To further optimize training, weight initialization techniques are applied. The hybrid model is evaluated against the individual performance of the GCN and GAT models and the hybrid model achieved a precision of 0.9822, recall of 0.9818, and F1-score of 0.9818 in apple leaf disease classification, a precision of 0.9746, recall of 0.9744, and F1-score of 0.9743 in potato leaf disease classification, and a precision of 0.8801, recall of 0.8801, and F1-score of 0.8799 in sugarcane leaf disease classification. These results demonstrate the robustness and performance of the model, suggesting its potential to support sustainable agricultural practices through precise and effective disease detection. This work is a small step towards reducing the loss of crops and hence supporting sustainable goals of zero hunger and life on land.
△ Less
Submitted 7 April, 2025;
originally announced April 2025.
-
Dimensional optimization of single-DOF planar rigid link-flapping mechanisms for high lift and low power
Authors:
Shyam Sunder Nishad,
Anupam Saxena
Abstract:
Rigid link flapping mechanisms remain the most practical choice for flapping wing micro-aerial vehicles (MAVs) to carry useful payloads and onboard batteries for free flight due to their long-term durability and reliability. However, to achieve high agility and maneuverability-like insects-MAVs with these mechanisms require significant weight reduction. One approach involves using single-DOF plana…
▽ More
Rigid link flapping mechanisms remain the most practical choice for flapping wing micro-aerial vehicles (MAVs) to carry useful payloads and onboard batteries for free flight due to their long-term durability and reliability. However, to achieve high agility and maneuverability-like insects-MAVs with these mechanisms require significant weight reduction. One approach involves using single-DOF planar rigid linkages, which are rarely optimized dimensionally for high lift and low power so that smaller motors and batteries could be used. We integrated a mechanism simulator based on a quasistatic nonlinear finite element method with an unsteady vortex lattice method-based aerodynamic analysis tool within an optimization routine. We optimized three different mechanism topologies from the literature. As a result, significant power savings were observed up to 42% in some cases, due to increased amplitude and higher lift coefficients resulting from optimized asymmetric sweeping velocity profiles. We also conducted an uncertainty analysis that revealed the need for high manufacturing tolerances to ensure reliable mechanism performance. The presented unified computational tool also facilitates the optimal selection of MAV components based on the payload and flight time requirements.
△ Less
Submitted 27 March, 2025;
originally announced March 2025.
-
Towards Open Diversity-Aware Social Interactions
Authors:
Loizos Michael,
Ivano Bison,
Matteo Busso,
Luca Cernuzzi,
Amalia De Götzen,
Shyam Diwakar,
Kobi Gal,
Amarsanaa Ganbold,
George Gaskell,
Daniel Gatica-Perez,
Jessica Heesen,
Daniele Miorandi,
Salvador Ruiz-Correa,
Laura Schelenz,
Avi Segal,
Carles Sierra,
Hao Xu,
Fausto Giunchiglia
Abstract:
Social Media and the Internet have catalyzed an unprecedented potential for exposure to human diversity in terms of demographics, talents, opinions, knowledge, and the like. However, this potential has not come with new, much needed, instruments and skills to harness it. This paper presents our work on promoting richer and deeper social relations through the design and development of the "Internet…
▽ More
Social Media and the Internet have catalyzed an unprecedented potential for exposure to human diversity in terms of demographics, talents, opinions, knowledge, and the like. However, this potential has not come with new, much needed, instruments and skills to harness it. This paper presents our work on promoting richer and deeper social relations through the design and development of the "Internet of Us", an online platform that uses diversity-aware Artificial Intelligence to mediate and empower human social interactions. We discuss the multiple facets of diversity in social settings, the multidisciplinary work that is required to reap the benefits of diversity, and the vision for a diversity-aware hybrid human-AI society.
△ Less
Submitted 17 February, 2025;
originally announced March 2025.
-
Robust Multi-Objective Controlled Decoding of Large Language Models
Authors:
Seongho Son,
William Bankes,
Sangwoong Yoon,
Shyam Sundhar Ramesh,
Xiaohang Tang,
Ilija Bogunovic
Abstract:
Test-time alignment of Large Language Models (LLMs) to human preferences offers a flexible way to generate responses aligned to diverse objectives without extensive retraining of LLMs. Existing methods achieve alignment to multiple objectives simultaneously (e.g., instruction-following, helpfulness, conciseness) by optimizing their corresponding reward functions. However, they often rely on predef…
▽ More
Test-time alignment of Large Language Models (LLMs) to human preferences offers a flexible way to generate responses aligned to diverse objectives without extensive retraining of LLMs. Existing methods achieve alignment to multiple objectives simultaneously (e.g., instruction-following, helpfulness, conciseness) by optimizing their corresponding reward functions. However, they often rely on predefined weights or optimize for averages, sacrificing one objective for another and leading to unbalanced outcomes. To address this, we introduce Robust Multi-Objective Decoding (RMOD), a novel inference-time algorithm that optimizes for improving worst-case rewards. RMOD formalizes the robust decoding problem as a maximin two-player game between reward weights and the sampling policy, solving for the Nash equilibrium. We show that the game reduces to a convex optimization problem to find the worst-case weights, while the best response policy can be computed analytically. We also introduce a practical RMOD variant designed for efficient decoding with contemporary LLMs, incurring minimal computational overhead compared to non-robust Multi-Objective Decoding (MOD) methods. Our experimental results showcase the effectiveness of RMOD in generating responses equitably aligned with diverse objectives, outperforming baselines up to 20%.
△ Less
Submitted 11 March, 2025;
originally announced March 2025.
-
ICPR 2024 Competition on Rider Intention Prediction
Authors:
Shankar Gangisetty,
Abdul Wasi,
Shyam Nandan Rai,
C. V. Jawahar,
Sajay Raj,
Manish Prajapati,
Ayesha Choudhary,
Aaryadev Chandra,
Dev Chandan,
Shireen Chand,
Suvaditya Mukherjee
Abstract:
The recent surge in the vehicle market has led to an alarming increase in road accidents. This underscores the critical importance of enhancing road safety measures, particularly for vulnerable road users like motorcyclists. Hence, we introduce the rider intention prediction (RIP) competition that aims to address challenges in rider safety by proactively predicting maneuvers before they occur, the…
▽ More
The recent surge in the vehicle market has led to an alarming increase in road accidents. This underscores the critical importance of enhancing road safety measures, particularly for vulnerable road users like motorcyclists. Hence, we introduce the rider intention prediction (RIP) competition that aims to address challenges in rider safety by proactively predicting maneuvers before they occur, thereby strengthening rider safety. This capability enables the riders to react to the potential incorrect maneuvers flagged by advanced driver assistance systems (ADAS). We collect a new dataset, namely, rider action anticipation dataset (RAAD) for the competition consisting of two tasks: single-view RIP and multi-view RIP. The dataset incorporates a spectrum of traffic conditions and challenging navigational maneuvers on roads with varying lighting conditions. For the competition, we received seventy-five registrations and five team submissions for inference of which we compared the methods of the top three performing teams on both the RIP tasks: one state-space model (Mamba2) and two learning-based approaches (SVM and CNN-LSTM). The results indicate that the state-space model outperformed the other methods across the entire dataset, providing a balanced performance across maneuver classes. The SVM-based RIP method showed the second-best performance when using random sampling and SMOTE. However, the CNN-LSTM method underperformed, primarily due to class imbalance issues, particularly struggling with minority classes. This paper details the proposed RAAD dataset and provides a summary of the submissions for the RIP 2024 competition.
△ Less
Submitted 11 March, 2025;
originally announced March 2025.
-
Artificial intelligence for objective assessment of acrobatic movements: How to apply machine learning for identifying tumbling elements in cheer sports
Authors:
Sophia Wesely,
Ella Hofer,
Robin Curth,
Shyam Paryani,
Nicole Mills,
Olaf Ueberschär,
Julia Westermayr
Abstract:
Over the past four decades, cheerleading has evolved from a sideline activity at major sporting events into a professional, competitive sport with growing global popularity. Evaluating tumbling elements in cheerleading relies on both objective measures and subjective judgments, such as difficulty and execution quality. However, the complexity of tumbling - encompassing team synchronicity, ground i…
▽ More
Over the past four decades, cheerleading has evolved from a sideline activity at major sporting events into a professional, competitive sport with growing global popularity. Evaluating tumbling elements in cheerleading relies on both objective measures and subjective judgments, such as difficulty and execution quality. However, the complexity of tumbling - encompassing team synchronicity, ground interactions, choreography, and artistic expression - makes objective assessment challenging. Artificial intelligence (AI) has revolutionized various scientific fields and industries through precise data-driven analyses, yet their application in acrobatic sports remains limited despite significant potential for enhancing performance evaluation and coaching. This study investigates the feasibility of using an AI-based approach with data from a single inertial measurement unit to accurately identify and objectively assess tumbling elements in standard cheerleading routines. A sample of 16 participants (13 females, 3 males) from a Division I collegiate cheerleading team wore a single inertial measurement unit at the dorsal pelvis. Over a 4-week seasonal preparation period, 1102 tumbling elements were recorded during regular practice sessions. Using triaxial accelerations and rotational speeds, various ML algorithms were employed to classify and evaluate the execution of tumbling manoeuvres. Results indicate that certain machine learning models can effectively identify different tumbling elements despite inter-individual variability and data noise, achieving high accuracy. These findings demonstrate the significant potential for integrating AI-driven assessments into cheerleading and other acrobatic sports, providing objective metrics that complement traditional judging methods.
△ Less
Submitted 11 February, 2025;
originally announced March 2025.
-
A new lower bound for multi-color discrepancy with applications to fair division
Authors:
Ioannis Caragiannis,
Kasper Green Larsen,
Sudarshan Shyam
Abstract:
A classical problem in combinatorics seeks colorings of low discrepancy. More concretely, the goal is to color the elements of a set system so that the number of appearances of any color among the elements in each set is as balanced as possible. We present a new lower bound for multi-color discrepancy, showing that there is a set system with $n$ subsets over a set of elements in which any $k$-colo…
▽ More
A classical problem in combinatorics seeks colorings of low discrepancy. More concretely, the goal is to color the elements of a set system so that the number of appearances of any color among the elements in each set is as balanced as possible. We present a new lower bound for multi-color discrepancy, showing that there is a set system with $n$ subsets over a set of elements in which any $k$-coloring of the elements has discrepancy at least $Ω\left(\sqrt{\frac{n}{\ln{k}}}\right)$. This result improves the previously best-known lower bound of $Ω\left(\sqrt{\frac{n}{k}}\right)$ of Doerr and Srivastav [2003] and may have several applications. Here, we explore its implications on the feasibility of fair division concepts for instances with $n$ agents having valuations for a set of indivisible items. The first such concept is known as consensus $1/k$-division up to $d$ items (\cd$d$) and aims to allocate the items into $k$ bundles so that no matter which bundle each agent is assigned to, the allocation is envy-free up to $d$ items. The above lower bound implies that \cd$d$ can be infeasible for $d\in Ω\left(\sqrt{\frac{n}{\ln{k}}}\right)$. We furthermore extend our proof technique to show that there exist instances of the problem of allocating indivisible items to $k$ groups of $n$ agents in total so that envy-freeness and proportionality up to $d$ items are infeasible for $d\in Ω\left(\sqrt{\frac{n}{k\ln{k}}}\right)$ and $d\in Ω\left(\sqrt{\frac{n}{k^3\ln{k}}}\right)$, respectively. The lower bounds for fair division improve the currently best-known ones by Manurangsi and Suksompong [2022].
△ Less
Submitted 18 February, 2025; v1 submitted 14 February, 2025;
originally announced February 2025.
-
Large language models perpetuate bias in palliative care: development and analysis of the Palliative Care Adversarial Dataset (PCAD)
Authors:
Naomi Akhras,
Fares Antaki,
Fannie Mottet,
Olivia Nguyen,
Shyam Sawhney,
Sabrina Bajwah,
Joanna M Davies
Abstract:
Bias and inequity in palliative care disproportionately affect marginalised groups. Large language models (LLMs), such as GPT-4o, hold potential to enhance care but risk perpetuating biases present in their training data. This study aimed to systematically evaluate whether GPT-4o propagates biases in palliative care responses using adversarially designed datasets. In July 2024, GPT-4o was probed u…
▽ More
Bias and inequity in palliative care disproportionately affect marginalised groups. Large language models (LLMs), such as GPT-4o, hold potential to enhance care but risk perpetuating biases present in their training data. This study aimed to systematically evaluate whether GPT-4o propagates biases in palliative care responses using adversarially designed datasets. In July 2024, GPT-4o was probed using the Palliative Care Adversarial Dataset (PCAD), and responses were evaluated by three palliative care experts in Canada and the United Kingdom using validated bias rubrics. The PCAD comprised PCAD-Direct (100 adversarial questions) and PCAD-Counterfactual (84 paired scenarios). These datasets targeted four care dimensions (access to care, pain management, advance care planning, and place of death preferences) and three identity axes (ethnicity, age, and diagnosis). Bias was detected in a substantial proportion of responses. For adversarial questions, the pooled bias rate was 0.33 (95% confidence interval [CI]: 0.28, 0.38); "allows biased premise" was the most frequently identified source of bias (0.47; 95% CI: 0.39, 0.55), such as failing to challenge stereotypes. For counterfactual scenarios, the pooled bias rate was 0.26 (95% CI: 0.20, 0.31), with "potential for withholding" as the most frequently identified source of bias (0.25; 95% CI: 0.18, 0.34), such as withholding interventions based on identity. Bias rates were consistent across care dimensions and identity axes. GPT-4o perpetuates biases in palliative care, with implications for clinical decision-making and equity. The PCAD datasets provide novel tools to assess and address LLM bias in palliative care.
△ Less
Submitted 11 February, 2025;
originally announced February 2025.
-
LR0.FM: Low-Res Benchmark and Improving Robustness for Zero-Shot Classification in Foundation Models
Authors:
Priyank Pathak,
Shyam Marjit,
Shruti Vyas,
Yogesh S Rawat
Abstract:
Visual-language foundation Models (FMs) exhibit remarkable zero-shot generalization across diverse tasks, largely attributed to extensive pre-training on largescale datasets. However, their robustness on low-resolution/pixelated (LR) images, a common challenge in real-world scenarios, remains underexplored. We introduce LR0.FM, a comprehensive benchmark evaluating the impact of low resolution on t…
▽ More
Visual-language foundation Models (FMs) exhibit remarkable zero-shot generalization across diverse tasks, largely attributed to extensive pre-training on largescale datasets. However, their robustness on low-resolution/pixelated (LR) images, a common challenge in real-world scenarios, remains underexplored. We introduce LR0.FM, a comprehensive benchmark evaluating the impact of low resolution on the zero-shot classification performance of 10 FM(s) across 66 backbones and 15 datasets. We propose a novel metric, Weighted Aggregated Robustness, to address the limitations of existing metrics and better evaluate model performance across resolutions and datasets. Our key findings show that: (i) model size positively correlates with robustness to resolution degradation, (ii) pre-training dataset quality is more important than its size, and (iii) fine-tuned and higher resolution models are less robust against LR. Our analysis further reveals that the model makes semantically reasonable predictions at LR, and the lack of fine-grained details in input adversely impacts the model's initial layers more than the deeper layers. We use these insights and introduce a simple strategy, LR-TK0, to enhance the robustness of models without compromising their pre-trained weights. We demonstrate the effectiveness of LR-TK0 for robustness against low-resolution across several datasets and its generalization capability across backbones and other approaches. Code is available at https://github.com/shyammarjit/LR0.FM
△ Less
Submitted 18 May, 2025; v1 submitted 6 February, 2025;
originally announced February 2025.
-
DiversityOne: A Multi-Country Smartphone Sensor Dataset for Everyday Life Behavior Modeling
Authors:
Matteo Busso,
Andrea Bontempelli,
Leonardo Javier Malcotti,
Lakmal Meegahapola,
Peter Kun,
Shyam Diwakar,
Chaitanya Nutakki,
Marcelo Dario Rodas Britez,
Hao Xu,
Donglei Song,
Salvador Ruiz Correa,
Andrea-Rebeca Mendoza-Lara,
George Gaskell,
Sally Stares,
Miriam Bidoglia,
Amarsanaa Ganbold,
Altangerel Chagnaa,
Luca Cernuzzi,
Alethia Hume,
Ronald Chenu-Abente,
Roy Alia Asiku,
Ivan Kayongo,
Daniel Gatica-Perez,
Amalia de Götzen,
Ivano Bison
, et al. (1 additional authors not shown)
Abstract:
Understanding everyday life behavior of young adults through personal devices, e.g., smartphones and smartwatches, is key for various applications, from enhancing the user experience in mobile apps to enabling appropriate interventions in digital health apps. Towards this goal, previous studies have relied on datasets combining passive sensor data with human-provided annotations or self-reports. H…
▽ More
Understanding everyday life behavior of young adults through personal devices, e.g., smartphones and smartwatches, is key for various applications, from enhancing the user experience in mobile apps to enabling appropriate interventions in digital health apps. Towards this goal, previous studies have relied on datasets combining passive sensor data with human-provided annotations or self-reports. However, many existing datasets are limited in scope, often focusing on specific countries primarily in the Global North, involving a small number of participants, or using a limited range of pre-processed sensors. These limitations restrict the ability to capture cross-country variations of human behavior, including the possibility of studying model generalization, and robustness. To address this gap, we introduce DiversityOne, a dataset which spans eight countries (China, Denmark, India, Italy, Mexico, Mongolia, Paraguay, and the United Kingdom) and includes data from 782 college students over four weeks. DiversityOne contains data from 26 smartphone sensor modalities and 350K+ self-reports. As of today, it is one of the largest and most diverse publicly available datasets, while featuring extensive demographic and psychosocial survey data. DiversityOne opens the possibility of studying important research problems in ubiquitous computing, particularly in domain adaptation and generalization across countries, all research areas so far largely underexplored because of the lack of adequate datasets.
△ Less
Submitted 5 February, 2025;
originally announced February 2025.
-
On Almost Surely Safe Alignment of Large Language Models at Inference-Time
Authors:
Xiaotong Ji,
Shyam Sundhar Ramesh,
Matthieu Zimmer,
Ilija Bogunovic,
Jun Wang,
Haitham Bou Ammar
Abstract:
We introduce a novel inference-time alignment approach for LLMs that aims to generate safe responses almost surely, i.e., with probability approaching one. Our approach models the generation of safe responses as a constrained Markov Decision Process (MDP) within the LLM's latent space. We augment a safety state that tracks the evolution of safety constraints and dynamically penalize unsafe generat…
▽ More
We introduce a novel inference-time alignment approach for LLMs that aims to generate safe responses almost surely, i.e., with probability approaching one. Our approach models the generation of safe responses as a constrained Markov Decision Process (MDP) within the LLM's latent space. We augment a safety state that tracks the evolution of safety constraints and dynamically penalize unsafe generations to ensure the generation of safe responses. Consequently, we demonstrate formal safety guarantees w.r.t. the given cost model upon solving the MDP in the latent space with sufficiently large penalties. Building on this foundation, we propose InferenceGuard, a practical implementation that safely aligns LLMs without modifying the model weights. Empirically, we demonstrate that InferenceGuard effectively balances safety and task performance, outperforming existing inference-time alignment methods in generating safe and aligned responses. Our findings contribute to the advancement of safer LLM deployment through alignment at inference-time, thus presenting a promising alternative to resource-intensive, overfitting-prone alignment techniques like RLHF.
△ Less
Submitted 20 June, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
-
Optimizing Multitask Industrial Processes with Predictive Action Guidance
Authors:
Naval Kishore Mehta,
Arvind,
Shyam Sunder Prasad,
Sumeet Saurav,
Sanjay Singh
Abstract:
Monitoring complex assembly processes is critical for maintaining productivity and ensuring compliance with assembly standards. However, variability in human actions and subjective task preferences complicate accurate task anticipation and guidance. To address these challenges, we introduce the Multi-Modal Transformer Fusion and Recurrent Units (MMTFRU) Network for egocentric activity anticipation…
▽ More
Monitoring complex assembly processes is critical for maintaining productivity and ensuring compliance with assembly standards. However, variability in human actions and subjective task preferences complicate accurate task anticipation and guidance. To address these challenges, we introduce the Multi-Modal Transformer Fusion and Recurrent Units (MMTFRU) Network for egocentric activity anticipation, utilizing multimodal fusion to improve prediction accuracy. Integrated with the Operator Action Monitoring Unit (OAMU), the system provides proactive operator guidance, preventing deviations in the assembly process. OAMU employs two strategies: (1) Top-5 MMTF-RU predictions, combined with a reference graph and an action dictionary, for next-step recommendations; and (2) Top-1 MMTF-RU predictions, integrated with a reference graph, for detecting sequence deviations and predicting anomaly scores via an entropy-informed confidence mechanism. We also introduce Time-Weighted Sequence Accuracy (TWSA) to evaluate operator efficiency and ensure timely task completion. Our approach is validated on the industrial Meccano dataset and the largescale EPIC-Kitchens-55 dataset, demonstrating its effectiveness in dynamic environments.
△ Less
Submitted 9 January, 2025;
originally announced January 2025.
-
Human-like Bots for Tactical Shooters Using Compute-Efficient Sensors
Authors:
Niels Justesen,
Maria Kaselimi,
Sam Snodgrass,
Miruna Vozaru,
Matthew Schlegel,
Jonas Wingren,
Gabriella A. B. Barros,
Tobias Mahlmann,
Shyam Sudhakaran,
Wesley Kerr,
Albert Wang,
Christoffer Holmgård,
Georgios N. Yannakakis,
Sebastian Risi,
Julian Togelius
Abstract:
Artificial intelligence (AI) has enabled agents to master complex video games, from first-person shooters like Counter-Strike to real-time strategy games such as StarCraft II and racing games like Gran Turismo. While these achievements are notable, applying these AI methods in commercial video game production remains challenging due to computational constraints. In commercial scenarios, the majori…
▽ More
Artificial intelligence (AI) has enabled agents to master complex video games, from first-person shooters like Counter-Strike to real-time strategy games such as StarCraft II and racing games like Gran Turismo. While these achievements are notable, applying these AI methods in commercial video game production remains challenging due to computational constraints. In commercial scenarios, the majority of computational resources are allocated to 3D rendering, leaving limited capacity for AI methods, which often demand high computational power, particularly those relying on pixel-based sensors. Moreover, the gaming industry prioritizes creating human-like behavior in AI agents to enhance player experience, unlike academic models that focus on maximizing game performance. This paper introduces a novel methodology for training neural networks via imitation learning to play a complex, commercial-standard, VALORANT-like 2v2 tactical shooter game, requiring only modest CPU hardware during inference. Our approach leverages an innovative, pixel-free perception architecture using a small set of ray-cast sensors, which capture essential spatial information efficiently. These sensors allow AI to perform competently without the computational overhead of traditional methods. Models are trained to mimic human behavior using supervised learning on human trajectory data, resulting in realistic and engaging AI agents. Human evaluation tests confirm that our AI agents provide human-like gameplay experiences while operating efficiently under computational constraints. This offers a significant advancement in AI model development for tactical shooter games and possibly other genres.
△ Less
Submitted 30 December, 2024;
originally announced January 2025.
-
Adapting Large Language Models for Improving TCP Fairness over WiFi
Authors:
Shyam Kumar Shrestha,
Shiva Raj Pokhrel,
Jonathan Kua
Abstract:
The new transmission control protocol (TCP) relies on Deep Learning (DL) for prediction and optimization, but requires significant manual effort to design deep neural networks (DNNs) and struggles with generalization in dynamic environments. Inspired by the success of large language models (LLMs), this study proposes TCP-LLM, a novel framework leveraging LLMs for TCP applications. TCP-LLM utilizes…
▽ More
The new transmission control protocol (TCP) relies on Deep Learning (DL) for prediction and optimization, but requires significant manual effort to design deep neural networks (DNNs) and struggles with generalization in dynamic environments. Inspired by the success of large language models (LLMs), this study proposes TCP-LLM, a novel framework leveraging LLMs for TCP applications. TCP-LLM utilizes pre-trained knowledge to reduce engineering effort, enhance generalization, and deliver superior performance across diverse TCP tasks. Applied to reducing flow unfairness, adapting congestion control, and preventing starvation, TCP-LLM demonstrates significant improvements over TCP with minimal fine-tuning.
△ Less
Submitted 24 December, 2024;
originally announced December 2024.
-
Learn2Mix: Training Neural Networks Using Adaptive Data Integration
Authors:
Shyam Venkatasubramanian,
Vahid Tarokh
Abstract:
Accelerating model convergence within resource-constrained environments is critical to ensure fast and efficient neural network training. This work presents learn2mix, a novel training strategy that adaptively adjusts class proportions within batches, focusing on classes with higher error rates. Unlike classical training methods that use static class proportions, learn2mix continually adapts class…
▽ More
Accelerating model convergence within resource-constrained environments is critical to ensure fast and efficient neural network training. This work presents learn2mix, a novel training strategy that adaptively adjusts class proportions within batches, focusing on classes with higher error rates. Unlike classical training methods that use static class proportions, learn2mix continually adapts class proportions during training, leading to faster convergence. Empirical evaluations conducted on benchmark datasets show that neural networks trained with learn2mix converge faster than those trained with existing approaches, achieving improved results for classification, regression, and reconstruction tasks under limited training resources and with imbalanced classes. Our empirical findings are supported by theoretical analysis.
△ Less
Submitted 13 February, 2025; v1 submitted 20 December, 2024;
originally announced December 2024.
-
Near-Optimal Trace Reconstruction for Mildly Separated Strings
Authors:
Anders Aamand,
Allen Liu,
Shyam Narayanan
Abstract:
In the trace reconstruction problem our goal is to learn an unknown string $x\in \{0,1\}^n$ given independent traces of $x$. A trace is obtained by independently deleting each bit of $x$ with some probability $δ$ and concatenating the remaining bits. It is a major open question whether the trace reconstruction problem can be solved with a polynomial number of traces when the deletion probability…
▽ More
In the trace reconstruction problem our goal is to learn an unknown string $x\in \{0,1\}^n$ given independent traces of $x$. A trace is obtained by independently deleting each bit of $x$ with some probability $δ$ and concatenating the remaining bits. It is a major open question whether the trace reconstruction problem can be solved with a polynomial number of traces when the deletion probability $δ$ is constant. The best known upper bound and lower bounds are respectively $\exp(\tilde O(n^{1/5}))$ and $\tilde Ω(n^{3/2})$ both by Chase [Cha21b,Cha21a]. Our main result is that if the string $x$ is mildly separated, meaning that the number of zeros between any two ones in $x$ is at least polylog$n$, and if $δ$ is a sufficiently small constant, then the trace reconstruction problem can be solved with $O(n \log n)$ traces and in polynomial time.
△ Less
Submitted 27 November, 2024;
originally announced November 2024.
-
The Zamba2 Suite: Technical Report
Authors:
Paolo Glorioso,
Quentin Anthony,
Yury Tokpanov,
Anna Golubeva,
Vasudev Shyam,
James Whittington,
Jonathan Pilault,
Beren Millidge
Abstract:
In this technical report, we present the Zamba2 series -- a suite of 1.2B, 2.7B, and 7.4B parameter hybrid Mamba2-transformer models that achieve state of the art performance against the leading open-weights models of their class, while achieving substantial gains in inference latency, throughput, and memory efficiency. The Zamba2 series builds upon our initial work with Zamba1-7B, optimizing its…
▽ More
In this technical report, we present the Zamba2 series -- a suite of 1.2B, 2.7B, and 7.4B parameter hybrid Mamba2-transformer models that achieve state of the art performance against the leading open-weights models of their class, while achieving substantial gains in inference latency, throughput, and memory efficiency. The Zamba2 series builds upon our initial work with Zamba1-7B, optimizing its architecture, training and annealing datasets, and training for up to three trillion tokens. We provide open-source weights for all models of the Zamba2 series as well as instruction-tuned variants that are strongly competitive against comparable instruct-tuned models of their class. We additionally open-source the pretraining dataset, which we call Zyda-2, used to train the Zamba2 series of models. The models and datasets used in this work are openly available at https://huggingface.co/Zyphra
△ Less
Submitted 21 November, 2024;
originally announced November 2024.
-
Sample-Efficient Private Learning of Mixtures of Gaussians
Authors:
Hassan Ashtiani,
Mahbod Majid,
Shyam Narayanan
Abstract:
We study the problem of learning mixtures of Gaussians with approximate differential privacy. We prove that roughly $kd^2 + k^{1.5} d^{1.75} + k^2 d$ samples suffice to learn a mixture of $k$ arbitrary $d$-dimensional Gaussians up to low total variation distance, with differential privacy. Our work improves over the previous best result [AAL24b] (which required roughly $k^2 d^4$ samples) and is pr…
▽ More
We study the problem of learning mixtures of Gaussians with approximate differential privacy. We prove that roughly $kd^2 + k^{1.5} d^{1.75} + k^2 d$ samples suffice to learn a mixture of $k$ arbitrary $d$-dimensional Gaussians up to low total variation distance, with differential privacy. Our work improves over the previous best result [AAL24b] (which required roughly $k^2 d^4$ samples) and is provably optimal when $d$ is much larger than $k^2$. Moreover, we give the first optimal bound for privately learning mixtures of $k$ univariate (i.e., $1$-dimensional) Gaussians. Importantly, we show that the sample complexity for privately learning mixtures of univariate Gaussians is linear in the number of components $k$, whereas the previous best sample complexity [AAL21] was quadratic in $k$. Our algorithms utilize various techniques, including the inverse sensitivity mechanism [AD20b, AD20a, HKMN23], sample compression for distributions [ABDH+20], and methods for bounding volumes of sumsets.
△ Less
Submitted 4 November, 2024;
originally announced November 2024.
-
Statistical-Computational Trade-offs for Density Estimation
Authors:
Anders Aamand,
Alexandr Andoni,
Justin Y. Chen,
Piotr Indyk,
Shyam Narayanan,
Sandeep Silwal,
Haike Xu
Abstract:
We study the density estimation problem defined as follows: given $k$ distributions $p_1, \ldots, p_k$ over a discrete domain $[n]$, as well as a collection of samples chosen from a ``query'' distribution $q$ over $[n]$, output $p_i$ that is ``close'' to $q$. Recently~\cite{aamand2023data} gave the first and only known result that achieves sublinear bounds in {\em both} the sampling complexity and…
▽ More
We study the density estimation problem defined as follows: given $k$ distributions $p_1, \ldots, p_k$ over a discrete domain $[n]$, as well as a collection of samples chosen from a ``query'' distribution $q$ over $[n]$, output $p_i$ that is ``close'' to $q$. Recently~\cite{aamand2023data} gave the first and only known result that achieves sublinear bounds in {\em both} the sampling complexity and the query time while preserving polynomial data structure space. However, their improvement over linear samples and time is only by subpolynomial factors.
Our main result is a lower bound showing that, for a broad class of data structures, their bounds cannot be significantly improved. In particular, if an algorithm uses $O(n/\log^c k)$ samples for some constant $c>0$ and polynomial space, then the query time of the data structure must be at least $k^{1-O(1)/\log \log k}$, i.e., close to linear in the number of distributions $k$. This is a novel \emph{statistical-computational} trade-off for density estimation, demonstrating that any data structure must use close to a linear number of samples or take close to linear query time. The lower bound holds even in the realizable case where $q=p_i$ for some $i$, and when the distributions are flat (specifically, all distributions are uniform over half of the domain $[n]$). We also give a simple data structure for our lower bound instance with asymptotically matching upper bounds. Experiments show that the data structure is quite efficient in practice.
△ Less
Submitted 30 October, 2024;
originally announced October 2024.
-
Have LLMs Reopened the Pandora's Box of AI-Generated Fake News?
Authors:
Xinyu Wang,
Wenbo Zhang,
Sai Koneru,
Hangzhi Guo,
Bonam Mingole,
S. Shyam Sundar,
Sarah Rajtmajer,
Amulya Yadav
Abstract:
With the rise of AI-generated content spewed at scale from large language models (LLMs), genuine concerns about the spread of fake news have intensified. The perceived ability of LLMs to produce convincing fake news at scale poses new challenges for both human and automated fake news detection systems. To address this gap, this paper presents the findings from a university-level competition that a…
▽ More
With the rise of AI-generated content spewed at scale from large language models (LLMs), genuine concerns about the spread of fake news have intensified. The perceived ability of LLMs to produce convincing fake news at scale poses new challenges for both human and automated fake news detection systems. To address this gap, this paper presents the findings from a university-level competition that aimed to explore how LLMs can be used by humans to create fake news, and to assess the ability of human annotators and AI models to detect it. A total of 110 participants used LLMs to create 252 unique fake news stories, and 84 annotators participated in the detection tasks. Our findings indicate that LLMs are ~68% more effective at detecting real news than humans. However, for fake news detection, the performance of LLMs and humans remains comparable (~60% accuracy). Additionally, we examine the impact of visual elements (e.g., pictures) in news on the accuracy of detecting fake news stories. Finally, we also examine various strategies used by fake news creators to enhance the credibility of their AI-generated content. This work highlights the increasing complexity of detecting AI-generated fake news, particularly in collaborative human-AI settings.
△ Less
Submitted 29 March, 2025; v1 submitted 24 October, 2024;
originally announced October 2024.
-
CASCRNet: An Atrous Spatial Pyramid Pooling and Shared Channel Residual based Network for Capsule Endoscopy
Authors:
K V Srinanda,
M Manvith Prabhu,
Shyam Lal
Abstract:
This manuscript summarizes work on the Capsule Vision Challenge 2024 by MISAHUB. To address the multi-class disease classification task, which is challenging due to the complexity and imbalance in the Capsule Vision challenge dataset, this paper proposes CASCRNet (Capsule endoscopy-Aspp-SCR-Network), a parameter-efficient and novel model that uses Shared Channel Residual (SCR) blocks and Atrous Sp…
▽ More
This manuscript summarizes work on the Capsule Vision Challenge 2024 by MISAHUB. To address the multi-class disease classification task, which is challenging due to the complexity and imbalance in the Capsule Vision challenge dataset, this paper proposes CASCRNet (Capsule endoscopy-Aspp-SCR-Network), a parameter-efficient and novel model that uses Shared Channel Residual (SCR) blocks and Atrous Spatial Pyramid Pooling (ASPP) blocks. Further, the performance of the proposed model is compared with other well-known approaches. The experimental results yield that proposed model provides better disease classification results. The proposed model was successful in classifying diseases with an F1 Score of 78.5% and a Mean AUC of 98.3%, which is promising given its compact architecture.
△ Less
Submitted 27 November, 2024; v1 submitted 23 October, 2024;
originally announced October 2024.
-
Hey GPT, Can You be More Racist? Analysis from Crowdsourced Attempts to Elicit Biased Content from Generative AI
Authors:
Hangzhi Guo,
Pranav Narayanan Venkit,
Eunchae Jang,
Mukund Srinath,
Wenbo Zhang,
Bonam Mingole,
Vipul Gupta,
Kush R. Varshney,
S. Shyam Sundar,
Amulya Yadav
Abstract:
The widespread adoption of large language models (LLMs) and generative AI (GenAI) tools across diverse applications has amplified the importance of addressing societal biases inherent within these technologies. While the NLP community has extensively studied LLM bias, research investigating how non-expert users perceive and interact with biases from these systems remains limited. As these technolo…
▽ More
The widespread adoption of large language models (LLMs) and generative AI (GenAI) tools across diverse applications has amplified the importance of addressing societal biases inherent within these technologies. While the NLP community has extensively studied LLM bias, research investigating how non-expert users perceive and interact with biases from these systems remains limited. As these technologies become increasingly prevalent, understanding this question is crucial to inform model developers in their efforts to mitigate bias. To address this gap, this work presents the findings from a university-level competition, which challenged participants to design prompts for eliciting biased outputs from GenAI tools. We quantitatively and qualitatively analyze the competition submissions and identify a diverse set of biases in GenAI and strategies employed by participants to induce bias in GenAI. Our finding provides unique insights into how non-expert users perceive and interact with biases from GenAI tools.
△ Less
Submitted 20 October, 2024;
originally announced October 2024.
-
Instance-Optimality in I/O-Efficient Sampling and Sequential Estimation
Authors:
Shyam Narayanan,
Václav Rozhoň,
Jakub Tětek,
Mikkel Thorup
Abstract:
Suppose we have a memory storing $0$s and $1$s and we want to estimate the frequency of $1$s by sampling. We want to do this I/O-efficiently, exploiting that each read gives a block of $B$ bits at unit cost; not just one bit. If the input consists of uniform blocks: either all 1s or all 0s, then sampling a whole block at a time does not reduce the number of samples needed for estimation. On the ot…
▽ More
Suppose we have a memory storing $0$s and $1$s and we want to estimate the frequency of $1$s by sampling. We want to do this I/O-efficiently, exploiting that each read gives a block of $B$ bits at unit cost; not just one bit. If the input consists of uniform blocks: either all 1s or all 0s, then sampling a whole block at a time does not reduce the number of samples needed for estimation. On the other hand, if bits are randomly permuted, then getting a block of $B$ bits is as good as getting $B$ independent bit samples. However, we do not want to make any such assumptions on the input. Instead, our goal is to have an algorithm with instance-dependent performance guarantees which stops sampling blocks as soon as we know that we have a probabilistically reliable estimate. We prove our algorithms to be instance-optimal among algorithms oblivious to the order of the blocks, which we argue is the strongest form of instance optimality we can hope for. We also present similar results for I/O-efficiently estimating mean with both additive and multiplicative error, estimating histograms, quantiles, as well as the empirical cumulative distribution function.
We obtain our above results on I/O-efficient sampling by reducing to corresponding problems in the so-called sequential estimation. In this setting, one samples from an unknown distribution until one can provide an estimate with some desired error probability. We then provide non-parametric instance-optimal results for several fundamental problems: mean and quantile estimation, as well as learning mixture distributions with respect to $\ell_\infty$ and the so-called Kolmogorov-Smirnov distance.
△ Less
Submitted 18 October, 2024;
originally announced October 2024.
-
Scaling Wearable Foundation Models
Authors:
Girish Narayanswamy,
Xin Liu,
Kumar Ayush,
Yuzhe Yang,
Xuhai Xu,
Shun Liao,
Jake Garrison,
Shyam Tailor,
Jake Sunshine,
Yun Liu,
Tim Althoff,
Shrikanth Narayanan,
Pushmeet Kohli,
Jiening Zhan,
Mark Malhotra,
Shwetak Patel,
Samy Abdel-Ghaffar,
Daniel McDuff
Abstract:
Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful repre…
▽ More
Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful representations from vast amounts of text, image, video, or audio data, we investigate the scaling properties of sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. Our results establish the scaling laws of LSM for tasks such as imputation, interpolation and extrapolation, both across time and sensor modalities. Moreover, we highlight how LSM enables sample-efficient downstream learning for tasks like exercise and activity recognition.
△ Less
Submitted 17 October, 2024;
originally announced October 2024.
-
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
Authors:
Lichang Chen,
Hexiang Hu,
Mingda Zhang,
Yiwen Chen,
Zifeng Wang,
Yandong Li,
Pranav Shyam,
Tianyi Zhou,
Heng Huang,
Ming-Hsuan Yang,
Boqing Gong
Abstract:
We introduce OmnixR, an evaluation suite designed to benchmark SoTA Omni-modality Language Models, such as GPT-4o and Gemini. Evaluating OLMs, which integrate multiple modalities such as text, vision, and audio, presents unique challenges. Particularly, the user message might often consist of multiple modalities, such that OLMs have to establish holistic understanding and reasoning across modaliti…
▽ More
We introduce OmnixR, an evaluation suite designed to benchmark SoTA Omni-modality Language Models, such as GPT-4o and Gemini. Evaluating OLMs, which integrate multiple modalities such as text, vision, and audio, presents unique challenges. Particularly, the user message might often consist of multiple modalities, such that OLMs have to establish holistic understanding and reasoning across modalities to accomplish the task. Existing benchmarks are limited to single modality or dual-modality tasks, overlooking comprehensive multi-modal assessments of model reasoning. To address this, OmnixR offers two evaluation variants: (1)synthetic subset: a synthetic dataset generated automatically by translating text into multiple modalities--audio, images, video, and hybrids (Omnify). (2)realistic subset: a real-world dataset, manually curated and annotated by experts, for evaluating cross-modal reasoning in natural settings. OmnixR presents a unique evaluation towards assessing OLMs over a diverse mix of modalities, such as a question that involves video, audio, and text, providing a rigorous cross-modal reasoning testbed unlike any existing benchmarks. Our experiments find that all state-of-the-art OLMs struggle with OmnixR questions that require integrating information from multiple modalities to answer. Further analysis highlights differences in reasoning behavior, underscoring the challenges of omni-modal AI alignment.
△ Less
Submitted 16 October, 2024;
originally announced October 2024.
-
Micrometer: Micromechanics Transformer for Predicting Mechanical Responses of Heterogeneous Materials
Authors:
Sifan Wang,
Tong-Rui Liu,
Shyam Sankaran,
Paris Perdikaris
Abstract:
Heterogeneous materials, crucial in various engineering applications, exhibit complex multiscale behavior, which challenges the effectiveness of traditional computational methods. In this work, we introduce the Micromechanics Transformer ({\em Micrometer}), an artificial intelligence (AI) framework for predicting the mechanical response of heterogeneous materials, bridging the gap between advanced…
▽ More
Heterogeneous materials, crucial in various engineering applications, exhibit complex multiscale behavior, which challenges the effectiveness of traditional computational methods. In this work, we introduce the Micromechanics Transformer ({\em Micrometer}), an artificial intelligence (AI) framework for predicting the mechanical response of heterogeneous materials, bridging the gap between advanced data-driven methods and complex solid mechanics problems. Trained on a large-scale high-resolution dataset of 2D fiber-reinforced composites, Micrometer can achieve state-of-the-art performance in predicting microscale strain fields across a wide range of microstructures, material properties under any loading conditions and We demonstrate the accuracy and computational efficiency of Micrometer through applications in computational homogenization and multiscale modeling, where Micrometer achieves 1\% error in predicting macroscale stress fields while reducing computational time by up to two orders of magnitude compared to conventional numerical solvers. We further showcase the adaptability of the proposed model through transfer learning experiments on new materials with limited data, highlighting its potential to tackle diverse scenarios in mechanical analysis of solid materials. Our work represents a significant step towards AI-driven innovation in computational solid mechanics, addressing the limitations of traditional numerical methods and paving the way for more efficient simulations of heterogeneous materials across various industrial applications.
△ Less
Submitted 23 September, 2024;
originally announced October 2024.
-
Mitigating the Risk of Health Inequity Exacerbated by Large Language Models
Authors:
Yuelyu Ji,
Wenhe Ma,
Sonish Sivarajkumar,
Hang Zhang,
Eugene Mathew Sadhu,
Zhuochun Li,
Xizhi Wu,
Shyam Visweswaran,
Yanshan Wang
Abstract:
Recent advancements in large language models have demonstrated their potential in numerous medical applications, particularly in automating clinical trial matching for translational research and enhancing medical question answering for clinical decision support. However, our study shows that incorporating non decisive sociodemographic factors such as race, sex, income level, LGBT+ status, homeless…
▽ More
Recent advancements in large language models have demonstrated their potential in numerous medical applications, particularly in automating clinical trial matching for translational research and enhancing medical question answering for clinical decision support. However, our study shows that incorporating non decisive sociodemographic factors such as race, sex, income level, LGBT+ status, homelessness, illiteracy, disability, and unemployment into the input of LLMs can lead to incorrect and harmful outputs for these populations. These discrepancies risk exacerbating existing health disparities if LLMs are widely adopted in healthcare. To address this issue, we introduce EquityGuard, a novel framework designed to detect and mitigate the risk of health inequities in LLM based medical applications. Our evaluation demonstrates its efficacy in promoting equitable outcomes across diverse populations.
△ Less
Submitted 14 October, 2024; v1 submitted 7 October, 2024;
originally announced October 2024.
-
ZeroSCD: Zero-Shot Street Scene Change Detection
Authors:
Shyam Sundar Kannan,
Byung-Cheol Min
Abstract:
Scene Change Detection is a challenging task in computer vision and robotics that aims to identify differences between two images of the same scene captured at different times. Traditional change detection methods rely on training models that take these image pairs as input and estimate the changes, which requires large amounts of annotated data, a costly and time-consuming process. To overcome th…
▽ More
Scene Change Detection is a challenging task in computer vision and robotics that aims to identify differences between two images of the same scene captured at different times. Traditional change detection methods rely on training models that take these image pairs as input and estimate the changes, which requires large amounts of annotated data, a costly and time-consuming process. To overcome this, we propose ZeroSCD, a zero-shot scene change detection framework that eliminates the need for training. ZeroSCD leverages pre-existing models for place recognition and semantic segmentation, utilizing their features and outputs to perform change detection. In this framework, features extracted from the place recognition model are used to estimate correspondences and detect changes between the two images. These are then combined with segmentation results from the semantic segmentation model to precisely delineate the boundaries of the detected changes. Extensive experiments on benchmark datasets demonstrate that ZeroSCD outperforms several state-of-the-art methods in change detection accuracy, despite not being trained on any of the benchmark datasets, proving its effectiveness and adaptability across different scenarios.
△ Less
Submitted 23 September, 2024;
originally announced September 2024.
-
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
Authors:
Satyapriya Krishna,
Kalpesh Krishna,
Anhad Mohananey,
Steven Schwarcz,
Adam Stambler,
Shyam Upadhyay,
Manaal Faruqui
Abstract:
Large Language Models (LLMs) have demonstrated significant performance improvements across various cognitive tasks. An emerging application is using LLMs to enhance retrieval-augmented generation (RAG) capabilities. These systems require LLMs to understand user queries, retrieve relevant information, and synthesize coherent and accurate responses. Given the increasing real-world deployment of such…
▽ More
Large Language Models (LLMs) have demonstrated significant performance improvements across various cognitive tasks. An emerging application is using LLMs to enhance retrieval-augmented generation (RAG) capabilities. These systems require LLMs to understand user queries, retrieve relevant information, and synthesize coherent and accurate responses. Given the increasing real-world deployment of such systems, comprehensive evaluation becomes crucial. To this end, we propose FRAMES (Factuality, Retrieval, And reasoning MEasurement Set), a high-quality evaluation dataset designed to test LLMs' ability to provide factual responses, assess retrieval capabilities, and evaluate the reasoning required to generate final answers. While previous work has provided datasets and benchmarks to evaluate these abilities in isolation, FRAMES offers a unified framework that provides a clearer picture of LLM performance in end-to-end RAG scenarios. Our dataset comprises challenging multi-hop questions that require the integration of information from multiple sources. We present baseline results demonstrating that even state-of-the-art LLMs struggle with this task, achieving 0.40 accuracy with no retrieval. The accuracy is significantly improved with our proposed multi-step retrieval pipeline, achieving an accuracy of 0.66 (>50% improvement). We hope our work will help bridge evaluation gaps and assist in developing more robust and capable RAG systems.
△ Less
Submitted 24 January, 2025; v1 submitted 19 September, 2024;
originally announced September 2024.
-
Steinmetz Neural Networks for Complex-Valued Data
Authors:
Shyam Venkatasubramanian,
Ali Pezeshki,
Vahid Tarokh
Abstract:
We introduce a new approach to processing complex-valued data using DNNs consisting of parallel real-valued subnetworks with coupled outputs. Our proposed class of architectures, referred to as Steinmetz Neural Networks, incorporates multi-view learning to construct more interpretable representations in the latent space. Moreover, we present the Analytic Neural Network, which incorporates a consis…
▽ More
We introduce a new approach to processing complex-valued data using DNNs consisting of parallel real-valued subnetworks with coupled outputs. Our proposed class of architectures, referred to as Steinmetz Neural Networks, incorporates multi-view learning to construct more interpretable representations in the latent space. Moreover, we present the Analytic Neural Network, which incorporates a consistency penalty that encourages analytic signal representations in the latent space of the Steinmetz neural network. This penalty enforces a deterministic and orthogonal relationship between the real and imaginary components. Using an information-theoretic construction, we demonstrate that the generalization gap upper bound posited by the analytic neural network is lower than that of the general class of Steinmetz neural networks. Our numerical experiments depict the improved performance and robustness to additive noise, afforded by our proposed networks on benchmark datasets and synthetic examples.
△ Less
Submitted 13 February, 2025; v1 submitted 16 September, 2024;
originally announced September 2024.
-
A Non-Traditional Approach to Assisting Data Address Translation
Authors:
Shyam Murthy,
Gurindar S Sohi
Abstract:
This paper proposes a novel way to assist conventional data address translation. The approach, PC-Indexed Data Address Translation (PCAX), uses the PC of a load instruction, and not a data virtual address, to obtain the page table entry (PTE) for the data accessed by a load instruction. PCAX is intended to be used for a small subset of the static loads in a program. We observe that: (i) a small su…
▽ More
This paper proposes a novel way to assist conventional data address translation. The approach, PC-Indexed Data Address Translation (PCAX), uses the PC of a load instruction, and not a data virtual address, to obtain the page table entry (PTE) for the data accessed by a load instruction. PCAX is intended to be used for a small subset of the static loads in a program. We observe that: (i) a small subset of static loads is responsible for most of the misses in a data translation lookaside buffer (DTLB), and (ii) often a dynamic instance of a static load instruction accesses the same PTE as the last dynamic instance, and consider PCAX for this subset. With PCAX the effective miss rate of a conventional DTLB can be cut down by a factor of 2-3X in many cases, and even more in some cases. PCAX is also beneficial in reducing the number of secondary TLB (STLB) misses. Since the tables used for PCAX can be accessed alongside instruction fetch, they can be slow, yet frequently provide a valid PTE even before the data address calculation. This results in a performance improvement, and reduced data address translation energy, in most cases.
△ Less
Submitted 28 August, 2024;
originally announced August 2024.
-
Meta-Learning in Audio and Speech Processing: An End to End Comprehensive Review
Authors:
Athul Raimon,
Shubha Masti,
Shyam K Sateesh,
Siyani Vengatagiri,
Bhaskarjyoti Das
Abstract:
This survey overviews various meta-learning approaches used in audio and speech processing scenarios. Meta-learning is used where model performance needs to be maximized with minimum annotated samples, making it suitable for low-sample audio processing. Although the field has made some significant contributions, audio meta-learning still lacks the presence of comprehensive survey papers. We presen…
▽ More
This survey overviews various meta-learning approaches used in audio and speech processing scenarios. Meta-learning is used where model performance needs to be maximized with minimum annotated samples, making it suitable for low-sample audio processing. Although the field has made some significant contributions, audio meta-learning still lacks the presence of comprehensive survey papers. We present a systematic review of meta-learning methodologies in audio processing. This includes audio-specific discussions on data augmentation, feature extraction, preprocessing techniques, meta-learners, task selection strategies and also presents important datasets in audio, together with crucial real-world use cases. Through this extensive review, we aim to provide valuable insights and identify future research directions in the intersection of meta-learning and audio processing.
△ Less
Submitted 19 August, 2024;
originally announced August 2024.
-
Decoding Human Emotions: Analyzing Multi-Channel EEG Data using LSTM Networks
Authors:
Shyam K Sateesh,
Sparsh BK,
Uma D
Abstract:
Emotion recognition from electroencephalogram (EEG) signals is a thriving field, particularly in neuroscience and Human-Computer Interaction (HCI). This study aims to understand and improve the predictive accuracy of emotional state classification through metrics such as valence, arousal, dominance, and likeness by applying a Long Short-Term Memory (LSTM) network to analyze EEG signals. Using a po…
▽ More
Emotion recognition from electroencephalogram (EEG) signals is a thriving field, particularly in neuroscience and Human-Computer Interaction (HCI). This study aims to understand and improve the predictive accuracy of emotional state classification through metrics such as valence, arousal, dominance, and likeness by applying a Long Short-Term Memory (LSTM) network to analyze EEG signals. Using a popular dataset of multi-channel EEG recordings known as DEAP, we look towards leveraging LSTM networks' properties to handle temporal dependencies within EEG signal data. This allows for a more comprehensive understanding and classification of emotional parameter states. We obtain accuracies of 89.89%, 90.33%, 90.70%, and 90.54% for arousal, valence, dominance, and likeness, respectively, demonstrating significant improvements in emotion recognition model capabilities. This paper elucidates the methodology and architectural specifics of our LSTM model and provides a benchmark analysis with existing papers.
△ Less
Submitted 19 August, 2024;
originally announced August 2024.
-
LADDER: Language-Driven Slice Discovery and Error Rectification in Vision Classifiers
Authors:
Shantanu Ghosh,
Rayan Syed,
Chenyu Wang,
Vaibhav Choudhary,
Binxu Li,
Clare B. Poynton,
Shyam Visweswaran,
Kayhan Batmanghelich
Abstract:
Error slice discovery is crucial to diagnose and mitigate model errors. Current clustering or discrete attribute-based slice discovery methods face key limitations: 1) clustering results in incoherent slices, while assigning discrete attributes to slices leads to incomplete coverage of error patterns due to missing or insufficient attributes; 2) these methods lack complex reasoning, preventing the…
▽ More
Error slice discovery is crucial to diagnose and mitigate model errors. Current clustering or discrete attribute-based slice discovery methods face key limitations: 1) clustering results in incoherent slices, while assigning discrete attributes to slices leads to incomplete coverage of error patterns due to missing or insufficient attributes; 2) these methods lack complex reasoning, preventing them from fully explaining model biases; 3) they fail to integrate \textit{domain knowledge}, limiting their usage in specialized fields \eg radiology. We propose\ladder (\underline{La}nguage-\underline{D}riven \underline{D}iscovery and \underline{E}rror \underline{R}ectification), to address the limitations by: (1) leveraging the flexibility of natural language to address incompleteness, (2) employing LLM's latent \textit{domain knowledge} and advanced reasoning to analyze sentences and derive testable hypotheses directly, identifying biased attributes, and form coherent error slices without clustering. Existing mitigation methods typically address only the worst-performing group, often amplifying errors in other subgroups. In contrast,\ladder generates pseudo attributes from the discovered hypotheses to mitigate errors across all biases without explicit attribute annotations or prior knowledge of bias. Rigorous evaluations on 6 datasets spanning natural and medical images -- comparing 200+ classifiers with diverse architectures, pretraining strategies, and LLMs -- show that\ladder consistently outperforms existing baselines in discovering and mitigating biases.
△ Less
Submitted 29 May, 2025; v1 submitted 31 July, 2024;
originally announced August 2024.
-
Parameterized Verification of Timed Networks with Clock Invariants
Authors:
Étienne André,
Swen Jacobs,
Shyam Lal Karra,
Ocan Sankur
Abstract:
We consider parameterized verification problems for networks of timed automata (TAs) that communicate via disjunctive guards or lossy broadcast. To this end, we first consider disjunctive timed networks (DTNs), i.e., networks of TAs that communicate via location guards that enable a transition only if there is another process in a certain location. We solve for the first time the general case with…
▽ More
We consider parameterized verification problems for networks of timed automata (TAs) that communicate via disjunctive guards or lossy broadcast. To this end, we first consider disjunctive timed networks (DTNs), i.e., networks of TAs that communicate via location guards that enable a transition only if there is another process in a certain location. We solve for the first time the general case with clock invariants, and establish the decidability of the parameterized verification problem for local trace properties and for reachability of global configurations; Moreover, we prove that, surprisingly and unlike in other settings, this model is equivalent to lossy broadcast networks.
△ Less
Submitted 9 August, 2024;
originally announced August 2024.
-
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
Authors:
Vasudev Shyam,
Jonathan Pilault,
Emily Shepperd,
Quentin Anthony,
Beren Millidge
Abstract:
Our formulation reveals that the reduction across the sequence axis can be efficiently computed in parallel through a tree reduction. Our algorithm, called Tree Attention, for parallelizing exact attention computation across multiple GPUs enables cross-device decoding to be performed asymptotically faster (up to 8x faster in our experiments) than state-of-the-art approaches such as Ring Attention,…
▽ More
Our formulation reveals that the reduction across the sequence axis can be efficiently computed in parallel through a tree reduction. Our algorithm, called Tree Attention, for parallelizing exact attention computation across multiple GPUs enables cross-device decoding to be performed asymptotically faster (up to 8x faster in our experiments) than state-of-the-art approaches such as Ring Attention, while also requiring significantly less communication volume and incurring 2x less peak memory. We demonstrate that Tree Attention speeds up decoding up to 4x on Llama 3.1-8B and can be applied to a variety of hardware and networking setups such as H100 DGX nodes, AMD MI300x nodes, and PCIe connected NVIDIA RTX 4090s. Our code is publicly available here: https://github.com/Zyphra/tree_attention
△ Less
Submitted 9 February, 2025; v1 submitted 7 August, 2024;
originally announced August 2024.
-
Accelerating Full Waveform Inversion By Transfer Learning
Authors:
Divya Shyam Singh,
Leon Herrmann,
Qing Sun,
Tim Bürchner,
Felix Dietrich,
Stefan Kollmannsberger
Abstract:
Full waveform inversion (FWI) is a powerful tool for reconstructing material fields based on sparsely measured data obtained by wave propagation. For specific problems, discretizing the material field with a neural network (NN) improves the robustness and reconstruction quality of the corresponding optimization problem. We call this method NN-based FWI. Starting from an initial guess, the weights…
▽ More
Full waveform inversion (FWI) is a powerful tool for reconstructing material fields based on sparsely measured data obtained by wave propagation. For specific problems, discretizing the material field with a neural network (NN) improves the robustness and reconstruction quality of the corresponding optimization problem. We call this method NN-based FWI. Starting from an initial guess, the weights of the NN are iteratively updated to fit the simulated wave signals to the sparsely measured data set. For gradient-based optimization, a suitable choice of the initial guess, i.e., a suitable NN weight initialization, is crucial for fast and robust convergence.
In this paper, we introduce a novel transfer learning approach to further improve NN-based FWI. This approach leverages supervised pretraining to provide a better NN weight initialization, leading to faster convergence of the subsequent optimization problem. Moreover, the inversions yield physically more meaningful local minima. The network is pretrained to predict the unknown material field using the gradient information from the first iteration of conventional FWI. In our computational experiments on two-dimensional domains, the training data set consists of reference simulations with arbitrarily positioned elliptical voids of different shapes and orientations. We compare the performance of the proposed transfer learning NN-based FWI with three other methods: conventional FWI, NN-based FWI without pretraining and conventional FWI with an initial guess predicted from the pretrained NN. Our results show that transfer learning NN-based FWI outperforms the other methods in terms of convergence speed and reconstruction quality.
△ Less
Submitted 1 August, 2024;
originally announced August 2024.
-
MLtoGAI: Semantic Web based with Machine Learning for Enhanced Disease Prediction and Personalized Recommendations using Generative AI
Authors:
Shyam Dongre,
Ritesh Chandra,
Sonali Agarwal
Abstract:
In modern healthcare, addressing the complexities of accurate disease prediction and personalized recommendations is both crucial and challenging. This research introduces MLtoGAI, which integrates Semantic Web technology with Machine Learning (ML) to enhance disease prediction and offer user-friendly explanations through ChatGPT. The system comprises three key components: a reusable disease ontol…
▽ More
In modern healthcare, addressing the complexities of accurate disease prediction and personalized recommendations is both crucial and challenging. This research introduces MLtoGAI, which integrates Semantic Web technology with Machine Learning (ML) to enhance disease prediction and offer user-friendly explanations through ChatGPT. The system comprises three key components: a reusable disease ontology that incorporates detailed knowledge about various diseases, a diagnostic classification model that uses patient symptoms to detect specific diseases accurately, and the integration of Semantic Web Rule Language (SWRL) with ontology and ChatGPT to generate clear, personalized health advice. This approach significantly improves prediction accuracy and ensures results that are easy to understand, addressing the complexity of diseases and diverse symptoms. The MLtoGAI system demonstrates substantial advancements in accuracy and user satisfaction, contributing to developing more intelligent and accessible healthcare solutions. This innovative approach combines the strengths of ML algorithms with the ability to provide transparent, human-understandable explanations through ChatGPT, achieving significant improvements in prediction accuracy and user comprehension. By leveraging semantic technology and explainable AI, the system enhances the accuracy of disease prediction and ensures that the recommendations are relevant and easily understood by individual patients. Our research highlights the potential of integrating advanced technologies to overcome existing challenges in medical diagnostics, paving the way for future developments in intelligent healthcare systems. Additionally, the system is validated using 200 synthetic patient data records, ensuring robust performance and reliability.
△ Less
Submitted 26 July, 2024;
originally announced July 2024.
-
Mapping the individual, social, and biospheric impacts of Foundation Models
Authors:
Andrés Domínguez Hernández,
Shyam Krishna,
Antonella Maia Perini,
Michael Katell,
SJ Bennett,
Ann Borda,
Youmna Hashem,
Semeli Hadjiloizou,
Sabeehah Mahomed,
Smera Jayadeva,
Mhairi Aitken,
David Leslie
Abstract:
Responding to the rapid roll-out and large-scale commercialization of foundation models, large language models, and generative AI, an emerging body of work is shedding light on the myriad impacts these technologies are having across society. Such research is expansive, ranging from the production of discriminatory, fake and toxic outputs, and privacy and copyright violations, to the unjust extract…
▽ More
Responding to the rapid roll-out and large-scale commercialization of foundation models, large language models, and generative AI, an emerging body of work is shedding light on the myriad impacts these technologies are having across society. Such research is expansive, ranging from the production of discriminatory, fake and toxic outputs, and privacy and copyright violations, to the unjust extraction of labor and natural resources. The same has not been the case in some of the most prominent AI governance initiatives in the global north like the UK's AI Safety Summit and the G7's Hiroshima process, which have influenced much of the international dialogue around AI governance. Despite the wealth of cautionary tales and evidence of algorithmic harm, there has been an ongoing over-emphasis within the AI governance discourse on technical matters of safety and global catastrophic or existential risks. This narrowed focus has tended to draw attention away from very pressing social and ethical challenges posed by the current brute-force industrialization of AI applications. To address such a visibility gap between real-world consequences and speculative risks, this paper offers a critical framework to account for the social, political, and environmental dimensions of foundation models and generative AI. We identify 14 categories of risks and harms and map them according to their individual, social, and biospheric impacts. We argue that this novel typology offers an integrative perspective to address the most urgent negative impacts of foundation models and their downstream applications. We conclude with recommendations on how this typology could be used to inform technical and normative interventions to advance responsible AI.
△ Less
Submitted 24 July, 2024;
originally announced July 2024.
-
Making New Connections: LLMs as Puzzle Generators for The New York Times' Connections Word Game
Authors:
Tim Merino,
Sam Earle,
Ryan Sudhakaran,
Shyam Sudhakaran,
Julian Togelius
Abstract:
The Connections puzzle is a word association game published daily by The New York Times (NYT). In this game, players are asked to find groups of four words that are connected by a common theme. While solving a given Connections puzzle requires both semantic knowledge and abstract reasoning, generating novel puzzles additionally requires a form of metacognition: generators must be able to accuratel…
▽ More
The Connections puzzle is a word association game published daily by The New York Times (NYT). In this game, players are asked to find groups of four words that are connected by a common theme. While solving a given Connections puzzle requires both semantic knowledge and abstract reasoning, generating novel puzzles additionally requires a form of metacognition: generators must be able to accurately model the downstream reasoning of potential solvers. In this paper, we investigate the ability of the GPT family of Large Language Models (LLMs) to generate challenging and creative word games for human players. We start with an analysis of the word game Connections and the unique challenges it poses as a Procedural Content Generation (PCG) domain. We then propose a method for generating Connections puzzles using LLMs by adapting a Tree of Thoughts (ToT) prompting approach. We evaluate this method by conducting a user study, asking human players to compare AI-generated puzzles against published Connections puzzles. Our findings show that LLMs are capable puzzle creators, and can generate diverse sets of enjoyable, challenging, and creative Connections puzzles as judged by human users.
△ Less
Submitted 15 July, 2024;
originally announced July 2024.