-
Deblurring in the Wild: A Real-World Dataset from Smartphone High-Speed Videos
Authors:
Mahdi Mohd Hossain Noki,
Syed Mumtahin Mahmud,
Prothito Shovon Majumder,
Abdul Mohaimen Al Radi,
Md. Haider Ali,
Md. Mosaddek Khan
Abstract:
We introduce the largest real-world image deblurring dataset constructed from smartphone slow-motion videos. Using 240 frames captured over one second, we simulate realistic long-exposure blur by averaging frames to produce blurry images, while using the temporally centered frame as the sharp reference. Our dataset contains over 42,000 high-resolution blur-sharp image pairs, making it approximatel…
▽ More
We introduce the largest real-world image deblurring dataset constructed from smartphone slow-motion videos. Using 240 frames captured over one second, we simulate realistic long-exposure blur by averaging frames to produce blurry images, while using the temporally centered frame as the sharp reference. Our dataset contains over 42,000 high-resolution blur-sharp image pairs, making it approximately 10 times larger than widely used datasets, with 8 times the amount of different scenes, including indoor and outdoor environments, with varying object and camera motions. We benchmark multiple state-of-the-art (SOTA) deblurring models on our dataset and observe significant performance degradation, highlighting the complexity and diversity of our benchmark. Our dataset serves as a challenging new benchmark to facilitate robust and generalizable deblurring models.
△ Less
Submitted 30 June, 2025; v1 submitted 24 June, 2025;
originally announced June 2025.
-
Human-Aligned Faithfulness in Toxicity Explanations of LLMs
Authors:
Ramaravind K. Mothilal,
Joanna Roy,
Syed Ishtiaque Ahmed,
Shion Guha
Abstract:
The discourse around toxicity and LLMs in NLP largely revolves around detection tasks. This work shifts the focus to evaluating LLMs' reasoning about toxicity -- from their explanations that justify a stance -- to enhance their trustworthiness in downstream tasks. Despite extensive research on explainability, it is not straightforward to adopt existing methods to evaluate free-form toxicity explan…
▽ More
The discourse around toxicity and LLMs in NLP largely revolves around detection tasks. This work shifts the focus to evaluating LLMs' reasoning about toxicity -- from their explanations that justify a stance -- to enhance their trustworthiness in downstream tasks. Despite extensive research on explainability, it is not straightforward to adopt existing methods to evaluate free-form toxicity explanation due to their over-reliance on input text perturbations, among other challenges. To account for these, we propose a novel, theoretically-grounded multi-dimensional criterion, Human-Aligned Faithfulness (HAF), that measures the extent to which LLMs' free-form toxicity explanations align with those of a rational human under ideal conditions. We develop six metrics, based on uncertainty quantification, to comprehensively evaluate \haf of LLMs' toxicity explanations with no human involvement, and highlight how "non-ideal" the explanations are. We conduct several experiments on three Llama models (of size up to 70B) and an 8B Ministral model on five diverse toxicity datasets. Our results show that while LLMs generate plausible explanations to simple prompts, their reasoning about toxicity breaks down when prompted about the nuanced relations between the complete set of reasons, the individual reasons, and their toxicity stances, resulting in inconsistent and nonsensical responses. We open-source our code and LLM-generated explanations at https://github.com/uofthcdslab/HAF.
△ Less
Submitted 23 June, 2025;
originally announced June 2025.
-
From Tiny Machine Learning to Tiny Deep Learning: A Survey
Authors:
Shriyank Somvanshi,
Md Monzurul Islam,
Gaurab Chhetri,
Rohit Chakraborty,
Mahmuda Sultana Mimi,
Sawgat Ahmed Shuvo,
Kazi Sifatul Islam,
Syed Aaqib Javed,
Sharif Ahmed Rafat,
Anandi Dutta,
Subasish Das
Abstract:
The rapid growth of edge devices has driven the demand for deploying artificial intelligence (AI) at the edge, giving rise to Tiny Machine Learning (TinyML) and its evolving counterpart, Tiny Deep Learning (TinyDL). While TinyML initially focused on enabling simple inference tasks on microcontrollers, the emergence of TinyDL marks a paradigm shift toward deploying deep learning models on severely…
▽ More
The rapid growth of edge devices has driven the demand for deploying artificial intelligence (AI) at the edge, giving rise to Tiny Machine Learning (TinyML) and its evolving counterpart, Tiny Deep Learning (TinyDL). While TinyML initially focused on enabling simple inference tasks on microcontrollers, the emergence of TinyDL marks a paradigm shift toward deploying deep learning models on severely resource-constrained hardware. This survey presents a comprehensive overview of the transition from TinyML to TinyDL, encompassing architectural innovations, hardware platforms, model optimization techniques, and software toolchains. We analyze state-of-the-art methods in quantization, pruning, and neural architecture search (NAS), and examine hardware trends from MCUs to dedicated neural accelerators. Furthermore, we categorize software deployment frameworks, compilers, and AutoML tools enabling practical on-device learning. Applications across domains such as computer vision, audio recognition, healthcare, and industrial monitoring are reviewed to illustrate the real-world impact of TinyDL. Finally, we identify emerging directions including neuromorphic computing, federated TinyDL, edge-native foundation models, and domain-specific co-design approaches. This survey aims to serve as a foundational resource for researchers and practitioners, offering a holistic view of the ecosystem and laying the groundwork for future advancements in edge AI.
△ Less
Submitted 25 June, 2025; v1 submitted 21 June, 2025;
originally announced June 2025.
-
TranslationCorrect: A Unified Framework for Machine Translation Post-Editing with Predictive Error Assistance
Authors:
Syed Mekael Wasti,
Shou-Yi Hung,
Christopher Collins,
En-Shiun Annie Lee
Abstract:
Machine translation (MT) post-editing and research data collection often rely on inefficient, disconnected workflows. We introduce TranslationCorrect, an integrated framework designed to streamline these tasks. TranslationCorrect combines MT generation using models like NLLB, automated error prediction using models like XCOMET or LLM APIs (providing detailed reasoning), and an intuitive post-editi…
▽ More
Machine translation (MT) post-editing and research data collection often rely on inefficient, disconnected workflows. We introduce TranslationCorrect, an integrated framework designed to streamline these tasks. TranslationCorrect combines MT generation using models like NLLB, automated error prediction using models like XCOMET or LLM APIs (providing detailed reasoning), and an intuitive post-editing interface within a single environment. Built with human-computer interaction (HCI) principles in mind to minimize cognitive load, as confirmed by a user study. For translators, it enables them to correct errors and batch translate efficiently. For researchers, TranslationCorrect exports high-quality span-based annotations in the Error Span Annotation (ESA) format, using an error taxonomy inspired by Multidimensional Quality Metrics (MQM). These outputs are compatible with state-of-the-art error detection models and suitable for training MT or post-editing systems. Our user study confirms that TranslationCorrect significantly improves translation efficiency and user satisfaction over traditional annotation methods.
△ Less
Submitted 23 June, 2025;
originally announced June 2025.
-
A GenAI System for Improved FAIR Independent Biological Database Integration
Authors:
Syed N. Sakib,
Kallol Naha,
Sajratul Y. Rubaiat,
Hasan M. Jamil
Abstract:
Life sciences research increasingly requires identifying, accessing, and effectively processing data from an ever-evolving array of information sources on the Linked Open Data (LOD) network. This dynamic landscape places a significant burden on researchers, as the quality of query responses depends heavily on the selection and semantic integration of data sources --processes that are often labor-i…
▽ More
Life sciences research increasingly requires identifying, accessing, and effectively processing data from an ever-evolving array of information sources on the Linked Open Data (LOD) network. This dynamic landscape places a significant burden on researchers, as the quality of query responses depends heavily on the selection and semantic integration of data sources --processes that are often labor-intensive, error-prone, and costly. While the adoption of FAIR (Findable, Accessible, Interoperable, and Reusable) data principles has aimed to address these challenges, barriers to efficient and accurate scientific data processing persist.
In this paper, we introduce FAIRBridge, an experimental natural language-based query processing system designed to empower scientists to discover, access, and query biological databases, even when they are not FAIR-compliant. FAIRBridge harnesses the capabilities of AI to interpret query intents, map them to relevant databases described in scientific literature, and generate executable queries via intelligent resource access plans. The system also includes robust tools for mitigating low-quality query processing, ensuring high fidelity and responsiveness in the information delivered.
FAIRBridge's autonomous query processing framework enables users to explore alternative data sources, make informed choices at every step, and leverage community-driven crowd curation when needed. By providing a user-friendly, automated hypothesis-testing platform in natural English, FAIRBridge significantly enhances the integration and processing of scientific data, offering researchers a powerful new tool for advancing their inquiries.
△ Less
Submitted 22 June, 2025;
originally announced June 2025.
-
Virtual Teleportation of CSIT via Non-Signaling Assistance
Authors:
Yuhang Yao,
Syed A. Jafar
Abstract:
Non-signaling correlations, which (strictly) include quantum correlations, provide a tractable path to explore the potential impact of quantum nonlocality on the capacity of classical communication networks. Motivated by a recent discovery that certain wireless network settings benefit significantly from non-signaling (NS) correlations, various generalizations are considered. First, it is shown th…
▽ More
Non-signaling correlations, which (strictly) include quantum correlations, provide a tractable path to explore the potential impact of quantum nonlocality on the capacity of classical communication networks. Motivated by a recent discovery that certain wireless network settings benefit significantly from non-signaling (NS) correlations, various generalizations are considered. First, it is shown that for a point to point discrete memoryless channel with a non-causal channel state information at the transmitter (CSIT), the NS-assisted Shannon capacity matches the classical (without NS assistance) capacity of the channel for the setting where the state is also made available to the receiver. The key insight is summarized as 'virtual teleportation of CSIT via NS-assistance' and is supported by further results as follows. For a discrete memoryless 2-user broadcast channel (BC), the Shannon capacity region with NS-assistance available only between the transmitter and User 1, is found next. Consistent with the aforementioned key insight, the result matches the classical capacity region for the setting where the desired message of User 2 is made available in advance as side-information to User 1. The latter capacity region is known from a result of Kramer and Shamai. Next, for a semi-deterministic BC, the Shannon capacity region with full (tripartite) NS-assistance is shown to be the same as if only bipartite NS-assistance was available between the transmitter and the non-deterministic user. Bipartite NS-assistance between the transmitter and only the deterministic user, does not improve the capacity region relative to the corresponding classical setting. Finally, the analysis is extended to a K-user BC with full NS-assistance among all parties.
△ Less
Submitted 21 June, 2025;
originally announced June 2025.
-
Mapping the Evolution of Research Contributions using KnoVo
Authors:
Sajratul Y. Rubaiat,
Syed N. Sakib,
Hasan M. Jamil
Abstract:
This paper presents KnoVo (Knowledge Evolution), an intelligent framework designed for quantifying and analyzing the evolution of research novelty in the scientific literature. Moving beyond traditional citation analysis, which primarily measures impact, KnoVo determines a paper's novelty relative to both prior and subsequent work within its multilayered citation network. Given a target paper's ab…
▽ More
This paper presents KnoVo (Knowledge Evolution), an intelligent framework designed for quantifying and analyzing the evolution of research novelty in the scientific literature. Moving beyond traditional citation analysis, which primarily measures impact, KnoVo determines a paper's novelty relative to both prior and subsequent work within its multilayered citation network. Given a target paper's abstract, KnoVo utilizes Large Language Models (LLMs) to dynamically extract dimensions of comparison (e.g., methodology, application, dataset). The target paper is then compared to related publications along these same extracted dimensions. This comparative analysis, inspired by tournament selection, yields quantitative novelty scores reflecting the relative improvement, equivalence, or inferiority of the target paper in specific aspects. By aggregating these scores and visualizing their progression, for instance, through dynamic evolution graphs and comparative radar charts, KnoVo facilitates researchers not only to assess originality and identify similar work, but also to track knowledge evolution along specific research dimensions, uncover research gaps, and explore cross-disciplinary connections. We demonstrate these capabilities through a detailed analysis of 20 diverse papers from multiple scientific fields and report on the performance of various open-source LLMs within the KnoVo framework.
△ Less
Submitted 25 June, 2025; v1 submitted 20 June, 2025;
originally announced June 2025.
-
Prmpt2Adpt: Prompt-Based Zero-Shot Domain Adaptation for Resource-Constrained Environments
Authors:
Yasir Ali Farrukh,
Syed Wali,
Irfan Khan,
Nathaniel D. Bastian
Abstract:
Unsupervised Domain Adaptation (UDA) is a critical challenge in real-world vision systems, especially in resource-constrained environments like drones, where memory and computation are limited. Existing prompt-driven UDA methods typically rely on large vision-language models and require full access to source-domain data during adaptation, limiting their applicability. In this work, we propose Prmp…
▽ More
Unsupervised Domain Adaptation (UDA) is a critical challenge in real-world vision systems, especially in resource-constrained environments like drones, where memory and computation are limited. Existing prompt-driven UDA methods typically rely on large vision-language models and require full access to source-domain data during adaptation, limiting their applicability. In this work, we propose Prmpt2Adpt, a lightweight and efficient zero-shot domain adaptation framework built around a teacher-student paradigm guided by prompt-based feature alignment. At the core of our method is a distilled and fine-tuned CLIP model, used as the frozen backbone of a Faster R-CNN teacher. A small set of low-level source features is aligned to the target domain semantics-specified only through a natural language prompt-via Prompt-driven Instance Normalization (PIN). These semantically steered features are used to briefly fine-tune the detection head of the teacher model. The adapted teacher then generates high-quality pseudo-labels, which guide the on-the-fly adaptation of a compact student model. Experiments on the MDS-A dataset demonstrate that Prmpt2Adpt achieves competitive detection performance compared to state-of-the-art methods, while delivering up to 7x faster adaptation and 5x faster inference speed using few source images-making it a practical and scalable solution for real-time adaptation in low-resource domains.
△ Less
Submitted 20 June, 2025;
originally announced June 2025.
-
Automated MRI Tumor Segmentation using hybrid U-Net with Transformer and Efficient Attention
Authors:
Syed Haider Ali,
Asrar Ahmad,
Muhammad Ali,
Asifullah Khan,
Muhammad Shahban,
Nadeem Shaukat
Abstract:
Cancer is an abnormal growth with potential to invade locally and metastasize to distant organs. Accurate auto-segmentation of the tumor and surrounding normal tissues is required for radiotherapy treatment plan optimization. Recent AI-based segmentation models are generally trained on large public datasets, which lack the heterogeneity of local patient populations. While these studies advance AI-…
▽ More
Cancer is an abnormal growth with potential to invade locally and metastasize to distant organs. Accurate auto-segmentation of the tumor and surrounding normal tissues is required for radiotherapy treatment plan optimization. Recent AI-based segmentation models are generally trained on large public datasets, which lack the heterogeneity of local patient populations. While these studies advance AI-based medical image segmentation, research on local datasets is necessary to develop and integrate AI tumor segmentation models directly into hospital software for efficient and accurate oncology treatment planning and execution. This study enhances tumor segmentation using computationally efficient hybrid UNet-Transformer models on magnetic resonance imaging (MRI) datasets acquired from a local hospital under strict privacy protection. We developed a robust data pipeline for seamless DICOM extraction and preprocessing, followed by extensive image augmentation to ensure model generalization across diverse clinical settings, resulting in a total dataset of 6080 images for training. Our novel architecture integrates UNet-based convolutional neural networks with a transformer bottleneck and complementary attention modules, including efficient attention, Squeeze-and-Excitation (SE) blocks, Convolutional Block Attention Module (CBAM), and ResNeXt blocks. To accelerate convergence and reduce computational demands, we used a maximum batch size of 8 and initialized the encoder with pretrained ImageNet weights, training the model on dual NVIDIA T4 GPUs via checkpointing to overcome Kaggle's runtime limits. Quantitative evaluation on the local MRI dataset yielded a Dice similarity coefficient of 0.764 and an Intersection over Union (IoU) of 0.736, demonstrating competitive performance despite limited data and underscoring the importance of site-specific model development for clinical deployment.
△ Less
Submitted 18 June, 2025;
originally announced June 2025.
-
Exploiting Music Source Separation for Automatic Lyrics Transcription with Whisper
Authors:
Jaza Syed,
Ivan Meresman Higgs,
Ondřej Cífka,
Mark Sandler
Abstract:
Automatic lyrics transcription (ALT) remains a challenging task in the field of music information retrieval, despite great advances in automatic speech recognition (ASR) brought about by transformer-based architectures in recent years. One of the major challenges in ALT is the high amplitude of interfering audio signals relative to conventional ASR due to musical accompaniment. Recent advances in…
▽ More
Automatic lyrics transcription (ALT) remains a challenging task in the field of music information retrieval, despite great advances in automatic speech recognition (ASR) brought about by transformer-based architectures in recent years. One of the major challenges in ALT is the high amplitude of interfering audio signals relative to conventional ASR due to musical accompaniment. Recent advances in music source separation have enabled automatic extraction of high-quality separated vocals, which could potentially improve ALT performance. However, the effect of source separation has not been systematically investigated in order to establish best practices for its use. This work examines the impact of source separation on ALT using Whisper, a state-of-the-art open source ASR model. We evaluate Whisper's performance on original audio, separated vocals, and vocal stems across short-form and long-form transcription tasks. For short-form, we suggest a concatenation method that results in a consistent reduction in Word Error Rate (WER). For long-form, we propose an algorithm using source separation as a vocal activity detector to derive segment boundaries, which results in a consistent reduction in WER relative to Whisper's native long-form algorithm. Our approach achieves state-of-the-art results for an open source system on the Jam-ALT long-form ALT benchmark, without any training or fine-tuning. We also publish MUSDB-ALT, the first dataset of long-form lyric transcripts following the Jam-ALT guidelines for which vocal stems are publicly available.
△ Less
Submitted 18 June, 2025;
originally announced June 2025.
-
A Comprehensive Survey on Video Scene Parsing:Advances, Challenges, and Prospects
Authors:
Guohuan Xie,
Syed Ariff Syed Hesham,
Wenya Guo,
Bing Li,
Ming-Ming Cheng,
Guolei Sun,
Yun Liu
Abstract:
Video Scene Parsing (VSP) has emerged as a cornerstone in computer vision, facilitating the simultaneous segmentation, recognition, and tracking of diverse visual entities in dynamic scenes. In this survey, we present a holistic review of recent advances in VSP, covering a wide array of vision tasks, including Video Semantic Segmentation (VSS), Video Instance Segmentation (VIS), Video Panoptic Seg…
▽ More
Video Scene Parsing (VSP) has emerged as a cornerstone in computer vision, facilitating the simultaneous segmentation, recognition, and tracking of diverse visual entities in dynamic scenes. In this survey, we present a holistic review of recent advances in VSP, covering a wide array of vision tasks, including Video Semantic Segmentation (VSS), Video Instance Segmentation (VIS), Video Panoptic Segmentation (VPS), as well as Video Tracking and Segmentation (VTS), and Open-Vocabulary Video Segmentation (OVVS). We systematically analyze the evolution from traditional hand-crafted features to modern deep learning paradigms -- spanning from fully convolutional networks to the latest transformer-based architectures -- and assess their effectiveness in capturing both local and global temporal contexts. Furthermore, our review critically discusses the technical challenges, ranging from maintaining temporal consistency to handling complex scene dynamics, and offers a comprehensive comparative study of datasets and evaluation metrics that have shaped current benchmarking standards. By distilling the key contributions and shortcomings of state-of-the-art methodologies, this survey highlights emerging trends and prospective research directions that promise to further elevate the robustness and adaptability of VSP in real-world applications.
△ Less
Submitted 16 June, 2025;
originally announced June 2025.
-
SuperPoint-SLAM3: Augmenting ORB-SLAM3 with Deep Features, Adaptive NMS, and Learning-Based Loop Closure
Authors:
Shahram Najam Syed,
Ishir Roongta,
Kavin Ravie,
Gangadhar Nageswar
Abstract:
Visual simultaneous localization and mapping (SLAM) must remain accurate under extreme viewpoint, scale and illumination variations. The widely adopted ORB-SLAM3 falters in these regimes because it relies on hand-crafted ORB keypoints. We introduce SuperPoint-SLAM3, a drop-in upgrade that (i) replaces ORB with the self-supervised SuperPoint detector--descriptor, (ii) enforces spatially uniform key…
▽ More
Visual simultaneous localization and mapping (SLAM) must remain accurate under extreme viewpoint, scale and illumination variations. The widely adopted ORB-SLAM3 falters in these regimes because it relies on hand-crafted ORB keypoints. We introduce SuperPoint-SLAM3, a drop-in upgrade that (i) replaces ORB with the self-supervised SuperPoint detector--descriptor, (ii) enforces spatially uniform keypoints via adaptive non-maximal suppression (ANMS), and (iii) integrates a lightweight NetVLAD place-recognition head for learning-based loop closure.
On the KITTI Odometry benchmark SuperPoint-SLAM3 reduces mean translational error from 4.15% to 0.34% and mean rotational error from 0.0027 deg/m to 0.0010 deg/m. On the EuRoC MAV dataset it roughly halves both errors across every sequence (e.g., V2\_03: 1.58% -> 0.79%). These gains confirm that fusing modern deep features with a learned loop-closure module markedly improves ORB-SLAM3 accuracy while preserving its real-time operation.
Implementation, pretrained weights and reproducibility scripts are available at https://github.com/shahram95/SuperPointSLAM3.
△ Less
Submitted 16 June, 2025;
originally announced June 2025.
-
The Amazon Nova Family of Models: Technical Report and Model Card
Authors:
Amazon AGI,
Aaron Langford,
Aayush Shah,
Abhanshu Gupta,
Abhimanyu Bhatter,
Abhinav Goyal,
Abhinav Mathur,
Abhinav Mohanty,
Abhishek Kumar,
Abhishek Sethi,
Abi Komma,
Abner Pena,
Achin Jain,
Adam Kunysz,
Adam Opyrchal,
Adarsh Singh,
Aditya Rawal,
Adok Achar Budihal Prasad,
Adrià de Gispert,
Agnika Kumar,
Aishwarya Aryamane,
Ajay Nair,
Akilan M,
Akshaya Iyengar,
Akshaya Vishnu Kudlu Shanbhogue
, et al. (761 additional authors not shown)
Abstract:
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…
▽ More
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.
△ Less
Submitted 17 March, 2025;
originally announced June 2025.
-
From Proxies to Fields: Spatiotemporal Reconstruction of Global Radiation from Sparse Sensor Sequences
Authors:
Kazuma Kobayashi,
Samrendra Roy,
Seid Koric,
Diab Abueidda,
Syed Bahauddin Alam
Abstract:
Accurate reconstruction of latent environmental fields from sparse and indirect observations is a foundational challenge across scientific domains-from atmospheric science and geophysics to public health and aerospace safety. Traditional approaches rely on physics-based simulators or dense sensor networks, both constrained by high computational cost, latency, or limited spatial coverage. We presen…
▽ More
Accurate reconstruction of latent environmental fields from sparse and indirect observations is a foundational challenge across scientific domains-from atmospheric science and geophysics to public health and aerospace safety. Traditional approaches rely on physics-based simulators or dense sensor networks, both constrained by high computational cost, latency, or limited spatial coverage. We present the Temporal Radiation Operator Network (TRON), a spatiotemporal neural operator architecture designed to infer continuous global scalar fields from sequences of sparse, non-uniform proxy measurements.
Unlike recent forecasting models that operate on dense, gridded inputs to predict future states, TRON addresses a more ill-posed inverse problem: reconstructing the current global field from sparse, temporally evolving sensor sequences, without access to future observations or dense labels. Demonstrated on global cosmic radiation dose reconstruction, TRON is trained on 22 years of simulation data and generalizes across 65,341 spatial locations, 8,400 days, and sequence lengths from 7 to 90 days. It achieves sub-second inference with relative L2 errors below 0.1%, representing a >58,000X speedup over Monte Carlo-based estimators. Though evaluated in the context of cosmic radiation, TRON offers a domain-agnostic framework for scientific field reconstruction from sparse data, with applications in atmospheric modeling, geophysical hazard monitoring, and real-time environmental risk forecasting.
△ Less
Submitted 24 May, 2025;
originally announced June 2025.
-
An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing
Authors:
Haochen Sun,
Yifan Liu,
Ahmed Al-Tahmeesschi,
Swarna Chetty,
Syed Ali Raza Zaidi,
Avishek Nag,
Hamed Ahmadi
Abstract:
Effective resource management and network slicing are essential to meet the diverse service demands of vehicular networks, including Enhanced Mobile Broadband (eMBB) and Ultra-Reliable and Low-Latency Communications (URLLC). This paper introduces an Explainable Deep Reinforcement Learning (XRL) framework for dynamic network slicing and resource allocation in vehicular networks, built upon a near-r…
▽ More
Effective resource management and network slicing are essential to meet the diverse service demands of vehicular networks, including Enhanced Mobile Broadband (eMBB) and Ultra-Reliable and Low-Latency Communications (URLLC). This paper introduces an Explainable Deep Reinforcement Learning (XRL) framework for dynamic network slicing and resource allocation in vehicular networks, built upon a near-real-time RAN intelligent controller. By integrating a feature-based approach that leverages Shapley values and an attention mechanism, we interpret and refine the decisions of our reinforcementlearning agents, addressing key reliability challenges in vehicular communication systems. Simulation results demonstrate that our approach provides clear, real-time insights into the resource allocation process and achieves higher interpretability precision than a pure attention mechanism. Furthermore, the Quality of Service (QoS) satisfaction for URLLC services increased from 78.0% to 80.13%, while that for eMBB services improved from 71.44% to 73.21%.
△ Less
Submitted 13 June, 2025;
originally announced June 2025.
-
Security Degradation in Iterative AI Code Generation -- A Systematic Analysis of the Paradox
Authors:
Shivani Shukla,
Himanshu Joshi,
Romilla Syed
Abstract:
The rapid adoption of Large Language Models(LLMs) for code generation has transformed software development, yet little attention has been given to how security vulnerabilities evolve through iterative LLM feedback. This paper analyzes security degradation in AI-generated code through a controlled experiment with 400 code samples across 40 rounds of "improvements" using four distinct prompting stra…
▽ More
The rapid adoption of Large Language Models(LLMs) for code generation has transformed software development, yet little attention has been given to how security vulnerabilities evolve through iterative LLM feedback. This paper analyzes security degradation in AI-generated code through a controlled experiment with 400 code samples across 40 rounds of "improvements" using four distinct prompting strategies. Our findings show a 37.6% increase in critical vulnerabilities after just five iterations, with distinct vulnerability patterns emerging across different prompting approaches. This evidence challenges the assumption that iterative LLM refinement improves code security and highlights the essential role of human expertise in the loop. We propose practical guidelines for developers to mitigate these risks, emphasizing the need for robust human validation between LLM iterations to prevent the paradoxical introduction of new security issues during supposedly beneficial code "improvements".
△ Less
Submitted 19 May, 2025;
originally announced June 2025.
-
Advanced fraud detection using machine learning models: enhancing financial transaction security
Authors:
Nudrat Fariha,
Md Nazmuddin Moin Khan,
Md Iqbal Hossain,
Syed Ali Reza,
Joy Chakra Bortty,
Kazi Sharmin Sultana,
Md Shadidur Islam Jawad,
Saniah Safat,
Md Abdul Ahad,
Maksuda Begum
Abstract:
The rise of digital payments has accelerated the need for intelligent and scalable systems to detect fraud. This research presents an end-to-end, feature-rich machine learning framework for detecting credit card transaction anomalies and fraud using real-world data. The study begins by merging transactional, cardholder, merchant, and merchant category datasets from a relational database to create…
▽ More
The rise of digital payments has accelerated the need for intelligent and scalable systems to detect fraud. This research presents an end-to-end, feature-rich machine learning framework for detecting credit card transaction anomalies and fraud using real-world data. The study begins by merging transactional, cardholder, merchant, and merchant category datasets from a relational database to create a unified analytical view. Through the feature engineering process, we extract behavioural signals such as average spending, deviation from historical patterns, transaction timing irregularities, and category frequency metrics. These features are enriched with temporal markers such as hour, day of week, and weekend indicators to expose all latent patterns that indicate fraudulent behaviours. Exploratory data analysis reveals contextual transaction trends across all the dataset features. Using the transactional data, we train and evaluate a range of unsupervised models: Isolation Forest, One Class SVM, and a deep autoencoder trained to reconstruct normal behavior. These models flag the top 1% of reconstruction errors as outliers. PCA visualizations illustrate each models ability to separate anomalies into a two-dimensional latent space. We further segment the transaction landscape using K-Means clustering and DBSCAN to identify dense clusters of normal activity and isolate sparse, suspicious regions.
△ Less
Submitted 12 June, 2025;
originally announced June 2025.
-
CoLMbo: Speaker Language Model for Descriptive Profiling
Authors:
Massa Baali,
Shuo Han,
Syed Abdul Hannan,
Purusottam Samal,
Karanveer Singh,
Soham Deshmukh,
Rita Singh,
Bhiksha Raj
Abstract:
Speaker recognition systems are often limited to classification tasks and struggle to generate detailed speaker characteristics or provide context-rich descriptions. These models primarily extract embeddings for speaker identification but fail to capture demographic attributes such as dialect, gender, and age in a structured manner. This paper introduces CoLMbo, a Speaker Language Model (SLM) that…
▽ More
Speaker recognition systems are often limited to classification tasks and struggle to generate detailed speaker characteristics or provide context-rich descriptions. These models primarily extract embeddings for speaker identification but fail to capture demographic attributes such as dialect, gender, and age in a structured manner. This paper introduces CoLMbo, a Speaker Language Model (SLM) that addresses these limitations by integrating a speaker encoder with prompt-based conditioning. This allows for the creation of detailed captions based on speaker embeddings. CoLMbo utilizes user-defined prompts to adapt dynamically to new speaker characteristics and provides customized descriptions, including regional dialect variations and age-related traits. This innovative approach not only enhances traditional speaker profiling but also excels in zero-shot scenarios across diverse datasets, marking a significant advancement in the field of speaker recognition.
△ Less
Submitted 10 June, 2025;
originally announced June 2025.
-
"I Wrote, I Paused, I Rewrote" Teaching LLMs to Read Between the Lines of Student Writing
Authors:
Samra Zafar,
Shaheer Minhas,
Syed Ali Hassan Zaidi,
Arfa Naeem,
Zahra Ali
Abstract:
Large language models(LLMs) like Gemini are becoming common tools for supporting student writing. But most of their feedback is based only on the final essay missing important context about how that text was written. In this paper, we explore whether using writing process data, collected through keystroke logging and periodic snapshots, can help LLMs give feedback that better reflects how learners…
▽ More
Large language models(LLMs) like Gemini are becoming common tools for supporting student writing. But most of their feedback is based only on the final essay missing important context about how that text was written. In this paper, we explore whether using writing process data, collected through keystroke logging and periodic snapshots, can help LLMs give feedback that better reflects how learners think and revise while writing. We built a digital writing tool that captures both what students type and how their essays evolve over time. Twenty students used this tool to write timed essays, which were then evaluated in two ways: (i) LLM generated feedback using both the final essay and the full writing trace, and (ii) After the task, students completed surveys about how useful and relatable they found the feedback. Early results show that learners preferred the process-aware LLM feedback, finding it more in tune with their own thinking. We also found that certain types of edits, like adding new content or reorganizing paragraphs, aligned closely with higher scores in areas like coherence and elaboration. Our findings suggest that making LLMs more aware of the writing process can lead to feedback that feels more meaningful, personal, and supportive.
△ Less
Submitted 9 June, 2025;
originally announced June 2025.
-
IntenTest: Stress Testing for Intent Integrity in API-Calling LLM Agents
Authors:
Shiwei Feng,
Xiangzhe Xu,
Xuan Chen,
Kaiyuan Zhang,
Syed Yusuf Ahmed,
Zian Su,
Mingwei Zheng,
Xiangyu Zhang
Abstract:
LLM agents are increasingly deployed to automate real-world tasks by invoking APIs through natural language instructions. While powerful, they often suffer from misinterpretation of user intent, leading to the agent's actions that diverge from the user's intended goal, especially as external toolkits evolve. Traditional software testing assumes structured inputs and thus falls short in handling th…
▽ More
LLM agents are increasingly deployed to automate real-world tasks by invoking APIs through natural language instructions. While powerful, they often suffer from misinterpretation of user intent, leading to the agent's actions that diverge from the user's intended goal, especially as external toolkits evolve. Traditional software testing assumes structured inputs and thus falls short in handling the ambiguity of natural language. We introduce IntenTest, an API-centric stress testing framework that systematically uncovers intent integrity violations in LLM agents. Unlike prior work focused on fixed benchmarks or adversarial inputs, IntenTest generates realistic tasks based on toolkits' documentation and applies targeted mutations to expose subtle agent errors while preserving user intent. To guide testing, we propose semantic partitioning, which organizes natural language tasks into meaningful categories based on toolkit API parameters and their equivalence classes. Within each partition, seed tasks are mutated and ranked by a lightweight predictor that estimates the likelihood of triggering agent errors. To enhance efficiency, IntenTest maintains a datatype-aware strategy memory that retrieves and adapts effective mutation patterns from past cases. Experiments on 80 toolkit APIs demonstrate that IntenTest effectively uncovers intent integrity violations, significantly outperforming baselines in both error-exposing rate and query efficiency. Moreover, IntenTest generalizes well to stronger target models using smaller LLMs for test generation, and adapts to evolving APIs across domains.
△ Less
Submitted 9 June, 2025;
originally announced June 2025.
-
BTPD: A Multilingual Hand-curated Dataset of Bengali Transnational Political Discourse Across Online Communities
Authors:
Dipto Das,
Syed Ishtiaque Ahmed,
Shion Guha
Abstract:
Understanding political discourse in online spaces is crucial for analyzing public opinion and ideological polarization. While social computing and computational linguistics have explored such discussions in English, such research efforts are significantly limited in major yet under-resourced languages like Bengali due to the unavailability of datasets. In this paper, we present a multilingual dat…
▽ More
Understanding political discourse in online spaces is crucial for analyzing public opinion and ideological polarization. While social computing and computational linguistics have explored such discussions in English, such research efforts are significantly limited in major yet under-resourced languages like Bengali due to the unavailability of datasets. In this paper, we present a multilingual dataset of Bengali transnational political discourse (BTPD) collected from three online platforms, each representing distinct community structures and interaction dynamics. Besides describing how we hand-curated the dataset through community-informed keyword-based retrieval, this paper also provides a general overview of its topics and multilingual content.
△ Less
Submitted 7 June, 2025;
originally announced June 2025.
-
Fixing It in Post: A Comparative Study of LLM Post-Training Data Quality and Model Performance
Authors:
Aladin Djuhera,
Swanand Ravindra Kadhe,
Syed Zawad,
Farhan Ahmed,
Heiko Ludwig,
Holger Boche
Abstract:
Recent work on large language models (LLMs) has increasingly focused on post-training and alignment with datasets curated to enhance instruction following, world knowledge, and specialized skills. However, most post-training datasets used in leading open- and closed-source LLMs remain inaccessible to the public, with limited information about their construction process. This lack of transparency h…
▽ More
Recent work on large language models (LLMs) has increasingly focused on post-training and alignment with datasets curated to enhance instruction following, world knowledge, and specialized skills. However, most post-training datasets used in leading open- and closed-source LLMs remain inaccessible to the public, with limited information about their construction process. This lack of transparency has motivated the recent development of open-source post-training corpora. While training on these open alternatives can yield performance comparable to that of leading models, systematic comparisons remain challenging due to the significant computational cost of conducting them rigorously at scale, and are therefore largely absent. As a result, it remains unclear how specific samples, task types, or curation strategies influence downstream performance when assessing data quality. In this work, we conduct the first comprehensive side-by-side analysis of two prominent open post-training datasets: Tulu-3-SFT-Mix and SmolTalk. Using the Magpie framework, we annotate each sample with detailed quality metrics, including turn structure (single-turn vs. multi-turn), task category, input quality, and response quality, and we derive statistics that reveal structural and qualitative similarities and differences between the two datasets. Based on these insights, we design a principled curation recipe that produces a new data mixture, TuluTalk, which contains 14% fewer samples than either source dataset while matching or exceeding their performance on key benchmarks. Our findings offer actionable insights for constructing more effective post-training datasets that improve model performance within practical resource limits. To support future research, we publicly release both the annotated source datasets and our curated TuluTalk mixture.
△ Less
Submitted 6 June, 2025;
originally announced June 2025.
-
TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation
Authors:
Muhammad Sohail Danish,
Muhammad Akhtar Munir,
Syed Roshaan Ali Shah,
Muhammad Haris Khan,
Rao Muhammad Anwer,
Jorma Laaksonen,
Fahad Shahbaz Khan,
Salman Khan
Abstract:
Modern Earth observation (EO) increasingly leverages deep learning to harness the scale and diversity of satellite imagery across sensors and regions. While recent foundation models have demonstrated promising generalization across EO tasks, many remain limited by the scale, geographical coverage, and spectral diversity of their training data, factors critical for learning globally transferable re…
▽ More
Modern Earth observation (EO) increasingly leverages deep learning to harness the scale and diversity of satellite imagery across sensors and regions. While recent foundation models have demonstrated promising generalization across EO tasks, many remain limited by the scale, geographical coverage, and spectral diversity of their training data, factors critical for learning globally transferable representations. In this work, we introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery, combined with large spatial tiles and land-cover aware sampling to enrich spatial and semantic coverage. By treating sensing modalities as natural augmentations in our self-supervised approach, we unify radar and optical inputs via modality-specific patch embeddings and adaptive cross-attention fusion. Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism that incorporates class-frequency-aware regularization to address long-tailed distributions in land cover.TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench. Our code and pretrained models are publicly available at: https://github.com/mbzuai-oryx/TerraFM .
△ Less
Submitted 6 June, 2025;
originally announced June 2025.
-
Clustering and Median Aggregation Improve Differentially Private Inference
Authors:
Kareem Amin,
Salman Avestimehr,
Sara Babakniya,
Alex Bie,
Weiwei Kong,
Natalia Ponomareva,
Umar Syed
Abstract:
Differentially private (DP) language model inference is an approach for generating private synthetic text. A sensitive input example is used to prompt an off-the-shelf large language model (LLM) to produce a similar example. Multiple examples can be aggregated together to formally satisfy the DP guarantee.
Prior work creates inference batches by sampling sensitive inputs uniformly at random. We…
▽ More
Differentially private (DP) language model inference is an approach for generating private synthetic text. A sensitive input example is used to prompt an off-the-shelf large language model (LLM) to produce a similar example. Multiple examples can be aggregated together to formally satisfy the DP guarantee.
Prior work creates inference batches by sampling sensitive inputs uniformly at random. We show that uniform sampling degrades the quality of privately generated text, especially when the sensitive examples concern heterogeneous topics.
We remedy this problem by clustering the input data before selecting inference batches. Next, we observe that clustering also leads to more similar next-token predictions across inferences. We use this insight to introduce a new algorithm that aggregates next token statistics by privately computing medians instead of averages. This approach leverages the fact that the median has decreased local sensitivity when next token predictions are similar, allowing us to state a data-dependent and ex-post DP guarantee about the privacy properties of this algorithm. Finally, we demonstrate improvements in terms of representativeness metrics (e.g., MAUVE) as well as downstream task performance. We show that our method produces high-quality synthetic data at significantly lower privacy cost than a previous state-of-the-art method.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
A Comprehensive Survey on Bio-Inspired Algorithms: Taxonomy, Applications, and Future Directions
Authors:
Shriyank Somvanshi,
Md Monzurul Islam,
Syed Aaqib Javed,
Gaurab Chhetri,
Kazi Sifatul Islam,
Tausif Islam Chowdhury,
Sazzad Bin Bashar Polock,
Anandi Dutta,
Subasish Das
Abstract:
Bio-inspired algorithms (BIAs) utilize natural processes such as evolution, swarm behavior, foraging, and plant growth to solve complex, nonlinear, high-dimensional optimization problems. This survey categorizes BIAs into eight groups: evolutionary, swarm intelligence, physics-inspired, ecosystem and plant-based, predator-prey, neural-inspired, human-inspired, and hybrid approaches, and reviews th…
▽ More
Bio-inspired algorithms (BIAs) utilize natural processes such as evolution, swarm behavior, foraging, and plant growth to solve complex, nonlinear, high-dimensional optimization problems. This survey categorizes BIAs into eight groups: evolutionary, swarm intelligence, physics-inspired, ecosystem and plant-based, predator-prey, neural-inspired, human-inspired, and hybrid approaches, and reviews their core principles, strengths, and limitations. We illustrate the usage of these algorithms in machine learning, engineering design, bioinformatics, and intelligent systems, and highlight recent advances in hybridization, parameter tuning, and adaptive strategies. Finally, we identify open challenges such as scalability, convergence, reliability, and interpretability to suggest directions for future research. This work aims to serve as a foundational resource for both researchers and practitioners interested in understanding the current landscape and future directions of bio-inspired computing.
△ Less
Submitted 25 May, 2025;
originally announced June 2025.
-
Energy-Aware Workflow Execution: An Overview of Techniques for Saving Energy and Emissions in Scientific Compute Clusters
Authors:
Lauritz Thamsen,
Yehia Elkhatib,
Paul Harvey,
Syed Waqar Nabi,
Jeremy Singer,
Wim Vanderbauwhede
Abstract:
Scientific research in many fields routinely requires the analysis of large datasets, and scientists often employ workflow systems to leverage clusters of computers for their data analysis. However, due to their size and scale, these workflow applications can have a considerable environmental footprint in terms of compute resource use, energy consumption, and carbon emissions. Mitigating this is c…
▽ More
Scientific research in many fields routinely requires the analysis of large datasets, and scientists often employ workflow systems to leverage clusters of computers for their data analysis. However, due to their size and scale, these workflow applications can have a considerable environmental footprint in terms of compute resource use, energy consumption, and carbon emissions. Mitigating this is critical in light of climate change and the urgent need to reduce carbon emissions.
In this chapter, we exemplify the problem by estimating the carbon footprint of three real-world scientific workflows from different scientific domains. We then describe techniques for reducing the energy consumption and, thereby, carbon footprint of individual workflow tasks and entire workflow applications, such as using energy-efficient heterogeneous architectures, generating optimised code, scaling processor voltages and frequencies, consolidating workloads on shared cluster nodes, and scheduling workloads for optimised energy efficiency.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
Evaluating Apple Intelligence's Writing Tools for Privacy Against Large Language Model-Based Inference Attacks: Insights from Early Datasets
Authors:
Mohd. Farhan Israk Soumik,
Syed Mhamudul Hasan,
Abdur R. Shahid
Abstract:
The misuse of Large Language Models (LLMs) to infer emotions from text for malicious purposes, known as emotion inference attacks, poses a significant threat to user privacy. In this paper, we investigate the potential of Apple Intelligence's writing tools, integrated across iPhone, iPad, and MacBook, to mitigate these risks through text modifications such as rewriting and tone adjustment. By deve…
▽ More
The misuse of Large Language Models (LLMs) to infer emotions from text for malicious purposes, known as emotion inference attacks, poses a significant threat to user privacy. In this paper, we investigate the potential of Apple Intelligence's writing tools, integrated across iPhone, iPad, and MacBook, to mitigate these risks through text modifications such as rewriting and tone adjustment. By developing early novel datasets specifically for this purpose, we empirically assess how different text modifications influence LLM-based detection. This capability suggests strong potential for Apple Intelligence's writing tools as privacy-preserving mechanisms. Our findings lay the groundwork for future adaptive rewriting systems capable of dynamically neutralizing sensitive emotional content to enhance user privacy. To the best of our knowledge, this research provides the first empirical analysis of Apple Intelligence's text-modification tools within a privacy-preservation context with the broader goal of developing on-device, user-centric privacy-preserving mechanisms to protect against LLMs-based advanced inference attacks on deployed systems.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
SafeCOMM: What about Safety Alignment in Fine-Tuned Telecom Large Language Models?
Authors:
Aladin Djuhera,
Swanand Ravindra Kadhe,
Farhan Ahmed,
Syed Zawad,
Holger Boche,
Walid Saad
Abstract:
Fine-tuning large language models (LLMs) for telecom tasks and datasets is a common practice to adapt general-purpose models to the telecom domain. However, little attention has been paid to how this process may compromise model safety. Recent research has shown that even benign fine-tuning can degrade the safety alignment of LLMs, causing them to respond to harmful or unethical user queries. In t…
▽ More
Fine-tuning large language models (LLMs) for telecom tasks and datasets is a common practice to adapt general-purpose models to the telecom domain. However, little attention has been paid to how this process may compromise model safety. Recent research has shown that even benign fine-tuning can degrade the safety alignment of LLMs, causing them to respond to harmful or unethical user queries. In this paper, we investigate this issue for telecom-tuned LLMs using three representative datasets featured by the GenAINet initiative. We show that safety degradation persists even for structured and seemingly harmless datasets such as 3GPP standards and tabular records, indicating that telecom-specific data is not immune to safety erosion during fine-tuning. We further extend our analysis to publicly available Telecom LLMs trained via continual pre-training, revealing that safety alignment is often severely lacking, primarily due to the omission of safety-focused instruction tuning. To address these issues in both fine-tuned and pre-trained models, we conduct extensive experiments and evaluate three safety realignment defenses (SafeInstruct, SafeLoRA, and SafeMERGE) using established red-teaming benchmarks. The results show that, across all settings, the proposed defenses can effectively restore safety after harmful degradation without compromising downstream task performance, leading to Safe teleCOMMunication (SafeCOMM) models. In a nutshell, our work serves as a diagnostic study and practical guide for safety realignment in telecom-tuned LLMs, and emphasizes the importance of safety-aware instruction and fine-tuning for real-world deployments of Telecom LLMs.
△ Less
Submitted 29 May, 2025;
originally announced June 2025.
-
Digital Forensic Investigation of the ChatGPT Windows Application
Authors:
Malithi Wanniarachchi Kankanamge,
Nick McKenna,
Santiago Carmona,
Syed Mhamudul Hasan,
Abdur R. Shahid,
Ahmed Imteaj
Abstract:
The ChatGPT Windows application offers better user interaction in the Windows operating system (OS) by enhancing productivity and streamlining the workflow of ChatGPT's utilization. However, there are potential misuses associated with this application that require rigorous forensic analysis. This study presents a holistic forensic analysis of the ChatGPT Windows application, focusing on identifyin…
▽ More
The ChatGPT Windows application offers better user interaction in the Windows operating system (OS) by enhancing productivity and streamlining the workflow of ChatGPT's utilization. However, there are potential misuses associated with this application that require rigorous forensic analysis. This study presents a holistic forensic analysis of the ChatGPT Windows application, focusing on identifying and recovering digital artifacts for investigative purposes. With the use of widely popular and openly available digital forensics tools such as Autopsy, FTK Imager, Magnet RAM Capture, Wireshark, and Hex Workshop, this research explores different methods to extract and analyze cache, chat logs, metadata, and network traffic from the application. Our key findings also demonstrate the history of the application's chat, user interactions, and system-level traces that can be recovered even after deletion, providing critical insights into the crime investigation and, thus, documenting and outlining a potential misuse report for digital forensics.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
Zero-Trust Foundation Models: A New Paradigm for Secure and Collaborative Artificial Intelligence for Internet of Things
Authors:
Kai Li,
Conggai Li,
Xin Yuan,
Shenghong Li,
Sai Zou,
Syed Sohail Ahmed,
Wei Ni,
Dusit Niyato,
Abbas Jamalipour,
Falko Dressler,
Ozgur B. Akan
Abstract:
This paper focuses on Zero-Trust Foundation Models (ZTFMs), a novel paradigm that embeds zero-trust security principles into the lifecycle of foundation models (FMs) for Internet of Things (IoT) systems. By integrating core tenets, such as continuous verification, least privilege access (LPA), data confidentiality, and behavioral analytics into the design, training, and deployment of FMs, ZTFMs ca…
▽ More
This paper focuses on Zero-Trust Foundation Models (ZTFMs), a novel paradigm that embeds zero-trust security principles into the lifecycle of foundation models (FMs) for Internet of Things (IoT) systems. By integrating core tenets, such as continuous verification, least privilege access (LPA), data confidentiality, and behavioral analytics into the design, training, and deployment of FMs, ZTFMs can enable secure, privacy-preserving AI across distributed, heterogeneous, and potentially adversarial IoT environments. We present the first structured synthesis of ZTFMs, identifying their potential to transform conventional trust-based IoT architectures into resilient, self-defending ecosystems. Moreover, we propose a comprehensive technical framework, incorporating federated learning (FL), blockchain-based identity management, micro-segmentation, and trusted execution environments (TEEs) to support decentralized, verifiable intelligence at the network edge. In addition, we investigate emerging security threats unique to ZTFM-enabled systems and evaluate countermeasures, such as anomaly detection, adversarial training, and secure aggregation. Through this analysis, we highlight key open research challenges in terms of scalability, secure orchestration, interpretable threat attribution, and dynamic trust calibration. This survey lays a foundational roadmap for secure, intelligent, and trustworthy IoT infrastructures powered by FMs.
△ Less
Submitted 26 May, 2025;
originally announced May 2025.
-
From Connectivity to Autonomy: The Dawn of Self-Evolving Communication Systems
Authors:
Zeinab Nezami,
Syed Danial Ali Shah,
Maryam Hafeez,
Karim Djemame,
Syed Ali Raza Zaidi
Abstract:
This paper envisions 6G as a self-evolving telecom ecosystem, where AI-driven intelligence enables dynamic adaptation beyond static connectivity. We explore the key enablers of autonomous communication systems, spanning reconfigurable infrastructure, adaptive middleware, and intelligent network functions, alongside multi-agent collaboration for distributed decision-making. We explore how these met…
▽ More
This paper envisions 6G as a self-evolving telecom ecosystem, where AI-driven intelligence enables dynamic adaptation beyond static connectivity. We explore the key enablers of autonomous communication systems, spanning reconfigurable infrastructure, adaptive middleware, and intelligent network functions, alongside multi-agent collaboration for distributed decision-making. We explore how these methodologies align with emerging industrial IoT frameworks, ensuring seamless integration within digital manufacturing processes. Our findings emphasize the potential for improved real-time decision-making, optimizing efficiency, and reducing latency in networked control systems. The discussion addresses ethical challenges, research directions, and standardization efforts, concluding with a technology stack roadmap to guide future developments. By leveraging state-of-the-art 6G network management techniques, this research contributes to the next generation of intelligent automation solutions, bridging the gap between theoretical advancements and real-world industrial applications.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
Comparative Analysis of the Land Use and Land Cover Changes in Different Governorates of Oman using Spatiotemporal Multi-spectral Satellite Data
Authors:
Muhammad Shafi,
Syed Mohsin Bokhari
Abstract:
Land cover and land use (LULC) changes are key applications of satellite imagery, and they have critical roles in resource management, urbanization, protection of soils and the environment, and enhancing sustainable development. The literature has heavily utilized multispectral spatiotemporal satellite data alongside advanced machine learning algorithms to monitor and predict LULC changes. This st…
▽ More
Land cover and land use (LULC) changes are key applications of satellite imagery, and they have critical roles in resource management, urbanization, protection of soils and the environment, and enhancing sustainable development. The literature has heavily utilized multispectral spatiotemporal satellite data alongside advanced machine learning algorithms to monitor and predict LULC changes. This study analyzes and compares LULC changes across various governorates (provinces) of the Sultanate of Oman from 2016 to 2021 using annual time steps. For the chosen region, multispectral spatiotemporal data were acquired from the open-source Sentinel-2 satellite dataset. Supervised machine learning algorithms were used to train and classify different land covers, such as water bodies, crops, urban, etc. The constructed model was subsequently applied within the study region, allowing for an effective comparative evaluation of LULC changes within the given timeframe.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
ToPSen: Task-Oriented Priming and Sensory Alignment for Comparing Coding Strategies Between Sighted and Blind Programmers
Authors:
Md Ehtesham-Ul-Haque,
Syed Masum Billah
Abstract:
This paper examines how the coding strategies of sighted and blind programmers differ when working with audio feedback alone. The goal is to identify challenges in mixed-ability collaboration, particularly when sighted programmers work with blind peers or teach programming to blind students. To overcome limitations of traditional blindness simulation studies, we proposed Task-Oriented Priming and…
▽ More
This paper examines how the coding strategies of sighted and blind programmers differ when working with audio feedback alone. The goal is to identify challenges in mixed-ability collaboration, particularly when sighted programmers work with blind peers or teach programming to blind students. To overcome limitations of traditional blindness simulation studies, we proposed Task-Oriented Priming and Sensory Alignment (ToPSen), a design framework that reframes sensory constraints as technical requirements rather than as a disability. Through a study of 12 blind and 12 sighted participants coding non-visually, we found that expert blind programmers maintain more accurate mental models and process more information in working memory than sighted programmers using ToPSen. Our analysis revealed that blind and sighted programmers process structural information differently, exposing gaps in current IDE designs. These insights inform our guidelines for improving the accessibility of programming tools and fostering effective mixed-ability collaboration.
△ Less
Submitted 28 May, 2025;
originally announced May 2025.
-
IKIWISI: An Interactive Visual Pattern Generator for Evaluating the Reliability of Vision-Language Models Without Ground Truth
Authors:
Md Touhidul Islam,
Imran Kabir,
Md Alimoor Reza,
Syed Masum Billah
Abstract:
We present IKIWISI ("I Know It When I See It"), an interactive visual pattern generator for assessing vision-language models in video object recognition when ground truth is unavailable. IKIWISI transforms model outputs into a binary heatmap where green cells indicate object presence and red cells indicate object absence. This visualization leverages humans' innate pattern recognition abilities to…
▽ More
We present IKIWISI ("I Know It When I See It"), an interactive visual pattern generator for assessing vision-language models in video object recognition when ground truth is unavailable. IKIWISI transforms model outputs into a binary heatmap where green cells indicate object presence and red cells indicate object absence. This visualization leverages humans' innate pattern recognition abilities to evaluate model reliability. IKIWISI introduces "spy objects": adversarial instances users know are absent, to discern models hallucinating on nonexistent items. The tool functions as a cognitive audit mechanism, surfacing mismatches between human and machine perception by visualizing where models diverge from human understanding.
Our study with 15 participants found that users considered IKIWISI easy to use, made assessments that correlated with objective metrics when available, and reached informed conclusions by examining only a small fraction of heatmap cells. This approach not only complements traditional evaluation methods through visual assessment of model behavior with custom object sets, but also reveals opportunities for improving alignment between human perception and machine understanding in vision-language systems.
△ Less
Submitted 28 May, 2025;
originally announced May 2025.
-
AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy
Authors:
Sebastian Antony Joseph,
Syed Murtaza Husain,
Stella S. R. Offner,
Stéphanie Juneau,
Paul Torrey,
Adam S. Bolton,
Juan P. Farias,
Niall Gaffney,
Greg Durrett,
Junyi Jessy Li
Abstract:
Large Language Models (LLMs) are being explored for applications in scientific research, including their capabilities to synthesize literature, answer research questions, generate research ideas, and even conduct computational experiments. Ultimately, our goal is for these to help scientists derive novel scientific insights. In many areas of science, such insights often arise from processing and v…
▽ More
Large Language Models (LLMs) are being explored for applications in scientific research, including their capabilities to synthesize literature, answer research questions, generate research ideas, and even conduct computational experiments. Ultimately, our goal is for these to help scientists derive novel scientific insights. In many areas of science, such insights often arise from processing and visualizing data to understand its patterns. However, evaluating whether an LLM-mediated scientific workflow produces outputs conveying the correct scientific insights is challenging to evaluate and has not been addressed in past work. We introduce AstroVisBench, the first benchmark for both scientific computing and visualization in the astronomy domain. AstroVisBench judges a language model's ability to both (1) create astronomy-specific workflows to process and analyze data and (2) visualize the results of these workflows through complex plots. Our evaluation of visualizations uses a novel LLM-as-a-judge workflow, which is validated against annotation by five professional astronomers. Using AstroVisBench we present an evaluation of state-of-the-art language models, showing a significant gap in their ability to engage in astronomy research as useful assistants. This evaluation provides a strong end-to-end evaluation for AI scientists that offers a path forward for the development of visualization-based workflows, which are central to a broad range of domains from physics to biology.
△ Less
Submitted 3 June, 2025; v1 submitted 26 May, 2025;
originally announced May 2025.
-
C3R: Channel Conditioned Cell Representations for unified evaluation in microscopy imaging
Authors:
Umar Marikkar,
Syed Sameed Husain,
Muhammad Awais,
Sara Atito
Abstract:
Immunohistochemical (IHC) images reveal detailed information about structures and functions at the subcellular level. However, unlike natural images, IHC datasets pose challenges for deep learning models due to their inconsistencies in channel count and configuration, stemming from varying staining protocols across laboratories and studies. Existing approaches build channel-adaptive models, which…
▽ More
Immunohistochemical (IHC) images reveal detailed information about structures and functions at the subcellular level. However, unlike natural images, IHC datasets pose challenges for deep learning models due to their inconsistencies in channel count and configuration, stemming from varying staining protocols across laboratories and studies. Existing approaches build channel-adaptive models, which unfortunately fail to support out-of-distribution (OOD) evaluation across IHC datasets and cannot be applied in a true zero-shot setting with mismatched channel counts. To address this, we introduce a structured view of cellular image channels by grouping them into either context or concept, where we treat the context channels as a reference to the concept channels in the image. We leverage this context-concept principle to develop Channel Conditioned Cell Representations (C3R), a framework designed for unified evaluation on in-distribution (ID) and OOD datasets. C3R is a two-fold framework comprising a channel-adaptive encoder architecture and a masked knowledge distillation training strategy, both built around the context-concept principle. We find that C3R outperforms existing benchmarks on both ID and OOD tasks, while a trivial implementation of our core idea also outperforms the channel-adaptive methods reported on the CHAMMI benchmark. Our method opens a new pathway for cross-dataset generalization between IHC datasets, without requiring dataset-specific adaptation or retraining.
△ Less
Submitted 24 May, 2025;
originally announced May 2025.
-
An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents
Authors:
Ayesha Amjad,
Saurav Sthapit,
Tahir Qasim Syed
Abstract:
Extracting alphanumeric data from form-like documents such as invoices, purchase orders, bills, and financial documents is often performed via vision (OCR) and learning algorithms or monolithic pipelines with limited potential for systemic improvements. We propose an agentic AI system that leverages Large Language Model (LLM) agents and a reinforcement learning (RL) driver agent to automate consis…
▽ More
Extracting alphanumeric data from form-like documents such as invoices, purchase orders, bills, and financial documents is often performed via vision (OCR) and learning algorithms or monolithic pipelines with limited potential for systemic improvements. We propose an agentic AI system that leverages Large Language Model (LLM) agents and a reinforcement learning (RL) driver agent to automate consistent, self-improving extraction under LLM inference uncertainty. Our work highlights the limitations of monolithic LLM-based extraction and introduces a modular, multi-agent framework with task-specific prompts and an RL policy of rewards and penalties to guide a meta-prompting agent to learn from past errors and improve prompt-based actor agents. This self-corrective adaptive system handles diverse documents, file formats, layouts, and LLMs, aiming to automate accurate information extraction without the need for human intervention. Results as reported on two benchmark datasets of SOIRE, and CORD, are promising for the agentic AI framework.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
Duluth at SemEval-2025 Task 7: TF-IDF with Optimized Vector Dimensions for Multilingual Fact-Checked Claim Retrieval
Authors:
Shujauddin Syed,
Ted Pedersen
Abstract:
This paper presents the Duluth approach to the SemEval-2025 Task 7 on Multilingual and Crosslingual Fact-Checked Claim Retrieval. We implemented a TF-IDF-based retrieval system with experimentation on vector dimensions and tokenization strategies. Our best-performing configuration used word-level tokenization with a vocabulary size of 15,000 features, achieving an average success@10 score of 0.78…
▽ More
This paper presents the Duluth approach to the SemEval-2025 Task 7 on Multilingual and Crosslingual Fact-Checked Claim Retrieval. We implemented a TF-IDF-based retrieval system with experimentation on vector dimensions and tokenization strategies. Our best-performing configuration used word-level tokenization with a vocabulary size of 15,000 features, achieving an average success@10 score of 0.78 on the development set and 0.69 on the test set across ten languages. Our system showed stronger performance on higher-resource languages but still lagged significantly behind the top-ranked system, which achieved 0.96 average success@10. Our findings suggest that though advanced neural architectures are increasingly dominant in multilingual retrieval tasks, properly optimized traditional methods like TF-IDF remain competitive baselines, especially in limited compute resource scenarios.
△ Less
Submitted 18 May, 2025;
originally announced May 2025.
-
Energy-Efficient and Reliable Data Collection in Receiver-Initiated Wake-up Radio Enabled IoT Networks
Authors:
Syed Luqman Shah,
Ziaul Haq Abbas,
Ghulam Abbas,
Nurul Huda Mahmood
Abstract:
In unmanned aerial vehicle (UAV)-assisted wake-up radio (WuR)-enabled internet of things (IoT) networks, UAVs can instantly activate the main radios (MRs) of the sensor nodes (SNs) with a wake-up call (WuC) for efficient data collection in mission-driven data collection scenarios. However, the spontaneous response of numerous SNs to the UAV's WuC can lead to significant packet loss and collisions,…
▽ More
In unmanned aerial vehicle (UAV)-assisted wake-up radio (WuR)-enabled internet of things (IoT) networks, UAVs can instantly activate the main radios (MRs) of the sensor nodes (SNs) with a wake-up call (WuC) for efficient data collection in mission-driven data collection scenarios. However, the spontaneous response of numerous SNs to the UAV's WuC can lead to significant packet loss and collisions, as WuR does not exhibit its superiority for high-traffic loads. To address this challenge, we propose an innovative receiver-initiated WuR UAV-assisted clustering (RI-WuR-UAC) medium access control (MAC) protocol to achieve low latency and high reliability in ultra-low power consumption applications. We model the proposed protocol using the $M/G/1/2$ queuing framework and derive expressions for key performance metrics, i.e., channel busyness probability, probability of successful clustering, average SN energy consumption, and average transmission delay. The RI-WuR-UAC protocol employs three distinct data flow models, tailored to different network traffic conditions, which perform three MAC mechanisms: channel assessment (CCA) clustering for light traffic loads, backoff plus CCA clustering for dense and heavy traffic, and adaptive clustering for variable traffic loads. Simulation results demonstrate that the RI-WuR-UAC protocol significantly outperforms the benchmark sub-carrier modulation clustering protocol. By varying the network load, we capture the trade-offs among the performance metrics, showcasing the superior efficiency and reliability of the RI-WuR-UAC protocol.
△ Less
Submitted 15 May, 2025;
originally announced May 2025.
-
Sponge Attacks on Sensing AI: Energy-Latency Vulnerabilities and Defense via Model Pruning
Authors:
Syed Mhamudul Hasan,
Hussein Zangoti,
Iraklis Anagnostopoulos,
Abdur R. Shahid
Abstract:
Recent studies have shown that sponge attacks can significantly increase the energy consumption and inference latency of deep neural networks (DNNs). However, prior work has focused primarily on computer vision and natural language processing tasks, overlooking the growing use of lightweight AI models in sensing-based applications on resource-constrained devices, such as those in Internet of Thing…
▽ More
Recent studies have shown that sponge attacks can significantly increase the energy consumption and inference latency of deep neural networks (DNNs). However, prior work has focused primarily on computer vision and natural language processing tasks, overlooking the growing use of lightweight AI models in sensing-based applications on resource-constrained devices, such as those in Internet of Things (IoT) environments. These attacks pose serious threats of energy depletion and latency degradation in systems where limited battery capacity and real-time responsiveness are critical for reliable operation. This paper makes two key contributions. First, we present the first systematic exploration of energy-latency sponge attacks targeting sensing-based AI models. Using wearable sensing-based AI as a case study, we demonstrate that sponge attacks can substantially degrade performance by increasing energy consumption, leading to faster battery drain, and by prolonging inference latency. Second, to mitigate such attacks, we investigate model pruning, a widely adopted compression technique for resource-constrained AI, as a potential defense. Our experiments show that pruning-induced sparsity significantly improves model resilience against sponge poisoning. We also quantify the trade-offs between model efficiency and attack resilience, offering insights into the security implications of model compression in sensing-based AI systems deployed in IoT environments.
△ Less
Submitted 9 May, 2025;
originally announced May 2025.
-
Exploring Convolutional Neural Networks for Rice Grain Classification: An Explainable AI Approach
Authors:
Muhammad Junaid Asif,
Hamza Khan,
Rabia Tehseen,
Syed Tahir Hussain Rizvi,
Mujtaba Asad,
Shazia Saqib,
Rana Fayyaz Ahmad
Abstract:
Rice is an essential staple food worldwide that is important in promoting international trade, economic growth, and nutrition. Asian countries such as China, India, Pakistan, Thailand, Vietnam, and Indonesia are notable for their significant contribution to the cultivation and utilization of rice. These nations are also known for cultivating different rice grains, including short and long grains.…
▽ More
Rice is an essential staple food worldwide that is important in promoting international trade, economic growth, and nutrition. Asian countries such as China, India, Pakistan, Thailand, Vietnam, and Indonesia are notable for their significant contribution to the cultivation and utilization of rice. These nations are also known for cultivating different rice grains, including short and long grains. These sizes are further classified as basmati, jasmine, kainat saila, ipsala, arborio, etc., catering to diverse culinary preferences and cultural traditions. For both local and international trade, inspecting and maintaining the quality of rice grains to satisfy customers and preserve a country's reputation is necessary. Manual quality check and classification is quite a laborious and time-consuming process. It is also highly prone to mistakes. Therefore, an automatic solution must be proposed for the effective and efficient classification of different varieties of rice grains. This research paper presents an automatic framework based on a convolutional neural network (CNN) for classifying different varieties of rice grains. We evaluated the proposed model based on performance metrics such as accuracy, recall, precision, and F1-Score. The CNN model underwent rigorous training and validation, achieving a remarkable accuracy rate and a perfect area under each class's Receiver Operating Characteristic (ROC) curve. The confusion matrix analysis confirmed the model's effectiveness in distinguishing between the different rice varieties, indicating minimal misclassifications. Additionally, the integration of explainability techniques such as LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) provided valuable insights into the model's decision-making process, revealing how specific features of the rice grains influenced classification outcomes.
△ Less
Submitted 15 May, 2025; v1 submitted 7 May, 2025;
originally announced May 2025.
-
Direct Image Classification from Fourier Ptychographic Microscopy Measurements without Reconstruction
Authors:
Navya Sonal Agarwal,
Jan Philipp Schneider,
Kanchana Vaishnavi Gandikota,
Syed Muhammad Kazim,
John Meshreki,
Ivo Ihrke,
Michael Moeller
Abstract:
The computational imaging technique of Fourier Ptychographic Microscopy (FPM) enables high-resolution imaging with a wide field of view and can serve as an extremely valuable tool, e.g. in the classification of cells in medical applications. However, reconstructing a high-resolution image from tens or even hundreds of measurements is computationally expensive, particularly for a wide field of view…
▽ More
The computational imaging technique of Fourier Ptychographic Microscopy (FPM) enables high-resolution imaging with a wide field of view and can serve as an extremely valuable tool, e.g. in the classification of cells in medical applications. However, reconstructing a high-resolution image from tens or even hundreds of measurements is computationally expensive, particularly for a wide field of view. Therefore, in this paper, we investigate the idea of classifying the image content in the FPM measurements directly without performing a reconstruction step first. We show that Convolutional Neural Networks (CNN) can extract meaningful information from measurement sequences, significantly outperforming the classification on a single band-limited image (up to 12 %) while being significantly more efficient than a reconstruction of a high-resolution image. Furthermore, we demonstrate that a learned multiplexing of several raw measurements allows maintaining the classification accuracy while reducing the amount of data (and consequently also the acquisition time) significantly.
△ Less
Submitted 8 May, 2025;
originally announced May 2025.
-
Advancing Remote and Continuous Cardiovascular Patient Monitoring through a Novel and Resource-efficient IoT-Driven Framework
Authors:
Sanam Nayab,
Sohail Raza Chohan,
Aqsa Jameel,
Syed Rehan Shah,
Syed Ahsan Masud Zaidi,
Aditya Nath Jha,
Kamran Siddique
Abstract:
Cardiovascular diseases are a leading cause of fatalities worldwide, often occurring suddenly with limited time for intervention. Current healthcare monitoring systems for cardiac patients rely heavily on hospitalization, which can be impractical for continuous monitoring. This paper presents a novel IoT-based solution for remote, real-time tracking of critical cardiac metrics, addressing the pres…
▽ More
Cardiovascular diseases are a leading cause of fatalities worldwide, often occurring suddenly with limited time for intervention. Current healthcare monitoring systems for cardiac patients rely heavily on hospitalization, which can be impractical for continuous monitoring. This paper presents a novel IoT-based solution for remote, real-time tracking of critical cardiac metrics, addressing the pressing need for accessible and continuous healthcare, particularly for the aging population in Pakistan. The proposed IoT kit measures essential parameters such as body temperature, heart rate (HR), blood pressure (BP), oxygen saturation (SPO2), and electrocardiography (ECG).
A key innovation of the system is its integration with a cloud-based application, enabling constant remote monitoring and incorporating an alarm mechanism to alert medical professionals for timely intervention, reducing the risk of catastrophic incidents. The system was tested in a clinical environment with 20 participants, demonstrating results closely aligned with those obtained using standard medical devices. The findings validate the system's potential for reliable remote monitoring, offering a significant step forward in proactive cardiac healthcare management. This novel approach combines IoT technology with cloud-based applications to provide a cost-effective and efficient solution for reducing unexpected fatalities among cardiac patients.
△ Less
Submitted 6 May, 2025;
originally announced May 2025.
-
Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement
Authors:
Haofan Wu,
Yin Huang,
Yuqing Wu,
Qiuyu Yang,
Bingfang Wang,
Li Zhang,
Muhammad Fahadullah Khan,
Ali Zia,
M. Saleh Memon,
Syed Sohail Bukhari,
Abdul Fattah Memon,
Daizong Ji,
Ya Zhang,
Ghulam Mustafa,
Yin Fang
Abstract:
High-quality fundus images provide essential anatomical information for clinical screening and ophthalmic disease diagnosis. Yet, due to hardware limitations, operational variability, and patient compliance, fundus images often suffer from low resolution and signal-to-noise ratio. Recent years have witnessed promising progress in fundus image enhancement. However, existing works usually focus on r…
▽ More
High-quality fundus images provide essential anatomical information for clinical screening and ophthalmic disease diagnosis. Yet, due to hardware limitations, operational variability, and patient compliance, fundus images often suffer from low resolution and signal-to-noise ratio. Recent years have witnessed promising progress in fundus image enhancement. However, existing works usually focus on restoring structural details or global characteristics of fundus images, lacking a unified image enhancement framework to recover comprehensive multi-scale information. Moreover, few methods pinpoint the target of image enhancement, e.g., lesions, which is crucial for medical image-based diagnosis. To address these challenges, we propose a multi-scale target-aware representation learning framework (MTRL-FIE) for efficient fundus image enhancement. Specifically, we propose a multi-scale feature encoder (MFE) that employs wavelet decomposition to embed both low-frequency structural information and high-frequency details. Next, we design a structure-preserving hierarchical decoder (SHD) to fuse multi-scale feature embeddings for real fundus image restoration. SHD integrates hierarchical fusion and group attention mechanisms to achieve adaptive feature fusion while retaining local structural smoothness. Meanwhile, a target-aware feature aggregation (TFA) module is used to enhance pathological regions and reduce artifacts. Experimental results on multiple fundus image datasets demonstrate the effectiveness and generalizability of MTRL-FIE for fundus image enhancement. Compared to state-of-the-art methods, MTRL-FIE achieves superior enhancement performance with a more lightweight architecture. Furthermore, our approach generalizes to other ophthalmic image processing tasks without supervised fine-tuning, highlighting its potential for clinical applications.
△ Less
Submitted 3 May, 2025;
originally announced May 2025.
-
Design, Integration, and Evaluation of a Dual-Arm Robotic System for High Throughput Tissue Sampling from Potato Tubers
Authors:
Divyanth L. G.,
Syed Usama Bin Sabir,
Divya Rathore,
Lav R. Khot,
Chakradhar Mattupalli,
Manoj Karkee
Abstract:
Manual tissue extraction from potato tubers for molecular pathogen detection is highly laborious. This study presents a machine-vision-guided, dual-arm coordinated inline robotic system integrating tuber grasping and tissue sampling mechanisms. Tubers are transported on a conveyor that halts when a YOLOv11-based vision system detects a tuber within the workspace of a one-prismatic-degree-of-freedo…
▽ More
Manual tissue extraction from potato tubers for molecular pathogen detection is highly laborious. This study presents a machine-vision-guided, dual-arm coordinated inline robotic system integrating tuber grasping and tissue sampling mechanisms. Tubers are transported on a conveyor that halts when a YOLOv11-based vision system detects a tuber within the workspace of a one-prismatic-degree-of-freedom (P-DoF) robotic arm. This arm, equipped with a gripping end-effector, secures and positions the tuber for sampling. The second arm, a 3-P-DoF Cartesian manipulator with a biopsy punch-based end-effector, then performs tissue extraction guided by a YOLOv10-based vision system that identifies the sampling sites on the tuber such as eyes or stolon scars. The sampling involves four stages: insertion of the punch into the tuber, punch rotation for tissue detachment, biopsy punch retraction, and deposition of the tissue core onto a collection site. The system achieved an average positional error of 1.84 mm along the tuber surface and a depth deviation of 1.79 mm from a 7.00 mm target. The success rate for core extraction and deposition was 81.5%, with an average sampling cycle of 10.4 seconds. The total cost of the system components was under $1,900, demonstrating the system's potential as a cost-effective alternative to labor-intensive manual tissue sampling. Future work will focus on optimizing for multi-site sampling from a single tuber and validation in commercial settings.
△ Less
Submitted 1 May, 2025;
originally announced May 2025.
-
Red Teaming Large Language Models for Healthcare
Authors:
Vahid Balazadeh,
Michael Cooper,
David Pellow,
Atousa Assadi,
Jennifer Bell,
Mark Coastworth,
Kaivalya Deshpande,
Jim Fackler,
Gabriel Funingana,
Spencer Gable-Cook,
Anirudh Gangadhar,
Abhishek Jaiswal,
Sumanth Kaja,
Christopher Khoury,
Amrit Krishnan,
Randy Lin,
Kaden McKeen,
Sara Naimimohasses,
Khashayar Namdar,
Aviraj Newatia,
Allan Pang,
Anshul Pattoo,
Sameer Peesapati,
Diana Prepelita,
Bogdana Rakova
, et al. (10 additional authors not shown)
Abstract:
We present the design process and findings of the pre-conference workshop at the Machine Learning for Healthcare Conference (2024) entitled Red Teaming Large Language Models for Healthcare, which took place on August 15, 2024. Conference participants, comprising a mix of computational and clinical expertise, attempted to discover vulnerabilities -- realistic clinical prompts for which a large lang…
▽ More
We present the design process and findings of the pre-conference workshop at the Machine Learning for Healthcare Conference (2024) entitled Red Teaming Large Language Models for Healthcare, which took place on August 15, 2024. Conference participants, comprising a mix of computational and clinical expertise, attempted to discover vulnerabilities -- realistic clinical prompts for which a large language model (LLM) outputs a response that could cause clinical harm. Red-teaming with clinicians enables the identification of LLM vulnerabilities that may not be recognised by LLM developers lacking clinical expertise. We report the vulnerabilities found, categorise them, and present the results of a replication study assessing the vulnerabilities across all LLMs provided.
△ Less
Submitted 1 May, 2025;
originally announced May 2025.
-
The Development of Reflective Practice on a Work-Based Software Engineering Program: A Longitudinal Study
Authors:
Matthew Barr,
Syed Waqar Nabi,
Oana Andrei
Abstract:
This study examines the development of reflective practice among students on a four-year work-based Software Engineering program. Using two established models of reflection - Boud et al.'s Model of Reflective Process and Bain et al.'s 5R Framework for Reflection - we analyse a series of reflective assignments submitted by students over four years. Our longitudinal analysis reveals clear trends in…
▽ More
This study examines the development of reflective practice among students on a four-year work-based Software Engineering program. Using two established models of reflection - Boud et al.'s Model of Reflective Process and Bain et al.'s 5R Framework for Reflection - we analyse a series of reflective assignments submitted by students over four years. Our longitudinal analysis reveals clear trends in how students' reflective abilities evolve over the course of the program. We find that more sophisticated forms of reflection, such as integration of knowledge, appropriation of skills, and reconstruction of practice, increase markedly in prevalence in later years. The complementary nature of workplace experience and university study is highlighted in students' reflections, demonstrating a key benefit of the work-based learning approach. By the final year, all students demonstrate the ability to reconstruct their experiences to inform future practice. Our findings provide insight into how reflective practice develops in Software Engineering education and suggest potential value in incorporating more structured reflection into traditional degree programs. The study also reveals instances of meta-reflection, where students reflect on the value of reflection itself, indicating a deep engagement with the reflective process. While acknowledging limitations, this work offers a unique longitudinal perspective on the development of reflective practice in work-based Software Engineering education.
△ Less
Submitted 1 May, 2025; v1 submitted 29 April, 2025;
originally announced April 2025.
-
Exploiting inter-agent coupling information for efficient reinforcement learning of cooperative LQR
Authors:
Shahbaz P Qadri Syed,
He Bai
Abstract:
Developing scalable and efficient reinforcement learning algorithms for cooperative multi-agent control has received significant attention over the past years. Existing literature has proposed inexact decompositions of local Q-functions based on empirical information structures between the agents. In this paper, we exploit inter-agent coupling information and propose a systematic approach to exact…
▽ More
Developing scalable and efficient reinforcement learning algorithms for cooperative multi-agent control has received significant attention over the past years. Existing literature has proposed inexact decompositions of local Q-functions based on empirical information structures between the agents. In this paper, we exploit inter-agent coupling information and propose a systematic approach to exactly decompose the local Q-function of each agent. We develop an approximate least square policy iteration algorithm based on the proposed decomposition and identify two architectures to learn the local Q-function for each agent. We establish that the worst-case sample complexity of the decomposition is equal to the centralized case and derive necessary and sufficient graphical conditions on the inter-agent couplings to achieve better sample efficiency. We demonstrate the improved sample efficiency and computational efficiency on numerical examples.
△ Less
Submitted 29 April, 2025;
originally announced April 2025.
-
Generative AI in Education: Student Skills and Lecturer Roles
Authors:
Stefanie Krause,
Ashish Dalvi,
Syed Khubaib Zaidi
Abstract:
Generative Artificial Intelligence (GenAI) tools such as ChatGPT are emerging as a revolutionary tool in education that brings both positive aspects and challenges for educators and students, reshaping how learning and teaching are approached. This study aims to identify and evaluate the key competencies students need to effectively engage with GenAI in education and to provide strategies for lect…
▽ More
Generative Artificial Intelligence (GenAI) tools such as ChatGPT are emerging as a revolutionary tool in education that brings both positive aspects and challenges for educators and students, reshaping how learning and teaching are approached. This study aims to identify and evaluate the key competencies students need to effectively engage with GenAI in education and to provide strategies for lecturers to integrate GenAI into teaching practices. The study applied a mixed method approach with a combination of a literature review and a quantitative survey involving 130 students from South Asia and Europe to obtain its findings. The literature review identified 14 essential student skills for GenAI engagement, with AI literacy, critical thinking, and ethical AI practices emerging as the most critical. The student survey revealed gaps in prompt engineering, bias awareness, and AI output management. In our study of lecturer strategies, we identified six key areas, with GenAI Integration and Curriculum Design being the most emphasised. Our findings highlight the importance of incorporating GenAI into education. While literature prioritized ethics and policy development, students favour hands-on, project-based learning and practical AI applications. To foster inclusive and responsible GenAI adoption, institutions should ensure equitable access to GenAI tools, establish clear academic integrity policies, and advocate for global GenAI research initiatives.
△ Less
Submitted 28 April, 2025;
originally announced April 2025.
-
From Inductive to Deductive: LLMs-Based Qualitative Data Analysis in Requirements Engineering
Authors:
Syed Tauhid Ullah Shah,
Mohamad Hussein,
Ann Barcomb,
Mohammad Moshirpour
Abstract:
Requirements Engineering (RE) is essential for developing complex and regulated software projects. Given the challenges in transforming stakeholder inputs into consistent software designs, Qualitative Data Analysis (QDA) provides a systematic approach to handling free-form data. However, traditional QDA methods are time-consuming and heavily reliant on manual effort. In this paper, we explore the…
▽ More
Requirements Engineering (RE) is essential for developing complex and regulated software projects. Given the challenges in transforming stakeholder inputs into consistent software designs, Qualitative Data Analysis (QDA) provides a systematic approach to handling free-form data. However, traditional QDA methods are time-consuming and heavily reliant on manual effort. In this paper, we explore the use of Large Language Models (LLMs), including GPT-4, Mistral, and LLaMA-2, to improve QDA tasks in RE. Our study evaluates LLMs' performance in inductive (zero-shot) and deductive (one-shot, few-shot) annotation tasks, revealing that GPT-4 achieves substantial agreement with human analysts in deductive settings, with Cohen's Kappa scores exceeding 0.7, while zero-shot performance remains limited. Detailed, context-rich prompts significantly improve annotation accuracy and consistency, particularly in deductive scenarios, and GPT-4 demonstrates high reliability across repeated runs. These findings highlight the potential of LLMs to support QDA in RE by reducing manual effort while maintaining annotation quality. The structured labels automatically provide traceability of requirements and can be directly utilized as classes in domain models, facilitating systematic software design.
△ Less
Submitted 27 April, 2025;
originally announced April 2025.