-
Detecting What Matters: A Novel Approach for Out-of-Distribution 3D Object Detection in Autonomous Vehicles
Authors:
Menna Taha,
Aya Ahmed,
Mohammed Karmoose,
Yasser Gadallah
Abstract:
Autonomous vehicles (AVs) use object detection models to recognize their surroundings and make driving decisions accordingly. Conventional object detection approaches classify objects into known classes, which limits the AV's ability to detect and appropriately respond to Out-of-Distribution (OOD) objects. This problem is a significant safety concern since the AV may fail to detect objects or misc…
▽ More
Autonomous vehicles (AVs) use object detection models to recognize their surroundings and make driving decisions accordingly. Conventional object detection approaches classify objects into known classes, which limits the AV's ability to detect and appropriately respond to Out-of-Distribution (OOD) objects. This problem is a significant safety concern since the AV may fail to detect objects or misclassify them, which can potentially lead to hazardous situations such as accidents. Consequently, we propose a novel object detection approach that shifts the emphasis from conventional class-based classification to object harmfulness determination. Instead of object detection by their specific class, our method identifies them as either 'harmful' or 'harmless' based on whether they pose a danger to the AV. This is done based on the object position relative to the AV and its trajectory. With this metric, our model can effectively detect previously unseen objects to enable the AV to make safer real-time decisions. Our results demonstrate that the proposed model effectively detects OOD objects, evaluates their harmfulness, and classifies them accordingly, thus enhancing the AV decision-making effectiveness in dynamic environments.
△ Less
Submitted 29 June, 2025;
originally announced June 2025.
-
FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes
Authors:
Janki Atul Nawale,
Mohammed Safi Ur Rahman Khan,
Janani D,
Mansi Gupta,
Danish Pruthi,
Mitesh M. Khapra
Abstract:
Existing studies on fairness are largely Western-focused, making them inadequate for culturally diverse countries such as India. To address this gap, we introduce INDIC-BIAS, a comprehensive India-centric benchmark designed to evaluate fairness of LLMs across 85 identity groups encompassing diverse castes, religions, regions, and tribes. We first consult domain experts to curate over 1,800 socio-c…
▽ More
Existing studies on fairness are largely Western-focused, making them inadequate for culturally diverse countries such as India. To address this gap, we introduce INDIC-BIAS, a comprehensive India-centric benchmark designed to evaluate fairness of LLMs across 85 identity groups encompassing diverse castes, religions, regions, and tribes. We first consult domain experts to curate over 1,800 socio-cultural topics spanning behaviors and situations, where biases and stereotypes are likely to emerge. Grounded in these topics, we generate and manually validate 20,000 real-world scenario templates to probe LLMs for fairness. We structure these templates into three evaluation tasks: plausibility, judgment, and generation. Our evaluation of 14 popular LLMs on these tasks reveals strong negative biases against marginalized identities, with models frequently reinforcing common stereotypes. Additionally, we find that models struggle to mitigate bias even when explicitly asked to rationalize their decision. Our evaluation provides evidence of both allocative and representational harms that current LLMs could cause towards Indian identities, calling for a more cautious usage in practical applications. We release INDIC-BIAS as an open-source benchmark to advance research on benchmarking and mitigating biases and stereotypes in the Indian context.
△ Less
Submitted 29 June, 2025;
originally announced June 2025.
-
Zak-OFDM: Low Complexity Joint Equalization of OFDM Carriers in Doubly-Spread Channels
Authors:
Saif Khan Mohammed,
Sandesh Rao Mattu,
Nishant Mehrotra,
Venkatesh Khammammetti,
Robert Calderbank
Abstract:
We communicate over wireless channels by first estimating and then equalizing the effective channel. In Zak-OTFS (orthogonal time frequency space) modulation the carrier waveform is a pulse in the delay-Doppler (DD) domain, formally a quasi-periodic localized function with specific periods along delay and Doppler. When the channel delay spread is less than the delay period, and the channel Doppler…
▽ More
We communicate over wireless channels by first estimating and then equalizing the effective channel. In Zak-OTFS (orthogonal time frequency space) modulation the carrier waveform is a pulse in the delay-Doppler (DD) domain, formally a quasi-periodic localized function with specific periods along delay and Doppler. When the channel delay spread is less than the delay period, and the channel Doppler spread is less than the Doppler period, the response to a single Zak-OTFS carrier provides an image of the scattering environment and can be used to predict the effective channel at all other carriers. This makes channel estimation straightforward, and there is no loss in spectral efficiency since it is possible to design data and pilot signals that are mutually unbiased. However, the naive approach to equalization has complexity ${\mathcal O}(M^3N^3)$ where $M$ and $N$ are respectively the number of delay and Doppler bins in an OTFS frame. We simplify equalization by transforming Zak-OTFS information symbols to CP-OFDM (cyclic prefix orthogonal frequency division multiplexing) modulation.
Why not simply communicate with CP-OFDM? Inter-carrier interference (ICI) in CP-OFDM makes it is very challenging to acquire the complete frequency domain (FD) channel response between subcarriers in the presence of mobility and delay spread. We avoid this difficulty by estimating the effective channel in the DD domain from which we are able to reconstruct the complete FD channel response. We take advantage of CP-OFDM to design an ${\mathcal O}(M^2N^2)$ low-complexity method of jointly equalizing all subcarriers, where $MN$ is the number of subcarriers. Our approach removes the need for traditional pilots in CP-OFDM and reduces the need to vary carrier spacing with mobility.
△ Less
Submitted 28 June, 2025;
originally announced June 2025.
-
A Detailed Factor Analysis for the Political Compass Test: Navigating Ideologies of Large Language Models
Authors:
Sadia Kamal,
Lalu Prasad Yadav Prakash,
S M Rafiuddin,
Mohammed Rakib,
Arunkumar Bagavathi,
Atriya Sen,
Sagnik Ray Choudhury
Abstract:
Political Compass Test (PCT) or similar questionnaires have been used to quantify LLM's political leanings. Building on a recent line of work that examines the validity of PCT tests, we demonstrate that variation in standard generation parameters does not significantly impact the models' PCT scores. However, external factors such as prompt variations and fine-tuning individually and in combination…
▽ More
Political Compass Test (PCT) or similar questionnaires have been used to quantify LLM's political leanings. Building on a recent line of work that examines the validity of PCT tests, we demonstrate that variation in standard generation parameters does not significantly impact the models' PCT scores. However, external factors such as prompt variations and fine-tuning individually and in combination affect the same. Finally, we demonstrate that when models are fine-tuned on text datasets with higher political content than others, the PCT scores are not differentially affected. This calls for a thorough investigation into the validity of PCT and similar tests, as well as the mechanism by which political leanings are encoded in LLMs.
△ Less
Submitted 24 June, 2025;
originally announced June 2025.
-
SDRNET: Stacked Deep Residual Network for Accurate Semantic Segmentation of Fine-Resolution Remotely Sensed Images
Authors:
Naftaly Wambugu,
Ruisheng Wang,
Bo Guo,
Tianshu Yu,
Sheng Xu,
Mohammed Elhassan
Abstract:
Land cover maps generated from semantic segmentation of high-resolution remotely sensed images have drawn mucon in the photogrammetry and remote sensing research community. Currently, massive fine-resolution remotely sensed (FRRS) images acquired by improving sensing and imaging technologies become available. However, accurate semantic segmentation of such FRRS images is greatly affected by substa…
▽ More
Land cover maps generated from semantic segmentation of high-resolution remotely sensed images have drawn mucon in the photogrammetry and remote sensing research community. Currently, massive fine-resolution remotely sensed (FRRS) images acquired by improving sensing and imaging technologies become available. However, accurate semantic segmentation of such FRRS images is greatly affected by substantial class disparities, the invisibility of key ground objects due to occlusion, and object size variation. Despite the extraordinary potential in deep convolutional neural networks (DCNNs) in image feature learning and representation, extracting sufficient features from FRRS images for accurate semantic segmentation is still challenging. These challenges demand the deep learning models to learn robust features and generate sufficient feature descriptors. Specifically, learning multi-contextual features to guarantee adequate coverage of varied object sizes from the ground scene and harnessing global-local contexts to overcome class disparities challenge even profound networks. Deeper networks significantly lose spatial details due to gradual downsampling processes resulting in poor segmentation results and coarse boundaries. This article presents a stacked deep residual network (SDRNet) for semantic segmentation from FRRS images. The proposed framework utilizes two stacked encoder-decoder networks to harness long-range semantics yet preserve spatial information and dilated residual blocks (DRB) between each encoder and decoder network to capture sufficient global dependencies thus improving segmentation performance. Our experimental results obtained using the ISPRS Vaihingen and Potsdam datasets demonstrate that the SDRNet performs effectively and competitively against current DCNNs in semantic segmentation.
△ Less
Submitted 27 June, 2025;
originally announced June 2025.
-
Exploring the Design Space of 3D MLLMs for CT Report Generation
Authors:
Mohammed Baharoon,
Jun Ma,
Congyu Fang,
Augustin Toma,
Bo Wang
Abstract:
Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods tha…
▽ More
Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods that improve performance on the GREEN score by up to 10\%, achieving the 2nd place on the MICCAI 2024 AMOS-MM challenge. Our results on the 1,687 cases from the AMOS-MM dataset show that RRG is largely independent of the size of LLM under the same training protocol. We also show that larger volume size does not always improve performance if the original ViT was pre-trained on a smaller volume size. Lastly, we show that using a segmentation mask along with the CT volume improves performance. The code is publicly available at https://github.com/bowang-lab/AMOS-MM-Solution
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation
Authors:
Mohammed Rakib,
Arunkumar Bagavathi
Abstract:
Multimodal learning aims to leverage information from diverse data modalities to achieve more comprehensive performance. However, conventional multimodal models often suffer from modality imbalance, where one or a few modalities dominate model optimization, leading to suboptimal feature representation and underutilization of weak modalities. To address this challenge, we introduce Gradient-Guided…
▽ More
Multimodal learning aims to leverage information from diverse data modalities to achieve more comprehensive performance. However, conventional multimodal models often suffer from modality imbalance, where one or a few modalities dominate model optimization, leading to suboptimal feature representation and underutilization of weak modalities. To address this challenge, we introduce Gradient-Guided Distillation (G$^{2}$D), a knowledge distillation framework that optimizes the multimodal model with a custom-built loss function that fuses both unimodal and multimodal objectives. G$^{2}$D further incorporates a dynamic sequential modality prioritization (SMP) technique in the learning process to ensure each modality leads the learning process, avoiding the pitfall of stronger modalities overshadowing weaker ones. We validate G$^{2}$D on multiple real-world datasets and show that G$^{2}$D amplifies the significance of weak modalities while training and outperforms state-of-the-art methods in classification and regression tasks. Our code is available at https://github.com/rAIson-Lab/G2D.
△ Less
Submitted 27 June, 2025; v1 submitted 26 June, 2025;
originally announced June 2025.
-
Multicontinuum Homogenization for Poroelasticity Model
Authors:
Dmitry Ammosov,
Mohammed Al-Kobaisi,
Yalchin Efendiev
Abstract:
In this paper, we derive multicontinuum poroelasticity models using the multicontinuum homogenization method. Poroelasticity models are widely used in many areas of science and engineering to describe coupled flow and mechanics processes in porous media. However, in many applications, the properties of poroelastic media possess high contrast, presenting serious computational challenges. It is well…
▽ More
In this paper, we derive multicontinuum poroelasticity models using the multicontinuum homogenization method. Poroelasticity models are widely used in many areas of science and engineering to describe coupled flow and mechanics processes in porous media. However, in many applications, the properties of poroelastic media possess high contrast, presenting serious computational challenges. It is well known that standard homogenization approaches often fail to give an accurate solution due to the lack of macroscopic parameters. Multicontinuum approaches allow us to consider such cases by defining several average states known as continua. In the field of poroelasticity, multiple-network models arising from the multiple porous media theory are representatives of these approaches. In this work, we extend previous findings by deriving the generalized multicontinuum poroelasticity model. We apply the recently developed multicontinuum homogenization method and provide a rigorous derivation of multicontinuum equations. For this purpose, we formulate coupled constraint cell problems in oversampled regions to consider different homogenized effects. Then, we obtain a multicontinuum expansion of the fine-scale fields and derive the multicontinuum model supposing the smoothness of macroscopic variables. We present the most general version of equations and the simplified ones based on our numerical experiments. Numerical results are presented for different heterogeneous media cases and demonstrate the high accuracy of our proposed multicontinuum models.
△ Less
Submitted 25 June, 2025;
originally announced June 2025.
-
Exploring the Capabilities of the Frontier Large Language Models for Nuclear Energy Research
Authors:
Ahmed Almeldein,
Mohammed Alnaggar,
Rick Archibald,
Tom Beck,
Arpan Biswas,
Rike Bostelmann,
Wes Brewer,
Chris Bryan,
Christopher Calle,
Cihangir Celik,
Rajni Chahal,
Jong Youl Choi,
Arindam Chowdhury,
Mark Cianciosa,
Franklin Curtis,
Gregory Davidson,
Sebastian De Pascuale,
Lisa Fassino,
Ana Gainaru,
Yashika Ghai,
Luke Gibson,
Qian Gong,
Christopher Greulich,
Scott Greenwood,
Cory Hauck
, et al. (25 additional authors not shown)
Abstract:
The AI for Nuclear Energy workshop at Oak Ridge National Laboratory evaluated the potential of Large Language Models (LLMs) to accelerate fusion and fission research. Fourteen interdisciplinary teams explored diverse nuclear science challenges using ChatGPT, Gemini, Claude, and other AI models over a single day. Applications ranged from developing foundation models for fusion reactor control to au…
▽ More
The AI for Nuclear Energy workshop at Oak Ridge National Laboratory evaluated the potential of Large Language Models (LLMs) to accelerate fusion and fission research. Fourteen interdisciplinary teams explored diverse nuclear science challenges using ChatGPT, Gemini, Claude, and other AI models over a single day. Applications ranged from developing foundation models for fusion reactor control to automating Monte Carlo simulations, predicting material degradation, and designing experimental programs for advanced reactors. Teams employed structured workflows combining prompt engineering, deep research capabilities, and iterative refinement to generate hypotheses, prototype code, and research strategies. Key findings demonstrate that LLMs excel at early-stage exploration, literature synthesis, and workflow design, successfully identifying research gaps and generating plausible experimental frameworks. However, significant limitations emerged, including difficulties with novel materials designs, advanced code generation for modeling and simulation, and domain-specific details requiring expert validation. The successful outcomes resulted from expert-driven prompt engineering and treating AI as a complementary tool rather than a replacement for physics-based methods. The workshop validated AI's potential to accelerate nuclear energy research through rapid iteration and cross-disciplinary synthesis while highlighting the need for curated nuclear-specific datasets, workflow automation, and specialized model development. These results provide a roadmap for integrating AI tools into nuclear science workflows, potentially reducing development cycles for safer, more efficient nuclear energy systems while maintaining rigorous scientific standards.
△ Less
Submitted 26 June, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Trustworthy Efficient Communication for Distributed Learning using LQ-SGD Algorithm
Authors:
Hongyang Li,
Lincen Bai,
Caesar Wu,
Mohammed Chadli,
Said Mammar,
Pascal Bouvry
Abstract:
We propose LQ-SGD (Low-Rank Quantized Stochastic Gradient Descent), an efficient communication gradient compression algorithm designed for distributed training. LQ-SGD further develops on the basis of PowerSGD by incorporating the low-rank approximation and log-quantization techniques, which drastically reduce the communication overhead, while still ensuring the convergence speed of training and m…
▽ More
We propose LQ-SGD (Low-Rank Quantized Stochastic Gradient Descent), an efficient communication gradient compression algorithm designed for distributed training. LQ-SGD further develops on the basis of PowerSGD by incorporating the low-rank approximation and log-quantization techniques, which drastically reduce the communication overhead, while still ensuring the convergence speed of training and model accuracy. In addition, LQ-SGD and other compression-based methods show stronger resistance to gradient inversion than traditional SGD, providing a more robust and efficient optimization path for distributed learning systems.
△ Less
Submitted 22 June, 2025;
originally announced June 2025.
-
From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents
Authors:
Mohammad Amaan Sayeed,
Mohammed Talha Alam,
Raza Imam,
Shahab Saquib Sohail,
Amir Hussain
Abstract:
Centuries-old Islamic medical texts like Avicenna's Canon of Medicine and the Prophetic Tibb-e-Nabawi encode a wealth of preventive care, nutrition, and holistic therapies, yet remain inaccessible to many and underutilized in modern AI systems. Existing language-model benchmarks focus narrowly on factual recall or user preference, leaving a gap in validating culturally grounded medical guidance at…
▽ More
Centuries-old Islamic medical texts like Avicenna's Canon of Medicine and the Prophetic Tibb-e-Nabawi encode a wealth of preventive care, nutrition, and holistic therapies, yet remain inaccessible to many and underutilized in modern AI systems. Existing language-model benchmarks focus narrowly on factual recall or user preference, leaving a gap in validating culturally grounded medical guidance at scale. We propose a unified evaluation pipeline, Tibbe-AG, that aligns 30 carefully curated Prophetic-medicine questions with human-verified remedies and compares three LLMs (LLaMA-3, Mistral-7B, Qwen2-7B) under three configurations: direct generation, retrieval-augmented generation, and a scientific self-critique filter. Each answer is then assessed by a secondary LLM serving as an agentic judge, yielding a single 3C3H quality score. Retrieval improves factual accuracy by 13%, while the agentic prompt adds another 10% improvement through deeper mechanistic insight and safety considerations. Our results demonstrate that blending classical Islamic texts with retrieval and self-evaluation enables reliable, culturally sensitive medical question-answering.
△ Less
Submitted 22 June, 2025; v1 submitted 18 June, 2025;
originally announced June 2025.
-
ADAM-Dehaze: Adaptive Density-Aware Multi-Stage Dehazing for Improved Object Detection in Foggy Conditions
Authors:
Fatmah AlHindaassi,
Mohammed Talha Alam,
Fakhri Karray
Abstract:
Adverse weather conditions, particularly fog, pose a significant challenge to autonomous vehicles, surveillance systems, and other safety-critical applications by severely degrading visual information. We introduce ADAM-Dehaze, an adaptive, density-aware dehazing framework that jointly optimizes image restoration and object detection under varying fog intensities. A lightweight Haze Density Estima…
▽ More
Adverse weather conditions, particularly fog, pose a significant challenge to autonomous vehicles, surveillance systems, and other safety-critical applications by severely degrading visual information. We introduce ADAM-Dehaze, an adaptive, density-aware dehazing framework that jointly optimizes image restoration and object detection under varying fog intensities. A lightweight Haze Density Estimation Network (HDEN) classifies each input as light, medium, or heavy fog. Based on this score, the system dynamically routes the image through one of three CORUN branches: Light, Medium, or Complex, each tailored to its haze regime. A novel adaptive loss balances physical-model coherence and perceptual fidelity, ensuring both accurate defogging and preservation of fine details. On Cityscapes and the real-world RTTS benchmark, ADAM-Dehaze improves PSNR by up to 2.1 dB, reduces FADE by 30 percent, and increases object detection mAP by up to 13 points, while cutting inference time by 20 percent. These results highlight the importance of intensity-specific processing and seamless integration with downstream vision tasks. Code available at: https://github.com/talha-alam/ADAM-Dehaze.
△ Less
Submitted 16 June, 2025;
originally announced June 2025.
-
The AI Policy Module: Developing Computer Science Student Competency in AI Ethics and Policy
Authors:
James Weichert,
Daniel Dunlap,
Mohammed Farghally,
Hoda Eldardiry
Abstract:
As artificial intelligence (AI) further embeds itself into many settings across personal and professional contexts, increasing attention must be paid not only to AI ethics, but also to the governance and regulation of AI technologies through AI policy. However, the prevailing post-secondary computing curriculum is currently ill-equipped to prepare future AI practitioners to confront increasing dem…
▽ More
As artificial intelligence (AI) further embeds itself into many settings across personal and professional contexts, increasing attention must be paid not only to AI ethics, but also to the governance and regulation of AI technologies through AI policy. However, the prevailing post-secondary computing curriculum is currently ill-equipped to prepare future AI practitioners to confront increasing demands to implement abstract ethical principles and normative policy preferences into the design and development of AI systems. We believe that familiarity with the 'AI policy landscape' and the ability to translate ethical principles to practices will in the future constitute an important responsibility for even the most technically-focused AI engineers.
Toward preparing current computer science (CS) students for these new expectations, we developed an AI Policy Module to introduce discussions of AI policy into the CS curriculum. Building on a successful pilot in fall 2024, in this innovative practice full paper we present an updated and expanded version of the module, including a technical assignment on "AI regulation". We present the findings from our pilot of the AI Policy Module 2.0, evaluating student attitudes towards AI ethics and policy through pre- and post-module surveys. Following the module, students reported increased concern about the ethical impacts of AI technologies while also expressing greater confidence in their abilities to engage in discussions about AI regulation. Finally, we highlight the AI Regulation Assignment as an effective and engaging tool for exploring the limits of AI alignment and emphasizing the role of 'policy' in addressing ethical challenges.
△ Less
Submitted 18 June, 2025;
originally announced June 2025.
-
COSMMIC: Comment-Sensitive Multimodal Multilingual Indian Corpus for Summarization and Headline Generation
Authors:
Raghvendra Kumar,
S. A. Mohammed Salman,
Aryan Sahu,
Tridib Nandi,
Pragathi Y. P.,
Sriparna Saha,
Jose G. Moreno
Abstract:
Despite progress in comment-aware multimodal and multilingual summarization for English and Chinese, research in Indian languages remains limited. This study addresses this gap by introducing COSMMIC, a pioneering comment-sensitive multimodal, multilingual dataset featuring nine major Indian languages. COSMMIC comprises 4,959 article-image pairs and 24,484 reader comments, with ground-truth summar…
▽ More
Despite progress in comment-aware multimodal and multilingual summarization for English and Chinese, research in Indian languages remains limited. This study addresses this gap by introducing COSMMIC, a pioneering comment-sensitive multimodal, multilingual dataset featuring nine major Indian languages. COSMMIC comprises 4,959 article-image pairs and 24,484 reader comments, with ground-truth summaries available in all included languages. Our approach enhances summaries by integrating reader insights and feedback. We explore summarization and headline generation across four configurations: (1) using article text alone, (2) incorporating user comments, (3) utilizing images, and (4) combining text, comments, and images. To assess the dataset's effectiveness, we employ state-of-the-art language models such as LLama3 and GPT-4. We conduct a comprehensive study to evaluate different component combinations, including identifying supportive comments, filtering out noise using a dedicated comment classifier using IndicBERT, and extracting valuable insights from images with a multilingual CLIP-based classifier. This helps determine the most effective configurations for natural language generation (NLG) tasks. Unlike many existing datasets that are either text-only or lack user comments in multimodal settings, COSMMIC uniquely integrates text, images, and user feedback. This holistic approach bridges gaps in Indian language resources, advancing NLP research and fostering inclusivity.
△ Less
Submitted 18 June, 2025;
originally announced June 2025.
-
Systems-Theoretic and Data-Driven Security Analysis in ML-enabled Medical Devices
Authors:
Gargi Mitra,
Mohammadreza Hallajiyan,
Inji Kim,
Athish Pranav Dharmalingam,
Mohammed Elnawawy,
Shahrear Iqbal,
Karthik Pattabiraman,
Homa Alemzadeh
Abstract:
The integration of AI/ML into medical devices is rapidly transforming healthcare by enhancing diagnostic and treatment facilities. However, this advancement also introduces serious cybersecurity risks due to the use of complex and often opaque models, extensive interconnectivity, interoperability with third-party peripheral devices, Internet connectivity, and vulnerabilities in the underlying tech…
▽ More
The integration of AI/ML into medical devices is rapidly transforming healthcare by enhancing diagnostic and treatment facilities. However, this advancement also introduces serious cybersecurity risks due to the use of complex and often opaque models, extensive interconnectivity, interoperability with third-party peripheral devices, Internet connectivity, and vulnerabilities in the underlying technologies. These factors contribute to a broad attack surface and make threat prevention, detection, and mitigation challenging. Given the highly safety-critical nature of these devices, a cyberattack on these devices can cause the ML models to mispredict, thereby posing significant safety risks to patients. Therefore, ensuring the security of these devices from the time of design is essential. This paper underscores the urgency of addressing the cybersecurity challenges in ML-enabled medical devices at the pre-market phase. We begin by analyzing publicly available data on device recalls and adverse events, and known vulnerabilities, to understand the threat landscape of AI/ML-enabled medical devices and their repercussions on patient safety. Building on this analysis, we introduce a suite of tools and techniques designed by us to assist security analysts in conducting comprehensive premarket risk assessments. Our work aims to empower manufacturers to embed cybersecurity as a core design principle in AI/ML-enabled medical devices, thereby making them safe for patients.
△ Less
Submitted 17 June, 2025;
originally announced June 2025.
-
Intrinsic Annealing in a Hybrid Memristor-Magnetic Tunnel Junction Ising Machine
Authors:
Mohammed Akib Iftakher,
Hugo Levices,
Kamel-Eddine Harabi,
Adrien Renaudineau,
Mathieu-Coumba Faye,
Corentin Bouchard,
Florian Disdier,
Bernard Viala,
Elisa Vianello,
Philippe Talatchian,
Kevin Garello,
Damien Querlioz,
Louis Hutin
Abstract:
Hardware implementations of the Ising model offer promising solutions to large-scale optimization tasks. In the literature, various nanodevices have been shown to emulate the spin dynamics for such Ising machines with remarkable effectiveness. Other nanodevices have been shown to implement spin-spin coupling with compact footprint and minimal energy dissipation. However, an ideal Ising machine wou…
▽ More
Hardware implementations of the Ising model offer promising solutions to large-scale optimization tasks. In the literature, various nanodevices have been shown to emulate the spin dynamics for such Ising machines with remarkable effectiveness. Other nanodevices have been shown to implement spin-spin coupling with compact footprint and minimal energy dissipation. However, an ideal Ising machine would associate both types of nanodevices, and they must operate synergistically to support annealing: a progressive reduction of machine stochasticity that allows it to settle to energy minimum. Here, we report an Ising machine that combines two nanotechnologies: memristor crossbar -- storing multi-level couplings -- and stochastic magnetic tunnel junction (SMTJ), acting as thermally driven spins. Because the same read voltage that interrogates the crossbar also biases the SMTJs, increasing this voltage automatically lowers the effective temperature of the machine, providing an intrinsic, nearly circuit-free annealing technique. Operating at zero magnetic field, our prototype consistently reaches the global optimum of a 24-vertex weighted MAX-CUT and a 10-vertex, three-color graph-coloring problem. Given that both nanotechnologies in our demonstrator are CMOS-integrated, this approach is compatible with advanced 3D integration, offering a scalable pathway toward compact, fast, and energy-efficient large-scale Ising solvers.
△ Less
Submitted 17 June, 2025;
originally announced June 2025.
-
GAMORA: A Gesture Articulated Meta Operative Robotic Arm for Hazardous Material Handling in Containment-Level Environments
Authors:
Farha Abdul Wasay,
Mohammed Abdul Rahman,
Hania Ghouse
Abstract:
The convergence of robotics and virtual reality (VR) has enabled safer and more efficient workflows in high-risk laboratory settings, particularly virology labs. As biohazard complexity increases, minimizing direct human exposure while maintaining precision becomes essential. We propose GAMORA (Gesture Articulated Meta Operative Robotic Arm), a novel VR-guided robotic system that enables remote ex…
▽ More
The convergence of robotics and virtual reality (VR) has enabled safer and more efficient workflows in high-risk laboratory settings, particularly virology labs. As biohazard complexity increases, minimizing direct human exposure while maintaining precision becomes essential. We propose GAMORA (Gesture Articulated Meta Operative Robotic Arm), a novel VR-guided robotic system that enables remote execution of hazardous tasks using natural hand gestures. Unlike existing scripted automation or traditional teleoperation, GAMORA integrates the Oculus Quest 2, NVIDIA Jetson Nano, and Robot Operating System (ROS) to provide real-time immersive control, digital twin simulation, and inverse kinematics-based articulation. The system supports VR-based training and simulation while executing precision tasks in physical environments via a 3D-printed robotic arm. Inverse kinematics ensure accurate manipulation for delicate operations such as specimen handling and pipetting. The pipeline includes Unity-based 3D environment construction, real-time motion planning, and hardware-in-the-loop testing. GAMORA achieved a mean positional discrepancy of 2.2 mm (improved from 4 mm), pipetting accuracy within 0.2 mL, and repeatability of 1.2 mm across 50 trials. Integrated object detection via YOLOv8 enhances spatial awareness, while energy-efficient operation (50% reduced power output) ensures sustainable deployment. The system's digital-physical feedback loop enables safe, precise, and repeatable automation of high-risk lab tasks. GAMORA offers a scalable, immersive solution for robotic control and biosafety in biomedical research environments.
△ Less
Submitted 17 June, 2025;
originally announced June 2025.
-
Constitutive Manifold Neural Networks
Authors:
Wouter J. Schuttert,
Mohammed Iqbal Abdul Rasheed,
Bojana Rosić
Abstract:
Important material properties like thermal conductivity are often represented as symmetric positive definite (SPD) tensors, which exhibit variability due to inherent material heterogeneity and manufacturing uncertainties. These tensors reside on a curved Riemannian manifold, and accurately modeling their stochastic nature requires preserving both their symmetric positive definite properties and sp…
▽ More
Important material properties like thermal conductivity are often represented as symmetric positive definite (SPD) tensors, which exhibit variability due to inherent material heterogeneity and manufacturing uncertainties. These tensors reside on a curved Riemannian manifold, and accurately modeling their stochastic nature requires preserving both their symmetric positive definite properties and spatial symmetries. To achieve this, uncertainties are parametrized into scaling (magnitude) and rotation (orientation) components, modeled as independent random variables on a manifold structure derived from the maximum entropy principle. The propagation of such stochastic tensors through physics-based simulations necessitates computationally efficient surrogate models. However, traditional multi-layer perceptron (MLP) architectures are not well-suited for SPD tensors, as directly inputting their components fails to preserve their geometric properties, often leading to suboptimal results. To address this, we introduce Constitutive Manifold Neural Networks (CMNN). This approach introduces a preprocessing layer by mapping the SPD tensor from the curved manifold to the local tangent, a flat vector space, creating an information preserving map for input to the hidden layers of the neural networks. A case study on a steady-state heat conduction problem with stochastic anisotropic conductivity demonstrates that geometry-preserving preprocessing, such as logarithmic maps for scaling data, significantly improves learning performance over conventional MLPs. These findings underscore the importance of manifold-aware techniques when working with tensor-valued data in engineering applications.
△ Less
Submitted 16 June, 2025;
originally announced June 2025.
-
Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers
Authors:
Mohammed Mehedi Hasan,
Hao Li,
Emad Fallahzadeh,
Gopi Krishnan Rajbahadur,
Bram Adams,
Ahmed E. Hassan
Abstract:
Although Foundation Models (FMs), such as GPT-4, are increasingly used in domains like finance and software engineering, reliance on textual interfaces limits these models' real-world interaction. To address this, FM providers introduced tool calling-triggering a proliferation of frameworks with distinct tool interfaces. In late 2024, Anthropic introduced the Model Context Protocol (MCP) to standa…
▽ More
Although Foundation Models (FMs), such as GPT-4, are increasingly used in domains like finance and software engineering, reliance on textual interfaces limits these models' real-world interaction. To address this, FM providers introduced tool calling-triggering a proliferation of frameworks with distinct tool interfaces. In late 2024, Anthropic introduced the Model Context Protocol (MCP) to standardize this tool ecosystem, which has become the de facto standard with over eight million weekly SDK downloads. Despite its adoption, MCP's AI-driven, non-deterministic control flow introduces new risks to sustainability, security, and maintainability, warranting closer examination.
Towards this end, we present the first large-scale empirical study of MCP servers. Using state-of-the-art health metrics and a hybrid analysis pipeline, combining a general-purpose static analysis tool with an MCP-specific scanner, we evaluate 1,899 open-source MCP servers to assess their health, security, and maintainability. Despite MCP servers demonstrating strong health metrics, we identify eight distinct vulnerabilities - only three overlapping with traditional software vulnerabilities. Additionally, 7.2% of servers contain general vulnerabilities and 5.5% exhibit MCP-specific tool poisoning. Regarding maintainability, while 66% exhibit code smells, 14.4% contain nine bug patterns overlapping with traditional open-source software projects. These findings highlight the need for MCP-specific vulnerability detection techniques while reaffirming the value of traditional analysis and refactoring practices.
△ Less
Submitted 19 June, 2025; v1 submitted 16 June, 2025;
originally announced June 2025.
-
Delving Into the Psychology of Machines: Exploring the Structure of Self-Regulated Learning via LLM-Generated Survey Responses
Authors:
Leonie V. D. E. Vogelsmeier,
Eduardo Oliveira,
Kamila Misiejuk,
Sonsoles López-Pernas,
Mohammed Saqr
Abstract:
Large language models (LLMs) offer the potential to simulate human-like responses and behaviors, creating new opportunities for psychological science. In the context of self-regulated learning (SRL), if LLMs can reliably simulate survey responses at scale and speed, they could be used to test intervention scenarios, refine theoretical models, augment sparse datasets, and represent hard-to-reach po…
▽ More
Large language models (LLMs) offer the potential to simulate human-like responses and behaviors, creating new opportunities for psychological science. In the context of self-regulated learning (SRL), if LLMs can reliably simulate survey responses at scale and speed, they could be used to test intervention scenarios, refine theoretical models, augment sparse datasets, and represent hard-to-reach populations. However, the validity of LLM-generated survey responses remains uncertain, with limited research focused on SRL and existing studies beyond SRL yielding mixed results. Therefore, in this study, we examined LLM-generated responses to the 44-item Motivated Strategies for Learning Questionnaire (MSLQ; Pintrich \& De Groot, 1990), a widely used instrument assessing students' learning strategies and academic motivation. Particularly, we used the LLMs GPT-4o, Claude 3.7 Sonnet, Gemini 2 Flash, LLaMA 3.1-8B, and Mistral Large. We analyzed item distributions, the psychological network of the theoretical SRL dimensions, and psychometric validity based on the latent factor structure. Our results suggest that Gemini 2 Flash was the most promising LLM, showing considerable sampling variability and producing underlying dimensions and theoretical relationships that align with prior theory and empirical findings. At the same time, we observed discrepancies and limitations, underscoring both the potential and current constraints of using LLMs for simulating psychological survey data and applying it in educational contexts.
△ Less
Submitted 16 June, 2025;
originally announced June 2025.
-
Leveraging MIMIC Datasets for Better Digital Health: A Review on Open Problems, Progress Highlights, and Future Promises
Authors:
Afifa Khaled,
Mohammed Sabir,
Rizwan Qureshi,
Camillo Maria Caruso,
Valerio Guarrasi,
Suncheng Xiang,
S Kevin Zhou
Abstract:
The Medical Information Mart for Intensive Care (MIMIC) datasets have become the Kernel of Digital Health Research by providing freely accessible, deidentified records from tens of thousands of critical care admissions, enabling a broad spectrum of applications in clinical decision support, outcome prediction, and healthcare analytics. Although numerous studies and surveys have explored the predic…
▽ More
The Medical Information Mart for Intensive Care (MIMIC) datasets have become the Kernel of Digital Health Research by providing freely accessible, deidentified records from tens of thousands of critical care admissions, enabling a broad spectrum of applications in clinical decision support, outcome prediction, and healthcare analytics. Although numerous studies and surveys have explored the predictive power and clinical utility of MIMIC based models, critical challenges in data integration, representation, and interoperability remain underexplored. This paper presents a comprehensive survey that focuses uniquely on open problems. We identify persistent issues such as data granularity, cardinality limitations, heterogeneous coding schemes, and ethical constraints that hinder the generalizability and real-time implementation of machine learning models. We highlight key progress in dimensionality reduction, temporal modelling, causal inference, and privacy preserving analytics, while also outlining promising directions including hybrid modelling, federated learning, and standardized preprocessing pipelines. By critically examining these structural limitations and their implications, this survey offers actionable insights to guide the next generation of MIMIC powered digital health innovations.
△ Less
Submitted 15 June, 2025;
originally announced June 2025.
-
Konooz: Multi-domain Multi-dialect Corpus for Named Entity Recognition
Authors:
Nagham Hamad,
Mohammed Khalilia,
Mustafa Jarrar
Abstract:
We introduce Konooz, a novel multi-dimensional corpus covering 16 Arabic dialects across 10 domains, resulting in 160 distinct corpora. The corpus comprises about 777k tokens, carefully collected and manually annotated with 21 entity types using both nested and flat annotation schemes - using the Wojood guidelines. While Konooz is useful for various NLP tasks like domain adaptation and transfer le…
▽ More
We introduce Konooz, a novel multi-dimensional corpus covering 16 Arabic dialects across 10 domains, resulting in 160 distinct corpora. The corpus comprises about 777k tokens, carefully collected and manually annotated with 21 entity types using both nested and flat annotation schemes - using the Wojood guidelines. While Konooz is useful for various NLP tasks like domain adaptation and transfer learning, this paper primarily focuses on benchmarking existing Arabic Named Entity Recognition (NER) models, especially cross-domain and cross-dialect model performance. Our benchmarking of four Arabic NER models using Konooz reveals a significant drop in performance of up to 38% when compared to the in-distribution data. Furthermore, we present an in-depth analysis of domain and dialect divergence and the impact of resource scarcity. We also measured the overlap between domains and dialects using the Maximum Mean Discrepancy (MMD) metric, and illustrated why certain NER models perform better on specific dialects and domains. Konooz is open-source and publicly available at https://sina.birzeit.edu/wojood/#download
△ Less
Submitted 14 June, 2025;
originally announced June 2025.
-
Zero-Shot Scene Understanding with Multimodal Large Language Models for Automated Vehicles
Authors:
Mohammed Elhenawy,
Shadi Jaradat,
Taqwa I. Alhadidi,
Huthaifa I. Ashqar,
Ahmed Jaber,
Andry Rakotonirainy,
Mohammad Abu Tami
Abstract:
Scene understanding is critical for various downstream tasks in autonomous driving, including facilitating driver-agent communication and enhancing human-centered explainability of autonomous vehicle (AV) decisions. This paper evaluates the capability of four multimodal large language models (MLLMs), including relatively small models, to understand scenes in a zero-shot, in-context learning settin…
▽ More
Scene understanding is critical for various downstream tasks in autonomous driving, including facilitating driver-agent communication and enhancing human-centered explainability of autonomous vehicle (AV) decisions. This paper evaluates the capability of four multimodal large language models (MLLMs), including relatively small models, to understand scenes in a zero-shot, in-context learning setting. Additionally, we explore whether combining these models using an ensemble approach with majority voting can enhance scene understanding performance. Our experiments demonstrate that GPT-4o, the largest model, outperforms the others in scene understanding. However, the performance gap between GPT-4o and the smaller models is relatively modest, suggesting that advanced techniques such as improved in-context learning, retrieval-augmented generation (RAG), or fine-tuning could further optimize the smaller models' performance. We also observe mixed results with the ensemble approach: while some scene attributes show improvement in performance metrics such as F1-score, others experience a decline. These findings highlight the need for more sophisticated ensemble techniques to achieve consistent gains across all scene attributes. This study underscores the potential of leveraging MLLMs for scene understanding and provides insights into optimizing their performance for autonomous driving applications.
△ Less
Submitted 17 March, 2025;
originally announced June 2025.
-
A Lightweight IDS for Early APT Detection Using a Novel Feature Selection Method
Authors:
Bassam Noori Shaker,
Bahaa Al-Musawi,
Mohammed Falih Hassan
Abstract:
An Advanced Persistent Threat (APT) is a multistage, highly sophisticated, and covert form of cyber threat that gains unauthorized access to networks to either steal valuable data or disrupt the targeted network. These threats often remain undetected for extended periods, emphasizing the critical need for early detection in networks to mitigate potential APT consequences. In this work, we propose…
▽ More
An Advanced Persistent Threat (APT) is a multistage, highly sophisticated, and covert form of cyber threat that gains unauthorized access to networks to either steal valuable data or disrupt the targeted network. These threats often remain undetected for extended periods, emphasizing the critical need for early detection in networks to mitigate potential APT consequences. In this work, we propose a feature selection method for developing a lightweight intrusion detection system capable of effectively identifying APTs at the initial compromise stage. Our approach leverages the XGBoost algorithm and Explainable Artificial Intelligence (XAI), specifically utilizing the SHAP (SHapley Additive exPlanations) method for identifying the most relevant features of the initial compromise stage. The results of our proposed method showed the ability to reduce the selected features of the SCVIC-APT-2021 dataset from 77 to just four while maintaining consistent evaluation metrics for the suggested system. The estimated metrics values are 97% precision, 100% recall, and a 98% F1 score. The proposed method not only aids in preventing successful APT consequences but also enhances understanding of APT behavior at early stages.
△ Less
Submitted 13 June, 2025;
originally announced June 2025.
-
Smart Buildings Energy Consumption Forecasting using Adaptive Evolutionary Ensemble Learning Models
Authors:
Mehdi Neshat,
Menasha Thilakaratne,
Mohammed El-Abd,
Seyedali Mirjalili,
Amir H. Gandomi,
John Boland
Abstract:
Smart buildings are gaining popularity because they can enhance energy efficiency, lower costs, improve security, and provide a more comfortable and convenient environment for building occupants. A considerable portion of the global energy supply is consumed in the building sector and plays a pivotal role in future decarbonization pathways. To manage energy consumption and improve energy efficienc…
▽ More
Smart buildings are gaining popularity because they can enhance energy efficiency, lower costs, improve security, and provide a more comfortable and convenient environment for building occupants. A considerable portion of the global energy supply is consumed in the building sector and plays a pivotal role in future decarbonization pathways. To manage energy consumption and improve energy efficiency in smart buildings, developing reliable and accurate energy demand forecasting is crucial and meaningful. However, extending an effective predictive model for the total energy use of appliances at the building level is challenging because of temporal oscillations and complex linear and non-linear patterns. This paper proposes three hybrid ensemble predictive models, incorporating Bagging, Stacking, and Voting mechanisms combined with a fast and effective evolutionary hyper-parameters tuner. The performance of the proposed energy forecasting model was evaluated using a hybrid dataset comprising meteorological parameters, appliance energy use, temperature, humidity, and lighting energy consumption from various sections of a building, collected by 18 sensors located in Stambroek, Mons, Belgium. To provide a comparative framework and investigate the efficiency of the proposed predictive model, 15 popular machine learning (ML) models, including two classic ML models, three NNs, a Decision Tree (DT), a Random Forest (RF), two Deep Learning (DL) and six Ensemble models, were compared. The prediction results indicate that the adaptive evolutionary bagging model surpassed other predictive models in both accuracy and learning error. Notably, it achieved accuracy gains of 12.6%, 13.7%, 12.9%, 27.04%, and 17.4% compared to Extreme Gradient Boosting (XGB), Categorical Boosting (CatBoost), GBM, LGBM, and Random Forest (RF).
△ Less
Submitted 13 June, 2025;
originally announced June 2025.
-
Machine Learning-Based Quantification of Vesicoureteral Reflux with Enhancing Accuracy and Efficiency
Authors:
Muhyeeddin Alqaraleh,
Mowafaq Salem Alzboon,
Mohammad Subhi Al-Batah,
Lana Yasin Al Aesa,
Mohammed Hasan Abu-Arqoub,
Rashiq Rafiq Marie,
Firas Hussein Alsmad
Abstract:
Vesicoureteral reflux (VUR) is traditionally assessed using subjective grading systems, which introduces variability in diagnosis. This study investigates the use of machine learning to improve diagnostic consistency by analyzing voiding cystourethrogram (VCUG) images. A total of 113 VCUG images were reviewed, with expert grading of VUR severity. Nine image-based features were selected to train si…
▽ More
Vesicoureteral reflux (VUR) is traditionally assessed using subjective grading systems, which introduces variability in diagnosis. This study investigates the use of machine learning to improve diagnostic consistency by analyzing voiding cystourethrogram (VCUG) images. A total of 113 VCUG images were reviewed, with expert grading of VUR severity. Nine image-based features were selected to train six predictive models: Logistic Regression, Decision Tree, Gradient Boosting, Neural Network, and Stochastic Gradient Descent. The models were evaluated using leave-one-out cross-validation. Analysis identified deformation patterns in the renal calyces as key indicators of high-grade VUR. All models achieved accurate classifications with no false positives or negatives. High sensitivity to subtle image patterns characteristic of different VUR grades was confirmed by substantial Area Under the Curve (AUC) values. The results suggest that machine learning can offer an objective and standardized alternative to current subjective VUR assessments. These findings highlight renal calyceal deformation as a strong predictor of severe cases. Future research should aim to expand the dataset, refine imaging features, and improve model generalizability for broader clinical use.
△ Less
Submitted 13 June, 2025;
originally announced June 2025.
-
Collaboration Tools and their Role in Agile Software Projects
Authors:
Raman Mohammed Hussein,
Bryar A. Hassan
Abstract:
The purpose of this review is to understand the importance of collaboration tools which are Slack, Microsoft Teams, Confluence in Agile and software projects. Agile methodologies rely on flexibility, using cycles and integration throughout various levels of developing cycles. However, it is still a great problem for many teams to collaborate and communicate even if staff members and teams are work…
▽ More
The purpose of this review is to understand the importance of collaboration tools which are Slack, Microsoft Teams, Confluence in Agile and software projects. Agile methodologies rely on flexibility, using cycles and integration throughout various levels of developing cycles. However, it is still a great problem for many teams to collaborate and communicate even if staff members and teams are working remotely. In terms of collaboration, the applications and technologies mean better organization of work, increased mutually understandable openness and fast and efficient inter team and interpersonal interactions to enhance results of projects into productivity. This paper examines how these tools fit the Agile principles, how they facilitate iterative development, and encouraging effective initiation and tracking of tasks in small and large projects. The insights focus on how Slack, Microsoft Teams, and Confluence are essential for gaining better task coordination, supporting knowledge sharing, and adopting agile values across cross-functional contexts.
△ Less
Submitted 18 February, 2025;
originally announced June 2025.
-
A Comprehensive Survey of Unmanned Aerial Systems' Risks and Mitigation Strategies
Authors:
Sharad Shrestha,
Mohammed Ababneh,
Satyajayant Misra,
Henry M. Cathey, Jr.,
Roopa Vishwanathan,
Matt Jansen,
Jinhong Choi,
Rakesh Bobba,
Yeongjin Jang
Abstract:
In the last decade, the rapid growth of Unmanned Aircraft Systems (UAS) and Unmanned Aircraft Vehicles (UAV) in communication, defense, and transportation has increased. The application of UAS will continue to increase rapidly. This has led researchers to examine security vulnerabilities in various facets of UAS infrastructure and UAVs, which form a part of the UAS system to reinforce these critic…
▽ More
In the last decade, the rapid growth of Unmanned Aircraft Systems (UAS) and Unmanned Aircraft Vehicles (UAV) in communication, defense, and transportation has increased. The application of UAS will continue to increase rapidly. This has led researchers to examine security vulnerabilities in various facets of UAS infrastructure and UAVs, which form a part of the UAS system to reinforce these critical systems. This survey summarizes the cybersecurity vulnerabilities in several phases of UAV deployment, the likelihood of each vulnerability's occurrence, the impact of attacks, and mitigation strategies that could be applied. We go beyond the state-of-the-art by taking a comprehensive approach to enhancing UAS security by performing an analysis of both UAS-specific and non-UAS-specific mitigation strategies that are applicable within the UAS domain to define the lessons learned. We also present relevant cybersecurity standards and their recommendations in the UAS context. Despite the significant literature in UAS security and the relevance of cyberphysical and networked systems security approaches from the past, which we identify in the survey, we find several critical research gaps that require further investigation. These form part of our discussions and recommendations for the future exploration by our research community.
△ Less
Submitted 11 June, 2025;
originally announced June 2025.
-
SHIELD: Multi-task Multi-distribution Vehicle Routing Solver with Sparsity and Hierarchy
Authors:
Yong Liang Goh,
Zhiguang Cao,
Yining Ma,
Jianan Zhou,
Mohammed Haroon Dupty,
Wee Sun Lee
Abstract:
Recent advances toward foundation models for routing problems have shown great potential of a unified deep model for various VRP variants. However, they overlook the complex real-world customer distributions. In this work, we advance the Multi-Task VRP (MTVRP) setting to the more realistic yet challenging Multi-Task Multi-Distribution VRP (MTMDVRP) setting, and introduce SHIELD, a novel model that…
▽ More
Recent advances toward foundation models for routing problems have shown great potential of a unified deep model for various VRP variants. However, they overlook the complex real-world customer distributions. In this work, we advance the Multi-Task VRP (MTVRP) setting to the more realistic yet challenging Multi-Task Multi-Distribution VRP (MTMDVRP) setting, and introduce SHIELD, a novel model that leverages both sparsity and hierarchy principles. Building on a deeper decoder architecture, we first incorporate the Mixture-of-Depths (MoD) technique to enforce sparsity. This improves both efficiency and generalization by allowing the model to dynamically select nodes to use or skip each decoder layer, providing the needed capacity to adaptively allocate computation for learning the task/distribution specific and shared representations. We also develop a context-based clustering layer that exploits the presence of hierarchical structures in the problems to produce better local representations. These two designs inductively bias the network to identify key features that are common across tasks and distributions, leading to significantly improved generalization on unseen ones. Our empirical results demonstrate the superiority of our approach over existing methods on 9 real-world maps with 16 VRP variants each.
△ Less
Submitted 11 June, 2025; v1 submitted 9 June, 2025;
originally announced June 2025.
-
Explainable AI for Enhancing IDS Against Advanced Persistent Kill Chain
Authors:
Bassam Noori Shaker,
Bahaa Al-Musawi,
Mohammed Falih Hassan
Abstract:
Advanced Persistent Threats (APTs) represent a sophisticated and persistent cy-bersecurity challenge, characterized by stealthy, multi-phase, and targeted attacks aimed at compromising information systems over an extended period. Develop-ing an effective Intrusion Detection System (IDS) capable of detecting APTs at different phases relies on selecting network traffic features. However, not all of…
▽ More
Advanced Persistent Threats (APTs) represent a sophisticated and persistent cy-bersecurity challenge, characterized by stealthy, multi-phase, and targeted attacks aimed at compromising information systems over an extended period. Develop-ing an effective Intrusion Detection System (IDS) capable of detecting APTs at different phases relies on selecting network traffic features. However, not all of these features are directly related to the phases of APTs. Some network traffic features may be unrelated or have limited relevance to identifying malicious ac-tivity. Therefore, it is important to carefully select and analyze the most relevant features to improve the IDS performance. This work proposes a feature selection and classification model that integrates two prominent machine learning algo-rithms: SHapley Additive exPlanations (SHAP) and Extreme Gradient Boosting (XGBoost). The aim is to develop lightweight IDS based on a selected minimum number of influential features for detecting APTs at various phases. The pro-posed method also specifies the relevant features for each phase of APTs inde-pendently. Extensive experimental results on the SCVIC-APT-2021 dataset indi-cated that our proposed approach has improved performance compared to other standard techniques. Specifically, both the macro-average F1-score and recall reached 94% and 93 %, respectively, while reducing the complexity of the detec-tion model by selecting only 12 features out of 77.
△ Less
Submitted 9 June, 2025;
originally announced June 2025.
-
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model
Authors:
Bhuiyan Sanjid Shafique,
Ashmal Vayani,
Muhammad Maaz,
Hanoona Abdul Rasheed,
Dinura Dissanayake,
Mohammed Irfan Kurpath,
Yahya Hmaiti,
Go Inoue,
Jean Lahoud,
Md. Safirur Rashid,
Shadid Intisar Quasem,
Maheen Fatima,
Franco Vidal,
Mykola Maslych,
Ketan Pravin More,
Sanoojan Baliah,
Hasindri Watawana,
Yuhao Li,
Fabian Farestam,
Leon Schaller,
Roman Tymtsiv,
Simon Weber,
Hisham Cholakkal,
Ivan Laptev,
Shin'ichi Satoh
, et al. (4 additional authors not shown)
Abstract:
Large multimodal models (LMMs) have recently gained attention due to their effectiveness to understand and generate descriptions of visual content. Most existing LMMs are in English language. While few recent works explore multilingual image LMMs, to the best of our knowledge, moving beyond the English language for cultural and linguistic inclusivity is yet to be investigated in the context of vid…
▽ More
Large multimodal models (LMMs) have recently gained attention due to their effectiveness to understand and generate descriptions of visual content. Most existing LMMs are in English language. While few recent works explore multilingual image LMMs, to the best of our knowledge, moving beyond the English language for cultural and linguistic inclusivity is yet to be investigated in the context of video LMMs. In pursuit of more inclusive video LMMs, we introduce a multilingual Video LMM benchmark, named ViMUL-Bench, to evaluate Video LMMs across 14 languages, including both low- and high-resource languages: English, Chinese, Spanish, French, German, Hindi, Arabic, Russian, Bengali, Urdu, Sinhala, Tamil, Swedish, and Japanese. Our ViMUL-Bench is designed to rigorously test video LMMs across 15 categories including eight culturally diverse categories, ranging from lifestyles and festivals to foods and rituals and from local landmarks to prominent cultural personalities. ViMUL-Bench comprises both open-ended (short and long-form) and multiple-choice questions spanning various video durations (short, medium, and long) with 8k samples that are manually verified by native language speakers. In addition, we also introduce a machine translated multilingual video training set comprising 1.2 million samples and develop a simple multilingual video LMM, named ViMUL, that is shown to provide a better tradeoff between high-and low-resource languages for video understanding. We hope our ViMUL-Bench and multilingual video LMM along with a large-scale multilingual video training set will help ease future research in developing cultural and linguistic inclusive multilingual video LMMs. Our proposed benchmark, video LMM and training data will be publicly released at https://mbzuai-oryx.github.io/ViMUL/.
△ Less
Submitted 8 June, 2025;
originally announced June 2025.
-
Towards Foundation Model on Temporal Knowledge Graph Reasoning
Authors:
Jiaxin Pan,
Mojtaba Nayyeri,
Osama Mohammed,
Daniel Hernandez,
Rongchuan Zhang,
Cheng Cheng,
Steffen Staab
Abstract:
Temporal Knowledge Graphs (TKGs) store temporal facts with quadruple formats (s, p, o, t). Existing Temporal Knowledge Graph Embedding (TKGE) models perform link prediction tasks in transductive or semi-inductive settings, which means the entities, relations, and temporal information in the test graph are fully or partially observed during training. Such reliance on seen elements during inference…
▽ More
Temporal Knowledge Graphs (TKGs) store temporal facts with quadruple formats (s, p, o, t). Existing Temporal Knowledge Graph Embedding (TKGE) models perform link prediction tasks in transductive or semi-inductive settings, which means the entities, relations, and temporal information in the test graph are fully or partially observed during training. Such reliance on seen elements during inference limits the models' ability to transfer to new domains and generalize to real-world scenarios. A central limitation is the difficulty in learning representations for entities, relations, and timestamps that are transferable and not tied to dataset-specific vocabularies. To overcome these limitations, we introduce the first fully-inductive approach to temporal knowledge graph link prediction. Our model employs sinusoidal positional encodings to capture fine-grained temporal patterns and generates adaptive entity and relation representations using message passing conditioned on both local and global temporal contexts. Our model design is agnostic to temporal granularity and time span, effectively addressing temporal discrepancies across TKGs and facilitating time-aware structural information transfer. As a pretrained, scalable, and transferable model, POSTRA demonstrates strong zero-shot performance on unseen temporal knowledge graphs, effectively generalizing to novel entities, relations, and timestamps. Extensive theoretical analysis and empirical results show that a single pretrained model can improve zero-shot performance on various inductive temporal reasoning scenarios, marking a significant step toward a foundation model for temporal KGs.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
Uncertainty-Aware Multi-view Arrhythmia Classification from ECG
Authors:
Mohd Ashhad,
Sana Rahmani,
Mohammed Fayiz,
Ali Etemad,
Javad Hashemi
Abstract:
We propose a deep neural architecture that performs uncertainty-aware multi-view classification of arrhythmia from ECG. Our method learns two different views (1D and 2D) of single-lead ECG to capture different types of information. We use a fusion technique to reduce the conflict between the different views caused by noise and artifacts in ECG data, thus incorporating uncertainty to obtain stronge…
▽ More
We propose a deep neural architecture that performs uncertainty-aware multi-view classification of arrhythmia from ECG. Our method learns two different views (1D and 2D) of single-lead ECG to capture different types of information. We use a fusion technique to reduce the conflict between the different views caused by noise and artifacts in ECG data, thus incorporating uncertainty to obtain stronger final predictions. Our framework contains the following three modules (1) a time-series module to learn the morphological features from ECG; (2) an image-space learning module to learn the spatiotemporal features; and (3) the uncertainty-aware fusion module to fuse the information from the two different views. Experimental results on two real-world datasets demonstrate that our framework not only improves the performance on arrhythmia classification compared to the state-of-the-art but also shows better robustness to noise and artifacts present in ECG.
△ Less
Submitted 1 June, 2025;
originally announced June 2025.
-
Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding
Authors:
Emmanouil Zaranis,
António Farinhas,
Saul Santos,
Beatriz Canaverde,
Miguel Moura Ramos,
Aditya K Surikuchi,
André Viveiros,
Baohao Liao,
Elena Bueno-Benito,
Nithin Sivakumaran,
Pavlo Vasylenko,
Shoubin Yu,
Sonal Sannigrahi,
Wafaa Mohammed,
Ben Peters,
Danae Sánchez Villegas,
Elias Stengel-Eskin,
Giuseppe Attanasio,
Jaehong Yoon,
Stella Frank,
Alessandro Suglia,
Chrysoula Zerva,
Desmond Elliott,
Mariella Dimiccoli,
Mohit Bansal
, et al. (6 additional authors not shown)
Abstract:
Despite recent progress in vision-language models (VLMs), holistic understanding of long-form video content remains a significant challenge, partly due to limitations in current benchmarks. Many focus on peripheral, ``needle-in-a-haystack'' details, encouraging context-insensitive retrieval over deep comprehension. Others rely on large-scale, semi-automatically generated questions (often produced…
▽ More
Despite recent progress in vision-language models (VLMs), holistic understanding of long-form video content remains a significant challenge, partly due to limitations in current benchmarks. Many focus on peripheral, ``needle-in-a-haystack'' details, encouraging context-insensitive retrieval over deep comprehension. Others rely on large-scale, semi-automatically generated questions (often produced by language models themselves) that are easier for models to answer but fail to reflect genuine understanding. In this paper, we introduce MF$^2$, a new benchmark for evaluating whether models can comprehend, consolidate, and recall key narrative information from full-length movies (50-170 minutes long). MF$^2$ includes over 50 full-length, open-licensed movies, each paired with manually constructed sets of claim pairs -- one true (fact) and one plausible but false (fib), totalling over 850 pairs. These claims target core narrative elements such as character motivations and emotions, causal chains, and event order, and refer to memorable moments that humans can recall without rewatching the movie. Instead of multiple-choice formats, we adopt a binary claim evaluation protocol: for each pair, models must correctly identify both the true and false claims. This reduces biases like answer ordering and enables a more precise assessment of reasoning. Our experiments demonstrate that both open-weight and closed state-of-the-art models fall well short of human performance, underscoring the relative ease of the task for humans and their superior ability to retain and reason over critical narrative information -- an ability current VLMs lack.
△ Less
Submitted 6 June, 2025;
originally announced June 2025.
-
Sentiment Analysis in Learning Management Systems Understanding Student Feedback at Scale
Authors:
Mohammed Almutairi
Abstract:
During the wake of the Covid-19 pandemic, the educational paradigm has experienced a major change from in person learning traditional to online platforms. The change of learning convention has impacted the teacher-student especially in non-verbal communication. The absent of non-verbal communication has led to a reliance on verbal feedback which diminished the efficacy of the educational experienc…
▽ More
During the wake of the Covid-19 pandemic, the educational paradigm has experienced a major change from in person learning traditional to online platforms. The change of learning convention has impacted the teacher-student especially in non-verbal communication. The absent of non-verbal communication has led to a reliance on verbal feedback which diminished the efficacy of the educational experience. This paper explores the integration of sentiment analysis into learning management systems (LMS) to bridge the student-teacher's gap by offering an alternative approach to interpreting student feedback beyond its verbal context. The research involves data preparation, feature selection, and the development of a deep neural network model encompassing word embedding, LSTM, and attention mechanisms. This model is compared against a logistic regression baseline to evaluate its efficacy in understanding student feedback. The study aims to bridge the communication gap between instructors and students in online learning environments, offering insights into the emotional context of student feedback and ultimately improving the quality of online education.
△ Less
Submitted 5 June, 2025;
originally announced June 2025.
-
Teaming in the AI Era: AI-Augmented Frameworks for Forming, Simulating, and Optimizing Human Teams
Authors:
Mohammed Almutairi
Abstract:
Effective teamwork is essential across diverse domains. During the team formation stage, a key challenge is forming teams that effectively balance user preferences with task objectives to enhance overall team satisfaction. In the team performing stage, maintaining cohesion and engagement is critical for sustaining high team performance. However, existing computational tools and algorithms for team…
▽ More
Effective teamwork is essential across diverse domains. During the team formation stage, a key challenge is forming teams that effectively balance user preferences with task objectives to enhance overall team satisfaction. In the team performing stage, maintaining cohesion and engagement is critical for sustaining high team performance. However, existing computational tools and algorithms for team optimization often rely on static data inputs, narrow algorithmic objectives, or solutions tailored for specific contexts, failing to account for the dynamic interplay of team members personalities, evolving goals, and changing individual preferences. Therefore, teams may encounter member dissatisfaction, as purely algorithmic assignments can reduce members commitment to team goals or experience suboptimal engagement due to the absence of timely, personalized guidance to help members adjust their behaviors and interactions as team dynamics evolve. Ultimately, these challenges can lead to reduced overall team performance. My Ph.D. dissertation aims to develop AI-augmented team optimization frameworks and practical systems that enhance team satisfaction, engagement, and performance. First, I propose a team formation framework that leverages a multi-armed bandit algorithm to iteratively refine team composition based on user preferences, ensuring alignment between individual needs and collective team goals to enhance team satisfaction. Second, I introduce tAIfa (Team AI Feedback Assistant), an AI-powered system that utilizes large language models (LLMs) to deliver immediate, personalized feedback to both teams and individual members, enhancing cohesion and engagement. Finally, I present PuppeteerLLM, an LLM-based simulation framework that simulates multi-agent teams to model complex team dynamics within realistic environments, incorporating task-driven collaboration and long-term coordination.
△ Less
Submitted 6 June, 2025; v1 submitted 5 June, 2025;
originally announced June 2025.
-
Towards Effective Multidisciplinary Health and HCI Teams based on AI Framework
Authors:
Mohammed Almutairi,
Diego Gómez-Zará
Abstract:
As a Ph.D. student with a diverse background in both public and private sectors, I have encountered numerous challenges in cross-disciplinary and multi-stakeholder team projects. My research on developing team compositions that involve multidisciplinary members from fields including education, academia, and health. Along with my advisor, we are focused on exploring how HCI can help individuals ass…
▽ More
As a Ph.D. student with a diverse background in both public and private sectors, I have encountered numerous challenges in cross-disciplinary and multi-stakeholder team projects. My research on developing team compositions that involve multidisciplinary members from fields including education, academia, and health. Along with my advisor, we are focused on exploring how HCI can help individuals assemble more effective teams. This effort involves developing socio-technical systems that guide and inform individuals of the potential teams that they can assemble. We employ state-of-the-art algorithms that prioritize inclusion among team members from diverse areas of expertise and familiarity between the team members. Our goal for attending this workshop is to engage in meaningful dialogues with scholars and researchers, leveraging these interactions to refine our approach to building an AI-driven team composition system to foster effective, interdisciplinary collaboration in health-focused HCI research.
△ Less
Submitted 5 June, 2025;
originally announced June 2025.
-
Vision-Based Autonomous MM-Wave Reflector Using ArUco-Driven Angle-of-Arrival Estimation
Authors:
Josue Marroquin,
Nan Inzali,
Miles Dillon Lantz,
Campbell Freeman,
Amod Ashtekar,
\\Ajinkya Umesh Mulik,
Mohammed E Eltayeb
Abstract:
Reliable millimeter-wave (mmWave) communication in non-line-of-sight (NLoS) conditions remains a major challenge for both military and civilian operations, especially in urban or infrastructure-limited environments. This paper presents a vision-aided autonomous reflector system designed to enhance mmWave link performance by dynamically steering signal reflections using a motorized metallic plate.…
▽ More
Reliable millimeter-wave (mmWave) communication in non-line-of-sight (NLoS) conditions remains a major challenge for both military and civilian operations, especially in urban or infrastructure-limited environments. This paper presents a vision-aided autonomous reflector system designed to enhance mmWave link performance by dynamically steering signal reflections using a motorized metallic plate. The proposed system leverages a monocular camera to detect ArUco markers on allied transmitter and receiver nodes, estimate their angles of arrival, and align the reflector in real time for optimal signal redirection. This approach enables selective beam coverage by serving only authenticated targets with visible markers and reduces the risk of unintended signal exposure. The designed prototype, built on a Raspberry Pi 4 and low-power hardware, operates autonomously without reliance on external infrastructure or GPS. Experimental results at 60\,GHz demonstrate a 23\,dB average gain in received signal strength and an 0.89 probability of maintaining signal reception above a target threshold of -65 dB in an indoor environment, far exceeding the static and no-reflector baselines. These results demonstrate the system's potential for resilient and adaptive mmWave connectivity in complex and dynamic environments.
△ Less
Submitted 5 June, 2025;
originally announced June 2025.
-
Classifying Dental Care Providers Through Machine Learning with Features Ranking
Authors:
Mohammad Subhi Al-Batah,
Mowafaq Salem Alzboon,
Muhyeeddin Alqaraleh,
Mohammed Hasan Abu-Arqoub,
Rashiq Rafiq Marie
Abstract:
This study investigates the application of machine learning (ML) models for classifying dental providers into two categories - standard rendering providers and safety net clinic (SNC) providers - using a 2018 dataset of 24,300 instances with 20 features. The dataset, characterized by high missing values (38.1%), includes service counts (preventive, treatment, exams), delivery systems (FFS, managed…
▽ More
This study investigates the application of machine learning (ML) models for classifying dental providers into two categories - standard rendering providers and safety net clinic (SNC) providers - using a 2018 dataset of 24,300 instances with 20 features. The dataset, characterized by high missing values (38.1%), includes service counts (preventive, treatment, exams), delivery systems (FFS, managed care), and beneficiary demographics. Feature ranking methods such as information gain, Gini index, and ANOVA were employed to identify critical predictors, revealing treatment-related metrics (TXMT_USER_CNT, TXMT_SVC_CNT) as top-ranked features. Twelve ML models, including k-Nearest Neighbors (kNN), Decision Trees, Support Vector Machines (SVM), Stochastic Gradient Descent (SGD), Random Forest, Neural Networks, and Gradient Boosting, were evaluated using 10-fold cross-validation. Classification accuracy was tested across incremental feature subsets derived from rankings. The Neural Network achieved the highest accuracy (94.1%) using all 20 features, followed by Gradient Boosting (93.2%) and Random Forest (93.0%). Models showed improved performance as more features were incorporated, with SGD and ensemble methods demonstrating robustness to missing data. Feature ranking highlighted the dominance of treatment service counts and annotation codes in distinguishing provider types, while demographic variables (AGE_GROUP, CALENDAR_YEAR) had minimal impact. The study underscores the importance of feature selection in enhancing model efficiency and accuracy, particularly in imbalanced healthcare datasets. These findings advocate for integrating feature-ranking techniques with advanced ML algorithms to optimize dental provider classification, enabling targeted resource allocation for underserved populations.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
Point Cloud Quality Assessment Using the Perceptual Clustering Weighted Graph (PCW-Graph) and Attention Fusion Network
Authors:
Abdelouahed Laazoufi,
Mohammed El Hassouni,
Hocine Cherifi
Abstract:
No-Reference Point Cloud Quality Assessment (NR-PCQA) is critical for evaluating 3D content in real-world applications where reference models are unavailable.
No-Reference Point Cloud Quality Assessment (NR-PCQA) is critical for evaluating 3D content in real-world applications where reference models are unavailable.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
A Threat Intelligence Event Extraction Conceptual Model for Cyber Threat Intelligence Feeds
Authors:
Jamal H. Al-Yasiri,
Mohamad Fadli Bin Zolkipli,
Nik Fatinah N Mohd Farid,
Mohammed Alsamman,
Zainab Ali Mohammed
Abstract:
In response to the escalating cyber threats, the efficiency of Cyber Threat Intelligence (CTI) data collection has become paramount in ensuring robust cybersecurity. However, existing works encounter significant challenges in preprocessing large volumes of multilingual threat data, leading to inefficiencies in real-time threat analysis. This paper presents a systematic review of current techniques…
▽ More
In response to the escalating cyber threats, the efficiency of Cyber Threat Intelligence (CTI) data collection has become paramount in ensuring robust cybersecurity. However, existing works encounter significant challenges in preprocessing large volumes of multilingual threat data, leading to inefficiencies in real-time threat analysis. This paper presents a systematic review of current techniques aimed at enhancing CTI data collection efficiency. Additionally, it proposes a conceptual model to further advance the effectiveness of threat intelligence feeds. Following the PRISMA guidelines, the review examines relevant studies from the Scopus database, highlighting the critical role of artificial intelligence (AI) and machine learning models in optimizing CTI data preprocessing. The findings underscore the importance of AI-driven methods, particularly supervised and unsupervised learning, in significantly improving the accuracy of threat detection and event extraction, thereby strengthening cybersecurity. Furthermore, the study identifies a gap in the existing research and introduces XBC conceptual model integrating XLM-RoBERTa, BiGRU, and CRF, specifically developed to address this gap. This paper contributes conceptually to the field by providing a detailed analysis of current CTI data collection techniques and introducing an innovative conceptual model to enhance future threat intelligence capabilities.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
A Foundation Model for Spatial Proteomics
Authors:
Muhammad Shaban,
Yuzhou Chang,
Huaying Qiu,
Yao Yu Yeo,
Andrew H. Song,
Guillaume Jaume,
Yuchen Wang,
Luca L. Weishaupt,
Tong Ding,
Anurag Vaidya,
Abdallah Lamane,
Daniel Shao,
Mohammed Zidane,
Yunhao Bai,
Paige McCallum,
Shuli Luo,
Wenrui Wu,
Yang Wang,
Precious Cramer,
Chi Ngai Chan,
Pierre Stephan,
Johanna Schaffenrath,
Jia Le Lee,
Hendrik A. Michel,
Caiwei Tian
, et al. (35 additional authors not shown)
Abstract:
Foundation models have begun to transform image analysis by acting as pretrained generalist backbones that can be adapted to many tasks even when post-training data are limited, yet their impact on spatial proteomics, imaging that maps proteins at single-cell resolution, remains limited. Here, we introduce KRONOS, a foundation model built for spatial proteomics. KRONOS was trained in a self-superv…
▽ More
Foundation models have begun to transform image analysis by acting as pretrained generalist backbones that can be adapted to many tasks even when post-training data are limited, yet their impact on spatial proteomics, imaging that maps proteins at single-cell resolution, remains limited. Here, we introduce KRONOS, a foundation model built for spatial proteomics. KRONOS was trained in a self-supervised manner on over 47 million image patches covering 175 protein markers, 16 tissue types, and 8 fluorescence-based imaging platforms. We introduce key architectural adaptations to address the high-dimensional, multi-channel, and heterogeneous nature of multiplex imaging. We demonstrate that KRONOS learns biologically meaningful representations across multiple scales, ranging from cellular and microenvironment to tissue levels, enabling it to address diverse downstream tasks, including cell phenotyping, region classification, and patient stratification. Evaluated across 11 independent cohorts, KRONOS achieves state-of-the-art performance across cell phenotyping, treatment response prediction, and retrieval tasks, and is highly data-efficient. KRONOS also introduces the paradigm of segmentation-free patch-level processing for efficient and scalable spatial proteomics analysis, allowing cross-institutional comparisons, and as an image reverse search engine for spatial patterns. Together, these results position KRONOS as a flexible and scalable tool for spatial proteomics. The model is publicly accessible at https://github.com/mahmoodlab/KRONOS.
△ Less
Submitted 3 June, 2025;
originally announced June 2025.
-
Deep Learning for Retinal Degeneration Assessment: A Comprehensive Analysis of the MARIO AMD Progression Challenge
Authors:
Rachid Zeghlache,
Ikram Brahim,
Pierre-Henri Conze,
Mathieu Lamard,
Mohammed El Amine Lazouni,
Zineb Aziza Elaouaber,
Leila Ryma Lazouni,
Christopher Nielsen,
Ahmad O. Ahsan,
Matthias Wilms,
Nils D. Forkert,
Lovre Antonio Budimir,
Ivana Matovinović,
Donik Vršnak,
Sven Lončarić,
Philippe Zhang,
Weili Jiang,
Yihao Li,
Yiding Hao,
Markus Frohmann,
Patrick Binder,
Marcel Huber,
Taha Emre,
Teresa Finisterra Araújo,
Marzieh Oghbaie
, et al. (25 additional authors not shown)
Abstract:
The MARIO challenge, held at MICCAI 2024, focused on advancing the automated detection and monitoring of age-related macular degeneration (AMD) through the analysis of optical coherence tomography (OCT) images. Designed to evaluate algorithmic performance in detecting neovascular activity changes within AMD, the challenge incorporated unique multi-modal datasets. The primary dataset, sourced from…
▽ More
The MARIO challenge, held at MICCAI 2024, focused on advancing the automated detection and monitoring of age-related macular degeneration (AMD) through the analysis of optical coherence tomography (OCT) images. Designed to evaluate algorithmic performance in detecting neovascular activity changes within AMD, the challenge incorporated unique multi-modal datasets. The primary dataset, sourced from Brest, France, was used by participating teams to train and test their models. The final ranking was determined based on performance on this dataset. An auxiliary dataset from Algeria was used post-challenge to evaluate population and device shifts from submitted solutions. Two tasks were involved in the MARIO challenge. The first one was the classification of evolution between two consecutive 2D OCT B-scans. The second one was the prediction of future AMD evolution over three months for patients undergoing anti-vascular endothelial growth factor (VEGF) therapy. Thirty-five teams participated, with the top 12 finalists presenting their methods. This paper outlines the challenge's structure, tasks, data characteristics, and winning methodologies, setting a benchmark for AMD monitoring using OCT, infrared imaging, and clinical data (such as the number of visits, age, gender, etc.). The results of this challenge indicate that artificial intelligence (AI) performs as well as a physician in measuring AMD progression (Task 1) but is not yet able of predicting future evolution (Task 2).
△ Less
Submitted 7 June, 2025; v1 submitted 3 June, 2025;
originally announced June 2025.
-
Unified Attention Modeling for Efficient Free-Viewing and Visual Search via Shared Representations
Authors:
Fatma Youssef Mohammed,
Kostas Alexis
Abstract:
Computational human attention modeling in free-viewing and task-specific settings is often studied separately, with limited exploration of whether a common representation exists between them. This work investigates this question and proposes a neural network architecture that builds upon the Human Attention transformer (HAT) to test the hypothesis. Our results demonstrate that free-viewing and vis…
▽ More
Computational human attention modeling in free-viewing and task-specific settings is often studied separately, with limited exploration of whether a common representation exists between them. This work investigates this question and proposes a neural network architecture that builds upon the Human Attention transformer (HAT) to test the hypothesis. Our results demonstrate that free-viewing and visual search can efficiently share a common representation, allowing a model trained in free-viewing attention to transfer its knowledge to task-driven visual search with a performance drop of only 3.86% in the predicted fixation scanpaths, measured by the semantic sequence score (SemSS) metric which reflects the similarity between predicted and human scanpaths. This transfer reduces computational costs by 92.29% in terms of GFLOPs and 31.23% in terms of trainable parameters.
△ Less
Submitted 3 June, 2025;
originally announced June 2025.
-
Breaking the Barriers of Text-Hungry and Audio-Deficient AI
Authors:
Hamidou Tembine,
Issa Bamia,
Massa NDong,
Bakary Coulibaly,
Oumar Issiaka Traore,
Moussa Traore,
Moussa Sanogo,
Mamadou Eric Sangare,
Salif Kante,
Daryl Noupa Yongueng,
Hafiz Tiomoko Ali,
Malik Tiomoko,
Frejus Laleye,
Boualem Djehiche,
Wesmanegda Elisee Dipama,
Idris Baba Saje,
Hammid Mohammed Ibrahim,
Moumini Sanogo,
Marie Coursel Nininahazwe,
Abdul-Latif Siita,
Haine Mhlongo,
Teddy Nelvy Dieu Merci Kouka,
Mariam Serine Jeridi,
Mutiyamuogo Parfait Mupenge,
Lekoueiry Dehah
, et al. (9 additional authors not shown)
Abstract:
While global linguistic diversity spans more than 7164 recognized languages, the current dominant architecture of machine intelligence remains fundamentally biased toward written text. This bias excludes over 700 million people particularly in rural and remote regions who are audio-literate. In this work, we introduce a fully textless, audio-to-audio machine intelligence framework designed to serv…
▽ More
While global linguistic diversity spans more than 7164 recognized languages, the current dominant architecture of machine intelligence remains fundamentally biased toward written text. This bias excludes over 700 million people particularly in rural and remote regions who are audio-literate. In this work, we introduce a fully textless, audio-to-audio machine intelligence framework designed to serve this underserved population, and all the people who prefer audio-efficiency. Our contributions include novel Audio-to-Audio translation architectures that bypass text entirely, including spectrogram-, scalogram-, wavelet-, and unit-based models. Central to our approach is the Multiscale Audio-Semantic Transform (MAST), a representation that encodes tonal, prosodic, speaker, and expressive features. We further integrate MAST into a fractional diffusion of mean-field-type framework powered by fractional Brownian motion. It enables the generation of high-fidelity, semantically consistent speech without reliance on textual supervision. The result is a robust and scalable system capable of learning directly from raw audio, even in languages that are unwritten or rarely digitized. This work represents a fundamental shift toward audio-native machine intelligence systems, expanding access to language technologies for communities historically left out of the current machine intelligence ecosystem.
△ Less
Submitted 3 June, 2025;
originally announced June 2025.
-
Unified Large Language Models for Misinformation Detection in Low-Resource Linguistic Settings
Authors:
Muhammad Islam,
Javed Ali Khan,
Mohammed Abaker,
Ali Daud,
Azeem Irshad
Abstract:
The rapid expansion of social media platforms has significantly increased the dissemination of forged content and misinformation, making the detection of fake news a critical area of research. Although fact-checking efforts predominantly focus on English-language news, there is a noticeable gap in resources and strategies to detect news in regional languages, such as Urdu. Advanced Fake News Detec…
▽ More
The rapid expansion of social media platforms has significantly increased the dissemination of forged content and misinformation, making the detection of fake news a critical area of research. Although fact-checking efforts predominantly focus on English-language news, there is a noticeable gap in resources and strategies to detect news in regional languages, such as Urdu. Advanced Fake News Detection (FND) techniques rely heavily on large, accurately labeled datasets. However, FND in under-resourced languages like Urdu faces substantial challenges due to the scarcity of extensive corpora and the lack of validated lexical resources. Current Urdu fake news datasets are often domain-specific and inaccessible to the public. They also lack human verification, relying mainly on unverified English-to-Urdu translations, which compromises their reliability in practical applications. This study highlights the necessity of developing reliable, expert-verified, and domain-independent Urdu-enhanced FND datasets to improve fake news detection in Urdu and other resource-constrained languages. This paper presents the first benchmark large FND dataset for Urdu news, which is publicly available for validation and deep analysis. We also evaluate this dataset using multiple state-of-the-art pre-trained large language models (LLMs), such as XLNet, mBERT, XLM-RoBERTa, RoBERTa, DistilBERT, and DeBERTa. Additionally, we propose a unified LLM model that outperforms the others with different embedding and feature extraction techniques. The performance of these models is compared based on accuracy, F1 score, precision, recall, and human judgment for vetting the sample results of news.
△ Less
Submitted 2 June, 2025;
originally announced June 2025.
-
Construction of DNA codes using $θ$-skew cyclic codes over $\mathbb{F}_4 + v \mathbb{F}_4$
Authors:
Joël Kabore,
Mohammed Elhassani Charkani
Abstract:
In this paper, we investigate $θ$-skew cyclic codes over the ring $R= \mathbb{F}_4 + v \mathbb{F}_4$, where $v^2=v$ and $θ$ is a non-trivial automorphism over $\mathbb{F}_4 + v \mathbb{F}_4$. This allows us to describe DNA code over this ring by characterizing $θ$-skew cyclic reversible DNA codes and $θ$-skew cyclic reversible complement DNA codes. We also explore the Gray images of $θ$-skew cycli…
▽ More
In this paper, we investigate $θ$-skew cyclic codes over the ring $R= \mathbb{F}_4 + v \mathbb{F}_4$, where $v^2=v$ and $θ$ is a non-trivial automorphism over $\mathbb{F}_4 + v \mathbb{F}_4$. This allows us to describe DNA code over this ring by characterizing $θ$-skew cyclic reversible DNA codes and $θ$-skew cyclic reversible complement DNA codes. We also explore the Gray images of $θ$-skew cyclic codes.
△ Less
Submitted 1 June, 2025;
originally announced June 2025.
-
PSI-PFL: Population Stability Index for Client Selection in non-IID Personalized Federated Learning
Authors:
Daniel-M. Jimenez-Gutierrez,
David Solans,
Mohammed Elbamby,
Nicolas Kourtellis
Abstract:
Federated Learning (FL) enables decentralized machine learning (ML) model training while preserving data privacy by keeping data localized across clients. However, non-independent and identically distributed (non-IID) data across clients poses a significant challenge, leading to skewed model updates and performance degradation. Addressing this, we propose PSI-PFL, a novel client selection framewor…
▽ More
Federated Learning (FL) enables decentralized machine learning (ML) model training while preserving data privacy by keeping data localized across clients. However, non-independent and identically distributed (non-IID) data across clients poses a significant challenge, leading to skewed model updates and performance degradation. Addressing this, we propose PSI-PFL, a novel client selection framework for Personalized Federated Learning (PFL) that leverages the Population Stability Index (PSI) to quantify and mitigate data heterogeneity (so-called non-IIDness). Our approach selects more homogeneous clients based on PSI, reducing the impact of label skew, one of the most detrimental factors in FL performance. Experimental results over multiple data modalities (tabular, image, text) demonstrate that PSI-PFL significantly improves global model accuracy, outperforming state-of-the-art baselines by up to 10\% under non-IID scenarios while ensuring fairer local performance. PSI-PFL enhances FL performance and offers practical benefits in applications where data privacy and heterogeneity are critical.
△ Less
Submitted 31 May, 2025;
originally announced June 2025.
-
Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling
Authors:
Gustavo Sutter Pessurno de Carvalho,
Mohammed Abdulrahman,
Hao Wang,
Sriram Ganapathi Subramanian,
Marc St-Aubin,
Sharon O'Sullivan,
Lawrence Wan,
Luis Ricardez-Sandoval,
Pascal Poupart,
Agustinus Kristiadi
Abstract:
The optimization of expensive black-box functions is ubiquitous in science and engineering. A common solution to this problem is Bayesian optimization (BO), which is generally comprised of two components: (i) a surrogate model and (ii) an acquisition function, which generally require expensive re-training and optimization steps at each iteration, respectively. Although recent work enabled in-conte…
▽ More
The optimization of expensive black-box functions is ubiquitous in science and engineering. A common solution to this problem is Bayesian optimization (BO), which is generally comprised of two components: (i) a surrogate model and (ii) an acquisition function, which generally require expensive re-training and optimization steps at each iteration, respectively. Although recent work enabled in-context surrogate models that do not require re-training, virtually all existing BO methods still require acquisition function maximization to select the next observation, which introduces many knobs to tune, such as Monte Carlo samplers and multi-start optimizers. In this work, we propose a completely in-context, zero-shot solution for BO that does not require surrogate fitting or acquisition function optimization. This is done by using a pre-trained deep generative model to directly sample from the posterior over the optimum point. We show that this process is equivalent to Thompson sampling and demonstrate the capabilities and cost-effectiveness of our foundation model on a suite of real-world benchmarks. We achieve an efficiency gain of more than 35x in terms of wall-clock time when compared with Gaussian process-based BO, enabling efficient parallel and distributed BO, e.g., for high-throughput optimization.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking
Authors:
Liangliang Zhang,
Zhuorui Jiang,
Hongliang Chi,
Haoyang Chen,
Mohammed Elkoumy,
Fali Wang,
Qiong Wu,
Zhengyi Zhou,
Shirui Pan,
Suhang Wang,
Yao Ma
Abstract:
Knowledge Graph Question Answering (KGQA) systems rely on high-quality benchmarks to evaluate complex multi-hop reasoning. However, despite their widespread use, popular datasets such as WebQSP and CWQ suffer from critical quality issues, including inaccurate or incomplete ground-truth annotations, poorly constructed questions that are ambiguous, trivial, or unanswerable, and outdated or inconsist…
▽ More
Knowledge Graph Question Answering (KGQA) systems rely on high-quality benchmarks to evaluate complex multi-hop reasoning. However, despite their widespread use, popular datasets such as WebQSP and CWQ suffer from critical quality issues, including inaccurate or incomplete ground-truth annotations, poorly constructed questions that are ambiguous, trivial, or unanswerable, and outdated or inconsistent knowledge. Through a manual audit of 16 popular KGQA datasets, including WebQSP and CWQ, we find that the average factual correctness rate is only 57 %. To address these issues, we introduce KGQAGen, an LLM-in-the-loop framework that systematically resolves these pitfalls. KGQAGen combines structured knowledge grounding, LLM-guided generation, and symbolic verification to produce challenging and verifiable QA instances. Using KGQAGen, we construct KGQAGen-10k, a ten-thousand scale benchmark grounded in Wikidata, and evaluate a diverse set of KG-RAG models. Experimental results demonstrate that even state-of-the-art systems struggle on this benchmark, highlighting its ability to expose limitations of existing models. Our findings advocate for more rigorous benchmark construction and position KGQAGen as a scalable framework for advancing KGQA evaluation.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.