-
ARCH-COMP25 Category Report: Stochastic Models
Authors:
Alessandro Abate,
Omid Akbarzadeh,
Henk A. P. Blom,
Sofie Haesaert,
Sina Hassani,
Abolfazl Lavaei,
Frederik Baymler Mathiesen,
Rahul Misra,
Amy Nejati,
Mathis Niehage,
Fie Ørum,
Anne Remke,
Behrad Samari,
Ruohan Wang,
Rafal Wisniewski,
Ben Wooding,
Mahdieh Zaker
Abstract:
This report is concerned with a friendly competition for formal verification and policy synthesis of stochastic models. The main goal of the report is to introduce new benchmarks and their properties within this category and recommend next steps toward next year's edition of the competition. In particular, this report introduces three recently developed software tools, a new water distribution network benchmark, and a collection of simplified benchmarks intended to facilitate further comparisons among tools that were previously not directly comparable. This friendly competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in Summer 2025.
Submitted 21 June, 2025;
originally announced June 2025.
-
Noise tolerance via reinforcement: Learning a reinforced quantum dynamics
Authors:
Abolfazl Ramezanpour
Abstract:
The performance of quantum simulations heavily depends on the efficiency of noise mitigation techniques and error correction algorithms. Reinforcement has emerged as a powerful strategy to enhance the performance of learning and optimization algorithms. In this study, we demonstrate that reinforced quantum dynamics can exhibit significant robustness against interactions with a noisy environment. We study a quantum annealing process where, through reinforcement, the system is encouraged to maintain its current state or follow a noise-free evolution. A learning algorithm is employed to find a concise approximation of this reinforced dynamics, reducing the total evolution time and, consequently, the system's exposure to noisy interactions. This approach also avoids the complexities associated with implementing quantum feedback in such algorithms. The efficacy of our method is demonstrated through numerical simulations of reinforced quantum annealing with one- and two-qubit systems under Pauli noise.
Submitted 14 June, 2025;
originally announced June 2025.
-
Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes
Authors:
Seyed Amir Hossein Saberi,
Amir Najafi,
Abolfazl Motahari,
Babak H. Khalaj
Abstract:
In this paper, we establish sample complexity bounds for learning high-dimensional simplices in $\mathbb{R}^K$ from noisy data. Specifically, we consider $n$ i.i.d. samples uniformly drawn from an unknown simplex in $\mathbb{R}^K$, each corrupted by additive Gaussian noise of unknown variance. We prove that an algorithm exists that, with high probability, outputs a simplex within $\ell_2$ or total variation (TV) distance at most $\varepsilon$ from the true simplex, provided $n \ge (K^2/\varepsilon^2) e^{\mathcal{O}(K/\mathrm{SNR}^2)}$, where $\mathrm{SNR}$ is the signal-to-noise ratio. Extending our prior work \citep{saberi2023sample}, we derive new information-theoretic lower bounds, showing that simplex estimation within TV distance $\varepsilon$ requires at least $n \ge \Omega(K^3 \sigma^2/\varepsilon^2 + K/\varepsilon)$ samples, where $\sigma^2$ denotes the noise variance. In the noiseless scenario, our lower bound $n \ge \Omega(K/\varepsilon)$ matches known upper bounds up to constant factors. We resolve an open question by demonstrating that when $\mathrm{SNR} \ge \Omega(K^{1/2})$, noisy-case complexity aligns with the noiseless case. Our analysis leverages sample compression techniques (Ashtiani et al., 2018) and introduces a novel Fourier-based method for recovering distributions from noisy observations, potentially applicable beyond simplex learning.
Submitted 11 June, 2025;
originally announced June 2025.
-
An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring
Authors:
Sana Ebrahimi,
Mohsen Dehghankar,
Abolfazl Asudeh
Abstract:
While multi-agent LLM systems show strong capabilities in various domains, they are highly vulnerable to adversarial and low-performing agents. To resolve this issue, in this paper, we introduce a general and adversary-resistant multi-agent LLM framework based on credibility scoring. We model the collaborative query-answering process as an iterative game, where the agents communicate and contribute to a final system output. Our system associates a credibility score that is used when aggregating the team outputs. The credibility scores are learned gradually based on the past contributions of each agent in query answering. Our experiments across multiple tasks and settings demonstrate our system's effectiveness in mitigating adversarial influence and enhancing the resilience of multi-agent cooperation, even in the adversary-majority settings.
Submitted 30 May, 2025;
originally announced May 2025.
-
Fair-Count-Min: Frequency Estimation under Equal Group-wise Approximation Factor
Authors:
Nima Shahbazi,
Stavros Sintos,
Abolfazl Asudeh
Abstract:
Frequency estimation in streaming data often relies on sketches like Count-Min (CM) to provide approximate answers with sublinear space. However, CM sketches introduce additive errors that disproportionately impact low-frequency elements, creating fairness concerns across different groups of elements. We introduce Fair-Count-Min, a frequency estimation sketch that guarantees equal expected approximation factors across element groups, thus addressing the unfairness issue. We propose a column partitioning approach with group-aware semi-uniform hashing to eliminate collisions between elements from different groups. We provide theoretical guarantees for fairness, analyze the price of fairness, and validate our theoretical findings through extensive experiments on real-world and synthetic datasets. Our experimental results show that Fair-Count-Min achieves fairness with minimal additional error and maintains competitive efficiency compared to standard CM sketches.
Submitted 24 May, 2025;
originally announced May 2025.
-
HENN: A Hierarchical Epsilon Net Navigation Graph for Approximate Nearest Neighbor Search
Authors:
Mohsen Dehghankar,
Abolfazl Asudeh
Abstract:
Hierarchical graph-based algorithms such as HNSW have achieved state-of-the-art performance for Approximate Nearest Neighbor (ANN) search in practice, yet they often lack theoretical guarantees on query time or recall due to their heavy use of randomized heuristic constructions. Conversely, existing theoretically grounded structures are typically difficult to implement and struggle to scale in real-world scenarios. We propose the Hierarchical $\varepsilon$-Net Navigation Graph (HENN), a novel graph-based indexing structure for ANN search that combines strong theoretical guarantees with practical efficiency. Built upon the theory of $\varepsilon$-nets, HENN guarantees polylogarithmic worst-case query time while preserving high recall and incurring minimal implementation overhead. Moreover, we establish a probabilistic polylogarithmic query time bound for HNSW, providing theoretical insight into its empirical success. In contrast to these prior hierarchical methods that may degrade to linear query time under adversarial data, HENN maintains provable performance independent of the input data distribution. Empirical evaluations demonstrate that HENN achieves faster query time while maintaining competitive recall on diverse data distributions, including adversarial inputs. These results underscore the effectiveness of HENN as a robust and scalable solution for fast and accurate nearest neighbor search.
Submitted 22 May, 2025;
originally announced May 2025.
-
Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models
Authors:
Wenhui Zhu,
Xuanzhao Dong,
Xin Li,
Peijie Qiu,
Xiwen Chen,
Abolfazl Razi,
Aris Sotiras,
Yi Su,
Yalin Wang
Abstract:
Recently, reinforcement learning (RL)-based tuning has shifted the trajectory of Multimodal Large Language Models (MLLMs), particularly following the introduction of Group Relative Policy Optimization (GRPO). However, directly applying it to medical tasks remains challenging for achieving clinically grounded model behavior. Motivated by the need to align model responses with clinical expectations, we investigate four critical dimensions that affect the effectiveness of RL-based tuning in medical visual question answering (VQA): base model initialization strategy, the role of medical semantic alignment, the impact of length-based rewards on long-chain reasoning, and the influence of bias. We conduct extensive experiments to analyze these factors for medical MLLMs, providing new insights into how models are fine-tuned for a specific domain. Our results also demonstrate that GRPO-based RL tuning consistently outperforms standard supervised fine-tuning (SFT) in both accuracy and reasoning quality.
Submitted 20 May, 2025;
originally announced May 2025.
-
Data Balancing Strategies: A Survey of Resampling and Augmentation Methods
Authors:
Behnam Yousefimehr,
Mehdi Ghatee,
Mohammad Amin Seifi,
Javad Fazli,
Sajed Tavakoli,
Zahra Rafei,
Shervin Ghaffari,
Abolfazl Nikahd,
Mahdi Razi Gandomani,
Alireza Orouji,
Ramtin Mahmoudi Kashani,
Sarina Heshmati,
Negin Sadat Mousavi
Abstract:
Imbalanced data poses a significant obstacle in machine learning, as an unequal distribution of class labels often results in skewed predictions and diminished model accuracy. To mitigate this problem, various resampling strategies have been developed, encompassing both oversampling and undersampling techniques aimed at modifying class proportions. Conventional oversampling approaches like SMOTE enhance the representation of the minority class, whereas undersampling methods focus on trimming down the majority class. Advances in deep learning have facilitated the creation of more complex solutions, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), which are capable of producing high-quality synthetic examples. This paper reviews a broad spectrum of data balancing methods, classifying them into categories including synthetic oversampling, adaptive techniques, generative models, ensemble-based strategies, hybrid approaches, undersampling, and neighbor-based methods. Furthermore, it highlights current developments in resampling techniques and discusses practical implementations and case studies that validate their effectiveness. The paper concludes by offering perspectives on potential directions for future exploration in this domain.
Submitted 17 May, 2025;
originally announced May 2025.
-
DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
Authors:
Xiwen Chen,
Wenhui Zhu,
Peijie Qiu,
Xuanzhao Dong,
Hao Wang,
Haiyu Wu,
Huayu Li,
Aristeidis Sotiras,
Yalin Wang,
Abolfazl Razi
Abstract:
Recent advances in reinforcement learning for language model post-training, such as Group Relative Policy Optimization (GRPO), have shown promise in low-resource settings. However, GRPO typically relies on solution-level and scalar reward signals that fail to capture the semantic diversity among sampled completions. This leads to what we identify as a diversity-quality inconsistency, where distinct reasoning paths may receive indistinguishable rewards. To address this limitation, we propose Diversity-aware Reward Adjustment (DRA), a method that explicitly incorporates semantic diversity into the reward computation. DRA uses Submodular Mutual Information (SMI) to downweight redundant completions and amplify rewards for diverse ones. This encourages better exploration during learning, while maintaining stable exploitation of high-quality samples. Our method integrates seamlessly with both GRPO and its variant DR. GRPO, resulting in DRA-GRPO and DGA-DR. GRPO. We evaluate our method on five mathematical reasoning benchmarks and find that it outperforms recent strong baselines. It achieves state-of-the-art performance with an average accuracy of 58.2%, using only 7,000 fine-tuning samples and a total training cost of approximately $55. The code is available at https://github.com/xiwenc1/DRA-GRPO.
Submitted 15 May, 2025; v1 submitted 13 May, 2025;
originally announced May 2025.
-
Monte Carlo Beam Search for Actor-Critic Reinforcement Learning in Continuous Control
Authors:
Hazim Alzorgan,
Abolfazl Razi
Abstract:
Actor-critic methods, like Twin Delayed Deep Deterministic Policy Gradient (TD3), depend on basic noise-based exploration, which can result in less than optimal policy convergence. In this study, we introduce Monte Carlo Beam Search (MCBS), a new hybrid method that combines beam search and Monte Carlo rollouts with TD3 to improve exploration and action selection. MCBS produces several candidate actions around the policy's output and assesses them through short-horizon rollouts, enabling the agent to make better-informed choices. We test MCBS across various continuous-control benchmarks, including HalfCheetah-v4, Walker2d-v5, and Swimmer-v5, showing enhanced sample efficiency and performance compared to standard TD3 and other baseline methods like SAC, PPO, and A2C. Our findings emphasize MCBS's capability to enhance policy learning through structured look-ahead search while ensuring computational efficiency. Additionally, we offer a detailed analysis of crucial hyperparameters, such as beam width and rollout depth, and explore adaptive strategies to optimize MCBS for complex control tasks. Our method shows a higher convergence rate across different environments compared to TD3, SAC, PPO, and A2C. For instance, we achieved 90% of the maximum achievable reward within around 200 thousand timesteps compared to 400 thousand timesteps for the second-best method.
Submitted 13 May, 2025;
originally announced May 2025.
-
FIC-TSC: Learning Time Series Classification with Fisher Information Constraint
Authors:
Xiwen Chen,
Wenhui Zhu,
Peijie Qiu,
Hao Wang,
Huayu Li,
Zihan Li,
Yalin Wang,
Aristeidis Sotiras,
Abolfazl Razi
Abstract:
Analyzing time series data is crucial to a wide spectrum of applications, including economics, online marketplaces, and human healthcare. In particular, time series classification plays an indispensable role in segmenting different phases in stock markets, predicting customer behavior, and classifying worker actions and engagement levels. These aspects contribute significantly to the advancement of automated decision-making and system optimization in real-world applications. However, there is broad consensus that time series data often suffers from domain shifts between training and test sets, which dramatically degrades classification performance. Despite the success of (reversible) instance normalization in handling domain shifts for time series regression tasks, its performance in classification is unsatisfactory. In this paper, we propose FIC-TSC, a training framework for time series classification that leverages Fisher information as the constraint. We theoretically and empirically show that this is an efficient and effective way to guide the model to converge toward flatter minima, which enhances its generalizability to distribution shifts. We rigorously evaluate our method on 30 UEA multivariate and 85 UCR univariate datasets. Our empirical results demonstrate the superiority of the proposed method over 14 recent state-of-the-art methods.
Submitted 9 May, 2025;
originally announced May 2025.
-
Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions
Authors:
Mohammad Mahdi Abootorabi,
Omid Ghahroodi,
Pardis Sadat Zahraei,
Hossein Behzadasl,
Alireza Mirrokni,
Mobina Salimipanah,
Arash Rasouli,
Bahar Behzadipour,
Sara Azarnoush,
Benyamin Maleki,
Erfan Sadraiye,
Kiarash Kiani Feriz,
Mahdi Teymouri Nahad,
Ali Moghadasi,
Abolfazl Eshagh Abianeh,
Nizi Nazar,
Hamid R. Rabiee,
Mahdieh Soleymani Baghshah,
Meisam Ahmadi,
Ehsaneddin Asgari
Abstract:
Generative AI is reshaping art, gaming, and most notably animation. Recent breakthroughs in foundation and diffusion models have reduced the time and cost of producing animated content. Characters are central animation components, involving motion, emotions, gestures, and facial expressions. The pace and breadth of advances in recent months make it difficult to maintain a coherent view of the field, motivating the need for an integrative review. Unlike earlier overviews that treat avatars, gestures, or facial animation in isolation, this survey offers a single, comprehensive perspective on all the main generative AI applications for character animation. We begin by examining the state-of-the-art in facial animation, expression rendering, image synthesis, avatar creation, gesture modeling, motion synthesis, object generation, and texture synthesis. We highlight leading research, practical deployments, commonly used datasets, and emerging trends for each area. To support newcomers, we also provide a comprehensive background section that introduces foundational models and evaluation metrics, equipping readers with the knowledge needed to enter the field. We discuss open challenges and map future research directions, providing a roadmap to advance AI-driven character-animation technologies. This survey is intended as a resource for researchers and developers entering the field of generative AI animation or adjacent fields. Resources are available at: https://github.com/llm-lab-org/Generative-AI-for-Character-Animation-Survey.
Submitted 26 April, 2025;
originally announced April 2025.
-
Transfer Learning for High-dimensional Reduced Rank Time Series Models
Authors:
Mingliang Ma,
Abolfazl Safikhani
Abstract:
The objective of transfer learning is to enhance estimation and inference on a target dataset by leveraging knowledge gained from additional sources. Recent studies have explored transfer learning for independent observations in complex, high-dimensional models assuming sparsity, yet research on time series models remains limited. Our focus is on transfer learning for sequences of observations with temporal dependencies and a more intricate model parameter structure. Specifically, we investigate the vector autoregressive (VAR) model, a widely recognized model for time series data, where the transition matrix can be decomposed into the sum of a sparse matrix and a low-rank one. We propose a new transfer learning algorithm tailored for estimating high-dimensional VAR models characterized by low-rank and sparse structures. Additionally, we present a novel approach for selecting informative observations from auxiliary datasets. Theoretical guarantees are established, encompassing model parameter consistency, informative set selection, and the asymptotic distribution of estimators under mild conditions. The latter facilitates the construction of entry-wise confidence intervals for model parameters. Finally, we demonstrate the empirical efficacy of our methodologies through both simulated and real-world datasets.
Submitted 22 April, 2025;
originally announced April 2025.
-
How Effective Can Dropout Be in Multiple Instance Learning?
Authors:
Wenhui Zhu,
Peijie Qiu,
Xiwen Chen,
Zhangsihao Yang,
Aristeidis Sotiras,
Abolfazl Razi,
Yalin Wang
Abstract:
Multiple Instance Learning (MIL) is a popular weakly-supervised method for various applications, with a particular interest in histological whole slide image (WSI) classification. Due to the gigapixel resolution of WSI, applications of MIL in WSI typically necessitate a two-stage training scheme: first, extract features from the pre-trained backbone and then perform MIL aggregation. However, it is well-known that this suboptimal training scheme suffers from "noisy" feature embeddings from the backbone and inherent weak supervision, hindering MIL from learning rich and generalizable features. Yet the most commonly used technique for mitigating this issue (i.e., dropout) has not been explored in MIL. In this paper, we empirically explore how effective dropout can be in MIL. Interestingly, we observe that dropping the top-k most important instances within a bag leads to better performance and generalization even under noise attack. Based on this key observation, we propose a novel MIL-specific dropout method, termed MIL-Dropout, which systematically determines which instances to drop. Experiments on five MIL benchmark datasets and two WSI datasets demonstrate that MIL-Dropout boosts the performance of current MIL methods with a negligible computational cost. The code is available at https://github.com/ChongQingNoSubway/MILDropout.
Submitted 20 May, 2025; v1 submitted 20 April, 2025;
originally announced April 2025.
-
Graph Based Deep Reinforcement Learning Aided by Transformers for Multi-Agent Cooperation
Authors:
Michael Elrod,
Niloufar Mehrabi,
Rahul Amin,
Manveen Kaur,
Long Cheng,
Jim Martin,
Abolfazl Razi
Abstract:
Mission planning for a fleet of cooperative autonomous drones in applications that involve serving distributed target points, such as disaster response, environmental monitoring, and surveillance, is challenging, especially under partial observability, limited communication range, and uncertain environments. Traditional path-planning algorithms struggle in these scenarios, particularly when prior information is not available. To address these challenges, we propose a novel framework that integrates Graph Neural Networks (GNNs), Deep Reinforcement Learning (DRL), and transformer-based mechanisms for enhanced multi-agent coordination and collective task execution. Our approach leverages GNNs to model agent-agent and agent-goal interactions through adaptive graph construction, enabling efficient information aggregation and decision-making under constrained communication. A transformer-based message-passing mechanism, augmented with edge-feature-enhanced attention, captures complex interaction patterns, while a Double Deep Q-Network (Double DQN) with prioritized experience replay optimizes agent policies in partially observable environments. This integration is carefully designed to address specific requirements of multi-agent navigation, such as scalability, adaptability, and efficient task execution. Experimental results demonstrate superior performance, with 90% service provisioning and 100% grid coverage (node discovery), while reducing the average steps per episode to 200, compared to 600 for benchmark methods such as particle swarm optimization (PSO), greedy algorithms and DQN.
Submitted 10 April, 2025;
originally announced April 2025.
-
Extended Visibility of Autonomous Vehicles via Optimized Cooperative Perception under Imperfect Communication
Authors:
Ahmad Sarlak,
Rahul Amin,
Abolfazl Razi
Abstract:
Autonomous Vehicles (AVs) rely on individual perception systems to navigate safely. However, these systems face significant challenges in adverse weather conditions, complex road geometries, and dense traffic scenarios. Cooperative Perception (CP) has emerged as a promising approach to extending the perception quality of AVs by jointly processing shared camera feeds and sensor readings across multiple vehicles. This work presents a novel CP framework designed to optimize vehicle selection and networking resource utilization under imperfect communications. Our optimized CP formation considers critical factors such as the helper vehicles' spatial position, visual range, motion blur, and available communication budgets. Furthermore, our resource optimization module allocates communication channels while adjusting power levels to maximize data flow efficiency between the ego and helper vehicles, considering realistic models of modern vehicular communication systems, such as LTE and 5G NR-V2X. We validate our approach through extensive experiments on pedestrian detection in challenging scenarios, using synthetic data generated by the CARLA simulator. The results demonstrate that our method significantly improves upon the perception quality of individual AVs with about 10% gain in detection accuracy. This substantial gain uncovers the unleashed potential of CP to enhance AV safety and performance in complex situations.
Submitted 23 March, 2025;
originally announced March 2025.
-
Reinforcement Learning-Based Neuroadaptive Control of Robotic Manipulators under Deferred Constraints
Authors:
Hamed Rahimi Nohooji,
Abolfazl Zaraki,
Holger Voos
Abstract:
This paper presents a reinforcement learning-based neuroadaptive control framework for robotic manipulators operating under deferred constraints. The proposed approach improves traditional barrier Lyapunov functions by introducing a smooth constraint enforcement mechanism that offers two key advantages: (i) it minimizes control effort in unconstrained regions and progressively increases it near constraints, improving energy efficiency, and (ii) it enables gradual constraint activation through a prescribed-time shifting function, allowing safe operation even when initial conditions violate constraints. To address system uncertainties and improve adaptability, an actor-critic reinforcement learning framework is employed. The critic network estimates the value function, while the actor network learns an optimal control policy in real time, enabling adaptive constraint handling without requiring explicit system modeling. Lyapunov-based stability analysis guarantees the boundedness of all closed-loop signals. The effectiveness of the proposed method is validated through numerical simulations.
Submitted 18 March, 2025;
originally announced March 2025.
-
Fire and Smoke Datasets in 20 Years: An In-depth Review
Authors:
Sayed Pedram Haeri Boroujeni,
Niloufar Mehrabi,
Fatemeh Afghah,
Connor Peter McGrath,
Danish Bhatkar,
Mithilesh Anil Biradar,
Abolfazl Razi
Abstract:
Fire and smoke phenomena pose a significant threat to the natural environment, ecosystems, and global economy, as well as human lives and wildlife. Consequently, there is a demand for more sophisticated and advanced technologies to implement an effective strategy for early detection, real-time monitoring, and minimizing the overall impacts of fires on ecological balance and public safety. Recently, the rapid advancement of Artificial Intelligence (AI) and Computer Vision (CV) frameworks has substantially accelerated the development of efficient fire management systems. However, these systems rely extensively on the availability of adequate and high-quality fire and smoke data to create proficient Machine Learning (ML) methods for various tasks, such as detection and monitoring. Although fire and smoke datasets play a critical role in training, evaluating, and testing advanced Deep Learning (DL) models, a comprehensive review of the existing datasets is still lacking. For this purpose, we provide an in-depth review to systematically analyze and evaluate fire and smoke datasets collected over the past 20 years. We investigate the characteristics of each dataset, including type, size, format, collection methods, and geographical diversity. We also review and highlight the unique features of each dataset, such as imaging modalities (RGB, thermal, infrared) and their applicability for different fire management tasks (classification, segmentation, detection). Furthermore, we summarize the strengths and weaknesses of each dataset and discuss their potential for advancing research and technology in fire management. Ultimately, we conduct extensive experimental analyses across different datasets using several state-of-the-art algorithms, such as ResNet-50, DeepLab-V3, and YOLOv8.
Submitted 17 March, 2025;
originally announced March 2025.
-
Prompt-OT: An Optimal Transport Regularization Paradigm for Knowledge Preservation in Vision-Language Model Adaptation
Authors:
Xiwen Chen,
Wenhui Zhu,
Peijie Qiu,
Hao Wang,
Huayu Li,
Haiyu Wu,
Aristeidis Sotiras,
Yalin Wang,
Abolfazl Razi
Abstract:
Vision-language models (VLMs) such as CLIP demonstrate strong performance but struggle when adapted to downstream tasks. Prompt learning has emerged as an efficient and effective strategy to adapt VLMs while preserving their pre-trained knowledge. However, existing methods still lead to overfitting and degrade zero-shot generalization. To address this challenge, we propose an optimal transport (OT)-guided prompt learning framework that mitigates forgetting by preserving the structural consistency of feature distributions between pre-trained and fine-tuned models. Unlike conventional point-wise constraints, OT naturally captures cross-instance relationships and expands the feasible parameter space for prompt tuning, allowing a better trade-off between adaptation and generalization. Our approach enforces joint constraints on both vision and text representations, ensuring a holistic feature alignment. Extensive experiments on benchmark datasets demonstrate that our simple yet effective method can outperform existing prompt learning strategies in base-to-novel generalization, cross-dataset evaluation, and domain generalization without additional augmentation or ensemble techniques. The code is available at https://github.com/ChongQingNoSubway/Prompt-OT
Submitted 11 March, 2025;
originally announced March 2025.
-
FairDeFace: Evaluating the Fairness and Adversarial Robustness of Face Obfuscation Methods
Authors:
Seyyed Mohammad Sadegh Moosavi Khorzooghi,
Poojitha Thota,
Mohit Singhal,
Abolfazl Asudeh,
Gautam Das,
Shirin Nilizadeh
Abstract:
The lack of a common platform and benchmark datasets for evaluating face obfuscation methods has been a challenge, with every method being tested using arbitrary experiments, datasets, and metrics. While prior work has demonstrated that face recognition systems exhibit bias against some demographic groups, there exists a substantial gap in our understanding regarding the fairness of face obfuscation methods. Providing fair face obfuscation methods can ensure equitable protection across diverse demographic groups, especially since they can be used to preserve the privacy of vulnerable populations. To address these gaps, this paper introduces a comprehensive framework, named FairDeFace, designed to assess the adversarial robustness and fairness of face obfuscation methods. The framework introduces a set of modules encompassing data benchmarks, face detection and recognition algorithms, adversarial models, utility detection models, and fairness metrics. FairDeFace serves as a versatile platform where any face obfuscation method can be integrated, allowing for rigorous testing and comparison with other state-of-the-art methods. In its current implementation, FairDeFace incorporates 6 attacks, and several privacy, utility and fairness metrics. Using FairDeFace, and by conducting more than 500 experiments, we evaluated and compared the adversarial robustness of seven face obfuscation methods. This extensive analysis led to many interesting findings both in terms of the degree of robustness of existing methods and their biases against some gender or racial groups. FairDeFace also uses visualization of focused areas for both obfuscation and verification attacks to show not only which areas are mostly changed in the obfuscation process for some demographics, but also why they failed through focus area comparison of obfuscation and verification.
Submitted 10 March, 2025;
originally announced March 2025.
-
Learning to Localize Leakage of Cryptographic Sensitive Variables
Authors:
Jimmy Gammell,
Anand Raghunathan,
Abolfazl Hashemi,
Kaushik Roy
Abstract:
While cryptographic algorithms such as the ubiquitous Advanced Encryption Standard (AES) are secure, *physical implementations* of these algorithms in hardware inevitably 'leak' sensitive data such as cryptographic keys. A particularly insidious form of leakage arises from the fact that hardware consumes power and emits radiation in a manner that is statistically associated with the data it processes and the instructions it executes. Supervised deep learning has emerged as a state-of-the-art tool for carrying out *side-channel attacks*, which exploit this leakage by learning to map power/radiation measurements throughout encryption to the sensitive data operated on during that encryption. In this work we develop a principled deep learning framework for determining the relative leakage due to measurements recorded at different points in time, in order to inform *defense* against such attacks. This information is invaluable to cryptographic hardware designers for understanding *why* their hardware leaks and how they can mitigate it (e.g. by indicating the particular sections of code or electronic components which are responsible). Our framework is based on an adversarial game between a family of classifiers trained to estimate the conditional distributions of sensitive data given subsets of measurements, and a budget-constrained noise distribution which probabilistically erases individual measurements to maximize the loss of these classifiers. We demonstrate our method's efficacy and ability to overcome limitations of prior work through extensive experimental comparison with 8 baseline methods using 3 evaluation metrics and 6 publicly-available power/EM trace datasets from AES, ECC and RSA implementations. We provide an open-source PyTorch implementation of these experiments.
Submitted 10 March, 2025;
originally announced March 2025.
-
AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons
Authors:
Shaona Ghosh,
Heather Frase,
Adina Williams,
Sarah Luger,
Paul Röttger,
Fazl Barez,
Sean McGregor,
Kenneth Fricklas,
Mala Kumar,
Quentin Feuillade--Montixi,
Kurt Bollacker,
Felix Friedrich,
Ryan Tsang,
Bertie Vidgen,
Alicia Parrish,
Chris Knotz,
Eleonora Presani,
Jonathan Bennion,
Marisa Ferrara Boston,
Mike Kuniavsky,
Wiebke Hutiri,
James Ezick,
Malek Ben Salem,
Rajat Sahay,
Sujata Goswami
, et al. (77 additional authors not shown)
Abstract:
The rapid advancement and deployment of AI systems have created an urgent need for standard safety-evaluation frameworks. This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. Its development employed an open process that included participants from multiple fields. The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories, including violent crimes, nonviolent crimes, sex-related crimes, child sexual exploitation, indiscriminate weapons, suicide and self-harm, intellectual property, privacy, defamation, hate, sexual content, and specialized advice (election, financial, health, legal). Our method incorporates a complete assessment standard, extensive prompt datasets, a novel evaluation framework, a grading and reporting system, and the technical as well as organizational infrastructure for long-term support and evolution. In particular, the benchmark employs an understandable five-tier grading scale (Poor to Excellent) and incorporates an innovative entropy-based system-response evaluation.
In addition to unveiling the benchmark, this report also identifies limitations of our method and of building safety benchmarks generally, including evaluator uncertainty and the constraints of single-turn interactions. This work represents a crucial step toward establishing global standards for AI risk and reliability evaluation while acknowledging the need for continued development in areas such as multiturn interactions, multimodal understanding, coverage of additional languages, and emerging hazard categories. Our findings provide valuable insights for model developers, system integrators, and policymakers working to promote safer AI deployment.
Submitted 18 April, 2025; v1 submitted 19 February, 2025;
originally announced March 2025.
-
Building Machine Learning Challenges for Anomaly Detection in Science
Authors:
Elizabeth G. Campolongo,
Yuan-Tang Chou,
Ekaterina Govorkova,
Wahid Bhimji,
Wei-Lun Chao,
Chris Harris,
Shih-Chieh Hsu,
Hilmar Lapp,
Mark S. Neubauer,
Josephine Namayanja,
Aneesh Subramanian,
Philip Harris,
Advaith Anand,
David E. Carlyn,
Subhankar Ghosh,
Christopher Lawrence,
Eric Moreno,
Ryan Raikman,
Jiaman Wu,
Ziheng Zhang,
Bayu Adhi,
Mohammad Ahmadi Gharehtoragh,
Saúl Alonso Monsalve,
Marta Babicz,
Furqan Baig
, et al. (125 additional authors not shown)
Abstract:
Scientific discoveries are often made by finding a pattern or object that was not predicted by the known rules of science. Oftentimes, these anomalous events or objects that do not conform to the norms are an indication that the rules of science governing the data are incomplete, and something new needs to be present to explain these unexpected outliers. The challenge of finding anomalies can be confounding since it requires codifying a complete knowledge of the known scientific behaviors and then projecting these known behaviors on the data to look for deviations. When utilizing machine learning, this presents a particular challenge since we require that the model not only understands scientific data perfectly but also recognizes when the data is inconsistent and out of the scope of its trained behavior. In this paper, we present three datasets aimed at developing machine learning-based anomaly detection for disparate scientific domains covering astrophysics, genomics, and polar science. We present the different datasets along with a scheme to make machine learning challenges around the three datasets findable, accessible, interoperable, and reusable (FAIR). Furthermore, we present an approach that generalizes to future machine learning challenges, enabling the possibility of large, more compute-intensive challenges that can ultimately lead to scientific discovery.
Submitted 29 March, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
-
Leveraging Machine Learning and Deep Learning Techniques for Improved Pathological Staging of Prostate Cancer
Authors:
Raziehsadat Ghalamkarian,
Marziehsadat Ghalamkarian,
MortezaAli Ahmadi,
Sayed Mohammad Ahmadi,
Abolfazl Diyanat
Abstract:
Prostate cancer (PCa) continues to be a leading cause of cancer-related mortality in men, and the limitations in precision of traditional diagnostic methods such as the Digital Rectal Exam (DRE), Prostate-Specific Antigen (PSA) testing, and biopsies underscore the critical importance of accurate staging in enhancing treatment outcomes and improving patient prognosis. This study leverages machine learning and deep learning approaches, along with feature selection and extraction methods, to enhance PCa pathological staging predictions using RNA sequencing data from The Cancer Genome Atlas (TCGA). Gene expression profiles from 486 tumors were analyzed using advanced algorithms, including Random Forest (RF), Logistic Regression (LR), Extreme Gradient Boosting (XGB), and Support Vector Machine (SVM). Performance is measured by the F1-score, as well as precision and recall, all of which are calculated as weighted averages. The results reveal that the highest test F1-score, approximately 83%, was achieved by the Random Forest algorithm, followed by Logistic Regression at 80%, while both Extreme Gradient Boosting (XGB) and Support Vector Machine (SVM) scored around 79%. Furthermore, deep learning models with data augmentation achieved an accuracy of 71.23%, while PCA-based dimensionality reduction reached an accuracy of 69.86%. This research highlights the potential of AI-driven approaches in clinical oncology, paving the way for more reliable diagnostic tools that can ultimately improve patient outcomes.
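As a rough illustration of what "calculated as weighted averages" usually means for multi-class metrics, the sketch below averages per-class precision, recall, and F1 with weights proportional to each class's support. The function name `weighted_prf` and the stage labels are hypothetical and are not taken from the study; the abstract does not say which implementation the authors used.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Return support-weighted (precision, recall, F1) over the classes in y_true."""
    support = Counter(y_true)          # number of true samples per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        # True positives and predicted count for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / n
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        w = n / total                  # weight = class share of the true labels
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c
    return precision, recall, f1


# Hypothetical stage labels, for illustration only (not data from the study).
y_true = ["II", "II", "II", "III", "III", "IV"]
y_pred = ["II", "II", "III", "III", "III", "II"]
precision, recall, f1 = weighted_prf(y_true, y_pred)
```

With this weighting, the weighted recall equals plain accuracy, which is one reason weighted F1 and accuracy are often reported side by side on imbalanced staging data.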
Submitted 13 February, 2025;
originally announced February 2025.
-
Predicting Drive Test Results in Mobile Networks Using Optimization Techniques
Authors:
MohammadJava Taheri,
Abolfazl Diyanat,
MortezaAli Ahmadi,
Ali Nazari
Abstract:
Mobile network operators constantly optimize their networks to ensure superior service quality and coverage. This optimization is crucial for maintaining an optimal user experience and requires extensive data collection and analysis. One of the primary methods for gathering this data is through drive tests, where technical teams use specialized equipment to collect signal information across various regions. However, drive tests are both costly and time-consuming, and they face challenges such as traffic conditions, environmental factors, and limited access to certain areas. These constraints make it difficult to replicate drive tests under similar conditions. In this study, we propose a method that enables operators to predict received signal strength at specific locations using data from other drive test points. By reducing the need for widespread drive tests, this approach allows operators to save time and resources while still obtaining the necessary data to optimize their networks and mitigate the challenges associated with traditional drive tests.
Submitted 13 February, 2025;
originally announced February 2025.
-
Anomaly Detection in Cooperative Vehicle Perception Systems under Imperfect Communication
Authors:
Ashish Bastola,
Hao Wang,
Abolfazl Razi
Abstract:
Anomaly detection is a critical requirement for ensuring safety in autonomous driving. In this work, we leverage Cooperative Perception to share information across nearby vehicles, enabling more accurate identification and consensus of anomalous behaviors in complex traffic scenarios. To account for the real-world challenge of imperfect communication, we propose a cooperative-perception-based anomaly detection framework (CPAD), which is a robust architecture that remains effective under communication interruptions, thereby facilitating reliable performance even in low-bandwidth settings. Since no multi-agent anomaly detection dataset exists for vehicle trajectories, we introduce a benchmark dataset of 15,000 different scenarios with 90,000 trajectories, generated through rule-based vehicle dynamics analysis. Empirical results demonstrate that our approach outperforms standard anomaly classification methods in F1-score and AUC and shows strong robustness to agent connection interruptions.
Submitted 28 January, 2025;
originally announced January 2025.
-
Sequence Complementor: Complementing Transformers For Time Series Forecasting with Learnable Sequences
Authors:
Xiwen Chen,
Peijie Qiu,
Wenhui Zhu,
Huayu Li,
Hao Wang,
Aristeidis Sotiras,
Yalin Wang,
Abolfazl Razi
Abstract:
Since its introduction, the Transformer has shifted the development trajectory away from traditional models (e.g., RNN, MLP) in time series forecasting, which is attributed to its ability to capture global dependencies within temporal tokens. Follow-up studies have largely involved altering the tokenization and self-attention modules to better adapt Transformers for addressing special challenges like non-stationarity, channel-wise dependency, and variable correlation in time series. However, after investigating several representative methods, we found that the expressive capability of sequence representation is a key factor influencing Transformer performance in time series forecasting: there is an almost linear relationship between sequence representation entropy and mean square error, with more diverse representations performing better. In this paper, we propose a novel attention mechanism with Sequence Complementors and prove its feasibility from an information theory perspective, where these learnable sequences are able to provide complementary information beyond the current input to feed attention. We further enhance the Sequence Complementors via a diversification loss that is theoretically justified. The empirical evaluation of both long-term and short-term forecasting has confirmed its superiority over the recent state-of-the-art methods.
Submitted 5 January, 2025;
originally announced January 2025.
-
Diffusion Prism: Enhancing Diversity and Morphology Consistency in Mask-to-Image Diffusion
Authors:
Hao Wang,
Xiwen Chen,
Ashish Bastola,
Jiayou Qin,
Abolfazl Razi
Abstract:
The emergence of generative AI and controllable diffusion has made image-to-image synthesis increasingly practical and efficient. However, when input images exhibit low entropy and sparsity, the inherent characteristics of diffusion models often result in limited diversity. This constraint significantly interferes with data augmentation. To address this, we propose Diffusion Prism, a training-free framework that efficiently transforms binary masks into realistic and diverse samples while preserving morphological features. We found that a small amount of artificial noise significantly assists the image-denoising process. To validate this novel mask-to-image concept, we use nano-dendritic patterns as an example to demonstrate the merit of our method compared to existing controllable diffusion models. Furthermore, we extend the proposed framework to other biological patterns, highlighting its potential applications across various fields.
Submitted 10 January, 2025; v1 submitted 1 January, 2025;
originally announced January 2025.
-
Multimodal Variational Autoencoder: a Barycentric View
Authors:
Peijie Qiu,
Wenhui Zhu,
Sayantan Kumar,
Xiwen Chen,
Xiaotong Sun,
Jin Yang,
Abolfazl Razi,
Yalin Wang,
Aristeidis Sotiras
Abstract:
Multiple signal modalities, such as vision and sound, are naturally present in real-world phenomena. Recently, there has been growing interest in learning generative models, in particular variational autoencoders (VAEs), for multimodal representation learning, especially in the case of missing modalities. The primary goal of these models is to learn a modality-invariant and modality-specific representation that characterizes information across multiple modalities. Previous attempts at multimodal VAEs approach this mainly through the lens of experts, aggregating unimodal inference distributions with a product of experts (PoE), a mixture of experts (MoE), or a combination of both. In this paper, we provide an alternative generic and theoretical formulation of multimodal VAE through the lens of barycenter. We first show that PoE and MoE are specific instances of barycenters, derived by minimizing the asymmetric weighted KL divergence to unimodal inference distributions. Our novel formulation extends these two barycenters to a more flexible choice by considering different types of divergences. In particular, we explore the Wasserstein barycenter defined by the 2-Wasserstein distance, which better preserves the geometry of unimodal distributions by capturing both modality-specific and modality-invariant representations compared to KL divergence. Empirical studies on three multimodal benchmarks demonstrate the effectiveness of the proposed method.
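The barycentric reading of PoE and MoE can be written out using a standard result on KL barycenters. The notation here is assumed for illustration and may differ from the paper's: $q_1,\dots,q_M$ are the unimodal inference distributions and $w_m \ge 0$ are weights with $\sum_m w_m = 1$.

```latex
\begin{align}
q_{\mathrm{PoE}}
  &= \arg\min_{q} \sum_{m=1}^{M} w_m \,\mathrm{KL}\!\left(q \,\|\, q_m\right)
   \;\propto\; \prod_{m=1}^{M} q_m^{\,w_m}, \\
q_{\mathrm{MoE}}
  &= \arg\min_{q} \sum_{m=1}^{M} w_m \,\mathrm{KL}\!\left(q_m \,\|\, q\right)
   \;=\; \sum_{m=1}^{M} w_m\, q_m .
\end{align}
```

The two aggregation rules differ only in the direction of the KL divergence, which is the asymmetry the abstract refers to. Replacing the KL divergence with the squared 2-Wasserstein distance in the same objective gives the Wasserstein barycenter.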
Submitted 29 December, 2024;
originally announced December 2024.
-
GAP: Game Theory-Based Approach for Reliability and Power Management in Emerging Fog Computing
Authors:
Abolfazl Younesi,
Mohsen Ansari,
Alireza Ejlali,
Mohammad Amin Fazli,
Muhammad Shafique,
Jörg Henkel
Abstract:
Fog computing brings about a transformative shift in data management, presenting unprecedented opportunities for enhanced performance and reduced latency. However, one of the key aspects of fog computing revolves around ensuring efficient power and reliability management. To address this challenge, we have introduced a novel model that proposes a non-cooperative game theory-based strategy to strike a balance between power consumption and reliability in decision-making processes. Our proposed model capitalizes on the Cold Primary/Backup strategy (CPB) to guarantee the reliability target by re-executing tasks on different nodes when a fault occurs, while also leveraging Dynamic Voltage and Frequency Scaling (DVFS) to reduce power consumption during task execution and maximize overall efficiency. Non-cooperative game theory plays a pivotal role in our model, as it facilitates the development of strategies and solutions that uphold reliability while reducing power consumption. By treating the trade-off between power and reliability as a non-cooperative game, our proposed method yields significant energy savings, with up to a 35% reduction in energy consumption, a 41% decrease in wait time, and a 31% shorter completion time compared to state-of-the-art approaches. Our findings underscore the value of game theory in optimizing power and reliability within fog computing environments, demonstrating its potential for driving substantial improvements.
Submitted 15 December, 2024;
originally announced December 2024.
-
Geographical Information Alignment Boosts Traffic Analysis via Transpose Cross-attention
Authors:
Xiangyu Jiang,
Xiwen Chen,
Hao Wang,
Abolfazl Razi
Abstract:
Traffic accident prediction is crucial for enhancing road safety and mitigating congestion, and recent Graph Neural Networks (GNNs) have shown promise in modeling the inherent graph-based traffic data. However, existing GNN-based approaches often overlook or do not explicitly exploit geographic position information, which often plays a critical role in understanding spatial dependencies. This aligns with our observation that accident locations are often highly relevant. To address this issue, we propose a plug-and-play module for common GNN frameworks, termed Geographic Information Alignment (GIA). This module can efficiently fuse the node feature and geographic position information through a novel Transpose Cross-attention mechanism. Due to the large number of nodes in traffic data, the conventional cross-attention mechanism performing node-wise alignment may be infeasible in computation-limited settings. Instead, we take the transpose operation for Query, Key, and Value in the Cross-attention mechanism, which substantially reduces the computation cost while maintaining sufficient information. Experimental results for both traffic occurrence prediction and severity prediction (severity levels based on the interval of recorded crash counts) on large-scale city-wise datasets confirm the effectiveness of our proposed method. For example, our method can obtain gains ranging from 1.3% to 10.9% in F1 score and 0.3% to 4.8% in AUC.
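To make the cost argument concrete, here is a minimal NumPy sketch of one plausible reading of "transposing Query, Key, and Value": attention is computed across the d feature channels (a d x d map) rather than across the N nodes (an N x N map), so the cost scales as O(N d^2) instead of O(N^2 d). The function name, projection matrices, scaling factor, and shapes are all assumptions for illustration; this is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def transpose_cross_attention(node_feats, geo_feats, w_q, w_k, w_v):
    """Fuse node features with geographic features via channel-wise attention.

    node_feats: (N, f) node features; geo_feats: (N, g) positions.
    w_q: (f, d), w_k: (g, d), w_v: (g, d) are hypothetical projections.
    """
    q = node_feats @ w_q            # (N, d) queries from node features
    k = geo_feats @ w_k             # (N, d) keys from geographic positions
    v = geo_feats @ w_v             # (N, d) values from geographic positions
    n = q.shape[0]
    # Transposed attention: Q^T K is (d, d), independent of the node count N.
    attn = softmax(q.T @ k / np.sqrt(n), axis=-1)
    # (attn @ V^T)^T == V @ attn^T, giving an (N, d) fused representation.
    fused = v @ attn.T
    return fused, attn


# Illustrative sizes only: 1000 nodes, 8 node features, 2-D coordinates, d = 16.
rng = np.random.default_rng(0)
N, f, g, d = 1000, 8, 2, 16
fused, attn = transpose_cross_attention(
    rng.normal(size=(N, f)),
    rng.normal(size=(N, g)),
    rng.normal(size=(f, d)),
    rng.normal(size=(g, d)),
    rng.normal(size=(g, d)),
)
```

The attention map here is 16 x 16 regardless of how many nodes the graph has, which is the property that makes this style of alignment attractive for city-scale graphs.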
Submitted 3 December, 2024;
originally announced December 2024.
-
Many-MobileNet: Multi-Model Augmentation for Robust Retinal Disease Classification
Authors:
Hao Wang,
Wenhui Zhu,
Xuanzhao Dong,
Yanxi Chen,
Xin Li,
Peijie Qiu,
Xiwen Chen,
Vamsi Krishna Vasa,
Yujian Xiong,
Oana M. Dumitrascu,
Abolfazl Razi,
Yalin Wang
Abstract:
In this work, we propose Many-MobileNet, an efficient model fusion strategy for retinal disease classification using a lightweight CNN architecture. Our method addresses key challenges such as overfitting and limited dataset variability by training multiple models with distinct data augmentation strategies and different model complexities. Through this fusion technique, we achieved robust generalization in data-scarce domains while balancing computational efficiency with feature extraction capabilities.
Submitted 3 December, 2024;
originally announced December 2024.
-
Overview of NR Enhancements for Extended Reality (XR) in 3GPP 5G-Advanced
Authors:
Margarita Gapeyenko,
Stefano Paris,
Markus Isomaki,
Boyan Yanakiev,
Abolfazl Amiri,
Benoist Sébire,
Jorma Kaikkonen,
Chunli Wu,
Klaus I. Pedersen
Abstract:
Extended reality (XR) is unlocking numerous possibilities and continues to attract individuals and larger groups across different business sectors. With Virtual reality (VR), Augmented reality (AR), or Mixed reality (MR), it is possible to improve the way we access, deliver, and exchange information in education, health care, entertainment, and many other aspects of our daily lives. However, to fully exploit the potential of XR, it is important to provide reliable, fast, and secure wireless connectivity to XR users, which requires refining existing solutions and tailoring them to support XR services. This article presents a tutorial on 3GPP 5G-Advanced Release 18 XR activities, summarizing physical as well as higher layer enhancements introduced for New Radio considering the specifics of XR. In addition, we describe enhancements across the 5G system architecture that impact the radio access network. Furthermore, the paper provides system-level simulation results for several Release 18 enhancements to show their benefits in terms of XR capacity and power saving gains. Finally, it concludes with an overview of future work in Release 19 that continues developing features to support XR services.
Submitted 1 December, 2024;
originally announced December 2024.
-
Needle: A Generative AI-Powered Multi-modal Database for Answering Complex Natural Language Queries
Authors:
Mahdi Erfanian,
Mohsen Dehghankar,
Abolfazl Asudeh
Abstract:
Multi-modal datasets, like those involving images, often miss the detailed descriptions that properly capture the rich information encoded in each item. This makes answering complex natural language queries a major challenge in this domain. In particular, unlike the traditional nearest neighbor search, where the tuples and the query are represented as points in a single metric space, these settings involve queries and tuples embedded in fundamentally different spaces, making the traditional query answering methods inapplicable. Existing literature addresses this challenge for image datasets through vector representations jointly trained on natural language and images. This technique, however, underperforms for complex queries due to various reasons.
This paper takes a step towards addressing this challenge by introducing a Generative-based Monte Carlo method that utilizes foundation models to generate synthetic samples that capture the complexity of the natural language query and represent it in the same metric space as the multi-modal data.
Following this method, we propose Needle, a database for image data retrieval. Instead of relying on contrastive learning or metadata-searching approaches, our system is based on synthetic data generation to capture the complexities of natural language queries. Our system is open-source and ready for deployment, designed to be easily adopted by researchers and developers. The comprehensive experiments on various benchmark datasets verify that this system significantly outperforms state-of-the-art text-to-image retrieval methods in the literature. Any foundation model and embedder can be easily integrated into Needle to improve the performance, piggybacking on the advancements in these technologies.
Submitted 2 June, 2025; v1 submitted 30 November, 2024;
originally announced December 2024.
-
Rank It, Then Ask It: Input Reranking for Maximizing the Performance of LLMs on Symmetric Tasks
Authors:
Mohsen Dehghankar,
Abolfazl Asudeh
Abstract:
Large language models (LLMs) have quickly emerged as practical and versatile tools that provide new solutions for a wide range of domains. In this paper, we consider the application of LLMs on symmetric tasks where a query is asked on an (unordered) bag of elements. Examples of such tasks include answering aggregate queries on a database table. In general, when the bag contains a large number of elements, LLMs tend to overlook some elements, leading to challenges in generating accurate responses to the query. LLMs receive their inputs as ordered sequences. However, in this problem, we leverage the fact that the symmetric input is not ordered, and reordering should not affect the LLM's response.
Observing that LLMs are less likely to miss elements at certain positions of the input, we introduce the problem of LLM input reranking: to find a ranking of the input that maximizes the LLM's accuracy for the given query without making explicit assumptions about the query. Finding the optimal ranking requires identifying (i) the relevance of each input element for answering the query and (ii) the importance of each rank position for the LLM's attention. We develop algorithms for estimating these values efficiently utilizing a helper LLM. We conduct comprehensive experiments on different synthetic and real datasets to validate our proposal and to evaluate the effectiveness of our proposed algorithms. Our experiments confirm that our reranking approach improves the accuracy of the LLMs on symmetric tasks by up to $99\%$ proximity to the optimum upper bound.
Submitted 30 November, 2024;
originally announced December 2024.
-
Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop
Authors:
Zhaofang Qian,
Abolfazl Sharifi,
Tucker Carroll,
Ser-Nam Lim
Abstract:
Video generation has achieved impressive quality, but it still suffers from artifacts such as temporal inconsistency and violation of physical laws. Leveraging 3D scenes can fundamentally resolve these issues by providing precise control over scene entities. To facilitate the easy generation of diverse photorealistic scenes, we propose Scene Copilot, a framework combining large language models (LLMs) with a procedural 3D scene generator. Specifically, Scene Copilot consists of Scene Codex, BlenderGPT, and Human in the loop. Scene Codex is designed to translate textual user input into commands understandable by the 3D scene generator. BlenderGPT provides users with an intuitive and direct way to precisely control the generated 3D scene and the final output video. Furthermore, users can utilize Blender UI to receive instant visual feedback. Additionally, we have curated a procedural dataset of objects in code format to further enhance our system's capabilities. Each component works seamlessly together to support users in generating desired 3D scenes. Extensive experiments demonstrate the capability of our framework in customizing 3D scenes and video generation.
Submitted 26 November, 2024;
originally announced November 2024.
-
Degrees of Freedom of Cache-Aided Interference Channels Assisted by Active Intelligent Reflecting Surfaces
Authors:
Abolfazl Changizi,
Ali H. Abdollahi Bafghi,
Masoumeh Nasiri-Kenari
Abstract:
This paper studies cache-aided wireless networks in the presence of active intelligent reflecting surfaces (IRS) from an information-theoretic perspective. Specifically, we explore interference management in a cache-aided wireless network assisted by an active IRS, to enhance the achievable degrees of freedom (DoF). To this end, we jointly design the content placement, delivery phase, and phase shifts of the IRS and propose a one-shot achievable scheme. Our scheme exploits transmitters' cooperation, cache contents (as side information), interference alignment, and IRS capabilities, adapting to the network's parameters. We derive the achievable one-shot sum-DoF for different sizes of cache memories, network configurations, and numbers of IRS elements. Our results highlight the potential of deploying an IRS in cache-aided wireless communication systems, underscoring the enhancement of achievable DoF for various parameter regimes, particularly when the sizes of the caches (especially at the transmitters) are inadequate. Notably, we show that access to an IRS with a sufficient number of elements enables the achievement of the maximum possible DoF for various parameter regimes of interest.
Submitted 26 November, 2024;
originally announced November 2024.
-
sbi reloaded: a toolkit for simulation-based inference workflows
Authors:
Jan Boelts,
Michael Deistler,
Manuel Gloeckler,
Álvaro Tejero-Cantero,
Jan-Matthis Lueckmann,
Guy Moss,
Peter Steinbach,
Thomas Moreau,
Fabio Muratore,
Julia Linhart,
Conor Durkan,
Julius Vetter,
Benjamin Kurt Miller,
Maternus Herold,
Abolfazl Ziaeemehr,
Matthijs Pals,
Theo Gruner,
Sebastian Bischoff,
Nastya Krouglova,
Richard Gao,
Janne K. Lappalainen,
Bálint Mucsányi,
Felix Pei,
Auguste Schulz,
Zinovia Stefanidi
, et al. (8 additional authors not shown)
Abstract:
Scientists and engineers use simulators to model empirically observed phenomena. However, tuning the parameters of a simulator to ensure its outputs match observed data presents a significant challenge. Simulation-based inference (SBI) addresses this by enabling Bayesian inference for simulators, identifying parameters that match observed data and align with prior knowledge. Unlike traditional Bayesian inference, SBI only needs access to simulations from the model and does not require evaluations of the likelihood-function. In addition, SBI algorithms do not require gradients through the simulator, allow for massive parallelization of simulations, and can perform inference for different observations without further simulations or training, thereby amortizing inference. Over the past years, we have developed, maintained, and extended $\texttt{sbi}$, a PyTorch-based package that implements Bayesian SBI algorithms based on neural networks. The $\texttt{sbi}$ toolkit implements a wide range of inference methods, neural network architectures, sampling methods, and diagnostic tools. In addition, it provides well-tested default settings but also offers flexibility to fully customize every step of the simulation-based inference workflow. Taken together, the $\texttt{sbi}$ toolkit enables scientists and engineers to apply state-of-the-art SBI methods to black-box simulators, opening up new possibilities for aligning simulations with empirically observed data.
Submitted 26 November, 2024;
originally announced November 2024.
-
RobustFormer: Noise-Robust Pre-training for images and videos
Authors:
Ashish Bastola,
Nishant Luitel,
Hao Wang,
Danda Pani Paudel,
Roshani Poudel,
Abolfazl Razi
Abstract:
While deep learning models are powerful tools that revolutionized many areas, they are also vulnerable to noise as they rely heavily on learning patterns and features from the exact details of the clean data. Transformers, which have become the backbone of modern vision models, are no exception. Current Discrete Wavelet Transforms (DWT) based methods do not benefit from masked autoencoder (MAE) pre-training since the inverse DWT (iDWT) introduced in these approaches is computationally inefficient and lacks compatibility with video inputs in transformer architectures.
In this work, we present RobustFormer, a method that overcomes these limitations by enabling noise-robust pre-training for both images and videos, improving the efficiency of DWT-based methods by removing the need for computationally expensive iDWT steps and simplifying the attention mechanism. To our knowledge, the proposed method is the first DWT-based method compatible with video inputs and masked pre-training. Our experiments show that MAE-based pre-training allows us to bypass the iDWT step, greatly reducing computation. Through extensive tests on benchmark datasets, RobustFormer achieves state-of-the-art results for both image and video tasks.
Submitted 20 November, 2024;
originally announced November 2024.
-
An Efficient Matrix Multiplication Algorithm for Accelerating Inference in Binary and Ternary Neural Networks
Authors:
Mohsen Dehghankar,
Mahdi Erfanian,
Abolfazl Asudeh
Abstract:
Despite their tremendous success and versatility, Deep Neural Networks (DNNs) such as Large Language Models (LLMs) suffer from inference inefficiency and rely on advanced computational infrastructure. To address these challenges and make these models more accessible and cost-effective, in this paper, we propose algorithms to improve the inference time and memory efficiency of DNNs with binary and ternary weight matrices. Particularly focusing on matrix multiplication as the bottleneck operation of inference, we observe that, once trained, the weight matrices of a model no longer change. This allows us to preprocess these matrices and create indices that help reduce the storage requirements by a logarithmic factor while enabling our efficient inference algorithms. Specifically, for a $n\times n$ weight matrix, our efficient algorithm guarantees a time complexity of $O(\frac{n^2}{\log n})$, a logarithmic factor improvement over the standard vector-matrix multiplication. Besides theoretical analysis, we conduct extensive experiments to evaluate the practical efficiency of our algorithms. Our results confirm the superiority of our approach both with respect to time and memory, as we observed a reduction in the multiplication time up to 29x and memory usage up to 6x. When applied to LLMs, our experiments show up to a 5.24x speedup in the inference time.
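The preprocessing idea in this abstract can be illustrated with a classic lookup-table scheme. The sketch below is not the authors' algorithm; it is a minimal illustration, assuming a binary weight matrix `W` and a hypothetical block height `k`, of how fixed weights can be indexed once so that each vector-matrix product costs roughly $O((n/k)(2^k + n))$, which is $O(n^2/\log n)$ when `k` is about $\log_2 n$. The function names `preprocess` and `multiply` are illustrative only.

```python
import numpy as np

def preprocess(W, k):
    """Index a fixed binary matrix W: for every block of k consecutive rows,
    store one integer code per column (bit i of the code is row i of the block)."""
    codes = []
    for start in range(0, W.shape[0], k):
        block = W[start:start + k].astype(np.int64)   # shape (b, n_cols), b <= k
        weights = 1 << np.arange(block.shape[0])       # 1, 2, 4, ...
        codes.append(weights @ block)                  # one code per column
    return codes

def multiply(v, codes, k):
    """Compute v @ W from the precomputed codes, without touching W again."""
    out = np.zeros(codes[0].shape[0], dtype=v.dtype)
    for j, code in enumerate(codes):
        seg = v[j * k:(j + 1) * k]
        b = seg.shape[0]
        # table[p] = sum of seg[i] over the set bits i of pattern p,
        # filled in O(2^b) by reusing the entries for smaller patterns.
        table = np.zeros(1 << b, dtype=v.dtype)
        for i in range(b):
            table[1 << i:1 << (i + 1)] = table[:1 << i] + seg[i]
        out += table[code]                             # one lookup per column
    return out

rng = np.random.default_rng(0)
W = rng.integers(0, 2, size=(10, 7))
v = rng.standard_normal(10)
codes = preprocess(W, 3)
result = multiply(v, codes, 3)
```

Under the same assumptions, a ternary matrix with entries in {-1, 0, 1} could be handled by writing it as the difference of two binary matrices and applying the routine to each part.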
Submitted 2 May, 2025; v1 submitted 9 November, 2024;
originally announced November 2024.
-
Mining the Minoria: Unknown, Under-represented, and Under-performing Minority Groups
Authors:
Mohsen Dehghankar,
Abolfazl Asudeh
Abstract:
Due to a variety of reasons, such as privacy, data in the wild often misses the grouping information required for identifying minorities. On the other hand, it is known that machine learning models are only as good as the data they are trained on and, hence, may underperform for the under-represented minority groups. The missing grouping information presents a dilemma for responsible data scientists who find themselves in an unknown-unknown situation, where not only do they not have access to the grouping attributes, but they also do not know what groups to consider.
This paper is an attempt to address this dilemma. Specifically, we propose a minority mining problem, where we find vectors in the attribute space that reveal potential groups that are under-represented and under-performing. Technically speaking, we propose a geometric transformation of data into a dual space and use notions such as the arrangement of hyperplanes to design an efficient algorithm for the problem in lower dimensions. Generalizing our solution to the higher dimensions is cursed by dimensionality. Therefore, we propose a solution based on smart exploration of the search space for such cases. We conduct comprehensive experiments using real-world and synthetic datasets alongside the theoretical analysis. Our experiment results demonstrate the effectiveness of our proposed solutions in mining the unknown, under-represented, and under-performing minorities.
Submitted 20 April, 2025; v1 submitted 7 November, 2024;
originally announced November 2024.
-
Enhancing Graph Neural Networks in Large-scale Traffic Incident Analysis with Concurrency Hypothesis
Authors:
Xiwen Chen,
Sayed Pedram Haeri Boroujeni,
Xin Shu,
Huayu Li,
Abolfazl Razi
Abstract:
Despite recent progress in reducing road fatalities, the persistently high rate of traffic-related deaths highlights the necessity for improved safety interventions. Leveraging large-scale graph-based nationwide road network data across 49 states in the USA, our study first posits the Concurrency Hypothesis from intuitive observations, suggesting a significant likelihood of incidents occurring at neighboring nodes within the road network. To quantify this phenomenon, we introduce two novel metrics, Average Neighbor Crash Density (ANCD) and Average Neighbor Crash Continuity (ANCC), and subsequently employ them in statistical tests to validate the hypothesis rigorously. Building upon this foundation, we propose the Concurrency Prior (CP) method, a powerful approach designed to enhance the predictive capabilities of general Graph Neural Network (GNN) models in semi-supervised traffic incident prediction tasks. Our method allows GNNs to incorporate concurrent incident information, as mentioned in the hypothesis, via tokenization with negligible extra parameters.
The extensive experiments, utilizing real-world data across states and cities in the USA, demonstrate that integrating CP into 12 state-of-the-art GNN architectures leads to significant improvements, with gains ranging from 3% to 13% in F1 score and 1.3% to 9% in AUC metrics. The code is publicly available at https://github.com/xiwenc1/Incident-GNN-CP.
Submitted 4 November, 2024;
originally announced November 2024.
-
Equitable Federated Learning with Activation Clustering
Authors:
Antesh Upadhyay,
Abolfazl Hashemi
Abstract:
Federated learning is a prominent distributed learning paradigm that incorporates collaboration among diverse clients, promotes data locality, and thus ensures privacy. These clients have their own technological, cultural, and other biases in the process of data generation. However, the present standard often ignores this bias/heterogeneity, perpetuating bias against certain groups rather than mitigating it. In response to this concern, we propose an equitable clustering-based framework where the clients are categorized/clustered based on how similar they are to each other. We propose a unique way to construct the similarity matrix that uses activation vectors. Furthermore, we propose a client weighing mechanism to ensure that each cluster receives equal importance and establish an $O(1/\sqrt{K})$ rate of convergence to reach an $\epsilon$-stationary solution. We assess the effectiveness of our proposed strategy against common baselines, demonstrating its efficacy in terms of reducing the bias existing amongst various client clusters and consequently ameliorating algorithmic bias against specific groups.
Submitted 1 November, 2024; v1 submitted 24 October, 2024;
originally announced October 2024.
-
Gradual Domain Adaptation via Manifold-Constrained Distributionally Robust Optimization
Authors:
Amir Hossein Saberi,
Amir Najafi,
Ala Emrani,
Amin Behjati,
Yasaman Zolfimoselo,
Mahdi Shadrooy,
Abolfazl Motahari,
Babak H. Khalaj
Abstract:
The aim of this paper is to address the challenge of gradual domain adaptation within a class of manifold-constrained data distributions. In particular, we consider a sequence of $T\ge2$ data distributions $P_1,\ldots,P_T$ undergoing a gradual shift, where each pair of consecutive measures $P_i,P_{i+1}$ are close to each other in Wasserstein distance. We have a supervised dataset of size $n$ sampled from $P_0$, while for the subsequent distributions in the sequence, only unlabeled i.i.d. samples are available. Moreover, we assume that all distributions exhibit a known favorable attribute, such as (but not limited to) having intra-class soft/hard margins. In this context, we propose a methodology rooted in Distributionally Robust Optimization (DRO) with an adaptive Wasserstein radius. We theoretically show that this method guarantees the classification error across all $P_i$s can be suitably bounded. Our bounds rely on a newly introduced compatibility measure, which fully characterizes the error propagation dynamics along the sequence. Specifically, for inadequately constrained distributions, the error can exponentially escalate as we progress through the gradual shifts. Conversely, for appropriately constrained distributions, the error can be demonstrated to be linear or even entirely eradicated. We have substantiated our theoretical findings through several experimental results.
Submitted 17 October, 2024;
originally announced October 2024.
-
Adaptive Data Transport Mechanism for UAV Surveillance Missions in Lossy Environments
Authors:
Niloufar Mehrabi,
Sayed Pedram Haeri Boroujeni,
Jenna Hofseth,
Abolfazl Razi,
Long Cheng,
Manveen Kaur,
James Martin,
Rahul Amin
Abstract:
Unmanned Aerial Vehicles (UAVs) play an increasingly critical role in Intelligence, Surveillance, and Reconnaissance (ISR) missions such as border patrolling and criminal detection, thanks to their ability to access remote areas and transmit real-time imagery to processing servers. However, UAVs are highly constrained by payload size, power limits, and communication bandwidth, necessitating the development of highly selective and efficient data transmission strategies. This has driven the development of various compression and optimal transmission technologies for UAVs. Nevertheless, most methods strive to preserve maximal information in transferred video frames, missing the fact that only certain parts of images/video frames might offer meaningful contributions to the ultimate mission objectives in ISR scenarios involving moving object detection and tracking (OD/OT). This paper adopts a different perspective and offers an alternative AI-driven scheduling policy that prioritizes selecting regions of the image that significantly contribute to the mission objective. The key idea is tiling the image into small patches and developing a deep reinforcement learning (DRL) framework that assigns higher transmission probabilities to patches that present higher overlaps with the detected object of interest, while penalizing sharp transitions over consecutive frames to promote smooth scheduling shifts. Although we used YOLOv8 object detection and the UDP transmission protocol as a benchmark testing scenario, the idea is general and applicable to different transmission protocols and OD/OT methods. To further boost the system's performance and avoid OD errors for cluttered image patches, we integrate it with interframe interpolations.
Submitted 30 September, 2024;
originally announced October 2024.
-
Efficient learning of differential network in multi-source non-paranormal graphical models
Authors:
Mojtaba Nikahd,
Seyed Abolfazl Motahari
Abstract:
This paper addresses learning of sparse structural changes, or the differential network, between two classes of non-paranormal graphical models. We assume a multi-source and heterogeneous dataset is available for each class, where the covariance matrices are identical for all non-paranormal graphical models. The differential network, which is encoded by the difference precision matrix, can then be decoded by optimizing a lasso penalized D-trace loss function. To this aim, an efficient approach is proposed that outputs the exact solution path, outperforming the previous methods that only sample from the solution path in pre-selected regularization parameters. Notably, our proposed method has low computational complexity, especially when the differential network is sparse. Our simulations on synthetic data demonstrate a superior performance for our strategy in terms of speed and accuracy compared to an existing method. Moreover, our strategy in combining datasets from multiple sources is shown to be very effective in inferring differential networks in real-world problems. This is backed by our experimental results on drug resistance in tumor cancers. In the latter case, our strategy outputs important genes for drug resistance which are already confirmed by various independent studies.
Submitted 3 October, 2024;
originally announced October 2024.
-
SoccerNet 2024 Challenges Results
Authors:
Anthony Cioppa,
Silvio Giancola,
Vladimir Somers,
Victor Joos,
Floriane Magera,
Jan Held,
Seyed Abolfazl Ghasemzadeh,
Xin Zhou,
Karolina Seweryn,
Mateusz Kowalczyk,
Zuzanna Mróz,
Szymon Łukasik,
Michał Hałoń,
Hassan Mkhallati,
Adrien Deliège,
Carlos Hinojosa,
Karen Sanchez,
Amir M. Mansourian,
Pierre Miralles,
Olivier Barnich,
Christophe De Vleeschouwer,
Alexandre Alahi,
Bernard Ghanem,
Marc Van Droogenbroeck,
Adam Gorski
, et al. (59 additional authors not shown)
Abstract:
The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast video understanding, field understanding, and player understanding. This year, the challenges encompass four vision-based tasks: (1) Ball Action Spotting, focusing on precisely localizing when and which soccer actions related to the ball occur; (2) Dense Video Captioning, focusing on describing the broadcast with natural language and anchored timestamps; (3) Multi-View Foul Recognition, a novel task focusing on analyzing multiple viewpoints of a potential foul incident to classify whether a foul occurred and assess its severity; and (4) Game State Reconstruction, another novel task focusing on reconstructing the game state from broadcast videos onto a 2D top-view map of the field. Detailed information about the tasks, challenges, and leaderboards can be found at https://www.soccer-net.org, with baselines and development kits available at https://github.com/SoccerNet.
Submitted 16 September, 2024;
originally announced September 2024.
-
Loop corrections for hard spheres in Hamming space
Authors:
Abolfazl Ramezanpour,
Saman Moghimi-Araghi
Abstract:
We begin with an exact expression for the entropy of a system of hard spheres within the Hamming space. This entropy relies on probability marginals, which are determined by an extended set of Belief Propagation (BP) equations. The BP probability marginals are functions of auxiliary variables which are introduced to model the effects of loopy interactions on a tree-structured interaction graph. We explore various reasonable and approximate probability distributions, ensuring they align with the exact solutions of the BP equations. Our approach is based on an ansatz of (in)homogeneous cavity marginals respecting the permutation symmetry of the problem. Through thorough analysis, we aim to minimize errors in the BP equations. Our findings support the conjecture that the maximum packing density asymptotically conforms to the lower bound proposed by Gilbert and Varshamov, further validated by the solution of the loopy BP equations.
Submitted 5 September, 2024;
originally announced September 2024.
-
Leveraging Blockchain and ANFIS for Optimal Supply Chain Management
Authors:
Amirfarhad Farhadi,
Homayoun Safarpour Motealegh Mahalegi,
Abolfazl Pourrezaeian Firouzabad,
Azadeh Zamanifar,
Majid Sorouri
Abstract:
The supply chain is a critical segment of the product manufacturing cycle, continuously influenced by risky, uncertain, and undesirable events. Optimizing flexibility in the supply chain presents a complex, multi-objective, and nonlinear programming challenge. In the poultry supply chain, the development of mass customization capabilities has led manufacturing companies to increasingly focus on offering tailored and customized services for individual products. To safeguard against data tampering and ensure the integrity of setup costs and overall profitability, a multi-signature decentralized finance (DeFi) protocol, integrated with the IoT on a blockchain platform, is proposed. Managing the poultry supply chain involves uncertainties that may not account for parameters such as delivery time to retailers, reorder time, and the number of requested products. To address these challenges, this study employs an adaptive neuro-fuzzy inference system (ANFIS), combining neural networks with fuzzy logic to compensate for the lack of data training in parameter identification. Through MATLAB simulations, the study investigates the average shop delivery duration, the reorder time, and the number of products per order. By implementing the proposed technique, the average delivery time decreases from 40 to 37 minutes, the reorder time decreases from five to four days, and the quantity of items requested per order grows from six to eleven. Additionally, the ANFIS model enhances overall supply chain performance by reducing transaction times by 15% compared to conventional systems, thereby improving real-time responsiveness and boosting transparency in supply chain operations, effectively resolving operational issues.
Submitted 2 September, 2024; v1 submitted 30 August, 2024;
originally announced August 2024.
-
Submodular Maximization Approaches for Equitable Client Selection in Federated Learning
Authors:
Andrés Catalino Castillo Jiménez,
Ege C. Kaya,
Lintao Ye,
Abolfazl Hashemi
Abstract:
In a conventional Federated Learning framework, client selection for training typically involves the random sampling of a subset of clients in each iteration. However, this random selection often leads to disparate performance among clients, raising concerns regarding fairness, particularly in applications where equitable outcomes are crucial, such as in medical or financial machine learning tasks. This disparity typically becomes more pronounced with the advent of performance-centric client sampling techniques. This paper introduces two novel methods, namely SUBTRUNC and UNIONFL, designed to address the limitations of random client selection. Both approaches utilize submodular function maximization to achieve more balanced models. By modifying the facility location problem, they aim to mitigate the fairness concerns associated with random selection. SUBTRUNC leverages client loss information to diversify solutions, while UNIONFL relies on historical client selection data to ensure a more equitable performance of the final model. Moreover, these algorithms are accompanied by robust theoretical guarantees regarding convergence under reasonable assumptions. The efficacy of these methods is demonstrated through extensive evaluations across heterogeneous scenarios, revealing significant improvements in fairness as measured by a client dissimilarity metric.
Submitted 27 August, 2024; v1 submitted 24 August, 2024;
originally announced August 2024.