Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

Authors: Mohammed Mehedi Hasan, Hao Li, Emad Fallahzadeh, Bram Adams, Ahmed E. Hassan

Abstract: Although Foundation Models (FMs), such as GPT-4, are increasingly used in domains like finance and software engineering, reliance on textual interfaces limits these models' real-world interaction. To address this, FM providers introduced tool calling, triggering a proliferation of frameworks with distinct tool interfaces. In late 2024, Anthropic introduced the Model Context Protocol (MCP) to standardize this tool ecosystem, which has become the de facto standard with over eight million weekly SDK downloads. Despite its adoption, MCP's AI-driven, non-deterministic control flow introduces new risks to sustainability, security, and maintainability, warranting closer examination. Towards this end, we present the first large-scale empirical study of MCP. Using state-of-the-art health metrics and a hybrid analysis pipeline, combining a general-purpose static analysis tool with an MCP-specific scanner, we evaluate 1,899 open-source MCP servers to assess their health, security, and maintainability. Despite MCP servers demonstrating strong health metrics, we identify eight distinct vulnerabilities, only three overlapping with traditional software vulnerabilities. Additionally, 7.2% of servers contain general vulnerabilities and 5.5% exhibit MCP-specific tool poisoning. Regarding maintainability, while 66% exhibit code smells, 14.4% contain ten bug patterns overlapping prior research. These findings highlight the need for MCP-specific vulnerability detection techniques while reaffirming the value of traditional analysis and refactoring practices.

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.10985 [pdf]

Collaboration Tools and their Role in Agile Software Projects

Authors: Raman Mohammed Hussein, Bryar A. Hassan

Abstract: The purpose of this review is to understand the importance of collaboration tools, namely Slack, Microsoft Teams, and Confluence, in Agile and software projects. Agile methodologies rely on flexibility, using cycles and integration throughout various levels of developing cycles. However, it is still a great problem for many teams to collaborate and communicate even if staff members and teams are working remotely. In terms of collaboration, the applications and technologies mean better organization of work, increased mutually understandable openness, and fast and efficient inter-team and interpersonal interactions to enhance results of projects into productivity. This paper examines how these tools fit the Agile principles, how they facilitate iterative development, and how they encourage effective initiation and tracking of tasks in small and large projects. The insights focus on how Slack, Microsoft Teams, and Confluence are essential for gaining better task coordination, supporting knowledge sharing, and adopting agile values across cross-functional contexts.

Submitted 18 February, 2025; originally announced June 2025.

Comments: https://www.middleeastconference.org/_files/ugd/614b1f_82fa5f91169a44278723a921b27e2864.pdf ISBN: 979-8-89695-015-8

arXiv:2506.08313 [pdf, ps, other]

A New Lifetime Distribution: Exponentiated Exponential-Pareto-HalfNormal Mixture Model for Biomedical Applications

Authors: Oriyomi Ahmad Hassan, Aisha Tunrayo Maradesa, Abdulazeez Toyosi Alabi, Oyejide Surajudeen Salam, Ajani Busari, Akinwale Victor Famotire, Habeeb Abiodun Afolabi, Solomon Adeleke, Abayomi Ayodele Akomolafe

Abstract: This study introduces the Exponentiated-Exponential-Pareto-Half Normal Mixture Distribution (EEPHND), a novel hybrid model developed to overcome the limitations of classical distributions in modeling complex real-world data. By compounding the Exponentiated-Exponential-Pareto (EEP) and Half-Normal distributions through a mixture mechanism, EEPHND effectively captures both early-time symmetry and long-tail behavior, features which are commonly observed in survival and reliability data. The model offers closed-form expressions for its probability density, cumulative distribution, survival and hazard functions, moments, and reliability metrics, ensuring analytical traceability and interpretability in the presence of censoring and heterogeneous risk dynamics. When applied to a real-world lung cancer dataset, EEPHND outperformed competing models in both goodness-of-fit and predictive accuracy, achieving a Concordance Index (CI) of 0.9997. These results highlight its potential as a flexible and powerful tool for survival analysis and biomedical engineering.

Submitted 9 June, 2025; originally announced June 2025.

Comments: 22 pages, 9 figures

arXiv:2506.04418 [pdf, ps, other]

Characterizing Multi-Hunk Patches: Divergence, Proximity, and LLM Repair Challenges

Authors: Noor Nashid, Daniel Ding, Keheliya Gallaba, Ahmed E. Hassan, Ali Mesbah

Abstract: Multi-hunk bugs, where fixes span disjoint regions of code, are common in practice, yet remain underrepresented in automated repair. Existing techniques and benchmarks predominantly target single-hunk scenarios, overlooking the added complexity of coordinating semantically related changes across the codebase. In this work, we characterize HUNK4J, a dataset of multi-hunk patches derived from 372 real-world defects. We propose hunk divergence, a metric that quantifies the variation among edits in a patch by capturing lexical, structural, and file-level differences, while incorporating the number of hunks involved. We further define spatial proximity, a classification that models how hunks are spatially distributed across the program hierarchy. Our empirical study spanning six LLMs reveals that model success rates decline with increased divergence and spatial dispersion. Notably, when using the LLM alone, no model succeeds in the most dispersed Fragment class. These findings highlight a critical gap in LLM capabilities and motivate divergence-aware repair strategies.

Submitted 4 June, 2025; originally announced June 2025.

arXiv:2506.00249 [pdf, ps, other]

MIR: Methodology Inspiration Retrieval for Scientific Research Problems

Authors: Aniketh Garikaparthi, Manasi Patwardhan, Aditya Sanjiv Kanade, Aman Hassan, Lovekesh Vig, Arman Cohan

Abstract: There has been a surge of interest in harnessing the reasoning capabilities of Large Language Models (LLMs) to accelerate scientific discovery. While existing approaches rely on grounding the discovery process within the relevant literature, effectiveness varies significantly with the quality and nature of the retrieved literature. We address the challenge of retrieving prior work whose concepts can inspire solutions for a given research problem, a task we define as Methodology Inspiration Retrieval (MIR). We construct a novel dataset tailored for training and evaluating retrievers on MIR, and establish baselines. To address MIR, we build the Methodology Adjacency Graph (MAG), capturing methodological lineage through citation relationships. We leverage MAG to embed an "intuitive prior" into dense retrievers for identifying patterns of methodological inspiration beyond superficial semantic similarity. This achieves significant gains of +5.4 in Recall@3 and +7.8 in Mean Average Precision (mAP) over strong baselines. Further, we adapt LLM-based re-ranking strategies to MIR, yielding additional improvements of +4.5 in Recall@3 and +4.8 in mAP. Through extensive ablation studies and qualitative analyses, we exhibit the promise of MIR in enhancing automated scientific discovery and outline avenues for advancing inspiration-driven retrieval.

Submitted 30 May, 2025; originally announced June 2025.

Comments: ACL 2025

arXiv:2505.21931 [pdf, other]

Large Language Models for Solving Economic Dispatch Problem

Authors: Sina Mohammadi, Ali Hassan, Rouzbeh Haghighi, Van-Hai Bui, Wencong Su

Abstract: This paper investigates the capability of off-the-shelf large language models (LLMs) to solve the economic dispatch (ED) problem. ED is a hard-constrained optimization problem solved on a day-ahead timescale by grid operators to minimize electricity generation costs while accounting for physical and engineering constraints. Numerous approaches have been proposed, but these typically require either mathematical formulations, face convergence issues, or depend on extensive labeled data and training time. This work implements LLMs enhanced with reasoning capabilities to address the classic lossless ED problem. The proposed approach avoids the need for explicit mathematical formulations, does not suffer from convergence challenges, and requires neither labeled data nor extensive training. A few-shot learning technique is utilized in two different prompting contexts. The IEEE 118-bus system with 19 generation units serves as the evaluation benchmark. Results demonstrate that various prompting strategies enable LLMs to effectively solve the ED problem, offering a convenient and efficient alternative. Consequently, this approach presents a promising future solution for ED tasks, particularly when foundational power system models are available.

Submitted 27 May, 2025; originally announced May 2025.

Comments: 5 pages, 3 figures, Accepted, 2025 IEEE Energy Conversion Conference and Expo (ECCE 2025), Philadelphia, PA

arXiv:2505.20973 [pdf, ps, other]

Towards Conversational Development Environments: Using Theory-of-Mind and Multi-Agent Architectures for Requirements Refinement

Authors: Keheliya Gallaba, Ali Arabat, Dayi Lin, Mohammed Sayagh, Ahmed E. Hassan

Abstract: Foundation Models (FMs) have shown remarkable capabilities in various natural language tasks. However, their ability to accurately capture stakeholder requirements remains a significant challenge for using FMs for software development. This paper introduces a novel approach that leverages an FM-powered multi-agent system called AlignMind to address this issue. By having a cognitive architecture that enhances FMs with Theory-of-Mind capabilities, our approach considers the mental states and perspectives of software makers. This allows our solution to iteratively clarify the beliefs, desires, and intentions of stakeholders, translating these into a set of refined requirements and a corresponding actionable natural language workflow in the often-overlooked requirements refinement phase of software engineering, which is crucial after initial elicitation. Through a multifaceted evaluation covering 150 diverse use cases, we demonstrate that our approach can accurately capture the intents and requirements of stakeholders, articulating them as both specifications and a step-by-step plan of action. Our findings suggest that the potential for significant improvements in the software development process justifies these investments. Our work lays the groundwork for future innovation in building intent-first development environments, where software makers can seamlessly collaborate with AIs to create software that truly meets their needs.

Submitted 28 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.12834 [pdf]

A Study on the Refining Handwritten Font by Mixing Font Styles

Authors: Avinash Kumar, Kyeolhee Kang, Ammar ul Hassan, Jaeyoung Choi

Abstract: Handwritten fonts have a distinct expressive character, but they are often difficult to read due to unclear or inconsistent handwriting. FontFusionGAN (FFGAN) is a novel method for improving handwritten fonts by combining them with printed fonts. Our method uses a generative adversarial network (GAN) to generate fonts that mix the desirable features of handwritten and printed fonts. By training the GAN on a dataset of handwritten and printed fonts, it can generate legible and visually appealing font images. We apply our method to a dataset of handwritten fonts and demonstrate that it significantly enhances the readability of the original fonts while preserving their unique aesthetic. Our method has the potential to improve the readability of handwritten fonts, which would be helpful for a variety of applications including document creation, letter writing, and assisting individuals with reading and writing difficulties. In addition to addressing the difficulties of font creation for languages with complex character sets, our method is applicable to other text-image-related tasks, such as font attribute control and multilingual font style transfer.

Submitted 19 May, 2025; originally announced May 2025.

Comments: 4 pages, 3 figures, MITA 2023 (The 19th International Conference on Multimedia Information Technology and Applications July. 11 ~ July 14, 2023, Technical University of Ostrava, Ostrava, Czech)

arXiv:2505.10736 [pdf, other]

Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization

Authors: Ximing Dong, Shaowei Wang, Dayi Lin, Ahmed E. Hassan

Abstract: Optimizing Large Language Model (LLM) performance requires well-crafted prompts, but manual prompt engineering is labor-intensive and often ineffective. Automated prompt optimization techniques address this challenge but the majority of them rely on randomly selected evaluation subsets, which fail to represent the full dataset, leading to unreliable evaluations and suboptimal prompts. Existing coreset selection methods, designed for LLM benchmarking, are unsuitable for prompt optimization due to challenges in clustering similar samples, high data collection costs, and the unavailability of performance data for new or private datasets. To overcome these issues, we propose IPOMP, an Iterative evaluation data selection for effective Prompt Optimization using real-time Model Performance. IPOMP is a two-stage approach that selects representative and diverse samples using semantic clustering and boundary analysis, followed by iterative refinement with real-time model performance data to replace redundant samples. Evaluations on the BIG-bench dataset show that IPOMP improves effectiveness by 1.6% to 5.3% and stability by at least 57% compared with SOTA baselines, with minimal computational overhead below 1%. Furthermore, the results demonstrate that our real-time performance-guided refinement approach can be universally applied to enhance existing coreset selection methods.

Submitted 21 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

Comments: ACL 2025, Findings

arXiv:2505.10640 [pdf, ps, other]

doi 10.1145/3711896.3736572

The Hitchhikers Guide to Production-ready Trustworthy Foundation Model powered Software (FMware)

Authors: Kirill Vasilevski, Benjamin Rombaut, Gopi Krishnan Rajbahadur, Gustavo A. Oliva, Keheliya Gallaba, Filipe R. Cogo, Jiahuei Lin, Dayi Lin, Haoxiang Zhang, Bouyan Chen, Kishanthan Thangarajah, Ahmed E. Hassan, Zhen Ming Jiang

Abstract: Foundation Models (FMs) such as Large Language Models (LLMs) are reshaping the software industry by enabling FMware, systems that integrate these FMs as core components. In this KDD 2025 tutorial, we present a comprehensive exploration of FMware that combines a curated catalogue of challenges with real-world production concerns. We first discuss the state of research and practice in building FMware. We further examine the difficulties in selecting suitable models, aligning high-quality domain-specific data, engineering robust prompts, and orchestrating autonomous agents. We then address the complex journey from impressive demos to production-ready systems by outlining issues in system testing, optimization, deployment, and integration with legacy software. Drawing on our industrial experience and recent research in the area, we provide actionable insights and a technology roadmap for overcoming these challenges. Attendees will gain practical strategies to enable the creation of trustworthy FMware in the evolving technology landscape.

Submitted 2 June, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

arXiv:2505.04629 [pdf]

From Dialect Gaps to Identity Maps: Tackling Variability in Speaker Verification

Authors: Abdulhady Abas Abdullah, Soran Badawi, Dana A. Abdullah, Dana Rasul Hamad, Hanan Abdulrahman Taher, Sabat Salih Muhamad, Aram Mahmood Ahmed, Bryar A. Hassan, Sirwan Abdolwahed Aula, Tarik A. Rashid

Abstract: The complexity and difficulties of Kurdish speaker detection among its several dialects are investigated in this work. Because of its great phonetic and lexical differences, Kurdish with several dialects including Kurmanji, Sorani, and Hawrami offers special challenges for speaker recognition systems. The main difficulties in building a strong speaker identification system capable of precisely identifying speakers across several dialects are investigated in this work. To raise the accuracy and dependability of these systems, it also suggests solutions like sophisticated machine learning approaches, data augmentation tactics, and the building of thorough dialect-specific corpus. The results show that customized strategies for every dialect together with cross-dialect training greatly enhance recognition performance.

Submitted 21 April, 2025; originally announced May 2025.

arXiv:2505.03832 [pdf]

Video Forgery Detection for Surveillance Cameras: A Review

Authors: Noor B. Tayfor, Tarik A. Rashid, Shko M. Qader, Bryar A. Hassan, Mohammed H. Abdalla, Jafar Majidpour, Aram M. Ahmed, Hussein M. Ali, Aso M. Aladdin, Abdulhady A. Abdullah, Ahmed S. Shamsaldin, Haval M. Sidqi, Abdulrahman Salih, Zaher M. Yaseen, Azad A. Ameen, Janmenjoy Nayak, Mahmood Yashar Hamza

Abstract: The widespread availability of video recording through smartphones and digital devices has made video-based evidence more accessible than ever. Surveillance footage plays a crucial role in security, law enforcement, and judicial processes. However, with the rise of advanced video editing tools, tampering with digital recordings has become increasingly easy, raising concerns about their authenticity. Ensuring the integrity of surveillance videos is essential, as manipulated footage can lead to misinformation and undermine judicial decisions. This paper provides a comprehensive review of existing forensic techniques used to detect video forgery, focusing on their effectiveness in verifying the authenticity of surveillance recordings. Various methods, including compression-based analysis, frame duplication detection, and machine learning-based approaches, are explored. The findings highlight the growing necessity for more robust forensic techniques to counteract evolving forgery methods. Strengthening video forensic capabilities will ensure that surveillance recordings remain credible and admissible as legal evidence.

Submitted 4 May, 2025; originally announced May 2025.

arXiv:2505.02961 [pdf, ps, other]

Can We Recycle Our Old Models? An Empirical Evaluation of Model Selection Mechanisms for AIOps Solutions

Authors: Yingzhe Lyu, Hao Li, Heng Li, Ahmed E. Hassan

Abstract: AIOps (Artificial Intelligence for IT Operations) solutions leverage the tremendous amount of data produced during the operation of large-scale systems and machine learning models to assist software practitioners in their system operations. Existing AIOps solutions usually maintain AIOps models against concept drift through periodical retraining, despite leaving a pile of discarded historical models that may perform well on specific future data. Other prior works propose dynamically selecting models for prediction tasks from a set of candidate models to optimize the model performance. However, there is no prior work in the AIOps area that assesses the use of model selection mechanisms on historical models to improve model performance or robustness. To fill the gap, we evaluate several model selection mechanisms by assessing their capabilities in selecting the optimal AIOps models that were built in the past to make predictions for the target data. We performed a case study on three large-scale public operation datasets: two trace datasets from the cloud computing platforms of Google and Alibaba, and one disk stats dataset from the BackBlaze cloud storage data center. We observe that the model selection mechanisms utilizing temporal adjacency tend to have a better performance and can prevail over the periodical retraining approach. Our findings also highlight a performance gap between existing model selection mechanisms and the theoretical upper bound which may motivate future researchers and practitioners in investigating more efficient and effective model selection mechanisms that fit in the context of AIOps.

Submitted 5 May, 2025; originally announced May 2025.

Comments: arXiv admin note: text overlap with arXiv:2311.03213

arXiv:2504.16498 [pdf, other]

LiDAL-Assisted RLNC-NOMA in OWC Systems

Authors: Ahmed A. Hassan, Ahmad Adnan Qidan, Taisir Elgorashi, Jaafar Elmirghani

Abstract: Optical wireless communication (OWC) is envisioned as a key enabler for immersive indoor data transmission in future wireless communication networks. However, multi-user interference management arises as a challenge in dense indoor OWC systems composed of multiple optical access points (APs) serving multiple users. In this paper, we propose a novel dual-function OWC system for communication and localization. Non-orthogonal multiple access (NOMA) with random linear network coding (RLNC) is designed for data transmission, where NOMA allows the serving of multiple users simultaneously through controlling the power domain, and RLNC helps minimize errors that might occur during the signal processing phase. This setup is assisted with a light detection and localization system (LiDAL) that can passively obtain spatio-temporal indoor information of user presence and location for dynamic-user grouping. The designed LiDAL system helps to improve the estimation of channel state information (CSI) in realistic indoor network scenarios, where the CSI of indoor users might be noisy and/or highly correlated. We evaluate the performance of NOMA combined with RLNC by analyzing the probability of successful decoding compared to conventional NOMA and orthogonal schemes. In addition, we derive the Cramer-Rao Lower Bound (CRLB) to evaluate the accuracy of location estimation. The results show that the proposed RLNC-NOMA improves the probability of successful decoding and the overall system performance. The results also show the high accuracy of the unbiased location estimator and its assistance in reducing the imperfection of CSI, leading to high overall system performance.

Submitted 23 April, 2025; originally announced April 2025.

arXiv:2504.09242 [pdf, other]

Development of a PPO-Reinforcement Learned Walking Tripedal Soft-Legged Robot using SOFA

Authors: Yomna Mokhtar, Tarek Shohdy, Abdallah A. Hassan, Mostafa Eshra, Omar Elmenawy, Osama Khalil, Haitham El-Hussieny

Abstract: Rigid robots were extensively researched, whereas soft robotics remains an underexplored field. Utilizing soft-legged robots in performing tasks as a replacement for human beings is an important stride to take, especially under harsh and hazardous conditions over rough terrain environments. For the demand to teach any robot how to behave in different scenarios, a real-time physical and visual simulation is essential. When it comes to soft robots specifically, a simulation framework is still an arduous problem that needs to be disclosed. Using the simulation open framework architecture (SOFA) is an advantageous step. However, neither SOFA's manual nor prior public SOFA projects show its maximum capabilities the users can reach. So, we resolved this by establishing customized settings and handling the framework components appropriately. Settling on perfect, fine-tuned SOFA parameters has stimulated our motivation towards implementing the state-of-the-art (SOTA) reinforcement learning (RL) method of proximal policy optimization (PPO). The final representation is a well-defined, ready-to-deploy walking, tripedal, soft-legged robot based on PPO-RL in a SOFA environment. Robot navigation performance is a key metric to be considered for measuring the success resolution. Although in the simulated soft robots case, an 82% success rate in reaching a single goal is a groundbreaking output, we pushed the boundaries to further steps by evaluating the progress under assigning a sequence of goals. While trailing the platform steps, outperforming discovery has been observed with an accumulative squared error deviation of 19 mm. The full code is publicly available at https://github.com/tarekshohdy/PPO_SOFA_Soft_Legged_Robot.git

Submitted 12 April, 2025; originally announced April 2025.

arXiv:2504.08767 [pdf]

A Proposed Hybrid Recommender System for Tourism Industry in Iraq Using Evolutionary Apriori and K-means Algorithms

Authors: Bryar A. Hassan, Alla A. Hassan, Joan Lu, Aram M. Ahmed, Tarik A. Rashid

Abstract: The rapid proliferation of tourism data across sectors, including accommodations, cultural sites, and events, has made it increasingly challenging for travelers to identify relevant and personalized recommendations. While traditional recommender systems such as collaborative, content-based, and context-aware systems offer partial solutions, they often struggle with issues like data sparsity and overspecialization. This study proposes a novel hybrid recommender system that combines evolutionary Apriori and K-means clustering algorithms to improve recommendation accuracy and efficiency in the tourism domain. Designed specifically to address the diverse and dynamic tourism landscape in Iraq, the system provides personalized recommendations and clusters of tourist destinations tailored to user preferences and contextual information. To evaluate the system's performance, experiments were conducted on an augmented dataset representative of Iraq's tourism activity, comparing the proposed system with existing methods. Results indicate that the proposed hybrid system significantly reduces execution time by 27-56% and space consumption by 24-31%, while achieving consistently lower Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) values, thereby enhancing prediction accuracy. This approach offers a scalable, context-aware framework that is well-suited for application in regions where tourism data is limited, such as Iraq, ultimately advancing tourism recommender systems by addressing their limitations in complex and data-scarce environments.

Submitted 1 April, 2025; originally announced April 2025.

arXiv:2504.06285 [pdf]

Reducing Formal Context Extraction: A Newly Proposed Framework from Big Corpora

Authors: Bryar A. Hassan, Shko M. Qader, Alla A. Hassan, Joan Lu, Aram M. Ahmed, Jafar Majidpour, Tarik A. Rashid

Abstract: Automating the extraction of concept hierarchies from free text is advantageous because manual generation is frequently labor- and resource-intensive. As a result, the whole procedure for concept hierarchy learning from free text entails several phases, including sentence-level text processing, sentence splitting, and tokenization. Lemmatization is after formal context analysis (FCA) to derive the pairings. Nevertheless, there could be a few uninteresting and incorrect pairings in the formal context. It may take a while to generate the formal context; thus, reducing the size of the formal context is necessary to weed out irrelevant and incorrect pairings and to extract the concept lattice and hierarchies more quickly. This study aims to propose a framework for reducing formal context in extracting concept hierarchies from free text to reduce the ambiguity of the formal context. We achieve this by reducing the size of the formal context using a hybrid of a WordNet-based method and a frequency-based technique. Using 385 samples from the Wikipedia corpus and the suggested framework, tests are carried out to examine the reduced size of formal context, leading to concept lattice and concept hierarchy. With the help of concept lattice-invariants, the generated formal context lattice is compared to the normal one. In contrast to basic ones, the homomorphic between the resultant lattices retains up to 98% of the quality of the generating concept hierarchies, and the reduced concept lattice receives the structural connection of the standard one. Additionally, the new framework is compared to five baseline techniques to calculate the running time on random datasets with various densities. The findings demonstrate that, in various fill ratios, hybrid approaches of the proposed method outperform other indicated competing strategies in concept lattice performance.

Submitted 1 April, 2025; originally announced April 2025.

arXiv:2504.01767 [pdf, other]

Leveraging Embedding Techniques in Multimodal Machine Learning for Mental Illness Assessment

Authors: Abdelrahaman A. Hassan, Abdelrahman A. Ali, Aya E. Fouda, Radwa J. Hanafy, Mohammed E. Fouda

Abstract: The increasing global prevalence of mental disorders, such as depression and PTSD, requires objective and scalable diagnostic tools. Traditional clinical assessments often face limitations in accessibility, objectivity, and consistency. This paper investigates the potential of multimodal machine learning to address these challenges, leveraging the complementary information available in text, audio, and video data. Our approach involves a comprehensive analysis of various data preprocessing techniques, including novel chunking and utterance-based formatting strategies. We systematically evaluate a range of state-of-the-art embedding models for each modality and employ Convolutional Neural Networks (CNNs) and Bidirectional LSTM Networks (BiLSTMs) for feature extraction. We explore data-level, feature-level, and decision-level fusion techniques, including a novel integration of Large Language Model (LLM) predictions. We also investigate the impact of replacing Multilayer Perceptron classifiers with Support Vector Machines. We extend our analysis to severity prediction using PHQ-8 and PCL-C scores and multi-class classification (considering co-occurring conditions). Our results demonstrate that utterance-based chunking significantly improves performance, particularly for text and audio modalities. Decision-level fusion, incorporating LLM predictions, achieves the highest accuracy, with a balanced accuracy of 94.8% for depression and 96.2% for PTSD detection. The combination of CNN-BiLSTM architectures with utterance-level chunking, coupled with the integration of an external LLM, provides a powerful and nuanced approach to the detection and assessment of mental health conditions. Our findings highlight the potential of MMML for developing more accurate, accessible, and personalized mental healthcare tools.

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2504.00975 [pdf, other]

Resource Allocation for RIS-Assisted CoMP-NOMA Networks using Reinforcement Learning

Authors: Muhammad Umer, Muhammad Ahmed Mohsin, Huma Ghafoor, Syed Ali Hassan

Abstract: This thesis delves into the forefront of wireless communication by exploring the synergistic integration of three transformative technologies: STAR-RIS, CoMP, and NOMA. Driven by the ever-increasing demand for higher data rates, improved spectral efficiency, and expanded coverage in the evolving landscape of 6G development, this research investigates the potential of these technologies to revolutionize future wireless networks. The thesis analyzes the performance gains achievable through strategic deployment of STAR-RIS, focusing on mitigating inter-cell interference, enhancing signal strength, and extending coverage to cell-edge users. Resource sharing strategies for STAR-RIS elements are explored, optimizing both transmission and reflection functionalities. Analytical frameworks are developed to quantify the benefits of STAR-RIS assisted CoMP-NOMA networks under realistic channel conditions, deriving key performance metrics such as ergodic rates and outage probabilities. Additionally, the research delves into energy-efficient design approaches for CoMP-NOMA networks incorporating RIS, proposing novel RIS configurations and optimization algorithms to achieve a balance between performance and energy consumption. Furthermore, the application of Deep Reinforcement Learning (DRL) techniques for intelligent and adaptive optimization in aerial RIS-assisted CoMP-NOMA networks is explored, aiming to maximize network sum rate while meeting user quality of service requirements. Through a comprehensive investigation of these technologies and their synergistic potential, this thesis contributes valuable insights into the future of wireless communication, paving the way for the development of more efficient, reliable, and sustainable networks capable of meeting the demands of our increasingly connected world.

Submitted 19 May, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

arXiv:2503.19876 [pdf]

SLA-Awareness for AI-assisted coding

Authors: Kishanthan Thangarajah, Arthur Leung, Boyuan Chen, Ahmed E. Hassan

Abstract: The integration of AI-assisted coding tools within development environments drastically reduces development time, and allows developers to focus more on creative and critical aspects of software engineering through the use of Code Large Language Models (CodeLLMs). These coding assistants automate repetitive and time-consuming coding tasks such as code generation, code completion, code summarization, and code translation. Responsiveness is a crucial requirement of these coding assistants to maintain real-time interactivity, such that their use does not impede the developers' workflows. Different coding tasks have unique characteristics and latency requirements: Time-To-First-Token (TTFT) latency is essential for code completion tasks, while End-To-End (E2E) latency is crucial for code translation tasks. Managing these varying requirements simultaneously while optimizing resource usage poses significant challenges. Existing work adopts the Model-as-a-Service paradigm for serving individual CodeLLMs, but cannot effectively manage latency requirements of concurrent coding tasks and sequences of CodeLLM inference calls, due to a lack of end-to-end latency awareness. Another challenge is keeping resource utilization high when the serving system is deployed on a shared cluster environment. To address these challenges, we propose Coding Assistant Task Orchestrator (CATO), a runtime system designed to serve a diverse assortment of coding tasks while meeting latency requirements and maximizing resource utilization. Our experiments demonstrate that when all types of coding tasks were served simultaneously, for TTFT-critical tasks, CATO improves overall Goodput rate and resource utilization by up to 10% and 41.1%, respectively. P95 E2E latency was also reduced by 18% for code summarization tasks, and P95 TTFT for code generation tasks was reduced by 14% compared against state-of-the-art systems.

Submitted 25 March, 2025; originally announced March 2025.

arXiv:2503.14122 [pdf]

Aesthetics of Connectivity: Envisioning Empowerment Through Smart Clothing

Authors: Yannick Kibolwe Mulundule, Yao Cheng, Amir Ubed, Abdiaziz Omar Hassan

Abstract: Empowerment in smart clothing, which incorporates advanced technologies, requires the integration of scientific and technological expertise with artistic and design principles. Little research has focused on this unique and innovative field of design until now, and that is about to change. The concept of 'wearables' cuts across several fields. A global 'language' that permits both free-form creativity and a methodical design approach is required. Smart clothing designers often seek guidance in their research since it may be difficult to prioritize and understand issues such as usability, production, style, consumer culture, reuse, and end-user needs. The researchers in this study made sure that their design tool was presented in a manner that practitioners from many walks of life could understand. The 'critical route' is a useful tool for smart technology implementation design, study, and development since it helps to clarify the path that must be taken.

Submitted 28 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

arXiv:2503.06720 [pdf, other]

Intelligent Spectrum Sharing in Integrated TN-NTNs: A Hierarchical Deep Reinforcement Learning Approach

Authors: Muhammad Umer, Muhammad Ahmed Mohsin, Ali Arshad Nasir, Hatem Abou-Zeid, Syed Ali Hassan

Abstract: Integrating non-terrestrial networks (NTNs) with terrestrial networks (TNs) is key to enhancing coverage, capacity, and reliability in future wireless communications. However, the multi-tier, heterogeneous architecture of these integrated TN-NTNs introduces complex challenges in spectrum sharing and interference management. Conventional optimization approaches struggle to handle the high-dimensional decision space and dynamic nature of these networks. This paper proposes a novel hierarchical deep reinforcement learning (HDRL) framework to address these challenges and enable intelligent spectrum sharing. The proposed framework leverages the inherent hierarchy of the network, with separate policies for each tier, to learn and optimize spectrum allocation decisions at different timescales and levels of abstraction. By decomposing the complex spectrum sharing problem into manageable sub-tasks and allowing for efficient coordination among the tiers, the HDRL approach offers a scalable and adaptive solution for spectrum management in future TN-NTNs. Simulation results demonstrate the superior performance of the proposed framework compared to traditional approaches, highlighting its potential to enhance spectral efficiency and network capacity in dynamic, multi-tier environments.

Submitted 9 March, 2025; originally announced March 2025.

Comments: Accepted at IEEE Wireless Communications

arXiv:2503.00673 [pdf, other]

Towards Refining Developer Questions using LLM-Based Named Entity Recognition for Developer Chatroom Conversations

Authors: Pouya Fathollahzadeh, Mariam El Mezouar, Hao Li, Ying Zou, Ahmed E. Hassan

Abstract: In software engineering chatrooms, communication is often hindered by imprecise questions that cannot be answered. Recognizing key entities can be essential for improving question clarity and facilitating better exchange. However, existing research using natural language processing techniques often overlooks these software-specific nuances. In this paper, we introduce Software-specific Named Entity Recognition, Intent Detection, and Resolution Classification (SENIR), a labeling approach that leverages a Large Language Model to annotate entities, intents, and resolution status in developer chatroom conversations. To offer quantitative guidance for improving question clarity and resolvability, we build a resolution prediction model that leverages SENIR's entity and intent labels along with additional predictive features. We evaluate SENIR on the DISCO dataset using a subset of annotated chatroom dialogues. SENIR achieves an 86% F-score for entity recognition, a 71% F-score for intent detection, and an 89% F-score for resolution status classification. Furthermore, our resolution prediction model, tested with various sampling strategies (random undersampling and oversampling with SMOTE) and evaluation methods (5-fold cross-validation, 10-fold cross-validation, and bootstrapping), demonstrates AUC values ranging from 0.7 to 0.8. Key factors influencing resolution include positive sentiment and entities such as Programming Language and User Variable across multiple intents, while diagnostic entities are more relevant in error-related questions. Moreover, resolution rates vary significantly by intent: questions about API Usage and API Change achieve higher resolution rates, whereas Discrepancy and Review have lower resolution rates. A Chi-Square analysis confirms the statistical significance of these differences.

Submitted 1 March, 2025; originally announced March 2025.

arXiv:2502.19439 [pdf]

doi 10.1201/9781003601555

Multi-objective Cat Swarm Optimization Algorithm based on a Grid System

Authors: Aram M. Ahmed, Bryar A. Hassan, Tarik A. Rashid, Kaniaw A. Noori, Soran Ab. M. Saeed, Omed H. Ahmed, Shahla U. Umar

Abstract: This paper presents a multi-objective version of the Cat Swarm Optimization Algorithm called the Grid-based Multi-objective Cat Swarm Optimization Algorithm (GMOCSO). Convergence and diversity preservation are the two main goals pursued by modern multi-objective algorithms to yield robust results. To achieve these goals, we first replace the roulette wheel method of the original CSO algorithm with a greedy method. Then, two key concepts from Pareto Archived Evolution Strategy Algorithm (PAES) are adopted: the grid system and double archive strategy. Several test functions and a real-world scenario called the Pressure vessel design problem are used to evaluate the proposed algorithm's performance. In the experiment, the proposed algorithm is compared with other well-known algorithms using different metrics such as Reversed Generational Distance, Spacing metric, and Spread metric. The optimization results show the robustness of the proposed algorithm, and the results are further confirmed using statistical methods and graphs. Finally, conclusions and future directions are presented.

Submitted 22 February, 2025; originally announced February 2025.

arXiv:2502.15903 [pdf, other]

Computation Offloading Strategies in Integrated Terrestrial and Non-Terrestrial Networks

Authors: Muhammad Ahmed Mohsin, Muhammad Umer, Amara Umar, Hatem Abou-Zeid, Syed Ali Hassan

Abstract: The rapid growth of computation-intensive applications like augmented reality, autonomous driving, remote healthcare, and smart cities has exposed the limitations of traditional terrestrial networks, particularly in terms of inadequate coverage, limited capacity, and high latency in remote areas. This chapter explores how integrated terrestrial and non-terrestrial networks (IT-NTNs) can address these challenges and enable efficient computation offloading. We examine mobile edge computing (MEC) and its evolution toward multiple-access edge computing, highlighting the critical role computation offloading plays for resource-constrained devices. We then discuss the architecture of IT-NTNs, focusing on how terrestrial base stations, unmanned aerial vehicles (UAVs), high-altitude platforms (HAPs), and LEO satellites work together to deliver ubiquitous connectivity. Furthermore, we analyze various computation offloading strategies, including edge, cloud, and hybrid offloading, outlining their strengths and weaknesses. Key enabling technologies such as NOMA, mmWave/THz communication, and reconfigurable intelligent surfaces (RIS) are also explored as essential components of existing algorithms for resource allocation, task offloading decisions, and mobility management. Finally, we conclude by highlighting the transformative impact of computation offloading in IT-NTNs across diverse application areas and discuss key challenges and future research directions, emphasizing the potential of these networks to revolutionize communication and computation paradigms.

Submitted 21 February, 2025; originally announced February 2025.

Comments: Paper accepted as chapter to Elsevier

arXiv:2502.04202 [pdf, other]

GUIWatcher: Automatically Detecting GUI Lags by Analyzing Mobile Application Screencasts

Authors: Wei Liu, Feng Lin, Linqiang Guo, Tse-Hsun Chen, Ahmed E. Hassan

Abstract: The Graphical User Interface (GUI) plays a central role in mobile applications, directly affecting usability and user satisfaction. Poor GUI performance, such as lag or unresponsiveness, can lead to negative user experience and decreased mobile application (app) ratings. In this paper, we present GUIWatcher, a framework designed to detect GUI lags by analyzing screencasts recorded during mobile app testing. GUIWatcher uses computer vision techniques to identify three types of lag-inducing frames (i.e., janky frames, long loading frames, and frozen frames) and prioritizes the most severe ones that significantly impact user experience. Our approach was evaluated using real-world mobile application tests, achieving high accuracy in detecting GUI lags in screencasts, with an average precision of 0.91 and recall of 0.96. The comprehensive bug reports generated from the lags detected by GUIWatcher help developers focus on the more critical issues and debug them efficiently. Additionally, GUIWatcher has been deployed in a real-world production environment, continuously monitoring app performance and successfully identifying critical GUI performance issues. By offering a practical solution for identifying and addressing GUI lags, GUIWatcher contributes to enhancing user satisfaction and the overall quality of mobile apps.

Submitted 6 February, 2025; originally announced February 2025.

Comments: ICSE-SEIP 2025

arXiv:2502.03412 [pdf, other]

Deep Reinforcement Learning-Based Optimization of Second-Life Battery Utilization in Electric Vehicles Charging Stations

Authors: Rouzbeh Haghighi, Ali Hassan, Van-Hai Bui, Akhtar Hussain, Wencong Su

Abstract: The rapid rise in electric vehicle (EV) adoption presents significant challenges in managing the vast number of retired EV batteries. Research indicates that second-life batteries (SLBs) from EVs typically retain considerable residual capacity, offering extended utility. These batteries can be effectively repurposed for use in EV charging stations (EVCS), providing a cost-effective alternative to new batteries and reducing overall planning costs. Integrating battery energy storage systems (BESS) with SLBs into EVCS is a promising strategy to alleviate system overload. However, efficient operation of EVCS with integrated BESS is hindered by uncertainties such as fluctuating EV arrival and departure times and variable power prices from the grid. This paper presents a deep reinforcement learning-based (DRL) planning framework for EV charging stations with BESS, leveraging SLBs. We employ the advanced soft actor-critic (SAC) approach, training the model on a year's worth of data to account for seasonal variations, including weekdays and holidays. A tailored reward function enables effective offline training, allowing real-time optimization of EVCS operations under uncertainty.

Submitted 5 February, 2025; originally announced February 2025.

Comments: 5 pages, 6 figures, Accepted, 2025 IEEE Power and Energy Society General Meeting (PESGM 2025), Austin, TX, USA

arXiv:2501.15576 [pdf]

First Real-Time Detection of Ambient Backscatters using Uplink Sounding Reference Signals of a Commercial 4G Smartphone

Authors: Ahmed ElSanhoury, Islam Galal, Khaled AlKady, Aml ElKhodary, Dinh-Thuy Phan-Huy, Ayman M. Hassan

Abstract: Recently, cellular Ambient Backscattering has been proposed for cellular networks. Up to now, an Ambient backscatter device, called a zero-energy device or tag, broadcast its message by backscattering ambient downlink waves from the closest Base Station (BS) according to a predefined pattern. A tag was detected by smartphones nearby. This paper presents, for the first time, a novel ambient backscatter communication system exploiting uplink ambient waves from smartphones instead of downlink waves. In this novel system, a BS connected to a smartphone monitors the uplink pilot signals and detects TAGs in proximity. The proposed system is implemented and tested with one prototype of TAG, a commercial off-the-shelf 4G smartphone and a 4G Software Defined Radio (SDR) BS. Indoor and outdoor experiments were conducted to assess the proposed technique. These very preliminary experiments exhibit a promising potential. Indoors, a detection probability of more than 90% has been achieved without false alarm when the TAG was 3 meters from the UE, and the BS 20 meters away from them, behind walls and obstacles.

Submitted 10 April, 2025; v1 submitted 26 January, 2025; originally announced January 2025.

Comments: 11 pages, 19 figures, submitted to JRFID (2nd round after revision)

arXiv:2501.09135 [pdf, other]

HAFix: History-Augmented Large Language Models for Bug Fixing

Authors: Yu Shi, Abdul Ali Bangash, Emad Fallahzadeh, Bram Adams, Ahmed E. Hassan

Abstract: Recent studies have explored the performance of Large Language Models (LLMs) on various Software Engineering (SE) tasks, such as code generation and bug fixing. However, these approaches typically rely on the context data from the current snapshot of the project, overlooking the potential of rich historical data from real-world software repositories. Additionally, the impact of prompt styles on LLM performance within a historical context remains underexplored. To address these gaps, we propose HAFix, which stands for History-Augmented LLMs on Bug Fixing, a novel approach that leverages individual historical heuristics associated with bugs and aggregates the results of these heuristics (HAFix-Agg) to enhance LLMs' bug-fixing capabilities. To empirically evaluate HAFix, we employ Code Llama on a dataset of 51 single-line bugs, sourced from 11 open-source projects, by mining the historical context data of bugs and operationalizing this context in the form of seven heuristics. Our evaluation demonstrates that historical heuristics significantly enhance bug-fixing performance. For example, the FLN-all heuristic achieves a 10% improvement in performance compared to a non-historical baseline inspired by GitHub Copilot. Furthermore, HAFix-Agg fixes 45% more bugs than the baseline, outperforming FLN-all and demonstrating the best performance overall. Moreover, within the context of historical heuristics, we identify the Instruction style prompt as the most effective template for LLMs in bug fixing. Finally, we provide a pragmatic trade-off analysis of bug-fixing performance, cost, and time efficiency, offering valuable insights for the practical deployment of our approach in real-world scenarios.

Submitted 15 January, 2025; originally announced January 2025.

Comments: 55 pages, 18 figures

arXiv:2501.00965 [pdf, other]

doi 10.1007/s10664-024-10485-1

A Large-Scale Exploratory Study on the Proxy Pattern in Ethereum

Authors: Amir M. Ebrahimi, Bram Adams, Gustavo A. Oliva, Ahmed E. Hassan

Abstract: The proxy pattern is a well-known design pattern with numerous use cases in several sectors of the software industry. As such, the use of the proxy pattern is also a common approach in the development of complex decentralized applications (DApps) on the Ethereum blockchain. Despite the importance of proxy contracts, little is known about (i) how their prevalence changed over time, (ii) the ways in which developers integrate proxies in the design of DApps, and (iii) what proxy types are being most commonly leveraged by developers. This study bridges these gaps through a comprehensive analysis of Ethereum smart contracts, utilizing a dataset of 50 million contracts and 1.6 billion transactions as of September 2022. Our findings reveal that 14.2% of all deployed smart contracts are proxy contracts. We show that proxy contracts are being more actively used than non-proxy contracts. Also, the usage of proxy contracts in various contexts, transactions involving proxy contracts, and adoption of proxy contracts by users have shown an upward trend over time, peaking at the end of our study period. They are either deployed through off-chain scripts or on-chain factory contracts, with the former and latter being employed in 39.1% and 60.9% of identified usage contexts, respectively. We found that while the majority (67.8%) of proxies act as an interceptor, 32.2% enable upgradeability. Proxy contracts are typically (79%) implemented based on known reference implementations, with 29.4% being of type ERC-1167, a class of proxies that aims to cheaply reuse and clone contracts' functionality. Our evaluation shows that our proposed behavioral proxy detection method has a precision and recall of 100% in detecting active proxies. Finally, we derive a set of practical recommendations for developers and introduce open research questions to guide future research on the topic.

Submitted 1 January, 2025; originally announced January 2025.

Journal ref: Empirical Software Engineering. 29, 2024, 1-51

arXiv:2501.00674 [pdf, other]

UPC Sentinel: An Accurate Approach for Detecting Upgradeability Proxy Contracts in Ethereum

Authors: Amir M. Ebrahimi, Bram Adams, Gustavo A. Oliva, Ahmed E. Hassan

Abstract: Software applications that run on a blockchain platform are known as DApps. DApps are built using smart contracts, which are immutable after deployment. Just like any real-world software system, DApps need to receive new features and bug fixes over time in order to remain useful and secure. However, Ethereum lacks native solutions for post-deployment smart contract maintenance, requiring developers to devise their own methods. A popular method is known as the upgradeability proxy contract (UPC), which involves implementing the proxy design pattern (as defined by the Gang of Four). In this method, client calls first hit a proxy contract, which then delegates calls to a certain implementation contract. Most importantly, the proxy contract can be reconfigured during runtime to delegate calls to another implementation contract, effectively enabling application upgrades. For researchers, the accurate detection of UPCs is a strong requirement in the understanding of how exactly real-world DApps are maintained over time. For practitioners, the accurate detection of UPCs is crucial for providing application behavior transparency and enabling auditing. In this paper, we introduce UPC Sentinel, a novel three-layer algorithm that utilizes both static and dynamic analysis of smart contract bytecode to accurately detect active UPCs. We evaluated UPC Sentinel using two distinct ground truth datasets. In the first dataset, our method demonstrated a near-perfect accuracy of 99%. The evaluation on the second dataset further established our method's efficacy, showing a perfect precision rate of 100% and a near-perfect recall of 99.3%, outperforming the state of the art. Finally, we discuss the potential value of UPC Sentinel in advancing future research efforts.

Submitted 31 December, 2024; originally announced January 2025.

Comments: Accepted for publication in Empirical Software Engineering

arXiv:2501.00106 [pdf, other]

LicenseGPT: A Fine-tuned Foundation Model for Publicly Available Dataset License Compliance

Authors: Jingwen Tan, Gopi Krishnan Rajbahadur, Zi Li, Xiangfu Song, Jianshan Lin, Dan Li, Zibin Zheng, Ahmed E. Hassan

Abstract: Dataset license compliance is a critical yet complex aspect of developing commercial AI products, particularly with the increasing use of publicly available datasets. Ambiguities in dataset licenses pose significant legal risks, making it challenging even for software IP lawyers to accurately interpret rights and obligations. In this paper, we introduce LicenseGPT, a fine-tuned foundation model (FM) specifically designed for dataset license compliance analysis. We first evaluate existing legal FMs (i.e., FMs specialized in understanding and processing legal texts) and find that the best-performing model achieves a Prediction Agreement (PA) of only 43.75%. LicenseGPT, fine-tuned on a curated dataset of 500 licenses annotated by legal experts, significantly improves PA to 64.30%, outperforming both legal and general-purpose FMs. Through an A/B test and user study with software IP lawyers, we demonstrate that LicenseGPT reduces analysis time by 94.44%, from 108 seconds to 6 seconds per license, without compromising accuracy. Software IP lawyers perceive LicenseGPT as a valuable supplementary tool that enhances efficiency while acknowledging the need for human oversight in complex cases. Our work underscores the potential of specialized AI tools in legal practice and offers a publicly available resource for practitioners and researchers.

Submitted 30 December, 2024; originally announced January 2025.

arXiv:2412.18957 [pdf, other]

RIS-Assisted Aerial Non-Terrestrial Networks: An Intelligent Synergy with Deep Reinforcement Learning

Authors: Muhammad Umer, Muhammad Ahmed Mohsin, Aryan Kaushik, Qurrat-ul-Ain Nadeem, Ali Arshad Nasir, Syed Ali Hassan

Abstract: Reconfigurable intelligent surface (RIS)-assisted aerial non-terrestrial networks (NTNs) offer a promising paradigm for enhancing wireless communications in the era of 6G and beyond. By integrating RIS with aerial platforms such as unmanned aerial vehicles (UAVs) and high-altitude platforms (HAPs), these networks can intelligently control signal propagation, extending coverage, improving capacity, and enhancing link reliability. This article explores the application of deep reinforcement learning (DRL) as a powerful tool for optimizing RIS-assisted aerial NTNs. We focus on hybrid proximal policy optimization (H-PPO), a robust DRL algorithm well-suited for handling the complex, hybrid action spaces inherent in these networks. Through a case study of an aerial RIS (ARIS)-aided coordinated multi-point non-orthogonal multiple access (CoMP-NOMA) network, we demonstrate how H-PPO can effectively optimize the system and maximize the sum rate while adhering to system constraints. Finally, we discuss key challenges and promising research directions for DRL-powered RIS-assisted aerial NTNs, highlighting their potential to transform next-generation wireless networks.

Submitted 25 December, 2024; originally announced December 2024.

Comments: IEEE Vehicular Technology Magazine

arXiv:2412.11137 [pdf]

doi 10.1007/s12010-024-05110-2

Decoding Drug Discovery: Exploring A-to-Z In silico Methods for Beginners

Authors: Hezha O. Rasul, Dlzar D. Ghafour, Bakhtyar K. Aziz, Bryar A. Hassan, Tarik A. Rashid, Arif Kivrak

Abstract: The drug development process is a critical challenge in the pharmaceutical industry due to its time-consuming nature and the need to discover new drug potentials to address various ailments. The initial step in drug development, drug target identification, often consumes considerable time. While valid, traditional methods such as in vivo and in vitro approaches are limited in their ability to analyze vast amounts of data efficiently, leading to wasteful outcomes. To expedite and streamline drug development, an increasing reliance on computer-aided drug design (CADD) approaches has emerged. These sophisticated in silico methods offer a promising avenue for efficiently identifying viable drug candidates, thus providing pharmaceutical firms with significant opportunities to uncover new prospective drug targets. The main goal of this work is to review in silico methods used in the drug development process with a focus on identifying therapeutic targets linked to specific diseases at the genetic or protein level. This article thoroughly discusses A-to-Z in silico techniques, which are essential for identifying the targets of bioactive compounds and their potential therapeutic effects. This review intends to improve drug discovery processes by illuminating the state of these cutting-edge approaches, thereby maximizing the effectiveness and duration of clinical trials for novel drug target investigation.

Submitted 15 December, 2024; originally announced December 2024.

Comments: https://link.springer.com/article/10.1007/s12010-024-05110-2

arXiv:2412.03796 [pdf, other]

Automated Multi-Label Annotation for Mental Health Illnesses Using Large Language Models

Authors: Abdelrahaman A. Hassan, Radwa J. Hanafy, Mohammed E. Fouda

Abstract: The growing prevalence and complexity of mental health disorders present significant challenges for accurate diagnosis and treatment, particularly in understanding the interplay between co-occurring conditions. Mental health disorders, such as depression and anxiety, often co-occur, yet current datasets derived from social media posts typically focus on single-disorder labels, limiting their utility in comprehensive diagnostic analyses. This paper addresses this critical gap by proposing a novel methodology for cleaning, sampling, labeling, and combining data to create versatile multi-label datasets. Our approach introduces a synthetic labeling technique to transform single-label datasets into multi-label annotations, capturing the complexity of overlapping mental health conditions. To achieve this, two single-label datasets are first merged into a foundational multi-label dataset, enabling realistic analyses of co-occurring diagnoses. We then design and evaluate various prompting strategies for large language models (LLMs), ranging from single-label predictions to unrestricted prompts capable of detecting any present disorders. After rigorously assessing multiple LLMs and prompt configurations, the optimal combinations are identified and applied to label six additional single-disorder datasets from RMHD. The result is SPAADE-DR, a robust, multi-label dataset encompassing diverse mental health conditions. This research demonstrates the transformative potential of LLM-driven synthetic labeling in advancing mental health diagnostics from social media data, paving the way for more nuanced, data-driven insights into mental health care.

Submitted 4 December, 2024; originally announced December 2024.

arXiv:2412.02907 [pdf, other]

Predicting post-release defects with knowledge units (KUs) of programming languages: an empirical study

Authors: Md Ahasanuzzaman, Gustavo A. Oliva, Ahmed E. Hassan, Zhen Ming, Jiang

Abstract: Defect prediction plays a crucial role in software engineering, enabling developers to identify defect-prone code and improve software quality. While extensive research has focused on refining machine learning models for defect prediction, the exploration of new data sources for feature engineering remains limited. Defect prediction models primarily rely on traditional metrics such as product, process, and code ownership metrics, which, while effective, do not capture language-specific traits that may influence defect proneness. To address this gap, we introduce Knowledge Units (KUs) of programming languages as a novel feature set for analyzing software systems and defect prediction. A KU is a cohesive set of key capabilities that are offered by one or more building blocks of a given programming language. We conduct an empirical study leveraging 28 KUs that are derived from Java certification exams and compare their effectiveness against traditional metrics in predicting post-release defects across 8 well-maintained Java software systems. Our results show that KUs provide significant predictive power, achieving a median AUC of 0.82 and outperforming models based on each individual group of traditional metrics. Among KU features, Method & Encapsulation, Inheritance, and Exception Handling emerge as the most influential predictors. Furthermore, combining KUs with traditional metrics enhances prediction performance, yielding a median AUC of 0.89. We also introduce a cost-effective model using only 10 features, which maintains strong predictive performance while reducing feature engineering costs. Our findings demonstrate the value of KUs in predicting post-release defects, offering a complementary perspective to traditional metrics. This study can be helpful to researchers who wish to analyze software systems from a perspective that is complementary to that of traditional metrics.

Submitted 3 March, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

arXiv:2411.17793 [pdf, other]

Engineering AI Judge Systems

Authors: Jiahuei Lin, Dayi Lin, Sky Zhang, Ahmed E. Hassan

Abstract: AI judge systems are designed to automatically evaluate Foundation Model-powered software (i.e., FMware). Due to the intrinsic dynamic and stochastic nature of FMware, the development of AI judge systems requires a unique engineering life cycle and presents new challenges. In this paper, we discuss the challenges based on our industrial experiences in developing AI judge systems for FMware. These challenges lead to substantial time consumption, cost and inaccurate judgments. We propose a framework that tackles the challenges with the goal of improving the productivity of developing high-quality AI judge systems. Finally, we evaluate our framework with a case study on judging a commit message generation FMware. The accuracy of the judgments made by the AI judge system developed with our framework outperforms those made by the AI judge system that is developed without our framework by up to 6.2%, with a significant reduction in development effort.

Submitted 26 November, 2024; originally announced November 2024.

arXiv:2411.09837 [pdf, ps, other]

Real-time Adapting Routing (RAR): Improving Efficiency Through Continuous Learning in Software Powered by Layered Foundation Models

Authors: Kirill Vasilevski, Dayi Lin, Ahmed E. Hassan

Abstract: To balance the quality and inference cost of software powered by Foundation Models (FMs), such as large language models (LLMs), people often opt to train a routing model that routes requests to FMs with different sizes and capabilities. Existing routing models rely on learning the optimal routing decision from carefully curated data, require complex computations to be updated, and do not consider the potential evolution of weaker FMs. In this paper, we propose Real-time Adaptive Routing (RAR), an approach to continuously adapt FM routing decisions while using guided in-context learning to enhance the capabilities of weaker FMs. The goal is to reduce reliance on stronger, more expensive FMs. We evaluate our approach on different subsets of the popular MMLU benchmark. Over time, our approach routes 50.2% fewer requests to computationally expensive models while maintaining around 90.5% of the general response quality. In addition, the guides generated from stronger models have shown intra-domain generalization and led to a better quality of responses compared to an equivalent approach with a standalone weaker FM.

Submitted 2 June, 2025; v1 submitted 14 November, 2024; originally announced November 2024.

arXiv:2411.09580 [pdf, other]

Software Performance Engineering for Foundation Model-Powered Software (FMware)

Authors: Haoxiang Zhang, Shi Chang, Arthur Leung, Kishanthan Thangarajah, Boyuan Chen, Hanan Lutfiyya, Ahmed E. Hassan

Abstract: The rise of Foundation Models (FMs) like Large Language Models (LLMs) is revolutionizing software development. Despite the impressive prototypes, transforming FMware into production-ready products demands complex engineering across various domains. A critical but overlooked aspect is performance engineering, which aims at ensuring FMware meets performance goals such as throughput and latency to avoid user dissatisfaction and financial loss. Often, performance considerations are an afterthought, leading to costly optimization efforts post-deployment. FMware's high computational resource demands highlight the need for efficient hardware use. Continuous performance engineering is essential to prevent degradation. This paper highlights the significance of Software Performance Engineering (SPE) in FMware, identifying four key challenges: cognitive architecture design, communication protocols, tuning and optimization, and deployment. These challenges are based on literature surveys and experiences from developing an in-house FMware system. We discuss problems, current practices, and innovative paths for the software engineering community.

Submitted 14 November, 2024; originally announced November 2024.

arXiv:2411.07395 [pdf]

Data-Centric Learning Framework for Real-Time Detection of Aiming Beam in Fluorescence Lifetime Imaging Guided Surgery

Authors: Mohamed Abul Hassan, Pu Sun, Xiangnan Zhou, Lisanne Kraft, Kelsey T Hadfield, Katjana Ehrlich, Jinyi Qi, Andrew Birkeland, Laura Marcu

Abstract: This study introduces a novel data-centric approach to improve real-time surgical guidance using fiber-based fluorescence lifetime imaging (FLIm). A key aspect of the methodology is the accurate detection of the aiming beam, which is essential for localizing points used to map FLIm measurements onto the tissue region within the surgical field. The primary challenge arises from the complex and variable conditions encountered in the surgical environment, particularly in Transoral Robotic Surgery (TORS). Uneven illumination in the surgical field can cause reflections, reduce contrast, and result in inconsistent color representation, further complicating aiming beam detection. To overcome these challenges, an instance segmentation model was developed using a data-centric training strategy that improves accuracy by minimizing label noise and enhancing detection robustness. The model was evaluated on a dataset comprising 40 in vivo surgical videos, demonstrating a median detection rate of 85%. This performance was maintained when the model was integrated in a clinical system, achieving a similar detection rate of 85% during TORS procedures conducted in patients. The system's computational efficiency, measured at approximately 24 frames per second (FPS), was sufficient for real-time surgical guidance. This study enhances the reliability of FLIm-based aiming beam detection in complex surgical environments, advancing the feasibility of real-time, image-guided interventions for improved surgical precision.

Submitted 11 November, 2024; originally announced November 2024.

arXiv:2411.03585 [pdf]

doi 10.5121/ijwmn.2024.16501

Potential Use of IoT Distance Measurement Tool in Boule Sports

Authors: Wahidah Md Shah, M Azim. Adnan, Aslinda Hassan, Norharyati Harum, Isredza Rahmi A. Hamid

Abstract: In Petanque, each player aims to throw the boule closer to the jack. The closest boule to the jack among players will score the point. Currently, the distance of the boule to the jack is still measured using manual measurement tools such as measuring tape, string, and calipers. The manual measurement method, which ordinary referees and players carry out, is considered time-consuming and prone to inconsistent readings. A steady hand is required to hold the tape at two ends while squatting or kneeling. The technique of reading the measurement is also important to determine the accuracy of the length. This project aims to design and develop a prototype device that can measure the distance between jack and boule using a microcontroller and ultrasonic sensor technology. The device is expected to provide an instant measurement of the distance between the jack and the boule. The measurement data can be displayed on a mobile device so that the user can easily view the result. This prototype device also counts the score points and determines the winner.

Submitted 5 November, 2024; originally announced November 2024.

Comments: 10 pages

Journal ref: International Journal of Wireless & Mobile Networks (IJWMN), Vol.16, No.4/5. Oct. 2024

arXiv:2411.03455 [pdf, other]

Watson: A Cognitive Observability Framework for the Reasoning of LLM-Powered Agents

Authors: Benjamin Rombaut, Sogol Masoumzadeh, Kirill Vasilevski, Dayi Lin, Ahmed E. Hassan

Abstract: As foundation models (FMs) play an increasingly prominent role in complex software systems, such as agentic software, they introduce significant observability and debuggability challenges. Although recent Large Reasoning Models (LRMs) generate their thought processes as part of the output, in many scenarios fast-thinking Large Language Models (LLMs) are still preferred due to latency constraints. LLM-powered agents operate autonomously with opaque implicit reasoning, making it difficult to debug their unexpected behaviors or errors. In this paper, we introduce Watson, a novel framework that provides reasoning observability into the implicit reasoning processes of agents driven by fast-thinking LLMs, allowing the identification and localization of errors and guidance for corrections. We demonstrate the accuracy of the recovered implicit reasoning trace by Watson and its usefulness through debugging and improving the performance of LLM-powered agents in two scenarios: Massive Multitask Language Understanding (MMLU) benchmark and SWE-bench-lite. Using Watson, we were able to observe and identify the implicit reasoning errors, and automatically provide targeted corrections at runtime that improve the Pass@1 of agents on MMLU and SWE-bench-lite by 7.58 (13.45% relative improvement) and 7.76 (12.31% relative improvement) percentage points, respectively, without updates to models or the cognitive architecture of the agents.

Submitted 5 March, 2025; v1 submitted 5 November, 2024; originally announced November 2024.

arXiv:2411.01963 [pdf, other]

doi 10.1109/ICMLA61862.2024.00138

V-CAS: A Realtime Vehicle Anti Collision System Using Vision Transformer on Multi-Camera Streams

Authors: Muhammad Waqas Ashraf, Ali Hassan, Imad Ali Shah

Abstract: This paper introduces a real-time Vehicle Collision Avoidance System (V-CAS) designed to enhance vehicle safety through adaptive braking based on environmental perception. V-CAS leverages the advanced vision-based transformer model RT-DETR, DeepSORT tracking, speed estimation, brake light detection, and an adaptive braking mechanism. It computes a composite collision risk score based on vehicles' relative accelerations, distances, and detected braking actions, using brake light signals and trajectory data from multiple camera streams to improve scene perception. Implemented on the Jetson Orin Nano, V-CAS enables real-time collision risk assessment and proactive mitigation through adaptive braking. A comprehensive training process was conducted on various datasets for comparative analysis, followed by fine-tuning the selected object detection model using transfer learning. The system's effectiveness was rigorously evaluated on the Car Crash Dataset (CCD) from YouTube and through real-time experiments, achieving over 98% accuracy with an average proactive alert time of 1.13 seconds. Results indicate significant improvements in object detection and tracking, enhancing collision avoidance compared to traditional single-camera methods. This research demonstrates the potential of low-cost, multi-camera embedded vision transformer systems to advance automotive safety through enhanced environmental perception and proactive collision avoidance mechanisms.

Submitted 4 November, 2024; originally announced November 2024.

Comments: Accepted at ICMLA 2024

Journal ref: In 2024 International Conference on Machine Learning and Applications (ICMLA), pp.939-944

arXiv:2411.01338 [pdf, other]

Deep Reinforcement Learning for Trajectory and Phase Shift Optimization of Aerial RIS in CoMP-NOMA Networks

Authors: Muhammad Umer, Muhammad Ahmed Mohsin, Aamir Mahmood, Kapal Dev, Haejoon Jung, Mikael Gidlund, Syed Ali Hassan

Abstract: This paper explores the potential of aerial reconfigurable intelligent surfaces (ARIS) to enhance coordinated multi-point non-orthogonal multiple access (CoMP-NOMA) networks. We consider a system model where a UAV-mounted RIS assists in serving multiple users through NOMA while coordinating with multiple base stations. The optimization of UAV trajectory, RIS phase shifts, and NOMA power control constitutes a complex problem due to the hybrid nature of the parameters, involving both continuous and discrete values. To tackle this challenge, we propose a novel framework utilizing the multi-output proximal policy optimization (MO-PPO) algorithm. MO-PPO effectively handles the diverse nature of these optimization parameters, and through extensive simulations, we demonstrate its effectiveness in achieving near-optimal performance and adapting to dynamic environments. Our findings highlight the benefits of integrating ARIS in CoMP-NOMA networks for improved spectral efficiency and coverage in future wireless networks.

Submitted 2 November, 2024; originally announced November 2024.

Comments: IEEE Globecom 2024

arXiv:2411.01074 [pdf, other]

Improving DNN Modularization via Activation-Driven Training

Authors: Tuan Ngo, Abid Hassan, Saad Shafiq, Nenad Medvidovic

Abstract: Deep Neural Networks (DNNs) suffer from significant retraining costs when adapting to evolving requirements. Modularizing DNNs offers the promise of improving their reusability. Previous work has proposed techniques to decompose DNN models into modules both during and after training. However, these strategies yield several shortcomings, including significant weight overlaps and accuracy losses across modules, restricted focus on convolutional layers only, and added complexity and training time by introducing auxiliary masks to control modularity. In this work, we propose MODA, an activation-driven modular training approach. MODA promotes inherent modularity within a DNN model by directly regulating the activation outputs of its layers based on three modular objectives: intra-class affinity, inter-class dispersion, and compactness. MODA is evaluated using three well-known DNN models and three datasets with varying sizes. This evaluation indicates that, compared to the existing state-of-the-art, using MODA yields several advantages: (1) MODA accomplishes modularization with 29% less training time; (2) the resultant modules generated by MODA comprise 2.4x fewer weights and 3.5x less weight overlap while (3) preserving the original model's accuracy without additional fine-tuning; in module replacement scenarios, (4) MODA improves the accuracy of a target class by 12% on average while ensuring minimal impact on the accuracy of other classes.

Submitted 1 November, 2024; originally announced November 2024.

arXiv:2411.00907

On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance

Authors: Jaskirat Singh, Bram Adams, Ahmed E. Hassan

Abstract: To help MLOps engineers decide which operator to use in which deployment scenario, this study aims to empirically assess the accuracy vs latency trade-off of white-box (training-based) and black-box operators (non-training-based) and their combinations in an Edge AI setup. We perform inference experiments including 3 white-box (i.e., QAT, Pruning, Knowledge Distillation), 2 black-box (i.e., Partition, SPTQ), and their combined operators (i.e., Distilled SPTQ, SPTQ Partition) across 3 tiers (i.e., Mobile, Edge, Cloud) on 4 commonly-used Computer Vision and Natural Language Processing models to identify the effective strategies, considering the perspective of MLOps engineers. Our results indicate that the combination of Distillation and SPTQ operators (i.e., DSPTQ) should be preferred over non-hybrid operators when lower latency is required in the edge at small to medium accuracy drop. Among the non-hybrid operators, the Distilled operator is a better alternative in both mobile and edge tiers for lower latency performance at the cost of small to medium accuracy loss. Moreover, the operators involving distillation show lower latency in resource-constrained tiers (Mobile, Edge) compared to the operators involving Partitioning across Mobile and Edge tiers. For textual subject models, which have low input data size requirements, the Cloud tier is a better alternative for the deployment of operators than the Mobile, Edge, or Mobile-Edge tier (the latter being used for operators involving partitioning). In contrast, for image-based subject models, which have high input data size requirements, the Edge tier is a better alternative for operators than Mobile, Edge, or their combination.

Submitted 22 January, 2025; v1 submitted 1 November, 2024; originally announced November 2024.

Comments: In Approach Section, Pruning & Knowledge Distillation methods of Intel Neural Compressor don't reduce the model size & improve performance, respectively, unlike previous studies. There are issues exporting QAT models from PyTorch to ONNX, raising concerns about our latency results

arXiv:2410.20791 [pdf, other]

From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap

Authors: Gopi Krishnan Rajbahadur, Gustavo A. Oliva, Dayi Lin, Ahmed E. Hassan

Abstract: The rapid expansion of foundation models (FMs), such as large language models (LLMs), has given rise to FMware--software systems that integrate FMs as core components. While building demonstration-level FMware is relatively straightforward, transitioning to production-ready systems presents numerous challenges, including reliability, high implementation costs, scalability, and compliance with privacy regulations. Our paper conducts a semi-structured thematic synthesis to identify the key challenges in productionizing FMware across diverse data sources including our own industry experience in developing FMArts--a FMware lifecycle engineering platform and integrating it into Huawei cloud, grey literature, academic publications, hands-on involvement in the Open Platform for Enterprise AI (OPEA), organizing the AIware conference and Bootcamp, and co-leading the ISO SPDX SBOM working group on AI and datasets. We identify critical issues in FM selection, data and model alignment, prompt engineering, agent orchestration, system testing, and deployment, alongside cross-cutting concerns such as memory management, observability, and feedback integration. We discuss needed technologies and strategies to address these challenges and offer guidance on how to enable the transition from demonstration systems to scalable, production-ready FMware solutions. Our findings underscore the importance of continued research and multi-industry collaboration to advance the development of production-ready FMware.

Submitted 27 January, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

arXiv:2410.13073 [pdf, other]

PromptExp: Multi-granularity Prompt Explanation of Large Language Models

Authors: Ximing Dong, Shaowei Wang, Dayi Lin, Gopi Krishnan Rajbahadur, Boquan Zhou, Shichao Liu, Ahmed E. Hassan

Abstract: Large Language Models (LLMs) excel in tasks like natural language understanding and text generation. Prompt engineering plays a critical role in leveraging LLMs effectively. However, LLMs' black-box nature hinders their interpretability and effective prompt engineering. A wide range of model explanation approaches have been developed for deep learning models. However, these local explanations are designed for single-output tasks like classification and regression, and cannot be directly applied to LLMs, which generate sequences of tokens. Recent efforts in LLM explanation focus on natural language explanations, but they are prone to hallucinations and inaccuracies. To address this, we introduce PromptExp, a framework for multi-granularity prompt explanations by aggregating token-level insights. PromptExp introduces two token-level explanation approaches: 1. an aggregation-based approach combining local explanation techniques, and 2. a perturbation-based approach with novel techniques to evaluate token masking impact. PromptExp supports both white-box and black-box explanations and extends explanations to higher granularity levels, enabling flexible analysis. We evaluate PromptExp in case studies such as sentiment analysis, showing the perturbation-based approach performs best using semantic similarity to assess perturbation impact. Furthermore, we conducted a user study to confirm PromptExp's accuracy and practical value, and demonstrate its potential to enhance LLM interpretability.

Submitted 30 October, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

Comments: 11 pages

arXiv:2410.09012 [pdf, other]

Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models

Authors: Hao Li, Cor-Paul Bezemer, Ahmed E. Hassan

Abstract: Foundation models (FMs) such as large language models (LLMs) have significantly impacted many fields, including software engineering (SE). The interaction between SE and FMs has led to the integration of FMs into SE practices (FM4SE) and the application of SE methodologies to FMs (SE4FM). While several literature surveys exist on academic contributions to these trends, we are the first to provide a practitioner's view. We analyze 155 FM4SE and 997 SE4FM blog posts from leading technology companies, leveraging an FM-powered surveying approach to systematically label and summarize the discussed activities and tasks. We observed that while code generation is the most prominent FM4SE task, FMs are leveraged for many other SE activities such as code understanding, summarization, and API recommendation. The majority of blog posts on SE4FM are about model deployment & operation, and system architecture & orchestration. Although the emphasis is on cloud deployments, there is a growing interest in compressing FMs and deploying them on smaller devices such as edge or mobile devices. We outline eight future research directions inspired by our gained insights, aiming to bridge the gap between academic findings and real-world applications. Our study not only enriches the body of knowledge on practical applications of FM4SE and SE4FM but also demonstrates the utility of FMs as a powerful and efficient approach in conducting literature surveys within technical and grey literature domains. Our dataset, results, code and used prompts can be found in our online replication package at https://github.com/SAILResearch/fmse-blogs.

Submitted 6 January, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

Comments: ICSE-SEIP 2025

arXiv:2410.06145 [pdf, other]

Serverless Cold Starts and Where to Find Them

Authors: Artjom Joosen, Ahmed Hassan, Martin Asenov, Rajkarn Singh, Luke Darlow, Jianfeng Wang, Qiwen Deng, Adam Barker

Abstract: This paper releases and analyzes a month-long trace of 85 billion user requests and 11.9 million cold starts from Huawei's serverless cloud platform. Our analysis spans workloads from five data centers. We focus on cold starts and provide a comprehensive examination of the underlying factors influencing the number and duration of cold starts. These factors include trigger types, request synchronicity, runtime languages, and function resource allocations. We investigate components of cold starts, including pod allocation time, code and dependency deployment time, and scheduling delays, and examine their relationships with runtime languages, trigger types, and resource allocation. We introduce pod utility ratio to measure the pod's useful lifetime relative to its cold start time, giving a more complete picture of cold starts, and see that some pods with long cold start times have longer useful lifetimes. Our findings reveal the complexity and multifaceted origins of the number, duration, and characteristics of cold starts, driven by differences in trigger types, runtime languages, and function resource allocations. For example, cold starts in Region 1 take up to 7 seconds, dominated by dependency deployment time and scheduling. In Region 2, cold starts take up to 3 seconds and are dominated by pod allocation time. Based on this, we identify opportunities to reduce the number and duration of cold starts using strategies for multi-region scheduling. Finally, we suggest directions for future research to address these challenges and enhance the performance of serverless cloud platforms. Our datasets and code are available here https://github.com/sir-lab/data-release

Submitted 8 October, 2024; originally announced October 2024.

ACM Class: C.4; D.4.7

Showing 1–50 of 319 results for author: Hassan, A