-
Reliability of Capacitive Read in Arrays of Ferroelectric Capacitors
Authors:
Luca Fehlings,
Muhtasim Alam Chowdhury,
Banafsheh Saber Latibari,
Soheil Salehi,
Erika Covi
Abstract:
The non-destructive capacitance read-out of ferroelectric capacitors (FeCaps) based on doped HfO$_2$ metal-ferroelectric-metal (MFM) structures offers the potential for low-power and highly scalable crossbar arrays. This is due to a number of factors, including the selector-less design, the absence of sneak paths, the power-efficient charge-based read operation, and the reduced IR drop. Nevertheless, a reliable capacitive readout presents certain challenges, particularly in regard to device variability and the trade-off between read yield and read disturbances, which can ultimately result in bit-flips. This paper presents a digital read macro for HfO$_2$ FeCaps and provides design guidelines for capacitive readout of HfO$_2$ FeCaps, taking device-centric reliability and yield challenges into account. An experimentally calibrated physics-based compact model of HfO$_2$ FeCaps is employed to investigate the reliability of the read-out operation of the FeCap macro through Monte Carlo simulations. Based on this analysis, we identify limitations posed by the device variability and propose potential mitigation strategies through design-technology co-optimization (DTCO) of the FeCap device characteristics and the CMOS circuit design. Finally, we examine the potential applications of the FeCap macro in the context of secure hardware. We identify potential security threats and propose strategies to enhance the robustness of the system.
Submitted 11 June, 2025;
originally announced June 2025.
-
Computation- and Communication-Efficient Online FL for Resource-Constrained Aerial Vehicles
Authors:
Md-Ferdous Pervej,
Richeng Jin,
Md Moin Uddin Chowdhury,
Simran Singh,
İsmail Güvenç,
Huaiyu Dai
Abstract:
Privacy-preserving distributed machine learning (ML) and aerial connected vehicle (ACV)-assisted edge computing have drawn significant attention lately. Since the onboard sensors of ACVs can capture new data as they move along their trajectories, the continual arrival of such newly sensed data leads to online learning and demands carefully crafted trajectories. Besides, as typical ACVs are inherently resource-constrained, computation- and communication-efficient ML solutions are needed. Therefore, we propose a computation- and communication-efficient online aerial federated learning (2CEOAFL) algorithm that takes advantage of the continually sensed data while respecting the limited onboard resources of the ACVs. In particular, considering that independently owned ACVs act as selfish data collectors, we first model their trajectories according to their respective time-varying data distributions. We then propose a 2CEOAFL algorithm that allows the flying ACVs to (a) prune the received dense ML model to make it shallow, (b) train the pruned model, and (c) probabilistically quantize and offload their trained accumulated gradients to the central server (CS). Our extensive simulation results show that the proposed 2CEOAFL algorithm delivers performance comparable to its non-pruned and non-quantized, and hence computation- and communication-inefficient, counterparts.
Submitted 3 June, 2025;
originally announced June 2025.
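Illustrative note on step (c) of the abstract above: the abstract does not specify the quantizer, so the following is only a minimal sketch of one common form of probabilistic quantization (unbiased stochastic rounding onto a uniform grid). The function name `stochastic_quantize`, the uniform grid, and the example gradient values are assumptions made for illustration and are not taken from the paper.

```python
import random


def stochastic_quantize(values, num_levels, rng):
    """Round each value to a neighbouring grid level at random.

    The grid has `num_levels` evenly spaced levels between min(values) and
    max(values). A value is rounded up with probability equal to its
    fractional position between the two neighbouring levels, so the
    quantized value equals the original value in expectation.
    """
    lo, hi = min(values), max(values)
    if hi == lo or num_levels < 2:
        return list(values)  # nothing to quantize
    step = (hi - lo) / (num_levels - 1)
    quantized = []
    for v in values:
        pos = (v - lo) / step      # position measured in grid steps
        lower = int(pos)           # index of the level at or below v
        frac = pos - lower         # distance to that level, in [0, 1)
        idx = lower + (1 if rng.random() < frac else 0)
        idx = min(idx, num_levels - 1)  # guard against float round-off
        quantized.append(lo + idx * step)
    return quantized


# Hypothetical accumulated gradient, quantized to 5 levels.
rng = random.Random(0)
gradient = [-1.0, -0.25, 0.1, 0.5, 1.0]
print(stochastic_quantize(gradient, 5, rng))
```

With 5 levels each entry can be encoded as a level index (3 bits here) plus the shared minimum and step, which is where the communication saving of such a scheme would come from.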
-
A Large Language Model-Supported Threat Modeling Framework for Transportation Cyber-Physical Systems
Authors:
M Sabbir Salek,
Mashrur Chowdhury,
Muhaimin Bin Munir,
Yuchen Cai,
Mohammad Imtiaz Hasan,
Jean-Michel Tine,
Latifur Khan,
Mizanur Rahman
Abstract:
Modern transportation systems rely on cyber-physical systems (CPS), where cyber systems interact seamlessly with physical systems like transportation-related sensors and actuators to enhance safety, mobility, and energy efficiency. However, growing automation and connectivity increase exposure to cyber vulnerabilities. Existing threat modeling frameworks for transportation CPS are often limited in scope, resource-intensive, and dependent on significant cybersecurity expertise. To address these gaps, we present TraCR-TMF (Transportation Cybersecurity and Resiliency Threat Modeling Framework), a large language model (LLM)-based framework that minimizes expert intervention. TraCR-TMF identifies threats, potential attack techniques, and corresponding countermeasures by leveraging the MITRE ATT&CK matrix through three LLM-based approaches: (i) a retrieval-augmented generation (RAG) method requiring no expert input, (ii) an in-context learning approach requiring low expert input, and (iii) a supervised fine-tuning method requiring moderate expert input. TraCR-TMF also maps attack paths to critical assets by analyzing vulnerabilities using a customized LLM. The framework was evaluated in two scenarios. First, it identified relevant attack techniques across transportation CPS applications, with 90% precision as validated by experts. Second, using a fine-tuned LLM, it successfully predicted multiple exploitations including lateral movement, data exfiltration, and ransomware-related encryption that occurred during a major real-world cyberattack incident. These results demonstrate TraCR-TMF's effectiveness in CPS threat modeling, its reduced reliance on cybersecurity expertise, and its adaptability across CPS domains.
Submitted 1 June, 2025;
originally announced June 2025.
-
EXP-Bench: Can AI Conduct AI Research Experiments?
Authors:
Patrick Tser Jern Kon,
Jiachen Liu,
Xinyi Zhu,
Qiuyi Ding,
Jingjia Peng,
Jiarong Xing,
Yibo Huang,
Yiming Qiu,
Jayanth Srinivasa,
Myungjin Lee,
Mosharaf Chowdhury,
Matei Zaharia,
Ang Chen
Abstract:
Automating AI research holds immense potential for accelerating scientific progress, yet current AI agents struggle with the complexities of rigorous, end-to-end experimentation. We introduce EXP-Bench, a novel benchmark designed to systematically evaluate AI agents on complete research experiments sourced from influential AI publications. Given a research question and incomplete starter code, EXP-Bench challenges AI agents to formulate hypotheses, design and implement experimental procedures, execute them, and analyze results. To enable the creation of such intricate and authentic tasks with high fidelity, we design a semi-autonomous pipeline to extract and structure crucial experimental details from these research papers and their associated open-source code. With the pipeline, EXP-Bench curated 461 AI research tasks from 51 top-tier AI research papers. Evaluations of leading LLM-based agents, such as OpenHands and IterativeAgent, on EXP-Bench demonstrate partial capabilities: while scores on individual experimental aspects such as design or implementation correctness occasionally reach 20-35%, the success rate for complete, executable experiments was a mere 0.5%. By identifying these bottlenecks and providing realistic step-by-step experiment procedures, EXP-Bench serves as a vital tool for future AI agents to improve their ability to conduct AI research experiments. EXP-Bench is open-sourced at https://github.com/Just-Curieous/Curie/tree/main/benchmark/exp_bench.
Submitted 1 June, 2025; v1 submitted 30 May, 2025;
originally announced May 2025.
-
Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering
Authors:
Md Intisar Chowdhury,
Kittinun Aukkapinyo,
Hiroshi Fujimura,
Joo Ann Woo,
Wasu Wasusatein,
Fadoua Ghourabi
Abstract:
In this paper, we propose a Grid-based Local and Global Area Transcription (Grid-LoGAT) system for Video Question Answering (VideoQA). The system operates in two phases: first, it extracts text transcripts from video frames using a Vision-Language Model (VLM); next, it processes questions using these transcripts to generate answers through a Large Language Model (LLM). This design ensures image privacy by deploying the VLM on edge devices and the LLM in the cloud. To improve transcript quality, we propose grid-based visual prompting, which extracts intricate local details from each grid cell and integrates them with global information. Evaluation results show that Grid-LoGAT, using the open-source VLM (LLaVA-1.6-7B) and LLM (Llama-3.1-8B), outperforms state-of-the-art methods with similar baseline models on the NExT-QA and STAR-QA datasets, with accuracies of 65.9% and 50.11%, respectively. Additionally, our method surpasses the non-grid version by 24 points on localization-based questions we created using NExT-QA. (This paper is accepted by IEEE ICIP 2025.)
Submitted 4 June, 2025; v1 submitted 30 May, 2025;
originally announced May 2025.
-
MONSTR: Model-Oriented Neutron Strain Tomographic Reconstruction
Authors:
Mohammad Samin Nur Chowdhury,
Shimin Tang,
Singanallur V. Venkatakrishnan,
Hassina Z. Bilheux,
Gregery T. Buzzard,
Charles A. Bouman
Abstract:
Residual strain, a tensor quantity, is a critical material property that impacts the overall performance of metal parts. Neutron Bragg edge strain tomography is a technique for imaging residual strain that works by making conventional hyperspectral computed tomography measurements, extracting the average projected strain at each detector pixel, and processing the resulting strain sinogram using a reconstruction algorithm. However, the reconstruction is severely ill-posed as the underlying inverse problem involves inferring a tensor at each voxel from scalar sinogram data.
In this paper, we introduce the model-oriented neutron strain tomographic reconstruction (MONSTR) algorithm that reconstructs the 2D residual strain tensor from the neutron Bragg edge strain measurements. MONSTR is based on using the multi-agent consensus equilibrium framework for the tensor tomographic reconstruction. Specifically, we formulate the reconstruction as a consensus solution of a collection of agents representing detector physics, the tomographic reconstruction process, and physics-based constraints from continuum mechanics. Using simulated data, we demonstrate high-quality reconstruction of the strain tensor even when using very few measurements.
Submitted 28 May, 2025;
originally announced May 2025.
-
Design and Analysis of a Grid-connected DC Fast Charging Station for Dhaka-Chittagong Highway
Authors:
Alif Ahmed,
Minhajur Rahman,
Mohammad Jawad Chowdhury,
Khandakar Abdulla Al Mamun
Abstract:
The growing adoption of electric vehicles (EVs) necessitates the development of efficient and reliable charging infrastructure, particularly fast charging stations (FCS) for addressing challenges such as range anxiety and long charging times. This paper presents the design and feasibility analysis of a grid-connected DC fast charging station for the Dhaka-Chittagong highway, a critical transportation corridor in Bangladesh. The proposed system incorporates advanced components, including a step-down transformer, Vienna Rectifier, and LC filter, to convert high-voltage AC power from the grid into a stable DC output. Simulated using MATLAB Simulink, the model delivers a peak output of 400 V DC and 120 kW power, enabling rapid and efficient EV charging. The study also evaluates the system's performance, analyzing charging times, energy consumption, and distance ranges for representative EVs. By addressing key technical, environmental, and economic considerations, this paper provides a comprehensive roadmap for deploying fast charging infrastructure, fostering EV adoption, and advancing sustainable transportation in Bangladesh.
Submitted 27 May, 2025;
originally announced May 2025.
-
Retrieval Augmented Generation-based Large Language Models for Bridging Transportation Cybersecurity Legal Knowledge Gaps
Authors:
Khandakar Ashrafi Akbar,
Md Nahiyan Uddin,
Latifur Khan,
Trayce Hockstad,
Mizanur Rahman,
Mashrur Chowdhury,
Bhavani Thuraisingham
Abstract:
As connected and automated transportation systems evolve, there is a growing need for federal and state authorities to revise existing laws and develop new statutes to address emerging cybersecurity and data privacy challenges. This study introduces a Retrieval-Augmented Generation (RAG) based Large Language Model (LLM) framework designed to support policymakers by extracting relevant legal content and generating accurate, inquiry-specific responses. The framework focuses on reducing hallucinations in LLMs by using a curated set of domain-specific questions to guide response generation. By incorporating retrieval mechanisms, the system enhances the factual grounding and specificity of its outputs. Our analysis shows that the proposed RAG-based LLM outperforms leading commercial LLMs across four evaluation metrics: AlignScore, ParaScore, BERTScore, and ROUGE, demonstrating its effectiveness in producing reliable and context-aware legal insights. This approach offers a scalable, AI-driven method for legislative analysis, supporting efforts to update legal frameworks in line with advancements in transportation technologies.
Submitted 23 May, 2025;
originally announced May 2025.
-
Advancing Tabular Stroke Modelling Through a Novel Hybrid Architecture and Feature-Selection Synergy
Authors:
Yousuf Islam,
Md. Jalal Uddin Chowdhury,
Sumon Chandra Das
Abstract:
Brain stroke remains one of the principal causes of death and disability worldwide, yet most tabular-data prediction models still hover below the 95% accuracy threshold, limiting real-world utility. Addressing this gap, the present work develops and validates a completely data-driven and interpretable machine-learning framework designed to predict strokes using ten routinely gathered demographic, lifestyle, and clinical variables sourced from a public cohort of 4,981 records. We employ a detailed exploratory data analysis (EDA) to understand the dataset's structure and distribution, followed by rigorous data preprocessing, including handling missing values, outlier removal, and class imbalance correction using the Synthetic Minority Over-sampling Technique (SMOTE). To streamline feature selection, point-biserial correlation and random-forest Gini importance were utilized, and ten varied algorithms (encompassing tree ensembles, boosting, kernel methods, and a multilayer neural network) were optimized using stratified five-fold cross-validation. The predicted probabilities of these models were used to build the proposed hybrid model, which combines Random Forest, XGBoost, LightGBM, and a support-vector classifier, with logistic regression acting as a meta-learner. The proposed model achieved an accuracy of 97.2% and an F1-score of 97.15%, a significant improvement over the best individual model, LightGBM, which had an accuracy of 91.4%. Our study's findings indicate that rigorous preprocessing, coupled with a diverse hybrid model, can convert low-cost tabular data into a nearly clinical-grade stroke-risk assessment tool.
Submitted 18 May, 2025;
originally announced May 2025.
-
Exploring Dynamic Load Balancing Algorithms for Block-Structured Mesh-and-Particle Simulations in AMReX
Authors:
Amitash Nanda,
Md Kamal Hossain Chowdhury,
Hannah Ross,
Kevin Gott
Abstract:
Load balancing is critical for successful large-scale high-performance computing (HPC) simulations. With modern supercomputers increasing in complexity and variability, dynamic load balancing is becoming more critical to use computational resources efficiently. In this study, performed during a summer collaboration at Lawrence Berkeley National Laboratory, we investigate various standard dynamic load-balancing algorithms. This includes the time evaluation of a brute-force solve for application in algorithmic evaluation, as well as quality and time evaluations of the Knapsack algorithm, a space-filling curve (SFC) algorithm, and two novel algorithms: a painter's partition-based SFC algorithm and a combined Knapsack+SFC methodology based on hardware topology. The results suggest that Knapsack and painter's partition-based algorithms should be among the first algorithms evaluated by HPC codes for cases with limited weight deviation and will perform at least slightly better than AMReX's percentage-tracking partitioning strategy across most simulations, although effects diminish as weight variety increases.
Submitted 21 May, 2025;
originally announced May 2025.
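Illustrative note on the Knapsack-style balancing mentioned in the abstract above: the sketch below shows a generic greedy heuristic (assign the heaviest remaining box to the currently least-loaded rank). It is not AMReX's implementation and not the exact algorithm evaluated in the paper; the function name `greedy_balance` and the example weights are hypothetical.

```python
import heapq


def greedy_balance(weights, num_ranks):
    """Greedily distribute weighted boxes over ranks.

    Boxes are visited from heaviest to lightest, and each one is given to
    the rank with the smallest accumulated load so far. Returns the list
    of box indices per rank and the resulting load per rank.
    """
    # Min-heap of (current load, rank id); the lightest rank is popped first.
    heap = [(0.0, rank) for rank in range(num_ranks)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_ranks)]

    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    for box in order:
        load, rank = heapq.heappop(heap)
        assignment[rank].append(box)
        heapq.heappush(heap, (load + weights[box], rank))

    loads = [sum(weights[i] for i in boxes) for boxes in assignment]
    return assignment, loads


# Hypothetical per-box work estimates distributed over 3 ranks.
assignment, loads = greedy_balance([7, 5, 4, 3, 3, 2], 3)
efficiency = (sum(loads) / len(loads)) / max(loads)  # average load / max load
print(assignment, loads, round(efficiency, 3))
```

The ratio of average load to maximum load is one simple way to score a distribution: a value of 1.0 means every rank carries the same weight.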
-
OptiGait-LGBM: An Efficient Approach of Gait-based Person Re-identification in Non-Overlapping Regions
Authors:
Md. Sakib Hassan Chowdhury,
Md. Hafiz Ahamed,
Bishowjit Paul,
Sarafat Hussain Abhi,
Abu Bakar Siddique,
Md. Robius Sany
Abstract:
Gait recognition, known for its ability to identify individuals from a distance, has gained significant attention in recent times due to its non-intrusive verification. While video-based gait identification systems perform well on large public datasets, their performance drops when applied to real-world, unconstrained gait data due to various factors. Among these, uncontrolled outdoor environments, non-overlapping camera views, varying illumination, and computational efficiency are core challenges in gait-based authentication. Currently, no dataset addresses all these challenges simultaneously. In this paper, we propose an OptiGait-LGBM model capable of person re-identification under these constraints using a skeletal model approach, which helps mitigate inconsistencies in a person's appearance. The model constructs a dataset from landmark positions, minimizing memory usage by using non-sequential data. A benchmark dataset, RUET-GAIT, is introduced to represent uncontrolled gait sequences in complex outdoor environments. The process involves extracting skeletal joint landmarks, generating numerical datasets, and developing an OptiGait-LGBM gait classification model. Our aim is to address the aforementioned challenges with minimal computational cost compared to existing methods. A comparative analysis with ensemble techniques such as Random Forest and CatBoost demonstrates that the proposed approach outperforms them in terms of accuracy, memory usage, and training time. This method provides a novel, low-cost, and memory-efficient video-based gait recognition solution for real-world scenarios.
Submitted 10 May, 2025;
originally announced May 2025.
-
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Authors:
Jae-Won Chung,
Jiachen Liu,
Jeff J. Ma,
Ruofan Wu,
Oh Jun Kweon,
Yuxuan Xia,
Zhiyu Wu,
Mosharaf Chowdhury
Abstract:
As the adoption of Generative AI in real-world services grows explosively, energy has emerged as a critical bottleneck resource. However, energy remains a metric that is often overlooked, under-explored, or poorly understood in the context of building ML systems. We present the ML.ENERGY Benchmark, a benchmark suite and tool for measuring inference energy consumption under realistic service environments, and the corresponding ML.ENERGY Leaderboard, which have served as a valuable resource for those hoping to understand and optimize the energy consumption of their generative AI services. In this paper, we explain four key design principles for benchmarking ML energy we have acquired over time, and then describe how they are implemented in the ML.ENERGY Benchmark. We then highlight results from the latest iteration of the benchmark, including energy measurements of 40 widely used model architectures across 6 different tasks, case studies of how ML design choices impact energy consumption, and how automated optimization recommendations can lead to significant (sometimes more than 40%) energy savings without changing what is being computed by the model. The ML.ENERGY Benchmark is open-source and can be easily extended to various customized models and application scenarios.
Submitted 9 May, 2025;
originally announced May 2025.
-
Quantum Energy Teleportation across Multi-Qubit Systems using W-State Entanglement
Authors:
Alif Elham Khan,
Humayra Anjum,
Mahdy Rahman Chowdhury
Abstract:
Quantum-energy teleportation (QET) has so far only been realised on a two-qubit platform. Real-world communication, however, typically involves multiple parties. Here we design and experimentally demonstrate the first multi-qubit QET protocol using a robust W-state multipartite entanglement. Three-, four- and five-qubit circuits were executed both on noiseless simulators and on IBM superconducting hardware. In every case a single sender injects an energy E0 that is then deterministically and decrementally harvested by several remote receivers, confirming that energy introduced at one node can be redistributed among many entangled subsystems at light-speed-limited classical latency. Our results open a practical route toward energy-aware quantum networks.
Submitted 3 May, 2025;
originally announced May 2025.
-
Design and Application of Multimodal Large Language Model Based System for End to End Automation of Accident Dataset Generation
Authors:
MD Thamed Bin Zaman Chowdhury,
Moazzem Hossain
Abstract:
Road traffic accidents remain a major public safety and socio-economic issue in developing countries like Bangladesh. Existing accident data collection is largely manual, fragmented, and unreliable, resulting in underreporting and inconsistent records. This research proposes a fully automated system using Large Language Models (LLMs) and web scraping techniques to address these challenges. The pipeline consists of four components: automated web scraping code generation, news collection from online sources, accident news classification with structured data extraction, and duplicate removal. The system uses the multimodal generative LLM Gemini-2.0-Flash for seamless automation. The code generation module classifies webpages into pagination, dynamic, or infinite scrolling categories and generates suitable Python scripts for scraping. LLMs also classify and extract key accident information such as date, time, location, fatalities, injuries, road type, vehicle types, and pedestrian involvement. A deduplication algorithm ensures data integrity by removing duplicate reports. The system scraped 14 major Bangladeshi news sites over 111 days (Oct 1, 2024 - Jan 20, 2025), processing over 15,000 news articles and identifying 705 unique accidents. The code generation module achieved 91.3% calibration and 80% validation accuracy. Chittagong reported the highest number of accidents (80), fatalities (70), and injuries (115), followed by Dhaka, Faridpur, Gazipur, and Cox's Bazar. Peak accident times were morning (8-9 AM), noon (12-1 PM), and evening (6-7 PM). A public repository was also developed with usage instructions. This study demonstrates the viability of an LLM-powered, scalable system for accurate, low-effort accident data collection, providing a foundation for data-driven road safety policymaking in Bangladesh.
Submitted 23 April, 2025;
originally announced May 2025.
-
Durghotona GPT: A Web Scraping and Large Language Model Based Framework to Generate Road Accident Dataset Automatically in Bangladesh
Authors:
MD Thamed Bin Zaman Chowdhury,
Moazzem Hossain,
Md. Ridwanul Islam
Abstract:
Road accidents pose significant concerns globally. They lead to large financial losses, injuries, disabilities, and societal challenges. Accurate and timely accident data is essential for predicting and mitigating these events. This paper presents a novel framework named 'Durghotona GPT' that integrates web scraping and Large Language Models (LLMs) to automate the generation of comprehensive accident datasets from prominent national dailies in Bangladesh. The authors collected accident reports from three major newspapers: Prothom Alo, Dhaka Tribune, and The Daily Star. The collected news was then processed using the newest available LLMs: GPT-4, GPT-3.5, and Llama-3. The framework efficiently extracts relevant information, categorizes reports, and compiles detailed datasets. Thus, this framework overcomes limitations of manual data collection methods such as delays, errors, and communication gaps. The authors' evaluation demonstrates that Llama-3, an open-source model, performs comparably to GPT-4. It achieved 89% accuracy in the authors' evaluation. Therefore, it can be considered a cost-effective alternative for similar tasks. The results suggest that the framework developed by the authors can drastically enhance the quality and availability of accident data. As a result, it can support critical applications in traffic safety analysis, urban planning, and public health. The authors also developed an interface for 'Durghotona GPT' for ease of use as part of this paper. Future work will focus on expanding data collection methods and refining LLMs to further increase dataset accuracy and applicability.
Submitted 23 April, 2025;
originally announced April 2025.
-
Evaluation Framework for AI Systems in "the Wild"
Authors:
Sarah Jabbour,
Trenton Chang,
Anindya Das Antar,
Joseph Peper,
Insu Jang,
Jiachen Liu,
Jae-Won Chung,
Shiqi He,
Michael Wellman,
Bryan Goodman,
Elizabeth Bondi-Kelly,
Kevin Samy,
Rada Mihalcea,
Mosharaf Chowdhury,
David Jurgens,
Lu Wang
Abstract:
Generative AI (GenAI) models have become vital across industries, yet current evaluation methods have not adapted to their widespread use. Traditional evaluations often rely on benchmarks and fixed datasets, frequently failing to reflect real-world performance, which creates a gap between lab-tested outcomes and practical applications. This white paper proposes a comprehensive framework for how we should evaluate real-world GenAI systems, emphasizing diverse, evolving inputs and holistic, dynamic, and ongoing assessment approaches. The paper offers guidance for practitioners on how to design evaluation methods that accurately reflect real-time capabilities, and provides policymakers with recommendations for crafting GenAI policies focused on societal impacts, rather than fixed performance numbers or parameter sizes. We advocate for holistic frameworks that integrate performance, fairness, and ethics and the use of continuous, outcome-oriented methods that combine human and automated assessments while also being transparent to foster trust among stakeholders. Implementing these strategies ensures GenAI models are not only technically proficient but also ethically responsible and impactful.
Submitted 28 April, 2025; v1 submitted 23 April, 2025;
originally announced April 2025.
-
Med-2D SegNet: A Light Weight Deep Neural Network for Medical 2D Image Segmentation
Authors:
Md. Sanaullah Chowdhury,
Salauddin Tapu,
Noyon Kumar Sarkar,
Ferdous Bin Ali,
Lameya Sabrin
Abstract:
Accurate and efficient medical image segmentation is crucial for advancing clinical diagnostics and surgical planning, yet remains a complex challenge due to the variability in anatomical structures and the demand for low-complexity models. In this paper, we introduce Med-2D SegNet, a novel and highly efficient segmentation architecture that delivers outstanding accuracy while maintaining a minimal computational footprint. Med-2D SegNet achieves state-of-the-art performance across multiple benchmark datasets, including KVASIR-SEG, PH2, EndoVis, and GLAS, with an average Dice similarity coefficient (DSC) of 89.77% across 20 diverse datasets. Central to its success is the compact Med Block, a specialized encoder design that incorporates dimension expansion and parameter reduction, enabling precise feature extraction while keeping the model at just 2.07 million parameters. Med-2D SegNet excels in cross-dataset generalization, particularly in polyp segmentation: trained on KVASIR-SEG, it showed strong performance on unseen datasets, demonstrating its robustness in zero-shot learning scenarios, although we acknowledge that further improvements are possible. With top-tier performance in both binary and multi-class segmentation, Med-2D SegNet redefines the balance between accuracy and efficiency, setting a new benchmark for medical image analysis. This work paves the way for developing accessible, high-performance diagnostic tools suitable for clinical environments and resource-constrained settings, making it a step forward in the democratization of advanced medical technology.
Submitted 20 April, 2025;
originally announced April 2025.
-
LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers
Authors:
Md Abtahi Majeed Chowdhury,
Md Rifat Ur Rahman,
Akil Ahmad Taki
Abstract:
Positional embeddings (PE) play a crucial role in Vision Transformers (ViTs) by providing spatial information otherwise lost due to the permutation-invariant nature of self-attention. While absolute positional embeddings (APE) have shown theoretical advantages over relative positional embeddings (RPE), particularly due to the ability of sinusoidal functions to preserve spatial inductive biases like monotonicity and shift invariance, a fundamental challenge arises when mapping a 2D grid to a 1D sequence. Existing methods have mostly overlooked the impact of patch ordering in positional embeddings. To address this, we propose LOOPE, a learnable patch-ordering method that optimizes spatial representation for a given set of frequencies, providing a principled approach to patch order optimization. Empirical results show that our PE significantly improves classification accuracy across various ViT architectures. To rigorously evaluate the effectiveness of positional embeddings, we introduce the "Three Cell Experiment", a novel benchmarking framework that assesses the ability of PEs to retain relative and absolute positional information across different ViT architectures. Unlike standard evaluations, which typically report a performance gap of 4 to 6% between models with and without PE, our method reveals a striking 30 to 35% difference, offering a more sensitive diagnostic tool to measure the efficacy of PEs. Our experimental analysis confirms that the proposed LOOPE demonstrates enhanced effectiveness in retaining both relative and absolute positional information.
Submitted 19 April, 2025;
originally announced April 2025.
-
Advanced Deep Learning and Large Language Models: Comprehensive Insights for Cancer Detection
Authors:
Yassine Habchi,
Hamza Kheddar,
Yassine Himeur,
Adel Belouchrani,
Erchin Serpedin,
Fouad Khelifi,
Muhammad E. H. Chowdhury
Abstract:
The rapid advancement of deep learning (DL) has transformed healthcare, particularly in cancer detection and diagnosis. DL surpasses traditional machine learning and human accuracy, making it a critical tool for identifying diseases. Despite numerous reviews on DL in healthcare, a comprehensive analysis of its role in cancer detection remains limited. Existing studies focus on specific aspects, leaving gaps in understanding its broader impact. This paper addresses these gaps by reviewing advanced DL techniques, including transfer learning (TL), reinforcement learning (RL), federated learning (FL), Transformers, and large language models (LLMs). These approaches enhance accuracy, tackle data scarcity, and enable decentralized learning while maintaining data privacy. TL adapts pre-trained models to new datasets, improving performance with limited labeled data. RL optimizes diagnostic pathways and treatment strategies, while FL fosters collaborative model development without sharing sensitive data. Transformers and LLMs, traditionally used in natural language processing, are now applied to medical data for improved interpretability. Additionally, this review examines these techniques' efficiency in cancer diagnosis, addresses challenges like data imbalance, and proposes solutions. It serves as a resource for researchers and practitioners, providing insights into current trends and guiding future research in advanced DL for cancer detection.
Submitted 30 March, 2025;
originally announced April 2025.
-
Quantum Computing Supported Adversarial Attack-Resilient Autonomous Vehicle Perception Module for Traffic Sign Classification
Authors:
Reek Majumder,
Mashrur Chowdhury,
Sakib Mahmud Khan,
Zadid Khan,
Fahim Ahmad,
Frank Ngeni,
Gurcan Comert,
Judith Mwakalonge,
Dimitra Michalaka
Abstract:
Deep learning (DL)-based image classification models are essential for autonomous vehicle (AV) perception modules since incorrect categorization might have severe repercussions. Adversarial attacks are widely studied cyberattacks that can lead DL models to predict inaccurate output, such as incorrectly classified traffic signs by the perception module of an autonomous vehicle. In this study, we create and compare hybrid classical-quantum deep learning (HCQ-DL) models with classical deep learning (C-DL) models to demonstrate robustness against adversarial attacks for perception modules. We used transfer learning models, AlexNet and VGG-16, as feature extractors ahead of the quantum system. We tested over 1000 quantum circuits in our HCQ-DL models for projected gradient descent (PGD), fast gradient sign attack (FGSA), and gradient attack (GA), which are three well-known untargeted adversarial approaches. We evaluated the performance of all models during adversarial attacks and no-attack scenarios. Our HCQ-DL models maintain accuracy above 95% during a no-attack scenario and above 91% for GA and FGSA attacks, which is higher than C-DL models. During the PGD attack, our AlexNet-based HCQ-DL model maintained an accuracy of 85% compared to C-DL models that achieved accuracies below 21%. Our results highlight that the HCQ-DL models provide improved accuracy for traffic sign classification under adversarial settings compared to their classical counterparts.
Submitted 17 April, 2025;
originally announced April 2025.
-
Hardware Design and Security Needs Attention: From Survey to Path Forward
Authors:
Sujan Ghimire,
Muhtasim Alam Chowdhury,
Banafsheh Saber Latibari,
Muntasir Mamun,
Jaeden Wolf Carpenter,
Benjamin Tan,
Hammond Pearce,
Pratik Satam,
Soheil Salehi
Abstract:
Recent advances in attention-based artificial intelligence (AI) models have unlocked vast potential to automate digital hardware design while enhancing and strengthening security measures against various threats. This rapidly emerging field leverages Large Language Models (LLMs) to generate HDL code, identify vulnerabilities, and sometimes mitigate them. The state of the art in this design automation space utilizes optimized LLMs with HDL datasets, creating automated systems for register-transfer level (RTL) generation, verification, and debugging, and establishing LLM-driven design environments for streamlined logic designs. Additionally, attention-based models like graph attention have shown promise in chip design applications, including floorplanning. This survey investigates the integration of these models into hardware-related domains, emphasizing logic design and hardware security, with or without the use of IP libraries. This study explores the commercial and academic landscape, highlighting technical hurdles and future prospects for automating hardware design and security. Moreover, it provides new insights into the study of LLM-driven design systems, advances in hardware security mechanisms, and the impact of influential works on industry practices. Through the examination of 30 representative approaches and illustrative case studies, this paper underscores the transformative potential of attention-based models in revolutionizing hardware design while addressing the challenges that lie ahead in this interdisciplinary domain.
Submitted 10 April, 2025;
originally announced April 2025.
-
CardioTabNet: A Novel Hybrid Transformer Model for Heart Disease Prediction using Tabular Medical Data
Authors:
Md. Shaheenur Islam Sumon,
Md. Sakib Bin Islam,
Md. Sohanur Rahman,
Md. Sakib Abrar Hossain,
Amith Khandakar,
Anwarul Hasan,
M Murugappan,
Muhammad E. H. Chowdhury
Abstract:
The early detection and prediction of cardiovascular diseases are crucial for reducing the severe morbidity and mortality associated with these conditions worldwide. Transformers use a multi-headed self-attention mechanism, widely used in natural language processing (NLP), to understand feature interactions in feature spaces. However, the relationships between various features within biological systems remain ambiguous in these spaces. We handle this issue with CardioTabNet, which exploits the strength of the tab transformer to extract a feature space that carries a strong understanding of clinical cardiovascular data and its feature ranking. As a result, the performance of downstream classical models improved significantly. Our study utilizes an open-source dataset for heart disease prediction with 1190 instances and 11 features. The 11 features are divided into numerical (age, resting blood pressure, cholesterol, maximum heart rate, old peak, weight, and fasting blood sugar) and categorical (resting ECG, exercise angina, and ST slope). The tab transformer was used to extract important features, which were then ranked using the random forest (RF) feature ranking algorithm. Ten machine-learning models were used to predict heart disease using the selected features. After extracting high-quality features, the top downstream model (a hyper-tuned ExtraTree classifier) achieved an average accuracy rate of 94.1% and an average Area Under Curve (AUC) of 95.0%. Furthermore, a nomogram analysis was conducted to evaluate the model's effectiveness in cardiovascular risk assessment. A benchmarking study was conducted using state-of-the-art models to evaluate our transformer-driven framework.
Submitted 22 March, 2025;
originally announced March 2025.
-
TEANet: A Transpose-Enhanced Autoencoder Network for Wearable Stress Monitoring
Authors:
Md Santo Ali,
Sapnil Sarker Bipro,
Mohammod Abdul Motin,
Sumaiya Kabir,
Manish Sharma,
M. E. H. Chowdhury
Abstract:
Mental stress poses a significant public health concern due to its detrimental effects on physical and mental well-being, necessitating the development of continuous stress monitoring tools for wearable devices. Blood volume pulse (BVP) sensors, readily available in many smartwatches, offer a convenient and cost-effective solution for stress monitoring. This study proposes a deep learning approach, a Transpose-Enhanced Autoencoder Network (TEANet), for stress detection using BVP signals. The proposed TEANet model was trained and validated utilizing a self-collected RUET SPML dataset, comprising 19 healthy subjects, and the publicly available wearable stress and affect detection (WESAD) dataset, comprising 15 healthy subjects. It achieves highest accuracies of 92.51% and 96.94%, F1 scores of 95.03% and 95.95%, and kappa values of 0.7915 and 0.9350 for the RUET SPML and WESAD datasets, respectively. The proposed TEANet effectively detects mental stress through BVP signals with high accuracy, making it a promising tool for continuous stress monitoring. Furthermore, the proposed model effectively addresses class imbalances, underscoring its potential for reliable real-time stress monitoring using wearable devices.
Submitted 16 March, 2025;
originally announced March 2025.
-
Equivalent-Circuit Thermal Model for Batteries with One-Shot Parameter Identification
Authors:
Myisha A. Chowdhury,
Qiugang Lu
Abstract:
Accurate state of temperature (SOT) estimation for batteries is crucial for regulating their temperature within a desired range to ensure safe operation and optimal performance. The existing measurement-based methods often generate noisy signals and cannot scale up for large-scale battery packs. The electrochemical model-based methods, on the contrary, offer high accuracy but are computationally expensive. To tackle these issues, inspired by the equivalent-circuit voltage model for batteries, this paper presents a novel equivalent-circuit electro-thermal model (ECTM) for modeling battery surface temperature. By approximating the complex heat generation inside batteries with data-driven nonlinear (polynomial) functions of key measurable parameters such as state-of-charge (SOC), current, and terminal voltage, our ECTM is simplified into a linear form that admits rapid solutions. Such a simplified ECTM can be readily identified with data from a single (one-shot) cycle. The proposed model is extensively validated with benchmark NASA, MIT, and Oxford battery datasets. Simulation results verify the accuracy of the model, despite being identified with one-shot cycle data, in predicting battery temperatures robustly under different battery degradation statuses and ambient conditions.
Submitted 16 March, 2025;
originally announced March 2025.
-
GAN-Based Single-Stage Defense for Traffic Sign Classification Under Adversarial Patch Attack
Authors:
Abyad Enan,
Mashrur Chowdhury
Abstract:
Computer Vision plays a critical role in ensuring the safe navigation of autonomous vehicles (AVs). An AV perception module is responsible for capturing and interpreting the surrounding environment to facilitate safe navigation. This module enables AVs to recognize traffic signs, traffic lights, and various road users. However, the perception module is vulnerable to adversarial attacks, which can compromise its accuracy and reliability. One such attack is the adversarial patch attack (APA), a physical attack in which an adversary strategically places a specially crafted sticker on a target object to deceive object classifiers into misidentifying it. Such an APA can cause AVs to misclassify traffic signs, leading to catastrophic incidents. To enhance the security of an AV perception system against APAs, this study develops a Generative Adversarial Network (GAN)-based single-stage defense strategy for traffic sign classification. This approach is tailored to defend against APAs on different classes of traffic signs without prior knowledge of a patch's design. This study found this approach to be effective against patches of varying sizes. Our experimental analysis demonstrates that the defense strategy presented in this paper improves the classifier's accuracy under APA conditions by up to 80.8% and enhances overall classification accuracy for all the traffic signs considered in this study by 58%, compared to a classifier without any defense mechanism. Our defense strategy is model-agnostic, making it applicable to any traffic sign classifier, regardless of the underlying classification model.
Submitted 16 March, 2025;
originally announced March 2025.
-
Lithium-ion Battery Capacity Prediction via Conditional Recurrent Generative Adversarial Network-based Time-Series Regeneration
Authors:
Myisha A. Chowdhury,
Gift Modekwe,
Qiugang Lu
Abstract:
Accurate capacity prediction is essential for the safe and reliable operation of batteries by anticipating potential failures beforehand. The performance of state-of-the-art capacity prediction methods is significantly hindered by the limited availability of training data, primarily attributed to expensive experimentation and data sharing restrictions. To tackle this issue, this paper presents a recurrent conditional generative adversarial network (RCGAN) scheme that enriches the limited battery data with high-fidelity synthetic samples to improve capacity prediction. The proposed RCGAN scheme consists of a generator network to generate synthetic samples that closely resemble the true data and a discriminator network to differentiate real and synthetic samples. Long short-term memory (LSTM)-based generator and discriminator are leveraged to learn the temporal and spatial distributions in the multivariate time-series battery data. Moreover, the generator is conditioned on the capacity value to account for changes in battery dynamics due to degradation over usage cycles. The effectiveness of the RCGAN is evaluated across six batteries from two benchmark datasets (NASA and MIT). The raw data is then augmented with synthetic samples from the RCGAN to train LSTM and gated recurrent unit (GRU) models for capacity prediction. Simulation results show that the models trained with augmented datasets significantly outperform those trained with the original datasets in capacity prediction.
Submitted 15 March, 2025;
originally announced March 2025.
-
Exchange-Coupled Spins for Robust High-Temperature Qubits
Authors:
Aniruddha Chakraborty,
Md. Fahim F. Chowdhury,
Mohamad Niknam,
Louis S. Bouchard,
Jayasimha Atulasimha
Abstract:
We show that Heisenberg exchange interactions between the neighboring spins comprising an ensemble spin qubit (E-qubit) can act as an intrinsic error mitigator, increasing gate fidelity even at high temperatures. As an example, the fidelity of a π gate applied to E-qubits above 1 K was studied by tuning the ferromagnetic exchange strength, showing that an exchange-coupled E-qubit exhibits higher fidelity than a single-spin-based qubit. We also investigate the coherence properties of E-qubits and find that the coherence time of an E-qubit extends linearly with the number of spins in the ensemble. This suggests that exchange interactions effectively suppress decoherence induced by thermal noise, achieving a coherence time greater than 1 ms at 1 K with an ensemble of only seven spins. Additionally, the ferromagnetic isotropic exchange prevents fidelity loss induced by spatial field gradients/inhomogeneity in Zeeman and/or control fields. Therefore, exchange-coupled spin qubits could enable fault-tolerant quantum operations and long coherence times at elevated temperatures (>1 K).
Submitted 15 March, 2025;
originally announced March 2025.
-
Experimental evaluation of xApp Conflict Mitigation Framework in O-RAN: Insights from Testbed deployment in OTIC
Authors:
Abida Sultana,
Cezary Adamczyk,
Mayukh Roy Chowdhury,
Adrian Kliks,
Aloizio Da Silva
Abstract:
Conflict Mitigation (CM) in Open Radio Access Network (O-RAN) is a topic that is gaining importance as commercial O-RAN deployments become more complex. Although CM has already been studied in simulated network scenarios, it lacks validation using real-world deployment and Over The Air (OTA) Radio Frequency (RF) transmission. Our objective is to conduct the first assessment of the Conflict Mitigation Framework (CMF) for O-RAN using a real-world testbed and OTA RF transmission. This paper presents the results of an experiment using a dedicated testbed built in an O-RAN Open Test and Integration Center (OTIC) to confirm the validity of one of the Conflict Resolution (CR) schemes proposed by existing research. The results show that the implemented conflict detection and resolution mechanisms significantly improve network operation stability, reducing the variability of the measured Downlink (DL) throughput by 78%.
Submitted 15 May, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
Cornstarch: Distributed Multimodal Training Must Be Multimodality-Aware
Authors:
Insu Jang,
Runyu Lu,
Nikhil Bansal,
Ang Chen,
Mosharaf Chowdhury
Abstract:
Multimodal large language models (MLLMs) extend the capabilities of large language models (LLMs) by combining heterogeneous model architectures to handle diverse modalities like images and audio. However, this inherent heterogeneity in MLLM model structure and data types makes makeshift extensions to existing LLM training frameworks unsuitable for efficient MLLM training.
In this paper, we present Cornstarch, the first general-purpose distributed MLLM training framework. Cornstarch facilitates modular MLLM construction, enables composable parallelization of constituent models, and introduces MLLM-specific optimizations to pipeline and context parallelism for efficient distributed MLLM training. Our evaluation shows that Cornstarch outperforms state-of-the-art solutions by up to $1.57\times$ in terms of training throughput.
Cornstarch is an open-source project available at https://github.com/cornstarch-org/Cornstarch.
Submitted 17 March, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
Predicting and Understanding College Student Mental Health with Interpretable Machine Learning
Authors:
Meghna Roy Chowdhury,
Wei Xuan,
Shreyas Sen,
Yixue Zhao,
Yi Ding
Abstract:
Mental health issues among college students have reached critical levels, significantly impacting academic performance and overall wellbeing. Predicting and understanding mental health status among college students is challenging due to three main factors: the necessity for large-scale longitudinal datasets, the prevalence of black-box machine learning models lacking transparency, and the tendency of existing approaches to provide aggregated insights at the population level rather than individualized understanding.
To tackle these challenges, this paper presents I-HOPE, the first Interpretable Hierarchical mOdel for Personalized mEntal health prediction. I-HOPE is a two-stage hierarchical model that connects raw behavioral features to mental health status through five defined behavioral categories as interaction labels. We evaluate I-HOPE on the College Experience Study, the longest longitudinal mobile sensing dataset. This dataset spans five years and captures data from both pre-pandemic periods and the COVID-19 pandemic. I-HOPE achieves a prediction accuracy of 91%, significantly surpassing the 60-70% accuracy of baseline methods. In addition, I-HOPE distills complex patterns into interpretable and individualized insights, enabling the future development of tailored interventions and improving mental health support. The code is available at https://github.com/roycmeghna/I-HOPE.
Submitted 10 June, 2025; v1 submitted 10 March, 2025;
originally announced March 2025.
-
Financial Markets and ESG: How Big Data is Transforming Sustainable Investing in Developing countries
Authors:
A T M Omor Faruq,
Md Ataur Rahman Chowdhury
Abstract:
This study explores the role of big data adoption and financial market development in driving ESG investments in developing countries, using an instrumental variable (IV) approach to address endogeneity. The results show that big data adoption significantly enhances ESG investing, as data-driven analytics improve sustainability assessments and capital allocation. Financial market development also positively influences ESG investments, but its effect is relatively small. A key finding is that inflation negatively impacts ESG investment, highlighting the importance of macroeconomic stability in fostering sustainable finance. In contrast, GDP per capita and foreign direct investment (FDI) are not significant determinants, suggesting that economic growth alone does not drive sustainability efforts. Overall, this study provides empirical evidence that leveraging big data and financial market improvements can accelerate sustainable investing in emerging economies. Policymakers should focus on technological advancements, financial reforms, and inflation control to strengthen ESG investments and long-term sustainability commitments.
Submitted 9 March, 2025;
originally announced March 2025.
-
The Role, Trends, and Applications of Machine Learning in Undersea Communication: A Bangladesh Perspective
Authors:
Yousuf Islam,
Sumon Chandra Das,
Md. Jalal Uddin Chowdhury
Abstract:
The rapid evolution of machine learning (ML) has brought about groundbreaking developments in numerous industries, not the least of which is undersea communication. This domain is critical for applications like ocean exploration, environmental monitoring, resource management, and national security. Bangladesh, a maritime nation with abundant resources in the Bay of Bengal, can harness the immense potential of ML to tackle the unprecedented challenges associated with underwater communication. Beyond that, the region's environmental conditions pose unique challenges in addition to signal attenuation, multipath propagation, noise interference, and limited bandwidth. In this study, we address the need to bring ML into undersea communication and investigate the latest ML technologies in this domain, such as deep learning and reinforcement learning, concentrating on implementation in Bangladesh scenarios. This paper offers a contextualized regional perspective by incorporating region-specific needs, case studies, and recent research to propose a roadmap for deploying ML-driven solutions to improve safety at sea, promote sustainable resource use, and enhance disaster response systems. This research ultimately highlights the promise of ML-powered solutions for transforming undersea communication, leading to more efficient and cost-effective technologies that contribute to both economic growth and environmental sustainability.
Submitted 1 March, 2025;
originally announced March 2025.
-
Broadband Absorption in Cadmium Telluride Thin-Film Solar Cells via Composite Light Trapping Techniques
Authors:
Asif Al Suny,
Tazrian Noor,
Md. Hasibul Hossain,
A. F. M. Afnan Uzzaman Sheikh,
Mustafa Habib Chowdhury
Abstract:
Composite light-trapping structures offer a promising approach to achieving broadband absorption and high efficiency in thin-film solar cells (TFSCs) in order to accelerate sustainable energy solutions. As the leading material in thin-film solar technology, cadmium telluride (CdTe) faces challenges from surface reflective losses across the solar spectrum and weak absorption in the near-infrared (NIR) range. This computational study addresses these limitations by employing a dual light-trapping technique: the top surfaces of both the CdS and CdTe layers are tapered as nanocones (NCs), while germanium (Ge) spherical nanoparticles (NPs) are embedded within the CdTe absorber layer to enhance broadband absorption. Numerical simulations using Finite-Difference Time Domain (FDTD) and other methods are used to optimize the parameters and configurations of both nanostructures, aiming to achieve peak optoelectronic performance. The results show that a short-circuit current density ($J_{sc}$) of 35.38 mA/cm$^2$ and a power conversion efficiency (PCE) of 27.76% can be achieved with optimal NC texturing and spherical Ge NP configurations, increases of 45.45% and 80.72% in $J_{sc}$ and PCE, respectively, compared to the baseline structure. To understand the enhancement mechanisms, the study includes analyses using diffraction grating theory and Mie theory. Fabricability of these structures is also evaluated. Furthermore, an additional study on the effects of incident angle variation and polarization change demonstrates that the optimal structure is robust under practical conditions, maintaining consistent performance.
Submitted 27 February, 2025;
originally announced February 2025.
-
QORT-Former: Query-optimized Real-time Transformer for Understanding Two Hands Manipulating Objects
Authors:
Elkhan Ismayilzada,
MD Khalequzzaman Chowdhury Sayem,
Yihalem Yimolal Tiruneh,
Mubarrat Tajoar Chowdhury,
Muhammadjon Boboev,
Seungryul Baek
Abstract:
Significant advancements have been achieved in the realm of understanding poses and interactions of two hands manipulating an object. The emergence of augmented reality (AR) and virtual reality (VR) technologies has heightened the demand for real-time performance in these applications. However, current state-of-the-art models often exhibit promising results at the expense of substantial computational overhead. In this paper, we present a query-optimized real-time Transformer (QORT-Former), the first Transformer-based real-time framework for 3D pose estimation of two hands and an object. We first limit the number of queries and decoders to meet the efficiency requirement. Given the limited number of queries and decoders, we propose to optimize the queries that are taken as input to the Transformer decoder, to secure better accuracy: (1) we propose to divide queries into three types (a left hand query, a right hand query and an object query) and enhance query features (2) by using the contact information between hands and an object and (3) by using a three-step update of enhanced image and query features with respect to one another. With the proposed methods, we achieved real-time pose estimation performance using just 108 queries and 1 decoder (53.5 FPS on an RTX 3090 Ti GPU). Surpassing state-of-the-art results on the H2O dataset by 17.6% (left hand), 22.8% (right hand), and 27.2% (object), as well as on the FPHA dataset by 5.3% (right hand) and 10.4% (object), our method excels in accuracy. Additionally, it sets the state-of-the-art in interaction recognition, maintaining real-time efficiency with an off-the-shelf action recognition module.
Submitted 27 February, 2025;
originally announced February 2025.
-
Invariance principle for the Gaussian Multiplicative Chaos via a high dimensional CLT with low rank increments
Authors:
Mriganka Basu Roy Chowdhury,
Shirshendu Ganguly
Abstract:
Gaussian multiplicative chaos (GMC) is a canonical random fractal measure obtained by exponentiating log-correlated Gaussian processes, first constructed in the seminal work of Kahane (1985). Since then it has served as an important building block in constructions of quantum field theories and Liouville quantum gravity. However, in many natural settings, non-Gaussian log-correlated processes arise. In this paper, we investigate the universality of GMC through an invariance principle. We consider the model of a random Fourier series, a process known to be log-correlated. While the Gaussian Fourier series has been a classical object of study, recently, the non-Gaussian counterpart was investigated and the associated multiplicative chaos constructed by Junnila in 2016. We show that the Gaussian and non-Gaussian variables can be coupled so that the associated chaos measures are almost surely mutually absolutely continuous throughout the entire sub-critical regime. This solves the main open problem from Kim and Kriechbaum (2024) who had earlier established such a result for a part of the regime. The main ingredient is a new high dimensional CLT for a sum of independent (but not i.i.d.) random vectors belonging to rank one subspaces with error bounds involving the isotropic properties of the covariance matrix of the sum, which we expect will find other applications. The proof relies on a path-wise analysis of Skorokhod embeddings as well as a perturbative result about square roots of positive semi-definite matrices which, surprisingly, appears to be new.
Submitted 24 February, 2025;
originally announced February 2025.
-
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
Authors:
Patrick Tser Jern Kon,
Jiachen Liu,
Qiuyi Ding,
Yiming Qiu,
Zhenning Yang,
Yibo Huang,
Jayanth Srinivasa,
Myungjin Lee,
Mosharaf Chowdhury,
Ang Chen
Abstract:
Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers, and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4$\times$ improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.
Submitted 25 February, 2025; v1 submitted 21 February, 2025;
originally announced February 2025.
-
Intelligent Soft Matter: Towards Embodied Intelligence
Authors:
Vladimir A. Baulin,
Achille Giacometti,
Dmitry Fedosov,
Stephen Ebbens,
Nydia R. Varela-Rosales,
Neus Feliu,
Mithun Chowdhury,
Minghan Hu,
Rudolf Füchslin,
Marjolein Dijkstra,
Matan Mussel,
René van Roij,
Dong Xie,
Vassil Tzanov,
Mengjie Zu,
Samuel Hidalgo-Caballero,
Ye Yuan,
Luca Cocconi,
Cheol-Min Ghim,
Cécile Cottin-Bizonne,
M. Carmen Miguel,
Maria Jose Esplandiu,
Juliane Simmchen,
Wolfgang J. Parak,
Marco Werner
, et al. (2 additional authors not shown)
Abstract:
Intelligent soft matter stands at the intersection of materials science, physics, and cognitive science, promising to change how we design and interact with materials. This transformative field seeks to create materials that possess life-like capabilities, such as perception, learning, memory, and adaptive behavior. Unlike traditional materials, which typically perform static or predefined functions, intelligent soft matter dynamically interacts with its environment. It integrates multiple sensory inputs, retains experiences, and makes decisions to optimize its responses. Inspired by biological systems, these materials are intended to leverage the inherent properties of soft matter (flexibility, self-evolution, and responsiveness) to perform functions that mimic cognitive processes. By synthesizing current research trends and projecting their evolution, we present a forward-looking perspective on how intelligent soft matter could be constructed, with the aim of inspiring innovations in fields such as biomedical devices, adaptive robotics, and beyond. We highlight new pathways for integrating the design of sensing, memory and action with internal low-power operations and discuss challenges for the practical implementation of materials with "intelligent behavior". These approaches outline a path towards more robust, versatile and scalable materials that can potentially act, compute, and "think" through their intrinsic material behaviour, beyond traditional smart technologies relying on external control.
Submitted 23 May, 2025; v1 submitted 18 February, 2025;
originally announced February 2025.
-
Abduction of Domain Relationships from Data for VQA
Authors:
Al Mehdi Saadat Chowdhury,
Paulo Shakarian,
Gerardo I. Simari
Abstract:
In this paper, we study the problem of visual question answering (VQA) where the image and query are represented by ASP programs that lack domain data. We provide an approach that is orthogonal and complementary to existing knowledge augmentation techniques where we abduce domain relationships of image constructs from past examples. After framing the abduction problem, we provide a baseline approach, and an implementation that significantly improves the accuracy of query answering yet requires few examples.
Submitted 13 February, 2025;
originally announced February 2025.
-
Unlocking Mental Health: Exploring College Students' Well-being through Smartphone Behaviors
Authors:
Wei Xuan,
Meghna Roy Chowdhury,
Yi Ding,
Yixue Zhao
Abstract:
The global mental health crisis is a pressing concern, with college students particularly vulnerable to rising mental health disorders. The widespread use of smartphones among young adults, while offering numerous benefits, has also been linked to negative outcomes such as addiction and regret, significantly impacting well-being. Leveraging the longest longitudinal dataset collected over four college years through passive mobile sensing, this study is the first to examine the relationship between students' smartphone unlocking behaviors and their mental health at scale in real-world settings. We provide the first evidence demonstrating the predictability of phone unlocking behaviors for mental health outcomes based on a large dataset, highlighting the potential of these novel features for future predictive models. Our findings reveal important variations in smartphone usage across genders and locations, offering a deeper understanding of the interplay between digital behaviors and mental health. We highlight future research directions aimed at mitigating adverse effects and promoting digital well-being in this population.
Submitted 28 May, 2025; v1 submitted 12 February, 2025;
originally announced February 2025.
-
Deep Learning in Automated Power Line Inspection: A Review
Authors:
Md. Ahasan Atick Faisal,
Imene Mecheter,
Yazan Qiblawey,
Javier Hernandez Fernandez,
Muhammad E. H. Chowdhury,
Serkan Kiranyaz
Abstract:
In recent years, power line maintenance has seen a paradigm shift by moving towards computer vision-powered automated inspection. The utilization of an extensive collection of videos and images has become essential for maintaining the reliability, safety, and sustainability of electricity transmission. A significant focus on applying deep learning techniques for enhancing power line inspection processes has been observed in recent research. A comprehensive review of existing studies has been conducted in this paper, to aid researchers and industries in developing improved deep learning-based systems for analyzing power line data. The conventional steps of data analysis in power line inspections have been examined, and the body of current research has been systematically categorized into two main areas: the detection of components and the diagnosis of faults. A detailed summary of the diverse methods and techniques employed in these areas has been encapsulated, providing insights into their functionality and use cases. Special attention has been given to the exploration of deep learning-based methodologies for the analysis of power line inspection data, with an exposition of their fundamental principles and practical applications. Moreover, a vision for future research directions has been outlined, highlighting the need for advancements such as edge-cloud collaboration, and multi-modal analysis among others. Thus, this paper serves as a comprehensive resource for researchers delving into deep learning for power line analysis, illuminating the extent of current knowledge and the potential areas for future investigation.
Submitted 10 February, 2025;
originally announced February 2025.
-
Smart IoT Security: Lightweight Machine Learning Techniques for Multi-Class Attack Detection in IoT Networks
Authors:
Shahran Rahman Alve,
Muhammad Zawad Mahmud,
Samiha Islam,
Md. Asaduzzaman Chowdhury,
Jahirul Islam
Abstract:
In the growing terrain of the Internet of Things (IoT), it is vital that networks are secure to protect against a range of cyber threats. Based on a strong machine learning framework, this study proposes novel lightweight ensemble approaches for improving multi-class attack detection of IoT devices. Using the large CICIoT 2023 dataset with 34 attack types distributed amongst 10 attack categories, we systematically evaluated the performance of a wide variety of modern machine learning methods with the aim of establishing the best-performing algorithmic choice to secure IoT applications. In particular, we explore approaches based on ML classifiers to tackle the challenging and heterogeneous nature of attack vectors in IoT environments. The method that performed best was the Decision Tree, with an accuracy of 99.56% and an F1 score of 99.62%, showing that this model is capable of accurately and reliably detecting threats. The Random Forest model was the next best-performing model, with an accuracy of 98.22% and an F1 score of 98.24%, suggesting that ML methods are quite effective on high-dimensional data. Our results highlight the potential of ML classifiers for bolstering security for IoT devices and also serve as motivation for future investigations targeting scalable, keystroke-based attack detection systems. We believe that our method provides a new path to develop complex machine learning algorithms for low-resource IoT devices, balancing both accuracy and time efficiency needs. In summary, these contributions enrich the state of the art of the IoT security literature, laying down solid ground and guidelines for the deployment of smart, adaptive security in IoT settings.
Submitted 6 February, 2025;
originally announced February 2025.
-
Preparing for Kyber in Securing Intelligent Transportation Systems Communications: A Case Study on Fault-Enabled Chosen-Ciphertext Attack
Authors:
Kaiyuan Zhang,
M Sabbir Salek,
Antian Wang,
Mizanur Rahman,
Mashrur Chowdhury,
Yingjie Lao
Abstract:
Intelligent transportation systems (ITS) are characterized by wired or wireless communication among different entities, such as vehicles, roadside infrastructure, and traffic management infrastructure. These communications demand different levels of security, depending on how sensitive the data is. The national ITS reference architecture (ARC-IT) defines three security levels, i.e., high, moderate, and low-security levels, based on the different security requirements of ITS applications. In this study, we present a generalized approach to secure ITS communications using a standardized key encapsulation mechanism, known as Kyber, designed for post-quantum cryptography (PQC). We modified the encryption and decryption systems for ITS communications while mapping the security levels of ITS applications to the three versions of Kyber, i.e., Kyber-512, Kyber-768, and Kyber-1024. Then, we conducted a case study using a benchmark fault-enabled chosen-ciphertext attack to evaluate the security provided by the different Kyber versions. The encryption and decryption times observed for different Kyber security levels and the total number of iterations required to recover the secret key using the chosen-ciphertext attack are presented. Our analyses show that higher security levels increase the time required for a successful attack, with Kyber-512 being breached in 183 seconds, Kyber-768 in 337 seconds, and Kyber-1024 in 615 seconds. In addition, attack time instabilities are observed for Kyber-512, 768, and 1024 under 5,000, 6,000, and 8,000 inequalities, respectively. The relationships among the different Kyber versions, and the respective attack requirements and performances underscore the ITS communication security Kyber could provide in the PQC era.
Submitted 3 February, 2025;
originally announced February 2025.
-
Mordal: Automated Pretrained Model Selection for Vision Language Models
Authors:
Shiqi He,
Insu Jang,
Mosharaf Chowdhury
Abstract:
Incorporating multiple modalities into large language models (LLMs) is a powerful way to enhance their understanding of non-textual data, enabling them to perform multimodal tasks. Vision language models (VLMs) form the fastest growing category of multimodal models because of their many practical use cases, including in healthcare, robotics, and accessibility. Unfortunately, even though different VLMs in the literature demonstrate impressive visual capabilities in different benchmarks, they are handcrafted by human experts; there is no automated framework to create task-specific multimodal models.
We introduce Mordal, an automated multimodal model search framework that efficiently finds the best VLM for a user-defined task without manual intervention. Mordal achieves this both by reducing the number of candidates to consider during the search process and by minimizing the time required to evaluate each remaining candidate. Our evaluation shows that Mordal can find the best VLM for a given problem using up to $8.9\times$--$11.6\times$ lower GPU hours than grid search. In the process of our evaluation, we have also discovered new VLMs that outperform their state-of-the-art counterparts.
Submitted 31 January, 2025;
originally announced February 2025.
-
Accelerating PageRank Algorithmic Tasks with a new Programmable Hardware Architecture
Authors:
Md Rownak Hossain Chowdhury,
Mostafizur Rahman
Abstract:
Addressing the growing demands of artificial intelligence (AI) and data analytics requires new computing approaches. In this paper, we propose a reconfigurable hardware accelerator designed specifically for AI and data-intensive applications. Our architecture features a messaging-based intelligent computing scheme that allows for dynamic programming at runtime using a minimal instruction set. To assess our hardware's effectiveness, we conducted a case study in TSMC 28nm technology node. The simulation-based study involved analyzing a protein network using the computationally demanding PageRank algorithm. The results demonstrate that our hardware can analyze a 5,000-node protein network in just 213.6 milliseconds over 100 iterations. These outcomes signify the potential of our design to achieve cutting-edge performance in next-generation AI applications.
Submitted 19 December, 2024;
originally announced February 2025.
-
Self-CephaloNet: A Two-stage Novel Framework using Operational Neural Network for Cephalometric Analysis
Authors:
Md. Shaheenur Islam Sumon,
Khandaker Reajul Islam,
Tanzila Rafique,
Gazi Shamim Hassan,
Md. Sakib Abrar Hossain,
Kanchon Kanti Podder,
Noha Barhom,
Faleh Tamimi,
Abdulrahman Alqahtani,
Muhammad E. H. Chowdhury
Abstract:
Cephalometric analysis is essential for diagnosis and treatment planning in orthodontics. In lateral cephalograms, however, the manual detection of anatomical landmarks is a time-consuming procedure. Deep learning solutions hold the potential to address the time constraints associated with certain tasks; however, concerns regarding their performance have been observed. To address this critical issue, we proposed an end-to-end cascaded deep learning framework (Self-CephaloNet) for the task, which demonstrated benchmark performance over the ISBI 2015 dataset in predicting 19 dental landmarks. Due to their adaptive nodal capabilities, self-operational neural networks (Self-ONNs) demonstrate superior learning performance for complex feature spaces over conventional convolutional neural networks. To leverage this attribute, we introduced a novel self-bottleneck in the HRNetV2 (High Resolution Network) backbone, which has exhibited benchmark performance on the ISBI 2015 dataset for the dental landmark detection task. Our first-stage results surpassed previous studies, showcasing the efficacy of our singular end-to-end deep learning model, which achieved a 70.95% success rate in detecting cephalometric landmarks within a 2 mm range for the Test1 and Test2 datasets. Moreover, the second stage significantly improved overall performance, yielding an 82.25% average success rate for the same datasets within the same 2 mm distance. Furthermore, external validation was conducted using the PKU cephalogram dataset, on which our model demonstrated a success rate of 75.95% within the 2 mm range.
Submitted 19 January, 2025;
originally announced January 2025.
-
From Scarcity to Capability: Empowering Fake News Detection in Low-Resource Languages with LLMs
Authors:
Hrithik Majumdar Shibu,
Shrestha Datta,
Md. Sumon Miah,
Nasrullah Sami,
Mahruba Sharmin Chowdhury,
Md. Saiful Islam
Abstract:
The rapid spread of fake news presents a significant global challenge, particularly in low-resource languages like Bangla, which lack adequate datasets and detection tools. Although manual fact-checking is accurate, it is too expensive and slow to prevent the dissemination of fake news. Addressing this gap, we introduce BanFakeNews-2.0, a robust dataset to enhance Bangla fake news detection. This version includes 11,700 additional, meticulously curated fake news articles validated from credible sources, creating a proportional dataset of 47,000 authentic and 13,000 fake news items across 13 categories. In addition, we created a manually curated independent test set of 460 fake and 540 authentic news items for rigorous evaluation. We invested effort in collecting fake news from credible sources and manually verifying it while preserving linguistic richness. We develop a benchmark system utilizing transformer-based architectures, including fine-tuned Bidirectional Encoder Representations from Transformers variants (F1: 87%) and Large Language Models with Quantized Low-Rank Approximation (F1: 89%), that significantly outperforms traditional methods. BanFakeNews-2.0 offers a valuable resource to advance research and application in fake news detection for low-resource languages. We publicly release our dataset and model on GitHub to foster research in this direction.
Submitted 16 January, 2025;
originally announced January 2025.
-
Empowering Agricultural Insights: RiceLeafBD -- A Novel Dataset and Optimal Model Selection for Rice Leaf Disease Diagnosis through Transfer Learning Technique
Authors:
Sadia Afrin Rimi,
Md. Jalal Uddin Chowdhury,
Rifat Abdullah,
Iftekhar Ahmed,
Mahrima Akter Mim,
Mohammad Shoaib Rahman
Abstract:
The number of people living in this agricultural nation of ours, which is surrounded by lush greenery, is growing on a daily basis. As a result of this, the amount of arable land is decreasing, as well as residential houses and industrial factories. The food crisis is becoming the main threat for us in the coming days because, on the one hand, the population is increasing, and on the other hand, food crop production is decreasing due to disease. Rice is one of the most significant cultivated crops since it provides food for more than half of the world's population. Bangladesh is dependent on rice (Oryza sativa) as a vital crop for its agriculture, but it faces a significant problem as a result of the ongoing decline in rice yield brought on by common diseases. Early disease detection is the main difficulty in rice crop cultivation. In this paper, we propose our own dataset, which was collected from fields in Bangladesh, and apply deep learning and transfer learning models to evaluate it. We explain our dataset in detail and give directions for further research that uses it to serve society. We applied a light CNN model and pre-trained InceptionNet-V2, EfficientNet-V2, and MobileNet-V2 models, with the EfficientNet-V2 model achieving 91.5% performance. The results obtained surpassed other models and even exceeded approaches that are considered to be part of the state of the art. This study demonstrates that it is possible to precisely and effectively identify diseases that affect rice leaves using this unbiased dataset. Based on our analysis of the performance of different models, the proposed dataset is a significant resource for research on solutions that reduce rice leaf disease.
Submitted 15 January, 2025;
originally announced January 2025.
-
ASTRID -- An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems
Authors:
Mohita Chowdhury,
Yajie Vera He,
Aisling Higham,
Ernest Lim
Abstract:
Large Language Models (LLMs) have shown impressive potential in clinical question answering (QA), with Retrieval Augmented Generation (RAG) emerging as a leading approach for ensuring the factual accuracy of model responses. However, current automated RAG metrics perform poorly in clinical and conversational use cases. Using clinical human evaluations of responses is expensive, unscalable, and not conducive to the continuous iterative development of RAG systems. To address these challenges, we introduce ASTRID - an Automated and Scalable TRIaD for evaluating clinical QA systems leveraging RAG - consisting of three metrics: Context Relevance (CR), Refusal Accuracy (RA), and Conversational Faithfulness (CF). Our novel evaluation metric, CF, is designed to better capture the faithfulness of a model's response to the knowledge base without penalising conversational elements. To validate our triad, we curate a dataset of over 200 real-world patient questions posed to an LLM-based QA agent during surgical follow-up for cataract surgery - the highest volume operation in the world - augmented with clinician-selected questions for emergency, clinical, and non-clinical out-of-domain scenarios. We demonstrate that CF can predict human ratings of faithfulness better than existing definitions for conversational use cases. Furthermore, we show that evaluation using our triad consisting of CF, RA, and CR exhibits alignment with clinician assessment for inappropriate, harmful, or unhelpful responses. Finally, using nine different LLMs, we demonstrate that the three metrics can closely agree with human evaluations, highlighting the potential of these metrics for use in LLM-driven automated evaluation pipelines. We also publish the prompts and datasets for these experiments, providing valuable resources for further research and development.
Submitted 14 January, 2025;
originally announced January 2025.
-
A Pan-cancer Classification Model using Multi-view Feature Selection Method and Ensemble Classifier
Authors:
Tareque Mohmud Chowdhury,
Farzana Tabassum,
Sabrina Islam,
Abu Raihan Mostofa Kamal
Abstract:
Accurately identifying cancer samples is crucial for precise diagnosis and effective patient treatment. Traditional methods falter with high-dimensional data and high feature-to-sample count ratios, which are critical for classifying cancer samples. This study aims to develop a novel feature selection framework specifically for transcriptome data and propose two ensemble classifiers. For feature selection, we partition the transcriptome dataset vertically based on feature types. We then apply the Boruta feature selection process on each of the partitions, combine the results, and apply Boruta again on the combined result. We repeat the process with different parameters of Boruta and prepare the final feature set. Finally, we construct two ensemble ML models based on LR, SVM and XGBoost classifiers, using max voting and averaging probability approaches. We use 10-fold cross-validation to ensure robust and reliable classification performance. With 97.11\% accuracy and 0.9996 AUC value, our approach performs better than existing state-of-the-art methods at classifying 33 types of cancers. A set of 12 types of cancer is traditionally challenging to differentiate from each other due to their similarity in tissue of origin. Our method accurately identifies over 90\% of samples from these 12 types of cancers, which outperforms all known methods presented in existing literature. The gene set enrichment analysis reveals that the features selected by our framework are enriched in pathways highly related to cancers. This study develops a feature selection framework to select features highly related to cancer development and leads to identifying different types of cancer samples with higher accuracy.
Submitted 12 January, 2025;
originally announced January 2025.
-
Cybersecurity in Transportation Systems: Policies and Technology Directions
Authors:
Ostonya Thomas,
M Sabbir Salek,
Jean-Michel Tine,
Mizanur Rahman,
Trayce Hockstad,
Mashrur Chowdhury
Abstract:
The transportation industry is experiencing vast digitalization as a plethora of technologies are being implemented to improve efficiency, functionality, and safety. Although technological advancements bring many benefits to transportation, integrating cyberspace across transportation sectors has introduced new and deliberate cyber threats. In the past, public agencies assumed digital infrastructure was secured since its vulnerabilities were unknown to adversaries. However, with the expansion of cyberspace, this assumption has become invalid. With the rapid advancement of wireless technologies, transportation systems are increasingly interconnected with both transportation and non-transportation networks in an internet-of-things ecosystem, expanding cyberspace in transportation and increasing threats and vulnerabilities. This study investigates some prominent reasons for the increase in cyber vulnerabilities in transportation. In addition, this study presents various collaborative strategies among stakeholders that could help improve cybersecurity in the transportation industry. These strategies address programmatic and policy aspects and suggest avenues for technological research and development. The latter highlights opportunities for future research to enhance the cybersecurity of transportation systems and infrastructure by leveraging hybrid approaches and emerging technologies.
Submitted 9 January, 2025;
originally announced January 2025.