-
Scenario of Use Scheme: Threat Model Specification for Speaker Privacy Protection in the Medical Domain
Authors:
Mehtab Ur Rahman,
Martha Larson,
Louis ten Bosch,
Cristian Tejedor-García
Abstract:
Speech recordings are being more frequently used to detect and monitor disease, leading to privacy concerns. Beyond cryptography, protection of speech can be addressed by approaches, such as perturbation, disentanglement, and re-synthesis, that eliminate sensitive information of the speaker, leaving the information necessary for medical analysis purposes. In order for such privacy protective approaches to be developed, clear and systematic specifications of assumptions concerning medical settings and the needs of medical professionals are necessary. In this paper, we propose a Scenario of Use Scheme that incorporates an Attacker Model, which characterizes the adversary against whom the speaker's privacy must be defended, and a Protector Model, which specifies the defense. We discuss the connection of the scheme with previous work on speech privacy. Finally, we present a concrete example of a specified Scenario of Use and a set of experiments about protecting speaker data against gender inference attacks while maintaining utility for Parkinson's detection.
Submitted 26 September, 2024; v1 submitted 24 September, 2024;
originally announced September 2024.
-
FUSED-Net: Detecting Traffic Signs with Limited Data
Authors:
Md. Atiqur Rahman,
Nahian Ibn Asad,
Md. Mushfiqul Haque Omi,
Md. Bakhtiar Hasan,
Sabbir Ahmed,
Md. Hasanul Kabir
Abstract:
Automatic Traffic Sign Recognition is paramount in modern transportation systems, motivating several research endeavors to focus on performance improvement by utilizing large-scale datasets. As the appearance of traffic signs varies across countries, curating large-scale datasets is often impractical, which calls for efficient models that can produce satisfactory performance using limited data. In this connection, we present 'FUSED-Net', built upon Faster RCNN for traffic sign detection, enhanced by Unfrozen Parameters, Pseudo-Support Sets, Embedding Normalization, and Domain Adaptation while reducing data requirements. Unlike traditional approaches, we keep all parameters unfrozen during training, enabling FUSED-Net to learn from limited samples. The generation of a Pseudo-Support Set through data augmentation further enhances performance by compensating for the scarcity of target domain data. Additionally, Embedding Normalization is incorporated to reduce intra-class variance, standardizing feature representation. Domain Adaptation, achieved by pre-training on a diverse traffic sign dataset distinct from the target domain, improves model generalization. Evaluating FUSED-Net on the BDTSD dataset, we achieved 2.4x, 2.2x, 1.5x, and 1.3x improvements of mAP in 1-shot, 3-shot, 5-shot, and 10-shot scenarios, respectively, compared to the state-of-the-art Few-Shot Object Detection (FSOD) models. Additionally, we outperform state-of-the-art works on the cross-domain FSOD benchmark under several scenarios.
Submitted 3 January, 2025; v1 submitted 23 September, 2024;
originally announced September 2024.
-
An Integrated Blockchain and IPFS Solution for Secure and Efficient Source Code Repository Hosting using Middleman Approach
Authors:
Md. Rafid Haque,
Sakibul Islam Munna,
Sabbir Ahmed,
Md. Tahmid Islam,
Md Mehedi Hassan Onik,
A. B. M. Ashikur Rahman
Abstract:
Version control systems (VCS) are essential for software development, yet centralized VCS present risks such as data loss, security breaches, and ownership disputes. While blockchain-based approaches to decentralized source code repository hosting have been explored, many existing solutions struggle with challenges related to security, scalability, efficiency, and real-time collaboration. This study seeks to enhance these efforts by proposing a novel decentralized solution that leverages the Ethereum blockchain and IPFS for secure, efficient, and resilient code repository hosting and governance. Our approach introduces a hybrid architecture that combines the immutable and decentralized nature of blockchain with the efficiency of IPFS for off-chain storage. To facilitate real-time collaboration, we integrate a temporary centralized Middleman IPFS that manages transaction processing and enhances operational efficiency without compromising long-term security. This Middleman IPFS acts as an intermediary, balancing the speed of centralized systems with the resilience of decentralized architectures. Our system uses smart contracts to maintain access control and key management by dynamically verifying access rights, ensuring that only authorized users can retrieve and decrypt data stored on IPFS. This integration allows for secure, real-time collaboration in environments where multiple collaborators need concurrent access to shared resources. Our system employs a hybrid encryption scheme that combines symmetric and asymmetric cryptography. The encrypted keys are stored on the blockchain, while IPFS handles the efficient storage of the codebase itself, with a Middleman IPFS maintaining concurrent collaboration, providing a robust and scalable solution for managing large-scale, collaborative coding projects.
Submitted 22 September, 2024;
originally announced September 2024.
-
Seeing is Believing: The Role of Scatterplots in Recommender System Trust and Decision-Making
Authors:
Bhavana Doppalapudi,
Md Dilshadur Rahman,
Paul Rosen
Abstract:
The accuracy of recommender systems influences users' trust in them and the decisions users make with them. Providing additional information, such as visualizations, offers context that would otherwise be lacking. However, the role of visualizations in influencing trust and decisions with recommender systems is under-explored. To bridge this gap, we conducted a two-part human-subject experiment to investigate the impact of scatterplots on recommender system decisions. Our first study focuses on high-level decisions, such as selecting which recommender system to use. The second study focuses on low-level decisions, such as agreeing or disagreeing with a specific recommendation. Our results show that scatterplots accompanied by higher levels of accuracy influence decisions and that participants tended to trust the recommendations more when scatterplots were accompanied by descriptive accuracy (e.g., high, medium, or low) instead of numeric accuracy (e.g., 90%). Furthermore, we observed that scatterplots often assisted participants in validating their decisions. Based on the results, we believe that scatterplots and visualizations, in general, can aid in making informed decisions, validating decisions, and building trust in recommendation systems.
Submitted 20 September, 2024;
originally announced September 2024.
-
Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection
Authors:
Md Abdur Rahman,
Hossain Shahriar,
Fan Wu,
Alfredo Cuzzocrea
Abstract:
Large language models (LLMs) are renowned for their exceptional capabilities and are applied in a wide range of applications. However, this widespread use brings significant vulnerabilities. There is also a large gap in effective detection and mitigation strategies against malicious prompt injection attacks on LLMs, as current approaches may not adequately address the complexity and evolving nature of these vulnerabilities in real-world applications. Therefore, this work focuses on the impact of malicious prompt injection attacks, one of the most dangerous vulnerabilities in real LLM applications. It examines the use of various BERT (Bidirectional Encoder Representations from Transformers) models, such as multilingual BERT and DistilBERT, for classifying malicious prompts from legitimate prompts. We also observe how tokenizing the prompt texts and generating embeddings with multilingual BERT improves the performance of various machine learning methods: Gaussian Naive Bayes, Random Forest, Support Vector Machine, and Logistic Regression. The performance of each model is rigorously analyzed with various parameters to improve the binary classification of malicious prompts. Embedding the prompts with multilingual BERT significantly improved on and outperformed existing works, achieving an accuracy of 96.55% with Logistic Regression. Additionally, we investigated the incorrect predictions of the model to gain insights into its limitations. The findings can guide researchers in tuning various BERT models to find the most suitable model for diverse LLM vulnerabilities.
Submitted 20 September, 2024;
originally announced September 2024.
-
A VLBI Calibrator Grid at 600MHz for Fast Radio Transient Localizations with CHIME/FRB Outriggers
Authors:
Shion Andrew,
Calvin Leung,
Alexander Li,
Kiyoshi W. Masui,
Bridget C. Andersen,
Kevin Bandura,
Alice P. Curtin,
Jane Kaczmarek,
Adam E. Lanman,
Mattias Lazda,
Juan Mena-Parra,
Daniele Michilli,
Kenzie Nimmo,
Aaron B. Pearlman,
Mubdi Rahman,
Vishwangi Shah,
Kaitlyn Shin,
Haochen Wang
Abstract:
The Canadian Hydrogen Intensity Mapping Experiment Fast Radio Burst (CHIME/FRB) Project has a new VLBI Outrigger at the Green Bank Observatory (GBO), which forms a 3300km baseline with CHIME operating at 400-800MHz. Using 100ms long full-array baseband "snapshots" collected commensally during FRB and pulsar triggers, we perform a shallow, wide-area VLBI survey covering a significant fraction of the Northern sky targeted at the positions of compact sources from the Radio Fundamental Catalog. In addition, our survey contains calibrators detected from two 1s long trial baseband snapshots for a deeper survey with CHIME and GBO. In this paper, we present the largest catalog of compact calibrators suitable for 30-milliarcsecond-scale VLBI observations at sub-GHz frequencies to date. Our catalog consists of 200 total calibrators in the Northern Hemisphere that are compact on 30-milliarcsecond scales with fluxes above 100mJy. This calibrator grid will enable the precise localization of hundreds of FRBs a year with CHIME/FRB-Outriggers.
Submitted 17 September, 2024;
originally announced September 2024.
-
Uddessho: An Extensive Benchmark Dataset for Multimodal Author Intent Classification in Low-Resource Bangla Language
Authors:
Fatema Tuj Johora Faria,
Mukaffi Bin Moin,
Md. Mahfuzur Rahman,
Md Morshed Alam Shanto,
Asif Iftekher Fahim,
Md. Moinul Hoque
Abstract:
With the increasing popularity of daily information sharing and acquisition on the Internet, this paper introduces an innovative approach for intent classification in the Bangla language, focusing on social media posts where individuals share their thoughts and opinions. The proposed method leverages multimodal data with particular emphasis on authorship identification, aiming to understand the underlying purpose behind textual content, especially in the context of varied user-generated posts on social media. Current methods often face challenges in low-resource languages like Bangla, particularly when author traits intricately link with intent, as observed in social media posts. To address this, we present the Multimodal-based Author Bangla Intent Classification (MABIC) framework, utilizing text and images to gain deeper insights into the conveyed intentions. We have created a dataset named "Uddessho," comprising 3,048 instances sourced from social media. Our methodology comprises two approaches for classifying textual intent and multimodal author intent, incorporating early fusion and late fusion techniques. In our experiments, the unimodal approach achieved an accuracy of 64.53% in interpreting Bangla textual intent. In contrast, our multimodal approach significantly outperformed traditional unimodal methods, achieving an accuracy of 76.19%. This represents an improvement of 11.66 percentage points. To the best of our knowledge, this is the first research work on multimodal-based author intent classification for low-resource Bangla language social media posts.
Submitted 14 September, 2024;
originally announced September 2024.
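The accuracy gain reported in the Uddessho abstract above can be read two ways. The short sketch below, using only the two accuracies stated in the abstract, separates the absolute gain in percentage points from the relative gain over the unimodal baseline. The variable names are illustrative and do not come from the paper.

```python
# Accuracies as stated in the abstract (percent).
unimodal_acc = 64.53
multimodal_acc = 76.19

# Absolute gain, in percentage points.
point_gain = multimodal_acc - unimodal_acc

# Relative gain over the unimodal baseline, in percent.
relative_gain = point_gain / unimodal_acc * 100

print(f"absolute gain: {point_gain:.2f} percentage points")
print(f"relative gain: {relative_gain:.2f}%")
```

The abstract's figure of 11.66 matches the absolute difference, which is why it is best read as percentage points rather than a relative improvement.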
-
A Systematic Investigation of PbSe Thermoelectric Material
Authors:
Md. Moklesur Rahman,
Md Kamal Hossain,
Fateha Samad,
Fysol Ibna Abbas
Abstract:
The thermoelectric characteristics of lead selenide (PbSe) doped with gallium (Ga) are investigated in this study. When lead selenide (PbSe) is tuned with appropriate dopants, it exhibits satisfactory ZT values, making it a promising thermoelectric material. This study examines the electrical conductivity, Seebeck coefficient, thermal conductivity, and power factor of PbSe with varying amounts of added Ga. Results indicate that incorporating Ga into PbSe improves its thermoelectric performance, with a maximum ZT value of approximately 1.2 at 873 K for the optimal doping concentration of 0.005 atomic percent. This improvement is attributed to the combined effects of increased electrical conductivity and reduced thermal conductivity. These findings suggest that Ga-doped PbSe is a promising candidate for mid-temperature thermoelectric applications.
Submitted 31 March, 2025; v1 submitted 13 September, 2024;
originally announced September 2024.
-
Tuning into Climate Risks: Extracting Innovation from Television News for Clean Energy Firms
Authors:
Wasim Ahmad,
Mohammad Arshad Rahman,
Suruchi Shrimali,
Preeti Roy
Abstract:
This article develops multiple novel climate risk measures (or variables) based on the television news coverage by Bloomberg, CNBC, and Fox Business, and examines how they affect the systematic and idiosyncratic risks of clean energy firms in the United States. The measures are built on climate related keywords and cover the volume of coverage, type of coverage (climate crisis, renewable energy, and government & human initiatives), and media sentiments. We show that an increase in the aggregate measure of climate risk, as indicated by coverage volume, reduces idiosyncratic risk while increasing systematic risk. When climate risk is segregated, we find that systematic risk is positively affected by the physical risk of climate crises and transition risk from government & human initiatives, but no such impact is evident for idiosyncratic risk. Additionally, we observe an asymmetry in risk behavior: negative sentiments tend to decrease idiosyncratic risk and increase systematic risk, while positive sentiments have no significant impact. These findings remain robust to including print media and climate policy uncertainty variables, though some deviations are noted during the COVID-19 period.
Submitted 23 November, 2024; v1 submitted 13 September, 2024;
originally announced September 2024.
-
CTLESS: A scatter-window projection and deep learning-based transmission-less attenuation compensation method for myocardial perfusion SPECT
Authors:
Zitong Yu,
Md Ashequr Rahman,
Craig K. Abbey,
Richard Laforest,
Nancy A. Obuchowski,
Barry A. Siegel,
Abhinav K. Jha
Abstract:
Attenuation compensation (AC), while being beneficial for visual-interpretation tasks in myocardial perfusion imaging (MPI) by SPECT, typically requires the availability of a separate X-ray CT component, leading to additional radiation dose, higher costs, and potentially inaccurate diagnosis due to SPECT/CT misalignment. To address these issues, we developed a method for cardiac SPECT AC using deep learning and emission scatter-window photons without a separate transmission scan (CTLESS). In this method, an estimated attenuation map reconstructed from scatter-energy window projections is segmented into different regions using a multi-channel input multi-decoder network trained on CT scans. Pre-defined attenuation coefficients are assigned to these regions, yielding the attenuation map used for AC. We objectively evaluated this method in a retrospective study with anonymized clinical SPECT/CT stress MPI images on the clinical task of detecting defects with an anthropomorphic model observer. CTLESS yielded statistically non-inferior performance compared to a CT-based AC (CTAC) method and significantly outperformed a non-AC (NAC) method on this clinical task. Similar results were observed in stratified analyses with different sexes, defect extents and severities. The method was observed to generalize across two SPECT scanners, each with a different camera. In addition, CTLESS yielded similar performance as CTAC and outperformed NAC method on the metrics of root mean squared error and structural similarity index measure. Moreover, as we reduced the training dataset size, CTLESS yielded relatively stable AUC values and generally outperformed another DL-based AC method that directly estimated the attenuation coefficient within each voxel. These results demonstrate the capability of the CTLESS method for transmission-less AC in SPECT and motivate further clinical evaluation.
Submitted 12 September, 2024;
originally announced September 2024.
-
AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration
Authors:
Hongyi Cai,
Mohammad Mahdinur Rahman,
Mohammad Shahid Akhtar,
Jie Li,
Jingyu Wu,
Zhili Fang
Abstract:
Image Transformers have shown great success in Image Restoration tasks. Nevertheless, most transformer-based models are strictly bounded by exorbitant memory occupancy. Our goal is to reduce the memory consumption of Swin Transformer and at the same time speed up the model during the training process. Thus, we introduce AgileIR, a group shifted attention mechanism combined with window attention, which sparsely simplifies the model architecture. We propose Group Shifted Window Attention (GSWA) to decompose Shift Window Multi-head Self Attention (SW-MSA) and Window Multi-head Self Attention (W-MSA) into groups across their attention heads, which shrinks memory usage in back propagation. In addition, we keep shifted window masking and its shifted learnable biases during training, in order to induce the model to interact across windows within the channel. We also re-allocate projection parameters to accelerate attention matrix calculation, which we found causes a negligible decrease in performance. In our experiments, compared with our baseline SwinIR and other efficient quantization models, AgileIR maintains performance at 32.20 dB on the Set5 evaluation dataset, exceeding other tailor-made efficient methods, and saves over 50% of memory when a large batch size is employed.
Submitted 10 September, 2024;
originally announced September 2024.
-
A Remote Control Painting System for Exterior Walls of High-Rise Buildings through Robotic System
Authors:
Diganta Das,
Dipanjali Kundu,
Anichur Rahman,
Muaz Rahman,
Sadia Sazzad
Abstract:
Exterior painting of high-rise buildings is a challenging task. In our country, as well as in other countries of the world, this task is accomplished manually, which is risky and life-threatening for the workers. Researchers and industry experts are trying to find an automatic and robotic solution for the exterior painting of high-rise building walls. In this paper, we propose a solution to this problem. We design and implement a prototype for automatically painting the building walls' exteriors. A spray mechanism was introduced in the prototype that can move in four different directions (up-down and left-right). All the movements are achieved by using microcontroller-operated servo motors. Further, these components create a scope to upgrade the proposed remote-controlled system to a robotic system in the future. In the presented system, all the operations are controlled remotely from a smartphone interface. Bluetooth technology is used for remote communications. It is expected that the suggested system will improve productivity with better workplace safety.
Submitted 8 September, 2024;
originally announced September 2024.
-
Introducing Ensemble Machine Learning Algorithms for Automatic Test Case Generation using Learning Based Testing
Authors:
Sheikh Md. Mushfiqur Rahman,
Nasir U. Eisty
Abstract:
Ensemble methods are powerful machine learning algorithms that combine multiple models to enhance prediction capabilities and reduce generalization errors. However, their potential to generate effective test cases for fault detection in a System Under Test (SUT) has not been extensively explored. This study aims to systematically investigate the combination of ensemble methods and base classifiers for model inference in a Learning Based Testing (LBT) algorithm to generate fault-detecting test cases for SUTs as a proof of concept. We conduct a series of experiments on functions, generating effective test cases using different ensemble methods and classifier combinations for model inference in our proposed LBT method. We then compare the test suites based on their mutation score. The results indicate that Boosting ensemble methods show overall better performance in generating effective test cases, and the proposed method is performing better than random generation. This analysis helps determine the appropriate ensemble methods for various types of functions. By incorporating ensemble methods into the LBT, this research contributes to the understanding of how to leverage ensemble methods for effective test case generation.
Submitted 6 September, 2024;
originally announced September 2024.
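The entry above compares test suites by mutation score but the abstract does not spell out the computation. As a minimal sketch, the standard definition is the fraction of non-equivalent mutants that a test suite detects ("kills"). The function name, suite names, and all counts below are made up for illustration and are not results from the paper.

```python
def mutation_score(killed, total, equivalent=0):
    """Fraction of non-equivalent mutants detected by a test suite."""
    scorable = total - equivalent
    if scorable <= 0:
        raise ValueError("no scorable mutants")
    return killed / scorable


# Hypothetical counts of killed mutants per test suite (illustrative only).
killed_by_suite = {"suite_a": 42, "suite_b": 37, "suite_c": 30}
total_mutants = 50

scores = {
    name: mutation_score(killed, total_mutants)
    for name, killed in killed_by_suite.items()
}
best = max(scores, key=scores.get)
print(best, scores[best])
```

A higher score means the suite exposes more of the injected faults, which is the sense in which the abstract calls a generated test suite more effective.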
-
On the Prevalence, Evolution, and Impact of Code Smells in Simulation Modelling Software
Authors:
Riasat Mahbub,
Mohammad Masudur Rahman,
Muhammad Ahsanul Habib
Abstract:
Simulation modelling systems are routinely used to test or understand real-world scenarios in a controlled setting. They have found numerous applications in scientific research, engineering, and industrial operations. Due to their complex nature, simulation systems could suffer from various code quality issues and technical debt. However, to date, there has not been any investigation into their code quality issues (e.g. code smells). In this paper, we conduct an empirical study investigating the prevalence, evolution, and impact of code smells in simulation software systems. First, we employ static analysis tools (e.g. Designite) to detect and quantify the prevalence of various code smells in 155 simulation and 327 traditional projects from GitHub. Our findings reveal that certain code smells (e.g. Long Statement, Magic Number) are more prevalent in simulation software systems than in traditional software systems. Second, we analyze the evolution of these code smells across multiple project versions and investigate their chances of survival. Our experiments show that some code smells such as Magic Number and Long Parameter List can survive a long time in simulation software systems. Finally, we examine any association between software bugs and code smells. Our experiments show that although Design and Architecture code smells are introduced simultaneously with bugs, there is no significant association between code smells and bugs in simulation systems.
Submitted 5 September, 2024;
originally announced September 2024.
-
UAV (Unmanned Aerial Vehicles): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking
Authors:
Md. Mahfuzur Rahman,
Sunzida Siddique,
Marufa Kamal,
Rakib Hossain Rifat,
Kishor Datta Gupta
Abstract:
Unmanned Aerial Vehicles (UAVs) have greatly revolutionized the process of gathering and analyzing data in diverse research domains, providing unmatched adaptability and effectiveness. This paper presents a thorough examination of UAV datasets, emphasizing their wide range of applications and progress. UAV datasets consist of various types of data, such as satellite imagery, images captured by drones, and videos. These datasets can be categorized as either unimodal or multimodal, offering a wide range of detailed and comprehensive information. These datasets play a crucial role in disaster damage assessment, aerial surveillance, object recognition, and tracking. They facilitate the development of sophisticated models for tasks like semantic segmentation, pose estimation, vehicle re-identification, and gesture recognition. By leveraging UAV datasets, researchers can significantly enhance the capabilities of computer vision models, thereby advancing technology and improving our understanding of complex, dynamic environments from an aerial perspective. This review aims to encapsulate the multifaceted utility of UAV datasets, emphasizing their pivotal role in driving innovation and practical applications in multiple domains.
Submitted 5 September, 2024;
originally announced September 2024.
-
Automatic Detection of LLM-generated Code: A Case Study of Claude 3 Haiku
Authors:
Musfiqur Rahman,
SayedHassan Khatoonabadi,
Ahmad Abdellatif,
Emad Shihab
Abstract:
Using Large Language Models (LLMs) has gained popularity among software developers for generating source code. However, the use of LLM-generated code can introduce risks of adding suboptimal, defective, and vulnerable code. This makes it necessary to devise methods for the accurate detection of LLM-generated code. Toward this goal, we perform a case study of Claude 3 Haiku (or Claude 3 for brevity) on the CodeSearchNet dataset. We divide our analyses into two parts: function-level and class-level. We extract 22 software metric features, such as Code Lines and Cyclomatic Complexity, for each level of granularity. We then analyze code snippets generated by Claude 3 and their human-authored counterparts using the extracted features to understand how unique the code generated by Claude 3 is. In the following step, we use the unique characteristics of Claude 3-generated code to build Machine Learning (ML) models and identify which features of the code snippets make them more detectable by ML models. Our results indicate that Claude 3 tends to generate longer functions, but shorter classes than humans, and this characteristic can be used to detect Claude 3-generated code with ML models with 82% and 66% accuracies for function-level and class-level snippets, respectively.
Submitted 2 September, 2024;
originally announced September 2024.
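To make the feature-extraction step in the abstract above concrete, here is a minimal sketch of computing two of the named metrics (Code Lines and Cyclomatic Complexity) for a Python snippet. It is an illustration only: the function name `function_metrics`, the set of node types counted as decision points, and the restriction to Python input are assumptions, not the authors' tooling or their exact metric definitions.

```python
import ast

# Node types treated as decision points (an illustrative simplification;
# boolean operators and comprehensions are deliberately not counted here).
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def function_metrics(source: str) -> dict:
    """Return two simple software metrics for a Python code snippet."""
    tree = ast.parse(source)
    # Code lines: non-blank lines that are not pure comment lines.
    code_lines = sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )
    # Cyclomatic complexity approximated as 1 + number of decision points.
    decisions = sum(isinstance(node, _DECISION_NODES) for node in ast.walk(tree))
    return {"code_lines": code_lines, "cyclomatic_complexity": 1 + decisions}


sample = """
def sign(x):
    # classify a number
    if x > 0:
        return 1
    elif x < 0:
        return -1
    return 0
"""
metrics = function_metrics(sample)
```

Feature vectors of this kind, computed for both human-written and generated snippets, are what a downstream classifier would be trained on.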
-
Container Data Item: An Abstract Datatype for Efficient Container-based Edge Computing
Authors:
Md Rezwanur Rahman,
Tarun Annapareddy,
Shirin Ebadi,
Varsha Natarajan,
Adarsh Srinivasan,
Eric Keller,
Shivakant Mishra
Abstract:
We present Container Data Item (CDI), an abstract datatype that allows multiple containers to efficiently operate on a common data item while preserving their strong security and isolation semantics. Application developers can use CDIs to enable multiple containers to operate on the same data, synchronize execution among themselves, and control the ownership of the shared data item during runtime. These containers may reside on the same server or different servers. CDI is designed to support microservice-based applications composed of a set of interconnected microservices, each implemented by a separate dedicated container. CDI preserves the important isolation semantics of containers by ensuring that exactly one container owns a CDI object at any instant and the ownership of a CDI object may be transferred from one container to another only by the current CDI object owner. We present three different implementations of CDI that allow different containers residing on the same server as well as containers residing on different servers to use CDI for efficiently operating on a common data item. The paper provides an extensive performance evaluation of CDI along with two representative applications, an augmented reality application and a decentralized workflow orchestrator.
Submitted 1 September, 2024;
originally announced September 2024.
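The ownership rule stated in the abstract above (exactly one owner at any instant, and only the current owner may hand the item over) can be sketched in a few lines. This is a single-process toy model under stated assumptions: the class `SharedDataItem`, its methods, and the container names are hypothetical and do not reflect the paper's actual API or any of its three implementations.

```python
class OwnershipError(Exception):
    """Raised when a non-owner tries to access or transfer the item."""


class SharedDataItem:
    """Toy stand-in for a CDI object: exactly one owner at a time (hypothetical API)."""

    def __init__(self, payload, owner: str):
        self._payload = payload
        self._owner = owner

    @property
    def owner(self) -> str:
        return self._owner

    def _require_owner(self, caller: str) -> None:
        if caller != self._owner:
            raise OwnershipError(f"{caller!r} does not own this item")

    def read(self, caller: str):
        self._require_owner(caller)
        return self._payload

    def write(self, caller: str, payload) -> None:
        self._require_owner(caller)
        self._payload = payload

    def transfer(self, caller: str, new_owner: str) -> None:
        # Only the current owner may pass ownership on.
        self._require_owner(caller)
        self._owner = new_owner


# Usage: a "decoder" container hands a frame to a "detector" container.
item = SharedDataItem(b"frame-0", owner="decoder")
item.transfer("decoder", "detector")
try:
    item.read("decoder")          # the previous owner has lost access
    previous_owner_blocked = False
except OwnershipError:
    previous_owner_blocked = True
```

A real implementation would also have to enforce this rule across process and server boundaries, which the sketch does not attempt.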
-
Programmable refractive functions
Authors:
Md Sadman Sakib Rahman,
Tianyi Gan,
Mona Jarrahi,
Aydogan Ozcan
Abstract:
Snell's law dictates the phenomenon of light refraction at the interface between two media. Here, we demonstrate, for the first time, arbitrary programming of light refraction through an engineered material where the direction of the output wave can be set independently for different directions of the input wave, covering arbitrarily selected permutations of light refraction between the input and output apertures. Formed by a set of cascaded transmissive layers with optimized phase profiles, this refractive function generator (RFG) spans only a few tens of wavelengths in the axial direction. In addition to monochrome RFG designs, we also report wavelength-multiplexed refractive functions, where a distinct refractive function is implemented at each wavelength through the same engineered material volume, i.e., the permutation of light refraction is switched from one desired function to another function by changing the illumination wavelength. As an experimental proof of concept, we demonstrate negative refractive function at the terahertz part of the spectrum using a 3D-printed material. Arbitrary programming of refractive functions enables new design capabilities for optical materials, devices and systems.
Submitted 7 April, 2025; v1 submitted 31 August, 2024;
originally announced September 2024.
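For reference, the law named in the abstract above can be written out as follows. The symbols $n_1$, $n_2$, $\theta_1$, $\theta_2$ are standard textbook notation. The mapping $f$ and the index $k$ are illustrative notation introduced here, not taken from the paper, for the idea that each input direction is assigned its own output direction.

```latex
% Standard Snell's law at an interface between media with refractive indices n_1 and n_2
n_1 \sin\theta_1 = n_2 \sin\theta_2
\quad\Longrightarrow\quad
\theta_2 = \arcsin\!\left(\frac{n_1}{n_2}\,\sin\theta_1\right)

% Illustrative notation (an assumption): a programmed refractive function is a
% chosen mapping f from K input directions to K output directions
\theta_{\mathrm{out}}^{(k)} = f\!\left(\theta_{\mathrm{in}}^{(k)}\right), \qquad k = 1, \dots, K
```

Under this notation, ordinary refraction fixes $f$ through the two refractive indices, whereas a simple example of a negative refractive function would be $f(\theta) = -\theta$.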
-
Modularity in Transformers: Investigating Neuron Separability & Specialization
Authors:
Nicholas Pochinkov,
Thomas Jones,
Mohammed Rashidur Rahman
Abstract:
Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited. This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models. Using a combination of selective pruning and MoEfication clustering techniques, we analyze the overlap and specialization of neurons across different tasks and data subsets. Our findings reveal evidence of task-specific neuron clusters, with varying degrees of overlap between related tasks. We observe that neuron importance patterns persist to some extent even in randomly initialized models, suggesting an inherent structure that training refines. Additionally, we find that neuron clusters identified through MoEfication correspond more strongly to task-specific neurons in earlier and later layers of the models. This work contributes to a more nuanced understanding of transformer internals and offers insights into potential avenues for improving model interpretability and efficiency.
Submitted 30 August, 2024;
originally announced August 2024.
-
Striking the Right Balance: Systematic Assessment of Evaluation Method Distribution Across Contribution Types
Authors:
Feng Lin,
Arran Zeyu Wang,
Md Dilshadur Rahman,
Danielle Albers Szafir,
Ghulam Jilani Quadri
Abstract:
In the rapidly evolving field of information visualization, rigorous evaluation is essential for validating new techniques, understanding user interactions, and demonstrating the effectiveness and usability of visualizations. Faithful evaluations provide valuable insights into how users interact with and perceive the system, enabling designers to identify potential weaknesses and make informed decisions about design choices and improvements. However, an emerging trend of multiple evaluations within a single study raises critical questions about the sustainability, feasibility, and methodological rigor of such an approach. New researchers and students, influenced by this trend, may believe that multiple evaluations are necessary for a study, regardless of the contribution types. However, the number of evaluations in a study should depend on its contributions and merits, not on the trend of including multiple evaluations to strengthen a paper. So, how many evaluations are enough? This is a situational question and cannot be formulaically determined. Our objective is to summarize current trends and patterns to assess the distribution of evaluation methods over different paper contribution types. In this paper, we identify this trend through a non-exhaustive literature survey of evaluation patterns in 214 papers in the two most recent years' VIS issues in IEEE TVCG from 2023 and 2024. We then discuss various evaluation strategy patterns in the information visualization field to guide practical choices and how this paper will open avenues for further discussion.
Submitted 28 August, 2024;
originally announced August 2024.
-
Electron FLASH platform for pre-clinical research: LINAC modification, simplification of pulse control and dosimetry
Authors:
Banghao Zhou,
Lixiang Guo,
Weiguo Lu,
Mahbubur Rahman,
Rongxiao Zhang,
Varghese Anto Chirayath,
Yang Kyun Park,
Strahinja Stojadinovic,
Marvin Garza,
Ken Kang-Hsin Wang
Abstract:
Background: FLASH radiotherapy is a treatment regime that delivers therapeutic dose to tumors at an ultra-high dose rate while maintaining adequate normal tissue sparing. However, a comprehensive understanding of the underlying mechanisms, potential late toxicities, and optimal fractionation schemes is important for successful clinical translation. This has necessitated extensive pre-clinical investigations, leading several research institutions to initiate dedicated FLASH research programs. Purpose: This work describes a workflow for establishing an easily accessible electron FLASH (eFLASH) platform. The platform incorporates simplified pulse control, optimized dose rate delivery, and a validated Monte Carlo (MC) dose engine for accurate in vivo dosimetry dedicated to FLASH pre-clinical studies. Methods: Adjustment of the automatic frequency control (AFC) module allowed us to optimize the LINAC pulse form to achieve a uniform dose rate. An MC model for the 6 MeV FLASH beam was commissioned to ensure accurate dose calculation necessary for reproducible in vivo studies. Results: Optimizing the AFC module enabled the generation of a uniform pulse form, ensuring consistent dose per pulse and a uniform dose rate throughout FLASH irradiation. The MC model closely agreed with film measurements. MC dose calculations indicated that 6 MeV FLASH is adequate to achieve a uniform dose distribution for mouse whole brain irradiation but may not be optimal for the spinal cord study. Conclusions: We present a novel workflow for establishing a LINAC-based eFLASH research platform, incorporating techniques for optimized dose rate delivery, a simplified pulse control system, and a validated MC engine. This work provides researchers with valuable new approaches to facilitate the development of robust and accessible LINAC-based systems for FLASH studies.
Submitted 27 August, 2024;
originally announced August 2024.
-
SONICS: Synthetic Or Not -- Identifying Counterfeit Songs
Authors:
Md Awsafur Rahman,
Zaber Ibn Abdul Hakim,
Najibul Haque Sarker,
Bishmoy Paul,
Shaikh Anowarul Fattah
Abstract:
The recent surge in AI-generated songs presents exciting possibilities and challenges. These innovations necessitate the ability to distinguish between human-composed and synthetic songs to safeguard artistic integrity and protect human musical artistry. Existing research and datasets in fake song detection only focus on singing voice deepfake detection (SVDD), where the vocals are AI-generated but the instrumental music is sourced from real songs. However, these approaches are inadequate for detecting contemporary end-to-end artificial songs where all components (vocals, music, lyrics, and style) could be AI-generated. Additionally, existing datasets lack music-lyrics diversity, long-duration songs, and open-access fake songs. To address these gaps, we introduce SONICS, a novel dataset for end-to-end Synthetic Song Detection (SSD), comprising over 97k songs (4,751 hours) with over 49k synthetic songs from popular platforms like Suno and Udio. Furthermore, we highlight the importance of modeling long-range temporal dependencies in songs for effective authenticity detection, an aspect entirely overlooked in existing methods. To utilize long-range patterns, we introduce SpecTTTra, a novel architecture that significantly improves time and memory efficiency over conventional CNN and Transformer-based models. For long songs, our top-performing variant outperforms ViT by 8% in F1 score, is 38% faster, and uses 26% less memory, while also surpassing ConvNeXt with a 1% F1 score gain, 20% speed boost, and 67% memory reduction.
Submitted 24 February, 2025; v1 submitted 26 August, 2024;
originally announced August 2024.
-
OpenNav: Efficient Open Vocabulary 3D Object Detection for Smart Wheelchair Navigation
Authors:
Muhammad Rameez ur Rahman,
Piero Simonetto,
Anna Polato,
Francesco Pasti,
Luca Tonin,
Sebastiano Vascon
Abstract:
Open vocabulary 3D object detection (OV3D) allows precise and extensible object recognition crucial for adapting to diverse environments encountered in assistive robotics. This paper presents OpenNav, a zero-shot 3D object detection pipeline based on RGB-D images for smart wheelchairs. Our pipeline integrates an open-vocabulary 2D object detector with a mask generator for semantic segmentation, followed by depth isolation and point cloud construction to create 3D bounding boxes. The smart wheelchair exploits these 3D bounding boxes to identify potential targets and navigate safely. We demonstrate OpenNav's performance through experiments on the Replica dataset and we report preliminary results with a real wheelchair. OpenNav improves state-of-the-art significantly on the Replica dataset at mAP25 (+9pts) and mAP50 (+5pts) with marginal improvement at mAP. The code is publicly available at this link: https://github.com/EasyWalk-PRIN/OpenNav.
Submitted 25 August, 2024;
originally announced August 2024.
-
Bi3+ Doped Nanocrystalline Ni-Co-Zn Spinel Ferrites: Tuning of Physical, Electrical, Dielectric and Magnetic Properties for Advanced Spintronics Applications
Authors:
Md. Mahfuzur Rahman,
Nazmul Hasan,
Sumaiya Tabassum,
M. Harun-Or-Rashid,
Md. Harunur Rashid,
Md. Arifuzzaman
Abstract:
This study reports the synthesis and characterization of nanocrystalline Ni0.5Co0.2Zn0.3BixFe2-xO4 (x = 0.0, 0.025, 0.050, 0.075, 0.100) ferrites synthesized via the sol-gel auto combustion method. Low coercivity values of 23.68 to 87.71 Oe are observed, classifying the investigated materials as soft ferromagnetic. The increased magnetic anisotropy K through Bi3+ doping indicates tunable stability in magnetic orientations, making them suitable for multifunctional applications.
Submitted 20 August, 2024;
originally announced August 2024.
-
Hysteretic response to different modes of ramping an external field in sparse and dense Ising spin glasses
Authors:
Mahajabin Rahman,
Stefan Boettcher
Abstract:
We consider the hysteretic behavior of Ising spin glasses at $T=0$ for various modes of driving. Previous studies mostly focused on an infinitely slow speed $\dot{H}$ by which the external field $H$ was ramped to trigger avalanches of spin flips by starting with destabilizing a single spin, while few have focused on the effect of different driving methods. First, we show that this conventional protocol imposes a system size dependence. Then, we numerically analyze the response of Ising spin glasses at rates $\dot{H}$ that are fixed as well, to elucidate the differences in the response. Specifically, we compare three different modes of ramping ($\dot{H}=c/N$, $\dot{H}=c/\sqrt{N}$, and $\dot{H}=c$ for constant $c$) for two types of spin glass systems of size $N$, representing dense networks by the Sherrington-Kirkpatrick model and sparse networks by the lattice spin glass in $d=3$ dimensions known as the Edwards-Anderson model. Depending on the mode of ramping, we find that the response of each system, in the form of spin-flip avalanches and other observables, can vary considerably. In particular, in the $N$-independent mode applied to the lattice spin glass, which is closest to experimental reality, we observe a percolation transition with a broad avalanche distribution between phases of localized and system-spanning responses. We explore implications for combinatorial optimization problems pertaining to sparse systems.
Submitted 20 September, 2024; v1 submitted 19 August, 2024;
originally announced August 2024.
-
ECG-Free Assessment of Cardiac Valve Events Using Seismocardiography
Authors:
Mohammad Muntasir Rahman,
Aysha Mann,
Amirtaha Taebi
Abstract:
Seismocardiogram (SCG) signals can play a crucial role in remote cardiac monitoring, capturing important events such as aortic valve opening (AO) and mitral valve closure (MC). However, existing SCG methods for detecting AO and MC typically rely on electrocardiogram (ECG) data. In this study, we propose an innovative approach to identify AO and MC events in SCG signals without the need for ECG information. Our method utilizes a template bank, which consists of signal templates extracted from SCG waveforms of 5 healthy subjects. These templates represent characteristic features of a heart cycle. When analyzing new, unseen SCG signals from another group of 6 healthy subjects, we employ these templates to accurately detect cardiac cycles and subsequently pinpoint AO and MC events. Our results demonstrate the effectiveness of the proposed template bank approach in achieving ECG-independent AO and MC detection, laying the groundwork for more convenient remote cardiovascular assessment.
Submitted 18 August, 2024;
originally announced August 2024.
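As a rough illustration of how a stored template can locate cardiac cycles in a new signal, as described in the abstract above, the sketch below slides one template over a signal and keeps the positions where the normalized cross-correlation peaks. The use of Pearson correlation, the threshold of 0.9, the function names, and the synthetic data are all assumptions made for this example; the paper's template bank and its AO/MC localization step are not reproduced.

```python
import math


def normalized_xcorr(signal, template):
    """Pearson correlation between the template and every same-length window of the signal."""
    m = len(template)
    t_mean = sum(template) / m
    t_dev = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for start in range(len(signal) - m + 1):
        window = signal[start:start + m]
        w_mean = sum(window) / m
        w_dev = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        denom = t_norm * w_norm
        # Flat windows carry no shape information, so score them as 0.
        scores.append(sum(a * b for a, b in zip(t_dev, w_dev)) / denom if denom > 0 else 0.0)
    return scores


def detect_cycles(signal, template, threshold=0.9):
    """Return start indices where the correlation is a local maximum above the threshold."""
    scores = normalized_xcorr(signal, template)
    starts = []
    for i, score in enumerate(scores):
        left = scores[i - 1] if i > 0 else -1.0
        right = scores[i + 1] if i + 1 < len(scores) else -1.0
        if score >= threshold and score > left and score >= right:
            starts.append(i)
    return starts


# Synthetic check: the same waveform is embedded at samples 5 and 20.
template = [0.0, 1.0, 3.0, -2.0, 1.0, 0.0]
signal = [0.0] * 30
for offset in (5, 20):
    signal[offset:offset + len(template)] = template
cycle_starts = detect_cycles(signal, template)
```

With a bank of several templates, one plausible extension is to take the best score across templates at each position before thresholding.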
-
Contactless seismocardiography via Gunnar-Farneback optical flow
Authors:
Mohammad Muntasir Rahman,
Amirtaha Taebi
Abstract:
Seismocardiography (SCG) has gained significant attention due to its potential applications in monitoring cardiac health and diagnosing cardiovascular conditions. Conventional SCG methods rely on accelerometers attached to the chest, which can be uncomfortable or inconvenient. In recent years, researchers have explored non-contact methods to capture SCG signals, and one promising approach involves analyzing video recordings of the chest. In this study, we investigate a vision-based method based on the Gunnar-Farneback optical flow to extract SCG signals from the chest skin movements recorded by a smartphone camera. We compared the SCG signals extracted from the chest videos of four healthy subjects with those obtained from accelerometers and our previous method based on sticker tracking. Our results demonstrated that the vision-based SCG signals extracted by the proposed method closely resembled those from accelerometers and stickers, although these signals were captured from slightly different locations. The mean squared error between the vision-based SCG signals and accelerometer-based signals was found to be within a reasonable range, especially between signals in the head-to-foot direction (0.2$<$MSE$<$1.5). Additionally, heart rates derived from the vision-based SCG exhibited good agreement with the gold-standard ECG measurements, with a mean difference of 0.8 bpm. These results indicate the potential of this non-invasive method in health monitoring and diagnostics.
Submitted 18 August, 2024;
originally announced August 2024.
-
OVOSE: Open-Vocabulary Semantic Segmentation in Event-Based Cameras
Authors:
Muhammad Rameez Ur Rahman,
Jhony H. Giraldo,
Indro Spinelli,
Stéphane Lathuilière,
Fabio Galasso
Abstract:
Event cameras, known for low-latency operation and superior performance in challenging lighting conditions, are suitable for sensitive computer vision tasks such as semantic segmentation in autonomous driving. However, challenges arise due to limited event-based data and the absence of large-scale segmentation benchmarks. Current works are confined to closed-set semantic segmentation, limiting their adaptability to other applications. In this paper, we introduce OVOSE, the first Open-Vocabulary Semantic Segmentation algorithm for Event cameras. OVOSE leverages synthetic event data and knowledge distillation from a pre-trained image-based foundation model to an event-based counterpart, effectively preserving spatial context and transferring open-vocabulary semantic segmentation capabilities. We evaluate the performance of OVOSE on two driving semantic segmentation datasets, DDD17 and DSEC-Semantic, comparing it with existing conventional image open-vocabulary models adapted for event-based data. Similarly, we compare OVOSE with state-of-the-art methods designed for closed-set settings in unsupervised domain adaptation for event-based semantic segmentation. OVOSE demonstrates superior performance, showcasing its potential for real-world applications. The code is available at https://github.com/ram95d/OVOSE.
Submitted 18 August, 2024;
originally announced August 2024.
-
ConVerSum: A Contrastive Learning-based Approach for Data-Scarce Solution of Cross-Lingual Summarization Beyond Direct Equivalents
Authors:
Sanzana Karim Lora,
M. Sohel Rahman,
Rifat Shahriyar
Abstract:
Cross-lingual summarization (CLS) is a sophisticated branch of Natural Language Processing that requires models to accurately translate and summarize articles from different source languages. Despite the improvements made by subsequent studies, this area still needs data-efficient solutions along with effective training methodologies. To the best of our knowledge, there is no feasible solution for CLS when there is no available high-quality CLS data. In this paper, we propose a novel data-efficient approach, ConVerSum, for CLS leveraging the power of contrastive learning, generating versatile candidate summaries in different languages based on the given source document and contrasting these summaries with reference summaries concerning the given documents. After that, we train the model with a contrastive ranking loss. Then, we rigorously evaluate the proposed approach against current methodologies and compare it to powerful Large Language Models (LLMs), namely Gemini, GPT 3.5, and GPT 4o, showing that our model performs better for CLS in low-resource languages. These findings represent a substantial improvement in the area, opening the door to more efficient and accurate cross-lingual summarizing techniques.
Submitted 25 November, 2024; v1 submitted 17 August, 2024;
originally announced August 2024.
-
Comparative Performance Analysis of Transformer-Based Pre-Trained Models for Detecting Keratoconus Disease
Authors:
Nayeem Ahmed,
Md Maruf Rahman,
Md Fatin Ishrak,
Md Imran Kabir Joy,
Md Sanowar Hossain Sabuj,
Md. Sadekur Rahman
Abstract:
This study compares eight pre-trained CNNs for diagnosing keratoconus, a degenerative eye disease. A carefully selected dataset of keratoconus, normal, and suspicious cases was used. The models tested include DenseNet121, EfficientNetB0, InceptionResNetV2, InceptionV3, MobileNetV2, ResNet50, VGG16, and VGG19. To maximize model training, bad sample removal, resizing, rescaling, and augmentation were used. The models were trained with similar parameters, activation function, classification function, and optimizer to compare performance. To determine class separation effectiveness, each model was evaluated on accuracy, precision, recall, and F1-score. MobileNetV2 was the most accurate model in identifying keratoconus and normal cases, with few misclassifications. InceptionV3 and DenseNet121 both performed well in keratoconus detection, but they had trouble with suspicious cases. In contrast, EfficientNetB0, ResNet50, and VGG19 had more difficulty distinguishing suspicious cases from normal ones, indicating the need for model refinement and development. A detailed comparison of state-of-the-art CNN architectures for automated keratoconus identification reveals each model's benefits and weaknesses. This study shows that advanced deep learning models can enhance keratoconus diagnosis and treatment planning. Future research should explore hybrid models and integrate clinical parameters to improve diagnostic accuracy and robustness in real-world clinical applications, paving the way for more effective AI-driven ophthalmology tools.
Submitted 16 August, 2024;
originally announced August 2024.
-
Nanoscale Surfactant Transport: Bridging Molecular and Continuum Models
Authors:
Muhammad Rizwanur Rahman,
James P. Ewen,
Li Shen,
D. M. Heyes,
Daniele Dini,
E. R. Smith
Abstract:
Surfactant transport is central to a diverse range of natural phenomena and to many practical applications in physics and engineering. Surprisingly, this process remains relatively poorly understood at the molecular scale. This study investigates the mechanism behind the transport of surfactant monolayers on flat and curved liquid-vapor interfaces using nonequilibrium molecular dynamics simulations, which are compared with the continuum transport model. This approach not only provides fresh molecular-level insight into surfactant dynamics, but also confirms the nanoscale mechanism of the lateral migration of surfactant molecules along a thin film that continuously deforms as surfactants spread. By connecting the continuum model, where the long-wave approximations prevail, to the molecular details, where such approximations break down, we establish that the transport equation preserves substantial accuracy in capturing the underlying physics. Moreover, the relative importance of the different mechanisms of the transport process is identified. Consequently, we derive a novel, exact molecular equation for surfactant transport along a deforming surface. Finally, our findings demonstrate that the spreading of surfactants at the molecular scale adheres to expected scaling laws and aligns well with experimental observations.
Submitted 10 August, 2024;
originally announced August 2024.
-
mhGPT: A Lightweight Generative Pre-Trained Transformer for Mental Health Text Analysis
Authors:
Dae-young Kim,
Rebecca Hwa,
Muhammad Mahbubur Rahman
Abstract:
This paper introduces mhGPT, a lightweight generative pre-trained transformer trained on mental health-related social media and PubMed articles. Fine-tuned for specific mental health tasks, mhGPT was evaluated under limited hardware constraints and compared with state-of-the-art models like MentaLLaMA and Gemma. Despite having only 1.98 billion parameters and using just 5% of the dataset, mhGPT outperformed larger models and matched the performance of models trained on significantly more data. The key contributions include integrating diverse mental health data, creating a custom tokenizer, and optimizing a smaller architecture for low-resource settings. This research could advance AI-driven mental health care, especially in areas with limited computing power.
Submitted 15 August, 2024;
originally announced August 2024.
-
Unidirectional imaging with partially coherent light
Authors:
Guangdong Ma,
Che-Yung Shen,
Jingxi Li,
Luzhe Huang,
Cagatay Isil,
Fazil Onuralp Ardic,
Xilin Yang,
Yuhang Li,
Yuntian Wang,
Md Sadman Sakib Rahman,
Aydogan Ozcan
Abstract:
Unidirectional imagers form images of input objects only in one direction, e.g., from field-of-view (FOV) A to FOV B, while blocking the image formation in the reverse direction, from FOV B to FOV A. Here, we report unidirectional imaging under spatially partially coherent light and demonstrate high-quality imaging only in the forward direction (A->B) with high power efficiency while distorting the image formation in the backward direction (B->A) along with low power efficiency. Our reciprocal design features a set of spatially engineered linear diffractive layers that are statistically optimized for partially coherent illumination with a given phase correlation length. Our analyses reveal that when illuminated by a partially coherent beam with a correlation length of ~1.5 w or larger, where w is the wavelength of light, diffractive unidirectional imagers achieve robust performance, exhibiting asymmetric imaging performance between the forward and backward directions - as desired. A partially coherent unidirectional imager designed with a smaller correlation length of less than 1.5 w still supports unidirectional image transmission, but with a reduced figure of merit. These partially coherent diffractive unidirectional imagers are compact (axially spanning less than 75 w), polarization-independent, and compatible with various types of illumination sources, making them well-suited for applications in asymmetric visual information processing and communication.
Submitted 10 August, 2024;
originally announced August 2024.
-
The Impact of Environment Configurations on the Stability of AI-Enabled Systems
Authors:
Musfiqur Rahman,
SayedHassan Khatoonabadi,
Ahmad Abdellatif,
Haya Samaana,
Emad Shihab
Abstract:
Nowadays, software systems tend to include Artificial Intelligence (AI) components. Changes in the operational environment have been known to negatively impact the stability of AI-enabled software systems by causing unintended changes in behavior. However, how an environment configuration impacts the behavior of such systems has yet to be explored. Understanding and quantifying the degree of instability caused by different environment settings can help practitioners decide the best environment configuration for the most stable AI systems. To achieve this goal, we performed experiments with eight different combinations of three key environment variables (operating system, Python version, and CPU architecture) on 30 open-source AI-enabled systems using the Travis CI platform. We determine the existence and the degree of instability introduced by each configuration using three metrics: the output of an AI component of the system (model performance), the time required to build and run the system (processing time), and the cost associated with building and running the system (expense). Our results indicate that changes in environment configurations lead to instability across all three metrics; however, it is observed more frequently with respect to processing time and expense rather than model performance. For example, between Linux and MacOS, instability is observed in 23%, 96.67%, and 100% of the studied projects in model performance, processing time, and expense, respectively. Our findings underscore the importance of identifying the optimal combination of configuration settings to mitigate drops in model performance and reduce the processing time and expense before deploying an AI-enabled system.
Submitted 17 April, 2025; v1 submitted 5 August, 2024;
originally announced August 2024.
-
GraphAge: Unleashing the power of Graph Neural Network to Decode Epigenetic Aging
Authors:
Saleh Sakib Ahmed,
Nahian Shabab,
Md. Abul Hassan Samee,
M. Sohel Rahman
Abstract:
DNA methylation is a crucial epigenetic marker used in various clocks to predict epigenetic age. However, many existing clocks fail to account for crucial information about CpG sites and their interrelationships, such as co-methylation patterns. We present a novel approach to represent methylation data as a graph, using methylation values and relevant information about CpG sites as nodes, and relationships like co-methylation, same gene, and same chromosome as edges. We then use a Graph Neural Network (GNN) to predict age. Thus our model, GraphAge, leverages both structural and positional information for prediction as well as better interpretation. Although we had to train in a constrained compute setting, GraphAge still showed competitive performance with a Mean Absolute Error (MAE) of 3.207 and a Mean Squared Error (MSE) of 25.277, slightly outperforming the current state of the art. Perhaps more importantly, we utilized GNN explainer for interpretation purposes and were able to unearth interesting insights (e.g., key CpG sites, pathways, and their relationships through Methylation Regulated Networks in the context of aging), which were not possible to 'decode' without leveraging the unique capability of GraphAge to 'encode' various structural relationships. GraphAge has the potential to consume and utilize all relevant information (if available) about an individual that relates to the complex process of aging. So, in that sense, it is one of its kind and can be seen as the first benchmark for a multimodal model that can incorporate all this information in order to close the gap in our understanding of the true nature of aging.
Submitted 1 August, 2024;
originally announced August 2024.
-
Gemma 2: Improving Open Language Models at a Practical Size
Authors:
Gemma Team,
Morgane Riviere,
Shreya Pathak,
Pier Giuseppe Sessa,
Cassidy Hardin,
Surya Bhupatiraju,
Léonard Hussenot,
Thomas Mesnard,
Bobak Shahriari,
Alexandre Ramé,
Johan Ferret,
Peter Liu,
Pouya Tafti,
Abe Friesen,
Michelle Casbon,
Sabela Ramos,
Ravin Kumar,
Charline Le Lan,
Sammy Jerome,
Anton Tsitsulin,
Nino Vieillard,
Piotr Stanczyk,
Sertan Girgin,
Nikola Momchev,
Matt Hoffman
, et al. (173 additional authors not shown)
Abstract:
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
Submitted 2 October, 2024; v1 submitted 31 July, 2024;
originally announced August 2024.
-
Enhancing material property prediction with ensemble deep graph convolutional networks
Authors:
Chowdhury Mohammad Abid Rahman,
Ghadendra Bhandari,
Nasser M Nasrabadi,
Aldo H. Romero,
Prashnna K. Gyawali
Abstract:
Machine learning (ML) models have emerged as powerful tools for accelerating materials discovery and design by enabling accurate predictions of properties from compositional and structural data. These capabilities are vital for developing advanced technologies across fields such as energy, electronics, and biomedicine, potentially reducing the time and resources needed for new material exploration and promoting rapid innovation cycles. Recent efforts have focused on employing advanced ML algorithms, including deep learning-based graph neural networks, for property prediction. Additionally, ensemble models have proven to enhance the generalizability and robustness of ML and deep learning (DL) models. However, the use of such ensemble strategies in deep graph networks for material property prediction remains underexplored. Our research provides an in-depth evaluation of ensemble strategies in deep learning-based graph neural networks, specifically targeting material property prediction tasks. By testing the Crystal Graph Convolutional Neural Network (CGCNN) and its multitask version, MT-CGCNN, we demonstrated that ensemble techniques, especially prediction averaging, substantially improve precision beyond traditional metrics for key properties like formation energy per atom ($ΔE^{f}$), band gap ($E_{g}$) and density ($ρ$) in 33,990 stable inorganic materials. These findings support the broader application of ensemble methods to enhance predictive accuracy in the field.
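Note: prediction averaging, as named in the abstract above, can be illustrated with a minimal sketch. The function names and the toy numbers here are hypothetical and are not taken from the paper; the sketch only assumes that each ensemble member outputs one scalar prediction per sample.

```python
from statistics import fmean


def average_predictions(per_model_predictions):
    """Average scalar predictions across ensemble members, sample by sample.

    per_model_predictions: list of lists, one inner list per model
    (hypothetical layout chosen for this illustration).
    """
    if not per_model_predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(per_model_predictions[0])
    if any(len(p) != n_samples for p in per_model_predictions):
        raise ValueError("all models must predict the same samples")
    # zip(*...) groups the predictions that belong to the same sample.
    return [fmean(sample) for sample in zip(*per_model_predictions)]


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))
```

With toy inputs, averaging two models whose errors point in opposite directions gives a lower mean absolute error than either model alone, which is the general intuition behind this kind of ensemble.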
Submitted 26 July, 2024;
originally announced July 2024.
-
DefTesPY: Cyber defense model with enhanced data modeling and analysis for Tesla company via Python Language
Authors:
Naresh Kshetri,
Irin Sultana,
Mir Mehedi Rahman,
Darshana Shah
Abstract:
Several types of cyber-attacks on automobiles and business firms continue to rise even as we prepare to counter cybercrimes with several new technologies and defense models. Cyber defense (also, counter intelligence) is a computer network defense mechanism that involves response to activities, critical infrastructure protection, and information assurance for corporations, government bodies, and other conceivable networks. Cyber defense focuses on preventing, detecting, and responding to assaults or threats in a timely manner so that no infrastructure or information is compromised. With the increasing volume and complexity of cyber threats, most companies need cyber defense to protect sensitive information and assets. We can control attacker actions by utilizing firewalls at different levels, an intrusion detection system (IDS), and an intrusion prevention system (IPS), which can be installed independently or in combination with other protection approaches. Tesla is an American clean energy and automotive company in Austin, Texas, USA. The recent data breach at Tesla affected over 75,000 individuals, as the company pinpointed two former employees as the offenders who revealed more than 23,000 internal files from 2015 to 2022. In this work, we emphasize data modeling and data analysis using a cyber defense model and Python, with a survey of the Tesla company. We propose a defense model, DefTesPY, with enhanced data modeling and data analysis based on the cyber-attacks and cybercrimes encountered by Tesla to date.
Submitted 19 July, 2024;
originally announced July 2024.
-
CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications
Authors:
Mirza Masfiqur Rahman,
Imtiaz Karim,
Elisa Bertino
Abstract:
In recent years, there has been a growing focus on scrutinizing the security of cellular networks, often attributing security vulnerabilities to issues in the underlying protocol design descriptions. These protocol design specifications, typically extensive documents that are thousands of pages long, can harbor inaccuracies, underspecifications, implicit assumptions, and internal inconsistencies. In light of the evolving landscape, we introduce CellularLint--a semi-automatic framework for inconsistency detection within the standards of 4G and 5G, capitalizing on a suite of natural language processing techniques. Our proposed method uses a revamped few-shot learning mechanism on domain-adapted large language models. Pre-trained on a vast corpus of cellular network protocols, this method enables CellularLint to simultaneously detect inconsistencies at various levels of semantics and practical use cases. In doing so, CellularLint significantly advances the automated analysis of protocol specifications in a scalable fashion. In our investigation, we focused on the Non-Access Stratum (NAS) and the security specifications of 4G and 5G networks, ultimately uncovering 157 inconsistencies with 82.67% accuracy. After verification of these inconsistencies on open-source implementations and 17 commercial devices, we confirm that they indeed have a substantial impact on design decisions, potentially leading to concerns related to privacy, integrity, availability, and interoperability.
Submitted 18 July, 2024;
originally announced July 2024.
-
A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice
Authors:
Shaina Raza,
Mizanur Rahman,
Safiullah Kamawal,
Armin Toroghi,
Ananya Raval,
Farshad Navah,
Amirmohammad Kazemeini
Abstract:
Recommender Systems (RS) play an integral role in enhancing user experiences by providing personalized item suggestions. This survey reviews the progress in RS inclusively from 2017 to 2024, effectively connecting theoretical advances with practical applications. We explore the development from traditional RS techniques like content-based and collaborative filtering to advanced methods involving deep learning, graph-based models, reinforcement learning, and large language models. We also discuss specialized systems such as context-aware, review-based, and fairness-aware RS. The primary goal of this survey is to bridge theory with practice. It addresses challenges across various sectors, including e-commerce, healthcare, and finance, emphasizing the need for scalable, real-time, and trustworthy solutions. Through this survey, we promote stronger partnerships between academic research and industry practices. The insights offered by this survey aim to guide industry professionals in optimizing RS deployment and to inspire future research directions, especially in addressing emerging technological and societal trends. The survey resources are available in the public GitHub repository https://github.com/VectorInstitute/Recommender-Systems-Survey. (Recommender systems, large language models, chatgpt, responsible AI)
Submitted 22 February, 2025; v1 submitted 18 July, 2024;
originally announced July 2024.
-
Enhancing Depressive Post Detection in Bangla: A Comparative Study of TF-IDF, BERT and FastText Embeddings
Authors:
Saad Ahmed Sazan,
Mahdi H. Miraz,
A B M Muntasir Rahman
Abstract:
Due to the massive adoption of social media, detection of users' depression through social media analytics bears significant importance, particularly for underrepresented languages such as Bangla. This study introduces a well-grounded approach to identify depressive social media posts in Bangla by employing advanced natural language processing techniques. The dataset used in this work, annotated by domain experts, includes both depressive and non-depressive posts, ensuring high-quality data for model training and evaluation. To address the prevalent issue of class imbalance, we utilised random oversampling for the minority class, thereby enhancing the model's ability to accurately detect depressive posts. We explored various numerical representation techniques, including Term Frequency-Inverse Document Frequency (TF-IDF), Bidirectional Encoder Representations from Transformers (BERT) embedding and FastText embedding, by integrating them with a deep learning-based Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BiLSTM) model. The results obtained through extensive experimentation indicate that the BERT approach performed better than the others, achieving an F1-score of 84%. This indicates that BERT, in combination with the CNN-BiLSTM architecture, effectively recognises the nuances of Bangla texts relevant to depressive content. Comparative analysis with the existing state-of-the-art methods demonstrates that our approach with BERT embedding performs better than others in terms of evaluation metrics and the reliability of dataset annotations. Our research contributes significantly to the development of reliable tools for detecting depressive posts in the Bangla language. By highlighting the efficacy of different embedding techniques and deep learning models, this study paves the way for improved mental health monitoring through social media platforms.
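Note: random oversampling of the minority class, mentioned in the abstract above, can be sketched as follows. This is a generic illustration under our own assumptions (the function name, the seed parameter, and the list-based data layout are hypothetical), not the authors' preprocessing code.

```python
import random
from collections import defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    target = max(len(group) for group in by_label.values())

    out_samples, out_labels = [], []
    for label, group in by_label.items():
        # Draw with replacement from the existing samples of this class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

Oversampling is normally applied to the training split only, so that duplicated posts do not leak into the evaluation data.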
Submitted 12 July, 2024;
originally announced July 2024.
-
Missile detection and destruction robot using detection algorithm
Authors:
Md Kamrul Siam,
Shafayet Ahmed,
Md Habibur Rahman,
Amir Hossain Mollah
Abstract:
This research surveys present missile detection technologies and analyzes them to find a cost-effective solution for implementing such a system in Bangladesh. The paper gives an overview of missile detection technologies using the electro-optical sensor and the pulse Doppler radar. The system is made to detect the target missile. Automatic detection and destruction are performed with the help of ultrasonic sonar, a metal detector sensor, and a smoke detector sensor. The system is mainly based on an ultrasonic sonar sensor, which has a transducer, a transmitter, and a receiver. The transducer is connected to the controller. When it detects an object by following the algorithm, it finds its distance and angle. It can also determine whether the system can destroy the object by using a simulation of another algorithm.
Submitted 11 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
CAV-AD: A Robust Framework for Detection of Anomalous Data and Malicious Sensors in CAV Networks
Authors:
Md Sazedur Rahman,
Mohamed Elmahallawy,
Sanjay Madria,
Samuel Frimpong
Abstract:
The adoption of connected and automated vehicles (CAVs) has sparked considerable interest across diverse industries, including public transportation, underground mining, and agriculture sectors. However, CAVs' reliance on sensor readings makes them vulnerable to significant threats. Manipulating these readings can compromise CAV network security, posing serious risks for malicious activities. Although several anomaly detection (AD) approaches for CAV networks have been proposed, they often fail to: i) detect multiple anomalies in specific sensor(s) with high accuracy or F1 score, and ii) identify the specific sensor being attacked. In response, this paper proposes a novel framework tailored to CAV networks, called CAV-AD, for distinguishing abnormal readings amidst multiple anomaly data while identifying malicious sensors. Specifically, CAV-AD comprises two main components: i) a novel CNN model architecture called optimized omni-scale CNN (O-OS-CNN), which optimally selects the time scale by generating all possible kernel sizes for input time series data; ii) an amplification block to increase the values of anomaly readings, enhancing sensitivity for detecting anomalies. In addition, CAV-AD integrates the proposed O-OS-CNN with a Kalman filter to instantly identify the malicious sensors. We extensively train CAV-AD using real-world datasets containing both instant and constant attacks, evaluating its performance in detecting intrusions from multiple anomalies, which presents a more challenging scenario. Our results demonstrate that CAV-AD outperforms state-of-the-art methods, achieving an average accuracy of 98% and an average F1 score of 89%, while accurately identifying the malicious sensors.
Submitted 7 July, 2024;
originally announced July 2024.
-
Redefining POI Popularity: Integrating User Preferences and Recency for Enhanced Recommendations
Authors:
Alif Al Hasan,
Md. Musfique Anwar,
M. Arifur Rahman
Abstract:
The task of point-of-interest (POI) recommendation is to predict users' immediate future movements based on their previous records and present circumstances. Popularity is considered one of the primary deciding factors for selecting the next place to visit. Existing approaches have mainly focused on the number of check-ins to model the popularity of a POI. However, not enough attention is paid to the temporal impact or to the number of people who check in at a particular POI. Thus, to give more priority to recent check-ins, we propose a recency-oriented definition of a POI's popularity that considers the temporal effect of the POIs, the number of check-ins, and the number of users who registered those check-ins. Our experimental results on a real dataset show the efficacy of the proposed approach.
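Note: the abstract above names three ingredients for popularity (recency, number of check-ins, number of distinct users) but does not give the formula. The sketch below is one hypothetical way to combine them, using exponential decay with an assumed half-life and a logarithmic user factor; it should not be read as the authors' definition.

```python
import math


def recency_popularity(checkins, now, half_life_days=30.0):
    """Hypothetical recency-weighted popularity score for one POI.

    checkins: list of (user_id, day) pairs, where day is a timestamp in days.
    now: current time in days.
    half_life_days: assumed decay parameter; a check-in this many days old
    counts half as much as one made right now.
    """
    decay = math.log(2) / half_life_days
    # Each check-in contributes a weight that shrinks with its age.
    weighted_checkins = sum(math.exp(-decay * (now - day)) for _, day in checkins)
    distinct_users = len({user for user, _ in checkins})
    # log1p dampens the user factor so very large crowds do not dominate.
    return weighted_checkins * math.log1p(distinct_users)
```

Under this scoring, a POI with a handful of recent check-ins from several users can outrank one with many old check-ins from a single user.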
Submitted 21 January, 2025; v1 submitted 7 July, 2024;
originally announced July 2024.
-
Code Hallucination
Authors:
Mirza Masfiqur Rahman,
Ashish Kundu
Abstract:
Generative models such as large language models are extensively used as code copilots and for whole program generation. However, the programs they generate often have questionable correctness, authenticity and reliability in terms of integration as they might not follow the user requirements, provide incorrect and/or nonsensical outputs, or even contain semantic/syntactic errors - overall known as LLM hallucination. In this work, we present several types of code hallucination. We have generated such hallucinated code manually using large language models. We also present a technique, HallTrigger, to demonstrate efficient ways of generating arbitrary code hallucination. Our method leverages 3 different dynamic attributes of LLMs to craft prompts that can successfully trigger hallucinations from models without the need to access model architecture or parameters. Results from popular blackbox models suggest that HallTrigger is indeed effective and that pervasive LLM hallucinations have a considerable impact on software development.
Submitted 7 August, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Authors:
Md Tahmid Rahman Laskar,
Sawsan Alqahtani,
M Saiful Bari,
Mizanur Rahman,
Mohammad Abdullah Matin Khan,
Haidar Khan,
Israt Jahan,
Amran Bhuiyan,
Chee Wei Tan,
Md Rizwan Parvez,
Enamul Hoque,
Shafiq Joty,
Jimmy Huang
Abstract:
Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them in real-world applications to ensure they produce reliable performance. Despite the well-established importance of evaluating LLMs in the community, the complexity of the evaluation process has led to varied evaluation setups, causing inconsistencies in findings and interpretations. To address this, we systematically review the primary challenges and limitations causing these inconsistencies and unreliable evaluations in various steps of LLM evaluation. Based on our critical review, we present our perspectives and recommendations to ensure LLM evaluations are reproducible, reliable, and robust.
Submitted 3 October, 2024; v1 submitted 4 July, 2024;
originally announced July 2024.
-
Celeb-FBI: A Benchmark Dataset on Human Full Body Images and Age, Gender, Height and Weight Estimation using Deep Learning Approach
Authors:
Pronay Debnath,
Usafa Akther Rifa,
Busra Kamal Rafa,
Ali Haider Talukder Akib,
Md. Aminur Rahman
Abstract:
The scarcity of comprehensive datasets in surveillance, identification, image retrieval systems, and healthcare poses a significant challenge for researchers in exploring new methodologies and advancing knowledge in these respective fields. Furthermore, the need for full-body image datasets with detailed attributes like height, weight, age, and gender is particularly significant in areas such as fashion industry analytics, ergonomic design assessment, virtual reality avatar creation, and sports performance analysis. To address this gap, we have created the 'Celeb-FBI' dataset which contains 7,211 full-body images of individuals accompanied by detailed information on their height, age, weight, and gender. Following the dataset creation, we proceed with the preprocessing stages, including image cleaning, scaling, and the application of Synthetic Minority Oversampling Technique (SMOTE). Subsequently, utilizing this prepared dataset, we employed three deep learning approaches: Convolutional Neural Network (CNN), 50-layer ResNet, and 16-layer VGG, which are used for estimating height, weight, age, and gender from human full-body images. From the results obtained, ResNet-50 performed best for the system with an accuracy rate of 79.18% for age, 95.43% for gender, 85.60% for height and 81.91% for weight.
Submitted 3 July, 2024;
originally announced July 2024.
-
Potential Renovation of Information Search Process with the Power of Large Language Model for Healthcare
Authors:
Forhan Bin Emdad,
Mohammad Ishtiaque Rahman
Abstract:
This paper explores the development of the Six Stages of Information Search Model and its enhancement through the application of the Large Language Model (LLM) powered Information Search Processes (ISP) in healthcare. The Six Stages Model, a foundational framework in information science, outlines the sequential phases individuals undergo during information seeking: initiation, selection, exploration, formulation, collection, and presentation. Integrating LLM technology into this model significantly optimizes each stage, particularly in healthcare. LLMs enhance query interpretation, streamline information retrieval from complex medical databases, and provide contextually relevant responses, thereby improving the efficiency and accuracy of medical information searches. This fusion not only aids healthcare professionals in accessing critical data swiftly but also empowers patients with reliable and personalized health information, fostering a more informed and effective healthcare environment.
Submitted 29 June, 2024;
originally announced July 2024.
-
A Hardware/Firmware-Based Switching Gate Multiplexing Method for Pulse Mode Radiation Detectors
Authors:
Md Faisal Rahman,
John Mattingly
Abstract:
We present a hardware/firmware-based switching gate multiplexing method for pulse mode radiation detectors that can combine many detector signals into two readout channels. One readout channel passes the signal of the multiplexed detector that "fired" first, and the other channel provides a variable-width logic pulse, i.e., a pulse width modulation (PWM) signal, that identifies the active detector. The multiplexed output pulse is produced by passing the first active detector's signal to a fan-in circuit by gating on the corresponding channel for a fixed duration while blocking all other detector signals. It does this using individual analog switches for all the detector signals. Each switch is controlled by a fixed width logic pulse that is triggered by the arrival of the first active detector pulse. Both the fixed width logic pulse and the PWM signal are generated using a field-programmable gate array (FPGA). To demonstrate the proposed multiplexing method, a prototype four-channel multiplexer was developed for use with four NaI(Tl) detectors. The performance of the multiplexer was evaluated in terms of its ability to retain energy resolution, timing resolution, and original pulse shape. The proposed multiplexing method showed very little degradation in energy resolution and timing resolution or alteration of pulse shape. The switching gate feature of the proposed method enables the multiplexer output to have very low noise contribution from the inactive channels. This multiplexing technique also has the unique capability of isolating and recovering the first active detector's output pulse in cases where there is overlap between pulses from different detectors in a single digitized record. These features make the proposed hardware/firmware-based switching gate multiplexing method very promising for application to large radiation detector networks.
Submitted 24 June, 2024;
originally announced June 2024.
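A minimal Python sketch of the readout logic described in the abstract above: the detector that fires first is passed through, and a pulse width identifies which detector was active. The linear width encoding, the constants BASE_WIDTH_NS and STEP_NS, and all function names are illustrative assumptions, not details taken from the paper.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical encoding: channel k is tagged with a logic pulse of
# BASE_WIDTH_NS + k * STEP_NS. The abstract only says the width is variable.
BASE_WIDTH_NS = 100.0
STEP_NS = 100.0


def first_fired(arrival_times_ns: Sequence[Optional[float]]) -> Optional[int]:
    """Return the index of the detector whose pulse arrived first.

    None entries mark detectors that did not fire. Ties go to the lower index.
    """
    fired = [(t, i) for i, t in enumerate(arrival_times_ns) if t is not None]
    if not fired:
        return None
    return min(fired)[1]


def encode_pwm_width(channel: int) -> float:
    """Map a channel index to its (assumed) identifying pulse width in ns."""
    return BASE_WIDTH_NS + channel * STEP_NS


def decode_pwm_width(width_ns: float, n_channels: int) -> int:
    """Recover the channel index from a measured pulse width."""
    channel = round((width_ns - BASE_WIDTH_NS) / STEP_NS)
    if not 0 <= channel < n_channels:
        raise ValueError(f"width {width_ns} ns does not match any channel")
    return channel


def multiplex(arrival_times_ns: Sequence[Optional[float]]) -> Optional[Tuple[int, float]]:
    """Gate on the first active channel and emit its identifying width.

    All other channels are treated as blocked for the gate duration.
    """
    channel = first_fired(arrival_times_ns)
    if channel is None:
        return None
    return channel, encode_pwm_width(channel)
```

With four channels, as in the prototype, calling multiplex on one arrival time per detector selects the earliest channel, and decode_pwm_width recovers that channel on the readout side, tolerating small width measurement errors.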
-
MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception
Authors:
M. Mahbubur Rahman,
Ryoma Yataka,
Sorachi Kato,
Pu Perry Wang,
Peizhao Li,
Adriano Cardace,
Petros Boufounos
Abstract:
Compared with the extensive list of automotive radar datasets that support autonomous driving, indoor radar datasets are scarce, smaller in scale, mostly in the format of low-resolution radar point clouds, and usually collected in an open-space single-room setting. In this paper, we scale up indoor radar data collection using multi-view high-resolution radar heatmaps in a multi-day, multi-room, and multi-subject setting, with an emphasis on the diversity of environments and subjects. Referred to as the millimeter-wave multi-view radar (MMVR) dataset, it consists of $345$K multi-view radar frames collected from $25$ human subjects over $6$ different rooms, $446$K annotated bounding boxes/segmentation instances, and $7.59$ million annotated keypoints to support three major perception tasks: object detection, pose estimation, and instance segmentation. For each task, we report performance benchmarks under two protocols (a single subject in an open space, and multiple subjects in several cluttered rooms) and two data splits (a random split and a cross-environment split) over $395$ 1-min data segments. We anticipate that MMVR will facilitate indoor radar perception development for indoor vehicle (robot/humanoid) navigation, building energy management, and elderly care, improving efficiency, user experience, and safety. The MMVR dataset is available at https://doi.org/10.5281/zenodo.12611978.
Submitted 17 July, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.