-
A Framework for Generating Conversational Recommendation Datasets from Behavioral Interactions
Authors:
Vinaik Chhetri,
Yousaf Reza,
Moghis Fereidouni,
Srijata Maji,
Umar Farooq,
AB Siddique
Abstract:
Modern recommendation systems typically follow two complementary paradigms: collaborative filtering, which models long-term user preferences from historical interactions, and conversational recommendation systems (CRS), which interact with users in natural language to uncover immediate needs. Each captures a different dimension of user intent. While CRS models lack collaborative signals, leading t…
▽ More
Modern recommendation systems typically follow two complementary paradigms: collaborative filtering, which models long-term user preferences from historical interactions, and conversational recommendation systems (CRS), which interact with users in natural language to uncover immediate needs. Each captures a different dimension of user intent. While CRS models lack collaborative signals, leading to generic or poorly personalized suggestions, traditional recommenders lack mechanisms to interactively elicit immediate needs. Unifying these paradigms promises richer personalization but remains challenging due to the lack of large-scale conversational datasets grounded in real user behavior. We present ConvRecStudio, a framework that uses large language models (LLMs) to simulate realistic, multi-turn dialogs grounded in timestamped user-item interactions and reviews. ConvRecStudio follows a three-stage pipeline: (1) Temporal Profiling, which constructs user profiles and community-level item sentiment trajectories over fine-grained aspects; (2) Semantic Dialog Planning, which generates a structured plan using a DAG of flexible super-nodes; and (3) Multi-Turn Simulation, which instantiates the plan using paired LLM agents for the user and system, constrained by executional and behavioral fidelity checks. We apply ConvRecStudio to three domains -- MobileRec, Yelp, and Amazon Electronics -- producing over 12K multi-turn dialogs per dataset. Human and automatic evaluations confirm the naturalness, coherence, and behavioral grounding of the generated conversations. To demonstrate utility, we build a cross-attention transformer model that jointly encodes user history and dialog context, achieving gains in Hit@K and NDCG@K over baselines using either signal alone or naive fusion. Notably, our model achieves a 10.9% improvement in Hit@1 on Yelp over the strongest baseline.
△ Less
Submitted 14 June, 2025;
originally announced June 2025.
-
INTERPOS: Interaction Rhythm Guided Positional Morphing for Mobile App Recommender Systems
Authors:
M. H. Maqbool,
Moghis Fereidouni,
Umar Farooq,
A. B. Siddique,
Hassan Foroosh
Abstract:
The mobile app market has expanded exponentially, offering millions of apps with diverse functionalities, yet research in mobile app recommendation remains limited. Traditional sequential recommender systems utilize the order of items in users' historical interactions to predict the next item for the users. Position embeddings, well-established in transformer-based architectures for natural langua…
▽ More
The mobile app market has expanded exponentially, offering millions of apps with diverse functionalities, yet research in mobile app recommendation remains limited. Traditional sequential recommender systems utilize the order of items in users' historical interactions to predict the next item for the users. Position embeddings, well-established in transformer-based architectures for natural language processing tasks, effectively distinguish token positions in sequences. In sequential recommendation systems, position embeddings can capture the order of items in a user's historical interaction sequence. Nevertheless, this ordering does not consider the time elapsed between two interactions of the same user (e.g., 1 day, 1 week, 1 month), referred to as "user rhythm". In mobile app recommendation datasets, the time between consecutive user interactions is notably longer compared to other domains like movies, posing significant challenges for sequential recommender systems. To address this phenomenon in the mobile app domain, we introduce INTERPOS, an Interaction Rhythm Guided Positional Morphing strategy for autoregressive mobile app recommender systems. INTERPOS incorporates rhythm-guided position embeddings, providing a more comprehensive representation that considers both the sequential order of interactions and the temporal gaps between them. This approach enables a deep understanding of users' rhythms at a fine-grained level, capturing the intricacies of their interaction patterns over time. We propose three strategies to incorporate the morphed positional embeddings in two transformer-based sequential recommendation system architectures. Our extensive evaluations show that INTERPOS outperforms state-of-the-art models using 7 mobile app recommendation datasets on NDCG@K and HIT@K metrics. The source code of INTERPOS is available at https://github.com/dlgrad/INTERPOS.
△ Less
Submitted 14 June, 2025;
originally announced June 2025.
-
What Users Value and Critique: Large-Scale Analysis of User Feedback on AI-Powered Mobile Apps
Authors:
Vinaik Chhetri,
Krishna Upadhyay,
A. B. Siddique,
Umar Farooq
Abstract:
Artificial Intelligence (AI)-powered features have rapidly proliferated across mobile apps in various domains, including productivity, education, entertainment, and creativity. However, how users perceive, evaluate, and critique these AI features remains largely unexplored, primarily due to the overwhelming volume of user feedback. In this work, we present the first comprehensive, large-scale stud…
▽ More
Artificial Intelligence (AI)-powered features have rapidly proliferated across mobile apps in various domains, including productivity, education, entertainment, and creativity. However, how users perceive, evaluate, and critique these AI features remains largely unexplored, primarily due to the overwhelming volume of user feedback. In this work, we present the first comprehensive, large-scale study of user feedback on AI-powered mobile apps, leveraging a curated dataset of 292 AI-driven apps across 14 categories with 894K AI-specific reviews from Google Play. We develop and validate a multi-stage analysis pipeline that begins with a human-labeled benchmark and systematically evaluates large language models (LLMs) and prompting strategies. Each stage, including review classification, aspect-sentiment extraction, and clustering, is validated for accuracy and consistency. Our pipeline enables scalable, high-precision analysis of user feedback, extracting over one million aspect-sentiment pairs clustered into 18 positive and 15 negative user topics. Our analysis reveals that users consistently focus on a narrow set of themes: positive comments emphasize productivity, reliability, and personalized assistance, while negative feedback highlights technical failures (e.g., scanning and recognition), pricing concerns, and limitations in language support. Our pipeline surfaces both satisfaction with one feature and frustration with another within the same review. These fine-grained, co-occurring sentiments are often missed by traditional approaches that treat positive and negative feedback in isolation or rely on coarse-grained analysis. To this end, our approach provides a more faithful reflection of the real-world user experiences with AI-powered apps. Category-aware analysis further uncovers both universal drivers of satisfaction and domain-specific frustrations.
△ Less
Submitted 12 June, 2025;
originally announced June 2025.
-
Assessing and Enhancing Quantum Readiness in Mobile Apps
Authors:
Joseph Strauss,
Krishna Upadhyay,
A. B. Siddique,
Ibrahim Baggili,
Umar Farooq
Abstract:
Quantum computers threaten widely deployed cryptographic primitives such as RSA, DSA, and ECC. While NIST has released post-quantum cryptographic (PQC) standards (e.g., Kyber, Dilithium), mobile app ecosystems remain largely unprepared for this transition. We present a large-scale binary analysis of over 4,000 Android apps to assess cryptographic readiness. Our results show widespread reliance on…
▽ More
Quantum computers threaten widely deployed cryptographic primitives such as RSA, DSA, and ECC. While NIST has released post-quantum cryptographic (PQC) standards (e.g., Kyber, Dilithium), mobile app ecosystems remain largely unprepared for this transition. We present a large-scale binary analysis of over 4,000 Android apps to assess cryptographic readiness. Our results show widespread reliance on quantum-vulnerable algorithms such as MD5, SHA-1, and RSA, while PQC adoption remains absent in production apps. To bridge the readiness gap, we explore LLM-assisted migration. We evaluate leading LLMs (GPT-4o, Gemini Flash, Claude Sonnet, etc.) for automated cryptographic migration. All models successfully performed simple hash replacements (e.g., SHA-1 to SHA-256). However, none produced correct PQC upgrades due to multi-file changes, missing imports, and lack of context awareness. These results underscore the need for structured guidance and system-aware tooling for post-quantum migration
△ Less
Submitted 31 May, 2025;
originally announced June 2025.
-
OptiGait-LGBM: An Efficient Approach of Gait-based Person Re-identification in Non-Overlapping Regions
Authors:
Md. Sakib Hassan Chowdhury,
Md. Hafiz Ahamed,
Bishowjit Paul,
Sarafat Hussain Abhi,
Abu Bakar Siddique,
Md. Robius Sany
Abstract:
Gait recognition, known for its ability to identify individuals from a distance, has gained significant attention in recent times due to its non-intrusive verification. While video-based gait identification systems perform well on large public datasets, their performance drops when applied to real-world, unconstrained gait data due to various factors. Among these, uncontrolled outdoor environments…
▽ More
Gait recognition, known for its ability to identify individuals from a distance, has gained significant attention in recent times due to its non-intrusive verification. While video-based gait identification systems perform well on large public datasets, their performance drops when applied to real-world, unconstrained gait data due to various factors. Among these, uncontrolled outdoor environments, non-overlapping camera views, varying illumination, and computational efficiency are core challenges in gait-based authentication. Currently, no dataset addresses all these challenges simultaneously. In this paper, we propose an OptiGait-LGBM model capable of recognizing person re-identification under these constraints using a skeletal model approach, which helps mitigate inconsistencies in a person's appearance. The model constructs a dataset from landmark positions, minimizing memory usage by using non-sequential data. A benchmark dataset, RUET-GAIT, is introduced to represent uncontrolled gait sequences in complex outdoor environments. The process involves extracting skeletal joint landmarks, generating numerical datasets, and developing an OptiGait-LGBM gait classification model. Our aim is to address the aforementioned challenges with minimal computational cost compared to existing methods. A comparative analysis with ensemble techniques such as Random Forest and CatBoost demonstrates that the proposed approach outperforms them in terms of accuracy, memory usage, and training time. This method provides a novel, low-cost, and memory-efficient video-based gait recognition solution for real-world scenarios.
△ Less
Submitted 10 May, 2025;
originally announced May 2025.
-
ApproXAI: Energy-Efficient Hardware Acceleration of Explainable AI using Approximate Computing
Authors:
Ayesha Siddique,
Khurram Khalil,
Khaza Anuarul Hoque
Abstract:
Explainable artificial intelligence (XAI) enhances AI system transparency by framing interpretability as an optimization problem. However, this approach often necessitates numerous iterations of computationally intensive operations, limiting its applicability in real-time scenarios. While recent research has focused on XAI hardware acceleration on FPGAs and TPU, these methods do not fully address…
▽ More
Explainable artificial intelligence (XAI) enhances AI system transparency by framing interpretability as an optimization problem. However, this approach often necessitates numerous iterations of computationally intensive operations, limiting its applicability in real-time scenarios. While recent research has focused on XAI hardware acceleration on FPGAs and TPU, these methods do not fully address energy efficiency in real-time settings. To address this limitation, we propose XAIedge, a novel framework that leverages approximate computing techniques into XAI algorithms, including integrated gradients, model distillation, and Shapley analysis. XAIedge translates these algorithms into approximate matrix computations and exploits the synergy between convolution, Fourier transform, and approximate computing paradigms. This approach enables efficient hardware acceleration on TPU-based edge devices, facilitating faster real-time outcome interpretations. Our comprehensive evaluation demonstrates that XAIedge achieves a $2\times$ improvement in energy efficiency compared to existing accurate XAI hardware acceleration techniques while maintaining comparable accuracy. These results highlight the potential of XAIedge to significantly advance the deployment of explainable AI in energy-constrained real-time applications.
△ Less
Submitted 12 May, 2025; v1 submitted 24 April, 2025;
originally announced April 2025.
-
Explainable AI-Guided Efficient Approximate DNN Generation for Multi-Pod Systolic Arrays
Authors:
Ayesha Siddique,
Khurram Khalil,
Khaza Anuarul Hoque
Abstract:
Approximate deep neural networks (AxDNNs) are promising for enhancing energy efficiency in real-world devices. One of the key contributors behind this enhanced energy efficiency in AxDNNs is the use of approximate multipliers. Unfortunately, the simulation of approximate multipliers does not usually scale well on CPUs and GPUs. As a consequence, this slows down the overall simulation of AxDNNs aim…
▽ More
Approximate deep neural networks (AxDNNs) are promising for enhancing energy efficiency in real-world devices. One of the key contributors behind this enhanced energy efficiency in AxDNNs is the use of approximate multipliers. Unfortunately, the simulation of approximate multipliers does not usually scale well on CPUs and GPUs. As a consequence, this slows down the overall simulation of AxDNNs aimed at identifying the appropriate approximate multipliers to achieve high energy efficiency with a minimum accuracy loss. To address this problem, we present a novel XAI-Gen methodology, which leverages the analytical model of the emerging hardware accelerator (e.g., Google TPU v4) and explainable artificial intelligence (XAI) to precisely identify the non-critical layers for approximation and quickly discover the appropriate approximate multipliers for AxDNN layers. Our results show that XAI-Gen achieves up to 7x lower energy consumption with only 1-2% accuracy loss. We also showcase the effectiveness of the XAI-Gen approach through a neural architecture search (XAI-NAS) case study. Interestingly, XAI-NAS achieves 40\% higher energy efficiency with up to 5x less execution time when compared to the state-of-the-art NAS methods for generating AxDNNs.
△ Less
Submitted 20 March, 2025;
originally announced March 2025.
-
UStyle: Waterbody Style Transfer of Underwater Scenes by Depth-Guided Feature Synthesis
Authors:
Md Abu Bakr Siddique,
Vaishnav Ramesh,
Junliang Liu,
Piyush Singh,
Md Jahidul Islam
Abstract:
The concept of waterbody style transfer remains largely unexplored in the underwater imaging and vision literature. Traditional image style transfer (STx) methods primarily focus on artistic and photorealistic blending, often failing to preserve object and scene geometry in images captured in high-scattering mediums such as underwater. The wavelength-dependent nonlinear attenuation and depth-depen…
▽ More
The concept of waterbody style transfer remains largely unexplored in the underwater imaging and vision literature. Traditional image style transfer (STx) methods primarily focus on artistic and photorealistic blending, often failing to preserve object and scene geometry in images captured in high-scattering mediums such as underwater. The wavelength-dependent nonlinear attenuation and depth-dependent backscattering artifacts further complicate learning underwater image STx from unpaired data. This paper introduces UStyle, the first data-driven learning framework for transferring waterbody styles across underwater images without requiring prior reference images or scene information. We propose a novel depth-aware whitening and coloring transform (DA-WCT) mechanism that integrates physics-based waterbody synthesis to ensure perceptually consistent stylization while preserving scene structure. To enhance style transfer quality, we incorporate carefully designed loss functions that guide UStyle to maintain colorfulness, lightness, structural integrity, and frequency-domain characteristics, as well as high-level content in VGG and CLIP (contrastive language-image pretraining) feature spaces. By addressing domain-specific challenges, UStyle provides a robust framework for no-reference underwater image STx, surpassing state-of-the-art (SOTA) methods that rely solely on end-to-end reconstruction loss. Furthermore, we introduce the UF7D dataset, a curated collection of high-resolution underwater images spanning seven distinct waterbody styles, establishing a benchmark to support future research in underwater image STx. The UStyle inference pipeline and UF7D dataset are released at: https://github.com/uf-robopi/UStyle.
△ Less
Submitted 7 June, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
Evaluating and Enhancing Out-of-Domain Generalization of Task-Oriented Dialog Systems for Task Completion without Turn-level Dialog Annotations
Authors:
Adib Mosharrof,
Moghis Fereidouni,
A. B. Siddique
Abstract:
Traditional task-oriented dialog (ToD) systems rely heavily on labor-intensive turn-level annotations, such as dialogue states and policy labels, for training. This work explores whether large language models (LLMs) can be fine-tuned solely on natural language dialogs to perform ToD tasks, without requiring such annotations. We evaluate their ability to generalize to unseen domains and compare the…
▽ More
Traditional task-oriented dialog (ToD) systems rely heavily on labor-intensive turn-level annotations, such as dialogue states and policy labels, for training. This work explores whether large language models (LLMs) can be fine-tuned solely on natural language dialogs to perform ToD tasks, without requiring such annotations. We evaluate their ability to generalize to unseen domains and compare their performance with models trained on fully annotated data. Through extensive experiments with three open-source LLMs of varying sizes and two diverse ToD datasets, we find that models fine-tuned without turn-level annotations generate coherent and contextually appropriate responses. However, their task completion performance - measured by accurate execution of API calls - remains suboptimal, with the best models achieving only around 53% success in unseen domains. To improve task completion, we propose ZeroToD, a framework that incorporates a schema augmentation mechanism to enhance API call accuracy and overall task completion rates, particularly in out-of-domain settings. We also compare ZeroToD with fine-tuning-free alternatives, such as prompting off-the-shelf LLMs, and find that our framework enables smaller, fine-tuned models that outperform large-scale proprietary LLMs in task completion. Additionally, a human study evaluating informativeness, fluency, and task completion confirms our empirical findings. These findings suggest the feasibility of developing cost-effective, scalable, and zero-shot generalizable ToD systems for real-world applications.
△ Less
Submitted 18 February, 2025;
originally announced February 2025.
-
Improving Multi-turn Task Completion in Task-Oriented Dialog Systems via Prompt Chaining and Fine-Grained Feedback
Authors:
Moghis Fereidouni,
Md Sajid Ahmed,
Adib Mosharrof,
A. B. Siddique
Abstract:
Task-oriented dialog (TOD) systems facilitate users in accomplishing complex, multi-turn tasks through natural language. While traditional approaches rely on extensive fine-tuning and annotated data for each domain, instruction-tuned large language models (LLMs) offer a more flexible alternative. However, LLMs struggle to reliably handle multi-turn task completion, particularly with accurately gen…
▽ More
Task-oriented dialog (TOD) systems facilitate users in accomplishing complex, multi-turn tasks through natural language. While traditional approaches rely on extensive fine-tuning and annotated data for each domain, instruction-tuned large language models (LLMs) offer a more flexible alternative. However, LLMs struggle to reliably handle multi-turn task completion, particularly with accurately generating API calls and adapting to new domains without explicit demonstrations. To address these challenges, we propose RealTOD, a novel framework that enhances TOD systems through prompt chaining and fine-grained feedback mechanisms. Prompt chaining enables zero-shot domain adaptation via a two-stage prompting strategy, eliminating the need for human-curated demonstrations. Meanwhile, the fine-grained feedback mechanism improves task completion by verifying API calls against domain schemas and providing precise corrective feedback when errors are detected. We conduct extensive experiments on the SGD and BiTOD benchmarks using four LLMs. RealTOD improves API accuracy, surpassing AutoTOD by 37.74% on SGD and SimpleTOD by 11.26% on BiTOD. Human evaluations further confirm that LLMs integrated with RealTOD achieve superior task completion, fluency, and informativeness compared to existing methods.
△ Less
Submitted 18 February, 2025;
originally announced February 2025.
-
Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution
Authors:
Muhammad Umair Haider,
Hammad Rizwan,
Hassan Sajjad,
Peizhong Ju,
A. B. Siddique
Abstract:
Interpreting the internal mechanisms of large language models (LLMs) is crucial for improving their trustworthiness and utility. Prior work has primarily focused on mapping individual neurons to discrete semantic concepts. However, such mappings struggle to handle the inherent polysemanticity in LLMs, where individual neurons encode multiple, distinct concepts. Through a comprehensive analysis of…
▽ More
Interpreting the internal mechanisms of large language models (LLMs) is crucial for improving their trustworthiness and utility. Prior work has primarily focused on mapping individual neurons to discrete semantic concepts. However, such mappings struggle to handle the inherent polysemanticity in LLMs, where individual neurons encode multiple, distinct concepts. Through a comprehensive analysis of both encoder and decoder-based LLMs across diverse datasets, we observe that even highly salient neurons, identified via various attribution techniques for specific semantic concepts, consistently exhibit polysemantic behavior. Importantly, activation magnitudes for fine-grained concepts follow distinct, often Gaussian-like distributions with minimal overlap. This observation motivates a shift from neuron attribution to range-based interpretation. We hypothesize that interpreting and manipulating neuron activation ranges would enable more precise interpretability and targeted interventions in LLMs. To validate our hypothesis, we introduce NeuronLens, a novel range-based interpretation and manipulation framework that provides a finer view of neuron activation distributions to localize concept attribution within a neuron. Extensive empirical evaluations demonstrate that NeuronLens significantly reduces unintended interference, while maintaining precise manipulation of targeted concepts, outperforming neuron attribution.
△ Less
Submitted 20 May, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
-
Analyzing the Evolution and Maintenance of Quantum Software Repositories
Authors:
Krishna Upadhyay,
Vinaik Chhetri,
A. B. Siddique,
Umar Farooq
Abstract:
Quantum computing is rapidly advancing, but quantum software development faces significant challenges, including a steep learning curve, high hardware error rates, and a lack of mature engineering practices. This study conducts a large-scale mining analysis of over 21,000 GitHub repositories, containing 1.2 million commits from more than 10,000 developers, to examine the evolution and maintenance…
▽ More
Quantum computing is rapidly advancing, but quantum software development faces significant challenges, including a steep learning curve, high hardware error rates, and a lack of mature engineering practices. This study conducts a large-scale mining analysis of over 21,000 GitHub repositories, containing 1.2 million commits from more than 10,000 developers, to examine the evolution and maintenance of quantum software. We analyze repository growth, programming language and framework adoption, and contributor trends, revealing a 200% increase in repositories and a 150% rise in contributors since 2017. Additionally, we investigate software development and maintenance practices, showing that perfective commits dominate (51.76%), while the low occurrence of corrective commits (18.54%) indicates potential gaps in bug resolution. Furthermore, 34% of reported issues are quantum-specific, highlighting the need for specialized debugging tools beyond conventional software engineering approaches. This study provides empirical insights into the software engineering challenges of quantum computing, offering recommendations to improve development workflows, tooling, and documentation. We are also open-sourcing our dataset to support further analysis by the community and to guide future research and tool development for quantum computing. The dataset is available at: https://github.com/kriss-u/QRepoAnalysis-Paper
△ Less
Submitted 5 June, 2025; v1 submitted 12 January, 2025;
originally announced January 2025.
-
Biomolecular Analysis of Soil Samples and Rock Imagery for Tracing Evidence of Life Using a Mobile Robot
Authors:
Shah Md Ahasan Siddique,
Ragib Tahshin Rinath,
Shakil Mosharrof,
Syed Tanjib Mahmud,
Sakib Ahmed
Abstract:
The search for evidence of past life on Mars presents a tremendous challenge that requires the usage of very advanced robotic technologies to overcome it. Current digital microscopic imagers and spectrometers used for astrobiological examination suffer from limitations such as insufficient resolution, narrow detection range, and lack of portability. To overcome these challenges, this research stud…
▽ More
The search for evidence of past life on Mars presents a tremendous challenge that requires the usage of very advanced robotic technologies to overcome it. Current digital microscopic imagers and spectrometers used for astrobiological examination suffer from limitations such as insufficient resolution, narrow detection range, and lack of portability. To overcome these challenges, this research study presents modifications to the Phoenix rover to expand its capability for detecting biosignatures on Mars. This paper examines the modifications implemented on the Phoenix rover to enhance its capability to detect a broader spectrum of biosignatures. One of the notable improvements comprises the integration of advanced digital microscopic imagers and spectrometers, enabling high-resolution examination of soil samples. Additionally, the mechanical components of the device have been reinforced to enhance maneuverability and optimize subsurface sampling capabilities. Empirical investigations have demonstrated that Phoenix has the capability to navigate diverse geological environments and procure samples for the purpose of biomolecular analysis. The biomolecular instrumentation and hybrid analytical methods showcased in this study demonstrate considerable potential for future astrobiology missions on Mars. The potential for enhancing the system lies in the possibility of broadening the range of detectable biomarkers and biosignatures.
△ Less
Submitted 27 November, 2024;
originally announced November 2024.
-
Enhancing Object Detection Accuracy in Autonomous Vehicles Using Synthetic Data
Authors:
Sergei Voronin,
Abubakar Siddique,
Muhammad Iqbal
Abstract:
The rapid progress in machine learning models has significantly boosted the potential for real-world applications such as autonomous vehicles, disease diagnoses, and recognition of emergencies. The performance of many machine learning models depends on the nature and size of the training data sets. These models often face challenges due to the scarcity, noise, and imbalance in real-world data, lim…
▽ More
The rapid progress in machine learning models has significantly boosted the potential for real-world applications such as autonomous vehicles, disease diagnoses, and recognition of emergencies. The performance of many machine learning models depends on the nature and size of the training data sets. These models often face challenges due to the scarcity, noise, and imbalance in real-world data, limiting their performance. Nonetheless, high-quality, diverse, relevant and representative training data is essential to build accurate and reliable machine learning models that adapt well to real-world scenarios.
It is hypothesised that well-designed synthetic data can improve the performance of a machine learning algorithm. This work aims to create a synthetic dataset and evaluate its effectiveness to improve the prediction accuracy of object detection systems. This work considers autonomous vehicle scenarios as an illustrative example to show the efficacy of synthetic data. The effectiveness of these synthetic datasets in improving the performance of state-of-the-art object detection models is explored. The findings demonstrate that incorporating synthetic data improves model performance across all performance matrices.
Two deep learning systems, System-1 (trained on real-world data) and System-2 (trained on a combination of real and synthetic data), are evaluated using the state-of-the-art YOLO model across multiple metrics, including accuracy, precision, recall, and mean average precision. Experimental results revealed that System-2 outperformed System-1, showing a 3% improvement in accuracy, along with superior performance in all other metrics.
△ Less
Submitted 23 November, 2024;
originally announced November 2024.
-
UAV survey coverage path planning of complex regions containing exclusion zones
Authors:
Shadman Tajwar Shahid,
Shah Md. Ahasan Siddique,
Md. Mahidul Alam
Abstract:
This article addresses the challenge of UAV survey coverage path planning for areas that are complex concave polygons, containing exclusion zones or obstacles. While standard drone path planners typically generate coverage paths for simple convex polygons, this study proposes a method to manage more intricate regions, including boundary splits, merges, and interior holes. To achieve this, polygona…
▽ More
This article addresses the challenge of UAV survey coverage path planning for areas that are complex concave polygons, containing exclusion zones or obstacles. While standard drone path planners typically generate coverage paths for simple convex polygons, this study proposes a method to manage more intricate regions, including boundary splits, merges, and interior holes. To achieve this, polygonal decomposition techniques are used to partition the target area into convex sub-regions. The sub-polygons are then merged using a depth-first search algorithm, followed by the generation of continuous Boustrophedon paths based on connected components. Polygonal offset by the straight skeleton method was used to ensure a constant safe distance from the exclusion zones. This approach allows UAV path planning in environments with complex geometric constraints.
△ Less
Submitted 13 November, 2024; v1 submitted 11 November, 2024;
originally announced November 2024.
-
Automatic Contact-Based 3D Scanning Using Articulated Robotic Arm
Authors:
Shadman Tajwar Shahid,
Shah Md. Ahasan Siddique,
Md. Humayun Kabir Bhuiyan
Abstract:
This paper presents an open-loop articulated 6-degree-of-freedom (DoF) robotic system for three-dimensional (3D) scanning of objects by contact-based method. A digitizer probe was used to detect contact with the object. Inverse kinematics (IK) was used to determine the joint angles of the robot corresponding to the probe position and orientation, and straight-line trajectory planning was implement…
▽ More
This paper presents an open-loop articulated 6-degree-of-freedom (DoF) robotic system for three-dimensional (3D) scanning of objects by contact-based method. A digitizer probe was used to detect contact with the object. Inverse kinematics (IK) was used to determine the joint angles of the robot corresponding to the probe position and orientation, and straight-line trajectory planning was implemented for motion. The system can take single-point measurements and 3D scans of freeform surfaces. Specifying the scanning area's size, position, and density, the system automatically scans the designated volume. The system produces 3D scans in Standard Triangle Language (STL) format, ensuring compatibility with commonly used 3D software. Tests based on ASME B89.4.22 standards were conducted to quantify accuracy and repeatability. The point cloud from the scans was compared to the original 3D model of the object.
△ Less
Submitted 11 November, 2024;
originally announced November 2024.
-
AquaFuse: Waterbody Fusion for Physics Guided View Synthesis of Underwater Scenes
Authors:
Md Abu Bakr Siddique,
Jiayi Wu,
Ioannis Rekleitis,
Md Jahidul Islam
Abstract:
We introduce the idea of AquaFuse, a physics-based method for synthesizing waterbody properties in underwater imagery. We formulate a closed-form solution for waterbody fusion that facilitates realistic data augmentation and geometrically consistent underwater scene rendering. AquaFuse leverages the physical characteristics of light propagation underwater to synthesize the waterbody from one scene…
▽ More
We introduce the idea of AquaFuse, a physics-based method for synthesizing waterbody properties in underwater imagery. We formulate a closed-form solution for waterbody fusion that facilitates realistic data augmentation and geometrically consistent underwater scene rendering. AquaFuse leverages the physical characteristics of light propagation underwater to synthesize the waterbody from one scene to the object contents of another. Unlike data-driven style transfer, AquaFuse preserves the depth consistency and object geometry in an input scene. We validate this unique feature by comprehensive experiments over diverse underwater scenes. We find that the AquaFused images preserve over 94% depth consistency and 90-95% structural similarity of the input scenes. We also demonstrate that it generates accurate 3D view synthesis by preserving object geometry while adapting to the inherent waterbody fusion process. AquaFuse opens up a new research direction in data augmentation by geometry-preserving style transfer for underwater imaging and robot vision applications.
△ Less
Submitted 11 March, 2025; v1 submitted 1 November, 2024;
originally announced November 2024.
-
Training Zero-Shot Generalizable End-to-End Task-Oriented Dialog System Without Turn-level Dialog Annotations
Authors:
Adib Mosharrof,
A. B. Siddique
Abstract:
Task-oriented dialogue (TOD) systems enable users to achieve their goals through natural language interactions. Traditionally, these systems have relied on turn-level manually annotated metadata, such as dialogue states and policy annotations, which are expensive, time-consuming, and often inconsistent or error-prone. This dependence limits the potential to leverage vast amounts of readily availab…
▽ More
Task-oriented dialogue (TOD) systems enable users to achieve their goals through natural language interactions. Traditionally, these systems have relied on turn-level manually annotated metadata, such as dialogue states and policy annotations, which are expensive, time-consuming, and often inconsistent or error-prone. This dependence limits the potential to leverage vast amounts of readily available conversational data for training TOD systems. Additionally, a critical challenge in TOD system design is determining when and how to access and integrate information from external sources. Current approaches typically expect this information to be provided alongside the dialogue context, rather than learning to identify and retrieve it autonomously. While pre-trained large language models (LLMs) have been used to develop TOD systems, their potential to train such systems without laborious annotations remains largely unexplored. This work employs multi-task instruction fine-tuning to create more efficient and scalable TOD systems that can effectively leverage natural language conversational data without manual annotations, while autonomously managing external information retrieval. Our extensive experimental evaluations, using three diverse TOD datasets and three LLMs of varying sizes, demonstrate that our approach can generalize to new, unseen domains. Notably, our approach outperforms both state-of-the-art models trained on annotated data and billion-scale parameter off-the-shelf ChatGPT models.
△ Less
Submitted 4 November, 2024; v1 submitted 21 July, 2024;
originally announced July 2024.
-
Looking into Black Box Code Language Models
Authors:
Muhammad Umair Haider,
Umar Farooq,
A. B. Siddique,
Mark Marron
Abstract:
Language Models (LMs) have shown their application for tasks pertinent to code and several code~LMs have been proposed recently. The majority of the studies in this direction only focus on the improvements in performance of the LMs on different benchmarks, whereas LMs are considered black boxes. Besides this, a handful of works attempt to understand the role of attention layers in the code~LMs. No…
▽ More
Language Models (LMs) have shown their application for tasks pertinent to code and several code~LMs have been proposed recently. The majority of the studies in this direction only focus on the improvements in performance of the LMs on different benchmarks, whereas LMs are considered black boxes. Besides this, a handful of works attempt to understand the role of attention layers in the code~LMs. Nonetheless, feed-forward layers remain under-explored which consist of two-thirds of a typical transformer model's parameters.
In this work, we attempt to gain insights into the inner workings of code language models by examining the feed-forward layers. To conduct our investigations, we use two state-of-the-art code~LMs, Codegen-Mono and Ploycoder, and three widely used programming languages, Java, Go, and Python. We focus on examining the organization of stored concepts, the editability of these concepts, and the roles of different layers and input context size variations for output generation. Our empirical findings demonstrate that lower layers capture syntactic patterns while higher layers encode abstract concepts and semantics. We show concepts of interest can be edited within feed-forward layers without compromising code~LM performance. Additionally, we observe initial layers serve as ``thinking'' layers, while later layers are crucial for predicting subsequent code tokens. Furthermore, we discover earlier layers can accurately predict smaller contexts, but larger contexts need critical later layers' contributions. We anticipate these findings will facilitate better understanding, debugging, and testing of code~LMs.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
MobileConvRec: A Conversational Dataset for Mobile Apps Recommendations
Authors:
Srijata Maji,
Moghis Fereidouni,
Vinaik Chhetri,
Umar Farooq,
A. B. Siddique
Abstract:
Existing recommendation systems have focused on two paradigms: 1- historical user-item interaction-based recommendations and 2- conversational recommendations. Conversational recommendation systems facilitate natural language dialogues between users and the system, allowing the system to solicit users' explicit needs while enabling users to inquire about recommendations and provide feedback. Due t…
▽ More
Existing recommendation systems have focused on two paradigms: 1- historical user-item interaction-based recommendations and 2- conversational recommendations. Conversational recommendation systems facilitate natural language dialogues between users and the system, allowing the system to solicit users' explicit needs while enabling users to inquire about recommendations and provide feedback. Due to substantial advancements in natural language processing, conversational recommendation systems have gained prominence. Existing conversational recommendation datasets have greatly facilitated research in their respective domains. Despite the exponential growth in mobile users and apps in recent years, research in conversational mobile app recommender systems has faced substantial constraints. This limitation can primarily be attributed to the lack of high-quality benchmark datasets specifically tailored for mobile apps. To facilitate research for conversational mobile app recommendations, we introduce MobileConvRec. MobileConvRec simulates conversations by leveraging real user interactions with mobile apps on the Google Play store, originally captured in large-scale mobile app recommendation dataset MobileRec. The proposed conversational recommendation dataset synergizes sequential user-item interactions, which reflect implicit user preferences, with comprehensive multi-turn conversations to effectively grasp explicit user needs. MobileConvRec consists of over 12K multi-turn recommendation-related conversations spanning 45 app categories. Moreover, MobileConvRec presents rich metadata for each app such as permissions data, security and privacy-related information, and binary executables of apps, among others. We demonstrate that MobileConvRec can serve as an excellent testbed for conversational mobile app recommendation through a comparative study of several pre-trained large language models.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
Grounded Language Agent for Product Search via Intelligent Web Interactions
Authors:
Moghis Fereidouni,
Adib Mosharrof,
A. B. Siddique
Abstract:
The development of agents powered by large language models (LLMs) to accomplish complex high-level user intents, has attracted significant attention recently. However, employing LLMs with billions of parameters (e.g., GPT-4) may incur substantial costs on top of handcrafting extensive prompts. To address this, we introduce a Grounded Language Agent for Intelligent Web Interactions, named GLAINTEL.…
▽ More
The development of agents powered by large language models (LLMs) to accomplish complex high-level user intents, has attracted significant attention recently. However, employing LLMs with billions of parameters (e.g., GPT-4) may incur substantial costs on top of handcrafting extensive prompts. To address this, we introduce a Grounded Language Agent for Intelligent Web Interactions, named GLAINTEL. GLAINTEL employs Flan-T5 as its backbone and is flexible in training in various settings: unsupervised learning, supervised learning, and unsupervised domain adaptation. Specifically, we tackle both the challenge of learning without human demonstrations and the opportunity to leverage human demonstrations effectively when those are available. Additionally, we explore unsupervised domain adaptation for cases where demonstrations are limited to a specific domain. Experimental evaluations across diverse setups demonstrate the effectiveness of GLAINTEL in unsupervised settings, outperforming in-context learning-based approaches that employ larger models with up to 540 billion parameters. Surprisingly, behavioral cloning-based methods that straightforwardly use human demonstrations do not outperform unsupervised variants of GLAINTEL. Additionally, we show that combining human demonstrations with reinforcement learning-based training yields results comparable to methods utilizing GPT-4. The code is available at: https://github.com/MultifacetedNLP/WebAgents-Unsupervised.
△ Less
Submitted 26 January, 2025; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Deployment of Advanced and Intelligent Logistics Vehicles with Enhanced Tracking and Security Features
Authors:
Iqtiar Md Siddique,
Selim Molla,
MD Rakib Hasan,
Anamika Ahmed Siddique
Abstract:
This study focuses on the implementation of modern and intelligent logistics vehicles equipped with advanced tracking and security features. In response to the evolving landscape of logistics management, the proposed system integrates cutting edge technologies to enhance efficiency and ensure the security of the entire logistics process. The core component of this implementation is the incorporati…
▽ More
This study focuses on the implementation of modern and intelligent logistics vehicles equipped with advanced tracking and security features. In response to the evolving landscape of logistics management, the proposed system integrates cutting edge technologies to enhance efficiency and ensure the security of the entire logistics process. The core component of this implementation is the incorporation of state-of-the art tracking mechanisms, enabling real-time monitoring of vehicle locations and movements. Furthermore, the system addresses the paramount concern of security by introducing advanced security measures. Through the utilization of sophisticated tracking technologies and security protocols, the proposed logistics vehicles aim to safeguard both customer and provider data. The implementation includes the integration of QR code concepts, creating a binary image system that conceals sensitive information and ensures access only to authorized users. In addition to tracking and security, the study delves into the realm of information mining, employing techniques such as classification, clustering, and recommendation to extract meaningful patterns from vast datasets. Collaborative filtering techniques are incorporated to enhance customer experience by recommending services based on user preferences and historical data. This abstract encapsulates the comprehensive approach of deploying modern logistics vehicles, emphasizing their intelligence through advanced tracking, robust security measures, and data-driven insights. The proposed system aims to revolutionize logistics management, providing a seamless and secure experience for both customers and service providers in the dynamic logistics landscape.
△ Less
Submitted 18 February, 2024;
originally announced February 2024.
-
GAN-driven Electromagnetic Imaging of 2-D Dielectric Scatterers
Authors:
Ehtasham Naseer,
Ali Imran Sandhu,
Muhammad Adnan Siddique,
Waqas W. Ahmed,
Mohamed Farhat,
Ying Wu
Abstract:
Inverse scattering problems are inherently challenging, given the fact they are ill-posed and nonlinear. This paper presents a powerful deep learning-based approach that relies on generative adversarial networks to accurately and efficiently reconstruct randomly-shaped two-dimensional dielectric objects from amplitudes of multi-frequency scattered electric fields. An adversarial autoencoder (AAE)…
▽ More
Inverse scattering problems are inherently challenging, given the fact they are ill-posed and nonlinear. This paper presents a powerful deep learning-based approach that relies on generative adversarial networks to accurately and efficiently reconstruct randomly-shaped two-dimensional dielectric objects from amplitudes of multi-frequency scattered electric fields. An adversarial autoencoder (AAE) is trained to learn to generate the scatterer's geometry from a lower-dimensional latent representation constrained to adhere to the Gaussian distribution. A cohesive inverse neural network (INN) framework is set up comprising a sequence of appropriately designed dense layers, the already-trained generator as well as a separately trained forward neural network. The images reconstructed at the output of the inverse network are validated through comparison with outputs from the forward neural network, addressing the non-uniqueness challenge inherent to electromagnetic (EM) imaging problems. The trained INN demonstrates an enhanced robustness, evidenced by a mean binary cross-entropy (BCE) loss of $0.13$ and a structure similarity index (SSI) of $0.90$. The study not only demonstrates a significant reduction in computational load, but also marks a substantial improvement over traditional objective-function-based methods. It contributes both to the fields of machine learning and EM imaging by offering a real-time quantitative imaging approach. The results obtained with the simulated data, for both training and testing, yield promising results and may open new avenues for radio-frequency inverse imaging.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
Zero-Shot Generalizable End-to-End Task-Oriented Dialog System using Context Summarization and Domain Schema
Authors:
Adib Mosharrof,
M. H. Maqbool,
A. B. Siddique
Abstract:
Task-oriented dialog systems empower users to accomplish their goals by facilitating intuitive and expressive natural language interactions. State-of-the-art approaches in task-oriented dialog systems formulate the problem as a conditional sequence generation task and fine-tune pre-trained causal language models in the supervised setting. This requires labeled training data for each new domain or…
▽ More
Task-oriented dialog systems empower users to accomplish their goals by facilitating intuitive and expressive natural language interactions. State-of-the-art approaches in task-oriented dialog systems formulate the problem as a conditional sequence generation task and fine-tune pre-trained causal language models in the supervised setting. This requires labeled training data for each new domain or task, and acquiring such data is prohibitively laborious and expensive, thus making it a bottleneck for scaling systems to a wide range of domains. To overcome this challenge, we introduce a novel Zero-Shot generalizable end-to-end Task-oriented Dialog system, ZS-ToD, that leverages domain schemas to allow for robust generalization to unseen domains and exploits effective summarization of the dialog history. We employ GPT-2 as a backbone model and introduce a two-step training process where the goal of the first step is to learn the general structure of the dialog data and the second step optimizes the response generation as well as intermediate outputs, such as dialog state and system actions. As opposed to state-of-the-art systems that are trained to fulfill certain intents in the given domains and memorize task-specific conversational patterns, ZS-ToD learns generic task-completion skills by comprehending domain semantics via domain schemas and generalizing to unseen domains seamlessly. We conduct an extensive experimental evaluation on SGD and SGD-X datasets that span up to 20 unique domains and ZS-ToD outperforms state-of-the-art systems on key metrics, with an improvement of +17% on joint goal accuracy and +5 on inform. Additionally, we present a detailed ablation study to demonstrate the effectiveness of the proposed components and training mechanism
△ Less
Submitted 28 March, 2023;
originally announced March 2023.
-
Toward Open-domain Slot Filling via Self-supervised Co-training
Authors:
Adib Mosharrof,
Moghis Fereidouni,
A. B. Siddique
Abstract:
Slot filling is one of the critical tasks in modern conversational systems. The majority of existing literature employs supervised learning methods, which require labeled training data for each new domain. Zero-shot learning and weak supervision approaches, among others, have shown promise as alternatives to manual labeling. Nonetheless, these learning paradigms are significantly inferior to super…
▽ More
Slot filling is one of the critical tasks in modern conversational systems. The majority of existing literature employs supervised learning methods, which require labeled training data for each new domain. Zero-shot learning and weak supervision approaches, among others, have shown promise as alternatives to manual labeling. Nonetheless, these learning paradigms are significantly inferior to supervised learning approaches in terms of performance. To minimize this performance gap and demonstrate the possibility of open-domain slot filling, we propose a Self-supervised Co-training framework, called SCot, that requires zero in-domain manually labeled training examples and works in three phases. Phase one acquires two sets of complementary pseudo labels automatically. Phase two leverages the power of the pre-trained language model BERT, by adapting it for the slot filling task using these sets of pseudo labels. In phase three, we introduce a self-supervised cotraining mechanism, where both models automatically select highconfidence soft labels to further improve the performance of the other in an iterative fashion. Our thorough evaluations show that SCot outperforms state-of-the-art models by 45.57% and 37.56% on SGD and MultiWoZ datasets, respectively. Moreover, our proposed framework SCot achieves comparable performance when compared to state-of-the-art fully supervised models.
△ Less
Submitted 24 March, 2023;
originally announced March 2023.
-
Personalizing Task-oriented Dialog Systems via Zero-shot Generalizable Reward Function
Authors:
A. B. Siddique,
M. H. Maqbool,
Kshitija Taywade,
Hassan Foroosh
Abstract:
Task-oriented dialog systems enable users to accomplish tasks using natural language. State-of-the-art systems respond to users in the same way regardless of their personalities, although personalizing dialogues can lead to higher levels of adoption and better user experiences. Building personalized dialog systems is an important, yet challenging endeavor and only a handful of works took on the ch…
▽ More
Task-oriented dialog systems enable users to accomplish tasks using natural language. State-of-the-art systems respond to users in the same way regardless of their personalities, although personalizing dialogues can lead to higher levels of adoption and better user experiences. Building personalized dialog systems is an important, yet challenging endeavor and only a handful of works took on the challenge. Most existing works rely on supervised learning approaches and require laborious and expensive labeled training data for each user profile. Additionally, collecting and labeling data for each user profile is virtually impossible. In this work, we propose a novel framework, P-ToD, to personalize task-oriented dialog systems capable of adapting to a wide range of user profiles in an unsupervised fashion using a zero-shot generalizable reward function. P-ToD uses a pre-trained GPT-2 as a backbone model and works in three phases. Phase one performs task-specific training. Phase two kicks off unsupervised personalization by leveraging the proximal policy optimization algorithm that performs policy gradients guided by the zero-shot generalizable reward function. Our novel reward function can quantify the quality of the generated responses even for unseen profiles. The optional final phase fine-tunes the personalized model using a few labeled training examples. We conduct extensive experimental analysis using the personalized bAbI dialogue benchmark for five tasks and up to 180 diverse user profiles. The experimental results demonstrate that P-ToD, even when it had access to zero labeled examples, outperforms state-of-the-art supervised personalization models and achieves competitive performance on BLEU and ROUGE metrics when compared to a strong fully-supervised GPT-2 baseline
△ Less
Submitted 24 March, 2023;
originally announced March 2023.
-
MobileRec: A Large-Scale Dataset for Mobile Apps Recommendation
Authors:
M. H. Maqbool,
Umar Farooq,
Adib Mosharrof,
A. B. Siddique,
Hassan Foroosh
Abstract:
Recommender systems have become ubiquitous in our digital lives, from recommending products on e-commerce websites to suggesting movies and music on streaming platforms. Existing recommendation datasets, such as Amazon Product Reviews and MovieLens, greatly facilitated the research and development of recommender systems in their respective domains. While the number of mobile users and applications…
▽ More
Recommender systems have become ubiquitous in our digital lives, from recommending products on e-commerce websites to suggesting movies and music on streaming platforms. Existing recommendation datasets, such as Amazon Product Reviews and MovieLens, greatly facilitated the research and development of recommender systems in their respective domains. While the number of mobile users and applications (aka apps) has increased exponentially over the past decade, research in mobile app recommender systems has been significantly constrained, primarily due to the lack of high-quality benchmark datasets, as opposed to recommendations for products, movies, and news. To facilitate research for app recommendation systems, we introduce a large-scale dataset, called MobileRec. We constructed MobileRec from users' activity on the Google play store. MobileRec contains 19.3 million user interactions (i.e., user reviews on apps) with over 10K unique apps across 48 categories. MobileRec records the sequential activity of a total of 0.7 million distinct users. Each of these users has interacted with no fewer than five distinct apps, which stands in contrast to previous datasets on mobile apps that recorded only a single interaction per user. Furthermore, MobileRec presents users' ratings as well as sentiments on installed apps, and each app contains rich metadata such as app name, category, description, and overall rating, among others. We demonstrate that MobileRec can serve as an excellent testbed for app recommendation through a comparative study of several state-of-the-art recommendation approaches. The quantitative results can act as a baseline for other researchers to compare their results against. The MobileRec dataset is available at https://huggingface.co/datasets/recmeapp/mobilerec.
△ Less
Submitted 12 March, 2023;
originally announced March 2023.
-
Proactive Prioritization of App Issues via Contrastive Learning
Authors:
Moghis Fereidouni,
Adib Mosharrof,
Umar Farooq,
AB Siddique
Abstract:
Mobile app stores produce a tremendous amount of data in the form of user reviews, which is a huge source of user requirements and sentiments; such reviews allow app developers to proactively address issues in their apps. However, only a small number of reviews capture common issues and sentiments which creates a need for automatically identifying prominent reviews. Unfortunately, most existing wo…
▽ More
Mobile app stores produce a tremendous amount of data in the form of user reviews, which is a huge source of user requirements and sentiments; such reviews allow app developers to proactively address issues in their apps. However, only a small number of reviews capture common issues and sentiments which creates a need for automatically identifying prominent reviews. Unfortunately, most existing work in text ranking and popularity prediction focuses on social contexts where other signals are available, which renders such works ineffective in the context of app reviews. In this work, we propose a new framework, PPrior, that enables proactive prioritization of app issues through identifying prominent reviews (ones predicted to receive a large number of votes in a given time window). Predicting highly-voted reviews is challenging given that, unlike social posts, social network features of users are not available. Moreover, there is an issue of class imbalance, since a large number of user reviews receive little to no votes. PPrior employs a pre-trained T5 model and works in three phases. Phase one adapts the pre-trained T5 model to the user reviews data in a self-supervised fashion. In phase two, we leverage contrastive training to learn a generic and task-independent representation of user reviews. Phase three uses radius neighbors classifier t o m ake t he final predictions. This phase also uses FAISS index for scalability and efficient search. To conduct extensive experiments, we acquired a large dataset of over 2.1 million user reviews from Google Play. Our experimental results demonstrate the effectiveness of the proposed framework when compared against several state-of-the-art approaches. Moreover, the accuracy of PPrior in predicting prominent reviews is comparable to that of experienced app developers.
△ Less
Submitted 12 March, 2023;
originally announced March 2023.
-
Analysis of Real-Time Hostile Activitiy Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms
Authors:
Labib Ahmed Siddique,
Rabita Junhai,
Tanzim Reza,
Salman Sayeed Khan,
Tanvir Rahman
Abstract:
Real-time video surveillance, through CCTV camera systems has become essential for ensuring public safety which is a priority today. Although CCTV cameras help a lot in increasing security, these systems require constant human interaction and monitoring. To eradicate this issue, intelligent surveillance systems can be built using deep learning video classification techniques that can help us autom…
▽ More
Real-time video surveillance, through CCTV camera systems has become essential for ensuring public safety which is a priority today. Although CCTV cameras help a lot in increasing security, these systems require constant human interaction and monitoring. To eradicate this issue, intelligent surveillance systems can be built using deep learning video classification techniques that can help us automate surveillance systems to detect violence as it happens. In this research, we explore deep learning video classification techniques to detect violence as they are happening. Traditional image classification techniques fall short when it comes to classifying videos as they attempt to classify each frame separately for which the predictions start to flicker. Therefore, many researchers are coming up with video classification techniques that consider spatiotemporal features while classifying. However, deploying these deep learning models with methods such as skeleton points obtained through pose estimation and optical flow obtained through depth sensors, are not always practical in an IoT environment. Although these techniques ensure a higher accuracy score, they are computationally heavier. Keeping these constraints in mind, we experimented with various video classification and action recognition techniques such as ConvLSTM, LRCN (with both custom CNN layers and VGG-16 as feature extractor) CNNTransformer and C3D. We achieved a test accuracy of 80% on ConvLSTM, 83.33% on CNN-BiLSTM, 70% on VGG16-BiLstm ,76.76% on CNN-Transformer and 80% on C3D.
△ Less
Submitted 21 February, 2023;
originally announced February 2023.
-
Lateralization in Agents' Decision Making: Evidence of Benefits/Costs from Artificial Intelligence
Authors:
Abubakar Siddique,
Will N. Browne,
Gina M. Grimshaw
Abstract:
Lateralization is ubiquitous in vertebrate brains which, as well as its role in locomotion, is considered an important factor in biological intelligence. Lateralization has been associated with both poor and good performance. It has been hypothesized that lateralization has benefits that may counterbalance its costs. Given that lateralization is ubiquitous, it likely has advantages that can benefi…
▽ More
Lateralization is ubiquitous in vertebrate brains which, as well as its role in locomotion, is considered an important factor in biological intelligence. Lateralization has been associated with both poor and good performance. It has been hypothesized that lateralization has benefits that may counterbalance its costs. Given that lateralization is ubiquitous, it likely has advantages that can benefit artificial intelligence. In turn, lateralized artificial intelligent systems can be used as tools to advance the understanding of lateralization in biological intelligence. Recently lateralization has been incorporated into artificially intelligent systems to solve complex problems in computer vision and navigation domains. Here we describe and test two novel lateralized artificial intelligent systems that simultaneously represent and address given problems at constituent and holistic levels. The experimental results demonstrate that the lateralized systems outperformed state-of-the-art non-lateralized systems in resolving complex problems. The advantages arise from the abilities, (i) to represent an input signal at both the constituent level and holistic level simultaneously, such that the most appropriate viewpoint controls the system; (ii) to avoid extraneous computations by generating excite and inhibit signals. The computational costs associated with the lateralized AI systems are either less than the conventional AI systems or countered by providing better solutions.
△ Less
Submitted 2 February, 2023;
originally announced February 2023.
-
Lateralized Learning for Multi-Class Visual Classification Tasks
Authors:
Abubakar Siddique,
Will N. Browne,
Gina M. Grimshaw
Abstract:
The majority of computer vision algorithms fail to find higher-order (abstract) patterns in an image so are not robust against adversarial attacks, unlike human lateralized vision. Deep learning considers each input pixel in a homogeneous manner such that different parts of a ``locality-sensitive hashing table'' are often not connected, meaning higher-order patterns are not discovered. Hence these…
▽ More
The majority of computer vision algorithms fail to find higher-order (abstract) patterns in an image so are not robust against adversarial attacks, unlike human lateralized vision. Deep learning considers each input pixel in a homogeneous manner such that different parts of a ``locality-sensitive hashing table'' are often not connected, meaning higher-order patterns are not discovered. Hence these systems are not robust against noisy, irrelevant, and redundant data, resulting in the wrong prediction being made with high confidence. Conversely, vertebrate brains afford heterogeneous knowledge representation through lateralization, enabling modular learning at different levels of abstraction. This work aims to verify the effectiveness, scalability, and robustness of a lateralized approach to real-world problems that contain noisy, irrelevant, and redundant data. The experimental results of multi-class (200 classes) image classification show that the novel system effectively learns knowledge representation at multiple levels of abstraction making it more robust than other state-of-the-art techniques. Crucially, the novel lateralized system outperformed all the state-of-the-art deep learning-based systems for the classification of normal and adversarial images by 19.05% - 41.02% and 1.36% - 49.22%, respectively. Findings demonstrate the value of heterogeneous and lateralized learning for computer vision applications.
△ Less
Submitted 29 January, 2023;
originally announced January 2023.
-
RobustPdM: Designing Robust Predictive Maintenance against Adversarial Attacks
Authors:
Ayesha Siddique,
Ripan Kumar Kundu,
Gautam Raj Mode,
Khaza Anuarul Hoque
Abstract:
The state-of-the-art predictive maintenance (PdM) techniques have shown great success in reducing maintenance costs and downtime of complicated machines while increasing overall productivity through extensive utilization of Internet-of-Things (IoT) and Deep Learning (DL). Unfortunately, IoT sensors and DL algorithms are both prone to cyber-attacks. For instance, DL algorithms are known for their s…
▽ More
The state-of-the-art predictive maintenance (PdM) techniques have shown great success in reducing maintenance costs and downtime of complicated machines while increasing overall productivity through extensive utilization of Internet-of-Things (IoT) and Deep Learning (DL). Unfortunately, IoT sensors and DL algorithms are both prone to cyber-attacks. For instance, DL algorithms are known for their susceptibility to adversarial examples. Such adversarial attacks are vastly under-explored in the PdM domain. This is because the adversarial attacks in the computer vision domain for classification tasks cannot be directly applied to the PdM domain for multivariate time series (MTS) regression tasks. In this work, we propose an end-to-end methodology to design adversarially robust PdM systems by extensively analyzing the effect of different types of adversarial attacks and proposing a novel adversarial defense technique for DL-enabled PdM models. First, we propose novel MTS Projected Gradient Descent (PGD) and MTS PGD with random restarts (PGD_r) attacks. Then, we evaluate the impact of MTS PGD and PGD_r along with MTS Fast Gradient Sign Method (FGSM) and MTS Basic Iterative Method (BIM) on Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), and Bi-directional LSTM based PdM system. Our results using NASA's turbofan engine dataset show that adversarial attacks can cause a severe defect (up to 11X) in the RUL prediction, outperforming the effectiveness of the state-of-the-art PdM attacks by 3X. Furthermore, we present a novel approximate adversarial training method to defend against adversarial attacks. We observe that approximate adversarial training can significantly improve the robustness of PdM models (up to 54X) and outperforms the state-of-the-art PdM defense methods by offering 3X more robustness.
△ Less
Submitted 10 August, 2023; v1 submitted 25 January, 2023;
originally announced January 2023.
-
Exposing Reliability Degradation and Mitigation in Approximate DNNs under Permanent Faults
Authors:
Ayesha Siddique,
Khaza Anuarul Hoque
Abstract:
Approximate computing is known for enhancing deep neural network accelerators' energy efficiency by introducing inexactness with a tolerable accuracy loss. However, small accuracy variations may increase the sensitivity of these accelerators towards undesired subtle disturbances, such as permanent faults. The impact of permanent faults in accurate deep neural network (AccDNN) accelerators has been…
▽ More
Approximate computing is known for enhancing deep neural network accelerators' energy efficiency by introducing inexactness with a tolerable accuracy loss. However, small accuracy variations may increase the sensitivity of these accelerators towards undesired subtle disturbances, such as permanent faults. The impact of permanent faults in accurate deep neural network (AccDNN) accelerators has been thoroughly investigated in the literature. Conversely, the impact of permanent faults and their mitigation in approximate DNN (AxDNN) accelerators is vastly under-explored. Towards this, we first present an extensive fault resilience analysis of approximate multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs) using the state-of-the-art Evoapprox8b multipliers in GPU and TPU accelerators. Then, we propose a novel fault mitigation method, i.e., fault-aware retuning of weights (Fal-reTune). Fal-reTune retunes the weights using a weight mapping function in the presence of faults for improved classification accuracy. To evaluate the fault resilience and the effectiveness of our proposed mitigation method, we used the most widely used MNIST, Fashion-MNIST, and CIFAR10 datasets. Our results demonstrate that the permanent faults exacerbate the accuracy loss in AxDNNs compared to the AccDNN accelerators. For instance, a permanent fault in AxDNNs can lead to 56\% accuracy loss, whereas the same faulty bit can lead to only 4\% accuracy loss in AccDNN accelerators. We empirically show that our proposed Fal-reTune mitigation method improves the performance of AxDNNs up to 98%, even with fault rates of up to 50%. Furthermore, we observe that the fault resilience in AxDNNs is orthogonal to their energy efficiency.
△ Less
Submitted 12 January, 2023;
originally announced January 2023.
-
Improving Reliability of Spiking Neural Networks through Fault Aware Threshold Voltage Optimization
Authors:
Ayesha Siddique,
Khaza Anuarul Hoque
Abstract:
Spiking neural networks have made breakthroughs in computer vision by lending themselves to neuromorphic hardware. However, the neuromorphic hardware lacks parallelism and hence, limits the throughput and hardware acceleration of SNNs on edge devices. To address this problem, many systolic-array SNN accelerators (systolicSNNs) have been proposed recently, but their reliability is still a major con…
▽ More
Spiking neural networks have made breakthroughs in computer vision by lending themselves to neuromorphic hardware. However, the neuromorphic hardware lacks parallelism and hence, limits the throughput and hardware acceleration of SNNs on edge devices. To address this problem, many systolic-array SNN accelerators (systolicSNNs) have been proposed recently, but their reliability is still a major concern. In this paper, we first extensively analyze the impact of permanent faults on the SystolicSNNs. Then, we present a novel fault mitigation method, i.e., fault-aware threshold voltage optimization in retraining (FalVolt). FalVolt optimizes the threshold voltage for each layer in retraining to achieve the classification accuracy close to the baseline in the presence of faults. To demonstrate the effectiveness of our proposed mitigation, we classify both static (i.e., MNIST) and neuromorphic datasets (i.e., N-MNIST and DVS Gesture) on a 256x256 systolicSNN with stuck-at faults. We empirically show that the classification accuracy of a systolicSNN drops significantly even at extremely low fault rates (as low as 0.012\%). Our proposed FalVolt mitigation method improves the performance of systolicSNNs by enabling them to operate at fault rates of up to 60\%, with a negligible drop in classification accuracy (as low as 0.1\%). Our results show that FalVolt is 2x faster compared to other state-of-the-art techniques common in artificial neural networks (ANNs), such as fault-aware pruning and retraining without threshold voltage optimization.
△ Less
Submitted 12 January, 2023;
originally announced January 2023.
-
Security-Aware Approximate Spiking Neural Networks
Authors:
Syed Tihaam Ahmad,
Ayesha Siddique,
Khaza Anuarul Hoque
Abstract:
Deep Neural Networks (DNNs) and Spiking Neural Networks (SNNs) are both known for their susceptibility to adversarial attacks. Therefore, researchers in the recent past have extensively studied the robustness and defense of DNNs and SNNs under adversarial attacks. Compared to accurate SNNs (AccSNN), approximate SNNs (AxSNNs) are known to be up to 4X more energy-efficient for ultra-low power applic…
▽ More
Deep Neural Networks (DNNs) and Spiking Neural Networks (SNNs) are both known for their susceptibility to adversarial attacks. Therefore, researchers in the recent past have extensively studied the robustness and defense of DNNs and SNNs under adversarial attacks. Compared to accurate SNNs (AccSNN), approximate SNNs (AxSNNs) are known to be up to 4X more energy-efficient for ultra-low power applications. Unfortunately, the robustness of AxSNNs under adversarial attacks is yet unexplored. In this paper, we first extensively analyze the robustness of AxSNNs with different structural parameters and approximation levels under two gradient-based and two neuromorphic attacks. Then, we propose two novel defense methods, i.e., precision scaling and approximate quantization-aware filtering (AQF), for securing AxSNNs. We evaluated the effectiveness of these two defense methods using both static and neuromorphic datasets. Our results demonstrate that AxSNNs are more prone to adversarial attacks than AccSNNs, but precision scaling and AQF significantly improve the robustness of AxSNNs. For instance, a PGD attack on AxSNN results in a 72\% accuracy loss compared to AccSNN without any attack, whereas the same attack on the precision-scaled AxSNN leads to only a 17\% accuracy loss in the static MNIST dataset (4X robustness improvement). Similarly, a Sparse Attack on AxSNN leads to a 77\% accuracy loss when compared to AccSNN without any attack, whereas the same attack on an AxSNN with AQF leads to only a 2\% accuracy loss in the neuromorphic DVS128 Gesture dataset (38X robustness improvement).
△ Less
Submitted 12 January, 2023;
originally announced January 2023.
-
Tracking Passengers and Baggage Items using Multiple Overhead Cameras at Security Checkpoints
Authors:
Abubakar Siddique,
Henry Medeiros
Abstract:
We introduce a novel framework to track multiple objects in overhead camera videos for airport checkpoint security scenarios where targets correspond to passengers and their baggage items. We propose a Self-Supervised Learning (SSL) technique to provide the model information about instance segmentation uncertainty from overhead images. Our SSL approach improves object detection by employing a test…
▽ More
We introduce a novel framework to track multiple objects in overhead camera videos for airport checkpoint security scenarios where targets correspond to passengers and their baggage items. We propose a Self-Supervised Learning (SSL) technique to provide the model information about instance segmentation uncertainty from overhead images. Our SSL approach improves object detection by employing a test-time data augmentation and a regression-based, rotation-invariant pseudo-label refinement technique. Our pseudo-label generation method provides multiple geometrically-transformed images as inputs to a Convolutional Neural Network (CNN), regresses the augmented detections generated by the network to reduce localization errors, and then clusters them using the mean-shift algorithm. The self-supervised detector model is used in a single-camera tracking algorithm to generate temporal identifiers for the targets. Our method also incorporates a multi-view trajectory association mechanism to maintain consistent temporal identifiers as passengers travel across camera views. An evaluation of detection, tracking, and association performances on videos obtained from multiple overhead cameras in a realistic airport checkpoint environment demonstrates the effectiveness of the proposed approach. Our results show that self-supervision improves object detection accuracy by up to $42\%$ without increasing the inference time of the model. Our multi-camera association method achieves up to $89\%$ multi-object tracking accuracy with an average computation time of less than $15$ ms.
△ Less
Submitted 30 September, 2023; v1 submitted 31 December, 2022;
originally announced January 2023.
-
GraphTango: A Hybrid Representation Format for Efficient Streaming Graph Updates and Analysis
Authors:
Alif Ahmed,
Farzana Ahmed Siddique,
Kevin Skadron
Abstract:
Streaming graph processing involves performing updates and analytics on a time-evolving graph. The underlying representation format largely determines the throughputs of these updates and analytics phases. Existing formats usually employ variations of hash tables or adjacency lists. However, adjacency-list-based approaches perform poorly on heavy-tailed graphs, and the hash-based approaches suffer…
▽ More
Streaming graph processing involves performing updates and analytics on a time-evolving graph. The underlying representation format largely determines the throughputs of these updates and analytics phases. Existing formats usually employ variations of hash tables or adjacency lists. However, adjacency-list-based approaches perform poorly on heavy-tailed graphs, and the hash-based approaches suffer on short-tailed graphs. We propose GraphTango, a hybrid format that provides excellent update and analytics throughput regardless of the graph's degree distribution. GraphTango switches among three different formats based on a vertex's degree: i) Low-degree vertices store the edges directly with the neighborhood metadata, confining accesses to a single cache line, ii) Medium-degree vertices use adjacency lists, and iii) High-degree vertices use hash tables as well as adjacency lists. In this case, adjacency list provides fast traversal during the analytics phase, while the hash table provides constant-time lookups during the update phase. We further optimized the performance by designing an open-addressing-based hash table that fully utilizes every fetched cache line. In addition, we developed a thread-local lock-free memory pool that allows fast growing/shrinking of the adjacency lists and hash tables in a multi-threaded environment. We evaluated GraphTango with the help of the SAGA-Bench framework and compared it with four other representation formats. On average, GraphTango provides 4.5x higher insertion throughput, 3.2x higher deletion throughput, and 1.1x higher analytics throughput over the next best format. Furthermore, we integrated GraphTango with the state-of-the-art graph processing frameworks DZiG and RisGraph. Compared to the vanilla DZiG and vanilla RisGraph, [GraphTango + DZiG] and [GraphTango + RisGraph] reduces the average batch processing time by 2.3x and 1.5x, respectively.
△ Less
Submitted 22 December, 2022;
originally announced December 2022.
-
Deterministic vs. Non Deterministic Finite Automata in Automata Processing
Authors:
Farzana Ahmed Siddique,
Tommy James Tracy II,
Nathan Brunelle,
Kevin Skadron
Abstract:
Linear-time pattern matching engines have seen promising results using Finite Automata (FA) as their computation model. Among different FA variants, deterministic (DFA) and non-deterministic (NFA) are the most commonly used computation models for FA-based pattern matching engines. Moreover, NFA is the prevalent model in pattern matching engines on spatial architectures. The reasons are: i) DFA siz…
▽ More
Linear-time pattern matching engines have seen promising results using Finite Automata (FA) as their computation model. Among different FA variants, deterministic (DFA) and non-deterministic (NFA) are the most commonly used computation models for FA-based pattern matching engines. Moreover, NFA is the prevalent model in pattern matching engines on spatial architectures. The reasons are: i) DFA size, as in #states, can be exponential compared to equivalent NFA, ii) DFA cannot exploit the massive parallelism available on spatial architectures. This paper performs an empirical study on the #state of minimized DFA and optimized NFA across a diverse set of real-world benchmarks and shows that if distinct DFAs are generated for distinct patterns, #states of minimized DFA are typically equal to their equivalent optimized NFA. However, NFA is more robust in maintaining the low #states for some benchmarks. Thus, the choice of NFA vs. DFA for spatial architecture is less important than the need to generate distinct DFAs for each pattern and support these distinct DFAs' parallel processing. Finally, this paper presents a throughput study for von Neumann's architecture-based (CPU) vs. spatial architecture-based (FPGA) automata processing engines. The study shows that, based on the workload, neither CPU-based automata processing engine nor FPGA-based automata processing engine is the clear winner. If #patterns matched per workload increases, the CPU-based automata processing engine's throughput decreases. On the other hand, the FPGA-based automata processing engine lacks the memory spilling option; hence, it fails to accommodate an entire automata if it does not fit into FPGA's logic fabric. In the best-case scenario, the CPU has a 4.5x speedup over the FPGA, while for some benchmarks, the FPGA has a 32,530x speedup over the CPU.
△ Less
Submitted 18 October, 2022;
originally announced October 2022.
-
Self-supervised Learning for Panoptic Segmentation of Multiple Fruit Flower Species
Authors:
Abubakar Siddique,
Amy Tabb,
Henry Medeiros
Abstract:
Convolutional neural networks trained using manually generated labels are commonly used for semantic or instance segmentation. In precision agriculture, automated flower detection methods use supervised models and post-processing techniques that may not perform consistently as the appearance of the flowers and the data acquisition conditions vary. We propose a self-supervised learning strategy to…
▽ More
Convolutional neural networks trained using manually generated labels are commonly used for semantic or instance segmentation. In precision agriculture, automated flower detection methods use supervised models and post-processing techniques that may not perform consistently as the appearance of the flowers and the data acquisition conditions vary. We propose a self-supervised learning strategy to enhance the sensitivity of segmentation models to different flower species using automatically generated pseudo-labels. We employ a data augmentation and refinement approach to improve the accuracy of the model predictions. The augmented semantic predictions are then converted to panoptic pseudo-labels to iteratively train a multi-task model. The self-supervised model predictions can be refined with existing post-processing approaches to further improve their accuracy. An evaluation on a multi-species fruit tree flower dataset demonstrates that our method outperforms state-of-the-art models without computationally expensive post-processing steps, providing a new baseline for flower detection applications.
△ Less
Submitted 10 September, 2022;
originally announced September 2022.
-
Effect of Balancing Data Using Synthetic Data on the Performance of Machine Learning Classifiers for Intrusion Detection in Computer Networks
Authors:
Ayesha S. Dina,
A. B. Siddique,
D. Manivannan
Abstract:
Attacks on computer networks have increased significantly in recent days, due in part to the availability of sophisticated tools for launching such attacks as well as thriving underground cyber-crime economy to support it. Over the past several years, researchers in academia and industry used machine learning (ML) techniques to design and implement Intrusion Detection Systems (IDSes) for computer…
▽ More
Attacks on computer networks have increased significantly in recent days, due in part to the availability of sophisticated tools for launching such attacks as well as thriving underground cyber-crime economy to support it. Over the past several years, researchers in academia and industry used machine learning (ML) techniques to design and implement Intrusion Detection Systems (IDSes) for computer networks. Many of these researchers used datasets collected by various organizations to train ML models for predicting intrusions. In many of the datasets used in such systems, data are imbalanced (i.e., not all classes have equal amount of samples). With unbalanced data, the predictive models developed using ML algorithms may produce unsatisfactory classifiers which would affect accuracy in predicting intrusions. Traditionally, researchers used over-sampling and under-sampling for balancing data in datasets to overcome this problem. In this work, in addition to over-sampling, we also use a synthetic data generation method, called Conditional Generative Adversarial Network (CTGAN), to balance data and study their effect on various ML classifiers. To the best of our knowledge, no one else has used CTGAN to generate synthetic samples to balance intrusion detection datasets. Based on extensive experiments using a widely used dataset NSL-KDD, we found that training ML models on dataset balanced with synthetic samples generated by CTGAN increased prediction accuracy by up to $8\%$, compared to training the same ML models over unbalanced data. Our experiments also show that the accuracy of some ML models trained over data balanced with random over-sampling decline compared to the same ML models trained over unbalanced data.
△ Less
Submitted 31 March, 2022;
originally announced April 2022.
-
Incorporating Multi-Agent Systems Technology in Power and Energy Systems of Bangladesh: A Feasibility Study
Authors:
Syed Redwan Md Hassan,
Nazmul Hasan,
Mohammad Ali Siddique,
K. M Solaiman Fahim,
Rummana Rahman,
Lamia Iftekhar
Abstract:
The power sector of Bangladesh is presently experiencing essential changes as demand for power services is increasing with rising population and economic development. With a gradual shift from a rigidly centralized structure to a more decentralized and fluid setup, fundamentally because of the enormous advancement of distributed renewable energy sources, the future power system of the nation requi…
▽ More
The power sector of Bangladesh is presently experiencing essential changes as demand for power services is increasing with rising population and economic development. With a gradual shift from a rigidly centralized structure to a more decentralized and fluid setup, fundamentally because of the enormous advancement of distributed renewable energy sources, the future power system of the nation requires new control strategies to work efficiently and sustainably in the face of evolving conditions and constraints. Multi-Agent Systems (MAS) technology has attributes that meet these prerequisites of modern power systems and has been shown to be effective in dealing with its distributed and complex nature. This is a literature-based feasibility study to explore whether MAS technology is suited to be applied in the context of Bangladesh. For this preliminary paper, we look at the topic from a holistic perspective and conduct a meta-review to curate common applications of Multi-Agent System-based concepts, tools and algorithms on the power and energy sector. We also identify the top challenges of this domain in Bangladesh and connect the potential MAS-based solutions to address each challenge. Our qualitative assessment is motivated to provide a starting point for local researchers eager to experiment with MAS technology for application in Bangladesh.
△ Less
Submitted 16 March, 2022;
originally announced March 2022.
-
Graph-based hierarchical record clustering for unsupervised entity resolution
Authors:
Islam Akef Ebeid,
John R. Talburt,
Md Abdus Salam Siddique
Abstract:
Here we study the problem of matched record clustering in unsupervised entity resolution. We build upon a state-of-the-art probabilistic framework named the Data Washing Machine (DWM). We introduce a graph-based hierarchical 2-step record clustering method (GDWM) that first identifies large, connected components or, as we call them, soft clusters in the matched record pairs using a graph-based tra…
▽ More
Here we study the problem of matched record clustering in unsupervised entity resolution. We build upon a state-of-the-art probabilistic framework named the Data Washing Machine (DWM). We introduce a graph-based hierarchical 2-step record clustering method (GDWM) that first identifies large, connected components or, as we call them, soft clusters in the matched record pairs using a graph-based transitive closure algorithm utilized in the DWM. That is followed by breaking down the discovered soft clusters into more precise entity clusters in a hierarchical manner using an adapted graph-based modularity optimization method. Our approach provides several advantages over the original implementation of the DWM, mainly a significant speed-up, increased precision, and overall increased F1 scores. We demonstrate the efficacy of our approach using experiments on multiple synthetic datasets. Our results also provide evidence of the utility of graph theory-based algorithms despite their sparsity in the literature on unsupervised entity resolution.
△ Less
Submitted 12 December, 2021;
originally announced December 2021.
-
Is Approximation Universally Defensive Against Adversarial Attacks in Deep Neural Networks?
Authors:
Ayesha Siddique,
Khaza Anuarul Hoque
Abstract:
Approximate computing is known for its effectiveness in improvising the energy efficiency of deep neural network (DNN) accelerators at the cost of slight accuracy loss. Very recently, the inexact nature of approximate components, such as approximate multipliers have also been reported successful in defending adversarial attacks on DNNs models. Since the approximation errors traverse through the DN…
▽ More
Approximate computing is known for its effectiveness in improvising the energy efficiency of deep neural network (DNN) accelerators at the cost of slight accuracy loss. Very recently, the inexact nature of approximate components, such as approximate multipliers have also been reported successful in defending adversarial attacks on DNNs models. Since the approximation errors traverse through the DNN layers as masked or unmasked, this raises a key research question-can approximate computing always offer a defense against adversarial attacks in DNNs, i.e., are they universally defensive? Towards this, we present an extensive adversarial robustness analysis of different approximate DNN accelerators (AxDNNs) using the state-of-the-art approximate multipliers. In particular, we evaluate the impact of ten adversarial attacks on different AxDNNs using the MNIST and CIFAR-10 datasets. Our results demonstrate that adversarial attacks on AxDNNs can cause 53% accuracy loss whereas the same attack may lead to almost no accuracy loss (as low as 0.06%) in the accurate DNN. Thus, approximate computing cannot be referred to as a universal defense strategy against adversarial attacks.
△ Less
Submitted 2 December, 2021;
originally announced December 2021.
-
Generalized Zero-shot Intent Detection via Commonsense Knowledge
Authors:
A. B. Siddique,
Fuad Jamour,
Luxun Xu,
Vagelis Hristidis
Abstract:
Identifying user intents from natural language utterances is a crucial step in conversational systems that has been extensively studied as a supervised classification problem. However, in practice, new intents emerge after deploying an intent detection model. Thus, these models should seamlessly adapt and classify utterances with both seen and unseen intents -- unseen intents emerge after deployme…
▽ More
Identifying user intents from natural language utterances is a crucial step in conversational systems that has been extensively studied as a supervised classification problem. However, in practice, new intents emerge after deploying an intent detection model. Thus, these models should seamlessly adapt and classify utterances with both seen and unseen intents -- unseen intents emerge after deployment and they do not have training data. The few existing models that target this setting rely heavily on the scarcely available training data and overfit to seen intents data, resulting in a bias to misclassify utterances with unseen intents into seen ones. We propose RIDE: an intent detection model that leverages commonsense knowledge in an unsupervised fashion to overcome the issue of training data scarcity. RIDE computes robust and generalizable relationship meta-features that capture deep semantic relationships between utterances and intent labels; these features are computed by considering how the concepts in an utterance are linked to those in an intent label via commonsense knowledge. Our extensive experimental analysis on three widely-used intent detection benchmarks shows that relationship meta-features significantly increase the accuracy of detecting both seen and unseen intents and that RIDE outperforms the state-of-the-art model for unseen intents.
△ Less
Submitted 4 February, 2021;
originally announced February 2021.
-
Linguistically-Enriched and Context-Aware Zero-shot Slot Filling
Authors:
A. B. Siddique,
Fuad Jamour,
Vagelis Hristidis
Abstract:
Slot filling is identifying contiguous spans of words in an utterance that correspond to certain parameters (i.e., slots) of a user request/query. Slot filling is one of the most important challenges in modern task-oriented dialog systems. Supervised learning approaches have proven effective at tackling this challenge, but they need a significant amount of labeled training data in a given domain.…
▽ More
Slot filling is identifying contiguous spans of words in an utterance that correspond to certain parameters (i.e., slots) of a user request/query. Slot filling is one of the most important challenges in modern task-oriented dialog systems. Supervised learning approaches have proven effective at tackling this challenge, but they need a significant amount of labeled training data in a given domain. However, new domains (i.e., unseen in training) may emerge after deployment. Thus, it is imperative that these models seamlessly adapt and fill slots from both seen and unseen domains -- unseen domains contain unseen slot types with no training data, and even seen slots in unseen domains are typically presented in different contexts. This setting is commonly referred to as zero-shot slot filling. Little work has focused on this setting, with limited experimental evaluation. Existing models that mainly rely on context-independent embedding-based similarity measures fail to detect slot values in unseen domains or do so only partially. We propose a new zero-shot slot filling neural model, LEONA, which works in three steps. Step one acquires domain-oblivious, context-aware representations of the utterance word by exploiting (a) linguistic features; (b) named entity recognition cues; (c) contextual embeddings from pre-trained language models. Step two fine-tunes these rich representations and produces slot-independent tags for each word. Step three exploits generalizable context-aware utterance-slot similarity features at the word level, uses slot-independent tags, and contextualizes them to produce slot-specific predictions for each word. Our thorough evaluation on four diverse public datasets demonstrates that our approach consistently outperforms the SOTA models by 17.52%, 22.15%, 17.42%, and 17.95% on average for unseen domains on SNIPS, ATIS, MultiWOZ, and SGD datasets, respectively.
△ Less
Submitted 16 January, 2021;
originally announced January 2021.
-
Exploring Fault-Energy Trade-offs in Approximate DNN Hardware Accelerators
Authors:
Ayesha Siddique,
Kanad Basu,
Khaza Anuarul Hoque
Abstract:
Systolic array-based deep neural network (DNN) accelerators have recently gained prominence for their low computational cost. However, their high energy consumption poses a bottleneck to their deployment in energy-constrained devices. To address this problem, approximate computing can be employed at the cost of some tolerable accuracy loss. However, such small accuracy variations may increase the…
▽ More
Systolic array-based deep neural network (DNN) accelerators have recently gained prominence for their low computational cost. However, their high energy consumption poses a bottleneck to their deployment in energy-constrained devices. To address this problem, approximate computing can be employed at the cost of some tolerable accuracy loss. However, such small accuracy variations may increase the sensitivity of DNNs towards undesired subtle disturbances, such as permanent faults. The impact of permanent faults in accurate DNNs has been thoroughly investigated in the literature. Conversely, the impact of permanent faults in approximate DNN accelerators (AxDNNs) is yet under-explored. The impact of such faults may vary with the fault bit positions, activation functions and approximation errors in AxDNN layers. Such dynamacity poses a considerable challenge to exploring the trade-off between their energy efficiency and fault resilience in AxDNNs. Towards this, we present an extensive layer-wise and bit-wise fault resilience and energy analysis of different AxDNNs, using the state-of-the-art Evoapprox8b signed multipliers. In particular, we vary the stuck-at-0, stuck-at-1 fault-bit positions, and activation functions to study their impact using the most widely used MNIST and Fashion-MNIST datasets. Our quantitative analysis shows that the permanent faults exacerbate the accuracy loss in AxDNNs when compared to the accurate DNN accelerators. For instance, a permanent fault in AxDNNs can lead up to 66\% accuracy loss, whereas the same faulty bit can lead to only 9\% accuracy loss in an accurate DNN accelerator. Our results demonstrate that the fault resilience in AxDNNs is orthogonal to the energy efficiency.
△ Less
Submitted 8 January, 2021;
originally announced January 2021.
-
Deep Convolutional Neural Networks Model-based Brain Tumor Detection in Brain MRI Images
Authors:
Md. Abu Bakr Siddique,
Shadman Sakib,
Mohammad Mahmudur Rahman Khan,
Abyaz Kader Tanzeem,
Madiha Chowdhury,
Nowrin Yasmin
Abstract:
Diagnosing Brain Tumor with the aid of Magnetic Resonance Imaging (MRI) has gained enormous prominence over the years, primarily in the field of medical science. Detection and/or partitioning of brain tumors solely with the aid of MR imaging is achieved at the cost of immense time and effort and demands a lot of expertise from engaged personnel. This substantiates the necessity of fabricating an a…
▽ More
Diagnosing Brain Tumor with the aid of Magnetic Resonance Imaging (MRI) has gained enormous prominence over the years, primarily in the field of medical science. Detection and/or partitioning of brain tumors solely with the aid of MR imaging is achieved at the cost of immense time and effort and demands a lot of expertise from engaged personnel. This substantiates the necessity of fabricating an autonomous model brain tumor diagnosis. Our work involves implementing a deep convolutional neural network (DCNN) for diagnosing brain tumors from MR images. The dataset used in this paper consists of 253 brain MR images where 155 images are reported to have tumors. Our model can single out the MR images with tumors with an overall accuracy of 96%. The model outperformed the existing conventional methods for the diagnosis of brain tumor in the test dataset (Precision = 0.93, Sensitivity = 1.00, and F1-score = 0.97). Moreover, the proposed model's average precision-recall score is 0.93, Cohen's Kappa 0.91, and AUC 0.95. Therefore, the proposed model can help clinical experts verify whether the patient has a brain tumor and, consequently, accelerate the treatment procedure.
△ Less
Submitted 3 October, 2020;
originally announced October 2020.
-
Prediction of Temperature and Rainfall in Bangladesh using Long Short Term Memory Recurrent Neural Networks
Authors:
Mohammad Mahmudur Rahman Khan,
Md. Abu Bakr Siddique,
Shadman Sakib,
Anas Aziz,
Ihtyaz Kader Tasawar,
Ziad Hossain
Abstract:
Temperature and rainfall have a significant impact on economic growth as well as the outbreak of seasonal diseases in a region. In spite of that inadequate studies have been carried out for analyzing the weather pattern of Bangladesh implementing the artificial neural network. Therefore, in this study, we are implementing a Long Short-term Memory (LSTM) model to forecast the month-wise temperature…
▽ More
Temperature and rainfall have a significant impact on economic growth as well as the outbreak of seasonal diseases in a region. In spite of that inadequate studies have been carried out for analyzing the weather pattern of Bangladesh implementing the artificial neural network. Therefore, in this study, we are implementing a Long Short-term Memory (LSTM) model to forecast the month-wise temperature and rainfall by analyzing 115 years (1901-2015) of weather data of Bangladesh. The LSTM model has shown a mean error of -0.38oC in case of predicting the month-wise temperature for 2 years and -17.64mm in case of predicting the rainfall. This prediction model can help to understand the weather pattern changes as well as studying seasonal diseases of Bangladesh whose outbreaks are dependent on regional temperature and/or rainfall.
△ Less
Submitted 22 October, 2020;
originally announced October 2020.
-
App-Aware Response Synthesis for User Reviews
Authors:
Umar Farooq,
A. B. Siddique,
Fuad Jamour,
Zhijia Zhao,
Vagelis Hristidis
Abstract:
Responding to user reviews promptly and satisfactorily improves application ratings, which is key to application popularity and success. The proliferation of such reviews makes it virtually impossible for developers to keep up with responding manually. To address this challenge, recent work has shown the possibility of automatic response generation. However, because the training review-response pa…
▽ More
Responding to user reviews promptly and satisfactorily improves application ratings, which is key to application popularity and success. The proliferation of such reviews makes it virtually impossible for developers to keep up with responding manually. To address this challenge, recent work has shown the possibility of automatic response generation. However, because the training review-response pairs are aggregated from many different apps, it remains challenging for such models to generate app-specific responses, which, on the other hand, are often desirable as apps have different features and concerns. Solving the challenge by simply building a model per app (i.e., training with review-response pairs of a single app) may be insufficient because individual apps have limited review-response pairs, and such pairs typically lack the relevant information needed to respond to a new review. To enable app-specific response generation, this work proposes AARSynth: an app-aware response synthesis system. The key idea behind AARSynth is to augment the seq2seq model with information specific to a given app. Given a new user review, it first retrieves the top-K most relevant app reviews and the most relevant snippet from the app description. The retrieved information and the new user review are then fed into a fused machine learning model that integrates the seq2seq model with a machine reading comprehension model. The latter helps digest the retrieved reviews and app description. Finally, the fused model generates a response that is customized to the given app. We evaluated AARSynth using a large corpus of reviews and responses from Google Play. The results show that AARSynth outperforms the state-of-the-art system by 22.2% on BLEU-4 score. Furthermore, our human study shows that AARSynth produces a statistically significant improvement in response quality compared to the state-of-the-art system.
△ Less
Submitted 10 November, 2020; v1 submitted 30 July, 2020;
originally announced July 2020.
-
Performance Evaluation of t-SNE and MDS Dimensionality Reduction Techniques with KNN, ENN and SVM Classifiers
Authors:
Shadman Sakib,
Md. Abu Bakr Siddique,
Md. Abdur Rahman
Abstract:
The central goal of this paper is to establish two commonly available dimensionality reduction (DR) methods i.e. t-distributed Stochastic Neighbor Embedding (t-SNE) and Multidimensional Scaling (MDS) in Matlab and to observe their application in several datasets. These DR techniques are applied to nine different datasets namely CNAE9, Segmentation, Seeds, Pima Indians diabetes, Parkinsons, Movemen…
▽ More
The central goal of this paper is to establish two commonly available dimensionality reduction (DR) methods i.e. t-distributed Stochastic Neighbor Embedding (t-SNE) and Multidimensional Scaling (MDS) in Matlab and to observe their application in several datasets. These DR techniques are applied to nine different datasets namely CNAE9, Segmentation, Seeds, Pima Indians diabetes, Parkinsons, Movement Libras, Mammographic Masses, Knowledge, and Ionosphere acquired from UCI machine learning repository. By applying t-SNE and MDS algorithms, each dataset is transformed to the half of its original dimension by eliminating unnecessary features from the datasets. Subsequently, these datasets with reduced dimensions are fed into three supervised classification algorithms for classification. These classification algorithms are K Nearest Neighbors (KNN), Extended Nearest Neighbors (ENN), and Support Vector Machine (SVM). Again, all these algorithms are implemented in Matlab. The training and test data ratios are maintained as ninety percent: ten percent for each dataset. Upon accuracy observation, the efficiency for every dimensionality technique with availed classification algorithms is analyzed and the performance of each classifier is evaluated.
△ Less
Submitted 20 June, 2020;
originally announced July 2020.