-
Automated Personnel Selection for Software Engineers Using LLM-Based Profile Evaluation
Authors:
Ahmed Akib Jawad Karim,
Shahria Hoque,
Md. Golam Rabiul Alam,
Md. Zia Uddin
Abstract:
Organizational success in today's competitive employment market depends on choosing the right staff. This work evaluates software engineer profiles using an automated staff-selection method based on advanced natural language processing (NLP) techniques. A new dataset was created by collecting LinkedIn profiles with key attributes such as education, experience, skills, and self-introduction. Transformer models, including RoBERTa, DistilBERT, and a customized BERT variant, LastBERT, were fine-tuned using expert feedback. The models were designed to predict whether a candidate's profile met the selection criteria, thereby enabling automated ranking and assessment. RoBERTa performed best, with 85% accuracy and an F1 score of 0.85; DistilBERT provided comparable results at lower computational cost. Although lightweight, LastBERT proved less effective, with 75% accuracy. The reusable models provide a scalable solution for other classification tasks. This work presents a new dataset and technique and shows how transformer models could improve recruiting procedures. Future work will include expanding the dataset, enhancing model interpretability, and deploying the system in real-world environments.
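The abstract reports accuracy and F1 for a binary prediction of whether a profile fits the selection criteria. As a minimal sketch, assuming binary labels (1 = fits) and F1 computed for the positive class (the abstract does not say whether a macro-averaged F1 is used), both metrics can be computed from predictions as follows; the labels are made up for illustration.

```python
from typing import Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (accuracy, precision, recall, F1) for binary labels, 1 = profile fits."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fitting profiles correctly selected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-fitting profiles wrongly selected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fitting profiles missed
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Hypothetical expert labels vs. model predictions for eight profiles.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
preds = [1, 1, 0, 0, 0, 1, 1, 0]
acc, prec, rec, f1 = binary_metrics(labels, preds)
print(f"accuracy={acc:.2f} F1={f1:.2f}")  # accuracy=0.75 F1=0.75
```

When both classes matter equally, a macro-averaged F1 (the mean of the per-class F1 scores) is a common alternative to the positive-class F1 shown here.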
Submitted 3 November, 2024; v1 submitted 30 October, 2024;
originally announced October 2024.
-
AI Can Enhance Creativity in Social Networks
Authors:
Raiyan Abdul Baten,
Ali Sarosh Bangash,
Krish Veera,
Gourab Ghoshal,
Ehsan Hoque
Abstract:
Can peer recommendation engines elevate people's creative performances in self-organizing social networks? Answering this question requires resolving challenges in data collection (e.g., tracing inspiration links and psycho-social attributes of nodes) and intervention design (e.g., balancing idea stimulation and redundancy in evolving information environments). We trained a model that predicts people's ideation performances using semantic and network-structural features on an online platform. Using this model, we built SocialMuse, which maximizes people's predicted performances to generate peer recommendations for them. We found that treatment networks leveraging SocialMuse outperformed AI-agnostic control networks in several creativity measures. The treatment networks were more decentralized than the control networks, as SocialMuse increasingly emphasized network-structural features at larger network sizes. This decentralization spreads people's inspiration sources, helping inspired ideas stand out. Our study provides actionable insights into building intelligent systems for elevating creativity.
Submitted 11 December, 2024; v1 submitted 19 October, 2024;
originally announced October 2024.
-
MTDNS: Moving Target Defense for Resilient DNS Infrastructure
Authors:
Abdullah Aydeger,
Pei Zhou,
Sanzida Hoque,
Marco Carvalho,
Engin Zeydan
Abstract:
One of the most critical components of the Internet that an attacker could exploit is the DNS (Domain Name System) protocol and infrastructure. Researchers have been constantly developing methods to detect and defend against attacks on DNS, specifically DNS flooding attacks. However, most defensive solutions discard packets, which can cause legitimate packets to be dropped, making these solutions highly dependent on their detection strategies. In this paper, we propose MTDNS, a resilient approach that employs Moving Target Defense (MTD) techniques through Software Defined Networking (SDN) switches to redirect traffic to alternate DNS servers that are dynamically created and run under the Network Function Virtualization (NFV) framework. The proposed approach is implemented in a testbed environment comprising our DNS servers running as separate Virtual Network Functions (VNFs), an NFV Manager, SDN switches, and an SDN Controller. The experimental results show that MTDNS achieves a much higher success rate in resolving DNS queries and significantly reduces average latency, even under a DNS flooding attack.
Submitted 3 October, 2024;
originally announced October 2024.
-
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models
Authors:
Shayekh Bin Islam,
Md Asib Rahman,
K S M Tozammel Hossain,
Enamul Hoque,
Shafiq Joty,
Md Rizwan Parvez
Abstract:
Retrieval-Augmented Generation (RAG) has been shown to enhance the factual accuracy of Large Language Models (LLMs), but existing methods often suffer from limited reasoning capabilities in effectively using the retrieved evidence, particularly when using open-source LLMs. To mitigate this gap, we introduce a novel framework, Open-RAG, designed to enhance reasoning capabilities in RAG with open-source LLMs. Our framework transforms an arbitrary dense LLM into a parameter-efficient sparse mixture of experts (MoE) model capable of handling complex reasoning tasks, including both single- and multi-hop queries. Open-RAG uniquely trains the model to navigate challenging distractors that appear relevant but are misleading. As a result, Open-RAG leverages latent learning, dynamically selecting relevant experts and integrating external knowledge effectively for more accurate and contextually relevant responses. In addition, we propose a hybrid adaptive retrieval method to determine retrieval necessity and balance the trade-off between performance gain and inference speed. Experimental results show that the Llama2-7B-based Open-RAG outperforms state-of-the-art LLMs and RAG models such as ChatGPT, Self-RAG, and Command R+ in various knowledge-intensive tasks. We open-source our code and models at https://openragmoe.github.io/
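Open-RAG turns a dense LLM into a sparse mixture-of-experts model in which only some experts are active for a given input. The abstract does not describe the routing details, so the following is a generic top-k gating sketch, not Open-RAG's implementation; the router, the expert functions, and k = 2 are hypothetical toy choices.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def top_k_gate(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalise over just those."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(
    x: Vector,
    router: Callable[[Vector], Sequence[float]],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Vector:
    """Sparse MoE layer: only the k routed experts run; outputs are gate-weighted."""
    out = [0.0] * len(x)
    for idx, weight in top_k_gate(router(x), k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy experts that simply scale their input.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]


def fixed_router(v: Vector) -> List[float]:
    """Hypothetical router that always prefers experts 2 and 3 equally."""
    return [0.0, 0.0, 5.0, 5.0]


print(moe_forward([1.0, 2.0], fixed_router, experts, k=2))  # [3.5, 7.0]
```

Because only k experts execute per input, compute grows with k rather than with the total number of experts, which is why sparse MoE layers can add capacity without a proportional increase in per-token compute.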
Submitted 2 October, 2024;
originally announced October 2024.
-
Natural Language Generation for Visualizations: State of the Art, Challenges and Future Directions
Authors:
Enamul Hoque,
Mohammed Saidul Islam
Abstract:
Natural language and visualization are two complementary modalities of human communication that play a crucial role in conveying information effectively. While visualizations help people discover trends, patterns, and anomalies in data, natural language descriptions help explain these insights. Thus, combining text with visualizations is a prevalent technique for effectively delivering the core message of the data. Given the rise of natural language generation (NLG), there is growing interest in automatically creating natural language descriptions for visualizations, which can serve as chart captions, answer questions about charts, or tell data-driven stories. In this survey, we systematically review the state of the art on NLG for visualizations and introduce a taxonomy of the problem. The NLG tasks fall within the domain of Natural Language Interfaces (NLI) for visualization, an area that has garnered significant attention from both the research community and industry. To narrow the scope of the survey, we primarily concentrate on research works that focus on text generation for visualizations. To characterize the NLG problem and the design space of proposed solutions, we pose five Wh-questions: why and how NLG tasks are performed for visualizations, what the task inputs and outputs are, and where and when the generated texts are integrated with visualizations. We categorize the solutions used in the surveyed papers based on these "five Wh-questions." Finally, we discuss the key challenges and potential avenues for future research in this domain.
Submitted 29 September, 2024;
originally announced September 2024.
-
Experimental observation of ballistic to diffusive transition in AlN thin films
Authors:
Md Shafkat Bin Hoque,
Michael E. Liao,
Saman Zare,
Zeyu Liu,
Yee Rui Koh,
Kenny Huynh,
Jingjing Shi,
Samuel Graham,
Tengfei Luo,
Habib Ahmad,
W. Alan Doolittle,
Mark S. Goorsky,
Patrick E. Hopkins
Abstract:
Bulk AlN possesses high thermal conductivity due to long phonon mean free paths, high group velocities, and long lifetimes. However, thermal transport is very different in a thin AlN film because of phonon-defect and phonon-boundary scattering. Herein, we report the experimental observation of a ballistic-to-diffusive transition in a series of AlN thin films (1.6 to 2440 nm) grown on sapphire substrates. The ballistic transport regime is characterized by a constant thermal resistance as a function of film thickness due to phonon scattering by defects and boundaries. In this regime, phonons possess very small group velocities and lifetimes. The lifetime of the optical phonons increases by more than an order of magnitude in the diffusive regime but remains nearly constant thereafter. Our study is important for understanding the details of nano- and microscale thermal transport in a highly conductive material.
Submitted 22 September, 2024;
originally announced September 2024.
-
FuzzEval: Assessing Fuzzers on Generating Context-Sensitive Inputs
Authors:
S Mahmudul Hasan,
Polina Kozyreva,
Endadul Hoque
Abstract:
Cryptographic protocols form the backbone of modern security systems, yet vulnerabilities persist within their implementations. Traditional testing techniques, including fuzzing, have struggled to effectively identify vulnerabilities in cryptographic libraries due to their reliance on context-sensitive inputs. This paper presents a comprehensive evaluation of eleven state-of-the-art fuzzers' ability to generate context-sensitive inputs for testing a cryptographic standard, PKCS#1-v1.5, across thirteen implementations. Our study reveals nuanced performance differences among the fuzzers in terms of the validity and diversity of the produced inputs. This investigation underscores the limitations of existing fuzzers in handling context-sensitive inputs. These findings are expected to drive further research and development in this area.
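To see why PKCS#1-v1.5 inputs are context-sensitive, consider the EMSA-PKCS1-v1_5 signature encoding from RFC 8017: a valid encoded message must have exactly the modulus length, a fixed two-byte header, a run of 0xFF padding whose length depends on that modulus, a zero separator, and a DER DigestInfo that must match the hash of the signed message. The sketch below (SHA-256 only; our own illustration, not FuzzEval's harness) builds and checks such an encoding; flipping a single bit makes it invalid, which suggests why blind byte-level mutation rarely reaches deep validation logic.

```python
import hashlib

# DER-encoded DigestInfo prefix for SHA-256 (RFC 8017, Section 9.2, Note 1).
SHA256_DIGESTINFO_PREFIX = bytes.fromhex("3031300d060960864801650304020105000420")


def emsa_pkcs1_v15_encode(message: bytes, em_len: int) -> bytes:
    """EM = 0x00 || 0x01 || PS (0xFF bytes) || 0x00 || DigestInfo(SHA-256(message))."""
    t = SHA256_DIGESTINFO_PREFIX + hashlib.sha256(message).digest()
    if em_len < len(t) + 11:  # PS must be at least 8 bytes long
        raise ValueError("intended encoded message length too short")
    ps = b"\xff" * (em_len - len(t) - 3)
    return b"\x00\x01" + ps + b"\x00" + t


def encoding_is_valid(em: bytes, message: bytes) -> bool:
    """Verify by re-encoding and comparing, rather than parsing the padding."""
    try:
        return em == emsa_pkcs1_v15_encode(message, len(em))
    except ValueError:
        return False


em = emsa_pkcs1_v15_encode(b"hello", 256)  # 256 bytes for a 2048-bit RSA modulus
mutated = bytearray(em)
mutated[10] ^= 0x01  # flip one bit inside the 0xFF padding run
print(encoding_is_valid(em, b"hello"), encoding_is_valid(bytes(mutated), b"hello"))  # True False
```

Re-encoding and comparing mirrors how RFC 8017 specifies signature verification, and avoids the lenient-parsing bugs that context-sensitive fuzzing tries to expose.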
Submitted 18 September, 2024;
originally announced September 2024.
-
Uddessho: An Extensive Benchmark Dataset for Multimodal Author Intent Classification in Low-Resource Bangla Language
Authors:
Fatema Tuj Johora Faria,
Mukaffi Bin Moin,
Md. Mahfuzur Rahman,
Md Morshed Alam Shanto,
Asif Iftekher Fahim,
Md. Moinul Hoque
Abstract:
With the increasing popularity of daily information sharing and acquisition on the Internet, this paper introduces an innovative approach for intent classification in the Bangla language, focusing on social media posts where individuals share their thoughts and opinions. The proposed method leverages multimodal data with particular emphasis on authorship identification, aiming to understand the underlying purpose behind textual content, especially in the context of varied user-generated posts on social media. Current methods often struggle with low-resource languages like Bangla, particularly when author traits are intricately linked with intent, as observed in social media posts. To address this, we present the Multimodal-based Author Bangla Intent Classification (MABIC) framework, which uses text and images to gain deeper insight into the conveyed intentions. We have created a dataset named "Uddessho," comprising 3,048 instances sourced from social media. Our methodology comprises two approaches, one for classifying textual intent and one for multimodal author intent, incorporating early fusion and late fusion techniques. In our experiments, the unimodal approach achieved an accuracy of 64.53% in interpreting Bangla textual intent. In contrast, our multimodal approach significantly outperformed traditional unimodal methods, achieving an accuracy of 76.19%, an improvement of 11.66 percentage points. To the best of our knowledge, this is the first research work on multimodal-based author intent classification for low-resource Bangla language social media posts.
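The abstract contrasts early and late fusion of text and image information. As a generic sketch (MABIC's actual fusion layers are not described in the abstract; the feature vectors, class probabilities, and the 0.5 text weight below are hypothetical), early fusion concatenates per-modality features before a single classifier, while late fusion combines the class probabilities produced by separate per-modality classifiers. The last lines make the reported gain explicit: 76.19% minus 64.53% is an absolute gain of 11.66 percentage points, or roughly 18.1% relative to the unimodal baseline.

```python
from typing import List, Sequence


def early_fusion(text_feat: Sequence[float], image_feat: Sequence[float]) -> List[float]:
    """Early fusion: one joint feature vector, to be fed to a single classifier."""
    return list(text_feat) + list(image_feat)


def late_fusion(text_probs: Sequence[float], image_probs: Sequence[float],
                text_weight: float = 0.5) -> List[float]:
    """Late fusion: weighted average of per-modality class-probability vectors."""
    if len(text_probs) != len(image_probs):
        raise ValueError("both classifiers must score the same set of intent classes")
    return [text_weight * t + (1.0 - text_weight) * i for t, i in zip(text_probs, image_probs)]


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda c: scores[c])


joint = early_fusion([0.2, 0.7], [0.9, 0.1, 0.4])  # 5-dim joint feature vector
# Hypothetical 3-class outputs: text alone prefers class 0, the image tips it to class 1.
fused = late_fusion([0.6, 0.3, 0.1], [0.1, 0.7, 0.2])
print(len(joint), argmax(fused))  # 5 1

unimodal_acc, multimodal_acc = 64.53, 76.19  # accuracies reported in the abstract
gain_pp = multimodal_acc - unimodal_acc
print(f"{gain_pp:.2f} pp absolute, {100 * gain_pp / unimodal_acc:.1f}% relative")
```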
Submitted 14 September, 2024;
originally announced September 2024.
-
Mazed and Confused: A Dataset of Cybersickness, Working Memory, Mental Load, Physical Load, and Attention During a Real Walking Task in VR
Authors:
Jyotirmay Nag Setu,
Joshua M Le,
Ripan Kumar Kundu,
Barry Giesbrecht,
Tobias Höllerer,
Khaza Anuarul Hoque,
Kevin Desai,
John Quarles
Abstract:
Virtual Reality (VR) is quickly establishing itself in various industries, including training, education, medicine, and entertainment, in which users are frequently required to carry out multiple complex cognitive and physical activities. However, the relationship between cognitive activities, physical activities, and familiar feelings of cybersickness is not well understood and thus can be unpredictable for developers. Researchers have previously provided labeled datasets for predicting cybersickness while users are stationary, but few labeled datasets capture cybersickness while users are physically walking. Thus, from 39 participants, we collected head orientation, head position, eye tracking, images, physiological readings from external sensors, and self-reported cybersickness severity, physical load, and mental load in VR. Throughout the data collection, participants navigated mazes via real walking and performed tasks challenging their attention and working memory. To demonstrate the dataset's utility, we conducted a case study of training classifiers, in which we achieved 95% accuracy for cybersickness severity classification. The strong performance of these straightforward classifiers makes this dataset well suited for future researchers developing cybersickness detection and reduction models. To better understand the features that helped with classification, we performed SHAP (SHapley Additive exPlanations) analysis, highlighting the importance of eye tracking and physiological measures for cybersickness prediction while walking. This open dataset can allow future researchers to study the connection between cybersickness and cognitive loads and to develop prediction models. It will also empower future VR developers to design efficient and effective virtual environments by improving cognitive load management and minimizing cybersickness.
Submitted 10 September, 2024;
originally announced September 2024.
-
Gate-tunable negative differential resistance in multifunctional van der Waals heterostructure
Authors:
Richa Mitra,
Konstantina Iordanidou,
Naveen Shetty,
Md Anamul Hoque,
Anushree Datta,
Alexei Kalaboukhov,
Julia Wiktor,
Sergey Kubatkin,
Saroj Prasad Dash,
Samuel Lara-Avila
Abstract:
Two-dimensional (2D) semiconductors have emerged as leading candidates for low-power and multifunctional computing applications, thanks to qualities such as layer-dependent band gap tunability, high carrier mobility, and excellent electrostatic control. Here, we explore a pair of 2D semiconductors with broken-gap (Type III) band alignment and demonstrate a highly gate-tunable p-MoTe$_{2}$/n-SnS$_{2}$ heterojunction tunnel field-effect transistor with multifunctional behavior. Employing a dual-gated asymmetric device geometry, we reveal its functionality as both a forward and a backward rectifying device. Consequently, we observe a highly gate-tunable negative differential resistance (NDR), with a gate-coupling efficiency of $\eta \simeq 0.5$ and a peak-to-valley ratio of $\sim 3$ down to 150 K. Using density functional theory calculations of the density of states, we determine that interband tunneling within the valence bands causes the observed NDR characteristics. The combination of band-to-band tunneling and gate controllability of the NDR signal opens the pathway to gate-tunable, 2D material-based neuromorphic and energy-efficient electronics.
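The peak-to-valley ratio (PVR) quoted above is the ratio of the current at the NDR peak to the current at the following valley. A minimal sketch of extracting it from a single I-V sweep, assuming voltages sorted in ascending order and current magnitudes as input (the sample data are synthetic, not measurements from this device):

```python
from typing import Sequence, Tuple


def peak_to_valley_ratio(voltages: Sequence[float],
                         currents: Sequence[float]) -> Tuple[float, float, float]:
    """Return (V_peak, V_valley, PVR) for the first NDR region of an I-V sweep."""
    n = len(currents)
    # Peak: first local maximum, i.e. where the current stops rising.
    peak = next((i for i in range(1, n - 1)
                 if currents[i - 1] < currents[i] >= currents[i + 1]), None)
    if peak is None:
        raise ValueError("no current peak found in sweep")
    # Valley: follow the falling (negative differential resistance) branch to its end.
    valley = peak
    while valley + 1 < n and currents[valley + 1] < currents[valley]:
        valley += 1
    if valley == peak:
        raise ValueError("current does not fall after the peak, so no NDR region")
    return voltages[peak], voltages[valley], currents[peak] / currents[valley]


# Synthetic sweep: current rises, falls (NDR), then rises again.
v = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
i_abs = [0.0, 1.0, 3.0, 2.0, 1.0, 1.5, 4.0, 6.0]
print(peak_to_valley_ratio(v, i_abs))  # (0.2, 0.4, 3.0)
```

Measured sweeps are noisy, so in practice the curve is usually smoothed before locating the peak and valley; otherwise a small noise bump would be mistaken for the NDR peak.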
Submitted 7 September, 2024;
originally announced September 2024.
-
Exploration of new 212 MAB phases: M2AB2 (M=Mo, Ta; A=Ga, Ge) via DFT calculations
Authors:
A. K. M Naim Ishtiaq,
Md Nasir Uddin,
Md. Rasel Rana,
Shariful Islam,
Noor Afsary,
Karimul Hoque,
Md. Ashraf Ali
Abstract:
The recently developed MAB phases, an extension of the MAX phases, have attracted research interest because of their better thermo-mechanical properties. In this paper, we explore four new MAB phases, M2AB2 (M = Mo, Ta; A = Ga, Ge), and study their elastic, electronic, thermal, and optical properties to predict possible applications. The stability of the new phases has been confirmed by calculating the formation energy (Ef), formation enthalpy (H), phonon dispersion curves (PDC), and elastic constants (Cij). The study reveals that M2AB2 (M = Mo, Ta; A = Ga, Ge) exhibit significantly higher elastic constants, elastic moduli, and Vickers hardness values than their 211 boride counterparts. The higher Vickers hardness of Ta2AB2 (A = Ga, Ge) compared with Mo2AB2 (A = Ga, Ge) is explained based on the bond overlap population. Analysis of the density of states and electronic band structure reveals the metallic nature of the borides under examination. The thermodynamic properties of M2AB2 (M = Mo, Ta; A = Ga, Ge) over the temperature range 0 to 1000 K are investigated using the quasi-harmonic Debye model. Critical thermal properties such as the melting temperature (Tm), Grüneisen parameter, minimum thermal conductivity (Kmin), Debye temperature, and others are also computed. Compared with the 211 MAX phases, the 212 phases exhibit higher Debye temperatures and Tm, along with a lower Kmin. These findings suggest that the studied compounds exhibit superior thermal properties suitable for practical applications. The optical characteristics have also been examined, and the reflectance spectrum indicates that the materials have the potential to mitigate solar heating across various energy regions.
Submitted 30 August, 2024;
originally announced August 2024.
-
A Persistent Hierarchical Bloom Filter-based Framework for Authentication and Tracking of ICs
Authors:
Fairuz Shadmani Shishir,
Md Mashfiq Rizvee,
Tanvir Hossain,
Tamzidul Hoque,
Domenic Forte,
Sumaiya Shomaji
Abstract:
Detecting counterfeit integrated circuits (ICs) in unreliable supply chains demands robust tracking and authentication. Physical Unclonable Functions (PUFs) offer unique IC identifiers, but noise undermines their utility. This study introduces the Persistent Hierarchical Bloom Filter (PHBF) framework, ensuring swift and accurate IC authentication with an accuracy rate of 100% across the supply chain even with noisy PUF-generated signatures.
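PHBF builds on Bloom filters, which answer set-membership queries with no false negatives and a tunable false-positive rate. The hierarchical and persistent structure, and the handling of noisy PUF responses, are not described in the abstract, so the following is only a minimal flat Bloom filter sketch, assuming exact (noise-free) signatures; the chip identifiers are hypothetical.

```python
import hashlib
from typing import List


class BloomFilter:
    """Flat Bloom filter: membership tests never miss an added item, but may rarely
    report an item that was never added (a false positive)."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: bytes) -> List[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step keeps positions distinct when m is a power of two
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


# Enroll hypothetical PUF-derived signatures at manufacture time...
enrolled = BloomFilter(m_bits=1024, k_hashes=7)
for signature in (b"chip-0001", b"chip-0002", b"chip-0003"):
    enrolled.add(signature)
# ...then authenticate a chip later in the supply chain.
print(b"chip-0002" in enrolled)  # True
```

With m bits, k hash functions, and n enrolled items, the false-positive rate is approximately $(1 - e^{-kn/m})^k$, which guides the choice of m and k for a given supply-chain volume.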
Submitted 22 September, 2024; v1 submitted 29 August, 2024;
originally announced August 2024.
-
Post-Quantum Secure UE-to-UE Communications
Authors:
Sanzida Hoque,
Abdullah Aydeger,
Engin Zeydan
Abstract:
The rapid development of quantum computing poses a significant threat to the security of current cryptographic systems, including those used in User Equipment (UE) for mobile communications. Conventional cryptographic algorithms such as Rivest-Shamir-Adleman (RSA) and Elliptic Curve Cryptography (ECC) are vulnerable to quantum computing attacks, which could jeopardize the confidentiality, integrity, and availability of sensitive data transmitted by UEs. This demo paper proposes integrating Post-Quantum Cryptography (PQC) into TLS for UE communication to mitigate the risks of quantum attacks. We present our setup and explain each of its components. We also provide the entire workflow of the demo so that other researchers can replicate the setup. By addressing the implementation of PQC within a 5G network to secure UE-to-UE communication, this research aims to pave the way for quantum-resistant mobile devices and to secure the future of wireless communications.
Submitted 20 August, 2024;
originally announced August 2024.
-
SHARP-Net: A Refined Pyramid Network for Deficiency Segmentation in Culverts and Sewer Pipes
Authors:
Rasha Alshawi,
Md Meftahul Ferdaus,
Md Tamjidul Hoque,
Kendall Niles,
Ken Pathak,
Steve Sloan,
Mahdi Abdelguerfi
Abstract:
This paper introduces the Semantic Haar-Adaptive Refined Pyramid Network (SHARP-Net), a novel architecture for semantic segmentation. SHARP-Net integrates a bottom-up pathway featuring Inception-like blocks with varying filter sizes ($3\times3$ and $5\times5$), parallel max-pooling, and additional spatial detection layers. This design captures multi-scale features and fine structural details. Depth-wise separable convolutions are used throughout the network to reduce complexity. The top-down pathway of SHARP-Net generates high-resolution features through upsampling and information fusion using $1\times1$ and $3\times3$ depth-wise separable convolutions. We evaluated our model on our newly developed, challenging Culvert-Sewer Defects dataset and on the benchmark DeepGlobe Land Cover dataset. Our experimental evaluation demonstrated the effectiveness of the base model (excluding Haar-like features) in handling irregular defect shapes, occlusions, and class imbalances. It outperformed state-of-the-art methods, including U-Net, CBAM U-Net, ASCU-Net, FPN, and SegFormer, achieving average improvements of 14.4% and 12.1% on the Culvert-Sewer Defects and DeepGlobe Land Cover datasets, respectively, with IoU scores of 77.2% and 70.6%. The training time was also reduced. Furthermore, integrating carefully selected and fine-tuned Haar-like features improved the performance of deep learning models by at least 20%. The proposed SHARP-Net with Haar-like features achieved an IoU of 94.75%, a 22.74% improvement over the base model. These features were also applied to other deep learning models, yielding a 35.0% improvement and demonstrating their versatility and effectiveness. SHARP-Net thus provides a powerful and efficient solution for accurate semantic segmentation in challenging real-world scenarios.
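SHARP-Net uses depth-wise separable convolutions to reduce complexity. The saving is easy to quantify: a standard k x k convolution from C_in to C_out channels has k·k·C_in·C_out weights, whereas the separable version (one k x k depth-wise filter per input channel followed by a 1 x 1 point-wise convolution) has k·k·C_in + C_in·C_out weights. The sketch below ignores biases, and the 256-channel width is a hypothetical example, since the abstract does not give SHARP-Net's layer widths.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise k x k conv followed by a point-wise 1 x 1 conv (no biases)."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 conv that mixes channels
    return depthwise + pointwise


for k in (3, 5):  # the filter sizes named for SHARP-Net's Inception-like blocks
    std = standard_conv_params(k, 256, 256)
    sep = depthwise_separable_params(k, 256, 256)
    print(f"{k}x{k}: standard={std:,} separable={sep:,} ratio={sep / std:.3f}")
```

The ratio simplifies to 1/C_out + 1/k², so the saving grows with both the filter size and the output width.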
Submitted 2 August, 2024;
originally announced August 2024.
-
DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts
Authors:
Mohammed Saidul Islam,
Md Tahmid Rahman Laskar,
Md Rizwan Parvez,
Enamul Hoque,
Shafiq Joty
Abstract:
Data-driven storytelling is a powerful method for conveying insights by combining narrative techniques with visualizations and text. These stories integrate visual aids, such as highlighted bars and lines in charts, along with textual annotations explaining insights. However, creating such stories requires a deep understanding of the data and meticulous narrative planning, often necessitating human intervention, which can be time-consuming and mentally taxing. While Large Language Models (LLMs) excel in various NLP tasks, their ability to generate coherent and comprehensive data stories remains underexplored. In this work, we introduce a novel task for data story generation and a benchmark containing 1,449 stories from diverse sources. To address the challenges of crafting coherent data stories, we propose a multiagent framework employing two LLM agents designed to replicate the human storytelling process: one for understanding and describing the data (Reflection), generating the outline, and narration, and another for verification at each intermediary step. While our agentic framework generally outperforms non-agentic counterparts in both model-based and human evaluations, the results also reveal unique challenges in data story generation.
Submitted 3 October, 2024; v1 submitted 9 August, 2024;
originally announced August 2024.
-
On supertranslation invariant Lorentz charges
Authors:
Sumanta Chakraborty,
Sk Jahanur Hoque,
Amitabh Virmani
Abstract:
In recent papers, Fuentealba, Henneaux, and Troessaert (FHT) gave definitions of supertranslation invariant Lorentz charges in the ADM Hamiltonian formalism and showed that their definitions match the Chen, Wang, Yau (CWY) definitions of Lorentz charges at null infinity, which are free from "supertranslation ambiguities". In this brief note, motivated by the analysis of FHT, we write expressions for the supertranslation invariant Lorentz charges in Beig-Schmidt variables at spacelike and timelike infinity. Building upon the work of Compère, Gralla, and Wei (CGW), we present calculations showing that our expressions for supertranslation invariant Lorentz charges match the CWY definitions at null infinity.
Submitted 11 March, 2025; v1 submitted 26 July, 2024;
originally announced July 2024.
-
MHD activity induced coherent mode excitation in the edge plasma region of ADITYA-U Tokamak
Authors:
Kaushlender Singh,
Suman Dolui,
Bharat Hegde,
Lavkesh Lachhvani,
Sharvil Patel,
Injamul Hoque,
Ashok K. Kumawat,
Ankit Kumar,
Tanmay Macwan,
Harshita Raj,
Soumitra Banerjee,
Komal Yadav,
Abha Kanik,
Pramila Gautam,
Rohit Kumar,
Suman Aich,
Laxmikanta Pradhan,
Ankit Patel,
Kalpesh Galodiya,
Daniel Raju,
S. K. Jha,
K. A. Jadeja,
K. M. Patel,
S. N. Pandya,
M. B. Chaudhary
, et al. (6 additional authors not shown)
Abstract:
In this paper, we report the excitation of coherent density and potential fluctuations induced by magnetohydrodynamic (MHD) activity in the edge plasma region of the ADITYA-U tokamak. When the amplitude of the MHD mode, mainly the m/n = 2/1 mode, increases beyond a threshold value of 0.3-0.4%, coherent oscillations in the density and potential fluctuations are observed at the same frequency as the MHD mode. The mode numbers of these MHD-induced density and potential fluctuations are obtained using Langmuir probes placed at different radial, poloidal, and toroidal locations in the edge plasma region. Detailed analyses of these Langmuir probe measurements reveal that the coherent mode in the edge potential fluctuations has an m/n = 2/1 structure, whereas the edge density fluctuations have an m/n = 1/1 structure. It is further observed that, beyond the threshold, the coupled power fraction scales almost linearly with the magnitude of the magnetic fluctuations. Furthermore, the rise rates of the coupled power fraction for the coherent modes in density and potential fluctuations are also found to depend on the growth rate of the magnetic fluctuations. The disparate mode structures of the excited modes in density and plasma potential fluctuations suggest that they most likely arise from excitation of the global high-frequency branch of zonal flows, which occurs through the coupling of even harmonics of the potential to odd harmonics of the pressure due to the 1/R dependence of the toroidal magnetic field.
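Mode numbers like these are typically inferred by comparing fluctuation phases at probes placed at different angular positions. For a fluctuation varying as cos(ωt + mθ), two probes separated poloidally by Δθ see a phase difference Δφ = mΔθ at the mode frequency, so m ≈ Δφ/Δθ (unambiguous only while |m|Δθ < π); the toroidal number n follows in the same way from a toroidal separation. The sketch below is a generic two-point estimate under those assumptions, using synthetic signals; it is not the analysis pipeline used on ADITYA-U, and the sign of m depends on the chosen propagation convention.

```python
import cmath
import math
from typing import Sequence


def phase_at(signal: Sequence[float], fs: float, f0: float) -> float:
    """Phase (rad) of the signal's Fourier component at f0, via a single-bin DFT."""
    coeff = sum(x * cmath.exp(-2j * math.pi * f0 * t / fs) for t, x in enumerate(signal))
    return cmath.phase(coeff)


def mode_number(sig_a: Sequence[float], sig_b: Sequence[float],
                fs: float, f0: float, delta_angle: float) -> float:
    """Estimate a mode number from the cross-phase of two probes delta_angle rad apart."""
    dphi = phase_at(sig_b, fs, f0) - phase_at(sig_a, fs, f0)
    dphi = (dphi + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
    return dphi / delta_angle


# Synthetic m = 2 mode at frequency 50 (arbitrary units), probes 45 degrees apart poloidally.
fs, f0, m_true, dtheta = 1000.0, 50.0, 2, math.pi / 4
samples = range(1000)  # an integer number of mode periods, so f0 falls on a DFT bin
probe_a = [math.cos(2 * math.pi * f0 * t / fs) for t in samples]
probe_b = [math.cos(2 * math.pi * f0 * t / fs + m_true * dtheta) for t in samples]
print(round(mode_number(probe_a, probe_b, fs, f0, dtheta), 3))  # 2.0
```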
Submitted 23 July, 2024;
originally announced July 2024.
-
ReactAIvate: A Deep Learning Approach to Predicting Reaction Mechanisms and Unmasking Reactivity Hotspots
Authors:
Ajnabiul Hoque,
Manajit Das,
Mayank Baranwal,
Raghavan B. Sunoj
Abstract:
A chemical reaction mechanism (CRM) is a sequence of molecular-level events involving bond-breaking/forming processes, generating transient intermediates along the reaction pathway as reactants transform into products. Understanding such mechanisms is crucial for designing and discovering new reactions. One of the currently available methods to probe CRMs is quantum mechanical (QM) computation. The resource-intensive nature of QM methods and the scarcity of mechanism-based datasets motivated us to develop reliable ML models for predicting mechanisms. In this study, we created a comprehensive dataset with seven distinct classes, each representing uniquely characterized elementary steps. Subsequently, we developed an interpretable attention-based GNN that achieved near-unity accuracy for reaction step classification and 96% accuracy for predicting the reactive atoms in each such step, capturing interactions between the broader reaction context and local active regions. The near-perfect classification enables accurate prediction of both individual events and the entire CRM, mitigating a potential drawback of Seq2Seq approaches, where a wrongly predicted character leads to incoherent CRM identification. In addition to being interpretable, our model adeptly identifies key atom(s) even for out-of-distribution classes. This generalizability allows new reaction types to be included in a modular fashion, and the model will thus be of value to experts for understanding the reactivity of new molecules.
Submitted 14 July, 2024;
originally announced July 2024.
-
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
Authors:
Ahmed Masry,
Megh Thakkar,
Aayush Bajaj,
Aaryaman Kartha,
Enamul Hoque,
Shafiq Joty
Abstract:
Given the ubiquity of charts as a data analysis, visualization, and decision-making tool across industries and sciences, there has been growing interest in developing pre-trained foundation models as well as general-purpose instruction-tuned models for chart understanding and reasoning. However, existing methods suffer from crucial drawbacks along two critical axes that affect the performance of chart representation models: they are trained on data generated from the charts' underlying data tables, ignoring the visual trends and patterns in chart images, and they use weakly aligned vision-language backbone models for domain-specific training, limiting their generalizability when encountering charts in the wild. We address these drawbacks and introduce ChartGemma, a novel chart understanding and reasoning model developed over PaliGemma. Rather than relying on underlying data tables, ChartGemma is trained on instruction-tuning data generated directly from chart images, thus capturing both high-level trends and low-level visual information from a diverse set of charts. Our simple approach achieves state-of-the-art results across 5 benchmarks spanning chart summarization, question answering, and fact-checking, and our elaborate qualitative studies on real-world charts show that ChartGemma generates more realistic and factually correct summaries than its contemporaries. We release the code, model checkpoints, dataset, and demos at https://github.com/vis-nlp/ChartGemma.
Submitted 3 November, 2024; v1 submitted 4 July, 2024;
originally announced July 2024.
-
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Authors:
Md Tahmid Rahman Laskar,
Sawsan Alqahtani,
M Saiful Bari,
Mizanur Rahman,
Mohammad Abdullah Matin Khan,
Haidar Khan,
Israt Jahan,
Amran Bhuiyan,
Chee Wei Tan,
Md Rizwan Parvez,
Enamul Hoque,
Shafiq Joty,
Jimmy Huang
Abstract:
Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them in real-world applications to ensure they produce reliable performance. Despite the well-established importance of evaluating LLMs in the community, the complexity of the evaluation process has led to varied evaluation setups, causing inconsistencies in findings and interpretations. To address this, we systematically review the primary challenges and limitations causing these inconsistencies and unreliable evaluations in various steps of LLM evaluation. Based on our critical review, we present our perspectives and recommendations to ensure LLM evaluations are reproducible, reliable, and robust.
Submitted 3 October, 2024; v1 submitted 4 July, 2024;
originally announced July 2024.
-
Exploring the Role of Randomization on Belief Rigidity in Online Social Networks
Authors:
Adiba Mahbub Proma,
Neeley Pate,
Raiyan Abdul Baten,
Sifeng Chen,
James Druckman,
Gourab Ghoshal,
Ehsan Hoque
Abstract:
People often stick to their existing beliefs, ignoring contradicting evidence or only interacting with those who reinforce their views. Social media platforms often facilitate such tendencies toward homophily and echo chambers as they promote highly personalized content to maximize user engagement. However, increased belief rigidity can negatively affect real-world policy decisions, for example by leading to climate change inaction and increased vaccine hesitancy. To understand and effectively tackle belief rigidity on online social networks, designing and evaluating various intervention strategies is crucial, and increasing randomization in the network can be considered one such intervention. In this paper, we empirically quantify the effects of a randomized social network structure on belief rigidity, specifically examining the potential benefits of introducing randomness into the network. By passively sensing belief rigidity through our experimental framework, we show that individuals' beliefs are positively influenced by peer opinions, regardless of whether those opinions are similar to or different from their own. Moreover, people incorporate a slightly higher variety of peers (based on their opinions) into their networks when the recommendation algorithm provides them with diverse content than when it provides them with similar content. Our results indicate that in some cases there might be benefits to randomization, providing empirical evidence that a more randomized network could be a feasible way of helping people get out of their echo chambers. Our findings have broader implications for computing and the platform design of social media, and can help combat overly rigid beliefs in online social networks.
Submitted 1 July, 2024;
originally announced July 2024.
-
Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis
Authors:
Md Saiful Islam,
Tariq Adnan,
Jan Freyberg,
Sangwu Lee,
Abdelrahman Abdelkader,
Meghan Pawlik,
Cathe Schwartz,
Karen Jaffe,
Ruth B. Schneider,
E Ray Dorsey,
Ehsan Hoque
Abstract:
Limited accessibility to neurological care leads to underdiagnosed Parkinson's Disease (PD), preventing early intervention. Existing AI-based PD detection methods primarily focus on unimodal analysis of motor or speech tasks, overlooking the multifaceted nature of the disease. To address this, we introduce a large-scale, multi-task video dataset consisting of 1102 sessions (each containing videos of finger tapping, facial expression, and speech tasks captured via webcam) from 845 participants (272 with PD). We propose a novel Uncertainty-calibrated Fusion Network (UFNet) that leverages this multimodal data to enhance diagnostic accuracy. UFNet employs independent task-specific networks, trained with Monte Carlo Dropout for uncertainty quantification, followed by self-attended fusion of features, with attention weights dynamically adjusted based on task-specific uncertainties. To ensure patient-centered evaluation, the participants were randomly split into three sets: 60% for training, 20% for model selection, and 20% for final performance evaluation. UFNet significantly outperformed single-task models in terms of accuracy, area under the ROC curve (AUROC), and sensitivity while maintaining non-inferior specificity. Withholding uncertain predictions further boosted performance, achieving 88.0±0.3% accuracy, 93.0±0.2% AUROC, 79.3±0.9% sensitivity, and 92.6±0.3% specificity, at the expense of not predicting for 2.3±0.3% of the data (± denotes the 95% confidence interval). Further analysis suggests that the trained model does not exhibit any detectable bias across sex and ethnic subgroups and is most effective for individuals aged between 50 and 80. Requiring only a webcam and microphone, our approach facilitates accessible home-based PD screening, especially in regions with limited healthcare resources.
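The abstract describes withholding uncertain predictions only at a high level. The following is a minimal sketch of uncertainty-based abstention, assuming that uncertainty is the standard deviation of P(PD) across several dropout-enabled (Monte Carlo Dropout) forward passes for one subject. The function names, the `max_std` cut-off of 0.1, and the 0.5 decision threshold are hypothetical, and UFNet's uncertainty-weighted attention fusion is not reproduced here.

```python
import statistics
from typing import Optional, Sequence, Tuple


def mc_dropout_decision(
    prob_samples: Sequence[float],
    max_std: float = 0.1,     # hypothetical abstention cut-off
    threshold: float = 0.5,   # decision threshold on the mean P(PD)
) -> Tuple[Optional[int], float, float]:
    """Turn T dropout-enabled forward passes for one subject into a decision.

    Returns (label, mean_prob, std_prob). The label is None when the spread
    across passes exceeds max_std, i.e. the prediction is withheld.
    """
    mean_p = statistics.fmean(prob_samples)
    std_p = statistics.pstdev(prob_samples)
    if std_p > max_std:
        return None, mean_p, std_p
    return int(mean_p >= threshold), mean_p, std_p


def coverage(decisions: Sequence[Optional[int]]) -> float:
    """Fraction of subjects that received a prediction (were not withheld)."""
    return sum(d is not None for d in decisions) / len(decisions)
```

Metrics such as accuracy or AUROC would then be computed only over the non-withheld subjects. In this sketch, `1 - coverage(...)` corresponds to the "fraction of data not predicted" that the abstract reports.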
Submitted 26 April, 2025; v1 submitted 21 June, 2024;
originally announced June 2024.
-
Classification of Non-native Handwritten Characters Using Convolutional Neural Network
Authors:
F. A. Mamun,
S. A. H. Chowdhury,
J. E. Giti,
H. Sarker
Abstract:
The use of convolutional neural networks (CNNs) has accelerated the progress of handwritten character classification/recognition. Handwritten character recognition (HCR) has found applications in various domains, such as traffic signal detection, language translation, and document information extraction. However, existing HCR technology has yet to see widespread use, as it does not provide reliable, highly accurate character recognition. One reason for unreliable HCR is that existing methods do not take the handwriting styles of non-native writers into account. Hence, further improvement is needed to ensure the reliability and extensive deployment of character recognition technologies for critical tasks. In this work, we classify English characters written by non-native users with a custom-tailored CNN model. We train this CNN on a new dataset called the handwritten isolated English character (HIEC) dataset, which consists of 16,496 images collected from 260 persons. This paper also includes an ablation study of our CNN, adjusting hyperparameters to identify the best model for the HIEC dataset. The proposed model, with five convolutional layers and one hidden layer, outperforms state-of-the-art models in character recognition accuracy, achieving an accuracy of 97.04%. Compared with the second-best model, our model's relative improvement in classification accuracy is 4.38%.
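The abstract does not define "relative improvement." If it means the accuracy gain divided by the second-best model's accuracy (an assumption), the two reported figures imply a second-best accuracy of roughly 92.97%:

$$
\Delta_{\text{rel}} = \frac{A_{\text{ours}} - A_{\text{2nd}}}{A_{\text{2nd}}}
\quad\Longrightarrow\quad
A_{\text{2nd}} = \frac{A_{\text{ours}}}{1 + \Delta_{\text{rel}}} = \frac{97.04\%}{1.0438} \approx 92.97\%
$$

Under a different definition, for example an absolute difference in percentage points, the implied baseline would instead be 97.04% - 4.38% = 92.66%.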
Submitted 25 September, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Hi5: 2D Hand Pose Estimation with Zero Human Annotation
Authors:
Masum Hasan,
Cengiz Ozel,
Nina Long,
Alexander Martin,
Samuel Potter,
Tariq Adnan,
Sangwu Lee,
Amir Zadeh,
Ehsan Hoque
Abstract:
We propose a new large synthetic hand pose estimation dataset, Hi5, and a novel, inexpensive method for collecting high-quality synthetic data that requires no human annotation or validation. Leveraging recent advancements in computer graphics, high-fidelity 3D hand models with diverse genders and skin colors, and dynamic environments and camera movements, our data synthesis pipeline allows precise control over data diversity and representation, ensuring robust and fair model training. Using a single consumer PC, we generate a dataset of 583,000 images with accurate pose annotations that closely represents real-world variability. Pose estimation models trained with Hi5 perform competitively on real-hand benchmarks while surpassing models trained with real data when tested on occlusions and perturbations. Our experiments show promising results for synthetic data as a viable solution for data representation problems in real datasets. Overall, this paper provides a promising new approach to synthetic data creation and annotation that can reduce costs and increase the diversity and quality of data for hand pose estimation.
Submitted 5 June, 2024;
originally announced June 2024.
-
Are Large Vision Language Models up to the Challenge of Chart Comprehension and Reasoning? An Extensive Investigation into the Capabilities and Limitations of LVLMs
Authors:
Mohammed Saidul Islam,
Raian Rahman,
Ahmed Masry,
Md Tahmid Rahman Laskar,
Mir Tafseer Nayeem,
Enamul Hoque
Abstract:
Natural language is a powerful complementary modality of communication for data visualizations, such as bar and line charts. To facilitate chart-based reasoning using natural language, various downstream tasks have been introduced recently such as chart question answering, chart summarization, and fact-checking with charts. These tasks pose a unique challenge, demanding both vision-language reasoning and a nuanced understanding of chart data tables, visual encodings, and natural language prompts. Despite the recent success of Large Language Models (LLMs) across diverse NLP tasks, their abilities and limitations in the realm of data visualization remain under-explored, possibly due to their lack of multi-modal capabilities. To bridge the gap, this paper presents the first comprehensive evaluation of the recently developed large vision language models (LVLMs) for chart understanding and reasoning tasks. Our evaluation includes a comprehensive assessment of LVLMs, including GPT-4V and Gemini, across four major chart reasoning tasks. Furthermore, we perform a qualitative evaluation of LVLMs' performance on a diverse range of charts, aiming to provide a thorough analysis of their strengths and weaknesses. Our findings reveal that LVLMs demonstrate impressive abilities in generating fluent texts covering high-level data insights while also encountering common problems like hallucinations, factual errors, and data bias. We highlight the key strengths and limitations of chart comprehension tasks, offering insights for future research.
Submitted 3 October, 2024; v1 submitted 31 May, 2024;
originally announced June 2024.
-
Influence of mid-infrared Galactic bubble on surroundings: A case study on IRAS 16489-4431
Authors:
Ariful Hoque,
Tapas Baug,
Lokesh Dewangan,
Ke Wang,
Tie Liu,
Soumen Mondal
Abstract:
We studied the influence of a massive star on a mid-infrared bubble and its surrounding gas in the IRAS 16489-4431 star-forming region using multi-wavelength data. The Spitzer mid-infrared band images revealed the shocked nature of the bubble. Analyses showed that the bubble was created by a massive star through its strong radiation pressure. Evidence of material collected along the edge of the bubble was found in the cold gas tracer line observed with the Atacama Large Millimeter/submillimeter Array (ALMA). The presence of dense dust cores with bipolar outflows and young stellar objects toward the collected material suggests active star formation, possibly influenced by the expansion of the radiation-driven bubble.
Submitted 28 May, 2024;
originally announced May 2024.
-
A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings
Authors:
Tariq Adnan,
Abdelrahman Abdelkader,
Zipei Liu,
Ekram Hossain,
Sooyong Park,
MD Saiful Islam,
Ehsan Hoque
Abstract:
We present a framework to recognize Parkinson's disease (PD) from speech recordings of an English pangram, collected with a web application across diverse recording settings and environments, including participants' homes. Our dataset includes a global cohort of 1306 participants, including 392 diagnosed with PD. Leveraging the diversity of the dataset, which spans various demographic properties (such as age, sex, and ethnicity), we used deep learning embeddings derived from semi-supervised models such as Wav2Vec 2.0, WavLM, and ImageBind to represent the speech dynamics associated with PD. Our novel fusion model for PD classification, which aligns different speech embeddings into a cohesive feature space, demonstrated superior performance over standard concatenation-based fusion models and other baselines (including models built on traditional acoustic features). In a randomized data split configuration, the model achieved an Area Under the Receiver Operating Characteristic Curve (AUROC) of 88.94% and an accuracy of 85.65%. Rigorous statistical analysis confirmed that our model performs equitably across various demographic subgroups in terms of sex, ethnicity, and age, and remains robust regardless of disease duration. Furthermore, when tested on two entirely unseen test datasets collected from clinical settings and from a PD care center, our model maintained AUROC scores of 82.12% and 78.44%, respectively. This affirms the model's robustness and its potential to enhance accessibility and health equity in real-world applications.
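AUROC is the headline metric in this abstract and in the UFNet abstract above. As a minimal sketch of what the number measures, it can be computed from binary labels and classifier scores using the pairwise (Mann-Whitney) interpretation: the probability that a randomly chosen PD participant receives a higher score than a randomly chosen non-PD participant, with ties counted as one half. This is an O(P·N) illustration for clarity, not the authors' evaluation code.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, because three of the four positive/negative pairs are ordered correctly. An AUROC of 88.94% therefore means that about 89% of such PD/non-PD pairs are ranked correctly.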
Submitted 18 November, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
Wealth inequality and utility: Effect evaluation of redistribution and consumption morals using the macro-econophysical coupled approach
Authors:
Takeshi Kato,
Yosuke Tanabe,
Mohammad Rezoanul Hoque
Abstract:
Reducing wealth inequality and increasing utility are critical issues. This study reveals the effects of redistribution and consumption morals on wealth inequality and utility. To this end, we present a novel approach that couples the dynamic model of capital, consumption, and utility in macroeconomics with the interaction model of joint business and redistribution in econophysics. With this approach, we calculate the capital (wealth), the utility based on consumption, and the Gini indices of these quantities, using redistribution and consumption thresholds as moral parameters. The results show that under-redistribution and waste exacerbate inequality; conversely, over-redistribution and stinginess reduce utility; and a balanced, moderate moral achieves both reduced inequality and increased utility. These findings provide renewed economic and numerical support for the moral importance known from philosophy, anthropology, and religion. The revival of redistribution and consumption morals should promote the transformation to a human mutual-aid economy, as indicated by a philosopher and an anthropologist, instead of the capitalist economy that has produced the current inequality. The practical challenge is to implement bottom-up social business, on a foothold of worker coops and platform cooperatives as a community against the state and the market, with moral consensus and its operation.
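The study measures inequality of capital and of consumption-based utility with the Gini index. As a reference for that measure only, here is a minimal sketch of the standard Gini computation over a population of agents' values. It uses the sorted-rank form, which is equivalent to the mean-absolute-difference definition. The paper's coupled capital/consumption/redistribution dynamics are not modeled here.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini index of non-negative values: 0 is perfect equality, (n-1)/n is maximal.

    Sorted-rank form: G = 2 * sum_i(i * x_(i)) / (n * sum(x)) - (n + 1) / n,
    where x_(1) <= ... <= x_(n) and ranks i run from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # define an empty or all-zero population as perfectly equal
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1) / n
```

In a simulation of this kind, `gini` would be applied to the vector of agents' capital (or utility) at each time step to track how redistribution and consumption thresholds shift inequality. For four agents, `gini([0, 0, 0, 10])` returns 0.75, which is the maximum (n-1)/n for n = 4.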
Submitted 17 April, 2025; v1 submitted 22 May, 2024;
originally announced May 2024.
-
de Sitter Teukolsky waves
Authors:
Harsh,
Sk Jahanur Hoque,
Sitender Pratap Kashyap,
Amitabh Virmani
Abstract:
We present de Sitter Teukolsky waves, linearised quadrupolar gravitational waves in the transverse-traceless gauge in de Sitter spacetime. In the limit where the cosmological constant $Λ$ goes to zero, our solutions reduce to the Teukolsky solutions. For non-zero $Λ$, we compare our solutions with the wider literature, where different authors have constructed linearised gravitational perturbations in de Sitter spacetime with varied motivations. For de Sitter Teukolsky waves, we compute the energy flux across future timelike infinity $\mathcal{I}^{+}$ and show that it is manifestly positive.
Submitted 17 May, 2024;
originally announced May 2024.
-
Sentiment Polarity Analysis of Bangla Food Reviews Using Machine and Deep Learning Algorithms
Authors:
Al Amin,
Anik Sarkar,
Md Mahamodul Islam,
Asif Ahammad Miazee,
Md Robiul Islam,
Md Mahmudul Hoque
Abstract:
The Internet has become an essential tool for people in the modern world. Humans, like all living organisms, have essential requirements for survival. These include access to atmospheric oxygen, potable water, protective shelter, and sustenance. The constant flux of the world is making our existence less complicated. A significant portion of the population utilizes online food ordering services to have meals delivered to their residences. Although there are numerous methods for ordering food, customers sometimes experience disappointment with the food they receive. Our endeavor was to establish a model that could determine if food is of good or poor quality. We compiled an extensive dataset of over 1484 online reviews from prominent food ordering platforms, including Food Panda and HungryNaki. Leveraging the collected data, a rigorous assessment of various deep learning and machine learning techniques was performed to determine the most accurate approach for predicting food quality. Out of all the algorithms evaluated, logistic regression emerged as the most accurate, achieving an impressive 90.91% accuracy. The review offers valuable insights that will guide the user in deciding whether or not to order the food.
Submitted 3 May, 2024;
originally announced May 2024.
-
IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning
Authors:
Ryan Hoque,
Ajay Mandlekar,
Caelan Garrett,
Ken Goldberg,
Dieter Fox
Abstract:
Imitation learning is a promising paradigm for training robot control policies, but these policies can suffer from distribution shift, where the conditions at evaluation time differ from those in the training data. A popular approach for increasing policy robustness to distribution shift is interactive imitation learning (e.g., DAgger and its variants), where a human operator provides corrective interventions during policy rollouts. However, collecting enough interventions to cover the distribution of policy mistakes can be burdensome for human operators. We propose IntervenGen (I-Gen), a novel data generation system that can autonomously produce a large set of corrective interventions with rich coverage of the state space from a small number of human interventions. We apply I-Gen to 4 simulated environments and 1 physical environment with object pose estimation error and show that it can increase policy robustness by up to 39x with only 10 human interventions. Videos and more results are available at https://sites.google.com/view/intervengen2024.
Submitted 2 May, 2024;
originally announced May 2024.
-
IrrNet: Advancing Irrigation Mapping with Incremental Patch Size Training on Remote Sensing Imagery
Authors:
Oishee Bintey Hoque,
Samarth Swarup,
Abhijin Adiga,
Sayjro Kossi Nouwakpo,
Madhav Marathe
Abstract:
Irrigation mapping plays a crucial role in effective water management, which is essential for preserving both water quality and quantity and is key to mitigating the global issue of water scarcity. The complexity of agricultural fields, with their diverse irrigation practices, especially when multiple systems coexist in close quarters, poses a unique challenge. This complexity is further compounded by the nature of Landsat's remote sensing data, where each pixel is rich with densely packed information, complicating the task of accurate irrigation mapping. In this study, we introduce a progressive training method that strategically increases patch sizes throughout training, using Landsat 5 and 7 imagery labeled with the WRLU dataset. Starting with small patches allows the model to capture detailed features first and then shift progressively to broader, more general features as the patch size grows. Our method improves the performance of existing state-of-the-art models by approximately 20%. Furthermore, our analysis examines the significance of incorporating various spectral bands into the model, assessing their impact on performance. The findings reveal that additional bands enable the model to discern finer details more effectively. This work sets a new standard for leveraging remote sensing imagery in irrigation mapping.
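The abstract describes incremental patch-size training only qualitatively. The following is a minimal sketch of one possible curriculum. It assumes the patch side is multiplied by a fixed factor after a fixed number of epochs, and that training crops are drawn at random positions within each scene. The function names, the starting size of 32, the ceiling of 256, the 5 epochs per stage, and the doubling factor are all hypothetical values, not the paper's settings.

```python
import random
from typing import Iterator, Tuple


def patch_size_schedule(
    start: int = 32,        # hypothetical initial patch side (pixels)
    max_size: int = 256,    # hypothetical final patch side
    epochs_per_stage: int = 5,
    growth: int = 2,        # patch side is multiplied by this after each stage
) -> Iterator[Tuple[int, int]]:
    """Yield (epoch, patch_size) pairs, enlarging the patch after every stage."""
    if growth < 2 or start < 1 or epochs_per_stage < 1:
        raise ValueError("need growth >= 2, start >= 1, epochs_per_stage >= 1")
    epoch, size = 0, start
    while size <= max_size:
        for _ in range(epochs_per_stage):
            yield epoch, size
            epoch += 1
        size *= growth


def random_patch_origin(
    height: int, width: int, size: int, rng: random.Random
) -> Tuple[int, int]:
    """Top-left (row, col) of a random size x size crop inside an H x W scene."""
    if size > min(height, width):
        raise ValueError("patch larger than scene")
    return rng.randrange(height - size + 1), rng.randrange(width - size + 1)
```

With these defaults, the schedule runs 20 epochs at sizes 32, 64, 128, and 256. In a training loop, each epoch would crop patches of the scheduled size from the labeled Landsat scenes (and their WRLU-derived masks) before batching. The abstract does not specify how the authors crop, batch, or choose stage lengths.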
Submitted 17 April, 2024;
originally announced April 2024.
-
Exploring Post Quantum Cryptography with Quantum Key Distribution for Sustainable Mobile Network Architecture Design
Authors:
Sanzida Hoque,
Abdullah Aydeger,
Engin Zeydan
Abstract:
The proliferation of mobile networks and their increasing importance to modern life, combined with the emerging threat of quantum computing, present new challenges and opportunities for cybersecurity. This paper addresses the complexity of protecting these critical infrastructures against future quantum attacks while considering operational sustainability. We begin with an overview of the current landscape, identify the main vulnerabilities in mobile networks, and evaluate existing security solutions with new post-quantum cryptography (PQC) methods. We then present a quantum-secure architecture with PQC and Quantum Key Distribution (QKD) tailored explicitly for sustainable mobile networks and illustrate its applicability with several use cases that emphasize the need for advanced protection measures in this new era. In addition, a comprehensive analysis of PQC algorithm families is presented, focusing on their suitability for integration in mobile environments, with particular attention to the trade-offs between energy consumption and security improvements. Finally, recommendations for strengthening mobile networks against quantum threats are provided through a detailed examination of current challenges and opportunities.
Submitted 16 April, 2024;
originally announced April 2024.
-
Visualization for Human-Centered AI Tools
Authors:
Md Naimul Hoque,
Sungbok Shin,
Niklas Elmqvist
Abstract:
Human-centered AI (HCAI) puts the user in the driver's seat of so-called human-centered AI-infused tools (HCAI tools): interactive software tools that amplify, augment, empower, and enhance human performance using AI models. We discuss how interactive visualization can be a key enabling technology for creating such human-centered AI tools. To validate our approach, we first interviewed HCI, AI, and Visualization experts to define the characteristics of HCAI tools. We then present several examples of HCAI tools using visualization and use the examples to extract guidelines on how interactive visualization can support future HCAI tools.
Submitted 3 November, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
iSpLib: A Library for Accelerating Graph Neural Networks using Auto-tuned Sparse Operations
Authors:
Md Saidul Hoque Anik,
Pranav Badhe,
Rohit Gampa,
Ariful Azad
Abstract:
Core computations in Graph Neural Network (GNN) training and inference are often mapped to sparse matrix operations such as sparse-dense matrix multiplication (SpMM). These sparse operations are hard to optimize through manual tuning because their performance depends significantly on the sparsity of the input graphs, the GNN models, and the computing platforms. To address this challenge, we present iSpLib, a PyTorch-based C++ library equipped with auto-tuned sparse operations. iSpLib expedites GNN training with a cache-enabled backpropagation that stores intermediate matrices in local caches. The library offers a user-friendly Python plug-in that lets users take advantage of our optimized PyTorch operations out of the box for any existing linear algebra-based PyTorch implementation of popular GNNs (Graph Convolution Network, GraphSAGE, Graph Isomorphism Network, etc.) with only two additional lines of code. We demonstrate that iSpLib obtains up to 27x overall training speedup compared to the equivalent PyTorch 2.1.0 and PyTorch Geometric 2.4.0 implementations on the CPU. Our library is publicly available at https://github.com/HipGraph/iSpLib (https://doi.org/10.5281/zenodo.10806511).
Submitted 21 March, 2024;
originally announced March 2024.
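Many GNN layers reduce to SpMM: a sparse adjacency matrix A stored in CSR form multiplies a dense node-feature matrix B, so each output row aggregates the features of one node's neighbors. The plain-Python sketch below shows only the operation itself; it is not iSpLib's auto-tuned kernel or its API, and the function name `spmm_csr` is hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def spmm_csr(indptr: Sequence[int], indices: Sequence[int],
             data: Sequence[float], dense: Matrix) -> Matrix:
    """Compute A @ B where A (n x k) is in CSR form and B (k x m) is dense.

    Row i of A has nonzeros data[indptr[i]:indptr[i+1]] located in the
    columns indices[indptr[i]:indptr[i+1]].
    """
    n = len(indptr) - 1
    m = len(dense[0]) if dense else 0
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            j, a_ij = indices[p], data[p]
            # Accumulate a_ij * (row j of B) into row i of the output.
            for c in range(m):
                out[i][c] += a_ij * dense[j][c]
    return out


if __name__ == "__main__":
    # Adjacency of a 3-node graph: node 0 -> {1, 2}, node 1 -> {0}, node 2 -> {1}
    indptr, indices, data = [0, 2, 3, 4], [1, 2, 0, 1], [1.0, 1.0, 1.0, 1.0]
    features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(spmm_csr(indptr, indices, data, features))  # [[8.0, 10.0], [1.0, 2.0], [3.0, 4.0]]
```

The innermost loop's cost depends on how many nonzeros each row has and how rows of B are reused, which is why the best loop ordering and blocking vary with graph sparsity and hardware, the motivation the abstract gives for auto-tuning.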
-
A Design Space for Intelligent and Interactive Writing Assistants
Authors:
Mina Lee,
Katy Ilonka Gero,
John Joon Young Chung,
Simon Buckingham Shum,
Vipul Raheja,
Hua Shen,
Subhashini Venugopalan,
Thiemo Wambsganss,
David Zhou,
Emad A. Alghamdi,
Tal August,
Avinash Bhat,
Madiha Zahrah Choksi,
Senjuti Dutta,
Jin L. C. Guo,
Md Naimul Hoque,
Yewon Kim,
Simon Knight,
Seyed Parsa Neshaei,
Agnia Sergeyuk,
Antonette Shibani,
Disha Shrivastava,
Lila Shroff,
Jessi Stark,
Sarah Sterman,
et al. (11 additional authors not shown)
Abstract:
In our era of rapid technological advancement, the research landscape for writing assistants has become increasingly fragmented across various research communities. We seek to address this challenge by proposing a design space as a structured way to examine and explore the multidimensional space of intelligent and interactive writing assistants. Through a large community collaboration, we explore five aspects of writing assistants: task, user, technology, interaction, and ecosystem. Within each aspect, we define dimensions (i.e., fundamental components of an aspect) and codes (i.e., potential options for each dimension) by systematically reviewing 115 papers. Our design space aims to offer researchers and designers a practical tool to navigate, comprehend, and compare the various possibilities of writing assistants, and aid in the envisioning and design of new writing assistants.
Submitted 26 March, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Deciphering Hate: Identifying Hateful Memes and Their Targets
Authors:
Eftekhar Hossain,
Omar Sharif,
Mohammed Moshiul Hoque,
Sarah M. Preum
Abstract:
Internet memes have become a powerful means for individuals to express emotions, thoughts, and perspectives on social media. Although often considered a source of humor and entertainment, memes can also disseminate hateful content targeting individuals or communities. Most existing research on the negative aspects of memes focuses on high-resource languages, overlooking the distinctive challenges associated with low-resource languages like Bengali (also known as Bangla). Furthermore, while previous work on Bengali memes has focused on detecting hateful memes, no work has addressed detecting the entities they target. To bridge this gap and facilitate research in this area, we introduce BHM (Bengali Hateful Memes), a novel multimodal dataset for Bengali. The dataset consists of 7,148 memes with Bengali and code-mixed captions, tailored for two tasks: (i) detecting hateful memes, and (ii) detecting the social entities they target (i.e., Individual, Organization, Community, and Society). To solve these tasks, we propose DORA (Dual cO attention fRAmework), a multimodal deep neural network that systematically extracts the significant modality features from the memes and jointly evaluates them with the modality-specific features to better understand the context. Our experiments show that DORA generalizes to other low-resource hateful meme datasets and outperforms several state-of-the-art baselines.
Submitted 22 September, 2024; v1 submitted 16 March, 2024;
originally announced March 2024.
-
ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning
Authors:
Ahmed Masry,
Mehrad Shahmohammadi,
Md Rizwan Parvez,
Enamul Hoque,
Shafiq Joty
Abstract:
Charts provide visual representations of data and are widely used for analyzing information, answering queries, and conveying insights to others. Various chart-related downstream tasks have emerged recently, such as question answering and summarization. A common strategy for solving these tasks is to fine-tune models originally trained on vision-language tasks. However, such task-specific models cannot solve a wide range of chart-related tasks, which constrains their real-world applicability. To overcome these challenges, we introduce ChartInstruct: a novel chart-specific vision-language instruction-following dataset comprising 191K instructions generated from 71K charts. We then present two distinct systems for instruction tuning on such datasets: (1) an end-to-end model that connects a vision encoder for chart understanding with an LLM; and (2) a pipeline model that uses a two-step approach to extract chart data tables and input them into the LLM. In experiments on four downstream tasks, we first show the effectiveness of our model, which achieves a new set of state-of-the-art results. Further evaluation shows that our instruction-tuning approach supports a wide array of real-world chart comprehension and reasoning scenarios, expanding the scope and applicability of our models to new kinds of tasks.
Submitted 13 March, 2024;
originally announced March 2024.
-
Disorder controlled sound speed and thermal conductivity of hybrid metalcone films
Authors:
Md Shafkat Bin Hoque,
Rachel A. Nye,
Saman Zare,
Stephanie Atkinson,
Siyao Wang,
Andrew H. Jones,
John T. Gaskins,
Gregory Parsons,
Patrick E. Hopkins
Abstract:
The multifaceted applications of polymers are often limited by their thermal conductivity. Understanding the mechanisms of thermal transport in polymers is therefore of vital interest. Here, we leverage molecular layer deposition to grow three types of hybrid metalcone films (alucone, zincone, and tincone) and study their thermal and acoustic properties. The thermal conductivity of the hybrid polymer films ranges from 0.43 to 1.14 W m⁻¹ K⁻¹. Using kinetic theory, we trace the difference in thermal conductivity to a change in sound speed, which is dictated by the structural disorder within the films. Changing the disorder has a negligible impact on volumetric heat capacity and vibrational lifetimes. Our findings provide a means to improve the thermal conductivity of organic, hybrid, and inorganic polymer films.
Submitted 20 February, 2024;
originally announced February 2024.
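As a textbook illustration of the kinetic-theory argument in the metalcone abstract (not necessarily the exact model used in the paper), thermal conductivity can be written as κ = (1/3) C v Λ with mean free path Λ = v τ. At fixed volumetric heat capacity C and vibrational lifetime τ, κ therefore scales with the square of the sound speed v. The numbers below are illustrative placeholders, not measured values from the study.

```python
def kinetic_conductivity(heat_capacity: float, sound_speed: float, lifetime: float) -> float:
    """Kinetic-theory estimate kappa = (1/3) * C * v * mfp, with mean free path mfp = v * tau.

    heat_capacity: volumetric heat capacity C in J m^-3 K^-1
    sound_speed:   sound speed v in m/s
    lifetime:      vibrational lifetime tau in s
    Returns kappa in W m^-1 K^-1.
    """
    mean_free_path = sound_speed * lifetime
    return heat_capacity * sound_speed * mean_free_path / 3.0


if __name__ == "__main__":
    C, tau = 1.5e6, 1e-13  # illustrative values, held fixed as the disorder changes
    slow = kinetic_conductivity(C, 3000.0, tau)
    fast = kinetic_conductivity(C, 4000.0, tau)
    # With C and tau unchanged, the ratio equals (4000 / 3000) ** 2.
    print(f"{slow:.2f} -> {fast:.2f} W/(m K); ratio {fast / slow:.3f}")
```

This is why a structural change that alters sound speed, while leaving heat capacity and lifetimes essentially untouched, can account for a large spread in κ on its own.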
-
Formal Verification for Blockchain-based Insurance Claims Processing
Authors:
Roshan Lal Neupane,
Ernest Bonnah,
Bishnu Bhusal,
Kiran Neupane,
Khaza Anuarul Hoque,
Prasad Calyam
Abstract:
Insurance claims processing involves multi-domain entities, multi-source data, and numerous human-agent interactions. A Blockchain-based platform can significantly improve the scalability and response time of claims processing, which is otherwise manually intensive and time-consuming. However, the chaincodes that issue claims and approve or deny them must be formally verified to ensure secure and reliable processing of Blockchain transactions. In this paper, we use a formal modeling approach based on linear temporal logic (LTL) to verify the processes and underlying chaincodes for the different stages of insurance claims processing, namely issuance, approval, denial, and flagging for fraud investigation. We simulate the formalism on the chaincodes and analyze breaches of the chaincodes via model checking.
Submitted 20 February, 2024;
originally announced February 2024.
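To make the kind of temporal property being verified concrete, the sketch below evaluates two hypothetical claim-processing requirements over finite event traces: "every approval is preceded by an issuance" and "every flagged claim is eventually investigated". It is a simple trace checker with finite-trace semantics and an added past-time `once` operator; it is not a model checker and not the authors' formalization, and the event names are assumptions.

```python
from typing import Callable, List, Set

State = Set[str]                        # propositions true at one step
Trace = List[State]
Formula = Callable[[Trace, int], bool]  # truth of a formula at position i


def atom(p: str) -> Formula:
    return lambda t, i: p in t[i]


def implies(a: Formula, b: Formula) -> Formula:
    return lambda t, i: (not a(t, i)) or b(t, i)


def always(f: Formula) -> Formula:      # G f over the remaining finite trace
    return lambda t, i: all(f(t, j) for j in range(i, len(t)))


def eventually(f: Formula) -> Formula:  # F f over the remaining finite trace
    return lambda t, i: any(f(t, j) for j in range(i, len(t)))


def once(f: Formula) -> Formula:        # past-time operator: f held at some step <= i
    return lambda t, i: any(f(t, j) for j in range(i + 1))


def holds(f: Formula, trace: Trace) -> bool:
    return f(trace, 0)


# Hypothetical requirements for a claims workflow
approval_needs_issue = always(implies(atom("approved"), once(atom("issued"))))
flag_gets_investigated = always(implies(atom("flagged"), eventually(atom("investigated"))))

if __name__ == "__main__":
    good = [{"issued"}, {"flagged"}, {"investigated"}, {"denied"}]
    bad = [{"approved"}, {"issued"}, {"flagged"}]
    for name, spec in [("approval", approval_needs_issue), ("flagging", flag_gets_investigated)]:
        print(name, holds(spec, good), holds(spec, bad))
```

A model checker differs in that it explores every execution a chaincode model can produce rather than a single recorded trace, and standard LTL is usually interpreted over infinite paths.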
-
Align before Attend: Aligning Visual and Textual Features for Multimodal Hateful Content Detection
Authors:
Eftekhar Hossain,
Omar Sharif,
Mohammed Moshiul Hoque,
Sarah M. Preum
Abstract:
Multimodal hateful content detection is a challenging task that requires complex reasoning across visual and textual modalities. Creating a meaningful multimodal representation that effectively captures the interplay between visual and textual features through intermediate fusion is therefore critical. Conventional fusion techniques cannot attend to modality-specific features effectively. Moreover, most studies have concentrated exclusively on English and overlooked low-resource languages. This paper proposes a context-aware attention framework for multimodal hateful content detection and evaluates it on both English and non-English languages. The proposed approach incorporates an attention layer that aligns the visual and textual features, enabling selective focus on modality-specific features before they are fused. We evaluate the approach on two benchmark hateful meme datasets, MUTE (Bengali code-mixed) and MultiOFF (English). It achieves F1-scores of 69.7% on MUTE and 70.3% on MultiOFF, improvements of approximately 2.5% and 3.2%, respectively, over the state-of-the-art systems on these datasets. Our implementation is available at https://github.com/eftekhar-hossain/Bengali-Hateful-Memes.
Submitted 15 February, 2024;
originally announced February 2024.
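A generic way to align one modality with another before fusion, as described in the Align before Attend abstract, is scaled dot-product cross-attention: each text feature attends over the visual features and receives a softmax-weighted summary of them. The sketch below shows only that mechanism, with fusion by concatenation as an assumption; it is not the paper's context-aware attention framework, and all names and dimensions are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query gets a softmax-weighted average of the values."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return out


def align_then_fuse(text: List[Vector], visual: List[Vector]) -> List[Vector]:
    """Align visual features to each text feature, then fuse by concatenation (an assumption)."""
    aligned = cross_attend(text, visual, visual)
    return [t + a for t, a in zip(text, aligned)]


if __name__ == "__main__":
    text_tokens = [[1.0, 0.0], [0.0, 1.0]]                # hypothetical 2-d text features
    image_regions = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # hypothetical 2-d region features
    for row in align_then_fuse(text_tokens, image_regions):
        print([round(x, 3) for x in row])
```

In a trained model the queries, keys, and values would first pass through learned projections; they are omitted here to keep the alignment step visible.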
-
HOACS: Homomorphic Obfuscation Assisted Concealing of Secrets to Thwart Trojan Attacks in COTS Processor
Authors:
Tanvir Hossain,
Matthew Showers,
Mahmudul Hasan,
Tamzidul Hoque
Abstract:
Commercial-off-the-shelf (COTS) components are often preferred over custom integrated circuits (ICs) because they reduce system development time and cost, ease the adoption of new technologies, and are replaceable. Unfortunately, integrating COTS components introduces serious security concerns. From a consumer's perspective, none of the entities in the COTS IC supply chain is trusted, which leads to a "zero trust" threat model. Any of these entities could insert hidden malicious circuits (hardware Trojans) into the component, allowing an attacker in the field to extract secret information (e.g., cryptographic keys) or cause a functional failure. Existing countermeasures against hardware Trojans do not apply in such a zero-trust scenario, because they assume that either the design house or the foundry is trusted and that the design is available for analysis or modification. In this work, we propose a software-oriented countermeasure that protects the confidentiality of secret assets against hardware Trojans and can be seamlessly integrated into existing COTS microprocessors. The proposed solution requires neither a trusted supply chain entity nor analysis or modification of the IC design. To protect secret assets in an untrusted microprocessor, the method leverages residue number coding (RNC) to make the software functions operating on the asset fully homomorphic. We implemented the solution to protect the secret key within an Advanced Encryption Standard (AES) program and present a detailed security analysis. We also developed a plugin for the LLVM compiler toolchain that automatically integrates the solution into AES. Finally, we compare the execution-time overhead of operations in the RNC-based technique with comparable homomorphic solutions and demonstrate a significant improvement.
Submitted 14 February, 2024;
originally announced February 2024.
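The homomorphic property that residue number coding relies on, as mentioned in the HOACS abstract, fits in a few lines: a value is represented by its remainders modulo pairwise-coprime moduli, addition and multiplication act independently on each remainder, and the Chinese Remainder Theorem recovers the result. The moduli (7, 11, 13) are illustrative assumptions, and this sketch covers only the arithmetic property; it does not reproduce HOACS's security mechanism, its handling of AES operations, or its LLVM plugin.

```python
from math import prod
from typing import Tuple

Residues = Tuple[int, ...]
MODULI: Tuple[int, ...] = (7, 11, 13)  # illustrative pairwise-coprime moduli
M = prod(MODULI)                        # dynamic range: results are exact modulo 1001


def encode(x: int) -> Residues:
    return tuple(x % m for m in MODULI)


def rns_add(a: Residues, b: Residues) -> Residues:
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a: Residues, b: Residues) -> Residues:
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def decode(r: Residues) -> int:
    """Chinese Remainder Theorem reconstruction."""
    total = 0
    for ri, mi in zip(r, MODULI):
        Mi = M // mi
        total += ri * Mi * pow(Mi, -1, mi)  # pow(..., -1, m) needs Python 3.8+
    return total % M


if __name__ == "__main__":
    x, y, z = 123, 45, 6
    # (x + y) * z is computed entirely on residues, channel by channel.
    r = rns_mul(rns_add(encode(x), encode(y)), encode(z))
    print(r, decode(r), (x + y) * z % M)
```

Because each residue channel is computed independently, no single channel ever holds the plain value, which is the property an RNC-based defense can build on.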
-
TP-Aware Dequantization
Authors:
Adnan Hoque,
Mudhakar Srivatsa,
Chih-Chieh Yang,
Raghu Ganti
Abstract:
In this paper, we present a novel method that reduces model inference latency during distributed deployment of Large Language Models (LLMs). Our contribution is an optimized inference deployment scheme that addresses the current limitations of state-of-the-art quantization kernels when used in conjunction with Tensor Parallelism (TP). Our method preserves data locality in GPU memory access patterns and exploits a priori knowledge of TP to reduce global communication. We demonstrate a speedup of up to 1.81x over existing methods for Llama-70B and up to 1.78x for IBM WatsonX's Granite-20B MLP layer problem sizes on NVIDIA A100 and H100 DGX systems across a variety of TP settings.
Submitted 15 January, 2024;
originally announced February 2024.
-
Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition
Authors:
Adnan Hoque,
Less Wright,
Chih-Chieh Yang,
Mudhakar Srivatsa,
Raghu Ganti
Abstract:
We propose an efficient fused matrix multiplication kernel for W4A16 quantized inference that performs dequantization and GEMM in a single fused kernel using a SplitK work decomposition. Our implementation improves performance on the skinny matrix-matrix multiplications typical of foundation model inference workloads. In particular, this paper examines multiplication between a skinny activation matrix and a square weight matrix. Our results show an average speed improvement of 65% on A100 and an average of 124% on H100 (with a peak of 295%) for a range of matrix dimensions, including those found in a llama-style model, where m < n = k.
Submitted 22 February, 2024; v1 submitted 5 January, 2024;
originally announced February 2024.
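The SplitK decomposition from the W4A16 abstract can be illustrated without a GPU: the shared K dimension is cut into chunks, each chunk produces an independent partial product, and the partials are summed into the output. The sketch also includes a simple dequantization step, assuming a single per-tensor scale and zero point (real W4A16 kernels typically use group-wise parameters) and dequantizing up front for clarity, whereas a fused kernel dequantizes on the fly inside the GEMM loop. It is not the Triton kernel from the paper, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def dequantize(q: List[List[int]], scale: float, zero: int) -> Matrix:
    """Map 4-bit codes (0..15) back to floats: w = (q - zero) * scale (one scale per tensor here)."""
    return [[(v - zero) * scale for v in row] for row in q]


def splitk_matmul(a: Matrix, b: Matrix, split_k: int) -> Matrix:
    """C = A @ B with the shared K dimension cut into split_k chunks.

    Each chunk yields an independent partial product (on a GPU, a separate
    thread block); the partials are then summed into C.
    """
    m, k, n = len(a), len(b), len(b[0])
    chunk = -(-k // split_k)  # ceiling division
    c = [[0.0] * n for _ in range(m)]
    for start in range(0, k, chunk):
        stop = min(start + chunk, k)
        for i in range(m):
            for j in range(n):
                partial = sum(a[i][p] * b[p][j] for p in range(start, stop))
                c[i][j] += partial  # reduction across chunks (atomic add on a GPU)
    return c


if __name__ == "__main__":
    activations = [[1.0, 2.0, 3.0, 4.0]]            # skinny: m = 1, k = 4
    codes = [[8, 9], [10, 11], [12, 13], [14, 15]]  # k x n 4-bit weight codes
    weights = dequantize(codes, scale=0.5, zero=8)
    print(splitk_matmul(activations, weights, split_k=2))  # [[20.0, 25.0]]
```

For skinny problems (small m), splitting along K creates more independent work items than the m x n output tiles alone would, which helps keep a large GPU occupied.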
-
Smart Driver Monitoring Robotic System to Enhance Road Safety: A Comprehensive Review
Authors:
Farhin Farhad Riya,
Shahinul Hoque,
Xiaopeng Zhao,
Jinyuan Stella Sun
Abstract:
The future of transportation is being shaped by technology, and one revolutionary step toward improving road safety is the incorporation of robotic systems into driver monitoring infrastructure. This literature review explores the current landscape of driver monitoring systems, ranging from traditional physiological parameter monitoring to advanced technologies such as facial recognition and steering analysis. After examining the challenges faced by existing systems, the review investigates the integration of robots as intelligent entities within this framework. These robotic systems, equipped with artificial intelligence and sophisticated sensors, not only monitor the driver but also actively engage with them, addressing cognitive and emotional states in real time. The synthesis of existing research reveals a dynamic interplay between human and machine and offers promising avenues for innovation in adaptive, personalized, and ethically responsible human-robot interaction for driver monitoring. This review lays the groundwork for understanding the intricacies and potential directions of this dynamic field and encourages further investigation at the intersection of human-robot interaction and automotive safety. As a novel direction, several sections detail technological enhancements that could be integrated into an innovative, improved driver monitoring system.
Submitted 28 January, 2024;
originally announced January 2024.
-
HRI Challenges Influencing Low Usage of Robotic Systems in Disaster Response and Rescue Operations
Authors:
Shahinul Hoque,
Farhin Farhad Riya,
Jinyuan Sun
Abstract:
Breakthroughs in AI and machine learning have brought a new revolution in robotics, resulting in more sophisticated robotic systems. Not only can these systems benefit all domains, but they can also accomplish tasks that seemed unimaginable a few years ago. From swarms of small autonomous robots working together to move very heavy and large objects, to seemingly indestructible robots capable of entering the harshest environments, robotic systems are being designed for nearly every task imaginable. One key scenario in which robotic systems can help is disaster response and rescue operations. Robots can remove heavy materials, use multiple advanced sensors to find objects of interest, move through debris and other inhospitable environments, and, not least, fly. Despite this potential, robotic systems are rarely used in disaster response and rescue missions. Many factors could be responsible for this low utilization; one of the key factors involves challenges related to Human-Robot Interaction (HRI). In this paper, we therefore try to understand the HRI challenges involved in using robotic systems in disaster response and rescue operations. We also review some of the proposed robotic systems designed for disaster response scenarios and identify their HRI challenges. Finally, we try to address these challenges by drawing on ideas from various proposed research works.
Submitted 28 January, 2024;
originally announced January 2024.
-
Room temperature nonlocal detection of charge-spin interconversion in a topological insulator
Authors:
Anamul Md. Hoque,
Lars Sjöström,
Dmitrii Khokhriakov,
Bing Zhao,
Saroj P. Dash
Abstract:
Topological insulators (TIs) are emerging materials for next-generation low-power nanoelectronic and spintronic devices. Owing to their strong spin-orbit coupling (SOC), TIs exhibit non-trivial spin-momentum locking in their topological surface states, in addition to the spin Hall effect (SHE) and Rashba states. These phenomena are vital for the charge-spin conversion (CSC) processes used in spin-based memory, logic, and quantum technologies. Although CSC has been observed in TIs by potentiometric measurements, reliable nonlocal detection has so far been limited to cryogenic temperatures up to T = 15 K. Here, we report nonlocal detection of CSC and its inverse effect in the TI compound Bi1.5Sb0.5Te1.7Se1.3 at room temperature, using a van der Waals heterostructure with a graphene spin-valve device. The lateral nonlocal device design with graphene allows observation of both spin-switch and Hanle spin precession signals for the generation, injection, and detection of spin currents by the TI. Detailed bias- and gate-dependent measurements in different geometries confirm the robustness of the CSC effects in the TI. These findings demonstrate the possibility of using topological materials to make all-electrical room-temperature spintronic devices.
Submitted 25 January, 2024;
originally announced January 2024.
-
Belief Miner: A Methodology for Discovering Causal Beliefs and Causal Illusions from General Populations
Authors:
Shahreen Salim,
Md Naimul Hoque,
Klaus Mueller
Abstract:
Causal belief is a cognitive practice that humans apply every day to reason about cause-and-effect relations between factors, phenomena, or events. Just as they are susceptible to optical illusions, humans are prone to drawing causal relations between events that are only coincidental (i.e., causal illusions). Researchers in domains such as cognitive psychology and healthcare often rely on logistically expensive experiments to understand causal beliefs and illusions. In this paper, we propose Belief Miner, a crowdsourcing method for evaluating people's causal beliefs and illusions. Our method uses the (dis)similarities between the causal relations collected from crowds and those from experts to surface causal beliefs and illusions. Through an iterative design process, we developed a web-based interface for collecting causal relations from a target population. We then conducted a crowdsourced experiment with 101 workers on Amazon Mechanical Turk and Prolific using this interface and analyzed the collected data with Belief Miner. We discovered a variety of causal beliefs and potential illusions, and we report design implications for future research.
Submitted 15 January, 2024;
originally announced January 2024.
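One simple way to realize the crowd-versus-expert comparison described in the Belief Miner abstract is to treat each causal relation as a directed (cause, effect) edge and compare frequently reported crowd edges with an expert set. The support threshold, the category names, and the example relations below are illustrative assumptions, not Belief Miner's actual scoring.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def compare_to_experts(crowd: Iterable[Set[Edge]], experts: Set[Edge],
                       min_support: float = 0.5) -> Dict[str, List[Edge]]:
    """Split causal relations into shared beliefs, potential illusions, and expert-only relations."""
    responses = list(crowd)
    if not responses:
        return {"shared": [], "potential_illusions": [], "missed_by_crowd": sorted(experts)}
    counts = Counter(edge for response in responses for edge in response)
    popular = {e for e, c in counts.items() if c / len(responses) >= min_support}
    return {
        "shared": sorted(popular & experts),               # crowd agrees with experts
        "potential_illusions": sorted(popular - experts),  # common crowd belief absent from expert set
        "missed_by_crowd": sorted(experts - popular),      # expert relation few respondents report
    }


if __name__ == "__main__":
    experts = {("smoking", "lung cancer"), ("exercise", "heart health")}
    crowd = [
        {("smoking", "lung cancer"), ("full moon", "hospital visits")},
        {("smoking", "lung cancer"), ("full moon", "hospital visits")},
        {("exercise", "heart health")},
    ]
    print(compare_to_experts(crowd, experts))
```

A real analysis would also weigh how strongly respondents endorse each link and whether experts explicitly reject it, rather than treating absence from the expert set as evidence of an illusion.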
-
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation
Authors:
Rasha Alshawi,
Md Tamjidul Hoque,
Md Meftahul Ferdaus,
Mahdi Abdelguerfi,
Kendall Niles,
Ken Prathak,
Joe Tom,
Jordan Klein,
Murtada Mousa,
Johny Javier Lopez
Abstract:
The proposed architecture, Dual Attentive U-Net with Feature Infusion (DAU-FI Net), addresses challenges in semantic segmentation, particularly on multiclass imbalanced datasets with limited samples. DAU-FI Net integrates multiscale spatial-channel attention mechanisms and feature injection to enhance precision in object localization. The core employs a multiscale depth-separable convolution block, capturing localized patterns across scales. This block is complemented by a spatial-channel squeeze and excitation (scSE) attention unit, modeling inter-dependencies between channels and spatial regions in feature maps. Additionally, additive attention gates refine segmentation by connecting encoder-decoder pathways.
To augment the model, engineered features (Gabor filters for texture analysis; Sobel and Canny filters for edge detection) are injected under the guidance of semantic masks to strategically expand the feature space. Comprehensive experiments on a challenging sewer pipe and culvert defect dataset and on a benchmark dataset validate DAU-FI Net's capabilities. Ablation studies highlight incremental benefits from the attention blocks and feature injection. DAU-FI Net achieves state-of-the-art mean Intersection over Union (IoU) of 95.6% on the defect test set and 98.8% on the benchmark, surpassing prior methods by 8.9% and 12.6%, respectively. The proposed architecture provides a robust solution that advances semantic segmentation for multiclass problems with limited training data. Our sewer-culvert defect dataset, featuring pixel-level annotations, opens avenues for further research in this crucial domain. Overall, this work delivers key innovations in architecture, attention, and feature engineering that improve semantic segmentation efficacy.
Submitted 21 December, 2023;
originally announced December 2023.
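As an example of the engineered edge features mentioned in the DAU-FI Net abstract, the sketch computes a Sobel gradient-magnitude map that could be supplied to a network as an extra input channel. It is a plain-Python illustration under assumed conventions (3×3 kernels applied as a correlation, border pixels set to zero); the mask-guided injection used in DAU-FI Net is not reproduced here, and the function name is hypothetical.

```python
import math
from typing import List

Image = List[List[float]]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to horizontal intensity changes
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # responds to vertical intensity changes


def sobel_magnitude(img: Image) -> Image:
    """Gradient magnitude sqrt(gx^2 + gy^2) from 3x3 Sobel kernels; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    v = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            out[y][x] = math.hypot(gx, gy)
    return out


if __name__ == "__main__":
    # A vertical step edge: dark left half, bright right half.
    step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    for row in sobel_magnitude(step):
        print(row)
```

Using a correlation instead of a flipped-kernel convolution only changes the sign of gx and gy, so the magnitude map is the same either way.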
-
Towards Designing a Question-Answering Chatbot for Online News: Understanding Questions and Perspectives
Authors:
Md Naimul Hoque,
Ayman Mahfuz,
Mayukha Kindi,
Naeemul Hassan
Abstract:
Large Language Models (LLMs) have created opportunities for designing chatbots that can support complex question-answering (QA) scenarios and improve news audience engagement. However, we still lack an understanding of what roles journalists and readers deem appropriate for such a chatbot in newsrooms. To address this gap, we first interviewed six journalists to understand how they currently answer readers' questions and how they would like to use a QA chatbot for this purpose. To understand how readers want to interact with a QA chatbot, we then conducted an online experiment (N=124) in which each participant read three news articles and asked questions to either the author(s) of the articles or a chatbot. Combining results from both studies, we present alignments and discrepancies between how journalists and readers want to use QA chatbots and propose a framework for designing effective QA chatbots in newsrooms.
Submitted 23 March, 2024; v1 submitted 17 December, 2023;
originally announced December 2023.