-
GeoClip: Geometry-Aware Clipping for Differentially Private SGD
Authors:
Atefeh Gilani,
Naima Tasnim,
Lalitha Sankar,
Oliver Kosut
Abstract:
Differentially private stochastic gradient descent (DP-SGD) is the most widely used method for training machine learning models with provable privacy guarantees. A key challenge in DP-SGD is setting the per-sample gradient clipping threshold, which significantly affects the trade-off between privacy and utility. While recent adaptive methods improve performance by adjusting this threshold during training, they operate in the standard coordinate system and fail to account for correlations across the coordinates of the gradient. We propose GeoClip, a geometry-aware framework that clips and perturbs gradients in a transformed basis aligned with the geometry of the gradient distribution. GeoClip adaptively estimates this transformation using only previously released noisy gradients, incurring no additional privacy cost. We provide convergence guarantees for GeoClip and derive a closed-form solution for the optimal transformation that minimizes the amount of noise added while keeping the probability of gradient clipping under control. Experiments on both tabular and image datasets demonstrate that GeoClip consistently outperforms existing adaptive clipping methods under the same privacy budget.
Submitted 6 June, 2025;
originally announced June 2025.
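For context on the GeoClip entry above, the following is a minimal sketch of the standard DP-SGD step that the abstract takes as its starting point: clip each per-sample gradient to a fixed L2 norm, sum, add Gaussian noise, and average. It is not GeoClip itself; the transformed-basis clipping described in the abstract is not reproduced here. The function names, `clip_norm`, and `noise_multiplier` are illustrative assumptions.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Rescale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_noisy_mean(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip every per-sample gradient, sum them, add Gaussian noise, average.

    The noise standard deviation is noise_multiplier * clip_norm, the usual
    calibration for the Gaussian mechanism in DP-SGD (hypothetical parameter
    names, not taken from the paper).
    """
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        clipped = clip_gradient(grad, clip_norm)
        total = [t + c for t, c in zip(total, clipped)]
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_sample_grads) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
clipped = [clip_gradient(g, clip_norm=1.0) for g in grads]
noisy_mean = dp_sgd_noisy_mean(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Only the first gradient is rescaled, because the second already lies inside the clipping ball. The abstract's point is that this threshold is applied in the standard coordinate system, which GeoClip replaces with a basis estimated from previously released noisy gradients.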
-
Right Side Up? Disentangling Orientation Understanding in MLLMs with Fine-grained Multi-axis Perception Tasks
Authors:
Keanu Nichols,
Nazia Tasnim,
Yuting Yan,
Nicholas Ikechukwu,
Elva Zou,
Deepti Ghadiyaram,
Bryan A. Plummer
Abstract:
Object orientation understanding represents a fundamental challenge in visual perception critical for applications like robotic manipulation and augmented reality. Current vision-language benchmarks fail to isolate this capability, often conflating it with positional relationships and general scene understanding. We introduce DORI (Discriminative Orientation Reasoning Intelligence), a comprehensive benchmark establishing object orientation perception as a primary evaluation target. DORI assesses four dimensions of orientation comprehension: frontal alignment, rotational transformations, relative directional relationships, and canonical orientation understanding. Through carefully curated tasks from 11 datasets spanning 67 object categories across synthetic and real-world scenarios, DORI provides insights on how multi-modal systems understand object orientations. Our evaluation of 15 state-of-the-art vision-language models reveals critical limitations: even the best models achieve only 54.2% accuracy on coarse tasks and 33.0% on granular orientation judgments, with performance deteriorating for tasks requiring reference frame shifts or compound rotations. These findings demonstrate the need for dedicated orientation representation mechanisms, as models show systematic inability to perform precise angular estimations, track orientation changes across viewpoints, and understand compound rotations - suggesting limitations in their internal 3D spatial representations. As the first diagnostic framework specifically designed for orientation awareness in multimodal systems, DORI offers implications for improving robotic control, 3D scene reconstruction, and human-AI interaction in physical environments. DORI data: https://huggingface.co/datasets/appledora/DORI-Benchmark
Submitted 4 June, 2025; v1 submitted 27 May, 2025;
originally announced May 2025.
-
Energy-Efficient Deep Reinforcement Learning with Spiking Transformers
Authors:
Mohammad Irfan Uddin,
Nishad Tasnim,
Md Omor Faruk,
Zejian Zhou
Abstract:
Agent-based Transformers have been widely adopted in recent reinforcement learning advances due to their demonstrated ability to solve complex tasks. However, the high computational complexity of Transformers often results in significant energy consumption, limiting their deployment in real-world autonomous systems. Spiking neural networks (SNNs), with their biologically inspired structure, offer an energy-efficient alternative for machine learning. In this paper, a novel Spike-Transformer Reinforcement Learning (STRL) algorithm that combines the energy efficiency of SNNs with the powerful decision-making capabilities of reinforcement learning is developed. Specifically, an SNN using multi-step Leaky Integrate-and-Fire (LIF) neurons and attention mechanisms capable of processing spatio-temporal patterns over multiple time steps is designed. The architecture is further enhanced with state, action, and reward encodings to create a Transformer-like structure optimized for reinforcement learning tasks. Comprehensive numerical experiments conducted on state-of-the-art benchmarks demonstrate that the proposed SNN Transformer achieves significantly improved policy performance compared to conventional agent-based Transformers. With both enhanced energy efficiency and policy optimality, this work highlights a promising direction for deploying bio-inspired, low-cost machine learning models in complex real-world decision-making scenarios.
Submitted 20 May, 2025;
originally announced May 2025.
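The spiking-Transformer entry above refers to multi-step Leaky Integrate-and-Fire (LIF) neurons. As a rough illustration of that building block only, here is a textbook discrete-time LIF neuron stepped over a short input sequence. The update rule, the `decay`, `threshold`, and `v_reset` values, and the function name are assumptions for illustration and are not taken from the paper.

```python
def simulate_lif(inputs, decay=0.5, threshold=1.0, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron over several time steps.

    At each step the membrane potential leaks (multiplied by decay),
    integrates the new input, and emits a spike (1) if it reaches the
    threshold, after which it is reset to v_reset.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = decay * v + x  # leak, then integrate
        if v >= threshold:
            spikes.append(1)  # fire
            v = v_reset       # hard reset after a spike
        else:
            spikes.append(0)
    return spikes


spikes = simulate_lif([0.6, 0.6, 0.6, 0.0, 1.2])
print(spikes)  # [0, 0, 1, 0, 1]
```

The output is a binary spike train, which is why such networks can be cheaper to run than dense floating-point activations; how STRL combines these neurons with attention is described in the paper, not here.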
-
Reveal-or-Obscure: A Differentially Private Sampling Algorithm for Discrete Distributions
Authors:
Naima Tasnim,
Atefeh Gilani,
Lalitha Sankar,
Oliver Kosut
Abstract:
We introduce a differentially private (DP) algorithm called reveal-or-obscure (ROO) to generate a single representative sample from a dataset of $n$ observations drawn i.i.d. from an unknown discrete distribution $P$. Unlike methods that add explicit noise to the estimated empirical distribution, ROO achieves $ε$-differential privacy by randomly choosing whether to "reveal" or "obscure" the empirical distribution. While ROO is structurally identical to Algorithm 1 proposed by Cheu and Nayak (arXiv:2412.10512), we prove a strictly better bound on the sampling complexity than that established in Theorem 12 of (arXiv:2412.10512). To further improve the privacy-utility trade-off, we propose a novel generalized sampling algorithm called Data-Specific ROO (DS-ROO), where the probability of obscuring the empirical distribution of the dataset is chosen adaptively. We prove that DS-ROO satisfies $ε$-DP, and provide empirical evidence that DS-ROO can achieve better utility under the same privacy budget of vanilla ROO.
Submitted 20 April, 2025;
originally announced April 2025.
-
RECAST: Reparameterized, Compact weight Adaptation for Sequential Tasks
Authors:
Nazia Tasnim,
Bryan A. Plummer
Abstract:
Incremental learning aims to adapt to new sets of categories over time with minimal computational overhead. Prior work often addresses this task by training efficient task-specific adaptors that modify frozen layer weights or features to capture relevant information without affecting predictions on previously learned categories. While these adaptors are generally more efficient than finetuning the entire network, they still require tens to hundreds of thousands of task-specific trainable parameters even for relatively small networks, making it challenging to operate on resource-constrained environments with high communication costs like edge devices or mobile phones. Thus, we propose Reparameterized, Compact weight Adaptation for Sequential Tasks (RECAST), a novel method that dramatically reduces task-specific trainable parameters to fewer than 50 - several orders of magnitude less than competing methods like LoRA. RECAST accomplishes this efficiency by learning to decompose layer weights into a soft parameter-sharing framework consisting of shared weight templates and very few module-specific scaling factors or coefficients. This soft parameter-sharing framework allows for effective task-wise reparameterization by tuning only these coefficients while keeping templates frozen. A key innovation of RECAST is the novel weight reconstruction pipeline called Neural Mimicry, which eliminates the need for pretraining from scratch. This allows for high-fidelity emulation of existing pretrained weights within our framework and provides quick adaptability to any model scale and architecture. Extensive experiments across six datasets demonstrate RECAST outperforms the state-of-the-art by up to 3% across various scales, architectures, and parameter spaces. Moreover, we show that RECAST's architecture-agnostic nature allows for seamless integration with existing methods, further boosting performance.
Submitted 14 March, 2025; v1 submitted 25 November, 2024;
originally announced November 2024.
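The RECAST entry above describes layer weights built from shared, frozen weight templates plus a handful of module-specific coefficients. One simple way to picture that idea is a linear combination of templates, sketched below. The linear-combination form, the function name, and the toy 2x2 templates are assumptions for illustration; the abstract does not specify the exact reconstruction rule.

```python
def reconstruct_weight(templates, coefficients):
    """Build a weight matrix as sum_k coefficients[k] * templates[k].

    templates: list of equally shaped matrices (lists of lists), kept frozen.
    coefficients: one scalar per template; the only task-specific values.
    """
    rows, cols = len(templates[0]), len(templates[0][0])
    weight = [[0.0] * cols for _ in range(rows)]
    for template, coeff in zip(templates, coefficients):
        for i in range(rows):
            for j in range(cols):
                weight[i][j] += coeff * template[i][j]
    return weight


# Two shared templates (hypothetical toy values).
templates = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

# Each task stores only its own two coefficients, not a full matrix.
task_a = reconstruct_weight(templates, [2.0, 0.5])
task_b = reconstruct_weight(templates, [1.0, -1.0])
print(task_a)  # [[2.0, 0.5], [0.5, 2.0]]
```

Under this reading, adapting to a new task means tuning only the coefficient list while the templates stay fixed, which is consistent with the abstract's claim of fewer than 50 task-specific trainable parameters.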
-
Integrating A.I. in Higher Education: Protocol for a Pilot Study with 'SAMCares: An Adaptive Learning Hub'
Authors:
Syed Hasib Akhter Faruqui,
Nazia Tasnim,
Iftekhar Ibne Basith,
Suleiman Obeidat,
Faruk Yildiz
Abstract:
Learning never ends, and there is no age limit to grow yourself. However, the educational landscape may face challenges in effectively catering to students' inclusion and diverse learning needs. These students should have access to state-of-the-art methods for lecture delivery, online resources, and technology needs. However, with all the diverse learning sources, it becomes harder for students to comprehend a large amount of knowledge in a short period of time. Traditional assistive technologies and learning aids often lack the dynamic adaptability required for individualized education plans. Large Language Models (LLM) have been used in language translation, text summarization, and content generation applications. With rapid growth in AI over the past years, AI-powered chatbots and virtual assistants have been developed. This research aims to bridge this gap by introducing an innovative study buddy we will be calling the 'SAMCares'. The system leverages a Large Language Model (LLM) (in our case, LLaMa-2 70B as the base model) and Retriever-Augmented Generation (RAG) to offer real-time, context-aware, and adaptive educational support. The context of the model will be limited to the knowledge base of Sam Houston State University (SHSU) course notes. The LLM component enables a chat-like environment to interact with it to meet the unique learning requirements of each student. For this, we will build a custom web-based GUI. At the same time, RAG enhances real-time information retrieval and text generation, in turn providing more accurate and context-specific assistance. An option to upload additional study materials in the web GUI is added in case additional knowledge support is required. The system's efficacy will be evaluated through controlled trials and iterative feedback mechanisms.
Submitted 1 May, 2024;
originally announced May 2024.
-
Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions
Authors:
Nazia Tasnim,
Sujan Sen Gupta,
Md. Istiak Hossain Shihab,
Fatiha Islam Juee,
Arunima Tahsin,
Pritom Ghum,
Kanij Fatema,
Marshia Haque,
Wasema Farzana,
Prionti Nasir,
Ashique KhudaBukhsh,
Farig Sadeque,
Asif Sushmit
Abstract:
Communal violence in online forums has become extremely prevalent in South Asia, where many communities of different cultures coexist and share resources. These societies exhibit a phenomenon characterized by strong bonds within their own groups and animosity towards others, leading to conflicts that frequently escalate into violent confrontations. To address this issue, we have developed the first comprehensive framework for the automatic detection of communal violence markers in online Bangla content, accompanied by the largest collection (13K raw sentences) of social media interactions that fall under the definition of four major violence classes and their 16 coarse expressions. Our workflow introduces a 7-step expert annotation process incorporating insights from social scientists, linguists, and psychologists. By presenting data statistics and benchmarking performance using this dataset, we have determined that, aside from the category of Non-communal violence, Religio-communal violence is particularly pervasive in Bangla text. Moreover, we have substantiated the effectiveness of fine-tuning language models in identifying violent comments by conducting preliminary benchmarking on the state-of-the-art Bangla deep learning model.
Submitted 17 April, 2024;
originally announced April 2024.
-
Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks
Authors:
Maan Qraitem,
Nazia Tasnim,
Piotr Teterwak,
Kate Saenko,
Bryan A. Plummer
Abstract:
Typographic attacks, adding misleading text to images, can deceive vision-language models (LVLMs). The susceptibility of recent large LVLMs like GPT4-V to such attacks is understudied, raising concerns about amplified misinformation in personal assistant applications. Previous attacks use simple strategies, such as random misleading words, which don't fully exploit LVLMs' language reasoning abilities. We introduce an experimental setup for testing typographic attacks on LVLMs and propose two novel self-generated attacks: (1) Class-based attacks, where the model identifies a similar class to deceive itself, and (2) Reasoned attacks, where an advanced LVLM suggests an attack combining a deceiving class and description. Our experiments show these attacks significantly reduce classification performance by up to 60% and are effective across different models, including InstructBLIP and MiniGPT4. Code: https://github.com/mqraitem/Self-Gen-Typo-Attack
Submitted 12 February, 2025; v1 submitted 1 February, 2024;
originally announced February 2024.
-
OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking
Authors:
Fazle Rabbi Rakib,
Souhardya Saha Dip,
Samiul Alam,
Nazia Tasnim,
Md. Istiak Hossain Shihab,
Md. Nazmuddoha Ansary,
Syed Mobassir Hossen,
Marsia Haque Meghla,
Mamunur Mamun,
Farig Sadeque,
Sayma Sultana Chowdhury,
Tahsin Reasat,
Asif Sushmit,
Ahmed Imtiaz Humayun
Abstract:
We present OOD-Speech, the first out-of-distribution (OOD) benchmarking dataset for Bengali automatic speech recognition (ASR). Being one of the most spoken languages globally, Bengali portrays large diversity in dialects and prosodic features, which demands ASR frameworks to be robust towards distribution shifts. For example, Islamic religious sermons in Bengali are delivered with a tonality that is significantly different from regular speech. Our training dataset is collected via massively online crowdsourcing campaigns which resulted in 1177.94 hours collected and curated from 22,645 native Bengali speakers from South Asia. Our test dataset comprises 23.03 hours of speech collected and manually annotated from 17 different sources, e.g., Bengali TV drama, Audiobook, Talk show, Online class, and Islamic sermons to name a few. OOD-Speech is jointly the largest publicly available speech dataset, as well as the first out-of-distribution ASR benchmarking dataset for Bengali.
Submitted 15 May, 2023;
originally announced May 2023.
-
DRL-GAN: A Hybrid Approach for Binary and Multiclass Network Intrusion Detection
Authors:
Caroline Strickland,
Chandrika Saha,
Muhammad Zakar,
Sareh Nejad,
Noshin Tasnim,
Daniel Lizotte,
Anwar Haque
Abstract:
Our increasingly connected world continues to face an ever-growing amount of network-based attacks. Intrusion detection systems (IDS) are an essential security technology for detecting these attacks. Although numerous machine learning-based IDS have been proposed for the detection of malicious network traffic, the majority have difficulty properly detecting and classifying the more uncommon attack types. In this paper, we implement a novel hybrid technique using synthetic data produced by a Generative Adversarial Network (GAN) to use as input for training a Deep Reinforcement Learning (DRL) model. Our GAN model is trained with the NSL-KDD dataset for four attack categories as well as normal network flow. Ultimately, our findings demonstrate that training the DRL on specific synthetic datasets can result in better performance in correctly classifying minority classes over training on the true imbalanced dataset.
Submitted 5 January, 2023;
originally announced January 2023.
-
Toward IoT enabled smart offices: Achieving Sustainable Development Goals
Authors:
Syeda Nishat Tasnim,
Md Taimur Ahad
Abstract:
Despite research advocating the Internet of Things (IoT) as an effective in-office monitoring system, little research has presented societal and climate-centric discussions. Whereas the United Nations (UN) and other development agencies concerned with climate impact are advocating transformative actions towards smart cities, very little research in the IoT domain analyzes the advantages of IoT in achieving sustainable development goals (SDGs) to fill this gap. In this study, a smart office (SO) was developed in Cisco Packet Tracer. We then presented the SO through the lens of SDGs. We suggest that SOs support targets mentioned in Goals 6, 7, 8, 9, 11 and 12 of the SDGs. This research is crucial, both for developing and developed economies, as we move toward industrialization, while ignoring the adverse impacts of industrialization. This work is expected to provide a pathway with technological innovation toward a more sustainable world for IT practitioners, governments and development agencies.
Submitted 7 June, 2022;
originally announced June 2022.
-
VISTA: Vision Transformer enhanced by U-Net and Image Colorfulness Frame Filtration for Automatic Retail Checkout
Authors:
Md. Istiak Hossain Shihab,
Nazia Tasnim,
Hasib Zunair,
Labiba Kanij Rupty,
Nabeel Mohammed
Abstract:
Multi-class product counting and recognition identifies product items from images or videos for automated retail checkout. The task is challenging due to the real-world scenario of occlusions where product items overlap, fast movement in the conveyor belt, large similarity in overall appearance of the items being scanned, novel products, and the negative impact of misidentifying items. Further, there is a domain bias between training and test sets, specifically, the provided training dataset consists of synthetic images and the test set videos consist of foreign objects such as hands and tray. To address these aforementioned issues, we propose to segment and classify individual frames from a video sequence. The segmentation method consists of a unified single product item- and hand-segmentation followed by entropy masking to address the domain bias problem. The multi-class classification method is based on Vision Transformers (ViT). To identify the frames with target objects, we utilize several image processing methods and propose a custom metric to discard frames not having any product items. Combining all these mechanisms, our best system achieves 3rd place in the AI City Challenge 2022 Track 4 with an F1 score of 0.4545. Code will be available at
Submitted 23 April, 2022;
originally announced April 2022.
-
TEAM-Atreides at SemEval-2022 Task 11: On leveraging data augmentation and ensemble to recognize complex Named Entities in Bangla
Authors:
Nazia Tasnim,
Md. Istiak Hossain Shihab,
Asif Shahriyar Sushmit,
Steven Bethard,
Farig Sadeque
Abstract:
Many areas, such as the biological and healthcare domain, artistic works, and organization names, have nested, overlapping, discontinuous entity mentions that may even be syntactically or semantically ambiguous in practice. Traditional sequence tagging algorithms are unable to recognize these complex mentions because they may violate the assumptions upon which sequence tagging schemes are founded. In this paper, we describe our contribution to SemEval 2022 Task 11 on identifying such complex Named Entities. We have leveraged the ensemble of multiple ELECTRA-based models that were exclusively pretrained on the Bangla language with the performance of ELECTRA-based models pretrained on English to achieve competitive performance on the Track-11. Besides providing a system description, we will also present the outcomes of our experiments on architectural decisions, dataset augmentations, and post-competition findings.
Submitted 21 April, 2022;
originally announced April 2022.
-
Exploring the Scope and Potential of Local Newspaper-based Dengue Surveillance in Bangladesh
Authors:
Nazia Tasnim,
Md. Istiak Hossain Shihab,
Moqsadur Rahman,
Sheikh Rabiul Islam,
Mohammad Ruhul Amin
Abstract:
Dengue fever has been considered to be one of the global public health problems of the twenty-first century, especially in tropical and subtropical countries of the global south. The high morbidity and mortality rates of Dengue fever impose a huge economic and health burden for middle and low-income countries. It is so prevalent in such regions that enforcing a granular level of surveillance is quite impossible. Therefore, it is crucial to explore an alternative cost-effective solution that can provide updates of the ongoing situation in a timely manner. In this paper, we explore the scope and potential of a local newspaper-based dengue surveillance system, using well-known data-mining techniques, in Bangladesh from the analysis of the news contents written in the native language. In addition, we explain the working procedure of developing a novel database, using human-in-the-loop technique, for further analysis, and classification of dengue and its intervention-related news. Our classification method has an f-score of 91.45%, and matches the ground truth of reported cases quite closely. Based on the dengue and intervention-related news, we identified the regions where more intervention efforts are needed to reduce the rate of dengue infection. A demo of this project can be accessed at: http://erdos.dsm.fordham.edu:3009/
Submitted 7 July, 2021;
originally announced July 2021.
-
Comparisonal study of Deep Learning approaches on Retinal OCT Image
Authors:
Nowshin Tasnim,
Mahmudul Hasan,
Ishrak Islam
Abstract:
In medical science, the use of computer science in disease detection and diagnosis is gaining popularity. Previously, the detection of disease used to take a significant amount of time and was less reliable. Machine learning (ML) techniques employed in recent biomedical research are making revolutionary changes by gaining higher accuracy with more concise timing. At present, it is even possible to automatically detect diseases from the scanned images with the help of ML. In this research, we have made such an attempt to detect retinal diseases from optical coherence tomography (OCT) X-ray images. Here, we propose a deep learning (DL) based approach in detecting retinal diseases from OCT images which can identify three conditions of the retina. Four different models used in this approach are compared with each other. On the test set, the detection accuracy is 98.00% for a vanilla convolutional neural network (CNN) model, 99.07% for Xception model, 97.00% for ResNet50 model, and 99.17% for MobileNetV2 model. The MobileNetV2 model acquires the highest accuracy, and the closest to the highest is the Xception model. The proposed approach has a potential impact on creating a tool for automatically detecting retinal diseases.
Submitted 16 December, 2019;
originally announced December 2019.