-
Outcome-Based Education: Evaluating Students' Perspectives Using Transformer
Authors:
Shuvra Smaran Das,
Anirban Saha Anik,
Md Kishor Morol,
Mohammad Sakib Mahmood
Abstract:
Outcome-Based Education (OBE) emphasizes the development of specific competencies through student-centered learning. In this study, we review the importance of OBE and apply transformer-based models, particularly DistilBERT, to an NLP dataset of student feedback. Our objective is to assess and improve educational outcomes. Our approach outperforms other machine learning models because it uses the transformer's deep understanding of language context to classify sentiment more accurately, giving better results across a wider range of metrics. Our work directly contributes to OBE's goal of achieving measurable outcomes by facilitating the identification of patterns in student learning experiences. We also apply LIME (local interpretable model-agnostic explanations) to make model predictions interpretable, which gives understandable information about how key terms affect sentiment. Our findings indicate that combining transformer models with LIME explanations results in a strong and straightforward framework for analyzing student feedback. This aligns more closely with the principles of OBE and ensures the improvement of educational practices through data-driven insights.
Submitted 8 April, 2025;
originally announced June 2025.
-
Generalizable Process Reward Models via Formally Verified Training Data
Authors:
Ryo Kamoi,
Yusen Zhang,
Nan Zhang,
Sarkar Snigdha Sarathi Das,
Rui Zhang
Abstract:
Process Reward Models (PRMs), which provide step-level feedback on reasoning traces generated by Large Language Models (LLMs), are receiving increasing attention. However, two key research gaps remain: creating PRM training data requires costly human annotation to label accurate step-level errors, and existing PRMs are limited to math reasoning domains. In response to these gaps, this paper aims to enable automatic synthesis of accurate PRM training data and the generalization of PRMs to diverse reasoning tasks beyond math reasoning. We propose FoVer, an approach to synthesize PRM training data with accurate step-level error labels automatically annotated by formal verification tools, such as Z3 and Isabelle. To show the practical effectiveness of FoVer, we synthesize a training dataset by annotating step-level error labels on LLM responses to formal logic and theorem proving tasks, without relying on human annotation. While FoVer creates training data with symbolic tasks compatible with formal verification, our experiments show that PRMs trained on our dataset exhibit cross-task generalization, enabling a single PRM to effectively perform verification across diverse reasoning tasks. Specifically, LLM-based PRMs trained with FoVer significantly outperform PRMs based on the original LLMs and achieve competitive or superior results compared to state-of-the-art PRMs, as measured by step-level verification on ProcessBench and Best-of-K performance across 12 reasoning benchmarks, including MATH, AIME, ANLI, MMLU, and BBH. The datasets, models, and code are provided at https://github.com/psunlpgroup/FoVer.
Submitted 27 September, 2025; v1 submitted 21 May, 2025;
originally announced May 2025.
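The Best-of-K evaluation mentioned in the FoVer abstract can be illustrated with a minimal sketch: a PRM assigns a score to each reasoning step, the step scores are aggregated per response, and the highest-scoring response is selected. The `StepScorer` type, the minimum-based aggregation, and `toy_prm` are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a PRM maps a list of reasoning steps to
# per-step correctness scores in [0, 1]. Real PRMs are LLM-based.
StepScorer = Callable[[Sequence[str]], List[float]]


def response_score(steps: Sequence[str], prm: StepScorer) -> float:
    """Aggregate step-level scores into one response score.

    Taking the minimum is an assumed aggregation rule: a response is
    treated as only as reliable as its weakest step.
    """
    scores = prm(steps)
    return min(scores) if scores else 0.0


def best_of_k(candidates: Sequence[Sequence[str]], prm: StepScorer) -> int:
    """Return the index of the candidate with the highest aggregated score."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    return max(range(len(candidates)),
               key=lambda i: response_score(candidates[i], prm))


def toy_prm(steps: Sequence[str]) -> List[float]:
    # Toy stand-in scorer: penalise any step containing the word "guess".
    return [0.2 if "guess" in step else 0.9 for step in steps]


candidates = [
    ["x = 2", "guess y = 5", "answer 7"],
    ["x = 2", "y = 3", "answer 5"],
]
best_index = best_of_k(candidates, toy_prm)
```

Other aggregation rules (for example the product or the mean of step scores) could be swapped in by changing only `response_score`.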
-
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Authors:
Yusen Zhang,
Wenliang Zheng,
Aashrith Madasu,
Peng Shi,
Ryo Kamoi,
Hao Zhou,
Zhuoyang Zou,
Shu Zhao,
Sarkar Snigdha Sarathi Das,
Vipul Gupta,
Xiaoxin Lu,
Nan Zhang,
Ranran Haoran Zhang,
Avitej Iyer,
Renze Lou,
Wenpeng Yin,
Rui Zhang
Abstract:
High-resolution image (HRI) understanding aims to process images with a large number of pixels, such as pathological images and agricultural aerial images, both of which can exceed 1 million pixels. Vision Large Language Models (VLMs) can allegedly handle HRIs; however, a comprehensive benchmark for evaluating VLMs on HRI understanding is lacking. To address this gap, we introduce HRScene, a novel unified benchmark for HRI understanding with rich scenes. HRScene incorporates 25 real-world datasets and 2 synthetic diagnostic datasets with resolutions ranging from 1,024 $\times$ 1,024 to 35,503 $\times$ 26,627. HRScene is collected and re-annotated by 10 graduate-level annotators, covering 25 scenarios, ranging from microscopic to radiology images, street views, long-range pictures, and telescope images. It includes HRIs of real-world objects, scanned documents, and composite multi-image. The two diagnostic evaluation datasets are synthesized by combining the target image with the gold answer and distracting images in different orders, assessing how well models utilize regions in HRI. We conduct extensive experiments involving 28 VLMs, including Gemini 2.0 Flash and GPT-4o. Experiments on HRScene show that current VLMs achieve an average accuracy of around 50% on real-world tasks, revealing significant gaps in HRI understanding. Results on synthetic datasets reveal that VLMs struggle to effectively utilize HRI regions, showing significant Regional Divergence and lost-in-middle, shedding light on future research.
Submitted 29 April, 2025; v1 submitted 25 April, 2025;
originally announced April 2025.
-
Distributed Time Synchronization in NOMA-Assisted Ultra-Dense Networks
Authors:
Debjani Goswami,
Indrakshi Dey,
Nicola Marchetti,
Suvra Sekhar Das
Abstract:
Ultra-dense networks (UDNs) represent a transformative access architecture for upcoming sixth generation (6G) systems, poised to meet the surging demand for high data rates. Achieving precise synchronization across diverse base stations (BSs) is critical in these networks to mitigate inter-cell interference (ICI). However, traditional centralized synchronization approaches face substantial challenges in dense urban environments, including limited access to the Global Positioning System (GPS), dependence on reliable backhaul, and high signaling overhead. This study advances a low-complexity distributed synchronization solution. A primary focus is on assessing the algorithm's accuracy while incorporating the effects of information exchange delays, which are pronounced in large networks. Recognizing the pivotal role of neighbor-gathered information in the proposed approach, this research employs uplink Non-Orthogonal Multiple Access (NOMA) to reduce message-gathering delays between transmitters (TXs) and receivers (RXs). The proposed algorithm is evaluated under exchange delays, analyzing the impact of system parameters such as network connectivity, size, and number of sub-bands on synchronization speed. The findings demonstrate that the NOMA-based information-gathering technique significantly accelerates network synchronization compared to orthogonal access schemes. This advancement is crucial for meeting the low-latency requirements of beyond fifth generation (5G) systems, underscoring the potential of distributed synchronization as a cornerstone for next-generation UDN deployments.
Submitted 5 April, 2025;
originally announced April 2025.
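The abstract above does not state the update rule of the distributed synchronization algorithm. As a rough illustration of the general idea of synchronizing from neighbor-gathered information, the sketch below assumes a plain average-consensus update on a small ring of base stations. The function names, the step size, and the topology are hypothetical, and the sketch ignores exchange delays and NOMA entirely.

```python
from typing import Dict, List


def consensus_step(clocks: List[float],
                   neighbors: Dict[int, List[int]],
                   step: float = 0.5) -> List[float]:
    """One synchronous round: each node moves toward its neighbours' mean.

    This is an assumed average-consensus rule, not the paper's algorithm.
    """
    updated = []
    for i, t in enumerate(clocks):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            updated.append(t)  # isolated node keeps its own clock
            continue
        mean_nbr = sum(clocks[j] for j in nbrs) / len(nbrs)
        updated.append(t + step * (mean_nbr - t))
    return updated


def spread(clocks: List[float]) -> float:
    """Largest pairwise clock offset in the network."""
    return max(clocks) - min(clocks)


# Hypothetical ring of four base stations with initial clock offsets.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
clocks = [0.0, 4.0, 1.0, 3.0]
initial_spread = spread(clocks)
for _ in range(20):
    clocks = consensus_step(clocks, neighbors)
final_spread = spread(clocks)
```

On a regular graph such as this ring, the update preserves the network-wide mean, so the clocks agree on the average of the initial offsets.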
-
GREATERPROMPT: A Unified, Customizable, and High-Performing Open-Source Toolkit for Prompt Optimization
Authors:
Wenliang Zheng,
Sarkar Snigdha Sarathi Das,
Yusen Zhang,
Rui Zhang
Abstract:
LLMs have gained immense popularity among researchers and the general public for their impressive capabilities on a variety of tasks. Notably, the efficacy of LLMs remains significantly dependent on the quality and structure of the input prompts, making prompt design a critical factor for their performance. Recent advancements in automated prompt optimization have introduced diverse techniques that automatically enhance prompts to better align model outputs with user expectations. However, these methods often suffer from a lack of standardization and compatibility across different techniques, limited flexibility in customization, inconsistent performance across model scales, and exclusive reliance on expensive proprietary LLM APIs. To fill this gap, we introduce GREATERPROMPT, a novel framework that democratizes prompt optimization by bringing diverse methods under a unified, customizable API while delivering highly effective prompts for different tasks. Our framework flexibly accommodates various model scales by leveraging both text feedback-based optimization for larger LLMs and internal gradient-based optimization for smaller models to achieve powerful and precise prompt improvements. Moreover, we provide a user-friendly Web UI that ensures accessibility for non-expert users, enabling broader adoption and enhanced performance across various user groups and application scenarios. GREATERPROMPT is available at https://github.com/psunlpgroup/GreaterPrompt via GitHub, PyPI, and web user interfaces.
Submitted 4 April, 2025;
originally announced April 2025.
-
SGS-GNN: A Supervised Graph Sparsification method for Graph Neural Networks
Authors:
Siddhartha Shankar Das,
Naheed Anjum Arafat,
Muftiqur Rahman,
S M Ferdous,
Alex Pothen,
Mahantesh M Halappanavar
Abstract:
We propose SGS-GNN, a novel supervised graph sparsifier that learns the sampling probability distribution of edges and samples sparse subgraphs of a user-specified size to reduce the computational costs required by GNNs for inference tasks on large graphs. SGS-GNN employs regularizers in the loss function to enhance homophily in sparse subgraphs, boosting the accuracy of GNNs on heterophilic graphs, where a significant number of the neighbors of a node have dissimilar labels. SGS-GNN also supports conditional updates of the probability distribution learning module based on a prior, which helps narrow the search space for sparse graphs. SGS-GNN requires fewer epochs to obtain high accuracies since it learns the search space of subgraphs more effectively than methods using fixed distributions such as random sampling. Extensive experiments using 33 homophilic and heterophilic graphs demonstrate the following: (i) with only 20% of edges retained in the sparse subgraphs, SGS-GNN improves the F1-scores by a geometric mean of 4% relative to the original graph, and on heterophilic graphs the prediction accuracy improves by up to 30%; (ii) SGS-GNN outperforms state-of-the-art methods, with improvements in F1-scores of 4-7% in geometric mean at similar sparsities in the sampled subgraphs; and (iii) compared to sparsifiers that employ fixed distributions, SGS-GNN requires about half the number of epochs to converge.
Submitted 14 February, 2025;
originally announced February 2025.
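Two ideas from the SGS-GNN abstract lend themselves to a small sketch: sampling a fixed-size sparse subgraph from per-edge probabilities, and measuring how homophilic the result is. In the sketch below the probabilities are set by hand as a stand-in for learned ones, edge homophily is taken to be the fraction of edges whose endpoints share a label, and the sampler uses Efraimidis-Spirakis weighted sampling without replacement. These choices and all names are assumptions for illustration, not the paper's implementation.

```python
import random
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def edge_homophily(edges: Sequence[Edge], labels: Sequence[int]) -> float:
    """Fraction of edges whose two endpoints carry the same label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def sample_edges(edges: Sequence[Edge], probs: Sequence[float],
                 budget: int, seed: int = 0) -> List[Edge]:
    """Draw up to `budget` distinct edges, favouring high-probability edges.

    Each edge with positive weight w gets the key u ** (1 / w) for a
    uniform random u; the edges with the largest keys are kept.
    Edges with zero weight are never sampled.
    """
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / p), e)
             for e, p in zip(edges, probs) if p > 0]
    keyed.sort(reverse=True)
    return [e for _, e in keyed[:budget]]


# Toy graph: two classes, four intra-class and four inter-class edges.
labels = [0, 0, 0, 1, 1, 1]
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5), (0, 5)]
# Hand-set stand-in for learned probabilities: inter-class edges get 0.
probs = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]

sparse = sample_edges(edges, probs, budget=3)
```

Here `budget` plays the role of the user-specified subgraph size; comparing `edge_homophily(edges, labels)` with `edge_homophily(sparse, labels)` shows the effect of the sampling distribution on homophily.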
-
Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet
Authors:
Berk Atil,
Vipul Gupta,
Sarkar Snigdha Sarathi Das,
Rebecca J. Passonneau
Abstract:
Large language models (LLMs) have become ubiquitous, so it is important to understand their risks and limitations. Smaller LLMs can be deployed where compute resources are constrained, such as edge devices, but they differ in their propensity to generate harmful output. Mitigation of LLM harm typically depends on annotating the harmfulness of LLM output, which is expensive to collect from humans. This work studies two questions: How do smaller LLMs rank regarding generation of harmful content? How well can larger LLMs annotate harmfulness? We prompt three small LLMs to elicit harmful content of various types, such as discriminatory language, offensive content, privacy invasion, or negative influence, and collect human rankings of their outputs. Then, we evaluate three state-of-the-art large LLMs on their ability to annotate the harmfulness of these responses. We find that the smaller models differ with respect to harmfulness. We also find that large LLMs show low to moderate agreement with humans. These findings underline the need for further work on harm mitigation in LLMs.
Submitted 21 April, 2025; v1 submitted 7 February, 2025;
originally announced February 2025.
-
GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers
Authors:
Sarkar Snigdha Sarathi Das,
Ryo Kamoi,
Bo Pang,
Yusen Zhang,
Caiming Xiong,
Rui Zhang
Abstract:
The effectiveness of large language models (LLMs) is closely tied to the design of prompts, making prompt optimization essential for enhancing their performance across a wide range of tasks. Many existing approaches to automating prompt engineering rely exclusively on textual feedback, refining prompts based solely on inference errors identified by large, computationally expensive LLMs. Unfortunately, smaller models struggle to generate high-quality feedback, resulting in complete dependence on large LLM judgment. Moreover, these methods fail to leverage more direct and finer-grained information, such as gradients, due to operating purely in text space. To this end, we introduce GReaTer, a novel prompt optimization technique that directly incorporates gradient information over task-specific reasoning. By utilizing task loss gradients, GReaTer enables self-optimization of prompts for open-source, lightweight language models without the need for costly closed-source LLMs. This allows high-performance prompt optimization without dependence on massive LLMs, closing the gap between smaller models and the sophisticated reasoning often needed for prompt refinement. Extensive evaluations across diverse reasoning tasks including BBH, GSM8k, and FOLIO demonstrate that GReaTer consistently outperforms previous state-of-the-art prompt optimization methods, even those reliant on powerful LLMs. Additionally, GReaTer-optimized prompts frequently exhibit better transferability and, in some cases, boost task performance to levels comparable to or surpassing those achieved by larger language models, highlighting the effectiveness of prompt optimization guided by gradients over reasoning. Code of GReaTer is available at https://github.com/psunlpgroup/GreaTer.
Submitted 7 April, 2025; v1 submitted 12 December, 2024;
originally announced December 2024.
-
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
Authors:
Ryo Kamoi,
Yusen Zhang,
Sarkar Snigdha Sarathi Das,
Ranran Haoran Zhang,
Rui Zhang
Abstract:
Large Vision Language Models (LVLMs) have achieved remarkable performance in various vision-language tasks. However, it is still unclear how accurately LVLMs can perceive visual information in images. In particular, the capability of LVLMs to perceive geometric information, such as shape, angle, and size, remains insufficiently analyzed, although the perception of these properties is crucial for tasks that require a detailed visual understanding. In this work, we introduce VisOnlyQA, a dataset for evaluating the geometric perception of LVLMs, and reveal that LVLMs often cannot accurately perceive basic geometric information in images, while human performance is nearly perfect. VisOnlyQA consists of 12 tasks that directly ask about geometric information in geometric shapes, charts, chemical structures, and 3D shapes. Our experiments highlight the following findings: (i) State-of-the-art LVLMs struggle with basic geometric perception. 23 LVLMs we evaluate, including GPT-4o and Gemini 2.5 Pro, work poorly on VisOnlyQA. (ii) Additional training data does not resolve this issue. Fine-tuning on the training set of VisOnlyQA is not always effective, even for in-distribution tasks. (iii) LLM may be the bottleneck. LVLMs using stronger LLMs exhibit better geometric perception on VisOnlyQA, while it does not require complex reasoning, suggesting that the way LVLMs process information from visual encoders is a bottleneck. The datasets, code, and model responses are provided at https://github.com/psunlpgroup/VisOnlyQA.
Submitted 13 July, 2025; v1 submitted 1 December, 2024;
originally announced December 2024.
-
Verbosity $\neq$ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models
Authors:
Yusen Zhang,
Sarkar Snigdha Sarathi Das,
Rui Zhang
Abstract:
Although Large Language Models (LLMs) have demonstrated their strong capabilities in various tasks, recent work has revealed LLMs also exhibit undesirable behaviors, such as hallucination and toxicity, limiting their reliability and broader adoption. In this paper, we discover an understudied type of undesirable behavior of LLMs, which we term Verbosity Compensation (VC), similar to the hesitation behavior of humans under uncertainty, where they respond with excessive words such as repeating questions, introducing ambiguity, or providing excessive enumeration. We present the first work that defines and analyzes Verbosity Compensation, explores its causes, and proposes a simple mitigating approach. Our experiments, conducted on five datasets of knowledge and reasoning-based QA tasks with 14 newly developed LLMs, reveal three conclusions. 1) We reveal a pervasive presence of VC across all models and all datasets. Notably, GPT-4 exhibits a VC frequency of 50.40%. 2) We reveal the large performance gap between verbose and concise responses, with a notable difference of 27.61% on the Qasper dataset. We also demonstrate that this difference does not naturally diminish as LLM capability increases. Both 1) and 2) highlight the urgent need to mitigate the frequency of VC behavior and disentangle verbosity from veracity. We propose a simple yet effective cascade algorithm that replaces verbose responses with responses generated by other models. The results show that our approach effectively reduces the VC frequency of the Mistral model from 63.81% to 16.16% on the Qasper dataset. 3) We also find that verbose responses exhibit higher uncertainty across all five datasets, suggesting a strong connection between verbosity and model uncertainty. Our dataset and code are available at https://github.com/psunlpgroup/VerbosityLLM.
Submitted 7 December, 2024; v1 submitted 12 November, 2024;
originally announced November 2024.
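The cascade idea described in the Verbosity Compensation abstract, replacing a verbose response with one from another model, can be sketched as follows. The word-count detector `is_verbose`, its threshold, and the toy models are hypothetical placeholders; the abstract does not specify how verbosity is detected or in what order models are queried.

```python
from typing import Callable, Sequence

# Hypothetical interface: a model maps a question to a response string.
Model = Callable[[str], str]


def is_verbose(response: str, max_words: int = 3) -> bool:
    """Assumed detector: flag responses longer than `max_words` words."""
    return len(response.split()) > max_words


def cascade_answer(question: str, models: Sequence[Model],
                   max_words: int = 3) -> str:
    """Query models in order and return the first concise response.

    If every model is verbose, the last model's response is returned.
    """
    if not models:
        raise ValueError("need at least one model")
    response = ""
    for model in models:
        response = model(question)
        if not is_verbose(response, max_words):
            return response
    return response


def weak_model(question: str) -> str:
    # Toy model that hedges with a long enumeration.
    return "It could be Paris or perhaps Lyon or Marseille"


def strong_model(question: str) -> str:
    # Toy model that answers concisely.
    return "Paris"


answer = cascade_answer("What is the capital of France?",
                        [weak_model, strong_model])
fallback = cascade_answer("What is the capital of France?", [weak_model])
```

Ordering the models from cheapest to most capable keeps the expensive model as a fallback that is only called when earlier responses are flagged.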
-
AGS-GNN: Attribute-guided Sampling for Graph Neural Networks
Authors:
Siddhartha Shankar Das,
S M Ferdous,
Mahantesh M Halappanavar,
Edoardo Serra,
Alex Pothen
Abstract:
We propose AGS-GNN, a novel attribute-guided sampling algorithm for Graph Neural Networks (GNNs) that exploits node features and connectivity structure of a graph while simultaneously adapting for both homophily and heterophily in graphs. (In homophilic graphs, vertices of the same class are more likely to be connected, whereas in heterophilic graphs vertices of different classes tend to be linked.) While GNNs have been successfully applied to homophilic graphs, their application to heterophilic graphs remains challenging. The best-performing GNNs for heterophilic graphs do not fit the sampling paradigm, suffer high computational costs, and are not inductive. We employ samplers based on feature-similarity and feature-diversity to select subsets of neighbors for a node, and adaptively capture information from homophilic and heterophilic neighborhoods using dual channels. Currently, AGS-GNN is the only algorithm that we know of that explicitly controls homophily in the sampled subgraph through similar and diverse neighborhood samples. For diverse neighborhood sampling, we employ submodularity, which was not used in this context prior to our work. The sampling distribution is pre-computed and highly parallel, achieving the desired scalability. Using an extensive dataset consisting of 35 small ($\le$ 100K nodes) and large (>100K nodes) homophilic and heterophilic graphs, we demonstrate the superiority of AGS-GNN compared to the current approaches in the literature. AGS-GNN achieves comparable test accuracy to the best-performing heterophilic GNNs, even outperforming methods using the entire graph for node classification. AGS-GNN also converges faster compared to methods that sample neighborhoods randomly, and can be incorporated into existing GNN models that employ node or graph sampling.
Submitted 24 May, 2024;
originally announced May 2024.
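The two neighbor samplers named in the AGS-GNN abstract, one based on feature similarity and one on feature diversity via submodularity, can be illustrated with a small deterministic sketch. It assumes cosine similarity as the feature similarity and the facility-location function as the submodular objective, maximized greedily; the abstract does not specify either choice, and the actual method samples from a pre-computed distribution rather than selecting deterministically.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similar_neighbors(node_feat: Vector, nbr_feats: Sequence[Vector],
                      k: int) -> List[int]:
    """Indices of the k neighbours most similar to the node itself."""
    order = sorted(range(len(nbr_feats)),
                   key=lambda j: cosine(node_feat, nbr_feats[j]),
                   reverse=True)
    return order[:k]


def diverse_neighbors(nbr_feats: Sequence[Vector], k: int) -> List[int]:
    """Greedy maximisation of f(S) = sum_i max_{j in S} sim(i, j).

    Facility location is an assumed submodular objective: each pick
    is the neighbour that best covers those not yet well represented.
    """
    n = len(nbr_feats)
    sim = [[cosine(nbr_feats[i], nbr_feats[j]) for j in range(n)]
           for i in range(n)]
    covered = [0.0] * n  # best similarity of each neighbour to the chosen set
    chosen: List[int] = []
    for _ in range(min(k, n)):
        best_j, best_gain = -1, -1.0
        for j in range(n):
            if j in chosen:
                continue
            gain = sum(max(sim[i][j] - covered[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        chosen.append(best_j)
        covered = [max(covered[i], sim[i][best_j]) for i in range(n)]
    return chosen


# Toy neighbourhood: two feature clusters, {0, 1} and {2, 3}.
node = [1.0, 0.0]
nbrs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
similar = similar_neighbors(node, nbrs, k=2)
diverse = diverse_neighbors(nbrs, k=2)
```

In this toy neighborhood the similarity sampler stays inside the node's own cluster, while the diversity sampler picks one representative from each cluster.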
-
Evaluating LLMs at Detecting Errors in LLM Responses
Authors:
Ryo Kamoi,
Sarkar Snigdha Sarathi Das,
Renze Lou,
Jihyun Janice Ahn,
Yilun Zhao,
Xiaoxin Lu,
Nan Zhang,
Yusen Zhang,
Ranran Haoran Zhang,
Sujeeth Reddy Vummanthala,
Salika Dave,
Shaobo Qin,
Arman Cohan,
Wenpeng Yin,
Rui Zhang
Abstract:
With Large Language Models (LLMs) being widely used across various tasks, detecting errors in their responses is increasingly crucial. However, little research has been conducted on error detection of LLM responses. Collecting error annotations on LLM responses is challenging due to the subjective nature of many NLP tasks, and thus previous research focuses on tasks of little practical value (e.g., word sorting) or limited error types (e.g., faithfulness in summarization). This work introduces ReaLMistake, the first error detection benchmark consisting of objective, realistic, and diverse errors made by LLMs. ReaLMistake contains three challenging and meaningful tasks that introduce objectively assessable errors in four categories (reasoning correctness, instruction-following, context-faithfulness, and parameterized knowledge), eliciting naturally observed and diverse errors in responses of GPT-4 and Llama 2 70B annotated by experts. We use ReaLMistake to evaluate error detectors based on 12 LLMs. Our findings show: 1) Top LLMs like GPT-4 and Claude 3 detect errors made by LLMs at very low recall, and all LLM-based error detectors perform much worse than humans. 2) Explanations by LLM-based error detectors lack reliability. 3) LLM-based error detection is sensitive to small changes in prompts but remains challenging to improve. 4) Popular approaches to improving LLMs, including self-consistency and majority vote, do not improve the error detection performance. Our benchmark and code are provided at https://github.com/psunlpgroup/ReaLMistake.
Submitted 27 July, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
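Two quantities in the ReaLMistake abstract are simple to state in code: a majority vote over several error-detector outputs, and recall on the error class. The sketch below is a generic illustration; the label strings `"error"` and `"no_error"` and the toy data are assumptions, not the benchmark's actual format.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[str]) -> str:
    """Most common label; ties go to the label encountered first."""
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0][0]


def error_recall(gold: Sequence[str], predicted: Sequence[str],
                 positive: str = "error") -> float:
    """Share of gold `positive` items that the detector also flagged."""
    flagged = [p for g, p in zip(gold, predicted) if g == positive]
    if not flagged:
        return 0.0
    return sum(1 for p in flagged if p == positive) / len(flagged)


# Hypothetical outputs of three detector runs on the same response.
votes = ["error", "no_error", "error"]
combined = majority_vote(votes)

# Hypothetical gold labels and detector predictions for four responses.
gold = ["error", "error", "no_error", "error"]
pred = ["error", "no_error", "no_error", "no_error"]
recall = error_recall(gold, pred)
```

A detector that rarely flags anything, as in the toy predictions, can look accurate on error-free responses while still having low recall on the error class.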
-
Unified Low-Resource Sequence Labeling by Sample-Aware Dynamic Sparse Finetuning
Authors:
Sarkar Snigdha Sarathi Das,
Ranran Haoran Zhang,
Peng Shi,
Wenpeng Yin,
Rui Zhang
Abstract:
Unified Sequence Labeling, which articulates different sequence labeling problems such as Named Entity Recognition, Relation Extraction, and Semantic Role Labeling in a generalized sequence-to-sequence format, opens up the opportunity to make maximum use of large language model knowledge for structured prediction. Unfortunately, this requires formatting them into a specialized augmented format unknown to the base pretrained language models (PLMs), necessitating finetuning to the target format. This significantly limits its usefulness in data-limited settings, where finetuning large models cannot properly generalize to the target format. To address this challenge and leverage PLM knowledge effectively, we propose FISH-DIP, a sample-aware dynamic sparse finetuning strategy that selectively focuses on a fraction of parameters, informed by feedback from highly regressing examples, during the fine-tuning process. By leveraging the dynamism of sparsity, our approach mitigates the impact of well-learned samples and prioritizes underperforming instances for improvement in generalization. Across five tasks of sequence labeling, we demonstrate that FISH-DIP can smoothly optimize the model in low-resource settings, offering up to 40% performance improvements over full fine-tuning depending on target evaluation settings. Also, compared to in-context learning and other parameter-efficient fine-tuning approaches, FISH-DIP performs comparably or better, notably in extreme low-resource settings.
Submitted 7 November, 2023;
originally announced November 2023.
-
Hermes: Unlocking Security Analysis of Cellular Network Protocols by Synthesizing Finite State Machines from Natural Language Specifications
Authors:
Abdullah Al Ishtiaq,
Sarkar Snigdha Sarathi Das,
Syed Md Mukit Rashid,
Ali Ranjbar,
Kai Tu,
Tianwei Wu,
Zhezheng Song,
Weixuan Wang,
Mujtahid Akon,
Rui Zhang,
Syed Rafiul Hussain
Abstract:
In this paper, we present Hermes, an end-to-end framework to automatically generate formal representations from natural language cellular specifications. We first develop a neural constituency parser, NEUTREX, to process transition-relevant texts and extract transition components (i.e., states, conditions, and actions). We also design a domain-specific language to translate these transition components to logical formulas by leveraging dependency parse trees. Finally, we compile these logical formulas to generate transitions and create the formal model as finite state machines. To demonstrate the effectiveness of Hermes, we evaluate it on 4G NAS, 5G NAS, and 5G RRC specifications and obtain an overall accuracy of 81-87%, which is a substantial improvement over the state-of-the-art. Our security analysis of the extracted models uncovers 3 new vulnerabilities and identifies 19 previous attacks in 4G and 5G specifications, and 7 deviations in commercial 4G basebands.
Submitted 11 October, 2023; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies
Authors:
Chirag Shah,
Ryen W. White,
Reid Andersen,
Georg Buscher,
Scott Counts,
Sarkar Snigdha Sarathi Das,
Ali Montazer,
Sathish Manivannan,
Jennifer Neville,
Xiaochuan Ni,
Nagu Rangan,
Tara Safavi,
Siddharth Suri,
Mengting Wan,
Leijie Wang,
Longqi Yang
Abstract:
Log data can reveal valuable information about how users interact with Web search services, what they want, and how satisfied they are. However, analyzing user intents in log data is not easy, especially for emerging forms of Web search such as AI-driven chat. To understand user intents from log data, we need a way to label them with meaningful categories that capture their diversity and dynamics. Existing methods rely on manual or machine-learned labeling, which are either expensive or inflexible for large and dynamic datasets. We propose a novel solution using large language models (LLMs), which can generate rich and relevant concepts, descriptions, and examples for user intents. However, using LLMs to generate a user intent taxonomy and apply it for log analysis can be problematic for two main reasons: (1) such a taxonomy is not externally validated; and (2) there may be an undesirable feedback loop. To address this, we propose a new methodology with human experts and assessors to verify the quality of the LLM-generated taxonomy. We also present an end-to-end pipeline that uses an LLM with human-in-the-loop to produce, refine, and apply labels for user intent analysis in log data. We demonstrate its effectiveness by uncovering new insights into user intents from search and chat logs from the Microsoft Bing commercial search engine. The proposed work's novelty stems from the method for generating purpose-driven user intent taxonomies with strong validation. This method not only helps remove methodological and practical bottlenecks from intent-focused research, but also provides a new framework for generating, validating, and applying other kinds of taxonomies in a scalable and adaptable way with reasonable human effort.
Submitted 9 May, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
S3-DST: Structured Open-Domain Dialogue Segmentation and State Tracking in the Era of LLMs
Authors:
Sarkar Snigdha Sarathi Das,
Chirag Shah,
Mengting Wan,
Jennifer Neville,
Longqi Yang,
Reid Andersen,
Georg Buscher,
Tara Safavi
Abstract:
The traditional Dialogue State Tracking (DST) problem aims to track user preferences and intents in user-agent conversations. While sufficient for task-oriented dialogue systems supporting narrow domain applications, the advent of Large Language Model (LLM)-based chat systems has introduced many real-world intricacies in open-domain dialogues. These intricacies manifest in the form of increased complexity in contextual interactions, extended dialogue sessions encompassing a diverse array of topics, and more frequent contextual shifts. To handle these intricacies arising from evolving LLM-based chat systems, we propose joint dialogue segmentation and state tracking per segment in open-domain dialogue systems. Assuming a zero-shot setting appropriate to a true open-domain dialogue system, we propose S3-DST, a structured prompting technique that harnesses Pre-Analytical Recollection, a novel grounding mechanism we designed for improving long context tracking. To demonstrate the efficacy of our proposed approach in joint segmentation and state tracking, we evaluate S3-DST on a proprietary anonymized open-domain dialogue dataset, as well as publicly available DST and segmentation datasets. Across all datasets and settings, S3-DST consistently outperforms the state-of-the-art, demonstrating its potency and robustness for the next generation of LLM-based chat systems.
Submitted 15 September, 2023;
originally announced September 2023.
-
ConvoWaste: An Automatic Waste Segregation Machine Using Deep Learning
Authors:
Md. Shahariar Nafiz,
Shuvra Smaran Das,
Md. Kishor Morol,
Abdullah Al Juabir,
Dip Nandi
Abstract:
Nowadays, proper urban waste management is one of the biggest concerns for maintaining a green and clean environment. An automatic waste segregation system can be a viable solution to improve the sustainability of the country and boost the circular economy. This paper proposes a machine to segregate waste into different parts with the help of a smart object detection algorithm using ConvoWaste in the field of deep convolutional neural networks (DCNN) and image processing techniques. In this paper, deep learning and image processing techniques are applied to precisely classify the waste, and the detected waste is placed inside the corresponding bins with the help of a servo motor-based system. This machine can notify the responsible authority of the waste level of the bins and of the time to empty the bins filled with garbage, using the ultrasonic sensors placed in each bin and dual-band GSM-based communication technology. The entire system is controlled remotely through an Android app in order to dump the separated waste in the desired place thanks to its automation properties. The use of this system can aid in the process of recycling resources that were initially destined to become waste, utilizing natural resources, and turning these resources back into usable products. Thus, the system helps fulfill the criteria of a circular economy through resource optimization and extraction. Finally, the system is designed to provide services at a low cost while maintaining a high level of accuracy in terms of technological advancement in the field of artificial intelligence (AI). Our ConvoWaste deep learning model achieves 98% accuracy.
Submitted 6 February, 2023;
originally announced February 2023.
-
The oriented relative clique number of triangle-free planar graphs is 10
Authors:
Soura Sena Das,
Soumen Nandi,
Sagnik Sen
Abstract:
In relation to oriented coloring and chromatic number, the parameter oriented relative clique number of an oriented graph $\overrightarrow{G}$, denoted by $ω_{ro}(\overrightarrow{G})$, is the main focus of this work. We solve an open problem mentioned in the recent survey on oriented coloring by Sopena (Discrete Mathematics 2016), and positively settle a conjecture due to Sen (PhD thesis 2014), by proving that the maximum value of $ω_{ro}(\overrightarrow{G})$ is $10$ when $\overrightarrow{G}$ is a planar graph.
Submitted 22 August, 2022;
originally announced August 2022.
-
Analysis of Temporal Robustness in Massive Machine Type Communications
Authors:
Debjani Goswami,
Merim Dzaferagic,
Harun Siljak,
Suvra Sekhar Das,
Nicola Marchetti
Abstract:
The evolution of fifth generation (5G) networks needs to support the latest use cases, which demand robust network connectivity for the collaborative performance of the network agents, like multi-robot systems and vehicle-to-everything (V2X) communication. Unfortunately, the user device's limited communication range and battery constraint make known robustness metrics suggested for fixed networks unfit when applied to time-switching communication graphs. Furthermore, the calculation of most of the existing robustness metrics involves non-deterministic polynomial-time complexity, and hence these metrics are suited only to small networks. Despite a large volume of work, a complete analysis of a $\textit{low-complexity}$ temporal robustness metric for a communication network is absent from the literature, and the present work aims to fill this gap. In more detail, our work provides a stochastic analysis of network robustness for a massive machine type communication (mMTC) network. The numerical investigation corroborates the exactness of the proposed analytical framework for the temporal robustness metric. Along with studying the impact on network robustness of various system parameters, such as cluster head (CH) probability, power threshold value, network size, and node failure probability, we justify the observed trend of numerical results probabilistically.
Submitted 6 August, 2022;
originally announced August 2022.
-
A Deep Learning Framework to Reconstruct Face under Mask
Authors:
Gourango Modak,
Shuvra Smaran Das,
Md. Ajharul Islam Miraj,
Md. Kishor Morol
Abstract:
While deep learning-based image reconstruction methods have shown significant success in removing objects from pictures, they have yet to achieve acceptable results for attributing consistency to gender, ethnicity, expression, and other characteristics like the topological structure of the face. The purpose of this work is to extract the mask region from a masked image and rebuild the area that has been detected. This problem is complex because (i) it is difficult to determine the gender of a face hidden behind a mask, which causes the network to become confused and reconstruct a male face as a female one or vice versa; (ii) we may receive images from multiple angles, making it extremely difficult to maintain the actual shape and topological structure of the face and a natural image; and (iii) there are problems with various mask forms because, in some cases, the area of the mask cannot be anticipated precisely, so certain parts of the mask remain on the face after completion. To solve this complex task, we split the problem into three phases: landmark detection, object detection for the targeted mask area, and inpainting the addressed mask region. First, to address problem (i), we use gender classification, which detects the actual gender behind a mask, and then detect the landmarks of the masked facial image. Second, we identify the non-face item, i.e., the mask, and use the Mask R-CNN network to create the binary mask of the observed mask area. Third, we develop an inpainting network that uses anticipated landmarks to create realistic images. To segment the mask, this article uses Mask R-CNN and offers a binary segmentation map for identifying the mask area. Additionally, we generate the image utilizing landmarks as structural guidance through a GAN-based network. The studies presented in this paper use the FFHQ and CelebA datasets.
Submitted 23 March, 2022;
originally announced March 2022.
-
CONTaiNER: Few-Shot Named Entity Recognition via Contrastive Learning
Authors:
Sarkar Snigdha Sarathi Das,
Arzoo Katiyar,
Rebecca J. Passonneau,
Rui Zhang
Abstract:
Named Entity Recognition (NER) in the Few-Shot setting is imperative for entity tagging in low-resource domains. Existing approaches only learn class-specific semantic features and intermediate representations from source domains. This affects generalizability to unseen target domains, resulting in suboptimal performance. To this end, we present CONTaiNER, a novel contrastive learning technique that optimizes the inter-token distribution distance for Few-Shot NER. Instead of optimizing class-specific attributes, CONTaiNER optimizes a generalized objective of differentiating between token categories based on their Gaussian-distributed embeddings. This effectively alleviates overfitting issues originating from training domains. Our experiments in several traditional test domains (OntoNotes, CoNLL'03, WNUT '17, GUM) and a new large-scale Few-Shot NER dataset (Few-NERD) demonstrate that, on average, CONTaiNER outperforms previous methods by 3%-13% absolute F1 points while showing consistent performance trends, even in challenging scenarios where previous approaches could not achieve appreciable performance.
Submitted 28 March, 2022; v1 submitted 15 September, 2021;
originally announced September 2021.
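As an illustration of the "inter-token distribution distance" between Gaussian-distributed embeddings mentioned in the CONTaiNER abstract above, the following is a minimal sketch only. The abstract does not say which distance is used; the symmetrized Kullback-Leibler divergence between diagonal Gaussians shown here is an assumption, and the function names `kl_diag_gaussians` and `symmetric_distance` are hypothetical.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for two Gaussians with diagonal covariance.

    Each argument is a list with one entry per embedding dimension.
    The choice of KL divergence is an assumption made for illustration.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_distance(token_a, token_b):
    """Average of KL in both directions, so the distance is symmetric.

    Each token is a (mean, variance) pair of equal-length lists.
    """
    mu_a, var_a = token_a
    mu_b, var_b = token_b
    forward = kl_diag_gaussians(mu_a, var_a, mu_b, var_b)
    backward = kl_diag_gaussians(mu_b, var_b, mu_a, var_a)
    return 0.5 * (forward + backward)


# Two hypothetical 2-dimensional token embeddings (mean, variance).
token_x = ([0.0, 0.0], [1.0, 1.0])
token_y = ([1.0, 0.0], [1.0, 1.0])
print(symmetric_distance(token_x, token_y))
```

A contrastive objective in this spirit would push the distance down for tokens of the same category and up for tokens of different categories; the loss itself is not specified in the abstract and is not sketched here.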
-
V2W-BERT: A Framework for Effective Hierarchical Multiclass Classification of Software Vulnerabilities
Authors:
Siddhartha Shankar Das,
Edoardo Serra,
Mahantesh Halappanavar,
Alex Pothen,
Ehab Al-Shaer
Abstract:
Weaknesses in computer systems such as faults, bugs and errors in the architecture, design or implementation of software provide vulnerabilities that can be exploited by attackers to compromise the security of a system. Common Weakness Enumerations (CWE) are a hierarchically designed dictionary of software weaknesses that provide a means to understand software flaws, potential impact of their exploitation, and means to mitigate these flaws. Common Vulnerabilities and Exposures (CVE) are brief low-level descriptions that uniquely identify vulnerabilities in a specific product or protocol. Classifying or mapping of CVEs to CWEs provides a means to understand the impact and mitigate the vulnerabilities. Since manual mapping of CVEs is not a viable option, automated approaches are desirable but challenging.
We present a novel Transformer-based learning framework (V2W-BERT) in this paper. By using ideas from natural language processing, link prediction and transfer learning, our method outperforms previous approaches not only for CWE instances with abundant data to train, but also rare CWE classes with little or no data to train. Our approach also shows significant improvements in using historical data to predict links for future instances of CVEs, and therefore, provides a viable approach for practical applications. Using data from MITRE and National Vulnerability Database, we achieve up to 97% prediction accuracy for randomly partitioned data and up to 94% prediction accuracy in temporally partitioned data. We believe that our work will influence the design of better methods and training models, as well as applications to solve increasingly harder problems in cybersecurity.
Submitted 23 February, 2021;
originally announced February 2021.
-
A Survey on Deep Learning Based Point-Of-Interest (POI) Recommendations
Authors:
Md. Ashraful Islam,
Mir Mahathir Mohammad,
Sarkar Snigdha Sarathi Das,
Mohammed Eunus Ali
Abstract:
Location-based Social Networks (LBSNs) enable users to socialize with friends and acquaintances by sharing their check-ins, opinions, photos, and reviews. The huge volume of data generated from LBSNs opens up a new avenue of research that has given birth to a new sub-field of recommendation systems, known as Point-of-Interest (POI) recommendation. A POI recommendation technique essentially exploits users' historical check-ins and other multi-modal information, such as POI attributes and friendship networks, to recommend the next set of POIs suitable for a user. A plethora of earlier works focused on traditional machine learning techniques using hand-crafted features from the dataset. With the recent surge of deep learning research, we have witnessed a large variety of POI recommendation works utilizing different deep learning paradigms. These works vary widely in problem formulation, proposed techniques, datasets used, and features. To the best of our knowledge, this work is the first comprehensive survey of all major deep learning-based POI recommendation works. Our work categorizes and critically analyzes the recent POI recommendation works based on different deep learning paradigms and other relevant features. This review can be considered a cookbook for researchers or practitioners working in the area of POI recommendation.
Submitted 19 November, 2020;
originally announced November 2020.
-
BayesBeat: Reliable Atrial Fibrillation Detection from Noisy Photoplethysmography Data
Authors:
Sarkar Snigdha Sarathi Das,
Subangkar Karmaker Shanto,
Masum Rahman,
Md. Saiful Islam,
Atif Rahman,
Mohammad Mehedy Masud,
Mohammed Eunus Ali
Abstract:
Smartwatches or fitness trackers have garnered a lot of popularity as potential health tracking devices due to their affordable and longitudinal monitoring capabilities. To further widen their health tracking capabilities, in recent years researchers have started to look into the possibility of Atrial Fibrillation (AF) detection in real-time leveraging photoplethysmography (PPG) data, an inexpensive sensor widely available in almost all smartwatches. A significant challenge in AF detection from PPG signals comes from the inherent noise in the smartwatch PPG signals. In this paper, we propose a novel deep learning based approach, BayesBeat, that leverages the power of Bayesian deep learning to accurately infer AF risks from noisy PPG signals, and at the same time provides an uncertainty estimate of the prediction. Extensive experiments on two publicly available datasets reveal that our proposed method BayesBeat outperforms the existing state-of-the-art methods. Moreover, BayesBeat is substantially more efficient, having 40-200X fewer parameters than state-of-the-art baseline approaches, making it suitable for deployment in resource constrained wearable devices.
Submitted 16 September, 2022; v1 submitted 2 November, 2020;
originally announced November 2020.
-
Boosting House Price Predictions using Geo-Spatial Network Embedding
Authors:
Sarkar Snigdha Sarathi Das,
Mohammed Eunus Ali,
Yuan-Fang Li,
Yong-Bin Kang,
Timos Sellis
Abstract:
Real estate contributes significantly to all major economies around the world. In particular, house prices have a direct impact on stakeholders, ranging from house buyers to financing companies. Thus, a plethora of techniques have been developed for real estate price prediction. Most of the existing techniques rely on different house features to build a variety of prediction models to predict house prices. Perceiving the effect of spatial dependence on house prices, some later works focused on introducing spatial regression models for improving prediction performance. However, they fail to take into account the geo-spatial context of the neighborhood amenities such as how close a house is to a train station, or a highly-ranked school, or a shopping center. Such contextual information may play a vital role in users' interests in a house and thereby has a direct influence on its price. In this paper, we propose to leverage the concept of graph neural networks to capture the geo-spatial context of the neighborhood of a house. In particular, we present a novel method, the Geo-Spatial Network Embedding (GSNE), that learns the embeddings of houses and various types of Points of Interest (POIs) in the form of multipartite networks, where the houses and the POIs are represented as attributed nodes and the relationships between them as edges. Extensive experiments with a large number of regression techniques show that the embeddings produced by our proposed GSNE technique consistently and significantly improve the performance of the house price prediction task regardless of the downstream regression model.
Submitted 1 September, 2020;
originally announced September 2020.
-
Non Orthogonal Multiple Access with Orthogonal Time Frequency Space Signal Transmission
Authors:
Aritra Chatterjee,
Vivek Rangamgari,
Shashank Tiwari,
Suvra Sekhar Das
Abstract:
Orthogonal time frequency space (OTFS) is being pursued in recent times as a suitable wireless transmission technology for use in high mobility scenarios. In this work, we propose nonorthogonal multiple access (NOMA) based OTFS, which may be called a NOMA-OTFS system, and evaluate its performance from system-level and link-level perspectives. The challenge lies in the fact that OTFS transmission technology is known for its resilience to high mobility conditions, whereas NOMA is known to yield high spectral efficiency in low mobility scenarios in comparison to orthogonal multiple access (OMA). We present a minimum mean square error (MMSE)-successive interference cancellation (SIC) based receiver for NOMA-OTFS, for which we derive an expression for the symbol-wise post-processing SINR in order to evaluate the system sum spectral efficiency (SE). We develop power allocation schemes to maximize the sum SE in the high-mobility version of NOMA. We further design a realizable codeword level SIC (CWIC) receiver using LDPC codes along with MMSE equalization for evaluating the link-level performance of such a practical NOMA-OTFS system. The system-level and link-level performance of the proposed NOMA-OTFS system are compared against benchmark OMA-OTFS, OMA-orthogonal frequency division multiplexing (OMA-OFDM) and NOMA-OFDM schemes. From system-level performance evaluation, we observe that NOMA-OTFS provides higher sum SE than OMA-OTFS. When compared to NOMA-OFDM, we find that the outage SE of NOMA-OTFS is improved at the cost of a decrease in mean SE. Link-level results, in contrast, show that the developed CWIC based NOMA-OTFS receiver performs significantly better than NOMA-OFDM in terms of block error rate (BLER), goodput and throughput.
Submitted 31 May, 2020; v1 submitted 13 March, 2020;
originally announced March 2020.
-
OTFS: Interleaved OFDM with Block CP
Authors:
Vivek Rangamgari,
Shashank Tiwari,
Suvra Sekhar Das,
Subhas Chandra Mondal
Abstract:
Orthogonal time frequency space (OTFS) modulation is a recently proposed waveform for reliable communication in high-speed vehicular communication scenarios. It has better resilience to inter-carrier interference (ICI) than orthogonal frequency division multiplexing (OFDM). In this work, we describe OTFS as block-OFDM with a cyclic prefix and time interleaving. This interpretation helps one visualize OTFS in the light of OFDM and also helps in analyzing the gain obtained by OTFS over OFDM. Further, we compare the performance of OTFS with its contender, the 5G new radio (NR) OFDM configuration of variable subcarrier bandwidth (VSB-OFDM), while considering practical forward error correction codes and the 3GPP high-speed channel model. This provides a realistic performance comparison, which is highly desired for technology realization. Considering practical channel estimation, we find that OTFS outperforms VSB-OFDM with 5G NR parameters by about 5 dB. We also present results on the peak to average power ratio (PAPR) due to the specific pilot structure used in OTFS for channel estimation.
Submitted 28 January, 2020; v1 submitted 8 January, 2020;
originally announced January 2020.
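The peak to average power ratio (PAPR) mentioned in the abstract above is the ratio of the largest instantaneous sample power to the mean sample power, usually quoted in dB. The sketch below shows only this standard definition on a short block of complex baseband samples; the function name `papr_db` and the sample values are hypothetical, and nothing here reproduces the paper's pilot structure or results.

```python
import math


def papr_db(samples):
    """Return the PAPR of a block of complex samples, in dB.

    PAPR = max |x[n]|^2 / mean |x[n]|^2 over the block.
    """
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)


# A constant-envelope block has every sample at the same power.
constant_envelope = [1, 1j, -1, -1j]

# A block with all energy in one sample, e.g. a single strong pilot
# surrounded by zeros (illustrative values only).
single_pilot = [2, 0, 0, 0]

print(papr_db(constant_envelope))
print(papr_db(single_pilot))
```

The second block has a peak power four times its mean power, which is why a frame that concentrates energy in an isolated pilot sample tends to raise the PAPR of the transmitted signal.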
-
CCCNet: An Attention Based Deep Learning Framework for Categorized Crowd Counting
Authors:
Sarkar Snigdha Sarathi Das,
Syed Md. Mukit Rashid,
Mohammed Eunus Ali
Abstract:
The crowd counting problem, which counts the number of people in an image, has been extensively studied in recent years. In this paper, we introduce a new variant of the crowd counting problem, namely "Categorized Crowd Counting", that counts the number of people sitting and standing in a given image. Categorized crowd counting has many real-world applications such as crowd monitoring, customer service, and resource management. The major challenges in categorized crowd counting come from high occlusion, perspective distortion and the seemingly identical upper body posture of sitting and standing persons. Existing density map based approaches perform well to approximate a large crowd, but lose important local information necessary for categorization. On the other hand, traditional detection-based approaches perform poorly in occluded environments, especially when the crowd size gets bigger. Hence, to solve the categorized crowd counting problem, we develop a novel attention-based deep learning framework that addresses the above limitations. In particular, our approach works in three phases: i) we first generate basic detection based sitting and standing density maps to capture the local information; ii) then, we generate a crowd counting based density map as a global counting feature; iii) finally, we have a cross-branch segregating refinement phase that splits the crowd density map into final sitting and standing density maps using an attention mechanism. Extensive experiments show the efficacy of our approach in solving the categorized crowd counting problem.
Submitted 11 December, 2019;
originally announced December 2019.
-
Circularly Pulse Shaped Orthogonal Time Frequency Space Modulation
Authors:
Shashank Tiwari,
Suvra Sekhar Das
Abstract:
Orthogonal time-frequency space (OTFS) modulation is a recently proposed waveform for efficient data transfer in high-speed vehicular scenarios. The use of rectangular pulse shape in OTFS results in high out of band (OoB) radiation, which is undesirable for multi-user scenarios. In this work, we present a circular pulse shaping framework for OTFS for reducing the OoB. We also design a low complexity transmitter for such a system. We argue in favor of orthogonal transmission for low complexity transceiver structure. We establish that frequency-localized circulant Dirichlet pulse is one of the possible pulses having this desirable unitary property, which can reduce OoB radiation significantly (by around 50 dB) without any loss in BER. We also show that our proposed pulse shaped OTFS has a lower peak to average power ratio than the conventional OTFS system.
Submitted 23 October, 2019;
originally announced October 2019.
-
Low Complexity LMMSE Receiver for OTFS
Authors:
Shashank Tiwari,
Suvra Sekhar Das,
Vivek Rangamgari
Abstract:
Orthogonal time frequency space modulation is a two dimensional (2D) delay-Doppler domain waveform. It uses the inverse symplectic finite Fourier transform (ISFFT) to spread the signal in the time-frequency domain. To extract diversity gain from the 2D spread signal, advanced receivers are required. In this work, we investigate a low complexity linear minimum mean square error receiver that exploits the sparsity and quasi-banded structure of the matrices involved in the demodulation process, which results in log-linear complexity without any degradation in BER performance.
Submitted 3 October, 2019;
originally announced October 2019.
-
Coverage Analysis of 3-D Dense Cellular Networks with Realistic Propagation Conditions
Authors:
Aritra Chatterjee,
Suvra Sekhar Das
Abstract:
In recent times, the use of stochastic geometry has become a popular and important tool for performance analysis of next-generation dense small cell wireless networks. Usually, such networks are modeled using 2 dimensional spatial Poisson point processes (SPPP). Moreover, the distinctive effects of line-of-sight (LOS) and non-line-of-sight (NLOS) propagation are also not explicitly taken into account in such analysis. The aim of the current work is to bridge this gap by modeling the access point (AP) and user equipment (UE) locations by 3-dimensional SPPP and considering the realistic LOS/NLOS channel models (path loss and small scale fading) as reported in existing standards. The effect of UE density on downlink coverage probability has also been investigated. In this process, the probabilistic activity of APs has been analytically modeled as a function of AP and UE densities. The derived upper bound of coverage probability is found to be numerically simple as well as extremely tight in nature and thus can be used as a close approximation of the same.
Submitted 31 December, 2019; v1 submitted 15 April, 2019;
originally announced April 2019.
-
Spectral Efficiency Analysis in Presence of Correlated Gamma-Lognormal Desired and Interfering Signals
Authors:
Aritra Chatterjee,
Sandeep Mukherjee,
Suvra Sekhar Das
Abstract:
Spectral efficiency analysis in the presence of correlated interfering signals is very important in modern wireless networks, where there is aggressive frequency reuse with a dense deployment of access points. However, most works available in the literature either address the effect of correlated interfering signals or include interferer activity, but not both. Further, the available literature has addressed the effect of large-scale fading (shadowing and distance-dependent path loss) only, but has fallen short of including the composite effect of line-of-sight and non-line-of-sight small-scale fading. The correlation of desired signals with interfering signals due to shadowing has also not been considered in the existing literature. In this work, we present a comprehensive analytical signal to interference power ratio evaluation framework addressing all the above mentioned components of the model in a holistic manner. In this analysis, we extend and apply the Moment Generating Function-matching method to such systems so that the correlation and activity of lognormal random variables can be included with high accuracy. We compare the analytical results against extensive Monte-Carlo simulation based on realistic channel models for mmWave and sub-6 GHz in both indoor and outdoor scenarios. The performance of the model is reported in terms of mean and alpha-percentile outage spectral efficiency, Kullback-Leibler divergence, and Kolmogorov-Smirnov distance.
Submitted 13 March, 2020; v1 submitted 13 November, 2018;
originally announced November 2018.
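The abstract above compares analytical and simulated distributions using Kolmogorov-Smirnov distance and Kullback-Leibler divergence. As a rough illustration of those two metrics only (not the authors' evaluation code), the sketch below computes a two-sample KS distance from empirical CDFs and a discrete KL divergence over shared histogram bins. The function names `ks_distance` and `kl_divergence` are hypothetical, and the sample data would come from whatever analytical model and simulation are being compared.

```python
import bisect
import math


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # Both empirical CDFs are right-continuous step functions, so the largest
    # gap is attained at one of the pooled sample points.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def kl_divergence(p, q):
    """KL divergence D(p || q) in nats for two discrete distributions on the same bins.

    Assumes q[i] > 0 wherever p[i] > 0; bins with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Both values are zero when the two distributions coincide, and the KS distance is bounded by 1, which makes it convenient for comparing fits across scenarios.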
-
Design of Low Complexity GFDM Transceiver
Authors:
Shashank Tiwari,
Suvra Sekhar Das
Abstract:
In this work, we propose a novel low-complexity Generalised Frequency Division Multiplexing (GFDM) transceiver design. The GFDM modulation matrix is factorized into FFT matrices and a diagonal matrix to design a low-complexity GFDM transmitter. This factorization is also used to derive novel low-complexity self-interference equalizers based on Matched Filter (MF), Zero Forcing (ZF) and Minimum Mean Square Error (MMSE) criteria. A two-stage receiver is proposed for multipath fading channels, in which channel equalization is followed by our proposed low-complexity self-interference equalizers. Unlike other known low-complexity GFDM transceivers, our proposed transceiver attains low complexity for an arbitrary number of time and frequency slots. The complexity of our proposed transceiver is log-linear in the number of transmitted symbols, and it achieves 3 to 300 times lower complexity than existing structures without incurring any performance loss. Our proposed Unbiased-MMSE receiver outperforms our proposed ZF receiver without any significant increase in complexity, especially for a large number of time slots. In a nutshell, our proposed transceiver enables a low-complexity, flexible GFDM transceiver implementation.
Submitted 16 November, 2018; v1 submitted 12 November, 2018;
originally announced November 2018.
-
Multi-Objective Framework for Dynamic Optimization of OFDMA Cellular Systems
Authors:
Prabhu Chandhar,
Suvra Sekhar Das
Abstract:
Green cellular networking has become an important research area in recent years due to environmental and economic concerns. Switching off under-utilized BSs during off-peak traffic load conditions is a promising approach to reduce energy consumption in cellular networks. In practice, during initial cell planning, the BS locations and RAN parameters are optimized to meet basic system design requirements such as coverage, capacity, overlap, and QoS. As these metrics are tightly coupled with each other due to co-channel interference, switching off certain BSs may affect the system requirements. Therefore, identifying which subset of a large number of BSs should be put into sleep mode is a challenging dynamic optimization problem. In this work, we develop a multi-objective framework for dynamic optimization of OFDMA-based cellular systems. The objective is to identify the appropriate set of active sectors and RAN parameters that maximize coverage and area spectral efficiency while minimizing overlap and area power consumption, without violating the QoS requirements for a given traffic demand density. The objective functions and constraints are obtained using appropriate analytical models which capture the traffic characteristics, propagation characteristics (path loss, shadowing, and small-scale fading), and load conditions in neighbouring cells. A low-complexity evolutionary algorithm is used to identify the global Pareto-optimal solutions at a faster convergence rate. The inter-relationships between the system objectives are studied, and guidelines are provided to find an appropriate network configuration that provides the best achievable trade-offs. The results show that, using the proposed framework, significant energy savings can be achieved with low computational complexity while maintaining good trade-offs among the other objectives.
Submitted 5 April, 2016; v1 submitted 4 February, 2016;
originally announced February 2016.
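The abstract above refers to Pareto-optimal solutions over several competing objectives. As a minimal sketch of what Pareto optimality means (a brute-force filter, not the evolutionary algorithm used in the paper), the code below keeps only the non-dominated candidates. It assumes every objective is to be minimized, so maximized quantities such as coverage would be negated first; the names `dominates` and `pareto_front` and the example tuples are hypothetical.

```python
def dominates(u, v):
    """True if u is no worse than v in every objective and strictly better in at least one.

    All objectives are assumed to be minimized.
    """
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def pareto_front(points):
    """Return the candidates that no other candidate dominates (O(n^2) comparison)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative candidates as (overlap, area power consumption) pairs.
candidates = [(1, 2), (2, 1), (2, 2), (3, 3)]
front = pareto_front(candidates)
```

Here `(2, 2)` and `(3, 3)` are dominated by `(1, 2)`, while `(1, 2)` and `(2, 1)` trade one objective against the other and both remain on the front.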
-
Precoded GFDM System to Combat Inter Carrier Interference : Performance Analysis
Authors:
Shashank Tiwari,
Suvra Sekhar Das,
Kalyan Kumar Bandyopadhyay
Abstract:
The expected operating scenarios of 5G pose a great challenge to orthogonal frequency division multiplexing (OFDM), which has poor out-of-band (OoB) spectral properties, stringent synchronization requirements, and a large symbol duration. Generalized frequency division multiplexing (GFDM), which is the focus of this work, has been suggested in the literature as one of the possible solutions to meet 5G requirements. In this work, an analytical performance evaluation of the MMSE receiver for GFDM is presented. We also propose precoding techniques to enhance the performance of GFDM. A simplified expression for the SINR of the MMSE receiver for GFDM is derived using special properties of the GFDM modulation matrix, which are described in this work. This SINR is used to evaluate the BER performance. Precoding schemes are proposed to reduce the complexity of the GFDM-MMSE receiver without compromising performance. Block Inverse Discrete Fourier Transform (BIDFT) and Discrete Fourier Transform (DFT) based precoding schemes are found to outperform the GFDM-MMSE receiver due to frequency diversity gain, while having complexity similar to that of the zero-forcing receiver for GFDM. It is shown that both BIDFT and DFT-based precoding schemes reduce the peak-to-average power ratio (PAPR) significantly. The computational complexity of different transmitters and receivers of precoded and uncoded GFDM is also presented.
Submitted 13 June, 2015; v1 submitted 11 June, 2015;
originally announced June 2015.
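The abstract above reports a reduction in peak-to-average power ratio (PAPR). For readers unfamiliar with the metric, the sketch below computes PAPR in dB for a block of complex baseband samples using the standard definition (peak instantaneous power divided by mean power). It does not implement GFDM or the precoding schemes; the function name `papr_db` is hypothetical, and the input is assumed to contain at least one non-zero sample.

```python
import math


def papr_db(samples):
    """Peak-to-average power ratio of a block of complex samples, in dB.

    Assumes at least one sample is non-zero (otherwise the mean power is 0).
    """
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    return 10 * math.log10(max(powers) / mean_power)


# A constant-envelope block has PAPR 0 dB; a single spike among zeros is the
# opposite extreme for a block of the same length.
flat = papr_db([1, -1, 1j, -1j])
spiky = papr_db([2, 0, 0, 0])
```

A multicarrier waveform typically sits between these extremes, which is why schemes that lower PAPR ease the linearity requirements on the power amplifier.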