
Aligning Text, Images, and 3D Structure Token-by-Token

Authors: Aadarsh Sahoo, Vansh Tibrewal, Georgia Gkioxari

Abstract: Creating machines capable of understanding the world in 3D is essential for assisting designers who build and edit 3D environments and robots that navigate and interact within a three-dimensional space. Inspired by advances in language and image modeling, we investigate the potential of autoregressive models for a new modality: structured 3D scenes. To this end, we propose a unified LLM framework that aligns language, images, and 3D scenes and provide a detailed "cookbook" outlining critical design choices for achieving optimal training and performance, addressing key questions related to data representation, modality-specific objectives, and more. We evaluate performance across four core 3D tasks (rendering, recognition, instruction-following, and question-answering) and four 3D datasets, synthetic and real-world. We extend our approach to reconstruct complex 3D object shapes by enriching our 3D modality with quantized shape encodings, and show our model's effectiveness on real-world 3D object recognition tasks. Project webpage: https://glab-caltech.github.io/kyvo/

Submitted 9 June, 2025; originally announced June 2025.

Comments: Project webpage: https://glab-caltech.github.io/kyvo/
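To make "structured 3D scenes as a token modality" concrete, the sketch below serializes a list of objects into a flat token sequence by quantizing each continuous coordinate into one of a fixed number of bins. The object fields, the [-3, 3] coordinate range, the 64-bin resolution, and the token spellings (`<scene>`, `<pos_k>`, ...) are hypothetical choices for illustration, not the paper's actual vocabulary or scene representation.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    # Hypothetical object schema: categorical shape/color plus a 3D position.
    shape: str
    color: str
    position: tuple[float, float, float]


def quantize(value: float, lo: float, hi: float, num_bins: int) -> int:
    """Map a value in [lo, hi] to a bin index in [0, num_bins - 1]; out-of-range values are clipped."""
    clipped = min(max(value, lo), hi)
    index = int((clipped - lo) / (hi - lo) * num_bins)
    return min(index, num_bins - 1)  # value == hi would otherwise map to num_bins


def tokenize_scene(objects: list[SceneObject], lo: float = -3.0, hi: float = 3.0,
                   num_bins: int = 64) -> list[str]:
    """Serialize a scene into a flat token sequence an autoregressive model could consume."""
    tokens = ["<scene>"]
    for obj in objects:
        tokens += ["<obj>", obj.shape, obj.color]
        tokens += [f"<pos_{quantize(c, lo, hi, num_bins)}>" for c in obj.position]
        tokens.append("</obj>")
    tokens.append("</scene>")
    return tokens


print(tokenize_scene([SceneObject("cube", "red", (0.0, -3.0, 3.0))]))
# ['<scene>', '<obj>', 'cube', 'red', '<pos_32>', '<pos_0>', '<pos_63>', '</obj>', '</scene>']
```

With 64 bins over a 6-unit range, each bin spans 0.09375 units, so positions are recoverable only to that resolution; the bin count trades vocabulary size against spatial precision.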

arXiv:2506.02945 [pdf, ps, other]

Quantitative LLM Judges

Authors: Aishwarya Sahoo, Jeevana Kruthi Karnuthala, Tushar Parmanand Budhwani, Pranchal Agarwal, Sankaran Vaidyanathan, Alexa Siu, Franck Dernoncourt, Jennifer Healey, Nedim Lipka, Ryan Rossi, Uttaran Bhattacharya, Branislav Kveton

Abstract: LLM-as-a-judge is a framework in which a large language model (LLM) automatically evaluates the output of another LLM. We propose quantitative LLM judges, which align evaluation scores of existing LLM judges to human scores in a given domain using regression models. The models are trained to improve the score of the original judge by using the judge's textual evaluation and score. We present four quantitative judges for different types of absolute and relative feedback, which showcases the generality and versatility of our framework. Our framework is more computationally efficient than supervised fine-tuning and can be more statistically efficient when human feedback is limited, which is expected in most applications of our work. We validate these claims empirically on four datasets using two base judges. Our experiments show that quantitative judges can effectively improve the predictive power of existing judges through post-hoc modeling.

Submitted 3 June, 2025; originally announced June 2025.
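The core idea, a post-hoc regression that maps a base judge's output onto the human score scale, can be illustrated in its simplest form. The sketch below fits ordinary least squares on the judge's numeric score alone; the paper's models also use the judge's textual evaluation, which this deliberately omits. The function names are hypothetical.

```python
def fit_linear_calibration(judge_scores: list[float], human_scores: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of human ~ intercept + slope * judge; returns (intercept, slope)."""
    n = len(judge_scores)
    if n < 2 or n != len(human_scores):
        raise ValueError("need at least two paired scores")
    mean_j = sum(judge_scores) / n
    mean_h = sum(human_scores) / n
    cov = sum((j - mean_j) * (h - mean_h) for j, h in zip(judge_scores, human_scores))
    var = sum((j - mean_j) ** 2 for j in judge_scores)
    if var == 0:
        raise ValueError("judge scores are constant; slope is undefined")
    slope = cov / var
    return mean_h - slope * mean_j, slope


def calibrate(judge_score: float, intercept: float, slope: float) -> float:
    """Map a raw judge score to the predicted human score."""
    return intercept + slope * judge_score


intercept, slope = fit_linear_calibration([2, 4, 6, 8], [2, 3, 4, 5])
print(intercept, slope, calibrate(10, intercept, slope))  # 1.0 0.5 6.0
```

Because only two parameters are fitted here, this is one intuition for why post-hoc calibration can need far fewer human labels than fine-tuning the judge itself.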

arXiv:2505.24763 [pdf, ps, other]

Detecting Airborne Objects with 5G NR Radars

Authors: Steve Blandino, Nada Golmie, Anirudha Sahoo, Thao Nguyen, Tanguy Ropitault, David Griffith, Amala Sonny

Abstract: The integration of sensing capabilities into 5G New Radio (5G NR) networks offers an opportunity to enable the detection of airborne objects without the need for dedicated radars. This paper investigates the feasibility of using standardized Positioning Reference Signals (PRS) to detect UAVs in Urban Micro (UMi) and Urban Macro (UMa) propagation environments. A full 5G NR radar processing chain is implemented, including clutter suppression, angle and range estimation, and 3D position reconstruction. Simulation results show that performance strongly depends on the propagation environment. 5G NR radars exhibit the highest missed detection rate, up to 16%, in UMi, due to severe clutter. Positioning error increases with target distance, resulting in larger errors in UMa scenarios and at higher UAV altitudes. In particular, the system achieves a position error within 4m in the UMi environment and within 8m in UMa. The simulation platform has been released as open-source software to support reproducible research in integrated sensing and communication (ISAC) systems.

Submitted 30 May, 2025; originally announced May 2025.
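Two steps of the processing chain mentioned above, clutter suppression and range estimation, can be illustrated on toy data. The sketch assumes a monostatic geometry (so range = c·τ/2), a matrix of complex samples indexed by pulse and range bin, and a fixed delay per range bin. It suppresses static clutter by subtracting each range bin's mean over pulses, a common moving-target-indication approach that may differ from the paper's method; PRS waveform generation, angle estimation, and 3D reconstruction are not shown.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def remove_static_clutter(frames: list[list[complex]]) -> list[list[complex]]:
    """Subtract each range bin's mean over pulses; stationary reflectors (buildings, ground) cancel out."""
    num_pulses, num_bins = len(frames), len(frames[0])
    means = [sum(frames[p][r] for p in range(num_pulses)) / num_pulses for r in range(num_bins)]
    return [[frames[p][r] - means[r] for r in range(num_bins)] for p in range(num_pulses)]


def estimate_range(frames: list[list[complex]], bin_delay_s: float) -> tuple[int, float]:
    """Return (strongest moving range bin, its range in metres) after clutter removal."""
    residual = remove_static_clutter(frames)
    energy = [sum(abs(row[r]) ** 2 for row in residual) for r in range(len(residual[0]))]
    best_bin = max(range(len(energy)), key=energy.__getitem__)
    round_trip_delay = best_bin * bin_delay_s
    return best_bin, SPEED_OF_LIGHT * round_trip_delay / 2  # halve: signal travels out and back


# Bin 0: strong static clutter; bin 2: a target whose phase flips between pulses.
frames = [[10 + 0j, 0j, 1 + 0j], [10 + 0j, 0j, -1 + 0j]] * 2
print(estimate_range(frames, bin_delay_s=1e-7))  # bin 2, about 30 m
```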

arXiv:2505.05755 [pdf, other]

Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions

Authors: Dhruvesh Patel, Aishwarya Sahoo, Avinash Amballa, Tahira Naseem, Tim G. J. Rudner, Andrew McCallum

Abstract: Autoregressive models (ARMs), which predict subsequent tokens one-by-one "from left to right," have achieved significant success across a wide range of sequence generation tasks. However, they struggle to accurately represent sequences that require satisfying sophisticated constraints or whose sequential dependencies are better addressed by out-of-order generation. Masked Diffusion Models (MDMs) address some of these limitations, but the process of unmasking multiple tokens simultaneously in MDMs can introduce incoherences, and MDMs cannot handle arbitrary infilling constraints when the number of tokens to be filled in is not known in advance. In this work, we introduce Insertion Language Models (ILMs), which learn to insert tokens at arbitrary positions in a sequence; that is, they jointly select both the position and the vocabulary element to be inserted. By inserting tokens one at a time, ILMs can represent strong dependencies between tokens, and their ability to generate sequences in arbitrary order allows them to accurately model sequences where token dependencies do not follow a left-to-right sequential structure. To train ILMs, we propose a tailored network parameterization and use a simple denoising objective. Our empirical evaluation demonstrates that ILMs outperform both ARMs and MDMs on common planning tasks. Furthermore, we show that ILMs outperform MDMs and perform on par with ARMs in an unconditional text generation task while offering greater flexibility than MDMs in arbitrary-length text infilling.

Submitted 15 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

Comments: Corrected a typo in author names
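The decoding loop implied by "jointly select both the position and the vocabulary element" can be sketched as a greedy search over all (slot, token) pairs. In the sketch, `score_fn` is a hypothetical stand-in for the trained ILM network, stopping when no insertion has a positive score is an assumed stopping rule, and the toy scorer exists only to show out-of-order generation; none of these names come from the paper.

```python
from typing import Callable, Sequence

ScoreFn = Callable[[list[str], int, str], float]


def greedy_insertion_decode(score_fn: ScoreFn, vocab: Sequence[str], max_steps: int) -> list[str]:
    """Repeatedly insert the (position, token) pair with the highest score; stop when no score is positive."""
    seq: list[str] = []
    for _ in range(max_steps):
        # Jointly score every slot (0..len(seq)) and every vocabulary item.
        best_score, best_pos, best_tok = max(
            (score_fn(seq, pos, tok), pos, tok) for pos in range(len(seq) + 1) for tok in vocab
        )
        if best_score <= 0:  # acts as an implicit "stop" decision
            break
        seq.insert(best_pos, best_tok)
    return seq


def is_subsequence(seq: list[str], target: list[str]) -> bool:
    it = iter(target)
    return all(tok in it for tok in seq)  # each `in` consumes the iterator up to the match


def make_toy_scorer(target: list[str], priority: dict[str, float]) -> ScoreFn:
    """Hypothetical stand-in for a trained network: rewards insertions consistent with `target`."""
    def score(seq: list[str], pos: int, tok: str) -> float:
        candidate = seq[:pos] + [tok] + seq[pos:]
        if tok in seq or not is_subsequence(candidate, target):
            return 0.0
        return priority[tok]
    return score


scorer = make_toy_scorer(["the", "cat", "sat"], {"cat": 3.0, "sat": 2.0, "the": 1.0})
print(greedy_insertion_decode(scorer, ["the", "cat", "sat"], max_steps=10))
# ['the', 'cat', 'sat'], built in the order cat, sat, the
```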

arXiv:2503.05718 [pdf, other]

zScore: A Universal Decentralised Reputation System for the Blockchain Economy

Authors: Himanshu Udupi, Ashutosh Sahoo, Akshay S. P., Gurukiran S., Parag Paul, Petrus C. Martens

Abstract: Modern society functions on trust. The onchain economy, however, is built on the founding principles of trustless peer-to-peer interactions in an adversarial environment without a centralised body of trust and needs a verifiable system to quantify credibility to minimise bad economic activity. We provide a robust framework titled zScore, a core primitive for reputation derived from a wallet's onchain behaviour using state-of-the-art AI neural network models combined with real-world credentials ported onchain through zkTLS. The initial results tested on retroactive data from lending protocols establish a strong correlation between a good zScore and healthy borrowing and repayment behaviour, making it a robust and decentralised proxy for creditworthiness; we highlight significant improvements over previous attempts by protocols like Cred, showcasing its robustness. We also present a list of possible applications of our system in Section 5, thereby establishing its utility in rewarding actual value creation while filtering noise and suspicious activity and flagging malicious behaviour by bad actors.

Submitted 17 February, 2025; originally announced March 2025.

ACM Class: K.4.4; I.2.11; C.2.4; K.4.2; H.3.5

arXiv:2502.03086 [pdf, other]

Implementing Large Quantum Boltzmann Machines as Generative AI Models for Dataset Balancing

Authors: Salvatore Sinno, Markus Bertl, Arati Sahoo, Bhavika Bhalgamiya, Thomas Groß, Nicholas Chancellor

Abstract: This study explores the implementation of large Quantum Restricted Boltzmann Machines (QRBMs), a key advancement in Quantum Machine Learning (QML), as generative models on D-Wave's Pegasus quantum hardware to address dataset imbalance in Intrusion Detection Systems (IDS). By leveraging Pegasus's enhanced connectivity and computational capabilities, a QRBM with 120 visible and 120 hidden units was successfully embedded, surpassing the limitations of default embedding tools. The QRBM synthesized over 1.6 million attack samples, achieving a balanced dataset of over 4.2 million records. Comparative evaluations with traditional balancing methods, such as SMOTE and RandomOversampler, revealed that QRBMs produced higher-quality synthetic samples, significantly improving detection rates, precision, recall, and F1 score across diverse classifiers. The study underscores the scalability and efficiency of QRBMs, completing balancing tasks in milliseconds. These findings highlight the transformative potential of QML and QRBMs as next-generation tools in data preprocessing, offering robust solutions for complex computational challenges in modern information systems.

Submitted 5 February, 2025; originally announced February 2025.

Comments: Accepted at IEEE International Conference on Next Generation Information System Engineering

arXiv:2501.11538 [pdf, other]

DenoMAE: A Multimodal Autoencoder for Denoising Modulation Signals

Authors: Atik Faysal, Taha Boushine, Mohammad Rostami, Reihaneh Gh. Roshan, Huaxia Wang, Nikhil Muralidhar, Avimanyu Sahoo, Yu-Dong Yao

Abstract: We propose Denoising Masked Autoencoder (DenoMAE), a novel multimodal autoencoder framework for denoising modulation signals during pretraining. DenoMAE extends the concept of masked autoencoders by incorporating multiple input modalities, including noise as an explicit modality, to enhance cross-modal learning and improve denoising performance. The network is pre-trained using unlabeled noisy modulation signals and constellation diagrams, effectively learning to reconstruct their equivalent noiseless signals and diagrams. DenoMAE achieves state-of-the-art accuracy in automatic modulation classification tasks with significantly fewer training samples, demonstrating a 10% reduction in unlabeled pretraining data and a 3% reduction in labeled fine-tuning data compared to existing approaches. Moreover, our model exhibits robust performance across varying signal-to-noise ratios (SNRs) and supports extrapolation on unseen lower SNRs. The results indicate that DenoMAE is an efficient, flexible, and data-efficient solution for denoising and classifying modulation signals in challenging noise-intensive environments.

Submitted 20 January, 2025; originally announced January 2025.
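The masked-autoencoder mechanics that DenoMAE builds on (split each modality into patches, hide most of them, and score reconstruction only on the hidden ones) can be sketched in plain Python. The patch size, the 75% mask ratio, and the function names are illustrative assumptions; the encoder/decoder networks, the constellation-diagram modality, and the denoising target (reconstructing the noiseless signal rather than the noisy input) are not shown.

```python
import random


def patchify(signal: list[float], patch_size: int) -> list[list[float]]:
    """Split a 1-D signal into non-overlapping patches (length must be a multiple of patch_size)."""
    if len(signal) % patch_size != 0:
        raise ValueError("signal length must be a multiple of patch_size")
    return [signal[i:i + patch_size] for i in range(0, len(signal), patch_size)]


def random_mask(patches: list[list[float]], mask_ratio: float, rng: random.Random):
    """Hide a random subset of patches; return (visible patches, sorted masked indices)."""
    num_masked = round(len(patches) * mask_ratio)
    masked = sorted(rng.sample(range(len(patches)), num_masked))
    masked_set = set(masked)
    visible = [p for i, p in enumerate(patches) if i not in masked_set]
    return visible, masked


def masked_mse(predicted: list[list[float]], target: list[list[float]], masked_indices: list[int]) -> float:
    """Reconstruction loss computed only on masked patches, as in standard MAE training."""
    errors = [(p - t) ** 2 for i in masked_indices for p, t in zip(predicted[i], target[i])]
    if not errors:
        return 0.0
    return sum(errors) / len(errors)


# Each modality (noisy signal, constellation image, noise, ...) would be masked independently.
rng = random.Random(0)
noisy_signal = [0.1 * k for k in range(16)]
visible, masked = random_mask(patchify(noisy_signal, 4), mask_ratio=0.75, rng=rng)
print(len(visible), masked)  # 1 visible patch, 3 masked indices
```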

arXiv:2501.09051 [pdf, other]

Polyp detection in colonoscopy images using YOLOv11

Authors: Alok Ranjan Sahoo, Satya Sangram Sahoo, Pavan Chakraborty

Abstract: Colorectal cancer (CRC) is one of the most commonly diagnosed cancers worldwide. It starts as a polyp in the inner lining of the colon, so early polyp detection is required to prevent CRC. Colonoscopy is used to inspect the colon; the images taken by the camera at the tip of the endoscope are generally analyzed manually by experts. With the rise of machine learning, various traditional machine learning models have been applied. Recently, deep learning models have proven more effective at polyp detection because they generalize better and can learn small features. Deep learning object detectors fall into two types: single-stage and two-stage. Two-stage models generally have higher accuracy, but single-stage models have lower inference time, which makes them well suited for quick object detection. YOLO is a single-stage model that has been used successfully for polyp detection, and it has drawn researchers' attention because of its low inference time. Researchers have used different versions of YOLO so far, and accuracy has increased with each newer version. This paper evaluates the effectiveness of the recently released YOLOv11 for polyp detection. We analyzed the performance of all five YOLOv11 models (YOLO11n, YOLO11s, YOLO11m, YOLO11l, YOLO11x), using the Kvasir dataset for training and testing. Two versions of the dataset were used: the original dataset and a second version created using augmentation techniques. The performance of all models on both versions of the dataset was analyzed.

Submitted 15 January, 2025; originally announced January 2025.

arXiv:2412.10529 [pdf, other]

Solving the Inverse Alignment Problem for Efficient RLHF

Authors: Shambhavi Krishna, Aishwarya Sahoo

Abstract: Collecting high-quality preference datasets for reinforcement learning from human feedback (RLHF) is resource-intensive and challenging. As a result, researchers often train reward models on extensive offline datasets which aggregate diverse generation sources and scoring/alignment policies. We hypothesize that this aggregation has an averaging effect on reward model scores, which limits signal and impairs the alignment process. Inspired by the field of inverse RL, we define the 'inverse alignment problem' in language model training, where our objective is to optimize the critic's reward for a fixed actor and a fixed offline preference dataset. We hypothesize that solving the inverse alignment problem will improve reward model quality by providing clearer feedback on the policy's current behavior. To that end, we investigate whether repeatedly fine-tuning a reward model on subsets of the offline preference dataset aligned with a periodically frozen policy during RLHF improves upon vanilla RLHF. Our empirical results demonstrate that this approach facilitates superior alignment and faster convergence compared to using an unaligned or out-of-distribution reward model relative to the LLM policy.

Submitted 13 December, 2024; originally announced December 2024.
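The training schedule described above can be outlined as a loop that periodically freezes the policy, picks the offline preference pairs closest to that frozen policy, and fine-tunes the reward model on them before continuing RL. How "aligned with the policy" is measured is an assumption here (mean log-likelihood of both responses under the frozen policy), and `snapshot_policy`, `finetune_reward_model`, and `rl_step` are hypothetical callbacks standing in for real training code.

```python
from typing import Callable

Pair = tuple[str, str, str]            # (prompt, chosen response, rejected response)
LogProb = Callable[[str, str], float]  # log p_policy(response | prompt)


def select_aligned_subset(pairs: list[Pair], policy_logprob: LogProb, k: int) -> list[Pair]:
    """Keep the k preference pairs whose responses are most likely under the frozen policy."""
    def on_policy_score(pair: Pair) -> float:
        prompt, chosen, rejected = pair
        return (policy_logprob(prompt, chosen) + policy_logprob(prompt, rejected)) / 2
    return sorted(pairs, key=on_policy_score, reverse=True)[:k]


def rlhf_with_reward_refresh(num_steps: int, refresh_every: int, pairs: list[Pair], k: int,
                             snapshot_policy: Callable[[], LogProb],
                             finetune_reward_model: Callable[[list[Pair]], None],
                             rl_step: Callable[[], None]) -> int:
    """Run RL steps, periodically re-aligning the reward model to a frozen policy snapshot."""
    refreshes = 0
    for step in range(num_steps):
        if step % refresh_every == 0:
            frozen_logprob = snapshot_policy()          # freeze the current actor
            subset = select_aligned_subset(pairs, frozen_logprob, k)
            finetune_reward_model(subset)               # update the critic on policy-aligned data
            refreshes += 1
        rl_step()                                       # e.g. one policy-gradient update
    return refreshes
```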

arXiv:2411.06263 [pdf, other]

Federated Split Learning for Human Activity Recognition with Differential Privacy

Authors: Josue Ndeko, Shaba Shaon, Aubrey Beal, Avimanyu Sahoo, Dinh C. Nguyen

Abstract: This paper proposes a novel intelligent human activity recognition (HAR) framework based on a new design of Federated Split Learning (FSL) with Differential Privacy (DP) over edge networks. Our FSL-DP framework leverages both accelerometer and gyroscope data, achieving significant improvements in HAR accuracy. The evaluation includes a detailed comparison between traditional Federated Learning (FL) and our FSL framework, showing that the FSL framework outperforms FL models in both accuracy and loss metrics. Additionally, we examine the privacy-performance trade-off under different data settings in the DP mechanism, highlighting the balance between privacy guarantees and model accuracy. The results also indicate that our FSL framework achieves faster communication times per training round compared to traditional FL, further emphasizing its efficiency and effectiveness. This work provides valuable insight and a novel framework which was tested on a real-life dataset.

Submitted 9 November, 2024; originally announced November 2024.

Comments: Accepted to IEEE Consumer Communications and Networking Conference (CCNC), 6 pages
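As a rough illustration of where differential privacy enters a federated pipeline, the sketch below clips each client's update vector to a maximum L2 norm, adds Gaussian noise scaled to that norm (the Gaussian mechanism used in DP-SGD-style training), and averages the noisy updates on the server. Applying the noise to client updates, and the parameter names `max_norm` and `noise_multiplier`, are assumptions; the paper's split between client-side and server-side layers, and its privacy accounting, are not reproduced.

```python
import math
import random


def clip_l2(update: list[float], max_norm: float) -> list[float]:
    """Scale the update down so its L2 norm is at most max_norm (bounds each client's sensitivity)."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def privatize(update: list[float], max_norm: float, noise_multiplier: float,
              rng: random.Random) -> list[float]:
    """Gaussian mechanism: clip, then add N(0, (noise_multiplier * max_norm)^2) noise per coordinate."""
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clip_l2(update, max_norm)]


def federated_average(updates: list[list[float]]) -> list[float]:
    """Server-side FedAvg with equal client weights."""
    return [sum(column) / len(updates) for column in zip(*updates)]


rng = random.Random(42)
client_updates = [[3.0, 4.0], [0.3, -0.4]]
noisy = [privatize(u, max_norm=1.0, noise_multiplier=1.1, rng=rng) for u in client_updates]
print(federated_average(noisy))
```

Raising `noise_multiplier` strengthens the privacy guarantee but adds more noise to the aggregate, which is the privacy-accuracy trade-off the abstract refers to.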

arXiv:2403.19825 [pdf, other]

Sensing Performance of the IEEE 802.11bf Protocol and Its Impact on Data Communication

Authors: Anirudha Sahoo, Tanguy Ropitault, Steve Blandino, Nada Golmie

Abstract: Wi-Fi sensing has been used to detect and track movements in an environment, resulting in the emergence of several innovative applications. Wi-Fi sensing can detect movement and locate objects by analyzing variations in the Wi-Fi signal due to its interaction with moving objects. Until recently, Wi-Fi sensing has been primarily available through proprietary solutions, which has limited its adoption. However, the recent initiative by the IEEE to develop the IEEE 802.11bf standard promises to make the adoption of Wi-Fi sensing widespread. Although Wi-Fi sensing procedures in communication standards can introduce overhead, there is currently a lack of literature exploring the sensing performance of the Wi-Fi sensing procedures specified in the IEEE 802.11bf standard and their impact on data communication. Therefore, this paper presents a comprehensive evaluation of the sensing performance of the IEEE 802.11bf protocol and its impact on data communication in different configurations. Our findings expose the limitations of specific configurations and pave the way for guidance on efficient operating configurations of an IEEE 802.11bf network.

Submitted 31 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.18456 [pdf, other]

Inverse kinematics learning of a continuum manipulator using limited real time data

Authors: Alok Ranjan Sahoo, Pavan Chakraborty

Abstract: Data-driven control of a continuum manipulator requires a lot of training data, but generating a sufficient amount of real-time data is not cost-efficient. Random actuation of the manipulator can also sometimes be unsafe. Meta-learning has been used successfully to adapt to new environments, so this paper addresses the above problem using meta-learning. We consider two cases. First, we propose a method that trains the model on simulation data using MAML (Model-Agnostic Meta-Learning) and then adapts it to the real world using gradient steps. Second, if a simulation model is unavailable or difficult to formulate, we propose a CGAN (Conditional Generative Adversarial Network)-MAML based method. The model is trained using a small amount of real-time data and augmented data for different loading conditions, and adaptation is then done in the real environment. Experiments show that the relative positioning error is below 3% in both cases. The proposed models are experimentally verified on a real continuum manipulator.

Submitted 27 March, 2024; originally announced March 2024.
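The sim-to-real recipe above (meta-train an initialization across many conditions, then adapt with a few gradient steps on the real system) can be shown on a deliberately tiny problem. The sketch uses first-order MAML, a common simplification of MAML that drops second-order terms, on scalar linear tasks y = slope · x, where each hypothetical slope stands in for a loading condition. The real inverse-kinematics model, the CGAN augmentation, and all hyperparameters are assumptions unrelated to the paper's setup.

```python
def loss_and_grad(w: float, xs: list[float], slope: float) -> tuple[float, float]:
    """Mean squared error of the model y = w * x against a task y = slope * x, and d(loss)/dw."""
    n = len(xs)
    loss = sum((w * x - slope * x) ** 2 for x in xs) / n
    grad = sum(2 * x * (w * x - slope * x) for x in xs) / n
    return loss, grad


def adapt(w: float, xs: list[float], slope: float, inner_lr: float, steps: int) -> float:
    """Inner loop: a few gradient steps on one task (e.g., the real robot)."""
    for _ in range(steps):
        _, g = loss_and_grad(w, xs, slope)
        w -= inner_lr * g
    return w


def fomaml(task_slopes: list[float], xs: list[float], inner_lr: float = 0.05,
           outer_lr: float = 0.05, meta_steps: int = 100) -> float:
    """First-order MAML: move the initialization using gradients taken at the adapted weights."""
    w = 0.0
    for _ in range(meta_steps):
        meta_grad = 0.0
        for slope in task_slopes:
            w_adapted = adapt(w, xs, slope, inner_lr, steps=1)
            # Toy simplification: the same xs serve as support and query set.
            _, g = loss_and_grad(w_adapted, xs, slope)
            meta_grad += g / len(task_slopes)
        w -= outer_lr * meta_grad
    return w


xs = [1.0, 2.0]
w_meta = fomaml(task_slopes=[1.0, 2.0, 3.0], xs=xs)   # "simulated" loading conditions
before, _ = loss_and_grad(w_meta, xs, slope=2.5)      # unseen "real" condition
after, _ = loss_and_grad(adapt(w_meta, xs, 2.5, 0.05, steps=5), xs, 2.5)
print(round(w_meta, 3), before > after)  # 2.0 True
```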

arXiv:2403.09819 [pdf, other]

An Admission Control Algorithm for Isochronous and Asynchronous Traffic in IEEE 802.11ad MAC

Authors: Anirudha Sahoo

Abstract: Due to the availability of a large amount of bandwidth in the 60 GHz band and support for contention-free channel access called Service Period (SP), the IEEE 802.11ad/ay Wi-Fi standard is well suited for low-latency and high-data-rate applications. IEEE 802.11ad supports two types of SP user traffic: isochronous and asynchronous. Both types of traffic need guaranteed SP duration before their respective deadlines, so admission control plays an important role in an IEEE 802.11ad system. In an earlier work, we studied admission control and scheduling of isochronous and asynchronous traffic in an IEEE 802.11ad system, but we assumed the asynchronous requests to be periodic to keep the algorithm simple. That assumption resulted in overallocation of resources and potential degradation of performance. In this paper, we present an admission control algorithm that does not make such an assumption, yet still maintains linear run-time complexity and allocates resources to the requests in a proportionally fair manner. We provide arguments to establish the correctness of the algorithm in terms of guaranteeing SP allocation to the requests before their respective deadlines.

Submitted 14 March, 2024; originally announced March 2024.
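To illustrate what "guaranteeing SP allocation before deadlines" means as an admission test, the sketch below admits a new request only if all admitted requests plus the candidate can still be served back-to-back in earliest-deadline-first order without any request finishing past its deadline. This is a generic deadline-feasibility check, not the paper's algorithm: it assumes a single channel, all requests available at time zero, and hypothetical microsecond-valued fields, and its sort makes each check O(n log n) rather than the linear complexity the paper reports. It also does not implement proportional-fair allocation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpRequest:
    duration_us: int   # Service Period time the request needs
    deadline_us: int   # time (from the start of the schedule) by which it must be served


def edf_feasible(requests: list[SpRequest]) -> bool:
    """Serve requests back-to-back in deadline order; feasible iff none finishes late."""
    finish = 0
    for req in sorted(requests, key=lambda r: r.deadline_us):
        finish += req.duration_us
        if finish > req.deadline_us:
            return False
    return True


def admit(admitted: list[SpRequest], candidate: SpRequest) -> bool:
    """Admit the candidate only if every already-admitted deadline is still met."""
    if edf_feasible(admitted + [candidate]):
        admitted.append(candidate)
        return True
    return False


admitted: list[SpRequest] = []
print(admit(admitted, SpRequest(300, 500)))  # True: finishes at 300 <= 500
print(admit(admitted, SpRequest(200, 600)))  # True: finishes at 500 <= 600
print(admit(admitted, SpRequest(200, 600)))  # False: would finish at 700 > 600
```

When all requests are available at time zero, serving them in deadline order minimizes the maximum lateness, so under these assumptions the check rejects a request only when no ordering could meet every deadline.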

arXiv:2402.18599 [pdf, other]

Meta-Task: A Method-Agnostic Framework for Learning to Regularize in Few-Shot Learning

Authors: Mohammad Rostami, Atik Faysal, Huaxia Wang, Avimanyu Sahoo

Abstract: Overfitting is a significant challenge in Few-Shot Learning (FSL), where models trained on small, variable datasets tend to memorize rather than generalize to unseen tasks. Regularization is crucial in FSL to prevent overfitting and enhance generalization performance. To address this issue, we introduce Meta-Task, a novel, method-agnostic framework that leverages both labeled and unlabeled data to enhance generalization through auxiliary tasks for regularization. Specifically, Meta-Task introduces a Task-Decoder, which is a simple example of the broader framework that refines hidden representations by reconstructing input images from embeddings, effectively mitigating overfitting. Our framework's method-agnostic design ensures its broad applicability across various FSL settings. We validate Meta-Task's effectiveness on standard benchmarks, including Mini-ImageNet, Tiered-ImageNet, and FC100, where it consistently improves existing state-of-the-art meta-learning techniques, demonstrating superior performance, faster convergence, reduced generalization error, and lower variance, all without extensive hyperparameter tuning. These results underline Meta-Task's practical applicability and efficiency in real-world, resource-constrained scenarios.

Submitted 26 February, 2025; v1 submitted 27 February, 2024; originally announced February 2024.
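The regularization idea, training the few-shot learner jointly with an auxiliary decoder that reconstructs the input from its embedding, amounts to adding a weighted reconstruction term to the task loss. The sketch below shows that combined objective on plain lists; the weight `recon_weight = 0.5`, the choice of mean-squared error, and the function names are assumptions, and the Task-Decoder network itself is not modeled. Because the reconstruction term needs no label, it can in principle also be computed on unlabeled images, which is how an auxiliary task can draw on both labeled and unlabeled data.

```python
import math


def cross_entropy(logits: list[float], label: int) -> float:
    """Softmax cross-entropy computed with the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error between an input and its reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def regularized_loss(logits: list[float], label: int, image: list[float],
                     reconstruction: list[float], recon_weight: float = 0.5) -> float:
    """Few-shot classification loss plus a weighted auxiliary reconstruction loss from a decoder."""
    return cross_entropy(logits, label) + recon_weight * mse(image, reconstruction)


print(regularized_loss([2.0, 0.5, -1.0], 0, [0.2, 0.8, 0.5], [0.25, 0.7, 0.5]))
```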

arXiv:2402.10026 [pdf, other]

Hybrid CNN Bi-LSTM neural network for Hyperspectral image classification

Authors: Alok Ranjan Sahoo, Pavan Chakraborty

Abstract: Hyperspectral images (HSIs) have drawn researchers' attention because they are complex to classify: the relation between the materials and the spectral information provided by an HSI is nonlinear. Deep learning methods have shown superiority over traditional machine learning methods in learning this nonlinearity. Using a 3-D CNN along with a 2-D CNN has shown great success in learning spatial and spectral features. However, this approach uses a comparatively large number of parameters, and it is not effective at learning inter-layer information. Hence, this paper proposes a neural network combining a 3-D CNN, a 2-D CNN, and a Bi-LSTM. The performance of this model has been tested on the Indian Pines (IP), University of Pavia (PU), and Salinas Scene (SA) datasets, and the results are compared with state-of-the-art deep learning-based models. This model performed better on all three datasets, achieving 99.83%, 99.98%, and 100% accuracy on the IP, PU, and SA datasets, respectively, using only 30% of the trainable parameters of the state-of-the-art model.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2401.12671 [pdf, other]

Context Matters: Pushing the Boundaries of Open-Ended Answer Generation with Graph-Structured Knowledge Context

Authors: Somnath Banerjee, Amruit Sahoo, Sayan Layek, Avik Dutta, Rima Hazra, Animesh Mukherjee

Abstract: In the continuously advancing AI landscape, crafting context-rich and meaningful responses via Large Language Models (LLMs) is essential. Researchers are becoming more aware of the challenges that LLMs with fewer parameters encounter when trying to provide suitable answers to open-ended questions. To address these hurdles, the integration of cutting-edge strategies, such as augmenting LLMs with rich external domain knowledge, offers significant improvements. This paper introduces a novel framework that combines graph-driven context retrieval with knowledge-graph-based enhancement, honing the proficiency of LLMs, especially on domain-specific community question answering platforms like AskUbuntu, Unix, and ServerFault. We conduct experiments on various LLMs with different parameter sizes to evaluate their ability to ground knowledge and determine factual accuracy in answers to open-ended questions. Our methodology, GraphContextGen, consistently outperforms dominant text-based retrieval systems, demonstrating its robustness and adaptability to a larger number of use cases. This advancement highlights the importance of pairing context-rich data retrieval with LLMs, offering a renewed approach to knowledge sourcing and generation in AI systems. We also show that, due to rich contextual data retrieval, the crucial entities, along with the generated answer, remain factually coherent with the gold answer.

Submitted 15 October, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: Accepted at EMNLP 2024
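One common way to turn a knowledge graph into LLM context is to collect the triples within a few hops of the entities mentioned in a question and serialize them into the prompt. The sketch below does that with a breadth-first search; the adjacency-list format, the hop limit `k`, the prompt template, and the example entities are hypothetical, and GraphContextGen's actual retrieval and ranking steps may differ.

```python
from collections import deque

Graph = dict[str, list[tuple[str, str]]]  # head -> [(relation, tail), ...]


def k_hop_triples(graph: Graph, seeds: list[str], k: int) -> list[tuple[str, str, str]]:
    """Breadth-first collection of (head, relation, tail) triples within k hops of the seed entities."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in dict.fromkeys(seeds))  # dedupe seeds, keep order
    triples = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond the hop limit
        for relation, tail in graph.get(node, []):
            triples.append((node, relation, tail))
            if tail not in seen:
                seen.add(tail)
                frontier.append((tail, depth + 1))
    return triples


def build_prompt(question: str, triples: list[tuple[str, str, str]]) -> str:
    """Serialize retrieved triples as plain-text facts ahead of the question."""
    facts = "\n".join(f"{h} --{r}--> {t}" for h, r, t in triples)
    return f"Facts:\n{facts}\n\nQuestion: {question}\nAnswer:"


graph = {
    "apt": [("installs", "package"), ("ships_with", "ubuntu")],
    "package": [("declares", "dependency")],
    "dependency": [("resolved_by", "solver")],
}
print(build_prompt("Why does apt refuse to install this package?",
                   k_hop_triples(graph, ["apt"], k=2)))
```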

arXiv:2312.05626 [pdf, other]

Redefining Developer Assistance: Through Large Language Models in Software Ecosystem

Authors: Somnath Banerjee, Avik Dutta, Sayan Layek, Amruit Sahoo, Sam Conrad Joyce, Rima Hazra

Abstract: In this paper, we delve into the advancement of domain-specific Large Language Models (LLMs) with a focus on their application in software development. We introduce DevAssistLlama, a model developed through instruction tuning, to assist developers in processing software-related natural language queries. This model, a variant of instruction-tuned LLM, is particularly adept at handling intricate technical documentation, enhancing developer capability in software-specific tasks. The creation of DevAssistLlama involved constructing an extensive instruction dataset from various software systems, enabling effective handling of Named Entity Recognition (NER), Relation Extraction (RE), and Link Prediction (LP). Our results demonstrate DevAssistLlama's superior capabilities in these tasks in comparison with other models, including ChatGPT. This research not only highlights the potential of specialized LLMs in software development but also introduces a pioneering LLM for this domain.

Submitted 15 March, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

Comments: Under review

arXiv:2310.16314 [pdf, other]

Understanding Code Semantics: An Evaluation of Transformer Models in Summarization

Authors: Debanjan Mondal, Abhilasha Lodha, Ankita Sahoo, Beena Kumari

Abstract: This paper delves into the intricacies of code summarization using advanced transformer-based language models. Through empirical studies, we evaluate the efficacy of code summarization by altering function and variable names to explore whether models truly understand code semantics or merely rely on textual cues. We also introduce adversaries such as dead code and commented code across three programming languages (Python, JavaScript, and Java) to further scrutinize the models' understanding. Ultimately, our research aims to offer valuable insights into the inner workings of transformer-based LMs, enhancing their ability to understand code and contributing to more efficient software development practices and maintenance workflows.

Submitted 26 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted at GenBench, EMNLP 2023. All authors are co-first authors and have equal contributions
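The identifier-renaming and dead-code perturbations described above can be illustrated for Python source with the standard-library `ast` module. This is a minimal sketch under stated assumptions, not the authors' tooling: the class `RenameIdentifiers`, the `perturb` helper, the `v0, v1, v2` placeholder scheme, and the `if False: pass` dead-code snippet are all hypothetical choices. It needs Python 3.9+ for `ast.unparse`.

```python
import ast


class RenameIdentifiers(ast.NodeTransformer):
    """Hypothetical perturbation: replace function, parameter, and variable
    names with opaque placeholders so that naming no longer carries meaning."""

    def __init__(self):
        self.mapping = {}

    def _placeholder(self, old):
        # Reuse the same placeholder for every occurrence of a given name.
        if old not in self.mapping:
            self.mapping[old] = f"v{len(self.mapping)}"
        return self.mapping[old]

    def visit_FunctionDef(self, node):
        # Rename the function and its positional parameters (only those, in
        # this sketch) before visiting the body, so uses in the body match.
        node.name = self._placeholder(node.name)
        for arg in node.args.args:
            arg.arg = self._placeholder(arg.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        # Rename assignment targets and later reads of renamed names; other
        # loads (builtins such as len, imported modules) are left untouched.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._placeholder(node.id)
        return node


def perturb(source, insert_dead_code=False):
    """Return a semantically equivalent, renamed (optionally padded) copy of source."""
    tree = RenameIdentifiers().visit(ast.parse(source))
    if insert_dead_code:
        functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
        for fn in functions:
            # Unreachable branch: changes the text but not the behaviour.
            fn.body.insert(0, ast.parse("if False:\n    pass").body[0])
    return ast.unparse(ast.fix_missing_locations(tree))


print(perturb("def add(a, b):\n    total = a + b\n    return total\n", insert_dead_code=True))
```

Feeding a summarization model both the original and the perturbed function is one way to check whether its summary depends on the names rather than on the logic.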

arXiv:2310.14239 [pdf, other]

Guidance system for Visually Impaired Persons using Deep Learning and Optical flow

Authors: Shwetang Dubey, Alok Ranjan Sahoo, Pavan Chakraborty

Abstract: Visually impaired persons find it difficult to know about their surroundings while walking on a road. Walking sticks can only give them information about obstacles in the stick's proximity, and they are mostly effective in static or very slow-paced environments. Hence, this paper introduces a method to guide them in a busy street. To create such a system, it is very important to know about the approaching object and its direction of approach. To achieve this objective, we created a method in which the image frame received from the video is divided into three parts, i.e. center, left, and right, to determine the direction of the approaching object. Object detection is done using YOLOv3. The Lucas-Kanade method is used for optical flow estimation, and Depth-net is used for depth estimation. Using the depth information, object motion trajectory, and object category information, the model provides the necessary information or warnings to the person. This model has been tested in the real world to show its effectiveness.

Submitted 22 October, 2023; originally announced October 2023.
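The left/center/right partition and the warning step can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: it assumes a detector (YOLOv3 in the paper) has already produced a bounding box and a depth estimator has produced the object's depth in two consecutive frames. The equal-thirds split, the `min_drop` threshold, and all names here are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical format)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def region_of(box: Box, frame_width: float) -> str:
    """Assign a box to the left, center, or right third of the frame by its horizontal center."""
    cx = (box.x_min + box.x_max) / 2
    third = frame_width / 3
    if cx < third:
        return "left"
    if cx < 2 * third:
        return "center"
    return "right"


def warning(label: str, box: Box, prev_depth: float, curr_depth: float,
            frame_width: float, min_drop: float = 0.2) -> Optional[str]:
    """Warn only if the estimated depth shrank by more than min_drop (same units as depth)."""
    if prev_depth - curr_depth <= min_drop:
        return None  # static or receding object: no warning
    side = {"left": "from the left", "center": "ahead", "right": "from the right"}
    return f"{label} approaching {side[region_of(box, frame_width)]}"


print(warning("car", Box(500, 100, 600, 200), prev_depth=5.0, curr_depth=4.0, frame_width=640))
# prints: car approaching from the right
```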

arXiv:2310.13085 [pdf, other]

Unsupervised Representation Learning to Aid Semi-Supervised Meta Learning

Authors: Atik Faysal, Mohammad Rostami, Huaxia Wang, Avimanyu Sahoo, Ryan Antle

Abstract: Few-shot learning or meta-learning addresses the data scarcity problem in machine learning. Traditionally, supervised learning requires a multitude of training samples and labels. To address this issue, we propose a one-shot unsupervised meta-learning method to learn the latent representation of the training samples. We use augmented samples as the query set during the training phase of the unsupervised meta-learning. A temperature-scaled cross-entropy loss is used in the inner loop of meta-learning to prevent overfitting during unsupervised learning. The learned parameters from this step are applied to the targeted supervised meta-learning in a transfer-learning fashion for initialization and fast adaptation with improved accuracy. The proposed method is model-agnostic and can aid any meta-learning model to improve accuracy. We use model-agnostic meta-learning (MAML) and relation network (RN) on the Omniglot and mini-ImageNet datasets to demonstrate the performance of the proposed method. Furthermore, a meta-learning model with the proposed initialization can achieve satisfactory accuracy with significantly fewer training samples.

Submitted 19 October, 2023; originally announced October 2023.
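The abstract does not give the exact loss, so the following is only a minimal sketch of a standard temperature-scaled cross-entropy, in which logits are divided by a temperature τ before the softmax. As an assumption, the logits here are cosine similarities between an augmented query embedding and one support embedding per unlabeled sample; the function names and the default τ = 0.5 are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def temperature_scaled_cross_entropy(logits, target, temperature=0.5):
    """-log softmax(logits / temperature)[target], computed stably via log-sum-exp."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_sum_exp - scaled[target]


def query_loss(query_emb, support_embs, target, temperature=0.5):
    """Loss for one augmented query whose source sample is support_embs[target]."""
    logits = [cosine(query_emb, s) for s in support_embs]
    return temperature_scaled_cross_entropy(logits, target, temperature)


support = [[1.0, 0.0], [0.0, 1.0]]
print(query_loss([0.9, 0.1], support, target=0))  # small: query matches its source
print(query_loss([0.9, 0.1], support, target=1))  # large: query matches the other sample
```

Lowering τ sharpens the softmax, so correct matches are rewarded and mismatches penalized more strongly.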

arXiv:2309.05035 [pdf, other]

Duplicate Question Retrieval and Confirmation Time Prediction in Software Communities

Authors: Rima Hazra, Debanjan Saha, Amruit Sahoo, Somnath Banerjee, Animesh Mukherjee

Abstract: Community Question Answering (CQA) in different domains is growing at a large scale because of the availability of several platforms and the large amount of information shared among users. With the rapid growth of such online platforms, the massive amount of archived data makes it difficult for moderators to retrieve possible duplicates for a new question and to identify and confirm existing question pairs as duplicates at the right time. This problem is even more critical in CQAs corresponding to large software systems like AskUbuntu, where moderators need to be experts to recognize a question as a duplicate. Note that the prime challenge in such CQA platforms is that the moderators are themselves experts and are therefore usually extremely busy, with their time being extraordinarily expensive. To facilitate the task of the moderators, in this work, we tackle two significant issues for the AskUbuntu CQA platform: (1) retrieval of duplicate questions given a new question and (2) duplicate question confirmation time prediction. In the first task, we focus on retrieving duplicate questions from a question pool for a particular newly posted question. In the second task, we solve a regression problem to rank pairs of questions that could potentially take a long time to get confirmed as duplicates. For duplicate question retrieval, we propose a Siamese neural network based approach exploiting both text and network-based features, which outperforms several state-of-the-art baseline techniques. Our method outperforms DupPredictor and DUPE by 5% and 7%, respectively.
For duplicate confirmation time prediction, we use both standard machine learning models and neural networks along with text and graph-based features. We obtain Spearman's rank correlation of 0.20 and 0.213 (statistically significant) for text- and graph-based features, respectively.

Submitted 5 March, 2024; v1 submitted 10 September, 2023; originally announced September 2023.

Comments: Full paper accepted at ASONAM 2023: The 2023 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
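Spearman's rank correlation, the metric reported for confirmation-time prediction, is the Pearson correlation of the rank-transformed values. The sketch below computes it in plain Python, giving tied values their average rank; the toy predictions and targets are made up for illustration and are not the paper's data.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


predicted = [3.0, 10.0, 1.5, 7.0]  # hypothetical predicted confirmation times
actual = [2.0, 12.0, 1.0, 4.0]     # hypothetical observed confirmation times
print(spearman(predicted, actual))
```

Because only the orderings matter, the two toy lists rank the pairs identically and the coefficient is 1 up to floating-point rounding, even though the raw values differ.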

arXiv:2211.05829 [pdf, other]

A Machine Learning system to monitor student progress in educational institutes

Authors: Bibhuprasad Mahakud, Bibhuti Parida, Ipsit Panda, Souvik Maity, Arpita Sahoo, Reeta Sharma

Abstract: In order to track and comprehend the academic achievement of students, both private and public educational institutions devote a significant amount of resources and labour. One of the difficult issues that institutes deal with on a regular basis is understanding students' shortcomings in exams. The performance of a student is influenced by a variety of factors, including attendance, attentiveness in class, understanding of concepts taught, the teacher's ability to deliver the material effectively, timely completion of home assignments, and the concern of parents and teachers for guiding the student through the learning process. We propose a data-driven approach that makes use of Machine Learning techniques to generate a classifier called credit score that helps to comprehend the learning journeys of students and identify activities that lead to subpar performance. This would make it easier for educators and institute management to create guidelines for system development to increase productivity. The proposal to use credit score as a progress indicator is well suited to a Learning Management System. In this article, we demonstrate a proof of concept under simplified assumptions using simulated data.

Submitted 2 November, 2022; originally announced November 2022.

Comments: 9 pages, 7 figures

arXiv:2110.15128 [pdf, other]

Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

Authors: Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das

Abstract: Unsupervised domain adaptation, which aims to adapt models trained on a labeled source domain to a completely unlabeled target domain, has attracted much attention in recent years. While many domain adaptation techniques have been proposed for images, the problem of unsupervised domain adaptation in videos remains largely underexplored. In this paper, we introduce Contrast and Mix (CoMix), a new contrastive learning framework that aims to learn discriminative invariant feature representations for unsupervised video domain adaptation. First, unlike existing methods that rely on adversarial learning for feature alignment, we utilize temporal contrastive learning to bridge the domain gap by maximizing the similarity between encoded representations of an unlabeled video at two different speeds as well as minimizing the similarity between different videos played at different speeds. Second, we propose a novel extension to the temporal contrastive loss by using background mixing that allows additional positives per anchor, thus adapting contrastive learning to leverage action semantics shared across both domains. Moreover, we also integrate a supervised contrastive learning objective using target pseudo-labels to enhance discriminability of the latent space for video domain adaptation. Extensive experiments on several benchmark datasets demonstrate the superiority of our proposed approach over state-of-the-art methods. Project page: https://cvir.github.io/projects/comix

Submitted 28 October, 2021; originally announced October 2021.

Comments: Accepted to NeurIPS 2021. Project page: https://cvir.github.io/projects/comix
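The temporal contrastive idea (pull together two playback speeds of the same unlabeled video, push apart different videos) can be written as an InfoNCE-style loss. This is a minimal sketch under assumptions, not CoMix's exact objective: it omits background mixing and the supervised pseudo-label term, and it assumes each video has already been encoded into one vector at a slow speed and one at a fast speed. The function names and τ = 0.5 are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def temporal_contrastive_loss(slow, fast, temperature=0.5):
    """Mean InfoNCE loss where slow[i] and fast[i] encode the same video at two speeds.

    For anchor slow[i], fast[i] is the positive and every fast[j] (j != i),
    a different video at the other speed, is a negative.
    """
    n = len(slow)
    total = 0.0
    for i in range(n):
        logits = [cosine(slow[i], fast[j]) / temperature for j in range(n)]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / n


slow = [[1.0, 0.1], [0.1, 1.0]]
fast_aligned = [[0.9, 0.2], [0.2, 0.9]]   # same videos, same order
fast_shuffled = [[0.2, 0.9], [0.9, 0.2]]  # positives swapped
print(temporal_contrastive_loss(slow, fast_aligned) < temporal_contrastive_loss(slow, fast_shuffled))
```

Minimizing this loss raises the similarity of matched speed pairs relative to mismatched ones, which is the behaviour the abstract describes for bridging the domain gap without labels.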

arXiv:2012.03358 [pdf, other]

Select, Label, and Mix: Learning Discriminative Invariant Feature Representations for Partial Domain Adaptation

Authors: Aadarsh Sahoo, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das

Abstract: Partial domain adaptation, which assumes that the unknown target label space is a subset of the source label space, has attracted much attention in computer vision. Despite recent progress, existing methods often suffer from three key problems: negative transfer, lack of discriminability, and lack of domain invariance in the latent space. To alleviate these issues, we develop a novel 'Select, Label, and Mix' (SLM) framework that aims to learn discriminative invariant feature representations for partial domain adaptation. First, we present an efficient "select" module that automatically filters out the outlier source samples to avoid negative transfer while aligning distributions across both domains. Second, the "label" module iteratively trains the classifier using both the labeled source domain data and the generated pseudo-labels for the target domain to enhance the discriminability of the latent space. Finally, the "mix" module utilizes domain mixup regularization jointly with the other two modules to explore more intrinsic structures across domains, leading to a domain-invariant latent space for partial domain adaptation. Extensive experiments on several benchmark datasets for partial domain adaptation demonstrate the superiority of our proposed framework over state-of-the-art methods.

Submitted 3 January, 2023; v1 submitted 6 December, 2020; originally announced December 2020.

Comments: Accepted to WACV 2023. Project page: https://cvir.github.io/projects/slm.html
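The "mix" module relies on domain mixup regularization. The abstract gives no formula, so the following is a minimal sketch of standard mixup applied across domains, assuming a convex combination of one source and one target sample (and of their one-hot domain labels) with weight λ drawn from Beta(α, α). The function name and α = 0.2 are hypothetical, not SLM's settings.

```python
import random


def domain_mixup(x_source, x_target, alpha=0.2, rng=random):
    """Mix one source and one target feature vector.

    Returns the mixed vector, its soft domain label [p_source, p_target], and lambda.
    """
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * s + (1.0 - lam) * t for s, t in zip(x_source, x_target)]
    soft_label = [lam, 1.0 - lam]  # soft target for a domain discriminator
    return mixed, soft_label, lam


rng = random.Random(0)
mixed, soft_label, lam = domain_mixup([1.0, 0.0, 2.0], [0.0, 1.0, 0.0], rng=rng)
print(lam, mixed, soft_label)
```

Training a domain classifier on such interpolated samples with soft labels encourages the feature space to vary smoothly between domains, which is the intuition behind using mixup to promote domain invariance.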

arXiv:2008.05524 [pdf, other]

Mitigating Dataset Imbalance via Joint Generation and Classification

Authors: Aadarsh Sahoo, Ankit Singh, Rameswar Panda, Rogerio Feris, Abir Das

Abstract: Supervised deep learning methods are enjoying enormous success in many practical applications of computer vision and have the potential to revolutionize robotics. However, their marked performance degradation under biases and imbalanced data calls their reliability into question. In this work, we address these questions from the perspective of dataset imbalance resulting from severe under-representation of annotated training data for certain classes and its effect on both deep classification and generation methods. We introduce a joint dataset repairment strategy that combines a neural network classifier with Generative Adversarial Networks (GAN) to make up for the deficit of training examples from the under-represented class by producing additional training examples. We show that the combined training helps to improve the robustness of both the classifier and the GAN against severe class imbalance. We show the effectiveness of our proposed approach on three very different datasets with different degrees of imbalance. The code is available at https://github.com/AadSah/ImbalanceCycleGAN.

Submitted 12 August, 2020; originally announced August 2020.

Comments: Accepted in ECCV 2020 Workshop on Imbalance Problems in Computer Vision (IPCV)

arXiv:2004.11663 [pdf, other]

doi 10.1145/3408995

Retrofitting Parallelism onto OCaml

Authors: KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, Anil Madhavapeddy

Abstract: OCaml is an industrial-strength, multi-paradigm programming language, widely used in industry and academia. OCaml is also one of the few modern managed system programming languages to lack support for shared memory parallel programming. This paper describes the design, a full-fledged implementation and evaluation of a mostly-concurrent garbage collector (GC) for the multicore extension of the OCaml programming language. Given that we propose to add parallelism to a widely used programming language with millions of lines of existing code, we face the challenge of maintaining backwards compatibility, not just in terms of the language features but also in terms of the performance of single-threaded code running with the new GC. To this end, the paper presents a series of novel techniques and demonstrates that the new GC strikes a balance between performance and feature backwards compatibility for sequential programs and scales admirably on modern multicore processors.

Submitted 2 July, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

Comments: Accepted to ICFP 2020

ACM Class: D.3.4

arXiv:1901.01153 [pdf, other]

Demystifying Multi-Faceted Video Summarization: Tradeoff Between Diversity, Representation, Coverage and Importance

Authors: Vishal Kaushal, Rishabh Iyer, Khoshrav Doctor, Anurag Sahoo, Pratik Dubal, Suraj Kothawade, Rohan Mahadev, Kunal Dargan, Ganesh Ramakrishnan

Abstract: This paper addresses automatic summarization of videos in a unified manner. In particular, we propose a framework for multi-faceted summarization covering extractive, query-based and entity summarization (summarization at the level of entities like objects, scenes, humans and faces in the video). We investigate several summarization models which capture notions of diversity, coverage, representation and importance, and argue for the utility of these different models depending on the application. While most of the prior work on submodular summarization approaches has focused on combining several models and learning weighted mixtures, we focus on the explainability of different models and featurizations, and how they apply to different domains. We also provide implementation details on summarization systems and the different modalities involved. We hope that this study will give practitioners insights for choosing the right summarization models for the problems at hand.

Submitted 3 January, 2019; originally announced January 2019.

Comments: Accepted to WACV 2019. arXiv admin note: substantial text overlap with arXiv:1704.01466, arXiv:1809.08846
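As one concrete example of a diversity model of the kind compared above, the sketch below greedily picks frames that maximize the minimum distance to the frames already chosen (farthest-first selection, a common heuristic for a disparity-min style objective). It is an illustration under assumptions, not the paper's implementation: frames are assumed to be represented by feature vectors with a user-supplied distance, the first frame seeds the selection, and all names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(features, k, dist=euclidean):
    """Return indices of up to k mutually distant items, seeded with item 0."""
    if not features or k <= 0:
        return []
    k = min(k, len(features))
    selected = [0]
    # Distance from every item to its nearest already-selected item.
    nearest = [dist(features[0], f) for f in features]
    while len(selected) < k:
        candidates = [i for i in range(len(features)) if i not in selected]
        nxt = max(candidates, key=lambda i: nearest[i])
        selected.append(nxt)
        nearest = [min(d, dist(features[nxt], f)) for d, f in zip(nearest, features)]
    return selected


# Hypothetical 1-D frame features: two near-duplicate pairs plus one outlier.
frames = [[0.0], [0.1], [5.0], [5.1], [10.0]]
print(farthest_first(frames, 3))
```

With three picks, the near-duplicate frames at 0.1 and 5.1 are skipped, which is the redundancy-avoiding behaviour a diversity model is meant to provide.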

arXiv:1805.11191 [pdf, other]

Learning From Less Data: Diversified Subset Selection and Active Learning in Image Classification Tasks

Authors: Vishal Kaushal, Anurag Sahoo, Khoshrav Doctor, Narasimha Raju, Suyash Shetty, Pankaj Singh, Rishabh Iyer, Ganesh Ramakrishnan

Abstract: Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry and pose the challenges of inadequate computing resources and high costs of human labeling. Training data subset selection and active learning techniques have been proposed as possible solutions to these challenges, respectively. A special class of subset selection functions naturally models notions of diversity, coverage and representation; these functions can be used to eliminate redundancy and thus lend themselves well to training data subset selection. They can also improve the efficiency of active learning, further reducing human labeling effort by selecting a subset of the examples obtained using conventional uncertainty-sampling-based techniques. In this work we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Disparity-Min models, for training-data subset selection and reducing labeling effort. We do this for a variety of computer vision tasks including Gender Recognition, Scene Recognition and Object Recognition. Our results show that subset selection done in the right way can add 2-3% in accuracy on existing baselines, particularly in the case of less training data. This allows the training of complex machine learning models (like Convolutional Neural Networks) with much less training data while incurring minimal performance loss.

Submitted 28 May, 2018; originally announced May 2018.

Comments: 15 pages, 7 figures
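The Facility-Location model scores a subset S of a ground set V by how well S covers every element: f(S) = Σ_{i∈V} max_{j∈S} sim(i, j). Because f is monotone submodular, repeatedly adding the element with the largest marginal gain is a standard way to pick a training subset under a size budget. The sketch below assumes a precomputed, non-negative similarity matrix (for instance, cosine similarities of image features); it is illustrative, not the authors' code, and the function name is hypothetical.

```python
def facility_location_greedy(sim, k):
    """Greedily select up to k indices maximizing f(S) = sum_i max_{j in S} sim[i][j].

    sim is an n x n list of non-negative similarities.
    """
    n = len(sim)
    coverage = [0.0] * n  # max similarity of each item to the current subset
    selected = []
    for _ in range(min(k, n)):
        candidates = [j for j in range(n) if j not in selected]

        def gain(j):
            # Marginal gain of adding j: how much it raises each item's coverage.
            return sum(max(sim[i][j] - coverage[i], 0.0) for i in range(n))

        best = max(candidates, key=gain)
        selected.append(best)
        coverage = [max(coverage[i], sim[i][best]) for i in range(n)]
    return selected


# Hypothetical similarities: items 0/1 form one cluster, items 2/3 another.
sim = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
]
print(facility_location_greedy(sim, 2))  # [0, 2]: one representative per cluster
```

After item 0 is chosen, item 1 adds almost nothing (it is already well covered), so the second pick comes from the other cluster; this is how the model eliminates redundancy.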

arXiv:1704.01466 [pdf, other]

A Unified Multi-Faceted Video Summarization System

Authors: Anurag Sahoo, Vishal Kaushal, Khoshrav Doctor, Suyash Shetty, Rishabh Iyer, Ganesh Ramakrishnan

Abstract: This paper addresses automatic summarization and search in visual data comprising videos, live streams and image collections in a unified manner. In particular, we propose a framework for multi-faceted summarization which extracts key-frames (image summaries), skims (video summaries) and entity summaries (summarization at the level of entities like objects, scenes, humans and faces in the video). The user can view these as either extractive summarization or query-focused summarization. Our approach first pre-processes the video or image collection once to extract all important visual features, after which we provide an interactive mechanism for the user to summarize the video based on their choice. We investigate several diversity, coverage and representation models for all these problems, and argue for the utility of these different models depending on the application. While most of the prior work on submodular summarization approaches has focused on combining several models and learning weighted mixtures, we focus on the explainability of the different diversity, coverage and representation models and their scalability. Most importantly, we also show that we can summarize hours of video data in a few seconds, and our system allows the user to generate summaries of various lengths and types interactively on the fly.

Submitted 4 April, 2017; originally announced April 2017.

Comments: 18 pages, 11 Figures

arXiv:1401.0875 [pdf]

Determining the Possibilities and Certainties in Network Participation for MANETS

Authors: Anoop J. Sahoo, Md. Amir Khusru Akhtar

Abstract: A mobile ad hoc network (MANET) is a self-organized cooperative network that works without any permanent infrastructure. This infrastructure-less design makes it more complex than other wireless networks. Many attacks and forms of misbehavior obstruct its growth and implementation. The majority of attacks and misbehavior can be handled by existing protocols, but these protocols reduce the total strength of nodes in a network because they isolate nodes with lower reputation values from network participation. To cope with this problem, we present the Possibility and Certainty model. This model uses reputation values to determine the possibilities and certainties in network participation. The proposed model classifies nodes into three classes: certain (HIGH grade), possible (MED grade), and not possible (LOW grade). Choosing HIGH grade nodes for network activities improves the Packet Delivery Ratio, which enhances the throughput of the MANET. On the other hand, when node strength is poor, we choose MED grade nodes for network activities. Thus the proposed model allows communication in the worst scenario with the possibility of success. It protects a network from misbehavior by isolating LOW grade nodes from routing paths.

Submitted 5 January, 2014; originally announced January 2014.

Comments: 10 Pages. International Journal of Computer Engineering and Applications, 2013
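The three-grade classification and the fallback rule described in the abstract can be sketched as follows. The reputation scale and the thresholds (0.7 and 0.3 on a 0 to 1 scale) are hypothetical, since the abstract does not give them; the names `Grade`, `grade`, and `choose_participants` are also illustrative.

```python
from enum import Enum


class Grade(Enum):
    HIGH = "certain"
    MED = "possible"
    LOW = "not possible"


def grade(reputation, high=0.7, low=0.3):
    """Map a reputation value to a grade using hypothetical thresholds."""
    if reputation >= high:
        return Grade.HIGH
    if reputation >= low:
        return Grade.MED
    return Grade.LOW


def choose_participants(reputations, min_nodes, high=0.7, low=0.3):
    """Prefer HIGH nodes; add MED nodes only if HIGH nodes are too few; never use LOW nodes."""
    grades = {node: grade(r, high, low) for node, r in reputations.items()}
    chosen = [node for node, g in grades.items() if g is Grade.HIGH]
    if len(chosen) < min_nodes:
        chosen += [node for node, g in grades.items() if g is Grade.MED]
    return chosen


reputations = {"n1": 0.9, "n2": 0.5, "n3": 0.1, "n4": 0.8}
print(choose_participants(reputations, min_nodes=2))  # ['n1', 'n4']
print(choose_participants(reputations, min_nodes=3))  # ['n1', 'n4', 'n2']
```

Node `n3` is excluded in both cases, mirroring how the model keeps LOW grade nodes off routing paths while still allowing communication through MED grade nodes when HIGH grade nodes are scarce.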
