-
REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback
Authors:
Aniruddha Roy,
Pretam Ray,
Abhilash Nandy,
Somak Aditya,
Pawan Goyal
Abstract:
Instruction-based Large Language Models (LLMs) have proven effective in numerous few-shot or zero-shot Natural Language Processing (NLP) tasks. However, creating human-annotated instruction data is time-consuming, expensive, and often limited in quantity and task diversity. Previous research has attempted to address this challenge by proposing frameworks capable of generating instructions in a semi-automated and task-agnostic manner directly from the model itself. Many of these efforts have relied on large, API-only models such as GPT-3.5 (175B), which are expensive and subject to limits on the number of queries. This paper explores the performance of three small open-source LLMs, namely LLaMA 2-7B, LLaMA 2-13B, and Mistral 7B, using a semi-automated framework, thereby reducing the human intervention, effort, and cost required to generate an instruction dataset for fine-tuning LLMs. Furthermore, we demonstrate that incorporating a Reinforcement Learning (RL) based training algorithm into this LLM-based framework leads to further enhancements. Our evaluation of the dataset reveals that these RL-based frameworks achieve substantial improvements in 63-66% of the tasks compared to previous approaches.
Submitted 10 May, 2025;
originally announced May 2025.
-
Channel Coding meets Sequence Design via Machine Learning for Integrated Sensing and Communications
Authors:
Sundar Aditya,
Morteza Varasteh,
Bruno Clerckx
Abstract:
For integrated sensing and communications, an intriguing question is whether information-bearing channel-coded signals can be reused for sensing - specifically ranging. This question forces the hitherto non-overlapping fields of channel coding (communications) and sequence design (sensing) to intersect by motivating the design of error-correcting codes that have good autocorrelation properties. In this letter, we demonstrate how machine learning (ML) is well-suited for designing such codes, especially for short block lengths. As an example, for rate 1/2 and block length 32, we show that even an unsophisticated ML code has a bit-error rate performance similar to a Polar code with the same parameters, but with autocorrelation sidelobes 24dB lower. While a length-32 Zadoff-Chu (ZC) sequence has zero autocorrelation sidelobes, there are only 16 such sequences and hence, a 1/2 code rate cannot be realized by using ZC sequences as codewords. Hence, ML bridges channel coding and sequence design by trading off an ideal autocorrelation function for a large (i.e., rate-dependent) codebook size.
Submitted 29 March, 2025;
originally announced March 2025.
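The Zadoff-Chu remark in the abstract above can be checked numerically. The following is a minimal sketch, not the authors' code: it uses the textbook even-length definition x_u[n] = exp(-j*pi*u*n^2/N), counts the roots u that are coprime with N = 32, and measures the largest periodic autocorrelation sidelobe. The helper names are illustrative, and the codeword count assumes a binary code with 32 coded bits, which the abstract does not state.

```python
import cmath
from math import gcd

N = 32  # block length from the abstract's example


def zadoff_chu(u, n_len):
    """Root-u Zadoff-Chu sequence of even length n_len (textbook definition)."""
    # The phase is reduced modulo 2*n_len in integer arithmetic, which is
    # exact because exp(-j*pi*m/n_len) has period 2*n_len in m.
    return [cmath.exp(-1j * cmath.pi * ((u * n * n) % (2 * n_len)) / n_len)
            for n in range(n_len)]


def periodic_autocorr(x):
    """Magnitude of the periodic (circular) autocorrelation at every lag."""
    n_len = len(x)
    return [abs(sum(x[(n + k) % n_len] * x[n].conjugate() for n in range(n_len)))
            for k in range(n_len)]


# Only roots coprime with N give the ideal autocorrelation property.
roots = [u for u in range(1, N) if gcd(u, N) == 1]

# Largest sidelobe (any non-zero lag) over all valid roots.
worst_sidelobe = max(max(periodic_autocorr(zadoff_chu(u, N))[1:]) for u in roots)

# Assumption: a binary rate-1/2 code with 32 coded bits carries 16 information bits.
codewords_needed = 2 ** (N // 2)

print(len(roots), worst_sidelobe, codewords_needed)
```

Under these assumptions the number of usable roots equals Euler's totient of 32, which is far smaller than the codebook a rate-1/2 code would need. This is the gap the abstract describes.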
-
Elastic Motion Policy: An Adaptive Dynamical System for Robust and Efficient One-Shot Imitation Learning
Authors:
Tianyu Li,
Sunan Sun,
Shubhodeep Shiv Aditya,
Nadia Figueroa
Abstract:
Behavior cloning (BC) has become a staple imitation learning paradigm in robotics due to its ease of teaching robots complex skills directly from expert demonstrations. However, BC suffers from an inherent generalization issue. To solve this, the status quo solution is to gather more data. Yet, regardless of how much training data is available, out-of-distribution performance is still sub-par, lacks any formal guarantee of convergence and success, and is incapable of allowing and recovering from physical interactions with humans. These are critical flaws when robots are deployed in ever-changing human-centric environments. Thus, we propose Elastic Motion Policy (EMP), a one-shot imitation learning framework that allows robots to adjust their behavior based on the scene change while respecting the task specification. Trained from a single demonstration, EMP follows the dynamical systems paradigm where motion planning and control are governed by first-order differential equations with convergence guarantees. We leverage Laplacian editing in full end-effector space, $\mathbb{R}^3\times SO(3)$, and online convex learning of Lyapunov functions, to adapt EMP online to new contexts, avoiding the need to collect new demonstrations. We extensively validate our framework in real robot experiments, demonstrating its robust and efficient performance in dynamic environments, with obstacle avoidance and multi-step task capabilities. Project Website: https://elastic-motion-policy.github.io/EMP/
Submitted 11 March, 2025;
originally announced March 2025.
-
SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation
Authors:
Saurabh Kumar Pandey,
Sachin Vashistha,
Debrup Das,
Somak Aditya,
Monojit Choudhury
Abstract:
To understand the complexity of sequence classification tasks, Hahn et al. (2021) proposed sensitivity as the number of disjoint subsets of the input sequence that can each be individually changed to change the output. Though effective, calculating sensitivity at scale using this framework is costly because of its exponential time complexity. Therefore, we introduce a Sensitivity-based Multi-Armed Bandit framework (SMAB), which provides a scalable approach for calculating word-level local (sentence-level) and global (aggregated) sensitivities with respect to an underlying text classifier for any dataset. We establish the effectiveness of our approach through various applications. We perform a case study on a CHECKLIST-generated sentiment analysis dataset, where we show that our algorithm indeed captures intuitively high- and low-sensitivity words. Through experiments on multiple tasks and languages, we show that sensitivity can serve as a proxy for accuracy in the absence of gold data. Lastly, we show that guiding perturbation prompts using sensitivity values in adversarial example generation improves attack success rate by 15.58%, whereas using sensitivity as an additional reward in adversarial paraphrase generation gives a 12.00% improvement over SOTA approaches. Warning: Contains potentially offensive content.
Submitted 10 February, 2025;
originally announced February 2025.
-
ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments
Authors:
Sourjyadip Ray,
Kushal Gupta,
Soumi Kundu,
Payal Arvind Kasat,
Somak Aditya,
Pawan Goyal
Abstract:
The global shortage of healthcare workers has demanded the development of smart healthcare assistants, which can help monitor and alert healthcare workers when necessary. We examine the healthcare knowledge of existing Large Vision Language Models (LVLMs) via the Visual Question Answering (VQA) task in hospital settings through expert annotated open-ended questions. We introduce the Emergency Room Visual Question Answering (ERVQA) dataset, consisting of <image, question, answer> triplets covering diverse emergency room scenarios, a seminal benchmark for LVLMs. By developing a detailed error taxonomy and analyzing answer trends, we reveal the nuanced nature of the task. We benchmark state-of-the-art open-source and closed LVLMs using traditional and adapted VQA metrics: Entailment Score and CLIPScore Confidence. Analyzing errors across models, we infer trends based on properties like decoder type, model size, and in-context examples. Our findings suggest the ERVQA dataset presents a highly complex task, highlighting the need for specialized, domain-specific solutions.
Submitted 8 October, 2024;
originally announced October 2024.
-
ECHO: Environmental Sound Classification with Hierarchical Ontology-guided Semi-Supervised Learning
Authors:
Pranav Gupta,
Raunak Sharma,
Rashmi Kumari,
Sri Krishna Aditya,
Shwetank Choudhary,
Sumit Kumar,
Kanchana M,
Thilagavathy R
Abstract:
Environmental Sound Classification is a well-studied research problem in the field of signal processing, and until now most of the focus has been on fully supervised approaches. Over the last few years, focus has moved towards semi-supervised methods, which concentrate on the utilization of unlabeled data, and self-supervised methods, which learn the intermediate representation through a pretext task or contrastive learning. However, both approaches require a vast amount of unlabeled data to improve performance. In this work, we propose a novel framework called Environmental Sound Classification with Hierarchical Ontology-guided semi-supervised Learning (ECHO) that utilizes a label ontology-based hierarchy to learn semantic representations by defining a novel pretext task. In the pretext task, the model tries to predict coarse labels defined by a Large Language Model (LLM) based on the ground truth label ontology. The trained model is further fine-tuned in a supervised way to predict the actual task. Our proposed semi-supervised framework achieves an accuracy improvement in the range of 1% to 8% over baseline systems across three datasets, namely UrbanSound8K, ESC-10, and ESC-50.
Submitted 21 September, 2024;
originally announced September 2024.
-
Rate-Splitting Multiple Access for Overloaded Multi-group Multicast: A First Experimental Study
Authors:
Xinze Lyu,
Sundar Aditya,
Bruno Clerckx
Abstract:
Multi-group multicast (MGM) is an increasingly important form of multi-user wireless communications with several potential applications, such as video streaming, federated learning, safety-critical vehicular communications, etc. Rate-Splitting Multiple Access (RSMA) is a powerful interference management technique that can, in principle, achieve higher data rates and greater fairness for all types of multi-user wireless communications, including MGM. This paper presents the first-ever experimental evaluation of RSMA-based MGM, as well as the first-ever three-way comparison of RSMA-based, Space Division Multiple Access (SDMA)-based and Non-Orthogonal Multiple Access (NOMA)-based MGM. Using a measurement setup involving a two-antenna transmitter and two groups of two single-antenna users per group, we consider the problem of realizing throughput (max-min) fairness across groups for each of the three multiple access schemes, over nine experimental cases in a line-of-sight environment capturing varying levels of pathloss difference and channel correlation across the groups. Over these cases, we observe that RSMA-based MGM achieves fairness at a higher throughput for each group than SDMA- and NOMA-based MGM. These findings validate RSMA-based MGM's promised gains from the theoretical literature.
Submitted 26 September, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
-
[WIP] Jailbreak Paradox: The Achilles' Heel of LLMs
Authors:
Abhinav Rao,
Monojit Choudhury,
Somak Aditya
Abstract:
We introduce two paradoxes concerning jailbreak of foundation models: First, it is impossible to construct a perfect jailbreak classifier, and second, a weaker model cannot consistently detect whether a stronger (in a pareto-dominant sense) model is jailbroken or not. We provide formal proofs for these paradoxes and a short case study on Llama and GPT4-o to demonstrate this. We discuss broader theoretical and practical repercussions of these results.
Submitted 20 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning
Authors:
Debrup Das,
Debopriyo Banerjee,
Somak Aditya,
Ashish Kulkarni
Abstract:
Tool-augmented Large Language Models (TALMs) are known to enhance the skillset of large language models (LLMs), thereby leading to improved reasoning abilities across many tasks. While TALMs have been successfully employed in different question-answering benchmarks, their efficacy on complex mathematical reasoning benchmarks, and the potential complementary benefits offered by tools for knowledge retrieval and mathematical equation solving, are open research questions. In this work, we present MathSensei, a tool-augmented large language model for mathematical reasoning. We study the complementary benefits of the tools - knowledge retriever (Bing Web Search), program generator + executor (Python), and symbolic equation solver (Wolfram-Alpha API) - through evaluations on mathematical reasoning datasets. We perform exhaustive ablations on MATH, a popular dataset for evaluating mathematical reasoning on diverse mathematical disciplines. We also conduct experiments involving well-known tool planners to study the impact of tool sequencing on the model performance. MathSensei achieves 13.5% better accuracy over gpt-3.5-turbo with Chain-of-Thought on the MATH dataset. We further observe that TALMs are not as effective for simpler math word problems (in GSM-8K), and the benefit increases as the complexity and required knowledge increase (progressively over AQuA, MMLU-Math, and higher level complex questions in MATH). The code and data are available at https://github.com/Debrup-61/MathSensei.
Submitted 3 April, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
TEXT2AFFORD: Probing Object Affordance Prediction abilities of Language Models solely from Text
Authors:
Sayantan Adak,
Daivik Agrawal,
Animesh Mukherjee,
Somak Aditya
Abstract:
We investigate the knowledge of object affordances in pre-trained language models (PTLMs) and pre-trained Vision-Language models (VLMs). A growing body of literature shows that PTLMs fail inconsistently and non-intuitively, demonstrating a lack of reasoning and grounding. To take a first step toward quantifying the effect of grounding (or lack thereof), we curate a novel and comprehensive dataset of object affordances -- Text2Afford, characterized by 15 affordance classes. Unlike affordance datasets collected in vision and language domains, we annotate in-the-wild sentences with objects and affordances. Experimental results reveal that PTLMs exhibit limited reasoning abilities when it comes to uncommon object affordances. We also observe that pre-trained VLMs do not necessarily capture object affordances effectively. Through few-shot fine-tuning, we demonstrate improvement in affordance knowledge in PTLMs and VLMs. Our research contributes a novel dataset for language grounding tasks, and presents insights into LM capabilities, advancing the understanding of object affordances. Code and data are available at https://github.com/sayantan11995/Affordance
Submitted 23 July, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs
Authors:
Haritz Puerto,
Martin Tutek,
Somak Aditya,
Xiaodan Zhu,
Iryna Gurevych
Abstract:
Reasoning is a fundamental component of language understanding. Recent prompting techniques, such as chain of thought, have consistently improved LLMs' performance on various reasoning tasks. Nevertheless, there is still little understanding of what triggers reasoning abilities in LLMs in the inference stage. In this paper, we introduce code prompting, a chain of prompts that transforms a natural language problem into code and directly prompts the LLM using the generated code without resorting to external code execution. We hypothesize that code prompts can elicit certain reasoning capabilities of LLMs trained on text and code, and we utilize the proposed method to improve conditional reasoning, the ability to infer different conclusions depending on the fulfillment of certain conditions. We find that code prompting yields a large performance boost for multiple LLMs (up to 22.52 percentage points on GPT 3.5, 7.75 on Mixtral, and 16.78 on Mistral) across multiple conditional reasoning datasets. We then conduct comprehensive experiments to understand how code prompts trigger reasoning abilities and which capabilities are elicited in the underlying models. Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement. Furthermore, code prompts improve sample efficiency of in-context learning and facilitate state tracking of variables or entities.
Submitted 28 September, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions
Authors:
Pengfei Hong,
Navonil Majumder,
Deepanway Ghosal,
Somak Aditya,
Rada Mihalcea,
Soujanya Poria
Abstract:
Recent advancements in Large Language Models (LLMs) have showcased striking results on existing logical reasoning benchmarks, with some models even surpassing human performance. However, the true depth of their competencies and robustness in reasoning tasks remains an open question. To this end, in this paper, we focus on two popular reasoning tasks: arithmetic reasoning and code generation. In particular, we introduce (i) a general ontology of perturbations for math and coding questions, (ii) a semi-automatic method to apply these perturbations, and (iii) two datasets, GSMORE and HUMANEVAL-CORE, respectively, of perturbed math and coding problems to probe LLM capabilities in numeric reasoning and coding tasks. Through comprehensive evaluations of both closed-source and open-source LLMs, we show a significant performance drop across all the models against the perturbed questions, suggesting that the current LLMs lack robust problem-solving skills and structured reasoning abilities in many areas, as defined by our ontology. We open-source the datasets and source code at: https://github.com/declare-lab/LLM-ReasoningTest.
Submitted 2 November, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Multi-functional OFDM Signal Design for Integrated Sensing, Communications, and Power Transfer
Authors:
Yumeng Zhang,
Sundar Aditya,
Bruno Clerckx
Abstract:
The wireless domain is witnessing a flourishing of integrated systems, e.g. (a) integrated sensing and communications, and (b) simultaneous wireless information and power transfer, due to their potential to use resources (spectrum, power) judiciously. Inspired by this trend, we investigate integrated sensing, communications and powering (ISCAP), through the design of a wideband OFDM signal to power a sensor while simultaneously performing target-sensing and communication. To characterize the ISCAP performance region, we assume symbols with non-zero mean asymmetric Gaussian distribution (i.e., the input distribution), and optimize its mean and variance at each subcarrier to maximize the harvested power, subject to constraints on the achievable rate (communications) and the average side-to-peak-lobe difference (sensing). The resulting input distribution, through simulations, achieves a larger performance region than that of (i) a symmetric complex Gaussian input distribution with identical mean and variance for the real and imaginary parts, (ii) a zero-mean symmetric complex Gaussian input distribution, and (iii) the superposed power-splitting communication and sensing signal (the coexisting solution). In particular, the optimized input distribution balances the three functions by exhibiting the following features: (a) symbols in subcarriers with strong communication channels have high variance to satisfy the rate constraint, while the other symbols are dominated by the mean, forming a relatively uniform sum of mean and variance across subcarriers for sensing; (b) with looser communication and sensing constraints, large absolute means appear on subcarriers with stronger powering channels for higher harvested power. As a final note, the results highlight the great potential of the co-designed ISCAP system for further efficiency enhancement.
Submitted 23 June, 2024; v1 submitted 31 October, 2023;
originally announced November 2023.
-
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models
Authors:
Man Luo,
Shrinidhi Kumbhar,
Ming Shen,
Mihir Parmar,
Neeraj Varshney,
Pratyay Banerjee,
Somak Aditya,
Chitta Baral
Abstract:
Logical reasoning is fundamental for humans yet presents a substantial challenge in the domain of Artificial Intelligence. Initially, researchers used Knowledge Representation and Reasoning (KR) systems that did not scale and required non-trivial manual effort. Recently, the emergence of large language models (LLMs) has demonstrated the ability to overcome various limitations of formal KR systems. Consequently, there is a growing interest in using LLMs for logical reasoning via natural language. This work strives to understand the proficiency of LLMs in logical reasoning by offering a brief review of the latest progress in this area, with a focus on the logical reasoning datasets, tasks, and the methods adopted to utilize LLMs for reasoning. To offer a thorough analysis, we have compiled a benchmark titled LogiGLUE. This includes 24 varied datasets encompassing deductive, abductive, and inductive reasoning. Utilizing LogiGLUE as a foundation, we have trained an instruction fine-tuned language model, resulting in LogiT5. We study single-task training, multi-task training, and the "chain-of-thought" knowledge distillation fine-tuning technique to assess the performance of the model across the different logical reasoning categories. We also assess various LLMs using LogiGLUE, and the findings indicate that LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We aim to shed light on the capabilities and potential pathways for enhancing logical reasoning proficiency in LLMs, paving the way for more advanced and nuanced developments in this critical field.
Submitted 30 March, 2024; v1 submitted 1 October, 2023;
originally announced October 2023.
-
Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks
Authors:
Abhinav Rao,
Sachin Vashistha,
Atharva Naik,
Somak Aditya,
Monojit Choudhury
Abstract:
Recent explorations with commercial Large Language Models (LLMs) have shown that non-expert users can jailbreak LLMs by simply manipulating their prompts, resulting in degenerate output behavior, privacy and security breaches, offensive outputs, and violations of content regulator policies. Limited studies have been conducted to formalize and analyze these attacks and their mitigations. We bridge this gap by proposing a formalism and a taxonomy of known (and possible) jailbreaks. We survey existing jailbreak methods and their effectiveness on open-source and commercial LLMs (such as GPT-based models, OPT, BLOOM, and FLAN-T5-XXL). We further discuss the challenges of jailbreak detection in terms of their effectiveness against known attacks. For further analysis, we release a dataset of model outputs across 3700 jailbreak prompts over 4 tasks.
Submitted 27 March, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
ReMask: A Robust Information-Masking Approach for Domain Counterfactual Generation
Authors:
Pengfei Hong,
Rishabh Bhardwaj,
Navonil Majumdar,
Somak Aditya,
Soujanya Poria
Abstract:
Domain shift is a big challenge in NLP. Thus, many approaches resort to learning domain-invariant features to mitigate the inference phase domain shift. Such methods, however, fail to leverage the domain-specific nuances relevant to the task at hand. To avoid such drawbacks, domain counterfactual generation aims to transform a text from the source domain to a given target domain. However, due to the limited availability of data, such frequency-based methods often miss some valid domain-token associations and introduce spurious ones. Hence, we employ a three-step domain obfuscation approach that involves frequency and attention norm-based masking, to mask domain-specific cues, and unmasking to regain the domain generic context. Our experiments empirically show that the counterfactual samples sourced from our masked text lead to improved domain transfer on 10 out of 12 domain sentiment classification settings, with an average of 2% accuracy improvement over the state-of-the-art for unsupervised domain adaptation (UDA). Further, our model outperforms the state-of-the-art by achieving 1.4% average accuracy improvement in the adversarial domain adaptation (ADA) setting. Moreover, our model also shows its domain adaptation efficacy on a large multi-domain intent classification dataset where it attains state-of-the-art results. We release the code publicly at https://github.com/declare-lab/remask.
Submitted 4 May, 2023;
originally announced May 2023.
-
Local Interpretable Model Agnostic Shap Explanations for machine learning models
Authors:
P. Sai Ram Aditya,
Mayukha Pal
Abstract:
With the advancement of technology for artificial intelligence (AI) based solutions and analytics compute engines, machine learning (ML) models are getting more complex day by day. Most of these models are generally used as a black box without user interpretability. Such complex ML models make it more difficult for people to understand or trust their predictions. There are a variety of frameworks using explainable AI (XAI) methods to demonstrate explainability and interpretability of ML models to make their predictions more trustworthy. In this manuscript, we propose a methodology that we define as Local Interpretable Model Agnostic Shap Explanations (LIMASE). This proposed ML explanation technique uses Shapley values under the LIME paradigm to achieve the following: (a) explain the prediction of any model by using a locally faithful and interpretable decision tree model on which the Tree Explainer is used to calculate the Shapley values and give visually interpretable explanations; (b) provide visually interpretable global explanations by plotting local explanations of several data points; (c) demonstrate a solution for the submodular optimization problem; (d) bring insight into regional interpretation; and (e) compute faster than the kernel explainer.
Submitted 10 October, 2022;
originally announced October 2022.
-
Sensing using Coded Communications Signals
Authors:
Sundar Aditya,
Onur Dizdar,
Bruno Clerckx,
Xueru Li
Abstract:
A key challenge for a common waveform for Integrated Sensing and Communications (ISAC) -- widely seen as an attractive proposition to achieve high performance for both functionalities, while efficiently utilizing available resources -- lies in leveraging information-bearing channel-coded communications signals (c.c.s) for sensing. In this paper, we investigate the sensing performance of c.c.s in (multi-user) interference-limited operation, and show that it is limited by sidelobes in the range-Doppler map, whose form depends on whether the c.c.s modulates a single-carrier or OFDM waveform. While uncoded communications signals -- comprising a block of $N$ i.i.d. zero-mean symbols -- give rise to asymptotically (i.e., as $N \rightarrow \infty$) zero sidelobes due to the law of large numbers, it is not obvious that the same holds for c.c.s, as structured channel coding schemes (e.g., linear block codes) induce dependence across codeword symbols. In this paper, we show that c.c.s also give rise to asymptotically zero sidelobes -- for both single-carrier and OFDM waveforms -- by deriving upper bounds for the tail probabilities of the sidelobe magnitudes that decay as $\exp( - O($code rate $\times$ block length$))$. This implies that for any code rate, c.c.s are effective sensing signals that are robust to multi-user interference at sufficiently large block lengths, with negligible difference in performance based on whether they modulate a single-carrier or OFDM waveform. We verify the latter implication through simulations, where we observe the sensing performance (characterized by the detection and false-alarm probabilities) of a QPSK-modulated c.c.s (code rate = 120/1024, block length = 1024 symbols) to match that of a comparable interference-free FMCW waveform even at high interference levels (signal-to-interference ratio of -11dB), for both single-carrier and OFDM waveforms.
Submitted 9 September, 2022;
originally announced September 2022.
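The law-of-large-numbers claim for uncoded signals in the abstract above can be illustrated with a small sketch. It is not the authors' simulation: it assumes i.i.d. QPSK symbols, looks only at the zero-Doppler cut (the normalized periodic autocorrelation) rather than the full range-Doppler map, and the function names, the seed, and the choice of 16 lags are illustrative.

```python
import cmath
import math
import random

def qpsk_symbols(n, rng):
    """Draw n i.i.d. unit-energy, zero-mean QPSK symbols (uncoded case)."""
    return [cmath.exp(1j * (math.pi / 4 + rng.randrange(4) * math.pi / 2))
            for _ in range(n)]

def mean_sidelobe(symbols, max_lag=16):
    """Average magnitude of the normalized periodic autocorrelation
    over the non-zero lags 1..max_lag (the mainlobe at lag 0 equals 1)."""
    n = len(symbols)
    total = 0.0
    for lag in range(1, max_lag + 1):
        acc = sum(symbols[i] * symbols[(i + lag) % n].conjugate()
                  for i in range(n))
        total += abs(acc) / n
    return total / max_lag

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
sidelobe_short = mean_sidelobe(qpsk_symbols(64, rng))
sidelobe_long = mean_sidelobe(qpsk_symbols(4096, rng))
print(sidelobe_short, sidelobe_long)
```

Under these assumptions the average sidelobe level shrinks roughly as $1/\sqrt{N}$, so the longer block shows markedly lower sidelobes. The coded case studied in the paper needs the tail bounds it derives, since codeword symbols are not independent.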
-
Generating Intermediate Steps for NLI with Next-Step Supervision
Authors:
Deepanway Ghosal,
Somak Aditya,
Monojit Choudhury
Abstract:
The Natural Language Inference (NLI) task often requires reasoning over multiple steps to reach the conclusion. While the necessity of generating such intermediate steps (instead of a summary explanation) has gained popular support, it is unclear how to generate such steps without complete end-to-end supervision and how such generated steps can be further utilized. In this work, we train a sequence-to-sequence model to generate only the next step given an NLI premise and hypothesis pair (and previous steps); then enhance it with external knowledge and symbolic search to generate intermediate steps with only next-step supervision. We show the correctness of such generated steps through automated and human verification. Furthermore, we show that such generated steps can help improve end-to-end NLI task performance using simple data augmentation strategies, across multiple public NLI datasets.
Submitted 31 August, 2022;
originally announced August 2022.
-
Multilingual CheckList: Generation and Evaluation
Authors:
Karthikeyan K,
Shaily Bhatt,
Pankaj Singh,
Somak Aditya,
Sandipan Dandapat,
Sunayana Sitaram,
Monojit Choudhury
Abstract:
Multilingual evaluation benchmarks usually contain limited high-resource languages and do not test models for specific linguistic capabilities. CheckList is a template-based evaluation approach that tests models for specific capabilities. The CheckList template creation process requires native speakers, posing a challenge in scaling to hundreds of languages. In this work, we explore multiple approaches to generate Multilingual CheckLists. We devise an algorithm, the Template Extraction Algorithm (TEA), for automatically extracting target language CheckList templates from machine-translated instances of source language templates. We compare the TEA CheckLists with CheckLists created with different levels of human intervention. We further introduce metrics along the dimensions of cost, diversity, utility, and correctness to compare the CheckLists. We thoroughly analyze different approaches to creating CheckLists in Hindi. Furthermore, we experiment with 9 more languages. We find that TEA followed by human verification is ideal for scaling CheckList-based evaluation to multiple languages, while TEA gives good estimates of model performance.
Submitted 11 October, 2022; v1 submitted 24 March, 2022;
originally announced March 2022.
-
LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI
Authors:
Ishan Tarunesh,
Somak Aditya,
Monojit Choudhury
Abstract:
Natural Language Inference (NLI) is considered a representative task to test natural language understanding (NLU). In this work, we propose an extensible framework to collectively yet categorically test diverse logical reasoning capabilities required for NLI (and, by extension, NLU). Motivated by behavioral testing, we create a semi-synthetic large test bench (363 templates, 363k examples) and an associated framework that offers the following utilities: 1) individually test and analyze reasoning capabilities along 17 reasoning dimensions (including pragmatic reasoning); 2) design experiments to study cross-capability information content (leave one out or bring one in); and 3) control for artifacts and biases, which the synthetic nature of the test bench makes possible. We extend a publicly available framework of automated test case instantiation from free-form natural language templates (CheckList) and a well-defined taxonomy of capabilities to cover a wide range of increasingly harder test cases while varying the complexity of natural language. Through our analysis of state-of-the-art NLI systems, we observe that our benchmark is indeed hard (and non-trivial even with training on additional resources). Some capabilities stand out as harder. Further, fine-grained analysis and fine-tuning experiments reveal more insights about these capabilities and the models, supporting and extending previous observations and thus showing the utility of the proposed test bench.
Submitted 2 September, 2023; v1 submitted 4 December, 2021;
originally announced December 2021.
-
Analyzing the Effects of Reasoning Types on Cross-Lingual Transfer Performance
Authors:
Karthikeyan K,
Aalok Sathe,
Somak Aditya,
Monojit Choudhury
Abstract:
Multilingual language models achieve impressive zero-shot accuracies in many languages in complex tasks such as Natural Language Inference (NLI). Examples in NLI (and equivalent complex tasks) often pertain to various types of sub-tasks, requiring different kinds of reasoning. Certain types of reasoning have proven to be more difficult to learn in a monolingual context, and in the crosslingual context, similar observations may shed light on zero-shot transfer efficiency and few-shot sample selection. Hence, to investigate the effects of types of reasoning on transfer performance, we propose a category-annotated multilingual NLI dataset and discuss the challenges of scaling monolingual annotations to multiple languages. We statistically observe interesting effects that the confluence of reasoning types and language similarities has on transfer performance.
Submitted 5 October, 2021;
originally announced October 2021.
-
Trusting RoBERTa over BERT: Insights from CheckListing the Natural Language Inference Task
Authors:
Ishan Tarunesh,
Somak Aditya,
Monojit Choudhury
Abstract:
Recent state-of-the-art natural language understanding (NLU) systems often behave unpredictably, failing on simpler reasoning examples. Despite this, there has been limited focus on quantifying progress towards systems with more predictable behavior. We think that a reasoning capability-wise behavioral summary is a step towards bridging this gap. We create a CheckList test-suite (184K examples) for the Natural Language Inference (NLI) task, a representative NLU task. We benchmark state-of-the-art NLI systems on this test-suite, which reveals fine-grained insights into the reasoning abilities of BERT and RoBERTa. Our analysis further reveals inconsistencies of the models on examples derived from the same template or from distinct templates pertaining to the same reasoning capability, indicating that generalizing the models' behavior through observations made on a CheckList is non-trivial. Through a user study, we find that users were able to utilize behavioral information to generalize much better for examples predicted by RoBERTa than for those predicted by BERT.
Submitted 15 July, 2021;
originally announced July 2021.
-
Analyzing the Nuances of Transformers' Polynomial Simplification Abilities
Authors:
Vishesh Agarwal,
Somak Aditya,
Navin Goyal
Abstract:
Symbolic mathematical tasks such as integration often require multiple well-defined steps and an understanding of sub-tasks to reach a solution. To understand Transformers' abilities in such tasks in a fine-grained manner, we deviate from traditional end-to-end settings and explore a step-wise polynomial simplification task. Polynomials can be written in a simple normal form as a sum of monomials ordered lexicographically. For a polynomial which is not necessarily in this normal form, a sequence of simplification steps is applied to reach the fully simplified (i.e., in the normal form) polynomial. We propose a synthetic polynomial dataset generation algorithm that generates polynomials with unique proof steps. Through varying coefficient configurations, input representation, proof granularity, and extensive hyper-parameter tuning, we observe that Transformers consistently struggle with numeric multiplication. We explore two ways to mitigate this: Curriculum Learning and a Symbolic Calculator approach (where the numeric operations are offloaded to a calculator). Both approaches provide significant gains over the vanilla Transformer-based baseline.
Submitted 28 April, 2021;
originally announced April 2021.
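The normal form mentioned in the abstract above (a sum of monomials in lexicographic order) can be sketched as follows. This is a minimal illustration, not the paper's dataset generator: the term representation (a coefficient paired with a tuple of exponents, one per variable), the descending sort direction, and the function name `normal_form` are all assumptions.

```python
from collections import defaultdict

def normal_form(terms):
    """Collect like monomials and order them lexicographically.

    `terms` is a list of (coefficient, exponents) pairs, where `exponents`
    is a tuple of non-negative ints, one per variable (assumed encoding).
    """
    combined = defaultdict(int)
    for coeff, exps in terms:
        combined[exps] += coeff  # merge monomials with identical exponents
    # Sort by exponent tuple (descending here, an assumed convention)
    # and drop monomials whose coefficients cancelled to zero.
    ordered = sorted(combined.items(), reverse=True)
    return [(coeff, exps) for exps, coeff in ordered if coeff != 0]

# (2x + 3y) + (xy - 2x) + 4xy over the variables (x, y)
poly = [(2, (1, 0)), (3, (0, 1)), (1, (1, 1)), (-2, (1, 0)), (4, (1, 1))]
result = normal_form(poly)
print(result)  # [(5, (1, 1)), (3, (0, 1))], i.e. 5xy + 3y
```

The step-wise task described in the abstract would expose each intermediate simplification rather than jumping to the final form as this helper does.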
-
TaxiNLI: Taking a Ride up the NLU Hill
Authors:
Pratik Joshi,
Somak Aditya,
Aalok Sathe,
Monojit Choudhury
Abstract:
Pre-trained Transformer-based neural architectures have consistently achieved state-of-the-art performance in the Natural Language Inference (NLI) task. Since NLI examples encompass a variety of linguistic, logical, and reasoning phenomena, it remains unclear which specific concepts are learnt by the trained systems and where they can achieve strong generalization. To investigate this question, we propose a taxonomic hierarchy of categories that are relevant for the NLI task. We introduce TAXINLI, a new dataset that has 10k examples from the MNLI dataset (Williams et al., 2018) with these taxonomic labels. Through various experiments on TAXINLI, we observe that whereas for certain taxonomic categories SOTA neural models have achieved near-perfect accuracies (a large jump over the previous models), some categories still remain difficult. Our work adds to the growing body of literature that shows the gaps in the current NLI systems and datasets through a systematic presentation and analysis of reasoning categories.
Submitted 9 October, 2020; v1 submitted 30 September, 2020;
originally announced September 2020.
-
Uncovering Relations for Marketing Knowledge Representation
Authors:
Somak Aditya,
Atanu Sinha
Abstract:
Online behaviors of consumers and marketers generate massive marketing data, which ever more sophisticated models attempt to turn into insights that aid decisions by marketers. Yet, in making decisions, human managers bring to bear marketing knowledge which resides outside of data and models. Thus, it behooves us to create an automated marketing knowledge base that can interact with data and models. Currently, marketing knowledge is dispersed in large corpora, but no definitive knowledge base for marketing exists. Out of the two broad aspects of marketing knowledge, representation and reasoning, this treatise focuses on the former. Specifically, we focus on the creation of a marketing knowledge graph from corpora, which requires identification of entities and relations. The relation identification task is particularly challenging in marketing because of the non-factoid nature of much marketing knowledge and the difficulty of forming rules that govern relations. In particular, we define a set of relations to capture marketing knowledge, propose a pipeline for creating the knowledge graph from text, and propose a rule-guided semi-supervised relation prediction algorithm to extract relations between marketing entities from sentences.
Submitted 23 January, 2020; v1 submitted 17 December, 2019;
originally announced December 2019.
-
Integrating Knowledge and Reasoning in Image Understanding
Authors:
Somak Aditya,
Yezhou Yang,
Chitta Baral
Abstract:
Deep learning based data-driven approaches have been successfully applied in various image understanding applications ranging from object recognition and semantic segmentation to visual question answering. However, the lack of knowledge integration and of higher-level reasoning capabilities in these methods still poses a hindrance. In this work, we present a brief survey of a few representative reasoning mechanisms, knowledge integration methods and their corresponding image understanding applications developed by various groups of researchers, approaching the problem from a variety of angles. Furthermore, we discuss key efforts on integrating external knowledge with neural networks. Taking cues from these efforts, we conclude by discussing potential pathways to improve reasoning capabilities.
Submitted 24 June, 2019;
originally announced June 2019.
-
Quantitative Depth Quality Assessment of RGBD Cameras At Close Range Using 3D Printed Fixtures
Authors:
Michele Pratusevich,
Jason Chrisos,
Shreyas Aditya
Abstract:
Mobile robots that manipulate their environments require high-accuracy scene understanding at close range. Typically this understanding is achieved with RGBD cameras, but the evaluation process for selecting an appropriate RGBD camera for the application is minimally quantitative. Limited manufacturer-published metrics do not translate to observed quality in real-world cluttered environments, since quality is application-specific. To bridge the gap, we present a method for quantitatively measuring depth quality using a set of extendable 3D printed fixtures that approximate real-world conditions. By framing depth quality as point cloud density and root mean square error (RMSE) from a known geometry, we present a method that is extendable by other system integrators for custom environments. We show a comparison of 3 cameras and present a case study for camera selection, provide reference meshes and analysis code, and discuss further extensions.
Submitted 21 March, 2019;
originally announced March 2019.
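The two depth-quality measures named in the abstract above, point cloud density and RMSE from a known geometry, can be sketched for the simplest reference shape, a flat plane. This is an illustration only, not the authors' released analysis code: the plane parameterisation, the per-area definition of density, and both function names are assumptions.

```python
import math

def plane_rmse(points, normal, offset):
    """RMSE of point-to-plane distances for the plane normal . p = offset.

    `points` is a list of (x, y, z) tuples in metres (assumed units).
    """
    norm = math.sqrt(sum(c * c for c in normal))
    squared = [
        ((sum(n * p for n, p in zip(normal, pt)) - offset) / norm) ** 2
        for pt in points
    ]
    return math.sqrt(sum(squared) / len(squared))

def point_density(points, area_m2):
    """Number of returned depth points per square metre of fixture surface."""
    return len(points) / area_m2

# Four depth samples of a flat fixture nominally 0.5 m from the camera.
cloud = [(0.0, 0.0, 0.51), (0.1, 0.0, 0.49), (0.0, 0.1, 0.51), (0.1, 0.1, 0.49)]
rmse = plane_rmse(cloud, normal=(0.0, 0.0, 1.0), offset=0.5)
density = point_density(cloud, area_m2=0.01)
print(rmse, density)
```

With every sample 1 cm off the plane, the RMSE comes out at about 0.01 m, and four points on a 0.1 m by 0.1 m patch give a density of about 400 points per square metre. A curved or stepped fixture would need a distance function for its own reference mesh.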
-
Spatial Knowledge Distillation to aid Visual Reasoning
Authors:
Somak Aditya,
Rudra Saha,
Yezhou Yang,
Chitta Baral
Abstract:
For tasks involving language and vision, the current state-of-the-art methods tend not to leverage any additional information that might be present to gather relevant (commonsense) knowledge. A representative task is Visual Question Answering, where large diagnostic datasets have been proposed to test a system's capability of answering questions about images. The training data is often accompanied by annotations of individual object properties and spatial locations. In this work, we take a step towards integrating this additional privileged information in the form of spatial knowledge to aid in visual reasoning. We propose a framework that combines recent advances in knowledge distillation (teacher-student framework), relational reasoning and probabilistic logical languages to incorporate such knowledge in existing neural networks for the task of Visual Question Answering. Specifically, for a question posed against an image, we use a probabilistic logical language to encode the spatial knowledge and the spatial understanding about the question in the form of a mask that is directly provided to the teacher network. The student network learns from the ground-truth information as well as the teacher's prediction via distillation. We also demonstrate the impact of predicting such a mask inside the teacher's network using attention. Empirically, we show that both methods improve the test accuracy over a state-of-the-art approach on a publicly available dataset.
Submitted 11 December, 2018; v1 submitted 10 December, 2018;
originally announced December 2018.
-
Characterizing the Impact of SNR Heterogeneity on Time-of-Arrival based Localization Outage Probability
Authors:
Sundar Aditya,
Harpreet S. Dhillon,
Andreas F. Molisch,
R. Michael Buehrer,
Hatim Behairy
Abstract:
In localization, an outage occurs if the positioning error exceeds a pre-defined threshold, $ε_{\rm th}$. For time-of-arrival based localization, a key factor affecting the positioning error is the relative positions of the anchors, with respect to the target location. Specifically, the positioning error is a function of (a) the distance-dependent signal-to-noise ratios (SNRs) of the anchor-target links, and (b) the pairwise angles subtended by the anchors at the target location. From a design perspective, characterizing the distribution of the positioning error over an ensemble of target and anchor locations is essential for providing probabilistic performance guarantees against outage. To solve this difficult problem, previous works have assumed all links to have the same SNR (i.e., SNR homogeneity), which neglects the impact of distance variation among the anchors on the positioning error. In this paper, we model SNR heterogeneity among anchors using a distance-dependent pathloss model and derive an accurate approximation for the error complementary cumulative distribution function (ccdf). By highlighting the accuracy of our results, relative to previous ones that ignore SNR heterogeneity, we concretely demonstrate that SNR heterogeneity has a considerable impact on the error distribution.
Submitted 22 April, 2018;
originally announced April 2018.
-
Robust Fruit Counting: Combining Deep Learning, Tracking, and Structure from Motion
Authors:
Xu Liu,
Steven W. Chen,
Shreyas Aditya,
Nivedha Sivakumar,
Sandeep Dcunha,
Chao Qu,
Camillo J. Taylor,
Jnaneshwar Das,
Vijay Kumar
Abstract:
We present a novel fruit counting pipeline that combines deep segmentation, frame-to-frame tracking, and 3D localization to accurately count visible fruits across a sequence of images. Our pipeline works on image streams from a monocular camera, both in natural light and with controlled illumination at night. We first train a Fully Convolutional Network (FCN) and segment video frame images into fruit and non-fruit pixels. We then track fruits across frames using the Hungarian Algorithm, where the objective cost is determined from a Kalman Filter corrected Kanade-Lucas-Tomasi (KLT) Tracker. In order to correct the estimated count from the tracking process, we combine tracking results with a Structure from Motion (SfM) algorithm to calculate relative 3D locations and size estimates to reject outliers and double-counted fruit tracks. We evaluate our algorithm by comparing with ground-truth human-annotated visual counts. Our results demonstrate that our pipeline is able to accurately and reliably count fruits across image sequences, and the correction step can significantly improve the counting accuracy and robustness. Although discussed in the context of fruit counting, our work can extend to detection, tracking, and counting of a variety of other stationary features of interest such as leaf-spots, wilt, and blossom.
Submitted 2 August, 2018; v1 submitted 1 April, 2018;
originally announced April 2018.
-
Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering
Authors:
Somak Aditya,
Yezhou Yang,
Chitta Baral
Abstract:
Many vision and language tasks require commonsense reasoning beyond data-driven image and natural language processing. Here we adopt Visual Question Answering (VQA) as an example task, where a system is expected to answer a question in natural language about an image. Current state-of-the-art systems attempt to solve the task using deep neural architectures and achieve promising performance. However, the resulting systems are generally opaque and struggle to understand questions for which extra knowledge is required. In this paper, we present an explicit reasoning layer on top of a set of penultimate neural network based systems. The reasoning layer enables reasoning and answering questions where additional knowledge is required, and at the same time provides an interpretable interface to the end users. Specifically, the reasoning layer adopts a Probabilistic Soft Logic (PSL) based engine to reason over a basket of inputs: visual relations, the semantic parse of the question, and background ontological knowledge from word2vec and ConceptNet. Experimental analysis of the answers and the key evidential predicates generated on the VQA dataset validates our approach.
Submitted 23 March, 2018;
originally announced March 2018.
-
A Tractable Analysis of the Blind-spot Probability in Localization Networks under Correlated Blocking
Authors:
Sundar Aditya,
Harpreet S. Dhillon,
Andreas F. Molisch,
Hatim Behairy
Abstract:
In localization applications, the line-of-sight between anchors and targets may be blocked by obstacles in the environment. A target that is invisible (i.e., without line-of-sight) to a sufficient number of anchors cannot be unambiguously localized and is, therefore, said to be in a blind spot. In this paper, we analyze the blind spot probability of a typical target by using stochastic geometry to model the randomness in the obstacle and anchor locations. In doing so, we handle correlated anchor blocking induced by the obstacles, unlike previous works that assume independent anchor blocking. We first characterize the regime over which the independent blocking assumption underestimates the blind spot probability of the typical target, which, in turn, is characterized as a function of the distribution of the visible area surrounding the target location. Since this distribution is difficult to characterize exactly, we formulate the nearest two-obstacle approximation, which is equivalent to considering correlated blocking for only the nearest two obstacles from the target and assuming independent blocking for the remaining obstacles. Based on this, we derive an approximate expression for the blind spot probability, which helps determine the anchor deployment intensity needed for the blind spot probability of a typical target to be at most a threshold, $μ$.
Submitted 8 July, 2018; v1 submitted 25 January, 2018;
originally announced January 2018.
-
Localization of Multiple Targets with Identical Radar Signatures in Multipath Environments with Correlated Blocking
Authors:
Sundar Aditya,
Andreas F. Molisch,
Naif Rabeah,
Hatim Behairy
Abstract:
This paper addresses the problem of localizing an unknown number of targets, all having the same radar signature, by a distributed MIMO radar consisting of single-antenna transmitters and receivers that cannot determine directions of departure and arrival. Furthermore, we consider the presence of multipath propagation, and the possible (correlated) blocking of the direct paths (going from the transmitter and reflecting off a target to the receiver). In its most general form, this problem can be cast as a Bayesian estimation problem where every multipath component is accounted for. However, when the environment map is unknown, this problem is ill-posed; hence, a tractable approximation is derived where only direct paths are accounted for. In particular, we take into account the correlated blocking by scatterers in the environment, which appears as a prior term in the Bayesian estimation framework. A sub-optimal polynomial-time algorithm to solve the Bayesian multi-target localization problem with correlated blocking is proposed and its performance is evaluated using simulations. We found that when correlated blocking is severe, assuming the blocking events to be independent and having constant probability (as was done in previous papers) resulted in poor detection performance, with false alarms more likely to occur than detections.
Submitted 3 November, 2017;
originally announced November 2017.
-
Asymptotic Blind-spot Analysis of Localization Networks under Correlated Blocking using a Poisson Line Process
Authors:
Sundar Aditya,
Harpreet S. Dhillon,
Andreas F. Molisch,
Hatim Behairy
Abstract:
In a localization network, the line-of-sight between anchors (transceivers) and targets may be blocked due to the presence of obstacles in the environment. Due to the non-zero size of the obstacles, the blocking is typically correlated across both anchor and target locations, with the extent of correlation increasing with obstacle size. If a target does not have line-of-sight to a minimum number of anchors, then its position cannot be estimated unambiguously and is, therefore, said to be in a blind-spot. However, the analysis of the blind-spot probability of a given target is challenging due to the inherent randomness in the obstacle locations and sizes. In this letter, we develop a new framework to analyze the worst-case impact of correlated blocking on the blind-spot probability of a typical target; in particular, we model the obstacles by a Poisson line process and the anchor locations by a Poisson point process. For this setup, we define the notion of the asymptotic blind-spot probability of the typical target and derive a closed-form expression for it as a function of the area distribution of a typical Poisson-Voronoi cell. As an upper bound for the more realistic case when obstacles have finite dimensions, the asymptotic blind-spot probability is useful as a design tool to ensure that the blind-spot probability of a typical target does not exceed a desired threshold, $ε$.
Submitted 12 July, 2017;
originally announced July 2017.
-
Answering Image Riddles using Vision and Reasoning through Probabilistic Soft Logic
Authors:
Somak Aditya,
Yezhou Yang,
Chitta Baral,
Yiannis Aloimonos
Abstract:
In this work, we explore a genre of puzzles ("image riddles") which involves a set of images and a question. Answering these puzzles requires both visual detection capabilities (including object and activity recognition) and knowledge-based or commonsense reasoning. We compile a dataset of over 3k riddles where each riddle consists of 4 images and a groundtruth answer. The annotations are validated using crowd-sourced evaluation. We also define an automatic evaluation metric to track future progress. Our task bears similarity to commonly known IQ tasks, such as analogy solving and sequence filling, that are often used to test intelligence.
We develop a probabilistic reasoning-based approach that utilizes probabilistic commonsense knowledge to answer these riddles with reasonable accuracy. We demonstrate the results of our approach using both automatic and human evaluations. Our approach achieves some promising results for these riddles and provides a strong baseline for future attempts. We make the entire dataset and related materials publicly available to the community on the ImageRiddle website (http://bit.ly/22f9Ala).
Submitted 17 November, 2016;
originally announced November 2016.
-
From Images to Sentences through Scene Description Graphs using Commonsense Reasoning and Knowledge
Authors:
Somak Aditya,
Yezhou Yang,
Chitta Baral,
Cornelia Fermuller,
Yiannis Aloimonos
Abstract:
In this paper we propose the construction of linguistic descriptions of images. This is achieved through the extraction of scene description graphs (SDGs) from visual scenes using an automatically constructed knowledge base. SDGs are constructed using both vision and reasoning. Specifically, commonsense reasoning is applied on (a) detections obtained from existing perception methods on given images, (b) a "commonsense" knowledge base constructed using natural language processing of image annotations and (c) lexical ontological knowledge from resources such as WordNet. Amazon Mechanical Turk (AMT)-based evaluations on the Flickr8k, Flickr30k and MS-COCO datasets show that in most cases, sentences auto-constructed from SDGs obtained by our method give a more relevant and thorough description of an image than a recent state-of-the-art image caption based approach. Our Image-Sentence Alignment Evaluation results are also comparable to those of recent state-of-the-art approaches.
Submitted 10 November, 2015;
originally announced November 2015.
-
GRAPLEr: A Distributed Collaborative Environment for Lake Ecosystem Modeling that Integrates Overlay Networks, High-throughput Computing, and Web Services
Authors:
Kensworth Subratie,
Saumitra Aditya,
Renato Figueiredo,
Cayelan C. Carey,
Paul Hanson
Abstract:
The GLEON Research And PRAGMA Lake Expedition -- GRAPLE -- is a collaborative effort between computer science and lake ecology researchers. It aims to improve our understanding and predictive capacity of the threats to the water quality of our freshwater resources, including climate change. This paper presents GRAPLEr, a distributed computing system used to address the modeling needs of GRAPLE researchers. GRAPLEr integrates and applies overlay virtual network, high-throughput computing, and Web service technologies in a novel way. First, its user-level IP-over-P2P (IPOP) overlay network allows compute and storage resources distributed across independently-administered institutions (including private and public clouds) to be aggregated into a common virtual network, despite the presence of firewalls and network address translators. Second, resources aggregated by the IPOP virtual network run unmodified high-throughput computing middleware (HTCondor) to enable large numbers of model simulations to be executed concurrently across the distributed computing resources. Third, a Web service interface allows end users to submit job requests to the system using client libraries that integrate with the R statistical computing environment. The paper presents the GRAPLEr architecture, describes its implementation and reports on its performance for batches of General Lake Model (GLM) simulations across three cloud infrastructures (University of Florida, CloudLab, and Microsoft Azure).
Submitted 29 September, 2015;
originally announced September 2015.
-
Algorithmic Perspectives of Network Transitive Reduction Problems and their Applications to Synthesis and Analysis of Biological Networks
Authors:
Satabdi Aditya,
Bhaskar DasGupta,
Marek Karpinski
Abstract:
In this survey paper, we will present a number of core algorithmic questions concerning several transitive reduction problems on networks that have applications in network synthesis and analysis involving cellular processes. Our starting point will be the so-called minimum equivalent digraph problem, a classic computational problem in combinatorial algorithms. We will subsequently consider a few non-trivial extensions or generalizations of this problem motivated by applications in systems biology. We will then discuss the applications of these algorithmic methodologies in the context of three major biological research questions: synthesizing and simplifying signal transduction networks, analyzing disease networks, and measuring redundancy of biological networks.
Submitted 27 December, 2013;
originally announced December 2013.
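Illustrative sketch (not taken from the survey): the minimum equivalent digraph problem named in the abstract above asks for a smallest set of edges that preserves reachability. The Python below covers only the special case of a directed acyclic graph, where an edge is redundant exactly when its endpoint can still be reached without it. The function name `transitive_reduction_dag` and the edge-list input format are assumptions made for this example.

```python
def transitive_reduction_dag(edges):
    """Return the edges of the transitive reduction of a DAG.

    `edges` is an iterable of (u, v) pairs. The input is assumed to be
    acyclic; this sketch does not handle cycles correctly.
    """
    edge_set = set(edges)
    succ = {}
    for u, v in edge_set:
        succ.setdefault(u, set()).add(v)

    def reachable(src, skip_edge):
        # Nodes reachable from src by a path that never uses skip_edge.
        seen, stack = set(), [src]
        while stack:
            node = stack.pop()
            for nxt in succ.get(node, ()):
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                seen.add(nxt)
                stack.append(nxt)
        return seen

    kept = []
    for u, v in sorted(edge_set):
        # Keep (u, v) only if v cannot be reached from u some other way.
        if v not in reachable(u, (u, v)):
            kept.append((u, v))
    return kept


# The shortcut a -> c is implied by a -> b -> c, so it is dropped.
print(transitive_reduction_dag([("a", "b"), ("b", "c"), ("a", "c")]))
```

Each edge is tested against the original graph rather than a progressively pruned one, which is valid only because the graph is assumed acyclic.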
-
A Channel Coding Perspective of Collaborative Filtering
Authors:
S. T. Aditya,
Onkar Dabeer,
Bikash Kumar Dey
Abstract:
We consider the problem of collaborative filtering from a channel coding perspective. We model the underlying rating matrix as a finite alphabet matrix with block constant structure. The observations are obtained from this underlying matrix through a discrete memoryless channel with a noisy part representing noisy user behavior and an erasure part representing missing data. Moreover, the clusters over which the underlying matrix is constant are unknown. We establish a sharp threshold result for this model: if the largest cluster size is smaller than $C_1 \log(mn)$ (where the rating matrix is of size $m \times n$), then the underlying matrix cannot be recovered with any estimator, but if the smallest cluster size is larger than $C_2 \log(mn)$, then we show a polynomial time estimator with diminishing probability of error. In the case of uniform cluster size, not only the order of the threshold, but also the constant is identified.
Submitted 18 August, 2009;
originally announced August 2009.
-
A Channel Coding Perspective of Recommendation Systems
Authors:
S. T. Aditya,
Onkar Dabeer,
Bikash Kumar Dey
Abstract:
Motivated by recommendation systems, we consider the problem of estimating block constant binary matrices (of size $m \times n$) from sparse and noisy observations. The observations are obtained from the underlying block constant matrix after unknown row and column permutations, erasures, and errors. We derive upper and lower bounds on the achievable probability of error. For fixed erasure and error probability, we show that there exists a constant $C_1$ such that if the cluster sizes are less than $C_1 \ln(mn)$, then for any algorithm the probability of error approaches one as $m, n \to \infty$. On the other hand, we show that a simple polynomial time algorithm gives probability of error diminishing to zero provided the cluster sizes are greater than $C_2 \ln(mn)$ for a suitable constant $C_2$.
Submitted 13 January, 2009;
originally announced January 2009.
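Illustrative sketch (not the authors' code): the observation model described in the abstract above, a block constant binary matrix seen through erasures and errors, can be simulated in a few lines of Python. The function names, the use of `None` to mark an erased entry, and the parameters `erasure_prob` and `error_prob` are assumptions made for this example. The unknown row and column permutations mentioned in the abstract are omitted for brevity.

```python
import random


def block_constant_matrix(row_sizes, col_sizes, rng):
    """Binary matrix that is constant on each (row cluster, column cluster) block."""
    block_values = [[rng.randint(0, 1) for _ in col_sizes] for _ in row_sizes]
    matrix = []
    for i, n_rows in enumerate(row_sizes):
        row = []
        for j, n_cols in enumerate(col_sizes):
            row.extend([block_values[i][j]] * n_cols)
        # Every row in the same row cluster is an identical copy.
        matrix.extend(list(row) for _ in range(n_rows))
    return matrix


def observe(matrix, erasure_prob, error_prob, rng):
    """Erase each entry independently (None); flip surviving entries with error_prob."""
    observed = []
    for row in matrix:
        out = []
        for x in row:
            if rng.random() < erasure_prob:
                out.append(None)      # missing rating
            elif rng.random() < error_prob:
                out.append(1 - x)     # noisy rating
            else:
                out.append(x)
        observed.append(out)
    return observed


rng = random.Random(0)
truth = block_constant_matrix([2, 3], [2, 2], rng)
noisy = observe(truth, erasure_prob=0.5, error_prob=0.1, rng=rng)
```

Varying the cluster sizes relative to $\ln(mn)$ in such a simulation is one way to explore the threshold behaviour the abstract describes, although the sketch itself does not implement any estimator.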