-
Robust Reward Modeling via Causal Rubrics
Authors:
Pragya Srivastava,
Harman Singh,
Rahul Madhavan,
Gandharv Patil,
Sravanti Addepalli,
Arun Suggala,
Rengarajan Aravamudhan,
Soumya Sharma,
Anirban Laha,
Aravindan Raghuveer,
Karthikeyan Shanmugam,
Doina Precup
Abstract:
Reward models (RMs) are fundamental to aligning Large Language Models (LLMs) via human feedback, yet they often suffer from reward hacking. They tend to latch on to superficial or spurious attributes, such as response length or formatting, mistaking these cues learned from correlations in training data for the true causal drivers of quality (e.g., factuality, relevance). This occurs because standard training objectives struggle to disentangle these factors, leading to brittle RMs and misaligned policies. We introduce Crome (Causally Robust Reward Modeling), a novel framework grounded in an explicit causal model designed to mitigate reward hacking. Crome employs the following synthetic targeted augmentations during training: (1) Causal Augmentations, which are pairs that differ along specific causal attributes, to enforce sensitivity along each causal attribute individually, and (2) Neutral Augmentations, which are tie-label pairs varying primarily in spurious attributes, to enforce invariance along spurious attributes. Notably, our augmentations are produced without any knowledge of spurious factors, via answer interventions only along causal rubrics, which are identified by querying an oracle LLM. Empirically, Crome significantly outperforms standard baselines on RewardBench, improving average accuracy by up to 5.4% and achieving gains of up to 13.2% and 7.2% in specific categories. The robustness of Crome is further attested by the consistent gains obtained in a Best-of-N inference setting across increasing N, across various benchmarks, including the popular RewardBench (covering chat, chat-hard, safety, and reasoning tasks), the safety-focused WildGuardTest, and the reasoning-specific GSM8k.
Submitted 19 June, 2025;
originally announced June 2025.
-
A Word is Worth 4-bit: Efficient Log Parsing with Binary Coded Decimal Recognition
Authors:
Prerak Srivastava,
Giulio Corallo,
Sergey Rybalko
Abstract:
System-generated logs are typically converted into categorical log templates through parsing. These templates are crucial for generating actionable insights in various downstream tasks. However, existing parsers often fail to capture fine-grained template details, leading to suboptimal accuracy and reduced utility in downstream tasks requiring precise pattern identification. We propose a character-level log parser utilizing a novel neural architecture that aggregates character embeddings. Our approach estimates a sequence of binary-coded decimals to achieve highly granular log templates extraction. Our low-resource character-level parser, tested on revised Loghub-2k and a manually annotated industrial dataset, matches LLM-based parsers in accuracy while outperforming semantic parsers in efficiency.
Submitted 1 June, 2025;
originally announced June 2025.
-
Nine Ways to Break Copyright Law and Why Our LLM Won't: A Fair Use Aligned Generation Framework
Authors:
Aakash Sen Sharma,
Debdeep Sanyal,
Priyansh Srivastava,
Sundar Atreya H.,
Shirish Karande,
Mohan Kankanhalli,
Murari Mandal
Abstract:
Large language models (LLMs) commonly risk copyright infringement by reproducing protected content verbatim or with insufficient transformative modifications, posing significant ethical, legal, and practical concerns. Current inference-time safeguards predominantly rely on restrictive refusal-based filters, often compromising the practical utility of these models. To address this, we collaborated closely with intellectual property experts to develop FUA-LLM (Fair Use Aligned Language Models), a legally-grounded framework explicitly designed to align LLM outputs with fair-use doctrine. Central to our method is FairUseDB, a carefully constructed dataset containing 18,000 expert-validated examples covering nine realistic infringement scenarios. Leveraging this dataset, we apply Direct Preference Optimization (DPO) to fine-tune open-source LLMs, encouraging them to produce legally compliant and practically useful alternatives rather than resorting to blunt refusal. Recognizing the shortcomings of traditional evaluation metrics, we propose new measures: Weighted Penalty Utility and Compliance Aware Harmonic Mean (CAH) to balance infringement risk against response utility. Extensive quantitative experiments coupled with expert evaluations confirm that FUA-LLM substantially reduces problematic outputs (up to 20%) compared to state-of-the-art approaches, while preserving real-world usability.
Submitted 25 May, 2025;
originally announced May 2025.
-
Sense and Sensitivity: Examining the Influence of Semantic Recall on Long Context Code Reasoning
Authors:
Adam Štorek,
Mukur Gupta,
Samira Hajizadeh,
Prashast Srivastava,
Suman Jana
Abstract:
Although modern Large Language Models (LLMs) support extremely large contexts, their effectiveness in utilizing long context for code reasoning remains unclear. This paper investigates LLM reasoning ability over code snippets within large repositories and how it relates to their recall ability. Specifically, we differentiate between lexical code recall (verbatim retrieval) and semantic code recall (remembering what the code does). To measure semantic recall, we propose SemTrace, a code reasoning technique where the impact of specific statements on output is attributable and unpredictable. We also present a method to quantify semantic recall sensitivity in existing benchmarks. Our evaluation of state-of-the-art LLMs reveals a significant drop in code reasoning accuracy as a code snippet approaches the middle of the input context, particularly with techniques requiring high semantic recall like SemTrace. Moreover, we find that lexical recall varies by granularity, with models excelling at function retrieval but struggling with line-by-line recall. Notably, a disconnect exists between lexical and semantic recall, suggesting different underlying mechanisms. Finally, our findings indicate that current code reasoning benchmarks may exhibit low semantic recall sensitivity, potentially underestimating LLM challenges in leveraging in-context information.
Submitted 20 May, 2025; v1 submitted 19 May, 2025;
originally announced May 2025.
-
The Rate-Immediacy Barrier in Explicit Tree Code Constructions
Authors:
Gil Cohen,
Leonard J. Schulman,
Piyush Srivastava
Abstract:
Since the introduction of tree codes by Schulman (STOC 1993), explicit construction of such codes has remained a notorious challenge. While the construction of asymptotically-good explicit tree codes continues to be elusive, a work by Cohen, Haeupler and Schulman (STOC 2018), as well as the state-of-the-art construction by Ben Yaacov, Cohen, and Yankovitz (STOC 2022) have achieved codes with rate $Ω(1/\log\log n)$, exponentially improving upon the original construction of Evans, Klugerman and Schulman from 1994. All of these constructions rely, at least in part, on increasingly sophisticated methods of combining (block) error-correcting codes.
In this work, we identify a fundamental barrier to constructing tree codes using current techniques. We introduce a key property, which we call immediacy, that, while not required by the original definition of tree codes, is shared by all known constructions and inherently arises from recursive combinations of error-correcting codes. Our main technical contribution is the proof of a rate-immediacy tradeoff, which, in particular, implies that any tree code with constant distance and non-trivial immediacy must necessarily have vanishing rate. By applying our rate-immediacy tradeoff to existing constructions, we establish that their known rate analyses are essentially optimal. More broadly, our work highlights the need for fundamentally new ideas--beyond the recursive use of error-correcting codes--to achieve substantial progress in explicitly constructing asymptotically-good tree codes.
Submitted 12 April, 2025;
originally announced April 2025.
-
XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants
Authors:
Adam Štorek,
Mukur Gupta,
Noopur Bhatt,
Aditya Gupta,
Janie Kim,
Prashast Srivastava,
Suman Jana
Abstract:
AI coding assistants are widely used for tasks like code generation. These tools now require large and complex contexts, automatically sourced from various origins (across files, projects, and contributors), forming part of the prompt fed to underlying LLMs. This automatic context-gathering introduces new vulnerabilities, allowing attackers to subtly poison input to compromise the assistant's outputs, potentially generating vulnerable code or introducing critical errors. We propose a novel attack, Cross-Origin Context Poisoning (XOXO), that is challenging to detect as it relies on adversarial code modifications that are semantically equivalent. Traditional program analysis techniques struggle to identify these perturbations since the semantics of the code remains correct, making it appear legitimate. This allows attackers to manipulate coding assistants into producing incorrect outputs, while shifting the blame to the victim developer. We introduce a novel, task-agnostic, black-box attack algorithm GCGS that systematically searches the transformation space using a Cayley Graph, achieving a 75.72% attack success rate on average across five tasks and eleven models, including GPT 4.1 and Claude 3.5 Sonnet v2 used by popular AI coding assistants. Furthermore, defenses like adversarial fine-tuning are ineffective against our attack, underscoring the need for new security measures in LLM-powered coding tools.
Submitted 20 May, 2025; v1 submitted 18 March, 2025;
originally announced March 2025.
-
Approximating the Total Variation Distance between Gaussians
Authors:
Arnab Bhattacharyya,
Weiming Feng,
Piyush Srivastava
Abstract:
The total variation distance is a metric of central importance in statistics and probability theory. However, somewhat surprisingly, questions about computing it algorithmically appear not to have been systematically studied until very recently. In this paper, we contribute to this line of work by studying this question in the important special case of multivariate Gaussians. More formally, we consider the problem of approximating the total variation distance between two multivariate Gaussians to within an $ε$-relative error. Previous works achieved a fixed constant relative error approximation via closed-form formulas. In this work, we give algorithms that, given any two $n$-dimensional Gaussians $D_1, D_2$ and any error bound $ε > 0$, approximate the total variation distance $D := d_{TV}(D_1, D_2)$ to $ε$-relative accuracy in $\text{poly}(n, \frac{1}{ε}, \log \frac{1}{D})$ operations. The main technical tool in our work is a reduction that helps us extend the recent progress on computing the TV-distance between discrete random variables to our continuous setting.
Submitted 14 March, 2025;
originally announced March 2025.
-
Deterministically approximating the volume of a Kostka polytope
Authors:
Hariharan Narayanan,
Piyush Srivastava
Abstract:
Polynomial-time deterministic approximation of volumes of polytopes, up to an approximation factor that grows at most sub-exponentially with the dimension, remains an open problem. Recent work on this question has focused on identifying interesting classes of polytopes for which such approximation algorithms can be obtained. In this paper, we focus on one such class of polytopes: the Kostka polytopes. The volumes of Kostka polytopes appear naturally in questions of random matrix theory, in the context of evaluating the probability density that a random Hermitian matrix with fixed spectrum $λ$ has a given diagonal $μ$ (the so-called randomized Schur-Horn problem): the corresponding Kostka polytope is denoted $\mathrm{GT}(λ, μ)$. We give a polynomial-time deterministic algorithm for approximating the volume of a ($Ω(n^2)$ dimensional) Kostka polytope $\mathrm{GT}(λ, μ)$ to within a multiplicative factor of $\exp(O(n\log n))$, when $λ$ is an integral partition with $n$ parts, with entries bounded above by a polynomial in $n$, and $μ$ is an integer vector lying in the interior of the permutohedron (i.e., convex hull of all permutations) of $λ$. The algorithm thus gives asymptotically correct estimates of the log-volume of Kostka polytopes corresponding to such $(λ, μ)$. Our approach is based on a partition function interpretation of a continuous analogue of Schur polynomials.
Submitted 5 April, 2025; v1 submitted 9 March, 2025;
originally announced March 2025.
-
Autotelic Reinforcement Learning: Exploring Intrinsic Motivations for Skill Acquisition in Open-Ended Environments
Authors:
Prakhar Srivastava,
Jasmeet Singh
Abstract:
This paper presents a comprehensive overview of autotelic Reinforcement Learning (RL), emphasizing the role of intrinsic motivations in the open-ended formation of skill repertoires. We delineate the distinctions between knowledge-based and competence-based intrinsic motivations, illustrating how these concepts inform the development of autonomous agents capable of generating and pursuing self-defined goals. The typology of Intrinsically Motivated Goal Exploration Processes (IMGEPs) is explored, with a focus on the implications for multi-goal RL and developmental robotics. The autotelic learning problem is framed within a reward-free Markov Decision Process (MDP), where agents must autonomously represent, generate, and master their own goals. We address the unique challenges in evaluating such agents, proposing various metrics for measuring exploration, generalization, and robustness in complex environments. This work aims to advance the understanding of autotelic RL agents and their potential for enhancing skill acquisition in a diverse and dynamic setting.
Submitted 6 February, 2025;
originally announced February 2025.
-
Complexity of Minimal Faithful Permutation Degree for Fitting-free Groups
Authors:
Michael Levet,
Pranjal Srivastava,
Dhara Thakkar
Abstract:
In this paper, we investigate the complexity of computing the minimal faithful permutation degree for groups without abelian normal subgroups. When our groups are given as quotients of permutation groups, we establish that this problem is in $\textsf{P}$. Furthermore, in the setting of permutation groups, we obtain an upper bound of $\textsf{NC}$ for this problem. This improves upon the work of Das and Thakkar (STOC 2024), who established a Las Vegas polynomial-time algorithm for this class in the setting of permutation groups.
Submitted 28 April, 2025; v1 submitted 27 January, 2025;
originally announced January 2025.
-
Region of Interest based Medical Image Compression
Authors:
Utkarsh Prakash Srivastava,
Toshiaki Fujii
Abstract:
The vast volume of medical image data necessitates efficient compression techniques to support remote healthcare services. This paper explores Region of Interest (ROI) coding to address the balance between compression rate and image quality. By leveraging UNET segmentation on the Brats 2020 dataset, we accurately identify tumor regions, which are critical for diagnosis. These regions are then subjected to High Efficiency Video Coding (HEVC) for compression, enhancing compression rates while preserving essential diagnostic information. This approach ensures that critical image regions maintain their quality, while non-essential areas are compressed more. Our method optimizes storage space and transmission bandwidth, meeting the demands of telemedicine and large-scale medical imaging. Through this technique, we provide a robust solution that maintains the integrity of vital data and improves the efficiency of medical image handling.
Submitted 6 January, 2025;
originally announced January 2025.
-
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
Authors:
Peihao Wang,
Ruisi Cai,
Yuehao Wang,
Jiajun Zhu,
Pragya Srivastava,
Zhangyang Wang,
Pan Li
Abstract:
Structured State Space Models (SSMs) have emerged as alternatives to transformers. While SSMs are often regarded as effective in capturing long-sequence dependencies, we rigorously demonstrate that they are inherently limited by strong recency bias. Our empirical studies also reveal that this bias impairs the models' ability to recall distant information and introduces robustness issues. Our scaling experiments then discovered that deeper structures in SSMs can facilitate the learning of long contexts. However, subsequent theoretical analysis reveals that as SSMs increase in depth, they exhibit another inevitable tendency toward over-smoothing, e.g., token representations becoming increasingly indistinguishable. This fundamental dilemma between recency and over-smoothing hinders the scalability of existing SSMs. Inspired by our theoretical findings, we propose to polarize two channels of the state transition matrices in SSMs, setting them to zero and one, respectively, simultaneously addressing recency bias and over-smoothing. Experiments demonstrate that our polarization technique consistently enhances the associative recall accuracy of long-range tokens and unlocks SSMs to benefit further from deeper architectures. All source codes are released at https://github.com/VITA-Group/SSM-Bottleneck.
Submitted 10 March, 2025; v1 submitted 31 December, 2024;
originally announced January 2025.
-
Decoding Emotion: Speech Perception Patterns in Individuals with Self-reported Depression
Authors:
Guneesh Vats,
Priyanka Srivastava,
Chiranjeevi Yarra
Abstract:
The current study examines the relationship between self-reported depression and the perception of affective speech within the Indian population. PANAS and PHQ-9 were used to assess current mood and depression, respectively. Participants' emotional reactivity was recorded on a valence and arousal scale against the affective speech audio presented in a sequence. No significant differences between the depression and no-depression groups were observed for any of the emotional stimuli, except the audio file depicting neutral emotion. Significantly higher PANAS scores by the depression than the no-depression group indicate the impact of pre-disposed mood on the current mood status. Contrary to previous findings, this study did not observe reduced positive emotional reactivity by the depression group. However, the results demonstrated consistency in emotional reactivity for speech stimuli depicting sadness and anger across all measures of emotion perception.
Submitted 28 December, 2024;
originally announced December 2024.
-
Approximate counting of permutation patterns
Authors:
Omri Ben-Eliezer,
Slobodan Mitrović,
Pranjal Srivastava
Abstract:
We consider the problem of counting the copies of a length-$k$ pattern $σ$ in a sequence $f \colon [n] \to \mathbb{R}$, where a copy is a subset of indices $i_1 < \ldots < i_k \in [n]$ such that $f(i_j) < f(i_\ell)$ if and only if $σ(j) < σ(\ell)$. This problem is motivated by a range of connections and applications in ranking, nonparametric statistics, combinatorics, and fine-grained complexity, especially when $k$ is a small fixed constant.
Recent advances have significantly improved our understanding of counting and detecting patterns. Guillemot and Marx [2014] demonstrated that the detection variant is solvable in $O(n)$ time for any fixed $k$. Their proof has laid the foundations for the discovery of the twin-width, a concept that has notably advanced parameterized complexity in recent years. Counting, in contrast, is harder: it has a conditional lower bound of $n^{Ω(k / \log k)}$ [Berendsohn, Kozma, and Marx 2019] and is expected to be polynomially harder than detection as early as $k = 4$, given its equivalence to counting $4$-cycles in graphs [Dudek and Gawrychowski, 2020].
In this work, we design a deterministic near-linear time $(1+\varepsilon)$-approximation algorithm for counting $σ$-copies in $f$ for all $k \leq 5$. Combined with the conditional lower bound for $k=4$, this establishes the first known separation between approximate and exact algorithms for pattern counting. Interestingly, our algorithm leverages the Birgé decomposition -- a sublinear tool for monotone distributions widely used in distribution testing -- which, to our knowledge, has not been applied in a pattern counting context before.
Submitted 7 November, 2024;
originally announced November 2024.
-
Emulators for stellar profiles in binary population modeling
Authors:
Elizabeth Teng,
Ugur Demir,
Zoheyr Doctor,
Philipp M. Srivastava,
Shamal Lalvani,
Vicky Kalogera,
Aggelos Katsaggelos,
Jeff J. Andrews,
Simone S. Bavera,
Max M. Briel,
Seth Gossage,
Konstantinos Kovlakas,
Matthias U. Kruckow,
Kyle Akira Rocha,
Meng Sun,
Zepei Xing,
Emmanouil Zapartas
Abstract:
Knowledge about the internal physical structure of stars is crucial to understanding their evolution. The novel binary population synthesis code POSYDON includes a module for interpolating the stellar and binary properties of any system at the end of binary MESA evolution based on a pre-computed set of models. In this work, we present a new emulation method for predicting stellar profiles, i.e., the internal stellar structure along the radial axis, using machine learning techniques. We use principal component analysis for dimensionality reduction and fully-connected feed-forward neural networks for making predictions. We find accuracy to be comparable to that of nearest neighbor approximation, with a strong advantage in terms of memory and storage efficiency. By providing a versatile framework for modeling stellar internal structure, the emulation method presented here will enable faster simulations of higher physical fidelity, offering a foundation for a wide range of large-scale population studies of stellar and binary evolution.
Submitted 11 February, 2025; v1 submitted 14 October, 2024;
originally announced October 2024.
-
Language-Agnostic Analysis of Speech Depression Detection
Authors:
Sona Binu,
Jismi Jose,
Fathima Shimna K V,
Alino Luke Hans,
Reni K. Cherian,
Starlet Ben Alex,
Priyanka Srivastava,
Chiranjeevi Yarra
Abstract:
People with Major Depressive Disorder (MDD) exhibit tonal variations in their speech compared to their healthy counterparts. However, these tonal variations depend not only on the state of MDD but also on the language, which has its own unique tonal patterns. This work analyzes automatic speech-based depression detection across two languages, English and Malayalam, which exhibit distinctive prosodic and phonemic characteristics. We propose an approach that utilizes speech data collected along with self-reported labels from participants reading sentences from the IViE corpus, in both English and Malayalam. The IViE corpus consists of five sets of sentences (simple sentences, WH-questions, questions without morphosyntactic markers, inversion questions, and coordinations) that can naturally prompt speakers to speak in different tonal patterns. Convolutional Neural Networks (CNNs) are employed for detecting depression from speech. The CNN model is trained to identify acoustic features associated with depression in speech, focusing on both languages. The model's performance is evaluated on the collected dataset containing recordings from both depressed and non-depressed speakers, analyzing its effectiveness in detecting depression across the two languages. Our findings and collected data could contribute to the development of language-agnostic speech-based depression detection systems, thereby enhancing accessibility for diverse populations.
Submitted 23 September, 2024;
originally announced September 2024.
-
Resilience of the Electric Grid through Trustable IoT-Coordinated Assets (Extended version)
Authors:
Vineet J. Nair,
Venkatesh Venkataramanan,
Priyank Srivastava,
Partha S. Sarker,
Anurag Srivastava,
Laurentiu D. Marinovici,
Jun Zha,
Christopher Irwin,
Prateek Mittal,
John Williams,
Jayant Kumar,
H. Vincent Poor,
Anuradha M. Annaswamy
Abstract:
The electricity grid has evolved from a physical system to a cyber-physical system with digital devices that perform measurement, control, communication, computation, and actuation. The increased penetration of distributed energy resources (DERs) including renewable generation, flexible loads, and storage provides extraordinary opportunities for improvements in efficiency and sustainability. However, they can introduce new vulnerabilities in the form of cyberattacks, which can cause significant challenges in ensuring grid resilience. We propose a framework in this paper for achieving grid resilience through suitably coordinated assets including a network of Internet of Things (IoT) devices. A local electricity market is proposed to identify trustable assets and carry out this coordination. Situational Awareness (SA) of locally available DERs with the ability to inject power or reduce consumption is enabled by the market, together with a monitoring procedure for their trustability and commitment. With this SA, we show that a variety of cyberattacks can be mitigated using local trustable resources without stressing the bulk grid. Multiple demonstrations are carried out using a high-fidelity co-simulation platform, real-time hardware-in-the-loop validation, and a utility-friendly simulator.
Submitted 30 January, 2025; v1 submitted 21 June, 2024;
originally announced June 2024.
-
FOX: Coverage-guided Fuzzing as Online Stochastic Control
Authors:
Dongdong She,
Adam Storek,
Yuchong Xie,
Seoyoung Kweon,
Prashast Srivastava,
Suman Jana
Abstract:
Fuzzing is an effective technique for discovering software vulnerabilities by generating random test inputs and executing them against the target program. However, fuzzing large and complex programs remains challenging due to difficulties in uncovering deeply hidden vulnerabilities. This paper addresses the limitations of existing coverage-guided fuzzers, focusing on the scheduler and mutator components. Existing schedulers suffer from information sparsity and the inability to handle fine-grained feedback metrics. The mutators are agnostic of target program branches, leading to wasted computation and slower coverage exploration. To overcome these issues, we propose an end-to-end online stochastic control formulation for coverage-guided fuzzing. Our approach incorporates a novel scheduler and custom mutator that can adapt to branch logic, maximizing aggregate edge coverage achieved over multiple stages. The scheduler utilizes fine-grained branch distance measures to identify frontier branches, where new coverage is likely to be achieved. The mutator leverages branch distance information to perform efficient and targeted seed mutations, leading to robust progress with minimal overhead. We present FOX, a proof-of-concept implementation of our control-theoretic approach, and compare it to industry-standard coverage-guided fuzzers. 6 CPU-years of extensive evaluations on the FuzzBench dataset and complex real-world programs (a total of 38 test programs) demonstrate that FOX outperforms existing state-of-the-art fuzzers, achieving average coverage improvements up to 26.45% in real-world standalone programs and 6.59% in FuzzBench programs over the state-of-the-art AFL++. In addition, it uncovers 20 unique bugs in popular real-world applications including eight that are previously unknown, showcasing real-world security impact.
Submitted 6 June, 2024;
originally announced June 2024.
-
A direct proof of a unified law of robustness for Bregman divergence losses
Authors:
Santanu Das,
Jatin Batra,
Piyush Srivastava
Abstract:
In contemporary deep learning practice, models are often trained to near zero loss i.e. to nearly interpolate the training data. However, the number of parameters in the model is usually far more than the number of data points n, the theoretical minimum needed for interpolation: a phenomenon referred to as overparameterization. In an interesting piece of work, Bubeck and Sellke considered a natural notion of interpolation: the model is said to interpolate when the model's training loss goes below the loss of the conditional expectation of the response given the covariate. For this notion of interpolation and for a broad class of covariate distributions (specifically those satisfying a natural notion of concentration of measure), they showed that overparameterization is necessary for robust interpolation i.e. if the interpolating function is required to be Lipschitz. Their main proof technique applies to regression with square loss against a scalar response, but they remark that via a connection to Rademacher complexity and using tools such as the Ledoux-Talagrand contraction inequality, their result can be extended to more general losses, at least in the case of scalar response variables. In this work, we recast the original proof technique of Bubeck and Sellke in terms of a bias-variance type decomposition, and show that this view directly unlocks a generalization to Bregman divergence losses (even for vector-valued responses), without the use of tools such as Rademacher complexity or the Ledoux-Talagrand contraction principle. Bregman divergences are a natural class of losses since for these, the best estimator is the conditional expectation of the response given the covariate, and include other practical losses such as the cross entropy loss. Our work thus gives a more general understanding of the main proof technique of Bubeck and Sellke and demonstrates its broad utility.
Submitted 21 April, 2025; v1 submitted 26 May, 2024;
originally announced May 2024.
-
A New Construction of Optimal Symmetrical ZCCS
Authors:
Rajen Kumar,
Prashant Kumar Srivastava,
Sudhan Majhi
Abstract:
We propose new constructions for a two-dimensional ($2$D) perfect array, complete complementary code (CCC), and multiple CCCs as an optimal symmetrical $Z$-complementary code set (ZCCS). We propose a method to generate a two-dimensional perfect array and CCC. By utilising mutually orthogonal sequences, we developed a method to extend the length of a CCC without affecting the set or code size. Additionally, this concept is extended to include the development of multiple CCCs, and the correlation characteristics of these multiple CCCs are identical with the characteristics of optimal symmetrical ZCCS.
Submitted 25 May, 2024;
originally announced May 2024.
-
Multiple Spectrally Null Constrained Complete Complementary Codes of Various Lengths Over Small Alphabet
Authors:
Rajen Kumar,
Palash Sarkar,
Prashant Kumar Srivastava,
Sudhan Majhi
Abstract:
Complete complementary codes (CCCs) are highly valuable in the fields of information security, radar and communication. The spectrally null constrained (SNC) problem arises in radar and modern communication systems due to the reservation or prohibition of specific spectrums from transmission. The literature on SNC-CCCs is somewhat limited in comparison to the literature on traditional CCCs. The main objective of this paper is to discover several configurations of SNC-CCCs that possess more flexibility in their parameters. The proposed construction utilises existing CCCs and mutually orthogonal sequences. The proposed construction can cover almost all lengths with the smallest alphabet $\{-1,0,1\}$. Further, the idea of SNC-CCC is extended to multiple SNC-CCCs with an inter-set zero cross-correlation zone (ZCCZ). Based on our construction, we can also control the correlation value outside the ZCCZ. The obtained codes have aperiodic and periodic inter-set ZCCZs and low cross-correlation side-lobes.
Submitted 11 October, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering
Authors:
Pragya Srivastava,
Manuj Malik,
Vivek Gupta,
Tanuja Ganu,
Dan Roth
Abstract:
Large Language Models (LLMs) excel in natural language understanding, but their capability for complex mathematical reasoning with an amalgamation of structured tables and unstructured text is uncertain. This study explores LLMs' mathematical reasoning on four financial tabular question-answering datasets: TATQA, FinQA, ConvFinQA, and Multihiertt. Through extensive experiments with various models and prompting techniques, we assess how LLMs adapt to complex tables and mathematical tasks. We focus on sensitivity to table complexity and performance variations with an increasing number of arithmetic reasoning steps. The results provide insights into LLMs' capabilities and limitations in handling complex mathematical scenarios for semi-structured tables. Ultimately, we introduce a novel prompting technique tailored to semi-structured documents, matching or outperforming other baselines in performance while providing a nuanced understanding of LLMs' abilities for such a task.
Submitted 29 February, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
NICE: To Optimize In-Context Examples or Not?
Authors:
Pragya Srivastava,
Satvik Golechha,
Amit Deshpande,
Amit Sharma
Abstract:
Recent work shows that in-context learning and optimization of in-context examples (ICE) can significantly improve the accuracy of large language models (LLMs) on a wide range of tasks, leading to an apparent consensus that ICE optimization is crucial for better performance. However, most of these studies assume a fixed or no instruction provided in the prompt. We challenge this consensus by investigating the necessity of optimizing ICE when task-specific instructions are provided and find that there are many tasks for which it yields diminishing returns. In particular, using a diverse set of tasks and a systematically created instruction set with gradually added details, we find that as the prompt instruction becomes more detailed, the returns on ICE optimization diminish. To characterize this behavior, we introduce a task-specific metric called Normalized Invariability to Choice of Examples (NICE) that quantifies the learnability of tasks from a given instruction, and provides a heuristic to help decide whether to optimize instructions or ICE for a new task. Given a task, the proposed metric can reliably predict the utility of optimizing ICE compared to using random ICE. Our code is available at https://github.com/microsoft/nice-icl.
Submitted 6 June, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
-
Rhizomes and Diffusions for Processing Highly Skewed Graphs on Fine-Grain Message-Driven Systems
Authors:
Bibrak Qamar Chandio,
Prateek Srivastava,
Maciej Brodowicz,
Martin Swany,
Thomas Sterling
Abstract:
The paper provides a unified co-design of 1) a programming and execution model that allows spawning tasks from within the vertex data at runtime, 2) language constructs for \textit{actions} that send work to where the data resides, combining parallel expressiveness of local control objects (LCOs) to implement asynchronous graph processing primitives, 3) and an innovative vertex-centric data-structure, using the concept of Rhizomes, that parallelizes both the out and in-degree load of vertex objects across many cores and yet provides a single programming abstraction to the vertex objects. The data structure hierarchically parallelizes the out-degree load of vertices and the in-degree load laterally. The rhizomes internally communicate and remain consistent, using event-driven synchronization mechanisms, to provide a unified and correct view of the vertex.
Simulated experimental results show performance gains for BFS, SSSP, and Page Rank on large chip sizes for the tested input graph datasets containing highly skewed degree distributions. The improvements come from the ability to express and create fine-grain dynamic computing tasks in the form of \textit{actions}, language constructs that aid the compiler in generating code that the runtime system uses to optimally schedule tasks, and the data structure that shares both in- and out-degree compute workload among memory-processing elements.
Submitted 7 May, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
On Learning Spatial Provenance in Privacy-Constrained Wireless Networks
Authors:
Manish Bansal,
Pramsu Srivastava,
J. Harshan
Abstract:
In Vehicle-to-Everything networks that involve multi-hop communication, the Road Side Units (RSUs) typically aim to collect location information from the participating vehicles to provide security and network diagnostics features. While the vehicles commonly use the Global Positioning System (GPS) for navigation, they may refrain from sharing their precise GPS coordinates with the RSUs due to privacy concerns. Therefore, to jointly address the high localization requirements by the RSUs as well as the vehicles' privacy, we present a novel spatial-provenance framework wherein each vehicle uses Bloom filters to embed their partial location information when forwarding the packets. In this framework, the RSUs and the vehicles agree upon fragmenting the coverage area into several smaller regions so that the vehicles can embed the identity of their regions through Bloom filters. Given the probabilistic nature of Bloom filters, we derive an analytical expression on the error-rates in provenance recovery and then pose an optimization problem to choose the underlying parameters. With the help of extensive simulation results, we show that our method offers near-optimal Bloom filter parameters in learning spatial provenance. Some interesting trade-offs between the communication-overhead, spatial privacy of the vehicles and the error rates in provenance recovery are also discussed.
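The region-embedding idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' scheme: the class name BloomFilter, the "hop:region" string encoding, the SHA-256-based hashing, and the parameter values are all assumptions made for illustration. The false-positive formula is the standard Bloom filter approximation, not the error-rate expression derived in the paper.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter; bit positions are derived from SHA-256 (illustrative choice)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored as a Python int

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # True for every added item; may also be True for items never added.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Standard approximation (1 - exp(-k*n/m))**k for m bits, k hashes, n items."""
    return (1 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


# Hypothetical three-hop path: each forwarding vehicle embeds its hop index and
# coarse region identifier instead of its precise GPS coordinates.
packet_filter = BloomFilter(num_bits=64, num_hashes=3)
for hop, region in enumerate(["R7", "R2", "R9"]):
    packet_filter.add(f"{hop}:{region}")

print("0:R7" in packet_filter)        # membership query a receiver could run
print(false_positive_rate(64, 3, 3))  # approximate chance of a spurious match
```

In this sketch a receiver recovers the path by querying every (hop, region) pair, so the filter size and number of hashes trade communication overhead against the chance of a spurious match.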
Submitted 5 February, 2024;
originally announced February 2024.
-
Precipitation Downscaling with Spatiotemporal Video Diffusion
Authors:
Prakhar Srivastava,
Ruihan Yang,
Gavin Kerrigan,
Gideon Dresdner,
Jeremy McGibbon,
Christopher Bretherton,
Stephan Mandt
Abstract:
In climate science and meteorology, high-resolution local precipitation (rain and snowfall) predictions are limited by the computational costs of simulation-based methods. Statistical downscaling, or super-resolution, is a common workaround where a low-resolution prediction is improved using statistical approaches. Unlike traditional computer vision tasks, weather and climate applications require capturing the accurate conditional distribution of high-resolution given low-resolution patterns to assure reliable ensemble averages and unbiased estimates of extreme events, such as heavy rain. This work extends recent video diffusion models to precipitation super-resolution, employing a deterministic downscaler followed by a temporally-conditioned diffusion model to capture noise characteristics and high-frequency patterns. We test our approach on FV3GFS output, an established large-scale global atmosphere model, and compare it against six state-of-the-art baselines. Our analysis, capturing CRPS, MSE, precipitation distributions, and qualitative aspects using California and the Himalayas as examples, establishes our method as a new standard for data-driven precipitation downscaling.
Submitted 20 June, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Authors:
Ruihang Lai,
Junru Shao,
Siyuan Feng,
Steven S. Lyubomirsky,
Bohan Hou,
Wuwei Lin,
Zihao Ye,
Hongyi Jin,
Yuchen Jin,
Jiawei Liu,
Lesheng Jin,
Yaxing Cai,
Ziheng Jiang,
Yong Wu,
Sunghyun Park,
Prakalp Srivastava,
Jared G. Roesch,
Todd C. Mowry,
Tianqi Chen
Abstract:
Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven the demand for their universal deployment across a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and external library calls in a single representation. Relax also introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program, enabling dynamic shape-aware cross-level optimizations. We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on LLMs show that Relax delivers performance competitive with state-of-the-art systems across various GPUs and enables deployment of emerging models to a broader set of emerging environments, including mobile phones, embedded devices, and web browsers.
Submitted 6 February, 2025; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Enhancing ML model accuracy for Digital VLSI circuits using diffusion models: A study on synthetic data generation
Authors:
Prasha Srivastava,
Pawan Kumar,
Zia Abbas
Abstract:
Generative AI has seen remarkable growth over the past few years, with diffusion models being state-of-the-art for image generation. This study investigates the use of diffusion models for generating artificial data for electronic circuits, to enhance the accuracy of subsequent machine learning models in tasks such as performance assessment, design, and testing, where training data is usually very limited. We utilize simulations in the HSPICE design environment with 22nm CMOS technology nodes to obtain representative real training data for our proposed diffusion model. Our results demonstrate the close resemblance of synthetic data generated by the diffusion model to real data. We validate the quality of the generated data, and demonstrate that data augmentation is certainly effective in predictive analysis of VLSI design for digital circuits.
Submitted 15 October, 2023;
originally announced October 2023.
-
Disentangling Societal Inequality from Model Biases: Gender Inequality in Divorce Court Proceedings
Authors:
Sujan Dutta,
Parth Srivastava,
Vaishnavi Solunke,
Swaprava Nath,
Ashiqur R. KhudaBukhsh
Abstract:
Divorce is the legal dissolution of a marriage by a court. Since this is usually an unpleasant outcome of a marital union, each party may have reasons for deciding to quit, which are generally documented in detail in the court proceedings. Via a substantial corpus of 17,306 court proceedings, this paper investigates gender inequality through the lens of divorce court proceedings. While emerging data sources (e.g., public court records) on sensitive societal issues hold promise in aiding social science research, biases present in cutting-edge natural language processing (NLP) methods may interfere with or affect such studies. We thus require a thorough analysis of potential gaps and limitations present in extant NLP resources. In this paper, on the methodological side, we demonstrate that existing NLP resources required several non-trivial modifications to quantify societal inequalities. On the substantive side, we find that while a large number of court cases perhaps suggest changing norms in India where women are increasingly challenging patriarchy, AI-powered analyses of these court proceedings indicate striking gender inequality with women often subjected to domestic violence.
Submitted 8 July, 2023;
originally announced July 2023.
-
Accelerated Algorithms for a Class of Optimization Problems with Equality and Box Constraints
Authors:
Anjali Parashar,
Priyank Srivastava,
Anuradha M. Annaswamy
Abstract:
Convex optimization with equality and inequality constraints is a ubiquitous problem in several optimization and control problems in large-scale systems. Recently there has been a lot of interest in establishing accelerated convergence of the loss function. A class of high-order tuners was recently proposed in an effort to lead to accelerated convergence for the case when no constraints are present. In this paper, we propose a new high-order tuner that can accommodate the presence of equality constraints. In order to accommodate the underlying box constraints, time-varying gains are introduced in the high-order tuner which leverage convexity and ensure anytime feasibility of the constraints. Numerical examples are provided to support the theoretical derivations.
Submitted 7 May, 2023;
originally announced May 2023.
-
A Construction of Arbitrarily Large Type-II $Z$ Complementary Code Set
Authors:
Rajen Kumar,
Prashant Kumar Srivastava,
Sudhan Majhi
Abstract:
For a type-I $(K,M,Z,N)$-ZCCS, it holds that $K \leq M \left\lfloor \frac{N}{Z}\right\rfloor$. In this paper, we propose a construction of type-II $(p^{k+n},p^k,p^{n+r}-p^r+1,p^{n+r})$-$Z$ complementary code set (ZCCS) using an extended Boolean function, its properties of Hamiltonian paths and the concept of isolated vertices, where $p\ge 2$. The proposed type-II ZCCS provides $K = M(N-Z+1)$ codes, whereas for a type-I $(K,M,Z,N)$-ZCCS, $K \leq M \left\lfloor \frac{N}{Z}\right\rfloor$. Therefore, the proposed type-II ZCCS provides a larger number of codes compared to type-I ZCCS. Further, as a special case of the proposed construction, a $(p^k,p^k,p^n)$-CCC can be generated for any integer $p\ge2$ and $k\le n$.
Submitted 14 May, 2024; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Argument Mining using BERT and Self-Attention based Embeddings
Authors:
Pranjal Srivastava,
Pranav Bhatnagar,
Anurag Goel
Abstract:
Argument mining automatically identifies and extracts the structure of inference and reasoning conveyed in natural language arguments. To the best of our knowledge, most of the state-of-the-art works in this field have focused on using tree-like structures and linguistic modeling. However, these approaches are not able to model the more complex structures often found in online forums and real-world argumentation. In this paper, a novel methodology for argument mining is proposed which employs attention-based embeddings for link prediction to model the causational hierarchies in typical argument structures prevalent in online discourse.
Submitted 27 February, 2023;
originally announced February 2023.
-
Qualitative Data Augmentation for Performance Prediction in VLSI circuits
Authors:
Prasha Srivastava,
Pawan Kumar,
Zia Abbas
Abstract:
Various studies have shown the advantages of using Machine Learning (ML) techniques for analog and digital IC design automation and optimization. Data scarcity is still an issue for electronic designs when training highly accurate ML models. This work proposes generating and evaluating artificial data using generative adversarial networks (GANs) for circuit data to aid and improve the accuracy of ML models trained with a small training data set. The training data is obtained by various simulations in the Cadence Virtuoso, HSPICE, and Microcap design environments with TSMC 180nm and 22nm CMOS technology nodes. The artificial data is generated and tested for an appropriate set of analog and digital circuits. The experimental results show that the proposed artificial data generation significantly improves ML models that were previously trained with insufficient data, reducing the percentage error by more than 50% of the original percentage error. Furthermore, this research aims to contribute to the extensive application of AI/ML in the field of VLSI design and technology by relieving the training data availability-related challenges.
Submitted 15 February, 2023;
originally announced February 2023.
-
Baechi: Fast Device Placement of Machine Learning Graphs
Authors:
Beomyeol Jeon,
Linda Cai,
Chirag Shetty,
Pallavi Srivastava,
Jintao Jiang,
Xiaolan Ke,
Yitao Meng,
Cong Xie,
Indranil Gupta
Abstract:
Machine Learning graphs (or models) can be challenging or impossible to train when either devices have limited memory, or models are large. To split the model across devices, learning-based approaches are still popular. While these result in model placements that train fast on data (i.e., low step times), learning-based model-parallelism is time-consuming, taking many hours or days to create a placement plan of operators on devices. We present the Baechi system, the first to adopt an algorithmic approach to the placement problem for running machine learning training graphs on small clusters of memory-constrained devices. We integrate our implementation of Baechi into two popular open-source learning frameworks: TensorFlow and PyTorch. Our experimental results using GPUs show that: (i) Baechi generates placement plans 654 X - 206K X faster than state-of-the-art learning-based approaches, and (ii) Baechi-placed model's step (training) time is comparable to expert placements in PyTorch, and only up to 6.2% worse than expert placements in TensorFlow. We prove mathematically that our two algorithms are within a constant factor of the optimal. Our work shows that compared to learning-based approaches, algorithmic approaches can face different challenges for adaptation to Machine learning systems, but also they offer proven bounds, and significant performance benefits.
Submitted 20 January, 2023;
originally announced January 2023.
-
How to (virtually) train your speaker localizer
Authors:
Prerak Srivastava,
Antoine Deleforge,
Archontis Politis,
Emmanuel Vincent
Abstract:
Learning-based methods have become ubiquitous in speaker localization. Existing systems rely on simulated training sets for the lack of sufficiently large, diverse and annotated real datasets. Most room acoustics simulators used for this purpose rely on the image source method (ISM) because of its computational efficiency. This paper argues that carefully extending the ISM to incorporate more realistic surface, source and microphone responses into training sets can significantly boost the real-world performance of speaker localization systems. It is shown that increasing the training-set realism of a state-of-the-art direction-of-arrival estimator yields consistent improvements across three different real test sets featuring human speakers in a variety of rooms and various microphone arrays. An ablation study further reveals that every added layer of realism contributes positively to these improvements.
Submitted 25 May, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Sampling from convex sets with a cold start using multiscale decompositions
Authors:
Hariharan Narayanan,
Amit Rajaraman,
Piyush Srivastava
Abstract:
Running a random walk in a convex body $K\subseteq\mathbb{R}^n$ is a standard approach to sample approximately uniformly from the body. The requirement is that from a suitable initial distribution, the distribution of the walk comes close to the uniform distribution $π_K$ on $K$ after a number of steps polynomial in $n$ and the aspect ratio $R/r$ (i.e., when $rB_2 \subseteq K \subseteq RB_{2}$).
Proofs of rapid mixing of such walks often require the probability density $η_0$ of the initial distribution with respect to $π_K$ to be at most $\mathrm{poly}(n)$: this is called a "warm start". Achieving a warm start often requires non-trivial pre-processing before starting the random walk. This motivates proving rapid mixing from a "cold start", wherein $η_0$ can be as high as $\exp(\mathrm{poly}(n))$. Unlike warm starts, a cold start is usually trivial to achieve. However, a random walk need not mix rapidly from a cold start: an example being the well-known "ball walk". On the other hand, Lovász and Vempala proved that the "hit-and-run" random walk mixes rapidly from a cold start. For the related coordinate hit-and-run (CHR) walk, which has been found to be promising in computational experiments, rapid mixing from a warm start was proved only recently but the question of rapid mixing from a cold start remained open.
We construct a family of random walks inspired by classical decompositions of subsets of $\mathbb{R}^n$ into countably many axis-aligned dyadic cubes. We show that even with a cold start, the mixing times of these walks are bounded by a polynomial in $n$ and the aspect ratio. Our main technical ingredient is an isoperimetric inequality for $K$ for a metric that magnifies distances between points close to the boundary of $K$. As a corollary, we show that the CHR walk also mixes rapidly both from a cold start and from a point not too close to the boundary of $K$.
Submitted 22 November, 2024; v1 submitted 8 November, 2022;
originally announced November 2022.
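The ball walk named in the abstract above can be sketched in a few lines. This is a minimal illustration, not the multiscale walk the paper constructs: the membership oracle `in_unit_cube`, the step radius `delta`, and the step count are assumptions chosen for the example.

```python
import math
import random


def ball_walk(in_body, start, delta, steps, rng):
    """Run the ball walk using a membership oracle `in_body`.

    Each step proposes a point uniform in the ball of radius `delta`
    around the current point and moves there only if it lies in the body.
    """
    x = list(start)
    n = len(x)
    for _ in range(steps):
        # Uniform direction: a normalised Gaussian vector.
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in g))
        # Radius with density proportional to r^(n-1), so the proposal
        # is uniform over the ball rather than concentrated at its centre.
        r = delta * rng.random() ** (1.0 / n)
        y = [xi + r * gi / norm for xi, gi in zip(x, g)]
        if in_body(y):
            x = y  # accept; otherwise stay at the current point
    return x


def in_unit_cube(p):
    """Hypothetical convex body for the example: the unit cube."""
    return all(0.0 <= c <= 1.0 for c in p)


rng = random.Random(1)
# Starting at a corner, only a small part of each proposal ball lies in the cube.
point = ball_walk(in_unit_cube, [0.0, 0.0, 0.0], 0.2, 1000, rng)
```

Starting at a corner shows why the initial distribution matters: in three dimensions only one octant of the proposal ball lies inside the cube, so most early proposals are rejected.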
-
Towards Zero-Shot and Few-Shot Table Question Answering using GPT-3
Authors:
Pragya Srivastava,
Tanuja Ganu,
Saikat Guha
Abstract:
We present very early results on using GPT-3 to perform question answering on tabular data. We find that stock pre-trained GPT-3 is able to zero-shot learn the table structure from a serialized JSON array-of-arrays representation, and able to answer lookup queries and simple comparison questions in natural language without any fine-tuning. We further find that simple prompt engineering to include few-shot static Q&A examples significantly improves accuracy. Lastly, we find that intermixing passage text improves accuracy even further on heterogeneous data. We apply our approach on a novel dataset of simple tables in newspaper infographics with promising results. Overall, we find much cause for optimism in this basic approach.
Submitted 31 October, 2022;
originally announced October 2022.
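The serialization described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `serialize_table` and `build_prompt`, the prompt layout, and the table contents are all hypothetical, and the code only builds the prompt string without calling any model API.

```python
import json


def serialize_table(header, rows):
    """Serialize a table as a JSON array-of-arrays, header row first."""
    return json.dumps([header] + rows)


def build_prompt(header, rows, question, examples=()):
    """Assemble a prompt: the table, optional few-shot Q&A pairs, the query.

    `examples` is a sequence of (question, answer) pairs. The layout is an
    assumption for illustration, not the format used in the paper.
    """
    lines = ["Table: " + serialize_table(header, rows)]
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


# Hypothetical table and questions.
header = ["city", "population"]
rows = [["Oslo", 700000], ["Bergen", 285000]]
prompt = build_prompt(
    header,
    rows,
    "Which city has the larger population?",
    examples=[("What is the population of Bergen?", "285000")],
)
```

Passing an empty `examples` sequence gives the zero-shot variant; adding static Q&A pairs gives the few-shot variant that the abstract reports as more accurate.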
-
Realistic sources, receivers and walls improve the generalisability of virtually-supervised blind acoustic parameter estimators
Authors:
Prerak Srivastava,
Antoine Deleforge,
Emmanuel Vincent
Abstract:
Blind acoustic parameter estimation consists in inferring the acoustic properties of an environment from recordings of unknown sound sources. Recent works in this area have utilized deep neural networks trained either partially or exclusively on simulated data, due to the limited availability of real annotated measurements. In this paper, we study whether a model purely trained using a fast image-source room impulse response simulator can generalize to real data. We present an ablation study on carefully crafted simulated training sets that account for different levels of realism in source, receiver and wall responses. The extent of realism is controlled by the sampling of wall absorption coefficients and by applying measured directivity patterns to microphones and sources. A state-of-the-art model trained on these datasets is evaluated on the task of jointly estimating the room's volume, total surface area, and octave-band reverberation times from multiple, multichannel speech recordings. Results reveal that every added layer of simulation realism at train time significantly improves the estimation of all quantities on real signals.
Submitted 19 July, 2022;
originally announced July 2022.
-
A Construction of Type-II ZCCS for the MC-CDMA System with Low PMEPR
Authors:
Rajen Kumar,
Sushant Kumar Jha,
Prashant Kumar Srivastava,
Sudhan Majhi
Abstract:
In this letter, we propose a novel construction of type-II $Z$-complementary code set (ZCCS) having arbitrary sequence length using the Kronecker product between a complete complementary code (CCC) and mutually orthogonal uni-modular sequences. In this construction, Barker sequences are used to reduce the row sequence peak-to-mean envelope power ratio (PMEPR) for some specific sequence lengths and the column sequence PMEPR for some specific code sizes. The column sequence PMEPR of the proposed type-II ZCCS is upper bounded by a number smaller than $2$. The proposed construction also contributes new lengths of type-II $Z$-complementary pair (ZCP) and type-II $Z$-complementary set (ZCS). Furthermore, the PMEPR of these new type-II ZCPs is also lower than that of existing type-II ZCPs.
Submitted 22 August, 2023; v1 submitted 6 July, 2022;
originally announced July 2022.
-
ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets
Authors:
Kristian Schultz,
Saptarshi Bej,
Waldemar Hahn,
Markus Wolfien,
Prashant Srivastava,
Olaf Wolkenhauer
Abstract:
Data is commonly stored in tabular format. Several fields of research are prone to small imbalanced tabular data. Supervised Machine Learning on such data is often difficult due to class imbalance. Synthetic data generation, i.e., oversampling, is a common remedy used to improve classifier performance. State-of-the-art linear interpolation approaches, such as LoRAS and ProWRAS can be used to generate synthetic samples from the convex space of the minority class to improve classifier performance in such cases. Deep generative networks are common deep learning approaches for synthetic sample generation, widely used for synthetic image generation. However, their scope on synthetic tabular data generation in the context of imbalanced classification is not adequately explored. In this article, we show that existing deep generative models perform poorly compared to linear interpolation based approaches for imbalanced classification problems on smaller tabular datasets. To overcome this, we propose a deep generative model, ConvGeN that combines the idea of convex space learning with deep generative models. ConvGeN learns the coefficients for the convex combinations of the minority class samples, such that the synthetic data is distinct enough from the majority class. Our benchmarking experiments demonstrate that our proposed model ConvGeN improves imbalanced classification on such small datasets, as compared to existing deep generative models, while being at-par with the existing linear interpolation approaches. Moreover, we discuss how our model can be used for synthetic tabular data generation in general, even outside the scope of data imbalance and thus, improves the overall applicability of convex space learning.
Submitted 13 July, 2022; v1 submitted 20 June, 2022;
originally announced June 2022.
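The idea of sampling from the convex space of the minority class, as described in the abstract above, can be sketched as follows. This is a simplified illustration, not ConvGeN itself: ConvGeN learns the convex coefficients, whereas this sketch draws them at random, and the function name `convex_sample` and the data are assumptions.

```python
import random


def convex_sample(neighbourhood, rng):
    """Return one synthetic point as a random convex combination.

    Weights are normalised exponential draws (a flat Dirichlet sample),
    so they are non-negative and sum to 1. ConvGeN learns such
    coefficients; here they are random, for illustration only.
    """
    raw = [rng.expovariate(1.0) for _ in neighbourhood]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(neighbourhood[0])
    point = [
        sum(w * x[j] for w, x in zip(weights, neighbourhood))
        for j in range(dim)
    ]
    return point, weights


rng = random.Random(0)
# Hypothetical minority-class samples with two features each.
minority = [[1.0, 2.0], [2.0, 2.5], [1.5, 3.0]]
synthetic, weights = convex_sample(minority, rng)
```

Because the weights are non-negative and sum to one, every synthetic point lies inside the convex hull of the chosen minority samples.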
-
On complex roots of the independence polynomial
Authors:
Ferenc Bencs,
Péter Csikvári,
Piyush Srivastava,
Jan Vondrák
Abstract:
It is known from the work of Shearer (1985) (and also Scott and Sokal (2005)) that the independence polynomial $Z_G(λ)$ of a graph $G$ of maximum degree at most $d+1$ does not vanish provided that $\vertλ\vert \leq \frac{d^d}{(d+1)^{d+1}}$. Significant extensions of this result have recently been given in the case $\Re λ\geq 0$ by Peters and Regts (2019) and Bencs and Csikvári (arxiv:1807.08963). In this paper, our motivation is to further extend these results and find zero free regions when $\Re λ\leq 0$.
We begin by giving new geometric criteria for establishing zero-free regions as well as for carrying out semi-rigorous numerical explorations. We then provide two examples of the (rigorous) use of these criteria, by establishing two new zero-free regions in the left-half plane. We also improve upon the results of Bencs and Csikvári (arxiv:1807.08963) for the right half-plane using our framework. By a direct application of the interpolation method of Barvinok, combined with extensions due to Patel and Regts, these results also imply deterministic polynomial time approximation algorithms for the independence polynomial of bounded degree graphs in the new zero-free regions.
Submitted 13 November, 2022; v1 submitted 11 April, 2022;
originally announced April 2022.
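The Shearer bound quoted in the abstract above can be checked on a small graph. The sketch below is a brute-force illustration only: the helper names `independence_coefficients` and `shearer_radius` are assumptions, and the enumeration is exponential in the number of vertices.

```python
from itertools import combinations


def independence_coefficients(n, edges):
    """Return [i_0, i_1, ...], where i_k counts independent sets of size k.

    The independence polynomial is Z_G(lam) = sum_k i_k * lam**k.
    """
    edge_set = {frozenset(e) for e in edges}
    coeffs = []
    for k in range(n + 1):
        count = sum(
            1
            for subset in combinations(range(n), k)
            if all(frozenset(p) not in edge_set for p in combinations(subset, 2))
        )
        coeffs.append(count)
    # Drop trailing zero coefficients.
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs


def shearer_radius(d):
    """Radius d^d / (d+1)^(d+1) for graphs of maximum degree at most d+1."""
    return d**d / (d + 1) ** (d + 1)


# The 4-cycle has maximum degree 2, i.e. d = 1.
c4 = independence_coefficients(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
radius = shearer_radius(1)
```

For the 4-cycle this gives $Z_G(λ) = 1 + 4λ + 2λ^2$ and a radius of $1/4$. The root closest to the origin is $-1 + \sqrt{2}/2 \approx -0.293$, which lies outside the disc of radius $1/4$, consistent with the bound.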
-
Diffusion Probabilistic Modeling for Video Generation
Authors:
Ruihan Yang,
Prakhar Srivastava,
Stephan Mandt
Abstract:
Denoising diffusion probabilistic models are a promising new class of generative models that mark a milestone in high-quality image generation. This paper showcases their ability to sequentially generate video, surpassing prior methods in perceptual and probabilistic forecasting metrics. We propose an autoregressive, end-to-end optimized video diffusion model inspired by recent advances in neural video compression. The model successively generates future frames by correcting a deterministic next-frame prediction using a stochastic residual generated by an inverse diffusion process. We compare this approach against five baselines on four datasets involving natural and simulation-based videos. We find significant improvements in terms of perceptual quality for all datasets. Furthermore, by introducing a scalable version of the Continuous Ranked Probability Score (CRPS) applicable to video, we show that our model also outperforms existing approaches in their probabilistic frame forecasting ability.
Submitted 7 December, 2022; v1 submitted 15 March, 2022;
originally announced March 2022.
-
Uniting Control and Data Parallelism: Towards Scalable Memory-Driven Dynamic Graph Processing
Authors:
Bibrak Qamar Chandio,
Thomas Sterling,
Prateek Srivastava
Abstract:
Control parallelism and data parallelism are mostly reasoned about and optimized as separate functions. Because of this, workloads that are irregular, fine-grain and dynamic, such as dynamic graph processing, become very hard to scale. An experimental research approach to computer architecture that synthesizes prior techniques of parallel computing along with new innovations is proposed in this paper. We establish the background and motivation of the research undertaking and provide a detailed description of the proposed computing system, which is highly parallel, non-von Neumann, memory-centric and memory-driven. We also present a message-driven (or event-driven) programming model called "diffusive computation" and provide insights into its properties using SSSP and Triangle Counting problems as examples.
Submitted 7 March, 2023; v1 submitted 18 February, 2022;
originally announced February 2022.
-
Causal effect of racial bias in data and machine learning algorithms on user persuasiveness & discriminatory decision making: An Empirical Study
Authors:
Kinshuk Sengupta,
Praveen Ranjan Srivastava
Abstract:
Language data and models demonstrate various types of bias, be it ethnic, religious, gender, or socioeconomic. AI/NLP models, when trained on a racially biased dataset, instigate poor model explainability, influence user experience during decision making, and thus further magnify societal biases, raising profound ethical implications for society. The motivation of the study is to investigate how AI systems imbibe bias from data and produce unexplainable discriminatory outcomes and influence an individual's articulateness of system outcome due to the presence of racial bias features in datasets. The design of the experiment involves studying the counterfactual impact of racial bias features present in language datasets and their associated effect on the model outcome. A mixed research methodology is adopted to investigate the cross implication of biased model outcome on user experience and the effect on decision-making through controlled lab experimentation. The findings provide foundation support for correlating the implication of carry-over an artificial intelligence model solving NLP task due to biased concept presented in the dataset. Further, the research outcomes justify the negative influence on users' persuasiveness that leads to altering the decision-making quotient of an individual when trying to rely on the model outcome to act. The paper bridges the gap across the harm caused in establishing poor customer trustworthiness due to an inequitable system design and provides strong support for researchers, policymakers, and data scientists to build responsible AI frameworks within organizations.
Submitted 25 November, 2022; v1 submitted 22 January, 2022;
originally announced February 2022.
-
Citation inequity and gendered citation practices in contemporary physics
Authors:
Erin G. Teich,
Jason Z. Kim,
Christopher W. Lynn,
Samantha C. Simon,
Andrei A. Klishin,
Karol P. Szymula,
Pragya Srivastava,
Lee C. Bassett,
Perry Zurn,
Jordan D. Dworkin,
Dani S. Bassett
Abstract:
The historical and contemporary under-attribution of women's contributions to scientific scholarship is well-known and well-studied, with effects that are felt today in myriad ways by women scientists. One measure of this under-attribution is the so-called citation gap between men and women: the under-citation of papers authored by women relative to expected rates coupled with a corresponding over-citation of papers authored by men relative to expected rates. We explore the citation gap in contemporary physics, analyzing over one million articles published over the last 25 years in 35 physics journals that span a wide range of subfields. Using a model that predicts papers' expected citation rates according to a set of characteristics separate from author gender, we find a global bias wherein papers authored by women are significantly under-cited, and papers authored by men are significantly over-cited. Moreover, we find that citation behavior varies along several dimensions, such that imbalances differ according to who is citing, where they are citing, and how they are citing. Specifically, citation imbalance in favor of man-authored papers is highest for papers authored by men, papers published in general physics journals, and papers likely to be less familiar to citing authors. Our results suggest that, although deciding which papers to cite is an individual choice, the cumulative effects of these choices needlessly harm a subset of scholars. We discuss several strategies for the mitigation of these effects, including conscious behavioral changes at the individual, journal, and community levels.
Submitted 16 December, 2021;
originally announced December 2021.
-
HRNET: AI on Edge for mask detection and social distancing
Authors:
Kinshuk Sengupta,
Praveen Ranjan Srivastava
Abstract:
The purpose of the paper is to provide an innovative emerging-technology framework for communities to combat epidemic situations. The paper proposes a unique outbreak response system framework based on artificial intelligence and edge computing for citizen-centric services, to help track and trace people eluding safety policies such as mask wearing and social distancing in public or workplace settings. The framework further provides implementation guidelines for industrial settings, as well as for governance and contact tracing tasks. Its adoption will thus lead to smart city planning and development focused on citizen health systems, contributing to improved quality of life. The conceptual framework presented is validated through quantitative data analysis via secondary data collection from researchers' public websites, GitHub repositories and renowned journals, and further benchmarking was conducted for experimental results in a Microsoft Azure cloud environment. The study includes selected AI models for benchmark analysis, which were assessed on performance and accuracy in an edge computing environment for large-scale societal settings. Overall, the YOLO model outperforms in the object detection task and is fast enough for mask detection, and HRNetV2 outperforms on the semantic segmentation problem applied to the social distancing task in an AI-Edge inferencing setup. The paper proposes a new Edge-AI algorithm for building technology-oriented solutions for detecting masks in human movement and social distance. The paper enriches the technological advancement in artificial intelligence and edge computing applied to problems in society and healthcare systems. The framework further equips government agencies and system providers to design and construct technology-oriented models in community settings to increase the quality of life using emerging technologies in smart urban environments.
Submitted 3 February, 2022; v1 submitted 30 November, 2021;
originally announced November 2021.
-
Universal Lower Bound for Learning Causal DAGs with Atomic Interventions
Authors:
Vibhor Porwal,
Piyush Srivastava,
Gaurav Sinha
Abstract:
A well-studied challenge that arises in the structure learning problem of causal directed acyclic graphs (DAG) is that using observational data, one can only learn the graph up to a "Markov equivalence class" (MEC). The remaining undirected edges have to be oriented using interventions, which can be very expensive to perform in applications. Thus, the problem of minimizing the number of interventions needed to fully orient the MEC has received a lot of recent attention, and is also the focus of this work. Our first result is a new universal lower bound on the number of single-node interventions that any algorithm (whether active or passive) would need to perform in order to orient a given MEC. Our second result shows that this bound is, in fact, within a factor of two of the size of the smallest set of single-node interventions that can orient the MEC. Our lower bound is provably better than previously known lower bounds. Further, using simulations on synthetic graphs and by giving examples of special graph families, we show that our bound is often significantly better. To prove our lower bound, we develop the notion of clique-block shared-parents (CBSP) orderings, which are topological orderings of DAGs without v-structures and satisfy certain special properties. We also use the techniques developed here to extend our results to the setting of multi-node interventions.
Submitted 19 May, 2022; v1 submitted 9 November, 2021;
originally announced November 2021.
-
Addressing practical challenges in Active Learning via a hybrid query strategy
Authors:
Deepesh Agarwal,
Pravesh Srivastava,
Sergio Martin-del-Campo,
Balasubramaniam Natarajan,
Babji Srinivasan
Abstract:
Active Learning (AL) is a powerful tool to address modern machine learning problems with significantly fewer labeled training instances. However, implementation of traditional AL methodologies in practical scenarios is accompanied by multiple challenges due to the inherent assumptions. There are several hindrances, such as unavailability of labels for the AL algorithm at the beginning; an unreliable external source of labels during the querying process; or incompatible mechanisms to evaluate the performance of the Active Learner. Inspired by these practical challenges, we present a hybrid query strategy-based AL framework that addresses three practical challenges simultaneously: cold-start, oracle uncertainty and performance evaluation of the Active Learner in the absence of ground truth. While a pre-clustering approach is employed to address the cold-start problem, the uncertainty surrounding the expertise of the labeler and confidence in the given labels is incorporated to handle oracle uncertainty. The heuristics obtained during the querying process serve as the fundamental premise for assessing the performance of the Active Learner. The robustness of the proposed AL framework is evaluated across three different environments and industrial settings. The results demonstrate the capability of the proposed framework to tackle practical challenges during AL implementation in real-world scenarios.
Submitted 7 October, 2021;
originally announced October 2021.
-
Blind Room Parameter Estimation Using Multiple-Multichannel Speech Recordings
Authors:
Prerak Srivastava,
Antoine Deleforge,
Emmanuel Vincent
Abstract:
Knowing the geometrical and acoustical parameters of a room may benefit applications such as audio augmented reality, speech dereverberation or audio forensics. In this paper, we study the problem of jointly estimating the total surface area, the volume, as well as the frequency-dependent reverberation time and mean surface absorption of a room in a blind fashion, based on two-channel noisy speech recordings from multiple, unknown source-receiver positions. A novel convolutional neural network architecture leveraging both single- and inter-channel cues is proposed and trained on a large, realistic simulated dataset. Results on both simulated and real data show that using multiple observations in one room significantly reduces estimation errors and variances on all target quantities, and that using two channels helps the estimation of surface and volume. The proposed model outperforms a recently proposed blind volume estimation method on the considered datasets.
Submitted 29 July, 2021;
originally announced July 2021.
-
SALADnet: Self-Attentive multisource Localization in the Ambisonics Domain
Authors:
Pierre-Amaury Grumiaux,
Srdan Kitic,
Prerak Srivastava,
Laurent Girin,
Alexandre Guérin
Abstract:
In this work, we propose a novel self-attention based neural network for robust multi-speaker localization from Ambisonics recordings. Starting from a state-of-the-art convolutional recurrent neural network, we investigate the benefit of replacing the recurrent layers by self-attention encoders, inherited from the Transformer architecture. We evaluate these models on synthetic and real-world data, with up to 3 simultaneous speakers. The obtained results indicate that the majority of the proposed architectures either perform on par, or outperform the CRNN baseline, especially in the multisource scenario. Moreover, by avoiding the recurrent layers, the proposed models lend themselves to parallel computing, which is shown to produce considerable savings in execution time.
Submitted 23 July, 2021;
originally announced July 2021.