Search | arXiv e-print repository

Active Learning via Regression Beyond Realizability

Authors: Atul Ganju, Shashaank Aiyer, Ved Sriraman, Karthik Sridharan

Abstract: We present a new active learning framework for multiclass classification based on surrogate risk minimization that operates beyond the standard realizability assumption. Existing surrogate-based active learning algorithms crucially rely on realizability$\unicode{x2014}$the assumption that the optimal surrogate predictor lies within the model class$\unicode{x2014}$limiting their applicability in pr… ▽ More We present a new active learning framework for multiclass classification based on surrogate risk minimization that operates beyond the standard realizability assumption. Existing surrogate-based active learning algorithms crucially rely on realizability$\unicode{x2014}$the assumption that the optimal surrogate predictor lies within the model class$\unicode{x2014}$limiting their applicability in practical, misspecified settings. In this work we show that under conditions significantly weaker than realizability, as long as the class of models considered is convex, one can still obtain a label and sample complexity comparable to prior work. Despite achieving similar rates, the algorithmic approaches from prior works can be shown to fail in non-realizable settings where our assumption is satisfied. Our epoch-based active learning algorithm departs from prior methods by fitting a model from the full class to the queried data in each epoch and returning an improper classifier obtained by aggregating these models. △ Less

Submitted 30 May, 2025; originally announced June 2025.

arXiv:2505.23546 [pdf, other]

Going from a Representative Agent to Counterfactuals in Combinatorial Choice

Authors: Yanqiu Ruan, Karthyek Murthy, Karthik Natarajan

Abstract: We study decision-making problems where data comprises points from a collection of binary polytopes, capturing aggregate information stemming from various combinatorial selection environments. We propose a nonparametric approach for counterfactual inference in this setting based on a representative agent model, where the available data is viewed as arising from maximizing separable concave utility… ▽ More We study decision-making problems where data comprises points from a collection of binary polytopes, capturing aggregate information stemming from various combinatorial selection environments. We propose a nonparametric approach for counterfactual inference in this setting based on a representative agent model, where the available data is viewed as arising from maximizing separable concave utility functions over the respective binary polytopes. Our first contribution is to precisely characterize the selection probabilities representable under this model and show that verifying the consistency of any given aggregated selection dataset reduces to solving a polynomial-sized linear program. Building on this characterization, we develop a nonparametric method for counterfactual prediction. When data is inconsistent with the model, finding a best-fitting approximation for prediction reduces to solving a compact mixed-integer convex program. Numerical experiments based on synthetic data demonstrate the method's flexibility, predictive accuracy, and strong representational power even under model misspecification. △ Less

Submitted 29 May, 2025; originally announced May 2025.

Comments: 22 pages, 3 figures

arXiv:2504.20172 [pdf, other]

Causal Identification in Time Series Models

Authors: Erik Jahn, Karthik Karnik, Leonard J. Schulman

Abstract: In this paper, we analyze the applicability of the Causal Identification algorithm to causal time series graphs with latent confounders. Since these graphs extend over infinitely many time steps, deciding whether causal effects across arbitrary time intervals are identifiable appears to require computation on graph segments of unbounded size. Even for deciding the identifiability of intervention e… ▽ More In this paper, we analyze the applicability of the Causal Identification algorithm to causal time series graphs with latent confounders. Since these graphs extend over infinitely many time steps, deciding whether causal effects across arbitrary time intervals are identifiable appears to require computation on graph segments of unbounded size. Even for deciding the identifiability of intervention effects on variables that are close in time, no bound is known on how many time steps in the past need to be considered. We give a first bound of this kind that only depends on the number of variables per time step and the maximum time lag of any direct or latent causal effect. More generally, we show that applying the Causal Identification algorithm to a constant-size segment of the time series graph is sufficient to decide identifiability of causal effects, even across unbounded time intervals. △ Less

Submitted 28 April, 2025; originally announced April 2025.

arXiv:2504.03190 [pdf, other]

The Ground Cost for Optimal Transport of Angular Velocity

Authors: Karthik Elamvazhuthi, Abhishek Halder

Abstract: We revisit the optimal transport problem over angular velocity dynamics given by the controlled Euler equation. The solution of this problem enables stochastic guidance of spin states of a rigid body (e.g., spacecraft) over hard deadline constraint by transferring a given initial state statistics to a desired terminal state statistics. This is an instance of generalized optimal transport over a no… ▽ More We revisit the optimal transport problem over angular velocity dynamics given by the controlled Euler equation. The solution of this problem enables stochastic guidance of spin states of a rigid body (e.g., spacecraft) over hard deadline constraint by transferring a given initial state statistics to a desired terminal state statistics. This is an instance of generalized optimal transport over a nonlinear dynamical system. While prior work has reported existence-uniqueness and numerical solution of this dynamical optimal transport problem, here we present structural results about the equivalent Kantorovich a.k.a. optimal coupling formulation. Specifically, we focus on deriving the ground cost for the associated Kantorovich optimal coupling formulation. The ground cost equals to the cost of transporting unit amount of mass from a specific realization of the initial or source joint probability measure to a realization of the terminal or target joint probability measure, and determines the Kantorovich formulation. Finding the ground cost leads to solving a structured deterministic nonlinear optimal control problem, which is shown to be amenable to an analysis technique pioneered by Athans et. al. We show that such techniques have broader applicability in determining the ground cost (thus Kantorovich formulation) for a class of generalized optimal mass transport problems involving nonlinear dynamics with translated norm-invariant drift. △ Less

Submitted 4 April, 2025; originally announced April 2025.

arXiv:2503.21980 [pdf, other]

Rolled Gaussian process models for curves on manifolds

Authors: Simon Preston, Karthik Bharath, Pablo Lopez-Custodio, Alfred Kume

Abstract: Given a planar curve, imagine rolling a sphere along that curve without slipping or twisting, and by this means tracing out a curve on the sphere. It is well known that such a rolling operation induces a local isometry between the sphere and the plane so that the two curves uniquely determine each other, and moreover, the operation extends to a general class of manifolds in any dimension. We use r… ▽ More Given a planar curve, imagine rolling a sphere along that curve without slipping or twisting, and by this means tracing out a curve on the sphere. It is well known that such a rolling operation induces a local isometry between the sphere and the plane so that the two curves uniquely determine each other, and moreover, the operation extends to a general class of manifolds in any dimension. We use rolling to construct an analogue of a Gaussian process on a manifold starting from a Euclidean Gaussian process. The resulting model is generative, and is amenable to statistical inference given data as curves on a manifold. We illustrate with examples on the unit sphere, symmetric positive-definite matrices, and with a robotics application involving 3D orientations. △ Less

Submitted 27 March, 2025; originally announced March 2025.

arXiv:2503.08004 [pdf, ps, other]

Multiplayer Information Asymmetric Bandits in Metric Spaces

Authors: William Chang, Aditi Karthik

Abstract: In recent years the information asymmetric Lipschitz bandits In this paper we studied the Lipschitz bandit problem applied to the multiplayer information asymmetric problem studied in \cite{chang2022online, chang2023optimal}. More specifically we consider information asymmetry in rewards, actions, or both. We adopt the CAB algorithm given in \cite{kleinberg2004nearly} which uses a fixed discretiza… ▽ More In recent years the information asymmetric Lipschitz bandits In this paper we studied the Lipschitz bandit problem applied to the multiplayer information asymmetric problem studied in \cite{chang2022online, chang2023optimal}. More specifically we consider information asymmetry in rewards, actions, or both. We adopt the CAB algorithm given in \cite{kleinberg2004nearly} which uses a fixed discretization to give regret bounds of the same order (in the dimension of the action) space in all 3 problem settings. We also adopt their zooming algorithm \cite{ kleinberg2008multi}which uses an adaptive discretization and apply it to information asymmetry in rewards and information asymmetry in actions. △ Less

Submitted 10 March, 2025; originally announced March 2025.

arXiv:2501.13607 [pdf, other]

Optimal Multi-Objective Best Arm Identification with Fixed Confidence

Authors: Zhirui Chen, P. N. Karthik, Yeow Meng Chee, Vincent Y. F. Tan

Abstract: We consider a multi-armed bandit setting with finitely many arms, in which each arm yields an $M$-dimensional vector reward upon selection. We assume that the reward of each dimension (a.k.a. {\em objective}) is generated independently of the others. The best arm of any given objective is the arm with the largest component of mean corresponding to the objective. The end goal is to identify the bes… ▽ More We consider a multi-armed bandit setting with finitely many arms, in which each arm yields an $M$-dimensional vector reward upon selection. We assume that the reward of each dimension (a.k.a. {\em objective}) is generated independently of the others. The best arm of any given objective is the arm with the largest component of mean corresponding to the objective. The end goal is to identify the best arm of {\em every} objective in the shortest (expected) time subject to an upper bound on the probability of error (i.e., fixed-confidence regime). We establish a problem-dependent lower bound on the limiting growth rate of the expected stopping time, in the limit of vanishing error probabilities. This lower bound, we show, is characterised by a max-min optimisation problem that is computationally expensive to solve at each time step. We propose an algorithm that uses the novel idea of {\em surrogate proportions} to sample the arms at each time step, eliminating the need to solve the max-min optimisation problem at each step. We demonstrate theoretically that our algorithm is asymptotically optimal. In addition, we provide extensive empirical studies to substantiate the efficiency of our algorithm. While existing works on pure exploration with multi-objective multi-armed bandits predominantly focus on {\em Pareto frontier identification}, our work fills the gap in the literature by conducting a formal investigation of the multi-objective best arm identification problem. △ Less

Submitted 23 January, 2025; originally announced January 2025.

Comments: Accepted to AISTATS 2025

arXiv:2412.18818 [pdf, other]

Empirical likelihood for Fréchet means on open books

Authors: Karthik Bharath, Huiling Le, Andrew T A Wood, Xi Yan

Abstract: Empirical Likelihood (EL) is a type of nonparametric likelihood that is useful in many statistical inference problems, including confidence region construction and $k$-sample problems. It enjoys some remarkable theoretical properties, notably Bartlett correctability. One area where EL has potential but is under-developed is in non-Euclidean statistics where the Fréchet mean is the population chara… ▽ More Empirical Likelihood (EL) is a type of nonparametric likelihood that is useful in many statistical inference problems, including confidence region construction and $k$-sample problems. It enjoys some remarkable theoretical properties, notably Bartlett correctability. One area where EL has potential but is under-developed is in non-Euclidean statistics where the Fréchet mean is the population characteristic of interest. Only recently has a general EL method been proposed for smooth manifolds. In this work, we continue progress in this direction and develop an EL method for the Fréchet mean on a stratified metric space that is not a manifold: the open book, obtained by gluing copies of a Euclidean space along their common boundaries. The structure of an open book captures the essential behaviour of the Fréchet mean around certain singular regions of more general stratified spaces for complex data objects, and relates intimately to the local geometry of non-binary trees in the well-studied phylogenetic treespace. We derive a version of Wilks' theorem for the EL statistic, and elucidate on the delicate interplay between the asymptotic distribution and topology of the neighbourhood around the population Fréchet mean. We then present a bootstrap calibration of the EL, which proves that under mild conditions, bootstrap calibration of EL confidence regions have coverage error of size $O(n^{-2})$ rather than $O(n^{-1})$. △ Less

Submitted 25 December, 2024; originally announced December 2024.

arXiv:2411.18747 [pdf, other]

Deterministic and Probabilistic Rounding Error Analysis for Mixed-Precision Arithmetic on Modern Computing Units

Authors: Sahil Bhola, Karthik Duraisamy

Abstract: Modern computer architectures support low-precision arithmetic, which present opportunities for the adoption of mixed-precision algorithms to achieve high computational throughput and reduce energy consumption. As a growing number of scientific computations leverage specialized hardware accelerators, the risk of rounding errors increases, potentially compromising the reliability of models. This sh… ▽ More Modern computer architectures support low-precision arithmetic, which present opportunities for the adoption of mixed-precision algorithms to achieve high computational throughput and reduce energy consumption. As a growing number of scientific computations leverage specialized hardware accelerators, the risk of rounding errors increases, potentially compromising the reliability of models. This shift towards hardware-optimized, low-precision computations highlights the importance of rounding error analysis to ensure that performance gains do not come at the expense of accuracy, especially in high-stakes scientific applications. In this work, we conduct rounding error analysis on widely used operations such as fused multiply-add (FMA), mixed-precision FMA (MPFMA), and NVIDIA Tensor cores. We present a deterministic and probabilistic approach to quantifying the accumulated rounding errors. Numerical experiments are presented to perform the multiply and accumulate operation (MAC) and matrix-matrix multiplication using Tensor cores with random data. We show that probabilistic bounds produce tighter estimates by nearly an order of magnitude compared to deterministic ones for matrix-matrix multiplication. △ Less

Submitted 27 November, 2024; originally announced November 2024.

arXiv:2411.18416 [pdf, other]

Probabilistic size-and-shape functional mixed models

Authors: Fangyi Wang, Karthik Bharath, Oksana Chkrebtii, Sebastian Kurtek

Abstract: The reliable recovery and uncertainty quantification of a fixed effect function $μ$ in a functional mixed model, for modelling population- and object-level variability in noisily observed functional data, is a notoriously challenging task: variations along the $x$ and $y$ axes are confounded with additive measurement error, and cannot in general be disentangled. The question then as to what proper… ▽ More The reliable recovery and uncertainty quantification of a fixed effect function $μ$ in a functional mixed model, for modelling population- and object-level variability in noisily observed functional data, is a notoriously challenging task: variations along the $x$ and $y$ axes are confounded with additive measurement error, and cannot in general be disentangled. The question then as to what properties of $μ$ may be reliably recovered becomes important. We demonstrate that it is possible to recover the size-and-shape of a square-integrable $μ$ under a Bayesian functional mixed model. The size-and-shape of $μ$ is a geometric property invariant to a family of space-time unitary transformations, viewed as rotations of the Hilbert space, that jointly transform the $x$ and $y$ axes. A random object-level unitary transformation then captures size-and-shape \emph{preserving} deviations of $μ$ from an individual function, while a random linear term and measurement error capture size-and-shape \emph{altering} deviations. The model is regularized by appropriate priors on the unitary transformations, posterior summaries of which may then be suitably interpreted as optimal data-driven rotations of a fixed orthonormal basis for the Hilbert space. Our numerical experiments demonstrate utility of the proposed model, and superiority over the current state-of-the-art. △ Less

Submitted 27 November, 2024; originally announced November 2024.

Comments: NeurIPS 2024

arXiv:2410.16295 [pdf, other]

Universal Approximation of Mean-Field Models via Transformers

Authors: Shiba Biswal, Karthik Elamvazhuthi, Rishi Sonthalia

Abstract: This paper investigates the use of transformers to approximate the mean-field dynamics of interacting particle systems exhibiting collective behavior. Such systems are fundamental in modeling phenomena across physics, biology, and engineering, including opinion formation, biological networks, and swarm robotics. The key characteristic of these systems is that the particles are indistinguishable, l… ▽ More This paper investigates the use of transformers to approximate the mean-field dynamics of interacting particle systems exhibiting collective behavior. Such systems are fundamental in modeling phenomena across physics, biology, and engineering, including opinion formation, biological networks, and swarm robotics. The key characteristic of these systems is that the particles are indistinguishable, leading to permutation-equivariant dynamics. First, we empirically demonstrate that transformers are well-suited for approximating a variety of mean field models, including the Cucker-Smale model for flocking and milling, and the mean-field system for training two-layer neural networks. We validate our numerical experiments via mathematical theory. Specifically, we prove that if a finite-dimensional transformer effectively approximates the finite-dimensional vector field governing the particle system, then the $L_2$ distance between the \textit{expected transformer} and the infinite-dimensional mean-field vector field can be uniformly bounded by a function of the number of particles observed during training. Leveraging this result, we establish theoretical bounds on the distance between the true mean-field dynamics and those obtained using the transformer. △ Less

Submitted 27 May, 2025; v1 submitted 6 October, 2024; originally announced October 2024.

arXiv:2410.02979 [pdf, ps, other]

Optimization, Isoperimetric Inequalities, and Sampling via Lyapunov Potentials

Authors: August Y. Chen, Karthik Sridharan

Abstract: In this paper, we prove that optimizability of any F using Gradient Flow from all initializations implies a Poincaré Inequality for Gibbs measures mu_{beta} = e^{-βF}/Z at low temperature. In particular, under mild regularity assumptions on the convergence rate of Gradient Flow, we establish that mu_{beta} satisfies a Poincaré Inequality with constant O(C'+1/beta) for beta >= Omega(d), where C' is… ▽ More In this paper, we prove that optimizability of any F using Gradient Flow from all initializations implies a Poincaré Inequality for Gibbs measures mu_{beta} = e^{-βF}/Z at low temperature. In particular, under mild regularity assumptions on the convergence rate of Gradient Flow, we establish that mu_{beta} satisfies a Poincaré Inequality with constant O(C'+1/beta) for beta >= Omega(d), where C' is the Poincaré constant of mu_{beta} restricted to a neighborhood of the global minimizers of F. Under an additional mild condition on F, we show that mu_{beta} satisfies a Log-Sobolev Inequality with constant O(S beta C') where S denotes the second moment of mu_{beta}. Here asymptotic notation hides F-dependent parameters. At a high level, this establishes that optimizability via Gradient Flow from every initialization implies a Poincaré and Log-Sobolev Inequality for the low-temperature Gibbs measure, which in turn imply sampling from all initializations. Analogously, we establish that under the same assumptions, if F can be initialized from everywhere except some set S, then mu_{beta} satisfies a Weak Poincaré Inequality with parameters (C', mu_{beta}(S)) for β= Omega(d). At a high level, this shows while optimizability from 'most' initializations implies a Weak Poincaré Inequality, which in turn implies sampling from suitable warm starts. Our regularity assumptions are mild and as a consequence, we show we can efficiently sample from several new natural and interesting classes of non-log-concave densities, an important setting with relatively few examples. As another corollary, we obtain efficient discrete-time sampling results for log-concave measures satisfying milder regularity conditions than smoothness, similar to Lehec (2023). △ Less

Submitted 4 March, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

Comments: 52 pages. New version with new results on Weak Poincaré Inequalities and improved presentation

arXiv:2409.19901 [pdf, other]

SurvCORN: Survival Analysis with Conditional Ordinal Ranking Neural Network

Authors: Muhammad Ridzuan, Numan Saeed, Fadillah Adamsyah Maani, Karthik Nandakumar, Mohammad Yaqub

Abstract: Survival analysis plays a crucial role in estimating the likelihood of future events for patients by modeling time-to-event data, particularly in healthcare settings where predictions about outcomes such as death and disease recurrence are essential. However, this analysis poses challenges due to the presence of censored data, where time-to-event information is missing for certain data points. Yet… ▽ More Survival analysis plays a crucial role in estimating the likelihood of future events for patients by modeling time-to-event data, particularly in healthcare settings where predictions about outcomes such as death and disease recurrence are essential. However, this analysis poses challenges due to the presence of censored data, where time-to-event information is missing for certain data points. Yet, censored data can offer valuable insights, provided we appropriately incorporate the censoring time during modeling. In this paper, we propose SurvCORN, a novel method utilizing conditional ordinal ranking networks to predict survival curves directly. Additionally, we introduce SurvMAE, a metric designed to evaluate the accuracy of model predictions in estimating time-to-event outcomes. Through empirical evaluation on two real-world cancer datasets, we demonstrate SurvCORN's ability to maintain accurate ordering between patient outcomes while improving individual time-to-event predictions. Our contributions extend recent advancements in ordinal regression to survival analysis, offering valuable insights into accurate prognosis in healthcare settings. △ Less

Submitted 29 September, 2024; originally announced September 2024.

arXiv:2404.13056 [pdf, other]

doi 10.1016/j.cma.2024.117457

Variational Bayesian Optimal Experimental Design with Normalizing Flows

Authors: Jiayuan Dong, Christian Jacobsen, Mehdi Khalloufi, Maryam Akram, Wanjiao Liu, Karthik Duraisamy, Xun Huan

Abstract: Bayesian optimal experimental design (OED) seeks experiments that maximize the expected information gain (EIG) in model parameters. Directly estimating the EIG using nested Monte Carlo is computationally expensive and requires an explicit likelihood. Variational OED (vOED), in contrast, estimates a lower bound of the EIG without likelihood evaluations by approximating the posterior distributions w… ▽ More Bayesian optimal experimental design (OED) seeks experiments that maximize the expected information gain (EIG) in model parameters. Directly estimating the EIG using nested Monte Carlo is computationally expensive and requires an explicit likelihood. Variational OED (vOED), in contrast, estimates a lower bound of the EIG without likelihood evaluations by approximating the posterior distributions with variational forms, and then tightens the bound by optimizing its variational parameters. We introduce the use of normalizing flows (NFs) for representing variational distributions in vOED; we call this approach vOED-NFs. Specifically, we adopt NFs with a conditional invertible neural network architecture built from compositions of coupling layers, and enhanced with a summary network for data dimension reduction. We present Monte Carlo estimators to the lower bound along with gradient expressions to enable a gradient-based simultaneous optimization of the variational parameters and the design variables. The vOED-NFs algorithm is then validated in two benchmark problems, and demonstrated on a partial differential equation-governed application of cathodic electrophoretic deposition and an implicit likelihood case with stochastic modeling of aphid population. The findings suggest that a composition of 4--5 coupling layers is able to achieve lower EIG estimation bias, under a fixed budget of forward model runs, compared to previous approaches. The resulting NFs produce approximate posteriors that agree well with the true posteriors, able to capture non-Gaussian and multi-modal features effectively. △ Less

Submitted 27 April, 2025; v1 submitted 8 April, 2024; originally announced April 2024.

MSC Class: 62K05; 94A17; 62C10; 62F15

Journal ref: Computer Methods in Applied Mechanics and Engineering 433 (2025) 117457

arXiv:2404.12556 [pdf, other]

Exploiting Higher-Order Statistics for Robust Probabilistic Rounding Error Analysis

Authors: Sahil Bhola, Karthik Duraisamy

Abstract: Modern computer hardware supports low- and mixed-precision arithmetic for enhanced computational efficiency. In practical predictive modeling, however, it becomes vital to quantify the uncertainty due to rounding along with other sources of uncertainty (such as measurement, sampling, and numerical discretization) to ensure efficiency gains do not compromise accuracy. Higham and Mary [1] showed tha… ▽ More Modern computer hardware supports low- and mixed-precision arithmetic for enhanced computational efficiency. In practical predictive modeling, however, it becomes vital to quantify the uncertainty due to rounding along with other sources of uncertainty (such as measurement, sampling, and numerical discretization) to ensure efficiency gains do not compromise accuracy. Higham and Mary [1] showed that modeling rounding errors as zero-mean independent random variables yields a problem size-dependent constant, $\tildeγ_n \propto \sqrt{n}$, which scales more slowly than in traditional deterministic analysis. We propose a novel variance-informed probabilistic rounding error analysis, modeling rounding errors as bounded, independent, and identically distributed (i.i.d.) random variables. This yields a new constant $\hatγ_n$, dependent on the mean, variance, and bounds of the rounding error distribution. We rigorously show that $\hatγ_n \propto \sqrt{n}$ using statistical properties of rounding errors, without ad-hoc assumptions, as in Higham and Mary. This new constant increases gradually with problem size and can improve the rounding error estimates for large arithmetic operations performed at low precision by up to six orders of magnitude. We conduct numerical experiments on random vector dot products, matrix-vector multiplication, a linear system solution, and a stochastic boundary value problem. We show that quantifying rounding uncertainty along with traditional sources (numerical discretization, sampling, parameters) enables a more efficient allocation of computational resources, thereby balancing computational efficiency with predictive accuracy. This study is a step towards a comprehensive mixed-precision approach that improves model reliability and enables budgeting of computational resources in predictive modeling and decision-making. △ Less

Submitted 16 January, 2025; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2403.18124 [pdf, other]

Stochastic Finite Volume Method for Uncertainty Management in Gas Pipeline Network Flows

Authors: Saif R. Kazi, Sidhant Misra, Svetlana Tokareva, Kaarthik Sundar, Anatoly Zlotnik

Abstract: Natural gas consumption by users of pipeline networks is subject to increasing uncertainty that originates from the intermittent nature of electric power loads serviced by gas-fired generators. To enable computationally efficient optimization of gas network flows subject to uncertainty, we develop a finite volume representation of stochastic solutions of hyperbolic partial differential equation (P… ▽ More Natural gas consumption by users of pipeline networks is subject to increasing uncertainty that originates from the intermittent nature of electric power loads serviced by gas-fired generators. To enable computationally efficient optimization of gas network flows subject to uncertainty, we develop a finite volume representation of stochastic solutions of hyperbolic partial differential equation (PDE) systems on graph-connected domains with nodal coupling and boundary conditions. The representation is used to express the physical constraints in stochastic optimization problems for gas flow allocation subject to uncertain parameters. The method is based on the stochastic finite volume approach that was recently developed for uncertainty quantification in transient flows represented by hyperbolic PDEs on graphs. In this study, we develop optimization formulations for steady-state gas flow over actuated transport networks subject to probabilistic constraints. In addition to the distributions for the physical solutions, we examine the dual variables that are produced by way of the optimization, and interpret them as price distributions that quantify the financial volatility that arises through demand uncertainty modeled in an optimization-driven gas market mechanism. We demonstrate the computation and distributional analysis using a single-pipe example and a small test network. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Report number: LA-UR-24-22647 MSC Class: 65Kxx; 90C35; 90C15

arXiv:2403.04033 [pdf, ps, other]

Online Learning with Unknown Constraints

Authors: Karthik Sridharan, Seung Won Wilson Yoo

Abstract: We consider the problem of online learning where the sequence of actions played by the learner must adhere to an unknown safety constraint at every round. The goal is to minimize regret with respect to the best safe action in hindsight while simultaneously satisfying the safety constraint with high probability on each round. We provide a general meta-algorithm that leverages an online regression o… ▽ More We consider the problem of online learning where the sequence of actions played by the learner must adhere to an unknown safety constraint at every round. The goal is to minimize regret with respect to the best safe action in hindsight while simultaneously satisfying the safety constraint with high probability on each round. We provide a general meta-algorithm that leverages an online regression oracle to estimate the unknown safety constraint, and converts the predictions of an online learning oracle to predictions that adhere to the unknown safety constraint. On the theoretical side, our algorithm's regret can be bounded by the regret of the online regression and online learning oracles, the eluder dimension of the model class containing the unknown safety constraint, and a novel complexity measure that captures the difficulty of safe learning. We complement our result with an asymptotic lower bound that shows that the aforementioned complexity measure is necessary. When the constraints are linear, we instantiate our result to provide a concrete algorithm with $\sqrt{T}$ regret using a scaling transformation that balances optimistic exploration with pessimistic constraint satisfaction. △ Less

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2401.11515 [pdf, other]

Geometry-driven Bayesian Inference for Ultrametric Covariance Matrices

Authors: Tsung-Hung Yao, Zhenke Wu, Karthik Bharath, Veerabhadran Baladandayuthapani

Abstract: Ultrametric matrices are a class of covariance matrices that arise in latent tree models. As a parameter space in a statistical model, the set of ultrametric matrices is neither convex nor a smooth manifold. Focus in the literature has hitherto been restricted to estimation through projections and relaxation-based techniques, and inferential methods are lacking. Motivated by this, we establish a b… ▽ More Ultrametric matrices are a class of covariance matrices that arise in latent tree models. As a parameter space in a statistical model, the set of ultrametric matrices is neither convex nor a smooth manifold. Focus in the literature has hitherto been restricted to estimation through projections and relaxation-based techniques, and inferential methods are lacking. Motivated by this, we establish a bijection between the set of positive definite ultrametric matrices and the set of rooted, leaf-labeled trees equipped with the stratified geometry of the well-known phylogenetic treespace. Using the pullback geometry under the bijection and by adapting sampling algorithms in Bayesian phylogenetics, we develop algorithms to sample from the posterior distribution on the set of ultrametric matrices in a Bayesian latent tree model where the tree may be binary or multifurcating. We demonstrate the utility of the algorithms in simulation studies, and illustrate them on a pre-clinical cancer application to quantify uncertainty about treatment trees that identify treatments with high mechanism similarity that target correlated pathways. △ Less

Submitted 24 April, 2025; v1 submitted 21 January, 2024; originally announced January 2024.

arXiv:2401.09073 [pdf, other]

Fixed-Budget Differentially Private Best Arm Identification

Authors: Zhirui Chen, P. N. Karthik, Yeow Meng Chee, Vincent Y. F. Tan

Abstract: We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints, when the arm rewards are supported on the unit interval. Given a finite budget $T$ and a privacy parameter $\varepsilon>0$, the goal is to minimise the error probability in finding the arm with the largest mean after $T$ sampling rounds, subject to the constraint that the pol… ▽ More We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints, when the arm rewards are supported on the unit interval. Given a finite budget $T$ and a privacy parameter $\varepsilon>0$, the goal is to minimise the error probability in finding the arm with the largest mean after $T$ sampling rounds, subject to the constraint that the policy of the decision maker satisfies a certain {\em $\varepsilon$-differential privacy} ($\varepsilon$-DP) constraint. We construct a policy satisfying the $\varepsilon$-DP constraint (called {\sc DP-BAI}) by proposing the principle of {\em maximum absolute determinants}, and derive an upper bound on its error probability. Furthermore, we derive a minimax lower bound on the error probability, and demonstrate that the lower and the upper bounds decay exponentially in $T$, with exponents in the two bounds matching order-wise in (a) the sub-optimality gaps of the arms, (b) $\varepsilon$, and (c) the problem complexity that is expressible as the sum of two terms, one characterising the complexity of standard fixed-budget BAI (without privacy constraints), and the other accounting for the $\varepsilon$-DP constraint. Additionally, we present some auxiliary results that contribute to the derivation of the lower bound on the error probability. These results, we posit, may be of independent interest and could prove instrumental in proving lower bounds on error probabilities in several other bandit problems. Whereas prior works provide results for BAI in the fixed-budget regime without privacy constraints or in the fixed-confidence regime with privacy constraints, our work fills the gap in the literature by providing the results for BAI in the fixed-budget regime under the $\varepsilon$-DP constraint. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted to ICLR 2024

arXiv:2401.01231 [pdf, other]

Movement of insurgent gangs: A Bayesian kernel density model for incomplete temporal data

Authors: Karthik Sriram, Dhruv Gupta, Rajiv Parikh

Abstract: We develop a Bayesian modeling framework to address a pressing real-life problem faced by the police in tackling insurgent gangs. Unlike criminals associated with common crimes such as robbery, theft or street crime, insurgent gangs are trained in sophisticated arms and strategise against the government to weaken its resolve. They are constantly on the move, operating over large areas causing dama… ▽ More We develop a Bayesian modeling framework to address a pressing real-life problem faced by the police in tackling insurgent gangs. Unlike criminals associated with common crimes such as robbery, theft or street crime, insurgent gangs are trained in sophisticated arms and strategise against the government to weaken its resolve. They are constantly on the move, operating over large areas causing damage to national properties and terrorizing ordinary citizens. Different from the more commonly addressed problem of modeling crime-events, our context requires that an approach be formulated to model the movement of insurgent gangs, which is more valuable to the police forces in preempting their activities and nabbing them. This paper evolved as a collaborative work with the Indian police to help augment their tactics with a systematic method, by integrating past data on observed gang-locations with the expert knowledge of the police officers. A methodological challenge in modeling the movement of insurgent gangs is that the data on their locations is incomplete, since they are observable only at some irregularly separated time-points. Based on a weighted kernel density formulation for temporal data, we analytically derive the closed form of the likelihood, conditional on incomplete past observed data. Building on the current tactics used by the police, we device an approach for constructing an expert-prior on gang-locations, along with a sequential Bayesian procedure for estimation and prediction. We also propose a new metric for predictive assessment that complements another known metric used in similar problems. △ Less

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2312.14882 [pdf, ps, other]

Sampling and estimation on manifolds using the Langevin diffusion

Authors: Karthik Bharath, Alexander Lewis, Akash Sharma, Michael V Tretyakov

Abstract: Error bounds are derived for sampling and estimation using a discretization of an intrinsically defined Langevin diffusion with invariant measure $\text{d}μ_φ\propto e^{-φ} \mathrm{dvol}_g $ on a compact Riemannian manifold. Two estimators of linear functionals of $μ_φ$ based on the discretized Markov process are considered: a time-averaging estimator based on a single trajectory and an ensemble-a… ▽ More Error bounds are derived for sampling and estimation using a discretization of an intrinsically defined Langevin diffusion with invariant measure $\text{d}μ_φ\propto e^{-φ} \mathrm{dvol}_g $ on a compact Riemannian manifold. Two estimators of linear functionals of $μ_φ$ based on the discretized Markov process are considered: a time-averaging estimator based on a single trajectory and an ensemble-averaging estimator based on multiple independent trajectories. Imposing no restrictions beyond a nominal level of smoothness on $φ$, first-order error bounds, in discretization step size, on the bias and variance/mean-square error of both estimators are derived. The order of error matches the optimal rate in Euclidean and flat spaces, and leads to a first-order bound on distance between the invariant measure $μ_φ$ and a stationary measure of the discretized Markov process. This order is preserved even upon using retractions when exponential maps are unavailable in closed form, thus enhancing practicality of the proposed algorithms. Generality of the proof techniques, which exploit links between two partial differential equations and the semigroup of operators corresponding to the Langevin diffusion, renders them amenable for the study of a more general class of sampling algorithms related to the Langevin diffusion. Conditions for extending analysis to the case of non-compact manifolds are discussed. Numerical illustrations with distributions, log-concave and otherwise, on the manifolds of positive and negative curvature elucidate on the derived bounds and demonstrate practical utility of the sampling algorithm. △ Less

Submitted 26 April, 2025; v1 submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.12361 [pdf, other]

Improved multifidelity Monte Carlo estimators based on normalizing flows and dimensionality reduction techniques

Authors: Andrea Zanoni, Gianluca Geraci, Matteo Salvador, Karthik Menon, Alison L. Marsden, Daniele E. Schiavazzi

Abstract: We study the problem of multifidelity uncertainty propagation for computationally expensive models. In particular, we consider the general setting where the high-fidelity and low-fidelity models have a dissimilar parameterization both in terms of number of random inputs and their probability distributions, which can be either known in closed form or provided through samples. We derive novel multif… ▽ More We study the problem of multifidelity uncertainty propagation for computationally expensive models. In particular, we consider the general setting where the high-fidelity and low-fidelity models have a dissimilar parameterization both in terms of number of random inputs and their probability distributions, which can be either known in closed form or provided through samples. We derive novel multifidelity Monte Carlo estimators which rely on a shared subspace between the high-fidelity and low-fidelity models where the parameters follow the same probability distribution, i.e., a standard Gaussian. We build the shared space employing normalizing flows to map different probability distributions into a common one, together with linear and nonlinear dimensionality reduction techniques, active subspaces and autoencoders, respectively, which capture the subspaces where the models vary the most. We then compose the existing low-fidelity model with these transformations and construct modified models with an increased correlation with the high-fidelity model, which therefore yield multifidelity Monte Carlo estimators with reduced variance. A series of numerical experiments illustrate the properties and advantages of our approaches. △ Less

Submitted 14 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.11230 [pdf, other]

Dirichlet-based Uncertainty Quantification for Personalized Federated Learning with Improved Posterior Networks

Authors: Nikita Kotelevskii, Samuel Horváth, Karthik Nandakumar, Martin Takáč, Maxim Panov

Abstract: In modern federated learning, one of the main challenges is to account for inherent heterogeneity and the diverse nature of data distributions for different clients. This problem is often addressed by introducing personalization of the models towards the data distribution of the particular client. However, a personalized model might be unreliable when applied to the data that is not typical for th… ▽ More In modern federated learning, one of the main challenges is to account for inherent heterogeneity and the diverse nature of data distributions for different clients. This problem is often addressed by introducing personalization of the models towards the data distribution of the particular client. However, a personalized model might be unreliable when applied to the data that is not typical for this client. Eventually, it may perform worse for these data than the non-personalized global model trained in a federated way on the data from all the clients. This paper presents a new approach to federated learning that allows selecting a model from global and personalized ones that would perform better for a particular input point. It is achieved through a careful modeling of predictive uncertainties that helps to detect local and global in- and out-of-distribution data and use this information to select the model that is confident in a prediction. The comprehensive experimental evaluation on the popular real-world image datasets shows the superior performance of the model in the presence of out-of-distribution data while performing on par with state-of-the-art personalized federated learning algorithms in the standard scenarios. △ Less

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.00854 [pdf, other]

A Probabilistic Neural Twin for Treatment Planning in Peripheral Pulmonary Artery Stenosis

Authors: John D. Lee, Jakob Richter, Martin R. Pfaller, Jason M. Szafron, Karthik Menon, Andrea Zanoni, Michael R. Ma, Jeffrey A. Feinstein, Jacqueline Kreutzer, Alison L. Marsden, Daniele E. Schiavazzi

Abstract: The substantial computational cost of high-fidelity models in numerical hemodynamics has, so far, relegated their use mainly to offline treatment planning. New breakthroughs in data-driven architectures and optimization techniques for fast surrogate modeling provide an exciting opportunity to overcome these limitations, enabling the use of such technology for time-critical decisions. We discuss an… ▽ More The substantial computational cost of high-fidelity models in numerical hemodynamics has, so far, relegated their use mainly to offline treatment planning. New breakthroughs in data-driven architectures and optimization techniques for fast surrogate modeling provide an exciting opportunity to overcome these limitations, enabling the use of such technology for time-critical decisions. We discuss an application to the repair of multiple stenosis in peripheral pulmonary artery disease through either transcatheter pulmonary artery rehabilitation or surgery, where it is of interest to achieve desired pressures and flows at specific locations in the pulmonary artery tree, while minimizing the risk for the patient. Since different degrees of success can be achieved in practice during treatment, we formulate the problem in probability, and solve it through a sample-based approach. We propose a new offline-online pipeline for probabilsitic real-time treatment planning which combines offline assimilation of boundary conditions, model reduction, and training dataset generation with online estimation of marginal probabilities, possibly conditioned on the degree of augmentation observed in already repaired lesions. Moreover, we propose a new approach for the parametrization of arbitrarily shaped vascular repairs through iterative corrections of a zero-dimensional approximant. We demonstrate this pipeline for a diseased model of the pulmonary artery tree available through the Vascular Model Repository. △ Less

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2310.13393 [pdf, ps, other]

Optimal Best Arm Identification with Fixed Confidence in Restless Bandits

Authors: P. N. Karthik, Vincent Y. F. Tan, Arpan Mukherjee, Ali Tajer

Abstract: We study best arm identification in a restless multi-armed bandit setting with finitely many arms. The discrete-time data generated by each arm forms a homogeneous Markov chain taking values in a common, finite state space. The state transitions in each arm are captured by an ergodic transition probability matrix (TPM) that is a member of a single-parameter exponential family of TPMs. The real-val… ▽ More We study best arm identification in a restless multi-armed bandit setting with finitely many arms. The discrete-time data generated by each arm forms a homogeneous Markov chain taking values in a common, finite state space. The state transitions in each arm are captured by an ergodic transition probability matrix (TPM) that is a member of a single-parameter exponential family of TPMs. The real-valued parameters of the arm TPMs are unknown and belong to a given space. Given a function $f$ defined on the common state space of the arms, the goal is to identify the best arm -- the arm with the largest average value of $f$ evaluated under the arm's stationary distribution -- with the fewest number of samples, subject to an upper bound on the decision's error probability (i.e., the fixed-confidence regime). A lower bound on the growth rate of the expected stopping time is established in the asymptote of a vanishing error probability. Furthermore, a policy for best arm identification is proposed, and its expected stopping time is proved to have an asymptotic growth rate that matches the lower bound. It is demonstrated that tracking the long-term behavior of a certain Markov decision process and its state-action visitation proportions are the key ingredients in analyzing the converse and achievability bounds. It is shown that under every policy, the state-action visitation proportions satisfy a specific approximate flow conservation constraint and that these proportions match the optimal proportions dictated by the lower bound under any asymptotically optimal policy. The prior studies on best arm identification in restless bandits focus on independent observations from the arms, rested Markov arms, and restless Markov arms with known arm TPMs. In contrast, this work is the first to study best arm identification in restless bandits with unknown arm TPMs. △ Less

Submitted 23 June, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

Comments: Accepted to the IEEE Transactions on Information Theory

arXiv:2309.11512 [pdf, other]

Multidimensional well-being of US households at a fine spatial scale using fused household surveys: fusionACS

Authors: Kevin Ummel, Miguel Poblete-Cazenave, Karthik Akkiraju, Nick Graetz, Hero Ashman, Cora Kingdon, Steven Herrera Tenorio, Aaryaman "Sunny" Singhal, Daniel Aldana Cohen, Narasimha D. Rao

Abstract: Social science often relies on surveys of households and individuals. Dozens of such surveys are regularly administered by the U.S. government. However, they field independent, unconnected samples with specialized questions, limiting research questions to those that can be answered by a single survey. The fusionACS project seeks to integrate data from multiple U.S. household surveys by statistical… ▽ More Social science often relies on surveys of households and individuals. Dozens of such surveys are regularly administered by the U.S. government. However, they field independent, unconnected samples with specialized questions, limiting research questions to those that can be answered by a single survey. The fusionACS project seeks to integrate data from multiple U.S. household surveys by statistically "fusing" variables from "donor" surveys onto American Community Survey (ACS) microdata. This results in an integrated microdataset of household attributes and well-being dimensions that can be analyzed to address research questions in ways that are not currently possible. The presented data comprise the fusion onto the ACS of select donor variables from the Residential Energy Consumption Survey (RECS) of 2015, the National Household Transportation Survey (NHTS) of 2017, the American Housing Survey (AHS) of 2019, and the Consumer Expenditure Survey - Interview (CEI) for the years 2015-2019. The underlying statistical techniques are included in an open-source $R$ package, fusionModel, that provides generic tools for the creation, analysis, and validation of fused microdata. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: 35 pages, 6 figures

arXiv:2307.09423 [pdf, other]

Scaling Laws for Imitation Learning in Single-Agent Games

Authors: Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik Narasimhan, Sham Kakade

Abstract: Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, many works find it is often unable to fully recover the underlying expert behavior, even in constrained environments like single-agent games. However, none of these works deeply investigate the role of scaling up the model and data size. Inspired by recent work in Natural Language Processing (NLP) where "scali… ▽ More Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, many works find it is often unable to fully recover the underlying expert behavior, even in constrained environments like single-agent games. However, none of these works deeply investigate the role of scaling up the model and data size. Inspired by recent work in Natural Language Processing (NLP) where "scaling up" has resulted in increasingly more capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games. We first demonstrate our findings on a variety of Atari games, and thereafter focus on the extremely challenging game of NetHack. In all games, we find that IL loss and mean return scale smoothly with the compute budget (FLOPs) and are strongly correlated, resulting in power laws for training compute-optimal IL agents. Finally, we forecast and train several NetHack agents with IL and find they outperform prior state-of-the-art by 1.5x in all settings. Our work both demonstrates the scaling behavior of imitation learning in a variety of single-agent games, as well as the viability of scaling up current approaches for increasingly capable agents in NetHack, a game that remains elusively hard for current AI systems. △ Less

Submitted 19 December, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: Accepted at TMLR 2024

arXiv:2307.04998 [pdf, other]

Selective Sampling and Imitation Learning via Online Regression

Authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu

Abstract: We consider the problem of Imitation Learning (IL) by actively querying noisy expert for feedback. While imitation learning has been empirically successful, much of prior work assumes access to noiseless expert feedback which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be sho… ▽ More We consider the problem of Imitation Learning (IL) by actively querying noisy expert for feedback. While imitation learning has been empirically successful, much of prior work assumes access to noiseless expert feedback which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to be successful. In contrast, in this work, we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback. Our contributions are twofold: First, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds for the regret and the number of queries. Next, we extend this analysis to the problem of IL with noisy expert feedback and provide a new IL algorithm that makes limited queries. Our algorithm for selective sampling leverages function approximation, and relies on an online regression oracle w.r.t.~the given model class to predict actions, and to decide whether to query the expert for its label. On the theoretical side, the regret bound of our algorithm is upper bounded by the regret of the online regression oracle, while the query complexity additionally depends on the eluder dimension of the model class. We complement this with a lower bound that demonstrates that our results are tight. We extend our selective sampling algorithm for IL with general function approximation and provide bounds on both the regret and the number of queries made to the noisy expert. A key novelty here is that our regret and query complexity bounds only depend on the number of times the optimal policy (and not the noisy expert, or the learner) go to states that have a small margin. △ Less

Submitted 10 July, 2023; originally announced July 2023.

arXiv:2305.06082 [pdf, ps, other]

Best Arm Identification in Bandits with Limited Precision Sampling

Authors: Kota Srinivas Reddy, P. N. Karthik, Nikhil Karamchandani, Jayakrishnan Nair

Abstract: We study best arm identification in a variant of the multi-armed bandit problem where the learner has limited precision in arm selection. The learner can only sample arms via certain exploration bundles, which we refer to as boxes. In particular, at each sampling epoch, the learner selects a box, which in turn causes an arm to get pulled as per a box-specific probability distribution. The pulled a… ▽ More We study best arm identification in a variant of the multi-armed bandit problem where the learner has limited precision in arm selection. The learner can only sample arms via certain exploration bundles, which we refer to as boxes. In particular, at each sampling epoch, the learner selects a box, which in turn causes an arm to get pulled as per a box-specific probability distribution. The pulled arm and its instantaneous reward are revealed to the learner, whose goal is to find the best arm by minimising the expected stopping time, subject to an upper bound on the error probability. We present an asymptotic lower bound on the expected stopping time, which holds as the error probability vanishes. We show that the optimal allocation suggested by the lower bound is, in general, non-unique and therefore challenging to track. We propose a modified tracking-based algorithm to handle non-unique optimal allocations, and demonstrate that it is asymptotically optimal. We also present non-asymptotic lower and upper bounds on the stopping time in the simpler setting when the arms accessible from one box do not overlap with those of others. △ Less

Submitted 10 May, 2023; originally announced May 2023.

Comments: ISIT 2023

arXiv:2304.08740 [pdf, other]

Estimating Joint Probability Distribution With Low-Rank Tensor Decomposition, Radon Transforms and Dictionaries

Authors: Pranava Singhal, Waqar Mirza, Ajit Rajwade, Karthik S. Gurumoorthy

Abstract: In this paper, we describe a method for estimating the joint probability density from data samples by assuming that the underlying distribution can be decomposed as a mixture of product densities with few mixture components. Prior works have used such a decomposition to estimate the joint density from lower-dimensional marginals, which can be estimated more reliably with the same number of samples… ▽ More In this paper, we describe a method for estimating the joint probability density from data samples by assuming that the underlying distribution can be decomposed as a mixture of product densities with few mixture components. Prior works have used such a decomposition to estimate the joint density from lower-dimensional marginals, which can be estimated more reliably with the same number of samples. We combine two key ideas: dictionaries to represent 1-D densities, and random projections to estimate the joint distribution from 1-D marginals, explored separately in prior work. Our algorithm benefits from improved sample complexity over the previous dictionary-based approach by using 1-D marginals for reconstruction. We evaluate the performance of our method on estimating synthetic probability densities and compare it with the previous dictionary-based approach and Gaussian Mixture Models (GMMs). Our algorithm outperforms these other approaches in all the experimental settings. △ Less

Submitted 18 April, 2023; originally announced April 2023.

MSC Class: 62G07

arXiv:2304.02025 [pdf, other]

doi 10.1038/s41598-023-44589-3

Estimating Global Identifiability Using Conditional Mutual Information in a Bayesian Framework

Authors: Sahil Bhola, Karthik Duraisamy

Abstract: A novel information-theoretic approach is proposed to assess the global practical identifiability of Bayesian statistical models. Based on the concept of conditional mutual information, an estimate of information gained for each model parameter is used to quantify the identifiability with practical considerations. No assumptions are made about the structure of the statistical model or the prior di… ▽ More A novel information-theoretic approach is proposed to assess the global practical identifiability of Bayesian statistical models. Based on the concept of conditional mutual information, an estimate of information gained for each model parameter is used to quantify the identifiability with practical considerations. No assumptions are made about the structure of the statistical model or the prior distribution while constructing the estimator. The estimator has the following notable advantages: first, no controlled experiment or data is required to conduct the practical identifiability analysis; second, unlike popular variance-based global sensitivity analysis methods, different forms of uncertainties, such as model-form, parameter, or measurement can be taken into account; third, the identifiability analysis is global, and therefore independent of a realization of the parameters. If an individual parameter has low identifiability, it can belong to an identifiable subset such that parameters within the subset have a functional relationship and thus have a combined effect on the statistical model. The practical identifiability framework is extended to highlight the dependencies between parameter pairs that emerge a posteriori to find identifiable parameter subsets. The applicability of the proposed approach is demonstrated using a linear Gaussian model and a non-linear methane-air reduced kinetics model. It is shown that by examining the information gained for each model parameter along with its dependencies with other parameters, a subset of parameters that can be estimated with high posterior certainty can be found. △ Less

Submitted 21 September, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

arXiv:2303.04390 [pdf, other]

Many-core algorithms for high-dimensional gradients on phylogenetic trees

Authors: Karthik Gangavarapu, Xiang Ji, Guy Baele, Mathieu Fourment, Philippe Lemey, Frederick A. Matsen IV, Marc A. Suchard

Abstract: The rapid growth in genomic pathogen data spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate parameters of these phylogenetic models where the dimensions of the parameters increase with the number of sequences $N$. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-… ▽ More The rapid growth in genomic pathogen data spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate parameters of these phylogenetic models where the dimensions of the parameters increase with the number of sequences $N$. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-length-specific (BLS) parameters that traditionally takes $\mathcal{O}(N^2)$ operations using the standard pruning algorithm. A recent study proposes an approach to calculate this gradient in $\mathcal{O}(N)$, enabling researchers to take advantage of gradient-based samplers such as HMC. The CPU implementation of this approach makes the calculation of the gradient computationally tractable for nucleotide-based models but falls short in performance for larger state-space size models, such as codon models. Here, we describe novel massively parallel algorithms to calculate the gradient of the log-likelihood wrt all BLS parameters that take advantage of graphics processing units (GPUs) and result in many fold higher speedups over previous CPU implementations. We benchmark these GPU algorithms on three computing systems using three evolutionary inference examples: carnivores, dengue and yeast, and observe a greater than 128-fold speedup over the CPU implementation for codon-based models and greater than 8-fold speedup for nucleotide-based models. As a practical demonstration, we also estimate the timing of the first introduction of West Nile virus into the continental Unites States under a codon model with a relaxed molecular clock from 104 full viral genomes, an inference task previously intractable. We provide an implementation of our GPU algorithms in BEAGLE v4.0.0, an open source library for statistical phylogenetics that enables parallel calculations on multi-core CPUs and GPUs. △ Less

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2211.07484 [pdf, ps, other]

Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression

Authors: Aleksandrs Slivkins, Xingyu Zhou, Karthik Abinav Sankararaman, Dylan J. Foster

Abstract: We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We provide the first algorithm f… ▽ More We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and statistically optimal under mild assumptions. Further, we provide the first vanishing-regret guarantees for CBwLC (or CBwK) that extend beyond the stochastic environment. We side-step strong impossibility results from prior work by identifying a weaker (and, arguably, fairer) benchmark to compare against. Our algorithm builds on LagrangeBwK (Immorlica et al., FOCS 2019), a Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, ICML 2020), a regression-based technique for contextual bandits. Our analysis leverages the inherent modularity of both techniques. △ Less

Submitted 26 November, 2024; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: A preliminary version of this paper, authored by A. Slivkins, K.A. Sankararaman and D.J. Foster, has been published at COLT 2023. The present version (since Jun'24) features an important improvement, due to Xingyu Zhou. The Oct'24 version fixes an inaccuracy in Section 6 when the analysis from Section 4 is invoked

arXiv:2210.14843 [pdf, other]

TuneUp: A Simple Improved Training Strategy for Graph Neural Networks

Authors: Weihua Hu, Kaidi Cao, Kexin Huang, Edward W Huang, Karthik Subbian, Kenji Kawaguchi, Jure Leskovec

Abstract: Despite recent advances in Graph Neural Networks (GNNs), their training strategies remain largely under-explored. The conventional training strategy learns over all nodes in the original graph(s) equally, which can be sub-optimal as certain nodes are often more difficult to learn than others. Here we present TuneUp, a simple curriculum-based training strategy for improving the predictive performan… ▽ More Despite recent advances in Graph Neural Networks (GNNs), their training strategies remain largely under-explored. The conventional training strategy learns over all nodes in the original graph(s) equally, which can be sub-optimal as certain nodes are often more difficult to learn than others. Here we present TuneUp, a simple curriculum-based training strategy for improving the predictive performance of GNNs. TuneUp trains a GNN in two stages. In the first stage, TuneUp applies conventional training to obtain a strong base GNN. The base GNN tends to perform well on head nodes (nodes with large degrees) but less so on tail nodes (nodes with small degrees). Therefore, the second stage of TuneUp focuses on improving prediction on the difficult tail nodes by further training the base GNN on synthetically generated tail node data. We theoretically analyze TuneUp and show it provably improves generalization performance on tail nodes. TuneUp is simple to implement and applicable to a broad range of GNN architectures and prediction tasks. Extensive evaluation of TuneUp on five diverse GNN architectures, three types of prediction tasks, and both transductive and inductive settings shows that TuneUp significantly improves the performance of the base GNN on tail nodes, while often even improving the performance on head nodes. Altogether, TuneUp produces up to 57.6% and 92.2% relative predictive performance improvement in the transductive and the challenging inductive settings, respectively. △ Less

Submitted 26 August, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

arXiv:2209.12667 [pdf, other]

Shape And Structure Preserving Differential Privacy

Authors: Carlos Soto, Karthik Bharath, Matthew Reimherr, Aleksandra Slavkovic

Abstract: It is common for data structures such as images and shapes of 2D objects to be represented as points on a manifold. The utility of a mechanism to produce sanitized differentially private estimates from such data is intimately linked to how compatible it is with the underlying structure and geometry of the space. In particular, as recently shown, utility of the Laplace mechanism on a positively cur… ▽ More It is common for data structures such as images and shapes of 2D objects to be represented as points on a manifold. The utility of a mechanism to produce sanitized differentially private estimates from such data is intimately linked to how compatible it is with the underlying structure and geometry of the space. In particular, as recently shown, utility of the Laplace mechanism on a positively curved manifold, such as Kendall's 2D shape space, is significantly influences by the curvature. Focusing on the problem of sanitizing the Fréchet mean of a sample of points on a manifold, we exploit the characterisation of the mean as the minimizer of an objective function comprised of the sum of squared distances and develop a K-norm gradient mechanism on Riemannian manifolds that favors values that produce gradients close to the the zero of the objective function. For the case of positively curved manifolds, we describe how using the gradient of the squared distance function offers better control over sensitivity than the Laplace mechanism, and demonstrate this numerically on a dataset of shapes of corpus callosa. Further illustrations of the mechanism's utility on a sphere and the manifold of symmetric positive definite matrices are also presented. △ Less

Submitted 21 September, 2022; originally announced September 2022.

Comments: 15 pages (including supplementary material and references), 3 figures (including supplementary material), to be published in NeurIPS 2022

arXiv:2208.09215 [pdf, other]

Almost Cost-Free Communication in Federated Best Arm Identification

Authors: Kota Srinivas Reddy, P. N. Karthik, Vincent Y. F. Tan

Abstract: We study the problem of best arm identification in a federated learning multi-armed bandit setup with a central server and multiple clients. Each client is associated with a multi-armed bandit in which each arm yields {\em i.i.d.}\ rewards following a Gaussian distribution with an unknown mean and known variance. The set of arms is assumed to be the same at all the clients. We define two notions o… ▽ More We study the problem of best arm identification in a federated learning multi-armed bandit setup with a central server and multiple clients. Each client is associated with a multi-armed bandit in which each arm yields {\em i.i.d.}\ rewards following a Gaussian distribution with an unknown mean and known variance. The set of arms is assumed to be the same at all the clients. We define two notions of best arm -- local and global. The local best arm at a client is the arm with the largest mean among the arms local to the client, whereas the global best arm is the arm with the largest average mean across all the clients. We assume that each client can only observe the rewards from its local arms and thereby estimate its local best arm. The clients communicate with a central server on uplinks that entail a cost of $C\ge0$ units per usage per uplink. The global best arm is estimated at the server. The goal is to identify the local best arms and the global best arm with minimal total cost, defined as the sum of the total number of arm selections at all the clients and the total communication cost, subject to an upper bound on the error probability. We propose a novel algorithm {\sc FedElim} that is based on successive elimination and communicates only in exponential time steps and obtain a high probability instance-dependent upper bound on its total cost. The key takeaway from our paper is that for any $C\geq 0$ and error probabilities sufficiently small, the total number of arm selections (resp.\ the total cost) under {\sc FedElim} is at most~$2$ (resp.~$3$) times the maximum total number of arm selections under its variant that communicates in every time step. Additionally, we show that the latter is optimal in expectation up to a constant factor, thereby demonstrating that communication is almost cost-free in {\sc FedElim}. We numerically validate the efficacy of {\sc FedElim}. △ Less

Submitted 19 December, 2022; v1 submitted 19 August, 2022; originally announced August 2022.

Comments: Accepted to AAAI 2023

arXiv:2208.06115 [pdf, other]

A Nonparametric Approach with Marginals for Modeling Consumer Choice

Authors: Yanqiu Ruan, Xiaobo Li, Karthyek Murthy, Karthik Natarajan

Abstract: Given data on the choices made by consumers for different offer sets, a key challenge is to develop parsimonious models that describe and predict consumer choice behavior while being amenable to prescriptive tasks such as pricing and assortment optimization. The marginal distribution model (MDM) is one such model, which requires only the specification of marginal distributions of the random utilit… ▽ More Given data on the choices made by consumers for different offer sets, a key challenge is to develop parsimonious models that describe and predict consumer choice behavior while being amenable to prescriptive tasks such as pricing and assortment optimization. The marginal distribution model (MDM) is one such model, which requires only the specification of marginal distributions of the random utilities. This paper aims to establish necessary and sufficient conditions for given choice data to be consistent with the MDM hypothesis, inspired by the usefulness of similar characterizations for the random utility model (RUM). This endeavor leads to an exact characterization of the set of choice probabilities that the MDM can represent. Verifying the consistency of choice data with this characterization is equivalent to solving a polynomial-sized linear program. Since the analogous verification task for RUM is computationally intractable and neither of these models subsumes the other, MDM is helpful in striking a balance between tractability and representational power. The characterization is then used with robust optimization for making data-driven sales and revenue predictions for new unseen assortments. When the choice data lacks consistency with the MDM hypothesis, finding the best-fitting MDM choice probabilities reduces to solving a mixed integer convex program. Numerical results using real world data and synthetic data demonstrate that MDM exhibits competitive representational power and prediction performance compared to RUM and parametric models while being significantly faster in computation than RUM. △ Less

Submitted 14 April, 2025; v1 submitted 12 August, 2022; originally announced August 2022.

arXiv:2207.13797 [pdf, other]

Identification and Inference with Min-over-max Estimators for the Measurement of Labor Market Fairness

Authors: Karthik Rajkumar

Abstract: These notes shows how to do inference on the Demographic Parity (DP) metric. Although the metric is a complex statistic involving min and max computations, we propose a smooth approximation of those functions and derive its asymptotic distribution. The limit of these approximations and their gradients converge to those of the true max and min functions, wherever they exist. More importantly, when… ▽ More These notes shows how to do inference on the Demographic Parity (DP) metric. Although the metric is a complex statistic involving min and max computations, we propose a smooth approximation of those functions and derive its asymptotic distribution. The limit of these approximations and their gradients converge to those of the true max and min functions, wherever they exist. More importantly, when the true max and min functions are not differentiable, the approximations still are, and they provide valid asymptotic inference everywhere in the domain. We conclude with some directions on how to compute confidence intervals for DP, how to test if it is under 0.8 (the U.S. Equal Employment Opportunity Commission fairness threshold), and how to do inference in an A/B test. △ Less

Submitted 27 July, 2022; originally announced July 2022.

Comments: 12 pages, 3 figures

arXiv:2207.10914 [pdf, other]

Spatially Penalised Registration of Multivariate Functional Data

Authors: Xiaohan Guo, Sebastian Kurtek, Karthik Bharath

Abstract: Registration of multivariate functional data involves handling of both cross-component and cross-observation phase variations. Allowing for the two phase variations to be modelled as general diffeomorphic time warpings, in this work we focus on the hitherto unconsidered setting where phase variation of the component functions are spatially correlated. We propose an algorithm to optimize a metric-b… ▽ More Registration of multivariate functional data involves handling of both cross-component and cross-observation phase variations. Allowing for the two phase variations to be modelled as general diffeomorphic time warpings, in this work we focus on the hitherto unconsidered setting where phase variation of the component functions are spatially correlated. We propose an algorithm to optimize a metric-based objective function for registration with a novel penalty term that incorporates the spatial correlation between the component phase variations through a kriging estimate of an appropriate phase random field. The penalty term encourages the overall phase at a particular location to be similar to the spatially weighted average phase in its neighbourhood, and thus engenders a regularization that prevents over-alignment. Utility of the registration method, and its superior performance compared to methods that fail to account for the spatial correlation, is demonstrated through performance on simulated examples and two multivariate functional datasets pertaining to EEG signals and ozone concentrations. The generality of the framework opens up the possibility for extension to settings involving different forms of correlation between the component functions and their phases. △ Less

Submitted 22 July, 2022; originally announced July 2022.

arXiv:2206.13063 [pdf, other]

On the Complexity of Adversarial Decision Making

Authors: Dylan J. Foster, Alexander Rakhlin, Ayush Sekhari, Karthik Sridharan

Abstract: A central problem in online learning and decision making -- from bandits to reinforcement learning -- is to understand what modeling assumptions lead to sample-efficient learning guarantees. We consider a general adversarial decision making framework that encompasses (structured) bandit problems with adversarial rewards and reinforcement learning problems with adversarial dynamics. Our main result… ▽ More A central problem in online learning and decision making -- from bandits to reinforcement learning -- is to understand what modeling assumptions lead to sample-efficient learning guarantees. We consider a general adversarial decision making framework that encompasses (structured) bandit problems with adversarial rewards and reinforcement learning problems with adversarial dynamics. Our main result is to show -- via new upper and lower bounds -- that the Decision-Estimation Coefficient, a complexity measure introduced by Foster et al. in the stochastic counterpart to our setting, is necessary and sufficient to obtain low regret for adversarial decision making. However, compared to the stochastic setting, one must apply the Decision-Estimation Coefficient to the convex hull of the class of models (or, hypotheses) under consideration. This establishes that the price of accommodating adversarial rewards or dynamics is governed by the behavior of the model class under convexification, and recovers a number of existing results -- both positive and negative. En route to obtaining these guarantees, we provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures, including the Information Ratio of Russo and Van Roy and the Exploration-by-Optimization objective of Lattimore and György. △ Less

Submitted 27 June, 2022; originally announced June 2022.

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting. △ Less

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2206.03040 [pdf, other]

doi 10.1145/3534678.3539194

Learning Backward Compatible Embeddings

Authors: Weihua Hu, Rajas Bansal, Kaidi Cao, Nikhil Rao, Karthik Subbian, Jure Leskovec

Abstract: Embeddings, low-dimensional vector representation of objects, are fundamental in building modern machine learning systems. In industrial settings, there is usually an embedding team that trains an embedding model to solve intended tasks (e.g., product recommendation). The produced embeddings are then widely consumed by consumer teams to solve their unintended tasks (e.g., fraud detection). However… ▽ More Embeddings, low-dimensional vector representation of objects, are fundamental in building modern machine learning systems. In industrial settings, there is usually an embedding team that trains an embedding model to solve intended tasks (e.g., product recommendation). The produced embeddings are then widely consumed by consumer teams to solve their unintended tasks (e.g., fraud detection). However, as the embedding model gets updated and retrained to improve performance on the intended task, the newly-generated embeddings are no longer compatible with the existing consumer models. This means that historical versions of the embeddings can never be retired or all consumer teams have to retrain their models to make them compatible with the latest version of the embeddings, both of which are extremely costly in practice. Here we study the problem of embedding version updates and their backward compatibility. We formalize the problem where the goal is for the embedding team to keep updating the embedding version, while the consumer teams do not have to retrain their models. We develop a solution based on learning backward compatible embeddings, which allows the embedding model version to be updated frequently, while also allowing the latest version of the embedding to be quickly transformed into any backward compatible historical version of it, so that consumer teams do not have to retrain their models. Under our framework, we explore six methods and systematically evaluate them on a real-world recommender system application. We show that the best method, which we call BC-Aligner, maintains backward compatibility with existing unintended tasks even after multiple model version updates. Simultaneously, BC-Aligner achieves the intended task performance similar to the embedding model that is solely optimized for the intended task. △ Less

Submitted 7 June, 2022; originally announced June 2022.

Comments: KDD 2022, Applied Data Science Track

arXiv:2203.15236 [pdf, ps, other]

Best Arm Identification in Restless Markov Multi-Armed Bandits

Authors: P. N. Karthik, Kota Srinivas Reddy, Vincent Y. F. Tan

Abstract: We study the problem of identifying the best arm in a multi-armed bandit environment when each arm is a time-homogeneous and ergodic discrete-time Markov process on a common, finite state space. The state evolution on each arm is governed by the arm's transition probability matrix (TPM). A decision entity that knows the set of arm TPMs but not the exact mapping of the TPMs to the arms, wishes to f… ▽ More We study the problem of identifying the best arm in a multi-armed bandit environment when each arm is a time-homogeneous and ergodic discrete-time Markov process on a common, finite state space. The state evolution on each arm is governed by the arm's transition probability matrix (TPM). A decision entity that knows the set of arm TPMs but not the exact mapping of the TPMs to the arms, wishes to find the index of the best arm as quickly as possible, subject to an upper bound on the error probability. The decision entity selects one arm at a time sequentially, and all the unselected arms continue to undergo state evolution ({\em restless} arms). For this problem, we derive the first-known problem instance-dependent asymptotic lower bound on the growth rate of the expected time required to find the index of the best arm, where the asymptotics is as the error probability vanishes. Further, we propose a sequential policy that, for an input parameter $R$, forcibly selects an arm that has not been selected for $R$ consecutive time instants. We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R\to\infty$. The question of whether, in general, the limiting value of the upper bound as $R\to\infty$ matches with the lower bound, remains open. We identify a special case in which the upper and the lower bounds match. Prior works on best arm identification have dealt with (a) independent and identically distributed observations from the arms, and (b) rested Markov arms, whereas our work deals with the more difficult setting of restless Markov arms. △ Less

Submitted 29 March, 2022; originally announced March 2022.

Comments: 41 pages

arXiv:2203.01667 [pdf, other]

Joint Probability Estimation Using Tensor Decomposition and Dictionaries

Authors: Shaan ul Haque, Ajit Rajwade, Karthik S. Gurumoorthy

Abstract: In this work, we study non-parametric estimation of joint probabilities of a given set of discrete and continuous random variables from their (empirically estimated) 2D marginals, under the assumption that the joint probability could be decomposed and approximated by a mixture of product densities/mass functions. The problem of estimating the joint probability density function (PDF) using semi-par… ▽ More In this work, we study non-parametric estimation of joint probabilities of a given set of discrete and continuous random variables from their (empirically estimated) 2D marginals, under the assumption that the joint probability could be decomposed and approximated by a mixture of product densities/mass functions. The problem of estimating the joint probability density function (PDF) using semi-parametric techniques such as Gaussian Mixture Models (GMMs) is widely studied. However such techniques yield poor results when the underlying densities are mixtures of various other families of distributions such as Laplacian or generalized Gaussian, uniform, Cauchy, etc. Further, GMMs are not the best choice to estimate joint distributions which are hybrid in nature, i.e., some random variables are discrete while others are continuous. We present a novel approach for estimating the PDF using ideas from dictionary representations in signal processing coupled with low rank tensor decompositions. To the best our knowledge, this is the first work on estimating joint PDFs employing dictionaries alongside tensor decompositions. We create a dictionary of various families of distributions by inspecting the data, and use it to approximate each decomposed factor of the product in the mixture. Our approach can naturally handle hybrid $N$-dimensional distributions. We test our approach on a variety of synthetic and real datasets to demonstrate its effectiveness in terms of better classification rates and lower error rates, when compared to state of the art estimators. △ Less

Submitted 3 March, 2022; originally announced March 2022.

arXiv:2203.01161 [pdf, ps, other]

Discrete Optimal Transport with Independent Marginals is #P-Hard

Authors: Bahar Taşkesen, Soroosh Shafieezadeh-Abadeh, Daniel Kuhn, Karthik Natarajan

Abstract: We study the computational complexity of the optimal transport problem that evaluates the Wasserstein distance between the distributions of two K-dimensional discrete random vectors. The best known algorithms for this problem run in polynomial time in the maximum of the number of atoms of the two distributions. However, if the components of either random vector are independent, then this number ca… ▽ More We study the computational complexity of the optimal transport problem that evaluates the Wasserstein distance between the distributions of two K-dimensional discrete random vectors. The best known algorithms for this problem run in polynomial time in the maximum of the number of atoms of the two distributions. However, if the components of either random vector are independent, then this number can be exponential in K even though the size of the problem description scales linearly with K. We prove that the described optimal transport problem is #P-hard even if all components of the first random vector are independent uniform Bernoulli random variables, while the second random vector has merely two atoms, and even if only approximate solutions are sought. We also develop a dynamic programming-type algorithm that approximates the Wasserstein distance in pseudo-polynomial time when the components of the first random vector follow arbitrary independent discrete distributions, and we identify special problem instances that can be solved exactly in strongly polynomial time. △ Less

Submitted 14 October, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

arXiv:2201.09371 [pdf, other]

doi 10.1214/22-AOAS1696

Probabilistic Learning of Treatment Trees in Cancer

Authors: Tsung-Hung Yao, Zhenke Wu, Karthik Bharath, Jinju Li, Veerabhadran Baladandayuthapan

Abstract: Accurate identification of synergistic treatment combinations and their underlying biological mechanisms is critical across many disease domains, especially cancer. In translational oncology research, preclinical systems such as patient-derived xenografts (PDX) have emerged as a unique study design evaluating multiple treatments administered to samples from the same human tumor implanted into gene… ▽ More Accurate identification of synergistic treatment combinations and their underlying biological mechanisms is critical across many disease domains, especially cancer. In translational oncology research, preclinical systems such as patient-derived xenografts (PDX) have emerged as a unique study design evaluating multiple treatments administered to samples from the same human tumor implanted into genetically identical mice. In this paper, we propose a novel Bayesian probabilistic tree-based framework for PDX data to investigate the hierarchical relationships between treatments by inferring treatment cluster trees, referred to as treatment trees (Rx-tree). The framework motivates a new metric of mechanistic similarity between two or more treatments accounting for inherent uncertainty in tree estimation; treatments with a high estimated similarity have potentially high mechanistic synergy. Building upon Dirichlet Diffusion Trees, we derive a closed-form marginal likelihood encoding the tree structure, which facilitates computationally efficient posterior inference via a new two-stage algorithm. Simulation studies demonstrate superior performance of the proposed method in recovering the tree structure and treatment similarities. Our analyses of a recently collated PDX dataset produce treatment similarity estimates that show a high degree of concordance with known biological mechanisms across treatments in five different cancers. More importantly, we uncover new and potentially effective combination therapies that confer synergistic regulation of specific downstream biological pathways for future clinical investigations. Our accompanying code, data, and shiny application for visualization of results are available at: https://github.com/bayesrx/RxTree. △ Less

Submitted 23 January, 2022; originally announced January 2022.

arXiv:2111.02516 [pdf, other]

Differential Privacy Over Riemannian Manifolds

Authors: Matthew Reimherr, Karthik Bharath, Carlos Soto

Abstract: In this work we consider the problem of releasing a differentially private statistical summary that resides on a Riemannian manifold. We present an extension of the Laplace or K-norm mechanism that utilizes intrinsic distances and volumes on the manifold. We also consider in detail the specific case where the summary is the Fréchet mean of data residing on a manifold. We demonstrate that our mecha… ▽ More In this work we consider the problem of releasing a differentially private statistical summary that resides on a Riemannian manifold. We present an extension of the Laplace or K-norm mechanism that utilizes intrinsic distances and volumes on the manifold. We also consider in detail the specific case where the summary is the Fréchet mean of data residing on a manifold. We demonstrate that our mechanism is rate optimal and depends only on the dimension of the manifold, not on the dimension of any ambient space, while also showing how ignoring the manifold structure can decrease the utility of the sanitized summary. We illustrate our framework in two examples of particular interest in statistics: the space of symmetric positive definite matrices, which is used for covariance matrices, and the sphere, which can be used as a space for modeling discrete distributions. △ Less

Submitted 3 November, 2021; originally announced November 2021.

Comments: 15 pages (including supplementary material and references), 2 figures (including supplementary material), published in NeurIPS

arXiv:2109.01965 [pdf, other]

Scalable Feature Selection for (Multitask) Gradient Boosted Trees

Authors: Cuize Han, Nikhil Rao, Daria Sorokina, Karthik Subbian

Abstract: Gradient Boosted Decision Trees (GBDTs) are widely used for building ranking and relevance models in search and recommendation. Considerations such as latency and interpretability dictate the use of as few features as possible to train these models. Feature selection in GBDT models typically involves heuristically ranking the features by importance and selecting the top few, or by performing a ful… ▽ More Gradient Boosted Decision Trees (GBDTs) are widely used for building ranking and relevance models in search and recommendation. Considerations such as latency and interpretability dictate the use of as few features as possible to train these models. Feature selection in GBDT models typically involves heuristically ranking the features by importance and selecting the top few, or by performing a full backward feature elimination routine. On-the-fly feature selection methods proposed previously scale suboptimally with the number of features, which can be daunting in high dimensional settings. We develop a scalable forward feature selection variant for GBDT, via a novel group testing procedure that works well in high dimensions, and enjoys favorable theoretical performance and computational guarantees. We show via extensive experiments on both public and proprietary datasets that the proposed method offers significant speedups in training time, while being as competitive as existing GBDT methods in terms of model performance metrics. We also extend the method to the multitask setting, allowing the practitioner to select common features across tasks, as well as selecting task-specific features. △ Less

Submitted 4 September, 2021; originally announced September 2021.

Comments: Correct a mistake in the proof of Lemma B1 in http://proceedings.mlr.press/v108/han20a.html

Journal ref: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:885-894, 2020

arXiv:2106.15436 [pdf, other]

Topo-Geometric Analysis of Variability in Point Clouds using Persistence Landscapes

Authors: James Matuk, Sebastian Kurtek, Karthik Bharath

Abstract: Topological data analysis provides a set of tools to uncover low-dimensional structure in noisy point clouds. Prominent amongst the tools is persistence homology, which summarizes birth-death times of homological features using data objects known as persistence diagrams. To better aid statistical analysis, a functional representation of the diagrams, known as persistence landscapes, enable use of… ▽ More Topological data analysis provides a set of tools to uncover low-dimensional structure in noisy point clouds. Prominent amongst the tools is persistence homology, which summarizes birth-death times of homological features using data objects known as persistence diagrams. To better aid statistical analysis, a functional representation of the diagrams, known as persistence landscapes, enable use of functional data analysis and machine learning tools. Topological and geometric variabilities inherent in point clouds are confounded in both persistence diagrams and landscapes, and it is important to distinguish topological signal from noise to draw reliable conclusions on the structure of the point clouds when using persistence homology. We develop a framework for decomposing variability in persistence diagrams into topological signal and topological noise through alignment of persistence landscapes using an elastic Riemannian metric. Aligned landscapes (amplitude) isolate the topological signal. Reparameterizations used for landscape alignment (phase) are linked to a resolution parameter used to generate persistence diagrams, and capture topological noise in the form of geometric, global scaling and sampling variabilities. We illustrate the importance of decoupling topological signal and topological noise in persistence diagrams (landscapes) using several simulated examples. We also demonstrate that our approach provides novel insights in two real data studies. △ Less

Submitted 1 February, 2024; v1 submitted 29 June, 2021; originally announced June 2021.

arXiv:2106.11880 [pdf, other]

Dynamic Customer Embeddings for Financial Service Applications

Authors: Nima Chitsazan, Samuel Sharpe, Dwipam Katariya, Qianyu Cheng, Karthik Rajasethupathy

Abstract: As financial services (FS) companies have experienced drastic technology driven changes, the availability of new data streams provides the opportunity for more comprehensive customer understanding. We propose Dynamic Customer Embeddings (DCE), a framework that leverages customers' digital activity and a wide range of financial context to learn dense representations of customers in the FS industry.… ▽ More As financial services (FS) companies have experienced drastic technology driven changes, the availability of new data streams provides the opportunity for more comprehensive customer understanding. We propose Dynamic Customer Embeddings (DCE), a framework that leverages customers' digital activity and a wide range of financial context to learn dense representations of customers in the FS industry. Our method examines customer actions and pageviews within a mobile or web digital session, the sequencing of the sessions themselves, and snapshots of common financial features across our organization at the time of login. We test our customer embeddings using real world data in three prediction problems: 1) the intent of a customer in their next digital session, 2) the probability of a customer calling the call centers after a session, and 3) the probability of a digital session to be fraudulent. DCE showed performance lift in all three downstream problems. △ Less

Submitted 22 June, 2021; originally announced June 2021.

Comments: ICML Workshop on Representation Learning for Finance and E-Commerce Applications

Showing 1–50 of 171 results for author: Karthik