Search | arXiv e-print repository

S-CFE: Simple Counterfactual Explanations

Authors: Shpresim Sadiku, Moritz Wagner, Sai Ganesh Nagarajan, Sebastian Pokutta

Abstract: We study the problem of finding optimal sparse, manifold-aligned counterfactual explanations for classifiers. Canonically, this can be formulated as an optimization problem with multiple non-convex components, including classifier loss functions and manifold alignment (or \emph{plausibility}) metrics. The added complexity of enforcing \emph{sparsity}, or shorter explanations, complicates the problem further. Existing methods often focus on specific models and plausibility measures, relying on convex $\ell_1$ regularizers to enforce sparsity. In this paper, we tackle the canonical formulation using the accelerated proximal gradient (APG) method, a simple yet efficient first-order procedure capable of handling smooth non-convex objectives and non-smooth $\ell_p$ (where $0 \leq p < 1$) regularizers. This enables our approach to seamlessly incorporate various classifiers and plausibility measures while producing sparser solutions. Our algorithm only requires differentiable data-manifold regularizers and supports box constraints for bounded feature ranges, ensuring the generated counterfactuals remain \emph{actionable}. Finally, experiments on real-world datasets demonstrate that our approach effectively produces sparse, manifold-aligned counterfactual explanations while maintaining proximity to the factual data and computational efficiency.

Submitted 28 January, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

arXiv:2106.09510 [pdf, ps, other]

Extremal mild solutions for Hilfer fractional evolution equation with mixed monotone Impulsive conditions

Authors: Divya Raghavan, Sukavanam Nagarajan

Abstract: The well-established mixed monotone iterative technique, which is used to study the existence and uniqueness of solutions of fractional-order systems, is studied explicitly in this paper for impulsive systems of Hilfer fractional order. The procedure for finding mild $L$-quasi solutions of such impulsive evolution equations with noncompact semigroups involves the measure of noncompactness and Sadovskii's fixed point theorem. An example is provided to illustrate the main results.

Submitted 17 June, 2021; originally announced June 2021.

Comments: 18 pages. arXiv admin note: text overlap with arXiv:2011.11434

MSC Class: 26A33; 34K30; 34K45; 47D06

arXiv:2011.11434 [pdf, ps, other]

Extremal mild solutions of Hilfer fractional Impulsive systems

Authors: Divya Raghavan, Sukavanam Nagarajan

Abstract: The well-established monotone iterative technique, which is used to study the existence and uniqueness of solutions of fractional impulsive systems, is extended to Hilfer fractional order in this paper. The results are derived by using the method of upper and lower solutions and the Gronwall inequality. Also, conditions on the measure of noncompactness are used effectively to prove the main result.

Submitted 23 November, 2020; originally announced November 2020.

Comments: 10 pages

MSC Class: 26A33; 34K30; 34K45; 47D06

arXiv:2009.09419 [pdf, ps, other]

Generalized Mittag-Leffler stability of fractional impulsive differential system

Authors: Divya Raghavan, Sukavanam Nagarajan, Chengbo Zhai

Abstract: This paper establishes integral representations of mild solutions of impulsive Hilfer fractional differential equations with impulsive conditions and fluctuating lower bounds at impulsive points. Further, the paper provides sufficient conditions for generalized Mittag-Leffler stability of a class of impulsive fractional differential systems with Hilfer order. The analysis covers both instantaneous and non-instantaneous impulsive conditions. The theory utilizes continuous Lyapunov functions to ascertain the stability conditions. An example is provided to study the solution of the system with a changeable lower bound for the non-instantaneous impulsive conditions.

Submitted 17 May, 2022; v1 submitted 20 September, 2020; originally announced September 2020.

MSC Class: 33E12; 93D20; 93D05

arXiv:2006.09735 [pdf, other]

Efficient Statistics for Sparse Graphical Models from Truncated Samples

Authors: Arnab Bhattacharyya, Rathin Desai, Sai Ganesh Nagarajan, Ioannis Panageas

Abstract: In this paper, we study high-dimensional estimation from truncated samples. We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models. (i) For Gaussian graphical models, suppose $d$-dimensional samples ${\bf x}$ are generated from a Gaussian $N(μ,Σ)$ and observed only if they belong to a subset $S \subseteq \mathbb{R}^d$. We show that $μ$ and $Σ$ can be estimated with error $ε$ in the Frobenius norm, using $\tilde{O}\left(\frac{\textrm{nz}(Σ^{-1})}{ε^2}\right)$ samples from a truncated $\mathcal{N}(μ,Σ)$ and having access to a membership oracle for $S$. The set $S$ is assumed to have non-trivial measure under the unknown distribution but is otherwise arbitrary. (ii) For sparse linear regression, suppose samples $({\bf x},y)$ are generated where $y = {\bf x}^\top{Ω^*} + \mathcal{N}(0,1)$ and $({\bf x}, y)$ is seen only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}$. We consider the case that $Ω^*$ is sparse with a support set of size $k$. Our main result is to establish precise conditions on the problem dimension $d$, the support size $k$, the number of observations $n$, and properties of the samples and the truncation that are sufficient to recover the support of $Ω^*$. Specifically, we show that under some mild assumptions, only $O(k^2 \log d)$ samples are needed to estimate $Ω^*$ in the $\ell_\infty$-norm up to a bounded error.
For both problems, our estimator minimizes the sum of the finite population negative log-likelihood function and an $\ell_1$-regularization term.

Submitted 17 June, 2020; originally announced June 2020.

arXiv:2004.11188 [pdf, ps, other]

Properties of relaxed trajectories of non-linear fractional impulsive control systems

Authors: Divya Raghavan, Sukavanam Nagarajan

Abstract: A non-convex control system governed by a nonlinear impulsive evolution equation of Hilfer fractional order in a Banach space is considered. The existence of an admissible state-control pair is established. Then the introduction of a suitable measure-valued control convexifies the system, and the relaxed system is obtained. Further, the relaxation theorem for the described class is proved along with the existence of an optimal relaxed control.

Submitted 28 May, 2022; v1 submitted 23 April, 2020; originally announced April 2020.

Comments: 21 pages

MSC Class: 37L05; 49J45; 26A33; 49N25

arXiv:2003.00777 [pdf, other]

Better Depth-Width Trade-offs for Neural Networks through the lens of Dynamical Systems

Authors: Vaggos Chatziafratis, Sai Ganesh Nagarajan, Ioannis Panageas

Abstract: The expressivity of neural networks as a function of their depth, width and type of activation units has been an important question in deep learning theory. Recently, depth separation results for ReLU networks were obtained via a new connection with dynamical systems, using a generalized notion of fixed points of a continuous map $f$, called periodic points. In this work, we strengthen the connection with dynamical systems and we improve the existing width lower bounds along several aspects. Our first main result is period-specific width lower bounds that hold under the stronger notion of $L^1$-approximation error, instead of the weaker classification error. Our second contribution is that we provide sharper width lower bounds, still yielding meaningful exponential depth-width separations, in regimes where previous results wouldn't apply. A byproduct of our results is that there exists a universal constant characterizing the depth-width trade-offs, as long as $f$ has odd periods. Technically, our results follow by unveiling a tighter connection between the following three quantities of a given function: its period, its Lipschitz constant and the growth rate of the number of oscillations arising under compositions of the function $f$ with itself.

Submitted 20 July, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: Appeared in ICML 2020

arXiv:1912.04378 [pdf, ps, other]

Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem

Authors: Vaggos Chatziafratis, Sai Ganesh Nagarajan, Ioannis Panageas, Xiao Wang

Abstract: Understanding the representational power of Deep Neural Networks (DNNs) and how their structural properties (e.g., depth, width, type of activation unit) affect the functions they can compute has been an important yet challenging question in deep learning and approximation theory. In a seminal paper, Telgarsky highlighted the benefits of depth by presenting a family of functions (based on simple triangular waves) for which DNNs achieve zero classification error, whereas shallow networks with fewer than exponentially many nodes incur constant error. Even though Telgarsky's work reveals the limitations of shallow neural networks, it does not inform us on why these functions are difficult to represent and in fact he states it as a tantalizing open question to characterize those functions that cannot be well-approximated by smaller depths. In this work, we point to a new connection between DNN expressivity and Sharkovsky's Theorem from dynamical systems, that enables us to characterize the depth-width trade-offs of ReLU networks for representing functions based on the presence of a generalized notion of fixed points, called periodic points (a fixed point is a point of period 1). Motivated by our observation that the triangle waves used in Telgarsky's work contain points of period 3 - a period that is special in that it implies chaotic behavior based on the celebrated result by Li-Yorke - we proceed to give general lower bounds for the width needed to represent periodic functions as a function of the depth.
Technically, the crux of our approach is based on an eigenvalue analysis of the dynamical system associated with such functions.

Submitted 9 December, 2019; originally announced December 2019.

arXiv:1902.06958 [pdf, other]

On the Analysis of EM for truncated mixtures of two Gaussians

Authors: Sai Ganesh Nagarajan, Ioannis Panageas

Abstract: Motivated by a recent result of Daskalakis et al. 2018, we analyze the population version of the Expectation-Maximization (EM) algorithm for the case of \textit{truncated} mixtures of two Gaussians. Truncated samples from a $d$-dimensional mixture of two Gaussians $\frac{1}{2} \mathcal{N}(\vec{μ}, \vec{Σ})+ \frac{1}{2} \mathcal{N}(-\vec{μ}, \vec{Σ})$ means that a sample is only revealed if it falls in some subset $S \subset \mathbb{R}^d$ of positive (Lebesgue) measure. We show that for $d=1$, EM converges almost surely (under random initialization) to the true mean (variance $σ^2$ is known) for any measurable set $S$. Moreover, for $d>1$ we show EM almost surely converges to the true mean for any measurable set $S$ when the map of EM has only three fixed points, namely $-\vec{μ}, \vec{0}, \vec{μ}$ (covariance matrix $\vec{Σ}$ is known), and prove local convergence if there are more than three fixed points. We also provide convergence rates of our findings. Our techniques deviate from those of Daskalakis et al. 2017, which heavily depend on symmetry that the untruncated problem exhibits. For example, for an arbitrary measurable set $S$, it is impossible to compute a closed form of the update rule of EM. Moreover, arbitrarily truncating the mixture induces further correlations among the variables. We circumvent these challenges by using techniques from dynamical systems, probability and statistics: the implicit function theorem, stability analysis around the fixed points of the update rule of EM, and correlation inequalities (FKG).

Submitted 9 May, 2020; v1 submitted 19 February, 2019; originally announced February 2019.

Comments: Appeared in ALT 2020. Last version fixes statement about rates for single dimensional case

arXiv:1211.3211 [pdf, other]

Effectiveness of sparse Bayesian algorithm for MVAR coefficient estimation in MEG/EEG source-space causality analysis

Authors: Kensuke Sekihara, Hagai Attias, Julia P. Owen, Srikantan S. Nagarajan

Abstract: This paper examines the effectiveness of a sparse Bayesian algorithm to estimate multivariate autoregressive coefficients when a large amount of background interference exists. This paper employs computer experiments to compare two methods in the source-space causality analysis: the conventional least-squares method and a sparse Bayesian method. Results of our computer experiments show that the interference affects the least-squares method in a very severe manner. It produces large false-positive results, unless the signal-to-interference ratio is very high. On the other hand, the sparse Bayesian method is relatively insensitive to the existence of interference. However, this robustness of the sparse Bayesian method is attained at the sacrifice of the detectability of true causal relationships. Our experiments also show that the surrogate data bootstrapping method tends to give a statistical threshold that is too low for the sparse method. The permutation-test-based method gives a higher (more conservative) threshold and it should be used with the sparse Bayesian method whenever the control period is available.

Submitted 14 November, 2012; originally announced November 2012.

Comments: Proceedings of the 8th Annual Conference of Non-invasive Functional Source Imaging held at Banff, May 2011

Showing 1–10 of 10 results for author: Nagarajan, S