Search | arXiv e-print repository

Elephant random walks on infinite Cayley trees

Abstract: In this article, we initiate the study of elephant random walks on finitely generated infinite groups whose Cayley graphs are homogeneous trees of degree $d \ge 3$ (e.g., groups of the form $\mathbb{Z}^{* d_1} * \mathbb{Z}_2^{*d_2}$ with $2d_1 + d_2 \ge 3$). We show that the asymptotic speed of the walk does not depend on the memory parameter $p \in [0, 1)$ and equals $\frac{d - 2}{d}$, the asymptotic speed of simple random walk on these graphs. The rate of convergence to the limiting speed turns out to depend on the memory parameter $p$ and one encounters phase transitions akin to elephant random walks on $\mathbb{Z}^d$.

Submitted 3 September, 2025; originally announced September 2025.

Comments: 12 pages, 1 figure

MSC Class: 60G50; 82C41; 60K99; 60G42
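
As a rough illustration of the benchmark speed $\frac{d - 2}{d}$ quoted in the abstract above, the following sketch simulates only the distance-from-root process of simple random walk on a $d$-regular tree. It does not implement the elephant random walk itself, whose memory mechanism is not specified in this listing. The function name `srw_tree_speed` and its parameters are hypothetical.

```python
import random


def srw_tree_speed(d, n_steps, seed=0):
    """Empirical speed |X_n| / n of simple random walk on the d-regular tree.

    Only the distance from the starting vertex is tracked: away from the
    root, exactly one of the d neighbours is closer to the root, so the
    distance decreases with probability 1/d and increases otherwise.
    """
    rng = random.Random(seed)
    dist = 0
    for _ in range(n_steps):
        if dist == 0:
            dist = 1  # every neighbour of the root is at distance 1
        elif rng.random() < 1.0 / d:
            dist -= 1  # step towards the root
        else:
            dist += 1  # step away from the root
    return dist / n_steps


if __name__ == "__main__":
    for d in (3, 4, 6):
        # Compare the simulated speed with the drift (d - 1)/d - 1/d = (d - 2)/d.
        print(d, srw_tree_speed(d, 200_000), (d - 2) / d)
```

The comparison value is the one-step drift of the distance process away from the root, which is why the empirical speed should settle near $\frac{d - 2}{d}$ for long runs.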

arXiv:2507.15097 [pdf, ps, other]

Learning under Latent Group Sparsity via Diffusion on Networks

Authors: Subhroshekhar Ghosh, Soumendu Sundar Mukherjee

Abstract: Group or cluster structure on explanatory variables in machine learning problems is a very general phenomenon, which has attracted broad interest from practitioners and theoreticians alike. In this work we contribute an approach to sparse learning under such group structure, that does not require prior information on the group identities. Our paradigm is motivated by the Laplacian geometry of an underlying network with a related community structure, and proceeds by directly incorporating this into a penalty that is effectively computed via a heat-flow-based local network dynamics. The proposed penalty interpolates between the lasso and the group lasso penalties, the runtime of the heat-flow dynamics being the interpolating parameter. As such it can automatically default to lasso when the group structure reflected in the Laplacian is weak. In fact, we demonstrate a data-driven procedure to construct such a network based on the available data. Notably, we dispense with computationally intensive pre-processing involving clustering of variables, spectral or otherwise. Our technique is underpinned by rigorous theorems that guarantee its effective performance and provide bounds on its sample complexity. In particular, in a wide range of settings, it provably suffices to run the diffusion for time that is only logarithmic in the problem dimensions. We explore in detail the interfaces of our approach with key statistical physics models in network science, such as the Gaussian Free Field and the Stochastic Block Model. Our work raises the possibility of applying similar diffusion-based techniques to classical learning tasks, exploiting the interplay between geometric, dynamical and stochastic structures underlying the data.

Submitted 20 July, 2025; originally announced July 2025.

Comments: 49 pages, 4 figures, 2 tables; this submission subsumes the earlier preprint arXiv:2201.08326

arXiv:2505.10555 [pdf, ps, other]

Spectra of contractions of the Gaussian Orthogonal Tensor Ensemble

Authors: Soumendu Sundar Mukherjee, Himasish Talukdar

Abstract: In this article, we study the spectra of matrix-valued contractions of the Gaussian Orthogonal Tensor Ensemble (GOTE). Let $\mathcal{G}$ denote a random tensor of order $r$ and dimension $n$ drawn from the density \[ f(\mathcal{G}) \propto \exp\bigg(-\frac{1}{2r}\|\mathcal{G}\|^2_{\mathrm{F}}\bigg). \] For $\mathbf{w} \in \mathbb{S}^{n - 1}$, the unit-sphere in $\mathbb{R}^n$, we consider the matrix-valued contraction $\mathcal{G} \cdot \mathbf{w}^{\otimes (r - 2)}$ when both $r$ and $n$ go to infinity such that $r / n \to c \in [0, \infty]$. We obtain semi-circle bulk-limits in all regimes, generalising the works of Goulart et al. (2022); Au and Garza-Vargas (2023); Bonnin (2024) in the fixed-$r$ setting. We also study the edge-spectrum. We obtain a Baik-Ben Arous-Péché phase-transition for the largest and the smallest eigenvalues at $r = 4$, generalising a result of Mukherjee et al. (2024) in the context of adjacency matrices of random hypergraphs. We also show that the extreme eigenvectors of $\mathcal{G} \cdot \mathbf{w}^{\otimes (r - 2)}$ contain non-trivial information about the contraction direction $\mathbf{w}$. Finally, we report some results, in the case $r = 4$, on mixed contractions $\mathcal{G} \cdot \mathbf{u} \otimes \mathbf{v}$, $\mathbf{u}, \mathbf{v} \in \mathbb{S}^{n - 1}$. While the total variation distance between the joint distribution of the entries of $\mathcal{G} \cdot \mathbf{u} \otimes \mathbf{v}$ and that of $\mathcal{G} \cdot \mathbf{u} \otimes \mathbf{u}$ goes to $0$ when $\|\mathbf{u} - \mathbf{v}\| = o(n^{-1})$, the bulk and the largest eigenvalues of these two matrices have the same limit profile as long as $\|\mathbf{u} - \mathbf{v}\| = o(1)$. Furthermore, it turns out that there are no outlier eigenvalues in the spectrum of $\mathcal{G} \cdot \mathbf{u} \otimes \mathbf{v}$ when $\langle \mathbf{u}, \mathbf{v} \rangle = o(1)$.

Submitted 15 May, 2025; originally announced May 2025.

Comments: 45 pages, 1 figure; abstract shortened to meet arXiv requirements

arXiv:2504.07720 [pdf, ps, other]

Filtering through a topological lens: homology for point processes on the time-frequency plane

Authors: Juan Manuel Miramont, Kin Aun Tan, Soumendu Sundar Mukherjee, Rémi Bardenet, Subhroshekhar Ghosh

Abstract: We introduce a very general approach to the analysis of signals from their noisy measurements from the perspective of Topological Data Analysis (TDA). While TDA has emerged as a powerful analytical tool for data with pronounced topological structures, here we demonstrate its applicability for general problems of signal processing, without any a-priori geometric feature. Our methods are well-suited to a wide array of time-dependent signals in different scientific domains, with acoustic signals being a particularly important application. We invoke time-frequency representations of such signals, focusing on their zeros which are gaining salience as a signal processing tool in view of their stability properties. Leveraging state-of-the-art topological concepts, such as stable and minimal volumes, we develop a complete suite of TDA-based methods to explore the delicate stochastic geometry of these zeros, capturing signals based on the disruption they cause to this rigid, hyperuniform spatial structure. Unlike classical spatial data tools, TDA is able to capture the full spectrum of the stochastic geometry of the zeros, thereby leading to powerful inferential outcomes that are underpinned by a principled statistical foundation. This is reflected in the power and versatility of our applications, which include competitive performance in processing a wide variety of audio signals (esp. in low SNR regimes), effective detection and reconstruction of gravitational wave signals (a reputed signal processing challenge with non-Gaussian noise), and medical time series data from EEGs, indicating a wide horizon for the approach and methods introduced in this paper.

Submitted 25 July, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

arXiv:2412.19802 [pdf, other]

A new approach to locally adaptive polynomial regression

Authors: Sabyasachi Chatterjee, Subhajit Goswami, Soumendu Sundar Mukherjee

Abstract: Adaptive bandwidth selection is a fundamental challenge in nonparametric regression. This paper introduces a new bandwidth selection procedure inspired by the optimality criteria for $\ell_0$-penalized regression. Although similar in spirit to Lepski's method and its variants in selecting the largest interval satisfying an admissibility criterion, our approach stems from a distinct philosophy, utilizing criteria based on $\ell_2$-norms of interval projections rather than explicit point and variance estimates. We obtain non-asymptotic risk bounds for the local polynomial regression methods based on our bandwidth selection procedure which adapt (near-)optimally to the local Hölder exponent of the underlying regression function simultaneously at all points in its domain. Furthermore, we show that there is a single ideal choice of a global tuning parameter in each case under which the above-mentioned local adaptivity holds. The optimal risks of our methods derive from the properties of solutions to a new ``bandwidth selection equation'' which is of independent interest. We believe that the principles underlying our approach provide a new perspective to the classical yet ever relevant problem of locally adaptive nonparametric regression.

Submitted 20 May, 2025; v1 submitted 27 December, 2024; originally announced December 2024.

Comments: 29 pages, 4 figures; in this version, the title has been updated and the exposition significantly expanded

arXiv:2409.11381 [pdf, other]

Edge spectra of Gaussian random symmetric matrices with correlated entries

Authors: Debapratim Banerjee, Soumendu Sundar Mukherjee, Dipranjan Pal

Abstract: We study the largest eigenvalue of a Gaussian random symmetric matrix $X_n$, with zero-mean, unit variance entries satisfying the condition $\sup_{(i, j) \ne (i', j')}|\mathbb{E}[X_{ij} X_{i'j'}]| = O(n^{-(1 + \varepsilon)})$, where $\varepsilon > 0$. It follows from Catalano et al. (2024) that the empirical spectral distribution of $n^{-1/2} X_n$ converges weakly almost surely to the standard semi-circle law. Using a Füredi-Komlós-type high moment analysis, we show that the largest eigenvalue $λ_1(n^{-1/2} X_n)$ of $n^{-1/2} X_n$ converges almost surely to $2$. This result is essentially optimal in the sense that one cannot take $\varepsilon = 0$ and still obtain an almost sure limit of $2$. We also derive Gaussian fluctuation results for the largest eigenvalue in the case where the entries have a common non-zero mean. Let $Y_n = X_n + \frac{λ}{\sqrt{n}}\mathbf{1} \mathbf{1}^\top$. When $\varepsilon \ge 1$ and $λ \gg n^{1/4}$, we show that \[ n^{1/2}\bigg(λ_1(n^{-1/2} Y_n) - λ - \frac{1}{λ}\bigg) \xrightarrow{d} \sqrt{2} Z, \] where $Z$ is a standard Gaussian. On the other hand, when $0 < \varepsilon < 1$, we have $\mathrm{Var}(\frac{1}{n}\sum_{i, j}X_{ij}) = O(n^{1 - \varepsilon})$. Assuming that $\mathrm{Var}(\frac{1}{n}\sum_{i, j} X_{ij}) = σ^2 n^{1 - \varepsilon} (1 + o(1))$, if $λ \gg n^{\varepsilon/4}$, then we have \[ n^{\varepsilon/2}\bigg(λ_1(n^{-1/2} Y_n) - λ - \frac{1}{λ}\bigg) \xrightarrow{d} σ Z. \] While the ranges of $λ$ in these fluctuation results are certainly not optimal, a striking aspect is that different scalings are required in the two regimes $0 < \varepsilon < 1$ and $\varepsilon \ge 1$.

Submitted 7 February, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

Comments: 27 pages, 2 figures; abstract shortened to meet arXiv requirements

arXiv:2409.03756 [pdf, other]

Spectra of adjacency and Laplacian matrices of Erdős-Rényi hypergraphs

Authors: Soumendu Sundar Mukherjee, Dipranjan Pal, Himasish Talukdar

Abstract: We study adjacency and Laplacian matrices of Erdős-Rényi $r$-uniform hypergraphs on $n$ vertices with hyperedge inclusion probability $p$, in the setting where $r$ can vary with $n$ such that $r / n \to c \in [0, 1)$. Adjacency matrices of hypergraphs are contractions of adjacency tensors and their entries exhibit long range correlations. We show that under the Erdős-Rényi model, the expected empirical spectral distribution of an appropriately normalised hypergraph adjacency matrix converges weakly to the semi-circle law with variance $(1 - c)^2$ as long as $\frac{d_{\avg}}{r^7} \to \infty$, where $d_{\avg} = \binom{n-1}{r-1} p$. In contrast with the Erdős-Rényi random graph ($r = 2$), two eigenvalues stick out of the bulk of the spectrum. When $r$ is fixed and $d_{\avg} \gg n^{r - 2} \log^4 n$, we uncover an interesting Baik-Ben Arous-Péché (BBP) phase transition at the value $r = 3$. For $r \in \{2, 3\}$, an appropriately scaled largest (resp. smallest) eigenvalue converges in probability to $2$ (resp. $-2$), the right (resp. left) end point of the support of the standard semi-circle law, and when $r \ge 4$, it converges to $\sqrt{r - 2} + \frac{1}{\sqrt{r - 2}}$ (resp. $-\sqrt{r - 2} - \frac{1}{\sqrt{r - 2}}$). Further, in a Gaussian version of the model we show that an appropriately scaled largest (resp. smallest) eigenvalue converges in distribution to $\frac{c}{2} ζ + \big[\frac{c^2}{4}ζ^2 + c(1 - c)\big]^{1/2}$ (resp. $\frac{c}{2} ζ - \big[\frac{c^2}{4}ζ^2 + c(1 - c)\big]^{1/2}$), where $ζ$ is a standard Gaussian. We also establish analogous results for the bulk and edge eigenvalues of the associated Laplacian matrices.

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2409.02911 [pdf, other]

Bulk Spectra of Truncated Sample Covariance Matrices

Authors: Subhroshekhar Ghosh, Soumendu Sundar Mukherjee, Himasish Talukdar

Abstract: Determinantal Point Processes (DPPs), which originate from quantum and statistical physics, are known for modelling diversity. Recent research [Ghosh and Rigollet (2020)] has demonstrated that certain matrix-valued $U$-statistics (that are truncated versions of the usual sample covariance matrix) can effectively estimate parameters in the context of Gaussian DPPs and enhance dimension reduction techniques, outperforming standard methods like PCA in clustering applications. This paper explores the spectral properties of these matrix-valued $U$-statistics in the null setting of an isotropic design. These matrices may be represented as $X L X^\top$, where $X$ is a data matrix and $L$ is the Laplacian matrix of a random geometric graph associated to $X$. The main mathematically interesting twist here is that the matrix $L$ is dependent on $X$. We give complete descriptions of the bulk spectra of these matrix-valued $U$-statistics in terms of the Stieltjes transforms of their empirical spectral measures. The results and the techniques are in fact able to address a broader class of kernelised random matrices, connecting their limiting spectra to generalised Marčenko-Pastur laws and free probability.

Submitted 4 September, 2024; originally announced September 2024.

Comments: 26 pages, 2 figures

arXiv:2312.12428 [pdf, other]

The "visible" Wigner matrix

Authors: Arup Bose, Soumendu Sundar Mukherjee

Abstract: We consider the ``visible'' Wigner matrix, a Wigner matrix whose $(i, j)$-th entry is coerced to zero if $i, j$ are co-prime. Using a recent result from elementary number theory on co-primality patterns in integers, we show that the limiting spectral distribution of this matrix exists, and give explicit descriptions of its moments in terms of infinite products over primes $p$ of certain polynomials evaluated at $1/p$. We also consider the complementary ``invisible'' Wigner matrix.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 12 pages, 5 figures, 1 table

MSC Class: 60B20
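
The construction described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: entries are taken to be standard Gaussian, indices run from 1 to $n$, and no spectral normalisation is applied, none of which is specified in this listing. The names `visible_wigner` and `zero_fraction` are hypothetical.

```python
import math
import random


def visible_wigner(n, seed=0):
    """Symmetric n x n matrix whose (i, j) entry is zero when gcd(i, j) == 1.

    Assumptions (not from the abstract): i.i.d. standard Gaussian entries
    on and above the diagonal, and 1-based indices i, j in {1, ..., n}.
    """
    rng = random.Random(seed)
    w = [[0.0] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            if math.gcd(i, j) == 1:
                continue  # co-prime index pair: entry is coerced to zero
            x = rng.gauss(0.0, 1.0)
            w[i - 1][j - 1] = x
            w[j - 1][i - 1] = x  # enforce symmetry
    return w


def zero_fraction(w):
    """Fraction of entries that are exactly zero."""
    n = len(w)
    return sum(1 for row in w for x in row if x == 0.0) / (n * n)


if __name__ == "__main__":
    w = visible_wigner(200)
    print(zero_fraction(w))
```

Swapping the condition to `math.gcd(i, j) != 1` would give a matching sketch of the complementary ``invisible'' matrix mentioned at the end of the abstract.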

arXiv:2312.07839 [pdf, ps, other]

Minimax-optimal estimation for sparse multi-reference alignment with collision-free signals

Authors: Subhro Ghosh, Soumendu Sundar Mukherjee, Jing Bin Pan

Abstract: The Multi-Reference Alignment (MRA) problem aims at the recovery of an unknown signal from repeated observations under the latent action of a group of cyclic isometries, in the presence of additive noise of high intensity $σ$. It is a more tractable version of the celebrated cryo-EM model. In the crucial high noise regime, it is known that its sample complexity scales as $σ^6$. Recent investigations have shown that for the practically significant setting of sparse signals, the sample complexity of the maximum likelihood estimator asymptotically scales with the noise level as $σ^4$. In this work, we investigate minimax optimality for signal estimation under the MRA model for so-called collision-free signals. In particular, this signal class covers the setting of generic signals of dilute sparsity (wherein the support size $s=O(L^{1/3})$, where $L$ is the ambient dimension). We demonstrate that the minimax optimal rate of estimation for the sparse MRA problem in this setting is $σ^2/\sqrt{n}$, where $n$ is the sample size. In particular, this widely generalizes the sample complexity asymptotics for the restricted MLE in this setting, establishing it as the statistically optimal estimator. Finally, we demonstrate a concentration inequality for the restricted MLE on its deviations from the ground truth.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2309.10864 [pdf, other]

A dynamic mean-field statistical model of academic collaboration

Authors: Soumendu Sundar Mukherjee, Tamojit Sadhukhan, Shirshendu Chatterjee

Abstract: There is empirical evidence that collaboration in academia has increased significantly during the past few decades, perhaps due to the breathtaking advancements in communication and technology during this period. Multi-author articles have become more frequent than single-author ones. Interdisciplinary collaboration is also on the rise. Although there have been several studies on the dynamical aspects of collaboration networks, systematic statistical models which theoretically explain various empirically observed features of such networks have been lacking. In this work, we propose a dynamic mean-field model and an associated estimation framework for academic collaboration networks. We primarily focus on how the degree of collaboration of a typical author, rather than the local structure of her collaboration network, changes over time. We consider several popular indices of collaboration from the literature and study their dynamics under the proposed model. In particular, we obtain exact formulae for the expectations and temporal rates of change of these indices. Through extensive simulation experiments, we demonstrate that the proposed model has enough flexibility to capture various phenomena characteristic of real-world collaboration networks. Using metadata on papers from the arXiv repository, we empirically study the mean-field collaboration dynamics in disciplines such as Computer Science, Mathematics and Physics.

Submitted 19 September, 2023; originally announced September 2023.

Comments: 27 pages, 20 figures

arXiv:2308.02344 [pdf, ps, other]

Learning Networks from Gaussian Graphical Models and Gaussian Free Fields

Authors: Subhro Ghosh, Soumendu Sundar Mukherjee, Hoang-Son Tran, Ujan Gangopadhyay

Abstract: We investigate the problem of estimating the structure of a weighted network from repeated measurements of a Gaussian Graphical Model (GGM) on the network. In this vein, we consider GGMs whose covariance structures align with the geometry of the weighted network on which they are based. Such GGMs have been of longstanding interest in statistical physics, and are referred to as the Gaussian Free Field (GFF). In recent years, they have attracted considerable interest in machine learning and theoretical computer science. In this work, we propose a novel estimator for the weighted network (equivalently, its Laplacian) from repeated measurements of a GFF on the network, based on the Fourier analytic properties of the Gaussian distribution. In this pursuit, our approach exploits complex-valued statistics constructed from observed data, that are of interest in their own right. We demonstrate the effectiveness of our estimator with concrete recovery guarantees and bounds on the required sample complexity. In particular, we show that the proposed statistic achieves the parametric rate of estimation for fixed network size. In the setting of networks growing with sample size, our results show that for Erdős-Rényi random graphs $G(d,p)$ above the connectivity threshold, network recovery takes place with high probability as soon as the sample size $n$ satisfies $n \gg d^4 \log d \cdot p^{-2}$.

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2307.12982 [pdf, other]

Consistent model selection in the spiked Wigner model via AIC-type criteria

Authors: Soumendu Sundar Mukherjee

Abstract: Consider the spiked Wigner model \[ X = \sum_{i = 1}^k λ_i u_i u_i^\top + σG, \] where $G$ is an $N \times N$ GOE random matrix, and the eigenvalues $λ_i$ are all spiked, i.e. above the Baik-Ben Arous-Péché (BBP) threshold $σ$. We consider AIC-type model selection criteria of the form \[ -2 \, (\text{maximised log-likelihood}) + γ\, (\text{number of parameters}) \] for estimating the number $k$ of spikes. For $γ> 2$, the above criterion is strongly consistent provided $λ_k > λ_γ$, where $λ_γ$ is a threshold strictly above the BBP threshold, whereas for $γ< 2$, it almost surely overestimates $k$. Although AIC (which corresponds to $γ= 2$) is not strongly consistent, we show that taking $γ= 2 + δ_N$, where $δ_N \to 0$ and $δ_N \gg N^{-2/3}$, results in a weakly consistent estimator of $k$. We further show that a soft minimiser of AIC, where one chooses the least complex model whose AIC score is close to the minimum AIC score, is strongly consistent. Based on a spiked (generalised) Wigner representation, we also develop similar model selection criteria for consistently estimating the number of communities in a balanced stochastic block model under some sparsity restrictions.

Submitted 7 February, 2025; v1 submitted 24 July, 2023; originally announced July 2023.

Comments: 25 pages, 2 figures, 5 tables

arXiv:2304.01145 [pdf, other]

On a generalisation of the coupon collector problem

Authors: Siva Athreya, Satyaki Mukherjee, Soumendu Sundar Mukherjee

Abstract: We consider a generalisation of the classical coupon collector problem. We define a super-coupon to be any $s$-subset of a universe of $n$ coupons. In each round, a random $r$-subset from the universe is drawn and all its $s$-subsets are marked as collected. We show that the time to collect all super-coupons is $\binom{r}{s}^{-1}\binom{n}{s} \log \binom{n}{s}(1 + o(1))$ on average and has a Gumbel limit after a suitable normalisation. In a similar vein, we show that for any $α\in (0, 1)$, the expected time to collect $(1 - α)$ proportion of all super-coupons is $\binom{r}{s}^{-1}\binom{n}{s} \log \big(\frac{1}α\big)(1 + o(1))$. The $r = s$ case of this model is equivalent to the classical coupon collector model. We also consider a temporally dependent model where the $r$-subsets are drawn according to the following Markovian dynamics: the $r$-subset at round $k + 1$ is formed by replacing a random coupon from the $r$-subset drawn at round $k$ with another random coupon from outside this $r$-subset. We link the time it takes to collect all super-coupons in the $r = s$ case of this model to the cover time of random walk on a certain finite regular graph and conjecture that in general, it takes $\frac{r}{s} \binom{r}{s}^{-1}\binom{n}{s}\log\binom{n}{s}(1 + o(1))$ time on average to collect all super-coupons.

Submitted 12 September, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: 16 pages, 4 figures
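
The independent-draw model in the abstract above is simple enough to simulate directly. The sketch below assumes each round draws an $r$-subset uniformly at random, independently of earlier rounds, and compares the empirical mean collection time with the leading-order term $\binom{r}{s}^{-1}\binom{n}{s}\log\binom{n}{s}$ quoted in the abstract. The function names `collect_time` and `leading_order` are hypothetical, and agreement is only expected to leading order for large $n$.

```python
import math
import random
from itertools import combinations


def collect_time(n, r, s, rng):
    """Rounds needed until every s-subset of {0, ..., n-1} has been collected.

    Each round draws a uniformly random r-subset (independent rounds assumed)
    and marks all of its s-subsets as collected.
    """
    total = math.comb(n, s)
    seen = set()
    rounds = 0
    while len(seen) < total:
        draw = sorted(rng.sample(range(n), r))
        # Sorting gives each s-subset one canonical tuple representation.
        seen.update(combinations(draw, s))
        rounds += 1
    return rounds


def leading_order(n, r, s):
    """Leading-order term C(n, s) * log C(n, s) / C(r, s) from the abstract."""
    m = math.comb(n, s)
    return m * math.log(m) / math.comb(r, s)


if __name__ == "__main__":
    rng = random.Random(0)
    n, r, s = 30, 5, 2
    runs = [collect_time(n, r, s, rng) for _ in range(200)]
    print(sum(runs) / len(runs), leading_order(n, r, s))
```

Setting `r == s == 1` reduces this to the classical coupon collector, which is a convenient sanity check on the simulation.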

arXiv:2302.12693 [pdf, ps, other]

Wasserstein Projection Pursuit of Non-Gaussian Signals

Authors: Satyaki Mukherjee, Soumendu Sundar Mukherjee, Debarghya Ghoshdastidar

Abstract: We consider the general dimensionality reduction problem of locating in a high-dimensional data cloud, a $k$-dimensional non-Gaussian subspace of interesting features. We use a projection pursuit approach -- we search for mutually orthogonal unit directions which maximise the 2-Wasserstein distance of the empirical distribution of data-projections along these directions from a standard Gaussian. Under a generative model, where there is an underlying (unknown) low-dimensional non-Gaussian subspace, we prove rigorous statistical guarantees on the accuracy of approximating this unknown subspace by the directions found by our projection pursuit approach. Our results operate in the regime where the data dimensionality is comparable to the sample size, and thus supplement the recent literature on the non-feasibility of locating interesting directions via projection pursuit in the complementary regime where the data dimensionality is much larger than the sample size.

Submitted 24 February, 2023; originally announced February 2023.

arXiv:2208.01365 [pdf, other]

Concentration inequalities for correlated network-valued processes with applications to community estimation and changepoint analysis

Authors: Sayak Chatterjee, Shirshendu Chatterjee, Soumendu Sundar Mukherjee, Anirban Nath, Sharmodeep Bhattacharyya

Abstract: Network-valued time series are currently a common form of network data. However, the study of the aggregate behavior of network sequences generated from network-valued stochastic processes is relatively rare. Most of the existing research focuses on the simple setup where the networks are independent (or conditionally independent) across time, and all edges are updated synchronously at each time step. In this paper, we study the concentration properties of the aggregated adjacency matrix and the corresponding Laplacian matrix associated with network sequences generated from lazy network-valued stochastic processes, where edges update asynchronously, and each edge follows a lazy stochastic process for its updates independent of the other edges. We demonstrate the usefulness of these concentration results in proving consistency of standard estimators in community estimation and changepoint estimation problems. We also conduct a simulation study to demonstrate the effect of the laziness parameter, which controls the extent of temporal correlation, on the accuracy of community and changepoint estimation.

Submitted 2 August, 2022; originally announced August 2022.

Comments: 27 pages, 4 figures

arXiv:2201.08326 [pdf, other]

Learning with latent group sparsity via heat flow dynamics on networks

Authors: Subhroshekhar Ghosh, Soumendu Sundar Mukherjee

Abstract: Group or cluster structure on explanatory variables in machine learning problems is a very general phenomenon, which has attracted broad interest from practitioners and theoreticians alike. In this work we contribute an approach to learning under such group structure, that does not require prior information on the group identities. Our paradigm is motivated by the Laplacian geometry of an underlying network with a related community structure, and proceeds by directly incorporating this into a penalty that is effectively computed via a heat flow-based local network dynamics. In fact, we demonstrate a procedure to construct such a network based on the available data. Notably, we dispense with computationally intensive pre-processing involving clustering of variables, spectral or otherwise. Our technique is underpinned by rigorous theorems that guarantee its effective performance and provide bounds on its sample complexity. In particular, in a wide range of settings, it provably suffices to run the heat flow dynamics for time that is only logarithmic in the problem dimensions. We explore in detail the interfaces of our approach with key statistical physics models in network science, such as the Gaussian Free Field and the Stochastic Block Model. We validate our approach by successful applications to real-world data from a wide array of application domains, including computer science, genetics, climatology and economics.
Our work raises the possibility of applying similar diffusion-based techniques to classical learning tasks, exploiting the interplay between geometric, dynamical and stochastic structures underlying the data.

Submitted 20 January, 2022; originally announced January 2022.

Comments: 36 pages, 3 figures, 3 tables

arXiv:2102.05839 [pdf, ps, other]

Distribution of Eigenvalues of Matrix Ensembles arising from Wigner and Palindromic Toeplitz Blocks

Authors: Keller Blackwell, Neelima Borade, Arup Bose, Charles Devlin VI, Noah Luntzlara, Renyuan Ma, Steven J. Miller, Soumendu Sundar Mukherjee, Mengxi Wang, Wanqiao Xu

Abstract: Random Matrix Theory (RMT) has successfully modeled diverse systems, from energy levels of heavy nuclei to zeros of $L$-functions; this correspondence has allowed RMT to successfully predict many number theoretic behaviors. However there are some operations which to date have no RMT analogue. Our motivation is to find an RMT analogue of Rankin-Selberg convolution, which constructs a new $L$-function from an input pair. We report one such attempt; while it does not appear to model convolution, it does create new ensembles with properties hybridizing those of its constituents. For definiteness we concentrate on the ensemble of palindromic real symmetric Toeplitz (PST) matrices and the ensemble of real symmetric matrices, whose limiting spectral measures are the Gaussian and semi-circular distributions, respectively; these were chosen as they are the two extreme cases in terms of moment calculations. For a PST matrix $A$ and a real symmetric matrix $B$, we construct an ensemble of random real symmetric block matrices whose first row is $\lbrace A, B \rbrace$ and whose second row is $\lbrace B, A \rbrace$. By Markov's Method of Moments and the use of free probability, we show this ensemble converges weakly and almost surely to a new, universal distribution with a hybrid of Gaussian and semi-circular behaviors. We extend this construction by considering an iterated concatenation of matrices from an arbitrary pair of random real symmetric sub-ensembles with different limiting spectral measures.
We prove that finite iterations converge to new, universal distributions with hybrid behavior, and that infinite iterations converge to the limiting spectral measure of the dominant component matrix.

Submitted 10 February, 2021; originally announced February 2021.

Comments: 14 pages, 5 figures. arXiv admin note: text overlap with arXiv:1908.03834

MSC Class: 15A52 (primary); 60F99; 62H10 (secondary)

arXiv:2101.04105 [pdf, other]

doi 10.1142/S2010326322500423

Some characterization results on classical and free Poisson thinning

Authors: Soumendu Sundar Mukherjee

Abstract: Poisson thinning is an elementary result in probability, which is of great importance in the theory of Poisson point processes. In this article, we record a couple of characterization results on Poisson thinning. We also consider several free probability analogues of Poisson thinning, which we collectively dub as \emph{free Poisson thinning}, and prove characterization results for them, similar to the classical case. One of these free Poisson thinning procedures arises naturally as a high-dimensional asymptotic analogue of Cochran's theorem from multivariate statistics on the "Wishart-ness" of quadratic functions of Gaussian random matrices. We note the implications of our characterization results in the context of Cochran's theorem. We also prove a free probability analogue of Craig's theorem, another well-known result in multivariate statistics on the independence of quadratic functions of Gaussian random matrices.

Submitted 4 September, 2022; v1 submitted 11 January, 2021; originally announced January 2021.

Comments: 19 pages, 1 figure, to appear in RMTA

MSC Class: 46L54; 60E05; 62E10

arXiv:2011.04470 [pdf, other]

High dimensional PCA: a new model selection criterion

Authors: Abhinav Chakraborty, Soumendu Sundar Mukherjee, Arijit Chakrabarti

Abstract: Given a random sample from a multivariate population, estimating the number of large eigenvalues of the population covariance matrix is an important problem in Statistics with wide applications in many areas. In the context of Principal Component Analysis (PCA), the linear combinations of the original variables having the largest amounts of variation are determined by this number. In this paper, we study the high dimensional asymptotic regime where the number of variables grows at the same rate as the number of observations, and use the spiked covariance model proposed in Johnstone (2001), under which the problem reduces to model selection. Our focus is on the Akaike Information Criterion (AIC) which is known to be strongly consistent from the work of Bai et al. (2018). However, Bai et al. (2018) requires a certain "gap condition" ensuring the dominant eigenvalues to be above a threshold strictly larger than the BBP threshold (Baik et al. (2005)), both quantities depending on the limiting ratio of the number of variables and observations. It is well-known that, below the BBP threshold, a spiked covariance structure becomes indistinguishable from one with no spikes. Thus the strong consistency of AIC requires some extra signal strength. In this paper, we investigate whether consistency continues to hold even if the "gap" is made smaller. We show that strong consistency under arbitrarily small gap is achievable if we alter the penalty term of AIC suitably depending on the target gap.
Furthermore, another intuitive alteration of the penalty can indeed make the gap exactly zero, although we can only achieve weak consistency in this case. We compare the two newly-proposed estimators with other existing estimators in the literature via extensive simulation studies, and show, by suitably calibrating our proposals, that a significant improvement in terms of mean-squared error is achievable.

Submitted 9 November, 2020; originally announced November 2020.

Comments: 37 pages, 6 figures, 2 tables

MSC Class: 62H12; 62H25

arXiv:2008.05916 [pdf, other]

doi 10.1093/imrn/rnac215

On $*$-Convergence of Schur-Hadamard Products of Independent Nonsymmetric Random Matrices

Authors: Soumendu Sundar Mukherjee

Abstract: Let $\{x_\alpha\}_{\alpha \in \mathbb{Z}}$ and $\{y_\alpha\}_{\alpha \in \mathbb{Z}}$ be two independent collections of zero mean, unit variance random variables with uniformly bounded moments of all orders. Consider a nonsymmetric Toeplitz matrix $X_n = ((x_{i - j}))_{1 \le i, j \le n}$ and a Hankel matrix $Y_n = ((y_{i + j}))_{1 \le i, j \le n}$, and let $M_n = X_n \odot Y_n$ be their elementwise/Schur-Hadamard product. In this article, we show that almost surely, $n^{-1/2}M_n$, as an element of the $*$-probability space $(\mathcal{M}_n(\mathbb{C}), \frac{1}{n}\mathrm{tr})$, converges in $*$-distribution to a circular variable. With i.i.d. Rademacher entries, this construction gives a matrix model for circular variables with only $O(n)$ bits of randomness. We also consider a dependent setup where $\{x_\alpha\}$ and $\{y_\beta\}$ are independent strongly multiplicative systems (à la Gaposhkin [7]) satisfying an additional \emph{admissibility} condition, and have uniformly bounded moments of all orders -- a nontrivial example of such a system being $\{\sqrt{2}\sin(2^n \pi U)\}_{n \in \mathbb{Z}_+}$, where $U \sim \mathrm{Uniform}(0, 1)$. In this case, we show in-expectation and in-probability convergence of the $*$-moments of $n^{-1/2}M_n$ to those of a circular variable.
Finally, we generalise our results to Schur-Hadamard products of structured random matrices of the form $X_n = ((x_{L_X(i, j)}))_{1 \le i, j \le n}$ and $Y_n = ((y_{L_Y(i, j)}))_{1 \le i, j \le n}$, under certain assumptions on the \emph{link-functions} $L_X$ and $L_Y$, most notably the injectivity of the map $(i, j) \mapsto (L_X(i, j), L_Y(i, j))$. Based on numerical evidence, we conjecture that the circular law $\mu_{\mathrm{circ}}$, i.e. the uniform measure on the unit disk of $\mathbb{C}$, which is also the Brown measure of a circular variable, is in fact the limiting spectral measure of $n^{-1/2}M_n$.

Submitted 5 September, 2022; v1 submitted 13 August, 2020; originally announced August 2020.

Comments: 18 pages, 2 figures, to appear in IMRN

MSC Class: 46L54; 60B20

arXiv:2007.10989 [pdf, ps, other]

Construction of product $*$-probability spaces via free cumulants

Authors: Arup Bose, Soumendu Sundar Mukherjee

Abstract: It is well known that free independence is equivalent to the vanishing of mixed free cumulants. The purpose of this short note is to build free products of $*$-probability spaces using this as the definition of freeness and relying on free cumulants instead of moments.

Submitted 7 February, 2025; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: 8 pages

MSC Class: 46L54

arXiv:1905.06661 [pdf, other]

When random initializations help: a study of variational inference for community detection

Authors: Purnamrita Sarkar, Y. X. Rachel Wang, Soumendu Sundar Mukherjee

Abstract: Variational approximation has been widely used in large-scale Bayesian inference recently, the simplest kind of which involves imposing a mean field assumption to approximate complicated latent structures. Despite the computational scalability of mean field, theoretical studies of its loss function surface and the convergence behavior of iterative updates for optimizing the loss are far from complete. In this paper, we focus on the problem of community detection for a simple two-class Stochastic Blockmodel (SBM) with equal class sizes. Using batch co-ordinate ascent (BCAVI) for updates, we show different convergence behavior with respect to different initializations. When the parameters are known or estimated within a reasonable range and held fixed, we characterize conditions under which an initialization can converge to the ground truth. On the other hand, when the parameters need to be estimated iteratively, a random initialization will converge to an uninformative local optimum.

Submitted 18 May, 2019; v1 submitted 16 May, 2019; originally announced May 2019.

Comments: 32 pages, 5 figures

arXiv:1708.05573 [pdf, other]

Two provably consistent divide and conquer clustering algorithms for large networks

Authors: Soumendu Sundar Mukherjee, Purnamrita Sarkar, Peter J. Bickel

Abstract: In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms which perform clustering on a number of small subgraphs and finally patch the results into a single clustering. The main advantage of these algorithms is that they bring down significantly the computational cost of traditional algorithms, including spectral clustering, semi-definite programs, modularity based methods, likelihood based methods etc., without losing on accuracy and even improving accuracy at times. These algorithms are also, by nature, parallelizable. Thus, exploiting the facts that most traditional algorithms are accurate and the corresponding optimization problems are much simpler in small problems, our divide-and-conquer methods provide an omnibus recipe for scaling traditional algorithms up to large networks. We prove consistency of these algorithms under various subgraph selection procedures and perform extensive simulations and real-data analysis to understand the advantages of the divide-and-conquer approach in various settings.

Submitted 18 August, 2017; originally announced August 2017.

Comments: 41 pages, comments are most welcome

arXiv:1408.0874 [pdf, other]

Limiting spectral distribution of a class of Hankel type random matrices

Authors: Anirban Basak, Arup Bose, Soumendu Sundar Mukherjee

Abstract: We consider an indexed class of real symmetric random matrices which generalize the symmetric Hankel and Reverse Circulant matrices. We show that the limiting spectral distributions of these matrices exist almost surely and the limit is continuous in the index. We also study other properties of the limit.

Submitted 5 August, 2014; originally announced August 2014.

Comments: 19 pages, 2 figures, 1 table

MSC Class: Primary 15B52; 60B20; secondary 60B10; 60F99; 60B99

arXiv:1402.3683 [pdf, other]

Bulk behaviour of skew-symmetric patterned random matrices

Authors: Arup Bose, Soumendu Sundar Mukherjee

Abstract: Limiting Spectral Distributions (LSD) of real symmetric patterned matrices have been well-studied. In this article, we consider skew-symmetric/anti-symmetric patterned random matrices and establish the LSDs of several common matrices. For the skew-symmetric Wigner, skew-symmetric Toeplitz and the skew-symmetric Circulant, the LSDs (on the imaginary axis) are the same as those in the symmetric cases. For the skew-symmetric Hankel and the skew-symmetric Reverse Circulant however, we obtain new LSDs. We also show the existence of the LSDs for the triangular versions of these matrices. We then introduce a related modification of the symmetric matrices by changing the sign of the lower triangle part of the matrices. In this case, the modified Wigner, modified Hankel and the modified Reverse Circulants have the same LSDs as their usual symmetric counterparts while new LSDs are obtained for the modified Toeplitz and the modified Symmetric Circulant.

Submitted 15 February, 2014; originally announced February 2014.

Comments: 21 pages, 2 figures

MSC Class: Primary 15B52; 60B20; secondary 60B10; 60F99; 60B99

arXiv:1402.2207 [pdf, other]

Bulk behaviour of Schur-Hadamard products of symmetric random matrices

Authors: Arup Bose, Soumendu Sundar Mukherjee

Abstract: We develop a general method for establishing the existence of the Limiting Spectral Distributions (LSD) of Schur-Hadamard products of independent symmetric patterned random matrices. We apply this method to show that the LSDs of Schur-Hadamard products of some common patterned matrices exist and identify the limits. In particular, the Schur-Hadamard product of independent Toeplitz and Hankel matrices has the semi-circular LSD. We also prove an invariance theorem that may be used to find the LSD in many examples.

Submitted 15 March, 2014; v1 submitted 10 February, 2014; originally announced February 2014.

Comments: 27 pages, 1 figure; to appear, Random Matrices: Theory and Applications. This is the final version, incorporating referee comments

MSC Class: Primary 15B52; 60B20; secondary 60B10; 60F99; 60B99

arXiv:1303.4251 [pdf, ps, other]

An Approximation Inequality for Continued Radicals and Power Forms

Authors: Soumendu Sundar Mukherjee

Abstract: In this article we derive an approximation inequality for continued radicals, generalizing an inequality of Herschfeld for continued square roots to arbitrary radicals, which is useful in exploring convergence issues and obtaining convergence rates. In fact, we generalize this inequality further to encompass the more general continued power forms. We demonstrate the use of this inequality by obtaining estimates for the convergence rates of several continued radicals including the famous Ramanujan radical.

Submitted 24 October, 2013; v1 submitted 18 March, 2013; originally announced March 2013.

Comments: 12 pages

MSC Class: 40A25; 40A05; 26D20

Showing 1–28 of 28 results for author: Mukherjee, S S