Showing 1–6 of 6 results for author: Thangaraj, A

  1. arXiv:2404.05819  [pdf, other]

    stat.ML cs.IT cs.LG math.PR math.ST

    Just Wing It: Near-Optimal Estimation of Missing Mass in a Markovian Sequence

    Authors: Ashwin Pananjady, Vidya Muthukumar, Andrew Thangaraj

    Abstract: We study the problem of estimating the stationary mass -- also called the unigram mass -- that is missing from a single trajectory of a discrete-time, ergodic Markov chain. This problem has several applications -- for example, estimating the stationary missing mass is critical for accurately smoothing probability estimates in sequence models. While the classical Good--Turing estimator from the 195…

    Submitted 5 October, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Version v2 simplifies proofs and corrects an error in Theorem 2. It is consistent with the manuscript to appear in the Journal of Machine Learning Research

  2. arXiv:2202.02772  [pdf, ps, other]

    math.ST

    Missing Mass Estimation from Sticky Channels

    Authors: Prafulla Chandra, Andrew Thangaraj, Nived Rajaraman

    Abstract: Distribution estimation under error-prone or non-ideal sampling modelled as "sticky" channels has been studied recently, motivated by applications such as DNA computing. Missing mass, the sum of probabilities of missing letters, is an important quantity that plays a crucial role in distribution estimation, particularly in the large alphabet regime. In this work, we consider the problem of estimati…

    Submitted 6 February, 2022; originally announced February 2022.

  3. arXiv:2110.01968  [pdf, other]

    math.ST cs.IT

    Missing $g$-mass: Investigating the Missing Parts of Distributions

    Authors: Prafulla Chandra, Andrew Thangaraj

    Abstract: Estimating the underlying distribution from \textit{iid} samples is a classical and important problem in statistics. When the alphabet size is large compared to the number of samples, a portion of the distribution is highly likely to be unobserved or sparsely observed. The missing mass, defined as the sum of probabilities $\text{Pr}(x)$ over the missing letters $x$, and the Good-Turing estimator for m…

    Submitted 27 May, 2023; v1 submitted 5 October, 2021; originally announced October 2021.

  4. arXiv:2102.01938  [pdf, other]

    cs.IT math.ST stat.ML

    How good is Good-Turing for Markov samples?

    Authors: Prafulla Chandra, Andrew Thangaraj, Nived Rajaraman

    Abstract: The Good-Turing (GT) estimator for the missing mass (i.e., total probability of missing symbols) in $n$ samples is the number of symbols that appeared exactly once divided by $n$. For i.i.d. samples, the bias and squared-error risk of the GT estimator can be shown to fall as $1/n$ by bounding the expected error uniformly over all symbols. In this work, we study convergence of the GT estimator for…

    Submitted 27 May, 2023; v1 submitted 3 February, 2021; originally announced February 2021.

  5. arXiv:2001.04130  [pdf, ps, other]

    math.ST

    Convergence of Chao Unseen Species Estimator

    Authors: Nived Rajaraman, Prafulla Chandra, Andrew Thangaraj, Ananda Theertha Suresh

    Abstract: Support size estimation and the related problem of unseen species estimation have wide applications in ecology and database analysis. Perhaps the most used support size estimator is the Chao estimator. Despite its widespread use, little is known about its theoretical properties. We analyze the Chao estimator and show that its worst-case mean squared error (MSE) is smaller than the MSE of the plug…

    Submitted 13 January, 2020; originally announced January 2020.

    Comments: 20 pages, 1 figure, short version presented at International Symposium on Information Theory (ISIT) 2019

  6. arXiv:1505.00562  [pdf, ps, other]

    cs.IT math.PR stat.AP

    Approximation of Capacity for ISI Channels with One-bit Output Quantization

    Authors: Radha Krishna Ganti, Andrew Thangaraj, Arijit Mondal

    Abstract: Motivated by recent high bandwidth communication systems, Inter-Symbol Interference (ISI) channels with 1-bit quantized output are considered under an average-power-constrained continuous input. While the exact capacity is difficult to characterize, an approximation that matches with the exact channel output up to a probability of error is provided. The approximation does not have additive noise,…

    Submitted 4 May, 2015; originally announced May 2015.

    Comments: Will be presented at ISIT 2015
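
Note on entry 4 (arXiv:2102.01938): its abstract defines the Good-Turing estimator of the missing mass as the number of symbols that appeared exactly once, divided by the number of samples $n$. A minimal Python sketch of that definition follows. The function name `good_turing_missing_mass` and the toy input are illustrative assumptions, not taken from the paper, and the sketch does not address the Markov-sample analysis the paper studies.

```python
from collections import Counter


def good_turing_missing_mass(samples):
    """Good-Turing estimate of the missing mass.

    Returns (number of symbols seen exactly once) / n, where n is the
    number of samples. The function name is hypothetical.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    # Count the symbols whose observed frequency is exactly one.
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


# Toy input (an assumption for illustration): the letters of a word.
samples = list("abracadabra")
estimate = good_turing_missing_mass(samples)
```

In the toy input, 'c' and 'd' are the only symbols that appear exactly once among 11 samples, so the estimate is 2/11.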