-
Non-asymptotic approximations for Pearson's chi-square statistic and its application to confidence intervals for strictly convex functions of the probability weights of discrete distributions
Abstract: In this paper, we develop a non-asymptotic local normal approximation for multinomial probabilities. First, we use it to find non-asymptotic total variation bounds between the measures induced by uniformly jittered multinomials and the multivariate normals with the same means and covariances. From the total variation bounds, we also derive a comparison of the cumulative distribution functions and…
Submitted 4 September, 2023; originally announced September 2023.
Comments: 20 pages, 3 figures
MSC Class: 62E17; 62F25; 62F30; 60E15; 60F99; 62E20; 62H10; 62H12
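As background for the entry above, Pearson's chi-square statistic for multinomial counts $N_1, \dots, N_d$ with total $n$ and hypothesized probability weights $p_1, \dots, p_d$ is $X^2 = \sum_i (N_i - n p_i)^2 / (n p_i)$. The minimal Python sketch below computes this textbook statistic; the function name `pearson_chi_square` is hypothetical, and the sketch does not implement the paper's non-asymptotic approximation.

```python
from typing import Sequence


def pearson_chi_square(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Pearson's X^2 = sum_i (N_i - n p_i)^2 / (n p_i) for multinomial counts.

    Hypothetical helper for illustration only.
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if any(p <= 0 for p in probs):
        raise ValueError("all probabilities must be positive")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")
    n = sum(counts)
    stat = 0.0
    for observed, p in zip(counts, probs):
        expected = n * p  # expected count under the hypothesized weights
        stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # 60 rolls of a die hypothesized to be fair: expected count 10 per face.
    print(pearson_chi_square([8, 12, 9, 11, 10, 10], [1 / 6] * 6))
```

In the usage example the squared deviations sum to 10 and each is divided by an expected count of 10, so the printed value is about 1.0 (up to floating-point rounding).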
-
arXiv:2208.06753
Sharp Frequency Bounds for Sample-Based Queries
Abstract: A data sketch algorithm scans a big data set, collecting a small amount of data, called the sketch, which can be used to statistically infer properties of the big data set. Some data sketch algorithms take a fixed-size random sample of a big data set, and use that sample to infer frequencies of items that meet various criteria in the big data set. This paper shows how to statistically infer probably ap…
Submitted 13 August, 2022; originally announced August 2022.
Comments: 3 pages
MSC Class: 62P99
ACM Class: G.3
Journal ref: In 2019 IEEE Big Data, pages 5983-5985, 2019
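To make the setting of the entry above concrete: if a uniform random sample of `draws` items is taken without replacement from `population` items, and `k` sampled items meet a criterion, then the sampled count is hypergeometric given the unknown number M of matching items in the full data set. The minimal Python sketch below inverts exact hypergeometric tail probabilities (a Clopper-Pearson-style construction) to bound M. This is a generic technique chosen for illustration, not necessarily the paper's method, and the names `count_bounds` and `_weight` are hypothetical. Each one-sided bound fails with probability at most `delta`, so the two-sided interval has coverage at least 1 - 2·`delta`.

```python
from math import comb


def _weight(population: int, successes: int, draws: int, x_lo: int, x_hi: int) -> int:
    """Number of size-`draws` subsets holding x_lo..x_hi of the `successes` marked items."""
    lo = max(x_lo, 0, draws - (population - successes))
    hi = min(x_hi, successes, draws)
    return sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(lo, hi + 1)
    )


def count_bounds(k: int, population: int, draws: int, delta: float) -> tuple[int, int]:
    """Hypothetical exact one-sided hypergeometric bounds on the matching count M."""
    if not 0 <= k <= draws <= population:
        raise ValueError("need 0 <= k <= draws <= population")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    total = comb(population, draws)

    def cdf(m: int) -> float:  # P(X <= k | M = m), exact integer ratio
        return _weight(population, m, draws, 0, k) / total

    def sf(m: int) -> float:  # P(X >= k | M = m), exact integer ratio
        return _weight(population, m, draws, k, draws) / total

    # Values of M consistent with seeing k matches and draws - k non-matches.
    feasible_lo, feasible_hi = k, population - (draws - k)

    # Upper bound: largest M with P(X <= k | M) >= delta (non-increasing in M).
    lo, hi = feasible_lo, feasible_hi
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if cdf(mid) >= delta:
            lo = mid
        else:
            hi = mid - 1
    upper = lo

    # Lower bound: smallest M with P(X >= k | M) >= delta (non-decreasing in M).
    lo, hi = feasible_lo, feasible_hi
    while lo < hi:
        mid = (lo + hi) // 2
        if sf(mid) >= delta:
            hi = mid
        else:
            lo = mid + 1
    lower = lo
    return lower, upper


if __name__ == "__main__":
    # 10 matches in a sample of 100 drawn from 1000 items.
    print(count_bounds(10, 1000, 100, 0.05))
```

Binary search is valid because, for fixed k, the hypergeometric lower tail is non-increasing and the upper tail non-decreasing in M. In the usage example the interval brackets the point estimate of 100 matching items; with a full census (`draws == population`) both bounds collapse to `k`.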
-
Bounding Means of Discrete Distributions
Abstract: We introduce methods to bound the mean of a discrete distribution (or finite population) based on sample data, for random variables with a known set of possible values. In particular, the methods can be applied to categorical data with known category-based values. For small sample sizes, we show how to leverage the knowledge of the set of possible values to compute bounds that are stronger than fo…
Submitted 3 November, 2021; v1 submitted 6 September, 2021; originally announced September 2021.
Comments: 9 pages, 8 figures
MSC Class: 62G15; 62G05
Journal ref: IEEE International Conference on Big Data, December 15-18, 2021
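For orientation on the entry above, a common baseline that uses only the range of the possible values, rather than the full set of values, is Hoeffding's inequality: for n independent draws in [a, b], each one-sided bound mean ± (b − a)·sqrt(ln(1/δ)/(2n)) holds with probability at least 1 − δ. The minimal Python sketch below computes this baseline; the function name `hoeffding_mean_bounds` is hypothetical, and this is not the paper's method.

```python
import math
from typing import Sequence


def hoeffding_mean_bounds(
    sample: Sequence[float], values: Sequence[float], delta: float
) -> tuple[float, float]:
    """Hypothetical range-only baseline: Hoeffding bounds on the mean.

    Each one-sided bound holds with probability >= 1 - delta for independent draws.
    """
    if not sample or not values:
        raise ValueError("sample and values must be non-empty")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    allowed = set(values)
    if any(x not in allowed for x in sample):
        raise ValueError("sample contains a value outside the known set")
    a, b = min(values), max(values)  # only the range of the value set is used
    n = len(sample)
    mean = sum(sample) / n
    radius = (b - a) * math.sqrt(math.log(1 / delta) / (2 * n))
    # Clip to [a, b]: the true mean cannot leave the range of possible values.
    return max(a, mean - radius), min(b, mean + radius)


if __name__ == "__main__":
    print(hoeffding_mean_bounds([0, 1, 1, 0], [0, 1], 0.05))
    print(hoeffding_mean_bounds([1] * 5000 + [0] * 5000, [0, 1], 0.05))
```

Because the radius shrinks only like 1/sqrt(n), a range-only bound like this is loose for small samples: with four binary observations it returns the trivial interval [0, 1], which is the small-sample regime the abstract targets.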
-
Improved Error Bounds Based on Worst Likely Assignments
Abstract: Error bounds based on worst likely assignments use permutation tests to validate classifiers. Worst likely assignments can produce effective bounds even for data sets with 100 or fewer training examples. This paper introduces a statistic for use in the permutation tests of worst likely assignments that improves error bounds, especially for accurate classifiers, which are typically the classifiers…
Submitted 31 March, 2015; originally announced April 2015.
Comments: IJCNN 2015