Search | arXiv e-print repository

arXiv:1912.03528 [pdf, other]

Tighter Confidence Intervals for Rating Systems

Abstract: Rating systems are ubiquitous, with applications ranging from product recommendation to teaching evaluations. Confidence intervals for functionals of rating data such as empirical means or quantiles are critical to decision-making in various applications including recommendation/ranking algorithms. Confidence intervals derived from standard Hoeffding and Bernstein bounds can be quite loose, especially in small sample regimes, since these bounds do not exploit the geometric structure of the probability simplex. We propose a new approach to deriving confidence intervals that are tailored to the geometry associated with multi-star/value rating systems using a combination of techniques from information theory, including Kullback-Leibler, Sanov, and Csiszár inequalities. The new confidence intervals are almost always as good or better than all standard methods and are significantly tighter in many situations. The standard bounds can require several times more samples than our new bounds to achieve specified confidence interval widths.

Submitted 7 December, 2019; originally announced December 2019.

arXiv:1809.06522 [pdf, other]

Concentration Inequalities for the Empirical Distribution

Authors: Jay Mardia, Jiantao Jiao, Ervin Tánczos, Robert D. Nowak, Tsachy Weissman

Abstract: We study concentration inequalities for the Kullback--Leibler (KL) divergence between the empirical distribution and the true distribution. Applying a recursion technique, we improve over the method of types bound uniformly in all regimes of sample size $n$ and alphabet size $k$, and the improvement becomes more significant when $k$ is large. We discuss the applications of our results in obtaining tighter concentration inequalities for $L_1$ deviations of the empirical distribution from the true distribution, and the difference between concentration around the expectation or zero. We also obtain asymptotically tight bounds on the variance of the KL divergence between the empirical and true distribution, and demonstrate their quantitatively different behaviors between small and large sample sizes compared to the alphabet size.

Submitted 18 October, 2019; v1 submitted 18 September, 2018; originally announced September 2018.

Comments: Accepted for publication in Information and Inference

arXiv:1709.03570 [pdf, other]

A KL-LUCB Bandit Algorithm for Large-Scale Crowdsourcing

Authors: Bob Mankoff, Robert Nowak, Ervin Tanczos

Abstract: This paper focuses on best-arm identification in multi-armed bandits with bounded rewards. We develop an algorithm that is a fusion of lil-UCB and KL-LUCB, offering the best qualities of the two algorithms in one method. This is achieved by proving a novel anytime confidence bound for the mean of bounded distributions, which is the analogue of the LIL-type bounds recently developed for sub-Gaussian distributions. We corroborate our theoretical results with numerical experiments based on the New Yorker Cartoon Caption Contest.

Submitted 11 September, 2017; originally announced September 2017.

arXiv:1702.07899 [pdf, other]

Are there needles in a moving haystack? Adaptive sensing for detection of dynamically evolving signals

Authors: Rui M. Castro, Ervin Tánczos

Abstract: In this paper we investigate the problem of detecting dynamically evolving signals. We model the signal as an $n$ dimensional vector that is either zero or has $s$ non-zero components. At each time step $t\in \mathbb{N}$ the non-zero components change their location independently with probability $p$. The statistical problem is to decide whether the signal is a zero vector or in fact it has non-zero components. This decision is based on $m$ noisy observations of individual signal components collected at times $t=1,\ldots,m$. We consider two different sensing paradigms, namely adaptive and non-adaptive sensing. For non-adaptive sensing the choice of components to measure has to be decided before the data collection process started, while for adaptive sensing one can adjust the sensing process based on observations collected earlier. We characterize the difficulty of this detection problem in both sensing paradigms in terms of the aforementioned parameters, with special interest to the speed of change of the active components. In addition we provide an adaptive sensing algorithm for this problem and contrast its performance to that of non-adaptive detection algorithms.

Submitted 14 November, 2017; v1 submitted 25 February, 2017; originally announced February 2017.

arXiv:1508.03002 [pdf, other]

Distribution-Free Detection of Structured Anomalies: Permutation and Rank-Based Scans

Authors: Ery Arias-Castro, Rui M. Castro, Ervin Tánczos, Meng Wang

Abstract: The scan statistic is by far the most popular method for anomaly detection, being popular in syndromic surveillance, signal and image processing, and target detection based on sensor networks, among other applications. The use of the scan statistics in such settings yields a hypothesis testing procedure, where the null hypothesis corresponds to the absence of anomalous behavior. If the null distribution is known, then calibration of a scan-based test is relatively easy, as it can be done by Monte Carlo simulation. When the null distribution is unknown, it is less straightforward. We investigate two procedures. The first one is a calibration by permutation and the other is a rank-based scan test, which is distribution-free and less sensitive to outliers. Furthermore, the rank scan test requires only a one-time calibration for a given data size making it computationally much more appealing. In both cases, we quantify the performance loss with respect to an oracle scan test that knows the null distribution. We show that using one of these calibration procedures results in only a very small loss of power in the context of a natural exponential family. This includes the classical normal location model, popular in signal processing, and the Poisson model, popular in syndromic surveillance. We perform numerical experiments on simulated data further supporting our theory and also on a real dataset from genomics.

Submitted 24 November, 2016; v1 submitted 12 August, 2015; originally announced August 2015.

arXiv:1410.4593 [pdf, ps, other]

Adaptive Compressed Sensing for Support Recovery of Structured Sparse Sets

Authors: Rui M. Castro, Ervin Tánczos

Abstract: This paper investigates the problem of recovering the support of structured signals via adaptive compressive sensing. We examine several classes of structured support sets, and characterize the fundamental limits of accurately recovering such sets through compressive measurements, while simultaneously providing adaptive support recovery protocols that perform near optimally for these classes. We show that by adaptively designing the sensing matrix we can attain significant performance gains over non-adaptive protocols. These gains arise from the fact that adaptive sensing can: (i) better mitigate the effects of noise, and (ii) better capitalize on the structure of the support sets.

Submitted 2 September, 2016; v1 submitted 16 October, 2014; originally announced October 2014.

Comments: to appear in IEEE Transactions on Information Theory

arXiv:1311.7118 [pdf, ps, other]

Adaptive Sensing for Estimation of Structured Sparse Signals

Authors: Ervin Tánczos, Rui M. Castro

Abstract: In many practical settings one can sequentially and adaptively guide the collection of future data, based on information extracted from data collected previously. These sequential data collection procedures are known by different names, such as sequential experimental design, active learning or adaptive sensing/sampling. The intricate relation between data analysis and acquisition in adaptive sensing paradigms can be extremely powerful, and often allows for reliable signal estimation and detection in situations where non-adaptive sensing would fail dramatically. In this work we investigate the problem of estimating the support of a structured sparse signal from coordinate-wise observations under the adaptive sensing paradigm. We present a general procedure for support set estimation that is optimal in a variety of cases and shows that through the use of adaptive sensing one can: (i) mitigate the effect of observation noise when compared to non-adaptive sensing and, (ii) capitalize on structural information to a much larger extent than possible with non-adaptive sensing. In addition to a general procedure to perform adaptive sensing in structured settings we present both performance upper bounds, and corresponding lower bounds for both sensing paradigms.

Submitted 27 November, 2013; originally announced November 2013.

Showing 1–7 of 7 results for author: Tánczos, E