
Showing 1–50 of 55 results for author: Candés, E

Searching in archive cs.
  1. arXiv:2506.10908  [pdf, ps, other]

    stat.ML cs.LG

    Probably Approximately Correct Labels

    Authors: Emmanuel J. Candès, Andrew Ilyas, Tijana Zrnic

    Abstract: Obtaining high-quality labeled datasets is often costly, requiring either extensive human annotation or expensive experiments. We propose a method that supplements such "expert" labels with AI predictions from pre-trained models to construct labeled datasets more cost-effectively. Our approach results in probably approximately correct labels: with high probability, the overall labeling error is sm…

    Submitted 12 June, 2025; originally announced June 2025.

  2. arXiv:2502.09858  [pdf, other]

    cs.LG cs.AI cs.CL q-bio.QM

    Automated Hypothesis Validation with Agentic Sequential Falsifications

    Authors: Kexin Huang, Ying Jin, Ryan Li, Michael Y. Li, Emmanuel Candès, Jure Leskovec

    Abstract: Hypotheses are central to information acquisition, decision-making, and discovery. However, many real-world hypotheses are abstract, high-level statements that are difficult to validate directly. This challenge is further intensified by the rise of hypothesis generation from Large Language Models (LLMs), which are prone to hallucination and produce hypotheses in volumes that make manual validation…

    Submitted 13 February, 2025; originally announced February 2025.

  3. arXiv:2501.19393  [pdf, other]

    cs.CL cs.AI cs.LG

    s1: Simple test-time scaling

    Authors: Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto

    Abstract: Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI's o1 model showed this capability but did not publicly share its methodology, leading to many replication efforts. We seek the simplest approach to achieve test-time scaling and strong reasoning performance. First, we curate a small dataset s1K of 1,000 ques…

    Submitted 1 March, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

    Comments: 46 pages (9 main), 10 figures, 15 tables

  4. arXiv:2409.09781  [pdf, other]

    math.ST cs.LG math.OC stat.CO stat.ML

    RandALO: Out-of-sample risk estimation in no time flat

    Authors: Parth Nobel, Daniel LeJeune, Emmanuel J. Candès

    Abstract: Estimating out-of-sample risk for models trained on large high-dimensional datasets is an expensive but essential part of the machine learning process, enabling practitioners to optimally tune hyperparameters. Cross-validation (CV) serves as the de facto standard for risk estimation but poorly trades off high bias ($K$-fold CV) for computational cost (leave-one-out CV). We propose a randomized app…

    Submitted 25 April, 2025; v1 submitted 15 September, 2024; originally announced September 2024.

    Comments: 26 pages, 10 figures

  5. arXiv:2409.07431  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Synthetic continued pretraining

    Authors: Zitong Yang, Neil Band, Shuangping Li, Emmanuel Candès, Tatsunori Hashimoto

    Abstract: Pretraining on large-scale, unstructured internet text enables language models to acquire a significant amount of world knowledge. However, this knowledge acquisition is data-inefficient: to learn a given fact, models must be trained on hundreds to thousands of diverse representations of it. This poses a challenge when adapting a pretrained model to a small corpus of domain-specific documents, whe…

    Submitted 3 October, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

    Comments: Updated organization of experimental results and methods introduction. Released the dataset and model weights artifact

  6. arXiv:2408.15204  [pdf, other]

    cs.CL cs.AI cs.HC

    Can Unconfident LLM Annotations Be Used for Confident Conclusions?

    Authors: Kristina Gligorić, Tijana Zrnic, Cinoo Lee, Emmanuel J. Candès, Dan Jurafsky

    Abstract: Large language models (LLMs) have shown high agreement with human raters across a variety of tasks, demonstrating potential to ease the challenges of human data collection. In computational social science (CSS), researchers are increasingly leveraging LLM annotations to complement slow and expensive human annotations. Still, guidelines for collecting and using LLM annotations, without compromising…

    Submitted 8 February, 2025; v1 submitted 27 August, 2024; originally announced August 2024.

    Comments: Please cite as: Can Unconfident LLM Annotations Be Used for Confident Conclusions? Kristina Gligorić, Tijana Zrnic, Cinoo Lee, Emmanuel Candès, and Dan Jurafsky. NAACL, 2025

  7. arXiv:2406.09714  [pdf, other]

    stat.ML cs.LG stat.ME

    Large language model validity via enhanced conformal prediction methods

    Authors: John J. Cherian, Isaac Gibbs, Emmanuel J. Candès

    Abstract: We develop new conformal inference methods for obtaining validity guarantees on the output of large language models (LLMs). Prior work in conformal language modeling identifies a subset of the text that satisfies a high-probability guarantee of correctness. These methods work by filtering claims from the LLM's original response if a scoring function evaluated on the claim fails to exceed a thresho…

    Submitted 31 October, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 24 pages, 14 figures, NeurIPS

  8. arXiv:2403.03208  [pdf, other]

    stat.ML cs.LG stat.ME

    Active Statistical Inference

    Authors: Tijana Zrnic, Emmanuel J. Candès

    Abstract: Inspired by the concept of active learning, we propose active inference, a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operat…

    Submitted 29 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  9. arXiv:2403.01046  [pdf, other]

    cs.LG cs.AI cs.NE math.OC stat.ML

    A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features

    Authors: Emi Zeger, Yifei Wang, Aaron Mishkin, Tolga Ergen, Emmanuel Candès, Mert Pilanci

    Abstract: We prove that training neural networks on 1-D data is equivalent to solving convex Lasso problems with discrete, explicitly defined dictionary matrices. We consider neural networks with piecewise linear activations and depths ranging from 2 to an arbitrary but finite number of layers. We first show that two-layer networks with piecewise linear activations are equivalent to Lasso models using a dis…

    Submitted 23 July, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  10. arXiv:2402.05203  [pdf, other]

    cs.LG stat.ML

    Bellman Conformal Inference: Calibrating Prediction Intervals For Time Series

    Authors: Zitong Yang, Emmanuel Candès, Lihua Lei

    Abstract: We introduce Bellman Conformal Inference (BCI), a framework that wraps around any time series forecasting models and provides approximately calibrated prediction intervals. Unlike existing methods, BCI is able to leverage multi-step ahead forecasts and explicitly optimize the average interval lengths by solving a one-dimensional stochastic control problem (SCP) at each time step. In particular, we…

    Submitted 9 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 17 pages, 4 figures

  11. arXiv:2309.16598  [pdf, other]

    stat.ML cs.LG stat.ME

    Cross-Prediction-Powered Inference

    Authors: Tijana Zrnic, Emmanuel J. Candès

    Abstract: While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protei…

    Submitted 28 February, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

  12. arXiv:2307.16895  [pdf, other]

    cs.LG eess.SY stat.ME stat.ML

    Conformal PID Control for Time Series Prediction

    Authors: Anastasios N. Angelopoulos, Emmanuel J. Candes, Ryan J. Tibshirani

    Abstract: We study the problem of uncertainty quantification for time series prediction, with the goal of providing easy-to-use algorithms with formal guarantees. The algorithms we present build upon ideas from conformal prediction and control theory, are able to prospectively model conformal scores in an online setting, and adapt to the presence of systematic errors due to seasonality, trends, and general…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: Code available at https://github.com/aangelopoulos/conformal-time-series

  13. arXiv:2305.14535  [pdf, other]

    cs.LG stat.ML

    Uncertainty Quantification over Graph with Conformalized Graph Neural Networks

    Authors: Kexin Huang, Ying Jin, Emmanuel Candès, Jure Leskovec

    Abstract: Graph Neural Networks (GNNs) are powerful machine learning prediction models on graph-structured data. However, GNNs lack rigorous uncertainty estimates, limiting their reliable deployment in settings where the cost of errors is significant. We propose conformalized GNN (CF-GNN), extending conformal prediction (CP) to graph-based models for guaranteed uncertainty estimates. Given an entity in the…

    Submitted 30 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Published at NeurIPS 2023

  14. arXiv:2305.03712  [pdf, other]

    stat.ME cs.CY cs.LG

    Statistical Inference for Fairness Auditing

    Authors: John J. Cherian, Emmanuel J. Candès

    Abstract: Before deploying a black-box model in high-stakes problems, it is important to evaluate the model's performance on sensitive subpopulations. For example, in a recidivism prediction task, we may wish to identify demographic groups for which our prediction model has unacceptably high false positive rates or certify that no such groups exist. In this paper, we frame this task, often referred to as "f…

    Submitted 8 June, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: 44 pages, 8 figures

  15. arXiv:2209.15265  [pdf, other]

    cs.LG cs.IT math.OC stat.ML

    Overparameterized ReLU Neural Networks Learn the Simplest Models: Neural Isometry and Exact Recovery

    Authors: Yifei Wang, Yixuan Hua, Emmanuel Candés, Mert Pilanci

    Abstract: The practice of deep learning has shown that neural networks generalize remarkably well even with an extreme number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We aim to address this discrepancy by adopting a convex optimization and sparse recovery perspective. We consider the trai…

    Submitted 17 February, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

  16. arXiv:2208.08401  [pdf, other]

    stat.ME cs.LG

    Conformal Inference for Online Prediction with Arbitrary Distribution Shifts

    Authors: Isaac Gibbs, Emmanuel Candès

    Abstract: We consider the problem of forming prediction sets in an online setting where the distribution generating the data is allowed to vary over time. Previous approaches to this problem suffer from over-weighting historical data and thus may fail to quickly react to the underlying dynamics. Here we correct this issue and develop a novel procedure with provably small regret over all local time intervals…

    Submitted 5 October, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: 35 pages, 15 figures

  17. arXiv:2110.01052  [pdf, other]

    cs.LG cs.AI cs.CV stat.ME stat.ML

    Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

    Authors: Anastasios N. Angelopoulos, Stephen Bates, Emmanuel J. Candès, Michael I. Jordan, Lihua Lei

    Abstract: We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating distribution and do not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersect…

    Submitted 29 September, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

    Comments: Code available at https://github.com/aangelopoulos/ltt

  18. arXiv:2106.00170  [pdf, other]

    stat.ME cs.LG stat.ML

    Adaptive Conformal Inference Under Distribution Shift

    Authors: Isaac Gibbs, Emmanuel Candès

    Abstract: We develop methods for forming prediction sets in an online setting where the data generating distribution is allowed to vary over time in an unknown fashion. Our framework builds on ideas from conformal inference to provide a general wrapper that can be combined with any black box method that produces point predictions of the unseen label or estimated quantiles of its distribution. While previous…

    Submitted 28 October, 2021; v1 submitted 31 May, 2021; originally announced June 2021.

    Comments: 25 pages, 9 figures

  19. arXiv:2006.04292  [pdf, other]

    stat.ML cs.LG stat.ME

    Achieving Equalized Odds by Resampling Sensitive Attributes

    Authors: Yaniv Romano, Stephen Bates, Emmanuel J. Candès

    Abstract: We present a flexible framework for learning predictive models that approximately satisfy the equalized odds notion of fairness. This is achieved by introducing a general discrepancy functional that rigorously quantifies violations of this criterion. This differentiable functional is used as a penalty driving the model parameters towards equalized odds. To rigorously evaluate fitted models, we dev…

    Submitted 7 June, 2020; originally announced June 2020.

    Comments: 14 pages, 4 figures

  20. arXiv:1908.05428  [pdf, other]

    stat.ME cs.CY stat.AP stat.ML

    With Malice Towards None: Assessing Uncertainty via Equalized Coverage

    Authors: Yaniv Romano, Rina Foygel Barber, Chiara Sabatti, Emmanuel J. Candès

    Abstract: An important factor to guarantee a fair use of data-driven recommendation systems is that we should be able to communicate their uncertainty to decision makers. This can be accomplished by constructing prediction intervals, which provide an intuitive measure of the limits of predictive performance. To support equitable treatment, we force the construction of such intervals to be unbiased in the se…

    Submitted 15 August, 2019; originally announced August 2019.

    Comments: 14 pages, 1 figure, 1 table

  21. arXiv:1902.02492  [pdf, other]

    eess.IV cs.LG math.NA math.OC physics.optics

    Dual-Reference Design for Holographic Coherent Diffraction Imaging

    Authors: David A. Barmherzig, Ju Sun, Emmanuel J. Candès, T. J. Lane, Po-Nan Li

    Abstract: A new reference design is introduced for holographic coherent diffraction imaging. This consists of two references, "block"- and "pinhole"-shaped regions, placed adjacent to the imaging specimen. An efficient recovery algorithm is provided for the resulting holographic phase retrieval problem, which is based on solving a structured, overdetermined linear system. Analysis of the expected recovery…

    Submitted 25 June, 2019; v1 submitted 7 February, 2019; originally announced February 2019.

  22. arXiv:1901.06453  [pdf, other]

    cs.IT eess.SP math.NA math.OC

    Holographic Phase Retrieval and Reference Design

    Authors: David A. Barmherzig, Ju Sun, T. J. Lane, Po-Nan Li, Emmanuel J. Candès

    Abstract: A general mathematical framework and recovery algorithm are presented for the holographic phase retrieval problem. In this problem, which arises in holographic coherent diffraction imaging, a "reference" portion of the signal to be recovered via phase retrieval is a priori known from experimental design. A generic formula is also derived for the expected recovery error when the measurement data is…

    Submitted 21 April, 2019; v1 submitted 18 January, 2019; originally announced January 2019.

    Comments: 27 pages, 10 figures

  23. arXiv:1706.01191  [pdf, other]

    math.ST cs.IT math.PR stat.ML

    The Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square

    Authors: Pragya Sur, Yuxin Chen, Emmanuel J. Candès

    Abstract: Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test. Indeed, Wilks' theorem asserts that whenever we ha…

    Submitted 5 June, 2017; originally announced June 2017.

    Comments: 58 pages, 7 figures

  24. arXiv:1609.05820  [pdf, other]

    cs.IT cs.CV cs.LG math.OC stat.ML

    The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences

    Authors: Yuxin Chen, Emmanuel Candes

    Abstract: Various applications involve assigning discrete label values to a collection of objects based on some pairwise noisy data. Due to the discrete---and hence nonconvex---structure of the problem, computing the optimal assignment (e.g. maximum likelihood assignment) becomes intractable at first sight. This paper makes progress towards efficient computation by focusing on a concrete joint alignment pro…

    Submitted 7 December, 2017; v1 submitted 19 September, 2016; originally announced September 2016.

    Comments: Accepted to Communications on Pure and Applied Mathematics

  25. arXiv:1511.01957  [pdf, other]

    math.ST cs.IT stat.ML

    False Discoveries Occur Early on the Lasso Path

    Authors: Weijie Su, Malgorzata Bogdan, Emmanuel Candes

    Abstract: In regression settings where explanatory variables have very low correlations and there are relatively few effects, each of large magnitude, we expect the Lasso to find the important variables with few errors, if any. This paper shows that in a regime of linear sparsity---meaning that the fraction of variables with a non-vanishing effect tends to a constant, however small---this cannot really be t…

    Submitted 14 September, 2016; v1 submitted 5 November, 2015; originally announced November 2015.

  26. arXiv:1505.05114  [pdf, other]

    cs.IT cs.LG math.NA math.ST stat.ML

    Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems

    Authors: Yuxin Chen, Emmanuel J. Candes

    Abstract: We consider the fundamental problem of solving quadratic systems of equations in $n$ variables, where $y_i = |\langle \boldsymbol{a}_i, \boldsymbol{x} \rangle|^2$, $i = 1, \ldots, m$ and $\boldsymbol{x} \in \mathbb{R}^n$ is unknown. We propose a novel method, which starting with an initial guess computed by means of a spectral method, proceeds by minimizing a nonconvex functional as in the Wirting…

    Submitted 22 March, 2016; v1 submitted 19 May, 2015; originally announced May 2015.

    Comments: accepted to Communications on Pure and Applied Mathematics (CPAM)

  27. arXiv:1504.00717  [pdf, other]

    cs.IT math.NA math.OC

    Super-Resolution of Positive Sources: the Discrete Setup

    Authors: Veniamin I. Morgenshtern, Emmanuel J. Candes

    Abstract: In single-molecule microscopy it is necessary to locate with high precision point sources from noisy observations of the spectrum of the signal at frequencies capped by $f_c$, which is just about the frequency of natural light. This paper rigorously establishes that this super-resolution problem can be solved via linear programming in a stable manner. We prove that the quality of the reconstructio…

    Submitted 2 April, 2015; originally announced April 2015.

    Comments: 31 pages, 7 figures

  28. arXiv:1503.08393  [pdf, other]

    math.ST cs.IT

    SLOPE is Adaptive to Unknown Sparsity and Asymptotically Minimax

    Authors: Weijie Su, Emmanuel Candes

    Abstract: We consider high-dimensional sparse regression problems in which we observe $y = X\beta + z$, where $X$ is an $n \times p$ design matrix and $z$ is an $n$-dimensional vector of independent Gaussian errors, each with variance $\sigma^2$. Our focus is on the recently introduced SLOPE estimator (Bogdan et al., 2014), which regularizes the least-squares estimates with the rank-dependent penalty…

    Submitted 23 September, 2015; v1 submitted 29 March, 2015; originally announced March 2015.

    Comments: To appear in the Annals of Statistics

  29. arXiv:1407.1065  [pdf, other]

    cs.IT math.FA math.NA math.OC math.ST

    Phase Retrieval via Wirtinger Flow: Theory and Algorithms

    Authors: Emmanuel Candes, Xiaodong Li, Mahdi Soltanolkotabi

    Abstract: We study the problem of recovering the phase from magnitude measurements; specifically, we wish to reconstruct a complex-valued signal $x \in \mathbb{C}^n$ about which we have phaseless samples of the form $y_r = |\langle a_r, x \rangle|^2$, $r = 1, 2, \ldots, m$ (knowledge of the phase of these samples would yield a linear system). This paper develops a non-convex formulation of the phase retrieval problem as well as a concrete so…

    Submitted 24 November, 2015; v1 submitted 3 July, 2014; originally announced July 2014.

    Comments: IEEE Transactions on Information Theory, Vol. 61 (4), April 2015

  30. arXiv:1310.3240  [pdf, other]

    cs.IT math.FA math.NA math.OC math.ST

    Phase Retrieval from Coded Diffraction Patterns

    Authors: Emmanuel Candes, Xiaodong Li, Mahdi Soltanolkotabi

    Abstract: This paper considers the question of recovering the phase of an object from intensity-only measurements, a problem which naturally appears in X-ray crystallography and related disciplines. We study a physically realistic setup where one can modulate the signal of interest and then collect the intensity of its diffraction pattern, each modulation thereby producing a sort of coded diffraction patter…

    Submitted 5 November, 2013; v1 submitted 11 October, 2013; originally announced October 2013.

  31. arXiv:1307.4610  [pdf, ps, other]

    cs.IT

    Hyperspectral fluorescence microscopy based on Compressive Sampling

    Authors: Makhlad Chahid, Jerome Bobin, Hamed Shams Mousavi, Emmanuel Candes, Maxime Dahan, Vincent Studer

    Abstract: The mathematical theory of compressed sensing (CS) asserts that one can acquire signals from measurements whose rate is much lower than the total bandwidth. Whereas the CS theory is now well developed, challenges concerning hardware implementations of CS-based acquisition devices, especially in optics, have only started being addressed. This paper presents an implementation of compressive sensing in…

    Submitted 17 July, 2013; originally announced July 2013.

  32. arXiv:1301.2603  [pdf, ps, other]

    cs.LG cs.IT math.OC math.ST stat.ML

    Robust subspace clustering

    Authors: Mahdi Soltanolkotabi, Ehsan Elhamifar, Emmanuel J. Candès

    Abstract: Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This paper introduces an algorithm inspired by sparse subspace clustering (SSC) [In IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2009) 2790-2797] to cluster noisy data, and develops some novel theory demonstrating its corr…

    Submitted 23 May, 2014; v1 submitted 11 January, 2013; originally announced January 2013.

    Comments: Published at http://dx.doi.org/10.1214/13-AOS1199 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1199

    Journal ref: Annals of Statistics 2014, Vol. 42, No. 2, 669-699

  33. arXiv:1211.0817  [pdf, ps, other]

    math.ST cs.LG stat.ML

    Discussion: Latent variable graphical model selection via convex optimization

    Authors: Emmanuel J. Candés, Mahdi Soltanolkotabi

    Abstract: Discussion of "Latent variable graphical model selection via convex optimization" by Venkat Chandrasekaran, Pablo A. Parrilo and Alan S. Willsky [arXiv:1008.1290].

    Submitted 5 November, 2012; originally announced November 2012.

    Comments: Published at http://dx.doi.org/10.1214/12-AOS1001 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1001

    Journal ref: Annals of Statistics 2012, Vol. 40, No. 4, 1997-2004

  34. arXiv:1211.0290  [pdf, other]

    cs.IT math.NA

    Super-Resolution from Noisy Data

    Authors: Emmanuel Candes, Carlos Fernandez-Granda

    Abstract: This paper studies the recovery of a superposition of point sources from noisy bandlimited data. In the fewest possible words, we only have information about the spectrum of an object in a low-frequency band bounded by a certain cut-off frequency and seek to obtain a higher resolution estimate by extrapolating the spectrum up to a higher frequency. We show that as long as the sources are separated…

    Submitted 9 July, 2013; v1 submitted 1 November, 2012; originally announced November 2012.

    Comments: 20 pages, 3 figures

  35. arXiv:1209.2262  [pdf, other]

    cs.IT physics.ins-det

    A single-photon sampling architecture for solid-state imaging

    Authors: Ewout van den Berg, Emmanuel Candes, Garry Chinn, Craig Levin, Peter Olcott, Carlos Sing-Long

    Abstract: Advances in solid-state technology have enabled the development of silicon photomultiplier sensor arrays capable of sensing individual photons. Combined with high-frequency time-to-digital converters (TDCs), this technology opens up the prospect of sensors capable of recording with high accuracy both the time and location of each detected photon. Such a capability could lead to significant improve…

    Submitted 11 September, 2012; originally announced September 2012.

    Comments: 24 pages, 3 figures, 5 tables

  36. arXiv:1208.6247  [pdf, ps, other]

    cs.IT math.NA

    Solving Quadratic Equations via PhaseLift when There Are About As Many Equations As Unknowns

    Authors: Emmanuel J. Candes, Xiaodong Li

    Abstract: This note shows that we can recover a complex vector $x \in \mathbb{C}^n$ exactly from on the order of $n$ quadratic equations of the form $|\langle a_i, x \rangle|^2 = b_i$, $i = 1, \ldots, m$, by using a semidefinite program known as PhaseLift. This improves upon earlier bounds in [3], which required the number of equations to be at least on the order of $n \log n$. We also demonstrate optimal recovery results from noisy quadratic m…

    Submitted 17 September, 2012; v1 submitted 30 August, 2012; originally announced August 2012.

    Comments: 6 pages

  37. arXiv:1203.5871  [pdf, other]

    cs.IT math.NA

    Towards a Mathematical Theory of Super-Resolution

    Authors: Emmanuel Candes, Carlos Fernandez-Granda

    Abstract: This paper develops a mathematical theory of super-resolution. Broadly speaking, super-resolution is the problem of recovering the fine details of an object---the high end of its spectrum---from coarse scale information only---from samples at the low end of the spectrum. Suppose we have many point sources at unknown locations in $[0,1]$ and with unknown complex-valued amplitudes. We only observe F…

    Submitted 13 November, 2012; v1 submitted 27 March, 2012; originally announced March 2012.

    Comments: 48 pages, 12 figures

  38. arXiv:1112.4258  [pdf, ps, other]

    cs.IT cs.LG math.ST stat.ML

    A geometric analysis of subspace clustering with outliers

    Authors: Mahdi Soltanolkotabi, Emmanuel J. Candés

    Abstract: This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower-dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace c…

    Submitted 30 January, 2013; v1 submitted 19 December, 2011; originally announced December 2011.

    Comments: Published at http://dx.doi.org/10.1214/12-AOS1034 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1034

    Journal ref: Annals of Statistics 2012, Vol. 40, No. 4, 2195-2238

  39. arXiv:1111.4646  [pdf, other]

    math.ST cs.IT

    On the Fundamental Limits of Adaptive Sensing

    Authors: Ery Arias-Castro, Emmanuel J. Candes, Mark Davenport

    Abstract: Suppose we can sequentially acquire arbitrary linear measurements of an n-dimensional vector x resulting in the linear model y = Ax + z, where z represents measurement noise. If the signal is known to be sparse, one would expect the following folk theorem to be true: choosing an adaptive strategy which cleverly selects the next row of A based on what has been previously observed should do far bett…

    Submitted 13 August, 2012; v1 submitted 20 November, 2011; originally announced November 2011.

  40. arXiv:1109.4499  [pdf, other]

    cs.IT math.NA

    PhaseLift: Exact and Stable Signal Recovery from Magnitude Measurements via Convex Programming

    Authors: Emmanuel J. Candes, Thomas Strohmer, Vladislav Voroninski

    Abstract: Suppose we wish to recover a signal $x \in \mathbb{C}^n$ from $m$ intensity measurements of the form $|\langle x, z_i \rangle|^2$, $i = 1, 2, \ldots, m$; that is, from data in which phase information is missing. We prove that if the vectors $z_i$ are sampled independently and uniformly at random on the unit sphere, then the signal $x$ can be recovered exactly (up to a global phase factor) by solving a convenient semidefinite program---a…

    Submitted 21 September, 2011; originally announced September 2011.

  41. arXiv:1109.0573  [pdf, other]

    cs.IT math.NA

    Phase Retrieval via Matrix Completion

    Authors: Emmanuel J. Candes, Yonina Eldar, Thomas Strohmer, Vlad Voroninski

    Abstract: This paper develops a novel framework for phase retrieval, a problem which arises in X-ray crystallography, diffraction imaging, astronomical imaging and many other applications. Our approach combines multiple structured illuminations together with ideas from convex programming to recover the phase from intensity measurements, typically from the modulus of the diffracted wave. We demonstrate empir…

    Submitted 20 September, 2011; v1 submitted 2 September, 2011; originally announced September 2011.

  42. arXiv:1106.1474  [pdf, ps, other]

    cs.IT

    Simple Bounds for Recovering Low-complexity Models

    Authors: Emmanuel Candes, Benjamin Recht

    Abstract: This note presents a unified analysis of the recovery of simple objects from random linear measurements. When the linear functionals are Gaussian, we show that an s-sparse vector in R^n can be efficiently recovered from 2s log n measurements with high probability and a rank r, n by n matrix can be efficiently recovered from r(6n-5r) measurements with high probability. For sparse vectors, this is within an addi…

    Submitted 28 February, 2012; v1 submitted 7 June, 2011; originally announced June 2011.

  43. arXiv:1104.5246  [pdf, ps, other]

    cs.IT math.ST

    How well can we estimate a sparse vector?

    Authors: Emmanuel J. Candès, Mark A. Davenport

    Abstract: The estimation of a sparse vector in the linear model is a fundamental problem in signal processing, statistics, and compressive sensing. This paper establishes a lower bound on the mean-squared error, which holds regardless of the sensing/design matrix being used and regardless of the estimation procedure. This lower bound very nearly matches the known upper bound one gets by taking a random proj…

    Submitted 1 March, 2013; v1 submitted 27 April, 2011; originally announced April 2011.

  44. arXiv:1011.3854  [pdf, ps, other]

    cs.IT

    A probabilistic and RIPless theory of compressed sensing

    Authors: Emmanuel J. Candes, Yaniv Plan

    Abstract: This paper introduces a simple and very general theory of compressive sensing. In this theory, the sensing mechanism simply selects sensing vectors independently at random from a probability distribution F; it includes all models - e.g. Gaussian, frequency measurements - discussed in the literature, but also provides a framework for new measurement strategies as well. We prove that if the probabil…

    Submitted 19 November, 2010; v1 submitted 16 November, 2010; originally announced November 2010.

    Comments: 36 pages

  45. arXiv:1005.2613  [pdf, other]

    math.NA cs.IT

    Compressed Sensing with Coherent and Redundant Dictionaries

    Authors: Emmanuel J. Candes, Yonina C. Eldar, Deanna Needell, Paige Randall

    Abstract: This article presents novel results concerning the recovery of signals from undersampled data in the common situation where such signals are not sparse in an orthonormal basis or incoherent dictionary, but in a truly redundant dictionary. This work thus bridges a gap in the literature and shows not only that compressed sensing is viable in this context, but also that accurate recovery is possible…

    Submitted 4 December, 2010; v1 submitted 14 May, 2010; originally announced May 2010.

    MSC Class: 94A12; 41A45; 42A10

  46. arXiv:1001.2363  [pdf, other]

    cs.IT

    Stable Principal Component Pursuit

    Authors: Zihan Zhou, Xiaodong Li, John Wright, Emmanuel Candes, Yi Ma

    Abstract: In this paper, we study the problem of recovering a low-rank matrix (the principal components) from a high-dimensional data matrix despite both small entry-wise noise and gross sparse errors. Recently, it has been shown that a convex program, named Principal Component Pursuit (PCP), can recover the low-rank matrix when the data matrix is corrupted by gross sparse errors. We further prove that th…

    Submitted 13 January, 2010; originally announced January 2010.

    Comments: 5-page paper submitted to ISIT 2010

  47. arXiv:1001.2362  [pdf, other]

    cs.IT

    Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit

    Authors: Arvind Ganesh, John Wright, Xiaodong Li, Emmanuel J. Candes, Yi Ma

    Abstract: We consider the problem of recovering a low-rank matrix when some of its entries, whose locations are not known a priori, are corrupted by errors of arbitrarily large magnitude. It has recently been shown that this problem can be solved efficiently and effectively by a convex program named Principal Component Pursuit (PCP), provided that the fraction of corrupted entries and the rank of the matr…

    Submitted 22 January, 2010; v1 submitted 13 January, 2010; originally announced January 2010.

    Comments: Submitted to ISIT 2010

  48. arXiv:1001.0339  [pdf, ps, other]

    cs.IT

    Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements

    Authors: Emmanuel J. Candes, Yaniv Plan

    Abstract: This paper presents several novel theoretical results regarding the recovery of a low-rank matrix from just a few measurements consisting of linear combinations of the matrix entries. We show that properly constrained nuclear-norm minimization stably recovers a low-rank matrix from a constant number of noisy measurements per degree of freedom; this seems to be the first result of this nature. Fu…

    Submitted 2 January, 2010; originally announced January 2010.

    Comments: 30 pages

  49. arXiv:0912.3599  [pdf, other]

    cs.IT

    Robust Principal Component Analysis?

    Authors: Emmanuel J. Candes, Xiaodong Li, Yi Ma, John Wright

    Abstract: This paper is about a curious phenomenon. Suppose we have a data matrix, which is the superposition of a low-rank component and a sparse component. Can we recover each component individually? We prove that under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit; a…

    Submitted 18 December, 2009; originally announced December 2009.

  50. arXiv:0910.0413  [pdf, ps, other]

    cs.IT

    Accurate low-rank matrix recovery from a small number of linear measurements

    Authors: Emmanuel J. Candes, Yaniv Plan

    Abstract: We consider the problem of recovering a low-rank matrix M from a small number of random linear measurements. A popular and useful example of this problem is matrix completion, in which the measurements reveal the values of a subset of the entries, and we wish to fill in the missing entries (this is the famous Netflix problem). When M is believed to have low rank, one would ideally try to recover…

    Submitted 2 October, 2009; originally announced October 2009.

    Comments: 8 pages, 1 table