-
Probably Approximately Correct Labels
Authors:
Emmanuel J. Candès,
Andrew Ilyas,
Tijana Zrnic
Abstract:
Obtaining high-quality labeled datasets is often costly, requiring either extensive human annotation or expensive experiments. We propose a method that supplements such "expert" labels with AI predictions from pre-trained models to construct labeled datasets more cost-effectively. Our approach results in probably approximately correct labels: with high probability, the overall labeling error is small. This solution enables rigorous yet efficient dataset curation using modern AI models. We demonstrate the benefits of the methodology through text annotation with large language models, image labeling with pre-trained vision models, and protein folding analysis with AlphaFold.
Submitted 12 June, 2025;
originally announced June 2025.
-
Automated Hypothesis Validation with Agentic Sequential Falsifications
Authors:
Kexin Huang,
Ying Jin,
Ryan Li,
Michael Y. Li,
Emmanuel Candès,
Jure Leskovec
Abstract:
Hypotheses are central to information acquisition, decision-making, and discovery. However, many real-world hypotheses are abstract, high-level statements that are difficult to validate directly. This challenge is further intensified by the rise of hypothesis generation from Large Language Models (LLMs), which are prone to hallucination and produce hypotheses in volumes that make manual validation impractical. Here we propose Popper, an agentic framework for rigorous automated validation of free-form hypotheses. Guided by Karl Popper's principle of falsification, Popper validates a hypothesis using LLM agents that design and execute falsification experiments targeting its measurable implications. A novel sequential testing framework ensures strict Type-I error control while actively gathering evidence from diverse observations, whether drawn from existing data or newly conducted procedures. We demonstrate Popper on six domains including biology, economics, and sociology. Popper delivers robust error control, high power, and scalability. Furthermore, compared to human scientists, Popper achieved comparable performance in validating complex biological hypotheses while reducing time tenfold, providing a scalable, rigorous solution for hypothesis validation.
Submitted 13 February, 2025;
originally announced February 2025.
-
s1: Simple test-time scaling
Authors:
Niklas Muennighoff,
Zitong Yang,
Weijia Shi,
Xiang Lisa Li,
Li Fei-Fei,
Hannaneh Hajishirzi,
Luke Zettlemoyer,
Percy Liang,
Emmanuel Candès,
Tatsunori Hashimoto
Abstract:
Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI's o1 model showed this capability but did not publicly share its methodology, leading to many replication efforts. We seek the simplest approach to achieve test-time scaling and strong reasoning performance. First, we curate a small dataset s1K of 1,000 questions paired with reasoning traces relying on three criteria we validate through ablations: difficulty, diversity, and quality. Second, we develop budget forcing to control test-time compute by forcefully terminating the model's thinking process or lengthening it by appending "Wait" multiple times to the model's generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps. After supervised finetuning the Qwen2.5-32B-Instruct language model on s1K and equipping it with budget forcing, our model s1-32B exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24). Further, scaling s1-32B with budget forcing allows extrapolating beyond its performance without test-time intervention: from 50% to 57% on AIME24. Our model, data, and code are open-source at https://github.com/simplescaling/s1
Submitted 1 March, 2025; v1 submitted 31 January, 2025;
originally announced January 2025.
-
RandALO: Out-of-sample risk estimation in no time flat
Authors:
Parth Nobel,
Daniel LeJeune,
Emmanuel J. Candès
Abstract:
Estimating out-of-sample risk for models trained on large high-dimensional datasets is an expensive but essential part of the machine learning process, enabling practitioners to optimally tune hyperparameters. Cross-validation (CV) serves as the de facto standard for risk estimation but poorly trades off high bias ($K$-fold CV) for computational cost (leave-one-out CV). We propose a randomized approximate leave-one-out (RandALO) risk estimator that is not only a consistent estimator of risk in high dimensions but also less computationally expensive than $K$-fold CV. We support our claims with extensive simulations on synthetic and real data and provide a user-friendly Python package implementing RandALO available on PyPI as randalo and at https://github.com/cvxgrp/randalo.
Submitted 25 April, 2025; v1 submitted 15 September, 2024;
originally announced September 2024.
-
Synthetic continued pretraining
Authors:
Zitong Yang,
Neil Band,
Shuangping Li,
Emmanuel Candès,
Tatsunori Hashimoto
Abstract:
Pretraining on large-scale, unstructured internet text enables language models to acquire a significant amount of world knowledge. However, this knowledge acquisition is data-inefficient--to learn a given fact, models must be trained on hundreds to thousands of diverse representations of it. This poses a challenge when adapting a pretrained model to a small corpus of domain-specific documents, where each fact may appear rarely or only once. We propose to bridge this gap with synthetic continued pretraining: using the small domain-specific corpus to synthesize a large corpus more amenable to learning, and then performing continued pretraining on the synthesized corpus. We instantiate this proposal with EntiGraph, a synthetic data augmentation algorithm that extracts salient entities from the source documents and then generates diverse text by drawing connections between the sampled entities. Synthetic continued pretraining with EntiGraph enables a language model to answer questions and follow generic instructions related to the source documents without access to them. If, instead, the source documents are available at inference time, we show that the knowledge acquired through our approach compounds with retrieval-augmented generation. To better understand these results, we build a simple mathematical model of EntiGraph, and show how synthetic data augmentation can "rearrange" knowledge to enable more data-efficient learning.
Submitted 3 October, 2024; v1 submitted 11 September, 2024;
originally announced September 2024.
-
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Authors:
Kristina Gligorić,
Tijana Zrnic,
Cinoo Lee,
Emmanuel J. Candès,
Dan Jurafsky
Abstract:
Large language models (LLMs) have shown high agreement with human raters across a variety of tasks, demonstrating potential to ease the challenges of human data collection. In computational social science (CSS), researchers are increasingly leveraging LLM annotations to complement slow and expensive human annotations. Still, guidelines for collecting and using LLM annotations, without compromising the validity of downstream conclusions, remain limited. We introduce Confidence-Driven Inference: a method that combines LLM annotations and LLM confidence indicators to strategically select which human annotations should be collected, with the goal of producing accurate statistical estimates and provably valid confidence intervals while reducing the number of human annotations needed. Our approach comes with safeguards against LLM annotations of poor quality, guaranteeing that the conclusions will be both valid and no less accurate than if we only relied on human annotations. We demonstrate the effectiveness of Confidence-Driven Inference over baselines in statistical estimation tasks across three CSS settings--text politeness, stance, and bias--reducing the needed number of human annotations by over 25% in each. Although we use CSS settings for demonstration, Confidence-Driven Inference can be used to estimate most standard quantities across a broad range of NLP problems.
Submitted 8 February, 2025; v1 submitted 27 August, 2024;
originally announced August 2024.
-
Large language model validity via enhanced conformal prediction methods
Authors:
John J. Cherian,
Isaac Gibbs,
Emmanuel J. Candès
Abstract:
We develop new conformal inference methods for obtaining validity guarantees on the output of large language models (LLMs). Prior work in conformal language modeling identifies a subset of the text that satisfies a high-probability guarantee of correctness. These methods work by filtering claims from the LLM's original response if a scoring function evaluated on the claim fails to exceed a threshold calibrated via split conformal prediction. Existing methods in this area suffer from two deficiencies. First, the guarantee stated is not conditionally valid. The trustworthiness of the filtering step may vary based on the topic of the response. Second, because the scoring function is imperfect, the filtering step can remove many valuable and accurate claims. We address both of these challenges via two new conformal methods. First, we generalize the conditional conformal procedure of Gibbs et al. (2023) in order to adaptively issue weaker guarantees when they are required to preserve the utility of the output. Second, we show how to systematically improve the quality of the scoring function via a novel algorithm for differentiating through the conditional conformal procedure. We demonstrate the efficacy of our approach on biography and medical question-answering datasets.
Submitted 31 October, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
Active Statistical Inference
Authors:
Tijana Zrnic,
Emmanuel J. Candès
Abstract:
Inspired by the concept of active learning, we propose active inference: a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operates on a simple yet powerful intuition: prioritize the collection of labels for data points where the model exhibits uncertainty, and rely on the model's predictions where it is confident. Active inference constructs provably valid confidence intervals and hypothesis tests while leveraging any black-box machine learning model and handling any data distribution. The key point is that it achieves the same level of accuracy with far fewer samples than existing baselines relying on non-adaptively-collected data. This means that for the same number of collected samples, active inference enables smaller confidence intervals and more powerful p-values. We evaluate active inference on datasets from public opinion research, census analysis, and proteomics.
Submitted 29 May, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features
Authors:
Emi Zeger,
Yifei Wang,
Aaron Mishkin,
Tolga Ergen,
Emmanuel Candès,
Mert Pilanci
Abstract:
We prove that training neural networks on 1-D data is equivalent to solving convex Lasso problems with discrete, explicitly defined dictionary matrices. We consider neural networks with piecewise linear activations and depths ranging from 2 to an arbitrary but finite number of layers. We first show that two-layer networks with piecewise linear activations are equivalent to Lasso models using a discrete dictionary of ramp functions, with breakpoints corresponding to the training data points. In certain general architectures with absolute value or ReLU activations, a third layer surprisingly creates features that reflect the training data about themselves. Additional layers progressively generate reflections of these reflections. The Lasso representation provides valuable insights into the analysis of globally optimal networks, elucidating their solution landscapes and enabling closed-form solutions in certain special cases. Numerical results show that reflections also occur when optimizing standard deep networks using standard non-convex optimizers. Additionally, we demonstrate our theory with autoregressive time series models.
Submitted 23 July, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Bellman Conformal Inference: Calibrating Prediction Intervals For Time Series
Authors:
Zitong Yang,
Emmanuel Candès,
Lihua Lei
Abstract:
We introduce Bellman Conformal Inference (BCI), a framework that wraps around any time series forecasting model and provides approximately calibrated prediction intervals. Unlike existing methods, BCI is able to leverage multi-step ahead forecasts and explicitly optimize the average interval lengths by solving a one-dimensional stochastic control problem (SCP) at each time step. In particular, we use the dynamic programming algorithm to find the optimal policy for the SCP. We prove that BCI achieves long-term coverage under arbitrary distribution shifts and temporal dependence, even with poor multi-step ahead forecasts. We find empirically that BCI avoids uninformative intervals that have infinite lengths and generates substantially shorter prediction intervals in multiple applications when compared with existing methods.
Submitted 9 February, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Cross-Prediction-Powered Inference
Authors:
Tijana Zrnic,
Emmanuel J. Candès
Abstract:
While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protein structures are used to supplement experimentally derived structures, predictions of socioeconomic indicators from satellite imagery are used to supplement accurate survey data, and so on. Since predictions are imperfect and potentially biased, this practice brings into question the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. With a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to remedy the prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that only leverage the labeled data. Closely related is the recent proposal of prediction-powered inference, which assumes that a good pre-trained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors; its confidence intervals typically have significantly lower variability.
Submitted 28 February, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Conformal PID Control for Time Series Prediction
Authors:
Anastasios N. Angelopoulos,
Emmanuel J. Candès,
Ryan J. Tibshirani
Abstract:
We study the problem of uncertainty quantification for time series prediction, with the goal of providing easy-to-use algorithms with formal guarantees. The algorithms we present build upon ideas from conformal prediction and control theory, are able to prospectively model conformal scores in an online setting, and adapt to the presence of systematic errors due to seasonality, trends, and general distribution shifts. Our theory both simplifies and strengthens existing analyses in online conformal prediction. Experiments on 4-week-ahead forecasting of statewide COVID-19 death counts in the U.S. show an improvement in coverage over the ensemble forecaster used in official CDC communications. We also run experiments on predicting electricity demand, market returns, and temperature using autoregressive, Theta, Prophet, and Transformer models. We provide an extendable codebase for testing our methods and for the integration of new algorithms, data sets, and forecasting rules.
Submitted 31 July, 2023;
originally announced July 2023.
-
Uncertainty Quantification over Graph with Conformalized Graph Neural Networks
Authors:
Kexin Huang,
Ying Jin,
Emmanuel Candès,
Jure Leskovec
Abstract:
Graph Neural Networks (GNNs) are powerful machine learning prediction models on graph-structured data. However, GNNs lack rigorous uncertainty estimates, limiting their reliable deployment in settings where the cost of errors is significant. We propose conformalized GNN (CF-GNN), extending conformal prediction (CP) to graph-based models for guaranteed uncertainty estimates. Given an entity in the graph, CF-GNN produces a prediction set/interval that provably contains the true label with pre-defined coverage probability (e.g. 90%). We establish a permutation invariance condition that enables the validity of CP on graph data and provide an exact characterization of the test-time coverage. Moreover, besides valid coverage, it is crucial to reduce the prediction set size/interval length for practical use. We observe a key connection between non-conformity scores and network structures, which motivates us to develop a topology-aware output correction model that learns to update the prediction and produces more efficient prediction sets/intervals. Extensive experiments show that CF-GNN achieves any pre-defined target marginal coverage while significantly reducing the prediction set/interval size by up to 74% over the baselines. It also empirically achieves satisfactory conditional coverage over various raw and network features.
Submitted 30 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Statistical Inference for Fairness Auditing
Authors:
John J. Cherian,
Emmanuel J. Candès
Abstract:
Before deploying a black-box model in high-stakes problems, it is important to evaluate the model's performance on sensitive subpopulations. For example, in a recidivism prediction task, we may wish to identify demographic groups for which our prediction model has unacceptably high false positive rates or certify that no such groups exist. In this paper, we frame this task, often referred to as "fairness auditing," in terms of multiple hypothesis testing. We show how the bootstrap can be used to simultaneously bound performance disparities over a collection of groups with statistical guarantees. Our methods can be used to flag subpopulations affected by model underperformance, and certify subpopulations for which the model performs adequately. Crucially, our audit is model-agnostic and applicable to nearly any performance metric or group fairness criterion. Our methods also accommodate extremely rich -- even infinite -- collections of subpopulations. Further, we generalize beyond subpopulations by showing how to assess performance over certain distribution shifts. We test the proposed methods on benchmark datasets in predictive inference and algorithmic fairness and find that our audits can provide interpretable and trustworthy guarantees.
Submitted 8 June, 2023; v1 submitted 5 May, 2023;
originally announced May 2023.
-
Overparameterized ReLU Neural Networks Learn the Simplest Models: Neural Isometry and Exact Recovery
Authors:
Yifei Wang,
Yixuan Hua,
Emmanuel Candès,
Mert Pilanci
Abstract:
The practice of deep learning has shown that neural networks generalize remarkably well even with an extreme number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We aim to address this discrepancy by adopting a convex optimization and sparse recovery perspective. We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization. Under certain regularity assumptions on the data, we show that ReLU networks with an arbitrary number of parameters learn only simple models that explain the data. This is analogous to the recovery of the sparsest linear model in compressed sensing. For ReLU networks and their variants with skip connections or normalization layers, we present isometry conditions that ensure the exact recovery of planted neurons. For randomly generated data, we show the existence of a phase transition in recovering planted neural network models, which is easy to describe: whenever the ratio between the number of samples and the dimension exceeds a numerical threshold, the recovery succeeds with high probability; otherwise, it fails with high probability. Surprisingly, ReLU networks learn simple and sparse models that generalize well even when the labels are noisy. The phase transition phenomenon is confirmed through numerical experiments.
Submitted 17 February, 2023; v1 submitted 30 September, 2022;
originally announced September 2022.
-
Conformal Inference for Online Prediction with Arbitrary Distribution Shifts
Authors:
Isaac Gibbs,
Emmanuel Candès
Abstract:
We consider the problem of forming prediction sets in an online setting where the distribution generating the data is allowed to vary over time. Previous approaches to this problem suffer from over-weighting historical data and thus may fail to quickly react to the underlying dynamics. Here we correct this issue and develop a novel procedure with provably small regret over all local time intervals of a given width. We achieve this by modifying the adaptive conformal inference (ACI) algorithm of Gibbs and Candès (2021) to contain an additional step in which the step-size parameter of ACI's gradient descent update is tuned over time. Crucially, this means that unlike ACI, which requires knowledge of the rate of change of the data-generating mechanism, our new procedure is adaptive to both the size and type of the distribution shift. Our methods are highly flexible and can be used in combination with any baseline predictive algorithm that produces point estimates or estimated quantiles of the target without the need for distributional assumptions. We test our techniques on two real-world datasets aimed at predicting stock market volatility and COVID-19 case counts and find that they are robust and adaptive to real-world distribution shifts.
Submitted 5 October, 2023; v1 submitted 17 August, 2022;
originally announced August 2022.
-
Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
Authors:
Anastasios N. Angelopoulos,
Stephen Bates,
Emmanuel J. Candès,
Michael I. Jordan,
Lihua Lei
Abstract:
We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating distribution and do not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use the framework to provide new calibration methods for several core machine learning tasks, with detailed worked examples in computer vision and tabular medical data.
Submitted 29 September, 2022; v1 submitted 3 October, 2021;
originally announced October 2021.
-
Adaptive Conformal Inference Under Distribution Shift
Authors:
Isaac Gibbs,
Emmanuel Candès
Abstract:
We develop methods for forming prediction sets in an online setting where the data generating distribution is allowed to vary over time in an unknown fashion. Our framework builds on ideas from conformal inference to provide a general wrapper that can be combined with any black box method that produces point predictions of the unseen label or estimated quantiles of its distribution. While previous conformal inference methods rely on the assumption that the data points are exchangeable, our adaptive approach provably achieves the desired coverage frequency over long-time intervals irrespective of the true data generating process. We accomplish this by modelling the distribution shift as a learning problem in a single parameter whose optimal value is varying over time and must be continuously re-estimated. We test our method, adaptive conformal inference, on two real world datasets and find that its predictions are robust to visible and significant distribution shifts.
Submitted 28 October, 2021; v1 submitted 31 May, 2021;
originally announced June 2021.
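The single-parameter re-estimation described in the abstract above can be sketched as an online update of the working miscoverage level: raise it after a covered point, lower it after a miss. The sketch below is an illustration under stated assumptions, not the authors' implementation: the step size `gamma`, the warm-up length, the use of all past conformity scores as the calibration pool, and the function name `adaptive_conformal_errors` are all hypothetical choices.

```python
import bisect
import random

def adaptive_conformal_errors(scores, alpha=0.1, gamma=0.01, warmup=50):
    """Run the online update alpha_t <- alpha_t + gamma * (alpha - err_t).

    scores: conformity scores in time order (larger = worse fit).
    Returns the list of 0/1 miscoverage indicators after the warm-up.
    """
    history = sorted(scores[:warmup])  # past scores, kept sorted
    alpha_t = alpha
    errors = []
    for s in scores[warmup:]:
        if alpha_t <= 0.0:
            err = 0  # the prediction set is everything, so it always covers
        elif alpha_t >= 1.0:
            err = 1  # the prediction set is empty, so it never covers
        else:
            # Empirical (1 - alpha_t) quantile of the past scores.
            k = min(len(history) - 1, int((1.0 - alpha_t) * len(history)))
            err = 1 if s > history[k] else 0
        errors.append(err)
        alpha_t += gamma * (alpha - err)
        bisect.insort(history, s)
    return errors

# Synthetic scores whose scale drifts upward over time (illustrative only).
random.seed(0)
scores = [abs(random.gauss(0.0, 1.0 + t / 1000.0)) for t in range(5050)]
errors = adaptive_conformal_errors(scores)
print(sum(errors) / len(errors))
```

Because the working level stays within a bounded interval, the long-run miscoverage frequency is pulled toward `alpha` regardless of how the scores drift; a smaller `gamma` gives a smoother but slower-reacting level.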
-
Achieving Equalized Odds by Resampling Sensitive Attributes
Authors:
Yaniv Romano,
Stephen Bates,
Emmanuel J. Candès
Abstract:
We present a flexible framework for learning predictive models that approximately satisfy the equalized odds notion of fairness. This is achieved by introducing a general discrepancy functional that rigorously quantifies violations of this criterion. This differentiable functional is used as a penalty driving the model parameters towards equalized odds. To rigorously evaluate fitted models, we develop a formal hypothesis test to detect whether a prediction rule violates this property, the first such test in the literature. Both the model fitting and hypothesis testing leverage a resampled version of the sensitive attribute obeying equalized odds, by construction. We demonstrate the applicability and validity of the proposed framework both in regression and multi-class classification problems, reporting improved performance over state-of-the-art methods. Lastly, we show how to incorporate techniques for equitable uncertainty quantification---unbiased for each group under study---to communicate the results of the data analysis in exact terms.
Submitted 7 June, 2020;
originally announced June 2020.
-
With Malice Towards None: Assessing Uncertainty via Equalized Coverage
Authors:
Yaniv Romano,
Rina Foygel Barber,
Chiara Sabatti,
Emmanuel J. Candès
Abstract:
An important factor to guarantee a fair use of data-driven recommendation systems is that we should be able to communicate their uncertainty to decision makers. This can be accomplished by constructing prediction intervals, which provide an intuitive measure of the limits of predictive performance. To support equitable treatment, we force the construction of such intervals to be unbiased in the sense that their coverage must be equal across all protected groups of interest. We present an operational methodology that achieves this goal by offering rigorous distribution-free coverage guarantees holding in finite samples. Our methodology, equalized coverage, is flexible as it can be viewed as a wrapper around any predictive algorithm. We test the applicability of the proposed framework on real data, demonstrating that equalized coverage constructs unbiased prediction intervals, unlike competitive methods.
Submitted 15 August, 2019;
originally announced August 2019.
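One simple way to obtain coverage that holds separately within each protected group, in the spirit of the abstract above, is to calibrate a split-conformal quantile per group. The sketch below assumes absolute residuals from a held-out calibration set and a symmetric interval around a point prediction; these choices and the names `conformal_quantile` and `groupwise_halfwidths` are illustrative assumptions, not the paper's exact procedure.

```python
import math

def conformal_quantile(abs_residuals, alpha):
    """Finite-sample corrected quantile used in split conformal prediction."""
    n = len(abs_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this level
    return sorted(abs_residuals)[k - 1]

def groupwise_halfwidths(residuals, groups, alpha):
    """Calibrate one interval half-width per group label."""
    by_group = {}
    for r, g in zip(residuals, groups):
        by_group.setdefault(g, []).append(abs(r))
    return {g: conformal_quantile(rs, alpha) for g, rs in by_group.items()}

# Illustrative calibration residuals for two groups.
res = [1, -2, 3, 10, -20, 30, 40, 50, -60, 70]
grp = ["a", "a", "a", "b", "b", "b", "b", "b", "b", "b"]
print(groupwise_halfwidths(res, grp, alpha=0.25))
```

A test point in group g would then receive the interval [prediction - halfwidth[g], prediction + halfwidth[g]]. Groups with noisier residuals get wider intervals, which is how the per-group coverage level is kept equal.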
-
Dual-Reference Design for Holographic Coherent Diffraction Imaging
Authors:
David A. Barmherzig,
Ju Sun,
Emmanuel J. Candès,
T. J. Lane,
Po-Nan Li
Abstract:
A new reference design is introduced for holographic coherent diffraction imaging. It consists of two references, "block" and "pinhole" shaped regions, placed adjacent to the imaging specimen. An efficient recovery algorithm, based on solving a structured, overdetermined linear system, is provided for the resulting holographic phase retrieval problem. Analysis of the expected recovery error on data contaminated by Poisson shot noise shows that this simple modification synergizes the individual references and hence leads to uniformly superior performance over single-reference schemes. Numerical experiments on simulated data confirm the theoretical prediction, and the proposed dual-reference scheme achieves a smaller recovery error than leading single-reference schemes.
Submitted 25 June, 2019; v1 submitted 7 February, 2019;
originally announced February 2019.
-
Holographic Phase Retrieval and Reference Design
Authors:
David A. Barmherzig,
Ju Sun,
T. J. Lane,
Po-Nan Li,
Emmanuel J. Candès
Abstract:
A general mathematical framework and recovery algorithm are presented for the holographic phase retrieval problem. In this problem, which arises in holographic coherent diffraction imaging, a "reference" portion of the signal to be recovered via phase retrieval is known a priori from the experimental design. A generic formula is also derived for the expected recovery error when the measurement data is corrupted by Poisson shot noise. This facilitates an optimization perspective on reference design and analysis. We use this perspective to quantify the performance of various reference choices.
Submitted 21 April, 2019; v1 submitted 18 January, 2019;
originally announced January 2019.
-
The Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square
Authors:
Pragya Sur,
Yuxin Chen,
Emmanuel J. Candès
Abstract:
Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test. Indeed, Wilks' theorem asserts that whenever we have a fixed number $p$ of variables, twice the log-likelihood ratio (LLR) $2Λ$ is distributed as a $χ^2_k$ variable in the limit of large sample sizes $n$; here, $k$ is the number of variables being tested. In this paper, we prove that when $p$ is not negligible compared to $n$, Wilks' theorem does not hold and that the chi-square approximation is grossly incorrect; in fact, this approximation produces p-values that are far too small (under the null hypothesis). Assume that $n$ and $p$ grow large in such a way that $p/n\rightarrowκ$ for some constant $κ< 1/2$. We prove that for a class of logistic models, the LLR converges to a rescaled chi-square, namely, $2Λ~\stackrel{\mathrm{d}}{\rightarrow}~α(κ)χ_k^2$, where the scaling factor $α(κ)$ is greater than one as soon as the dimensionality ratio $κ$ is positive. Hence, the LLR is larger than classically assumed. For instance, when $κ=0.3$, $α(κ)\approx1.5$. In general, we show how to compute the scaling factor by solving a nonlinear system of two equations with two unknowns. Our mathematical arguments are involved and use techniques from approximate message passing theory, non-asymptotic random matrix theory and convex geometry. We also complement our mathematical study by showing that the new limiting distribution is accurate for finite sample sizes. Finally, all the results from this paper extend to some other regression models such as the probit regression model.
Submitted 5 June, 2017;
originally announced June 2017.
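The practical effect of the rescaling in the abstract above can be seen by comparing p-values for a single tested coefficient (k = 1). The sketch below uses the approximate value α(κ) ≈ 1.5 at κ = 0.3 quoted in the abstract; for any other κ the factor would have to be computed from the nonlinear system the abstract mentions, which is not shown here. The function names are hypothetical.

```python
import math

def chi2_1_sf(x):
    """Survival function of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def classical_pvalue(two_llr):
    """P-value from Wilks' approximation, 2*LLR ~ chi^2_1."""
    return chi2_1_sf(two_llr)

def rescaled_pvalue(two_llr, alpha_kappa):
    """P-value when 2*LLR ~ alpha(kappa) * chi^2_1 instead."""
    return chi2_1_sf(two_llr / alpha_kappa)

two_llr = 3.8415  # the classical 5% critical value for chi^2_1
print(classical_pvalue(two_llr))      # close to 0.05 by construction
print(rescaled_pvalue(two_llr, 1.5))  # noticeably larger than 0.05
```

A statistic that looks significant at the 5% level under the classical approximation is therefore not significant once the rescaling is taken into account, which is the sense in which the classical p-values are too small.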
-
The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences
Authors:
Yuxin Chen,
Emmanuel Candes
Abstract:
Various applications involve assigning discrete label values to a collection of objects based on some pairwise noisy data. Due to the discrete---and hence nonconvex---structure of the problem, computing the optimal assignment (e.g., the maximum likelihood assignment) appears intractable at first sight. This paper makes progress towards efficient computation by focusing on a concrete joint alignment problem---that is, the problem of recovering $n$ discrete variables $x_i \in \{1,\cdots, m\}$, $1\leq i\leq n$, given noisy observations of their modulo differences $\{x_i - x_j~\mathsf{mod}~m\}$. We propose a low-complexity and model-free procedure, which operates in a lifted space by representing distinct label values in orthogonal directions, and which attempts to optimize quadratic functions over hypercubes. Starting with a first guess computed via a spectral method, the algorithm successively refines the iterates via projected power iterations. We prove that for a broad class of statistical models, the proposed projected power method makes no error---and hence converges to the maximum likelihood estimate---in a suitable regime. Numerical experiments have been carried out on both synthetic and real data to demonstrate the practicality of our algorithm. We expect this algorithmic framework to be effective for a broad range of discrete assignment problems.
Submitted 7 December, 2017; v1 submitted 19 September, 2016;
originally announced September 2016.
-
False Discoveries Occur Early on the Lasso Path
Authors:
Weijie Su,
Malgorzata Bogdan,
Emmanuel Candes
Abstract:
In regression settings where explanatory variables have very low correlations and there are relatively few effects, each of large magnitude, we expect the Lasso to find the important variables with few errors, if any. This paper shows that in a regime of linear sparsity---meaning that the fraction of variables with a non-vanishing effect tends to a constant, however small---this cannot really be the case, even when the design variables are stochastically independent. We demonstrate that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are. We derive a sharp asymptotic trade-off between false and true positive rates or, equivalently, between measures of type I and type II errors along the Lasso path. This trade-off states that if we ever want to achieve a type II error (false negative rate) under a critical value, then anywhere on the Lasso path the type I error (false positive rate) will need to exceed a given threshold so that we can never have both errors at a low level at the same time. Our analysis uses tools from approximate message passing (AMP) theory as well as novel elements to deal with a possibly adaptive selection of the Lasso regularizing parameter.
Submitted 14 September, 2016; v1 submitted 5 November, 2015;
originally announced November 2015.
-
Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems
Authors:
Yuxin Chen,
Emmanuel J. Candes
Abstract:
We consider the fundamental problem of solving quadratic systems of equations in $n$ variables, where $y_i = |\langle \boldsymbol{a}_i, \boldsymbol{x} \rangle|^2$, $i = 1, \ldots, m$ and $\boldsymbol{x} \in \mathbb{R}^n$ is unknown. We propose a novel method, which starting with an initial guess computed by means of a spectral method, proceeds by minimizing a nonconvex functional as in the Wirtinger flow approach. There are several key distinguishing features, most notably, a distinct objective functional and novel update rules, which operate in an adaptive fashion and drop terms bearing too much influence on the search direction. These careful selection rules provide a tighter initial guess, better descent directions, and thus enhanced practical performance. On the theoretical side, we prove that for certain unstructured models of quadratic systems, our algorithms return the correct solution in linear time, i.e. in time proportional to reading the data $\{\boldsymbol{a}_i\}$ and $\{y_i\}$ as soon as the ratio $m/n$ between the number of equations and unknowns exceeds a fixed numerical constant. We extend the theory to deal with noisy systems in which we only have $y_i \approx |\langle \boldsymbol{a}_i, \boldsymbol{x} \rangle|^2$ and prove that our algorithms achieve a statistical accuracy, which is nearly un-improvable. We complement our theoretical study with numerical examples showing that solving random quadratic systems is both computationally and statistically not much harder than solving linear systems of the same size---hence the title of this paper. For instance, we demonstrate empirically that the computational cost of our algorithm is about four times that of solving a least-squares problem of the same size.
Submitted 22 March, 2016; v1 submitted 19 May, 2015;
originally announced May 2015.
-
Super-Resolution of Positive Sources: the Discrete Setup
Authors:
Veniamin I. Morgenshtern,
Emmanuel J. Candes
Abstract:
In single-molecule microscopy it is necessary to locate point sources with high precision from noisy observations of the spectrum of the signal at frequencies capped by $f_c$, which is just about the frequency of natural light. This paper rigorously establishes that this super-resolution problem can be solved via linear programming in a stable manner. We prove that the quality of the reconstruction crucially depends on the Rayleigh regularity of the support of the signal; that is, on the maximum number of sources that can occur within a square of side length about $1/f_c$. The theoretical performance guarantee is complemented with a converse result showing that our simple convex program is nearly optimal. Finally, numerical experiments illustrate our methods.
Submitted 2 April, 2015;
originally announced April 2015.
-
SLOPE is Adaptive to Unknown Sparsity and Asymptotically Minimax
Authors:
Weijie Su,
Emmanuel Candes
Abstract:
We consider high-dimensional sparse regression problems in which we observe $y = X β + z$, where $X$ is an $n \times p$ design matrix and $z$ is an $n$-dimensional vector of independent Gaussian errors, each with variance $σ^2$. Our focus is on the recently introduced SLOPE estimator (Bogdan et al., 2014), which regularizes the least-squares estimates with the rank-dependent penalty $\sum_{1 \le i \le p} λ_i |\hat β|_{(i)}$, where $|\hat β|_{(i)}$ is the $i$th largest magnitude of the fitted coefficients. Under Gaussian designs, where the entries of $X$ are i.i.d. $\mathcal{N}(0, 1/n)$, we show that SLOPE, with weights $λ_i$ just about equal to $σ\cdot Φ^{-1}(1-iq/(2p))$ ($Φ^{-1}(α)$ is the $α$th quantile of a standard normal and $q$ is a fixed number in $(0,1)$), achieves a squared error of estimation obeying \[ \sup_{\| β\|_0 \le k} \,\, \mathbb{P} \left(\| \hat β_{\text{SLOPE}} - β\|^2 > (1+ε) \, 2σ^2 k \log(p/k) \right) \longrightarrow 0 \] as the dimension $p$ increases to $\infty$, where $ε > 0$ is an arbitrarily small constant. This holds under a weak assumption on the $\ell_0$-sparsity level, namely, $k/p \rightarrow 0$ and $(k\log p)/n \rightarrow 0$, and is sharp in the sense that this is the best possible error any estimator can achieve. A remarkable feature is that SLOPE does not require any knowledge of the degree of sparsity, and yet automatically adapts to yield optimal total squared errors over a wide range of $\ell_0$-sparsity classes. We are not aware of any other estimator with this property.
Submitted 23 September, 2015; v1 submitted 29 March, 2015;
originally announced March 2015.
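The weight sequence and the rank-dependent penalty stated in the abstract above are straightforward to compute. The sketch below evaluates $λ_i = σ \cdot Φ^{-1}(1 - iq/(2p))$ and the penalty $\sum_i λ_i |β|_{(i)}$ for a given coefficient vector; it does not solve the SLOPE optimization problem itself. The function names and the example values of `p`, `q`, and `sigma` are illustrative assumptions.

```python
from statistics import NormalDist

def slope_weights(p, q=0.1, sigma=1.0):
    """Weights lambda_i = sigma * Phi^{-1}(1 - i*q/(2p)) for i = 1..p."""
    std_normal = NormalDist()
    return [sigma * std_normal.inv_cdf(1.0 - i * q / (2.0 * p))
            for i in range(1, p + 1)]

def slope_penalty(beta, weights):
    """Sorted-L1 penalty: the largest |beta| is paired with the largest weight."""
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(w * m for w, m in zip(weights, magnitudes))

weights = slope_weights(p=5, q=0.1)
print(weights)  # strictly decreasing sequence
print(slope_penalty([0.0, -2.0, 1.0, 0.0, 0.5], weights))
```

Because the weights decrease with rank, larger coefficients are penalized more heavily per unit than smaller ones, which distinguishes this penalty from the Lasso's single constant weight.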
-
Phase Retrieval via Wirtinger Flow: Theory and Algorithms
Authors:
Emmanuel Candes,
Xiaodong Li,
Mahdi Soltanolkotabi
Abstract:
We study the problem of recovering the phase from magnitude measurements; specifically, we wish to reconstruct a complex-valued signal x of C^n about which we have phaseless samples of the form y_r = |< a_r,x >|^2, r = 1,2,...,m (knowledge of the phase of these samples would yield a linear system). This paper develops a non-convex formulation of the phase retrieval problem as well as a concrete solution algorithm. In a nutshell, this algorithm starts with a careful initialization obtained by means of a spectral method, and then refines this initial estimate by iteratively applying novel update rules, which have low computational complexity, much like in a gradient descent scheme. The main contribution is that this algorithm is shown to rigorously allow the exact retrieval of phase information from a nearly minimal number of random measurements. Indeed, the sequence of successive iterates provably converges to the solution at a geometric rate so that the proposed scheme is efficient both in terms of computational and data resources. In theory, a variation on this scheme leads to a near-linear time algorithm for a physically realizable model based on coded diffraction patterns. We illustrate the effectiveness of our methods with various experiments on image data. Underlying our analysis are insights for the analysis of non-convex optimization schemes that may have implications for computational problems beyond phase retrieval.
Submitted 24 November, 2015; v1 submitted 3 July, 2014;
originally announced July 2014.
-
Phase Retrieval from Coded Diffraction Patterns
Authors:
Emmanuel Candes,
Xiaodong Li,
Mahdi Soltanolkotabi
Abstract:
This paper considers the question of recovering the phase of an object from intensity-only measurements, a problem which naturally appears in X-ray crystallography and related disciplines. We study a physically realistic setup where one can modulate the signal of interest and then collect the intensity of its diffraction pattern, each modulation thereby producing a sort of coded diffraction pattern. We show that PhaseLift, a recent convex programming technique, recovers the phase information exactly from a number of random modulations, which is polylogarithmic in the number of unknowns. Numerical experiments with noiseless and noisy data complement our theoretical analysis and illustrate our approach.
Submitted 5 November, 2013; v1 submitted 11 October, 2013;
originally announced October 2013.
-
Hyperspectral fluorescence microscopy based on Compressive Sampling
Authors:
Makhlad Chahid,
Jerome Bobin,
Hamed Shams Mousavi,
Emmanuel Candes,
Maxime Dahan,
Vincent Studer
Abstract:
The mathematical theory of compressed sensing (CS) asserts that one can acquire signals from measurements whose rate is much lower than the total bandwidth. Whereas the CS theory is now well developed, challenges concerning hardware implementations of CS-based acquisition devices, especially in optics, have only started being addressed. This paper presents an implementation of compressive sensing in fluorescence microscopy and its applications to biomedical imaging. Our CS microscope combines a dynamic structured wide-field illumination and a fast and sensitive single-point fluorescence detection to enable reconstructions of images of fluorescent beads, cells, and tissues with undersampling ratios (between the number of pixels and number of measurements) up to 32. We further demonstrate a hyperspectral mode and record images with 128 spectral channels and undersampling ratios up to 64, illustrating the potential benefits of CS acquisition for higher-dimensional signals, which typically exhibit extreme redundancy. Altogether, our results emphasize the interest of CS schemes for acquisition at a significantly reduced rate and point to some remaining challenges for CS fluorescence microscopy.
Submitted 17 July, 2013;
originally announced July 2013.
-
Robust subspace clustering
Authors:
Mahdi Soltanolkotabi,
Ehsan Elhamifar,
Emmanuel J. Candès
Abstract:
Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This paper introduces an algorithm inspired by sparse subspace clustering (SSC) [In IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2009) 2790-2797] to cluster noisy data, and develops some novel theory demonstrating its correctness. In particular, the theory uses ideas from geometric functional analysis to show that the algorithm can accurately recover the underlying subspaces under minimal requirements on their orientation, and on the number of samples per subspace. Synthetic as well as real data experiments complement our theoretical study, illustrating our approach and demonstrating its effectiveness.
Submitted 23 May, 2014; v1 submitted 11 January, 2013;
originally announced January 2013.
-
Discussion: Latent variable graphical model selection via convex optimization
Authors:
Emmanuel J. Candés,
Mahdi Soltanolkotabi
Abstract:
Discussion of "Latent variable graphical model selection via convex optimization" by Venkat Chandrasekaran, Pablo A. Parrilo and Alan S. Willsky [arXiv:1008.1290].
Submitted 5 November, 2012;
originally announced November 2012.
-
Super-Resolution from Noisy Data
Authors:
Emmanuel Candes,
Carlos Fernandez-Granda
Abstract:
This paper studies the recovery of a superposition of point sources from noisy bandlimited data. In the fewest possible words, we only have information about the spectrum of an object in a low-frequency band bounded by a certain cut-off frequency and seek to obtain a higher resolution estimate by extrapolating the spectrum up to a higher frequency. We show that as long as the sources are separated by twice the inverse of the cut-off frequency, solving a simple convex program produces a stable estimate in the sense that the approximation error between the higher-resolution reconstruction and the truth is proportional to the noise level times the square of the super-resolution factor (SRF), which is the ratio between the desired high frequency and the cut-off frequency of the data.
Submitted 9 July, 2013; v1 submitted 1 November, 2012;
originally announced November 2012.
-
A single-photon sampling architecture for solid-state imaging
Authors:
Ewout van den Berg,
Emmanuel Candes,
Garry Chinn,
Craig Levin,
Peter Olcott,
Carlos Sing-Long
Abstract:
Advances in solid-state technology have enabled the development of silicon photomultiplier sensor arrays capable of sensing individual photons. Combined with high-frequency time-to-digital converters (TDCs), this technology opens up the prospect of sensors capable of recording with high accuracy both the time and location of each detected photon. Such a capability could lead to significant improvements in imaging accuracy, especially for applications operating with low photon fluxes such as LiDAR and positron emission tomography.
The demands placed on on-chip readout circuitry impose stringent trade-offs between fill factor and spatio-temporal resolution, causing many contemporary designs to severely underutilize the technology's full potential. Concentrating on the low photon flux setting, this paper leverages results from group testing and proposes an architecture for a highly efficient readout of pixels using only a small number of TDCs, thereby also reducing both cost and power consumption. The design relies on a multiplexing technique based on binary interconnection matrices. We provide optimized instances of these matrices for various sensor parameters and give explicit upper and lower bounds on the number of TDCs required to uniquely decode a given maximum number of simultaneous photon arrivals.
To illustrate the strength of the proposed architecture, we note a typical digitization result of a 120x120 photodiode sensor on a 30um x 30um pitch with a 40ps time resolution and an estimated fill factor of approximately 70%, using only 161 TDCs. The design guarantees registration and unique recovery of up to 4 simultaneous photon arrivals using a fast decoding algorithm. In a series of realistic simulations of scintillation events in clinical positron emission tomography the design was able to recover the spatio-temporal location of 98.6% of all photons that caused pixel firings.
Submitted 11 September, 2012;
originally announced September 2012.
-
Solving Quadratic Equations via PhaseLift when There Are About As Many Equations As Unknowns
Authors:
Emmanuel J. Candes,
Xiaodong Li
Abstract:
This note shows that we can recover a complex vector x in C^n exactly from on the order of n quadratic equations of the form |<a_i, x>|^2 = b_i, i = 1, ..., m, by using a semidefinite program known as PhaseLift. This improves upon earlier bounds in [3], which required the number of equations to be at least on the order of n log n. We also demonstrate optimal recovery results from noisy quadratic measurements; these results are much sharper than previously known results.
Submitted 17 September, 2012; v1 submitted 30 August, 2012;
originally announced August 2012.
-
Towards a Mathematical Theory of Super-Resolution
Authors:
Emmanuel Candes,
Carlos Fernandez-Granda
Abstract:
This paper develops a mathematical theory of super-resolution. Broadly speaking, super-resolution is the problem of recovering the fine details of an object---the high end of its spectrum---from coarse scale information only---from samples at the low end of the spectrum. Suppose we have many point sources at unknown locations in $[0,1]$ and with unknown complex-valued amplitudes. We only observe Fourier samples of this object up to a frequency cut-off $f_c$. We show that one can super-resolve these point sources with infinite precision---i.e. recover the exact locations and amplitudes---by solving a simple convex optimization problem, which can essentially be reformulated as a semidefinite program. This holds provided that the distance between sources is at least $2/f_c$. This result extends to higher dimensions and other models. In one dimension for instance, it is possible to recover a piecewise smooth function by resolving the discontinuity points with infinite precision as well. We also show that the theory and methods are robust to noise. In particular, in the discrete setting we develop some theoretical results explaining how the accuracy of the super-resolved signal is expected to degrade when both the noise level and the super-resolution factor vary.
Submitted 13 November, 2012; v1 submitted 27 March, 2012;
originally announced March 2012.
-
A geometric analysis of subspace clustering with outliers
Authors:
Mahdi Soltanolkotabi,
Emmanuel J. Candès
Abstract:
This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower-dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace clustering (SSC) [In IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009 (2009) 2790-2797. IEEE], which significantly broadens the range of problems where it is provably effective. For instance, we show that SSC can recover multiple subspaces, each of dimension comparable to the ambient dimension. We also prove that SSC can correctly cluster data points even when the subspaces of interest intersect. Further, we develop an extension of SSC that succeeds when the data set is corrupted with possibly overwhelmingly many outliers. Underlying our analysis are clear geometric insights, which may bear on other sparse recovery problems. A numerical study complements our theoretical analysis and demonstrates the effectiveness of these methods.
Submitted 30 January, 2013; v1 submitted 19 December, 2011;
originally announced December 2011.
-
On the Fundamental Limits of Adaptive Sensing
Authors:
Ery Arias-Castro,
Emmanuel J. Candes,
Mark Davenport
Abstract:
Suppose we can sequentially acquire arbitrary linear measurements of an n-dimensional vector x resulting in the linear model y = Ax + z, where z represents measurement noise. If the signal is known to be sparse, one would expect the following folk theorem to be true: choosing an adaptive strategy which cleverly selects the next row of A based on what has been previously observed should do far better than a nonadaptive strategy which sets the rows of A ahead of time, thus not trying to learn anything about the signal in between observations. This paper shows that the folk theorem is false. We prove that the advantages offered by clever adaptive strategies and sophisticated estimation procedures---no matter how intractable---over classical compressed acquisition/recovery schemes are, in general, minimal.
Submitted 13 August, 2012; v1 submitted 20 November, 2011;
originally announced November 2011.
-
PhaseLift: Exact and Stable Signal Recovery from Magnitude Measurements via Convex Programming
Authors:
Emmanuel J. Candes,
Thomas Strohmer,
Vladislav Voroninski
Abstract:
Suppose we wish to recover a signal x in C^n from m intensity measurements of the form |<x,z_i>|^2, i = 1, 2,..., m; that is, from data in which phase information is missing. We prove that if the vectors z_i are sampled independently and uniformly at random on the unit sphere, then the signal x can be recovered exactly (up to a global phase factor) by solving a convenient semidefinite program---a trace-norm minimization problem; this holds with large probability provided that m is on the order of n log n, and without any assumption about the signal whatsoever. This novel result demonstrates that in some instances, the combinatorial phase retrieval problem can be solved by convex programming techniques. Finally, we also prove that our methodology is robust vis a vis additive noise.
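As a brief illustration of the trace-norm minimization problem mentioned in this abstract, the program can be written out as follows. This is a sketch of the standard lifting formulation; the symbols $X$ (the lifted matrix variable) and $b_i$ (the observed intensities) are notation assumed here and do not appear in the abstract itself.

```latex
% Lifting: with X = x x^*, each quadratic measurement becomes linear in X.
\[
b_i = |\langle x, z_i \rangle|^2 = z_i^* X z_i = \operatorname{Tr}(z_i z_i^* X),
\qquad i = 1, \dots, m .
\]
% Trace minimization over positive semidefinite matrices consistent with the data.
\[
\begin{aligned}
\text{minimize}   \quad & \operatorname{Tr}(X) \\
\text{subject to} \quad & \operatorname{Tr}(z_i z_i^* X) = b_i, \quad i = 1, \dots, m, \\
                        & X \succeq 0 .
\end{aligned}
\]
```

When the solution is the rank-one matrix $x x^*$, the signal $x$ can be read off from its leading eigenvector, up to the global phase factor noted in the abstract.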
Submitted 21 September, 2011;
originally announced September 2011.
-
Phase Retrieval via Matrix Completion
Authors:
Emmanuel J. Candes,
Yonina Eldar,
Thomas Strohmer,
Vlad Voroninski
Abstract:
This paper develops a novel framework for phase retrieval, a problem which arises in X-ray crystallography, diffraction imaging, astronomical imaging and many other applications. Our approach combines multiple structured illuminations together with ideas from convex programming to recover the phase from intensity measurements, typically from the modulus of the diffracted wave. We demonstrate empirically that any complex-valued object can be recovered from the knowledge of the magnitude of just a few diffracted patterns by solving a simple convex optimization problem inspired by the recent literature on matrix completion. More importantly, we also demonstrate that our noise-aware algorithms are stable in the sense that the reconstruction degrades gracefully as the signal-to-noise ratio decreases. Finally, we introduce some theory showing that one can design very simple structured illumination patterns such that three diffracted figures uniquely determine the phase of the object we wish to recover.
Submitted 20 September, 2011; v1 submitted 2 September, 2011;
originally announced September 2011.
-
Simple Bounds for Recovering Low-complexity Models
Authors:
Emmanuel Candes,
Benjamin Recht
Abstract:
This note presents a unified analysis of the recovery of simple objects from random linear measurements. When the linear functionals are Gaussian, we show that an s-sparse vector in R^n can be efficiently recovered from 2s log n measurements with high probability and a rank r, n by n matrix can be efficiently recovered from r(6n-5r) measurements with high probability. For sparse vectors, this is within an additive factor of the best known nonasymptotic bounds. For low-rank matrices, this matches the best known bounds. We present a parallel analysis for block sparse vectors obtaining similarly tight bounds. In the case of sparse and block sparse signals, we additionally demonstrate that our bounds are only slightly weakened when the measurement map is a random sign matrix. Our results are based on analyzing a particular dual point which certifies optimality conditions of the respective convex programming problem. Our calculations rely only on standard large deviation inequalities and our analysis is self-contained.
Submitted 28 February, 2012; v1 submitted 7 June, 2011;
originally announced June 2011.
-
How well can we estimate a sparse vector?
Authors:
Emmanuel J. Candès,
Mark A. Davenport
Abstract:
The estimation of a sparse vector in the linear model is a fundamental problem in signal processing, statistics, and compressive sensing. This paper establishes a lower bound on the mean-squared error, which holds regardless of the sensing/design matrix being used and regardless of the estimation procedure. This lower bound very nearly matches the known upper bound one gets by taking a random projection of the sparse vector followed by an $\ell_1$ estimation procedure such as the Dantzig selector. In this sense, compressive sensing techniques cannot essentially be improved.
Submitted 1 March, 2013; v1 submitted 27 April, 2011;
originally announced April 2011.
-
A probabilistic and RIPless theory of compressed sensing
Authors:
Emmanuel J. Candes,
Yaniv Plan
Abstract:
This paper introduces a simple and very general theory of compressive sensing. In this theory, the sensing mechanism simply selects sensing vectors independently at random from a probability distribution F; it includes all models - e.g. Gaussian, frequency measurements - discussed in the literature, but also provides a framework for new measurement strategies as well. We prove that if the probability distribution F obeys a simple incoherence property and an isotropy property, one can faithfully recover approximately sparse signals from a minimal number of noisy measurements. The novelty is that our recovery results do not require the restricted isometry property (RIP) - they make use of a much weaker notion - or a random model for the signal. As an example, the paper shows that a signal with s nonzero entries can be faithfully recovered from about s log n Fourier coefficients that are contaminated with noise.
Submitted 19 November, 2010; v1 submitted 16 November, 2010;
originally announced November 2010.
-
Compressed Sensing with Coherent and Redundant Dictionaries
Authors:
Emmanuel J. Candes,
Yonina C. Eldar,
Deanna Needell,
Paige Randall
Abstract:
This article presents novel results concerning the recovery of signals from undersampled data in the common situation where such signals are not sparse in an orthonormal basis or incoherent dictionary, but in a truly redundant dictionary. This work thus bridges a gap in the literature and shows not only that compressed sensing is viable in this context, but also that accurate recovery is possible via an L1-analysis optimization problem. We introduce a condition on the measurement/sensing matrix, which is a natural generalization of the now well-known restricted isometry property, and which guarantees accurate recovery of signals that are nearly sparse in (possibly) highly overcomplete and coherent dictionaries. This condition imposes no incoherence restriction on the dictionary and our results may be the first of this kind. We discuss practical examples and the implications of our results on those applications, and complement our study by demonstrating the potential of L1-analysis for such problems.
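For readers unfamiliar with the term, the L1-analysis problem referred to in this abstract is commonly written as below. The notation is assumed here for illustration and is not taken from the abstract: $y = A f + z$ denotes the undersampled data with noise obeying $\|z\|_2 \le \varepsilon$, and $D$ is the (possibly redundant) dictionary.

```latex
% L1-analysis: penalize the analysis coefficients D^* f rather than synthesis coefficients.
\[
\hat{f} = \operatorname*{arg\,min}_{\tilde{f}} \; \| D^* \tilde{f} \|_1
\quad \text{subject to} \quad \| A \tilde{f} - y \|_2 \le \varepsilon .
\]
```

The objective acts on the signal $\tilde{f}$ directly, which is why no incoherence restriction on the dictionary enters the formulation itself.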
Submitted 4 December, 2010; v1 submitted 14 May, 2010;
originally announced May 2010.
-
Stable Principal Component Pursuit
Authors:
Zihan Zhou,
Xiaodong Li,
John Wright,
Emmanuel Candes,
Yi Ma
Abstract:
In this paper, we study the problem of recovering a low-rank matrix (the principal components) from a high-dimensional data matrix despite both small entry-wise noise and gross sparse errors. Recently, it has been shown that a convex program, named Principal Component Pursuit (PCP), can recover the low-rank matrix when the data matrix is corrupted by gross sparse errors. We further prove that the solution to a related convex program (a relaxed PCP) gives an estimate of the low-rank matrix that is simultaneously stable to small entrywise noise and robust to gross sparse errors. More precisely, our result shows that the proposed convex program recovers the low-rank matrix even though a positive fraction of its entries are arbitrarily corrupted, with an error bound proportional to the noise level. We present simulation results to support our result and demonstrate that the new convex program accurately recovers the principal components (the low-rank matrix) under quite broad conditions. To our knowledge, this is the first result that shows the classical Principal Component Analysis (PCA), optimal for small i.i.d. noise, can be made robust to gross sparse errors; or the first that shows the newly proposed PCP can be made stable to small entry-wise perturbations.
Submitted 13 January, 2010;
originally announced January 2010.
-
Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit
Authors:
Arvind Ganesh,
John Wright,
Xiaodong Li,
Emmanuel J. Candes,
Yi Ma
Abstract:
We consider the problem of recovering a low-rank matrix when some of its entries, whose locations are not known a priori, are corrupted by errors of arbitrarily large magnitude. It has recently been shown that this problem can be solved efficiently and effectively by a convex program named Principal Component Pursuit (PCP), provided that the fraction of corrupted entries and the rank of the matrix are both sufficiently small. In this paper, we extend that result to show that the same convex program, with a slightly improved weighting parameter, exactly recovers the low-rank matrix even if "almost all" of its entries are arbitrarily corrupted, provided the signs of the errors are random. We corroborate our result with simulations on randomly generated matrices and errors.
Submitted 22 January, 2010; v1 submitted 13 January, 2010;
originally announced January 2010.
-
Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements
Authors:
Emmanuel J. Candes,
Yaniv Plan
Abstract:
This paper presents several novel theoretical results regarding the recovery of a low-rank matrix from just a few measurements consisting of linear combinations of the matrix entries. We show that properly constrained nuclear-norm minimization stably recovers a low-rank matrix from a constant number of noisy measurements per degree of freedom; this seems to be the first result of this nature. Further, the recovery error from noisy data is within a constant of three targets: 1) the minimax risk, 2) an oracle error that would be available if the column space of the matrix were known, and 3) a more adaptive oracle error which would be available with the knowledge of the column space corresponding to the part of the matrix that stands above the noise. Lastly, the error bounds regarding low-rank matrices are extended to provide an error bound when the matrix has full rank with decaying singular values. The analysis in this paper is based on the restricted isometry property (RIP) introduced in [6] for vectors, and in [22] for matrices.
Submitted 2 January, 2010;
originally announced January 2010.
-
Robust Principal Component Analysis?
Authors:
Emmanuel J. Candes,
Xiaodong Li,
Yi Ma,
John Wright
Abstract:
This paper is about a curious phenomenon. Suppose we have a data matrix, which is the superposition of a low-rank component and a sparse component. Can we recover each component individually? We prove that under some suitable assumptions, it is possible to recover both the low-rank and the sparse components exactly by solving a very convenient convex program called Principal Component Pursuit; among all feasible decompositions, simply minimize a weighted combination of the nuclear norm and of the L1 norm. This suggests the possibility of a principled approach to robust principal component analysis since our methodology and results assert that one can recover the principal components of a data matrix even though a positive fraction of its entries are arbitrarily corrupted. This extends to the situation where a fraction of the entries are missing as well. We discuss an algorithm for solving this optimization problem, and present applications in the area of video surveillance, where our methodology allows for the detection of objects in a cluttered background, and in the area of face recognition, where it offers a principled way of removing shadows and specularities in images of faces.
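The convex program described in this abstract, minimizing a weighted combination of the nuclear norm and the L1 norm over decompositions M = L + S, can be sketched in a few lines of NumPy. The code below is a minimal illustration using an augmented Lagrangian (ADMM-style) iteration, not necessarily the algorithm discussed in the paper. The function names (`shrink`, `svd_shrink`, `pcp`) and the default choices of the weight `lam` and penalty `mu` are assumptions made for this example.

```python
import numpy as np


def shrink(X, tau):
    """Entrywise soft-thresholding: proximal operator of tau * L1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_shrink(X, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Approximately solve  min ||L||_* + lam * ||S||_1  s.t.  L + S = M.

    lam and mu default to commonly used heuristics (assumed here):
    lam = 1 / sqrt(max(n1, n2)),  mu = n1 * n2 / (4 * sum |M_ij|).
    """
    n1, n2 = M.shape
    lam = 1.0 / np.sqrt(max(n1, n2)) if lam is None else lam
    mu = n1 * n2 / (4.0 * np.abs(M).sum()) if mu is None else mu
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint L + S = M
    for _ in range(max_iter):
        L = svd_shrink(M - S + Y / mu, 1.0 / mu)  # update low-rank part
        S = shrink(M - L + Y / mu, lam / mu)      # update sparse part
        R = M - L - S                             # constraint residual
        Y = Y + mu * R                            # dual ascent step
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S
```

A typical use is to call `L_hat, S_hat = pcp(M)` on a data matrix `M` stored as a floating-point array, then inspect `L_hat` for the low-rank structure and `S_hat` for the gross corruptions.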
Submitted 18 December, 2009;
originally announced December 2009.
-
Accurate low-rank matrix recovery from a small number of linear measurements
Authors:
Emmanuel J. Candes,
Yaniv Plan
Abstract:
We consider the problem of recovering a low-rank matrix M from a small number of random linear measurements. A popular and useful example of this problem is matrix completion, in which the measurements reveal the values of a subset of the entries, and we wish to fill in the missing entries (this is the famous Netflix problem). When M is believed to have low rank, one would ideally try to recover M by finding the minimum-rank matrix that is consistent with the data; this is, however, problematic since this is a nonconvex problem that is, generally, intractable.
Nuclear-norm minimization has been proposed as a tractable approach, and past papers have delved into the theoretical properties of nuclear-norm minimization algorithms, establishing conditions under which minimizing the nuclear norm yields the minimum rank solution. We review this spring of emerging literature and extend and refine previous theoretical results. Our focus is on providing error bounds when M is well approximated by a low-rank matrix, and when the measurements are corrupted with noise. We show that for a certain class of random linear measurements, nuclear-norm minimization provides stable recovery from a number of samples nearly at the theoretical lower limit, and enjoys order-optimal error bounds (with high probability).
Submitted 2 October, 2009;
originally announced October 2009.