Search | arXiv e-print repository

InnerThoughts: Disentangling Representations and Predictions in Large Language Models

Authors: Didier Chételat, Joseph Cotnareanu, Rylee Thompson, Yingxue Zhang, Mark Coates

Abstract: Large language models (LLMs) contain substantial factual knowledge which is commonly elicited by multiple-choice question-answering prompts. Internally, such models process the prompt through multiple transformer layers, building varying representations of the problem within their hidden states. Ultimately, however, only the hidden state corresponding to the final layer and token position is used to predict the answer label. In this work, we propose instead to learn a small separate neural network predictor module on a collection of training questions, that takes the hidden states from all the layers at the last temporal position as input and outputs predictions. In effect, such a framework disentangles the representational abilities of LLMs from their predictive abilities. On a collection of hard benchmarks, our method achieves considerable improvements in performance, sometimes comparable to supervised fine-tuning procedures, but at a fraction of the computational cost.

Submitted 29 January, 2025; originally announced January 2025.

Comments: Accepted at AISTATS 2025

arXiv:2412.13292 [pdf, other]

Refining Answer Distributions for Improved Large Language Model Reasoning

Authors: Soumyasundar Pal, Didier Chételat, Yingxue Zhang, Mark Coates

Abstract: Large Language Models (LLMs) have exhibited an impressive capability to perform reasoning tasks, especially if they are encouraged to generate a sequence of intermediate steps. Reasoning performance can be improved by suitably combining multiple LLM responses, generated either in parallel in a single query, or via sequential interactions with LLMs throughout the reasoning process. Existing strategies for combination, such as self-consistency and progressive-hint-prompting, make inefficient usage of the LLM responses. We present Refined Answer Distributions, a novel and principled algorithmic framework to enhance the reasoning capabilities of LLMs. Our approach can be viewed as an iterative sampling strategy for forming a Monte Carlo approximation of an underlying distribution of answers, with the goal of identifying the mode -- the most likely answer. Empirical evaluation on several reasoning benchmarks demonstrates the superiority of the proposed approach.

Submitted 9 April, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

arXiv:2411.00843 [pdf, other]

The Graph's Apprentice: Teaching an LLM Low Level Knowledge for Circuit Quality Estimation

Authors: Reza Moravej, Saurabh Bodhe, Zhanguang Zhang, Didier Chetelat, Dimitrios Tsaras, Yingxue Zhang, Hui-Ling Zhen, Jianye Hao, Mingxuan Yuan

Abstract: Logic synthesis is a crucial phase in the circuit design process, responsible for transforming hardware description language (HDL) designs into optimized netlists. However, traditional logic synthesis methods are computationally intensive, restricting their iterative use in refining chip designs. Recent advancements in large language models (LLMs), particularly those fine-tuned on programming languages, present a promising alternative. This work proposes augmenting LLMs with predictor networks trained to estimate circuit quality directly from HDL code. To enhance performance, the model is regularized using embeddings from graph neural networks (GNNs) trained on Look-Up Table (LUT) graphs, thereby incorporating lower-level circuit insights. The proposed method demonstrates superior performance compared to existing graph-based RTL-level estimation techniques on the established benchmark OpenABCD, while providing instant feedback on HDL code quality.

Submitted 14 February, 2025; v1 submitted 30 October, 2024; originally announced November 2024.

arXiv:2405.11024 [pdf, other]

GraSS: Combining Graph Neural Networks with Expert Knowledge for SAT Solver Selection

Authors: Zhanguang Zhang, Didier Chetelat, Joseph Cotnareanu, Amur Ghose, Wenyi Xiao, Hui-Ling Zhen, Yingxue Zhang, Jianye Hao, Mark Coates, Mingxuan Yuan

Abstract: Boolean satisfiability (SAT) problems are routinely solved by SAT solvers in real-life applications, yet solving time can vary drastically between solvers for the same instance. This has motivated research into machine learning models that can predict, for a given SAT instance, which solver to select among several options. Existing SAT solver selection methods all rely on some hand-picked instance features, which are costly to compute and ignore the structural information in SAT graphs. In this paper we present GraSS, a novel approach for automatic SAT solver selection based on tripartite graph representations of instances and a heterogeneous graph neural network (GNN) model. While GNNs have been previously adopted in other SAT-related tasks, they do not incorporate any domain-specific knowledge and ignore the runtime variation introduced by different clause orders. We enrich the graph representation with domain-specific decisions, such as novel node feature design, positional encodings for clauses in the graph, a GNN architecture tailored to our tripartite graphs and a runtime-sensitive loss function. Through extensive experiments, we demonstrate that this combination of raw representations and domain-specific choices leads to improvements in runtime for a pool of seven state-of-the-art solvers on both an industrial circuit design benchmark, and on instances from the 20-year Anniversary Track of the 2022 SAT Competition.

Submitted 17 May, 2024; originally announced May 2024.

Comments: Accepted by KDD 2024

arXiv:2310.10603 [pdf, other]

Exploring the Power of Graph Neural Networks in Solving Linear Optimization Problems

Authors: Chendi Qian, Didier Chételat, Christopher Morris

Abstract: Recently, machine learning, particularly message-passing graph neural networks (MPNNs), has gained traction in enhancing exact optimization algorithms. For example, MPNNs speed up solving mixed-integer optimization problems by imitating computationally intensive heuristics like strong branching, which entails solving multiple linear optimization problems (LPs). Despite the empirical success, the reasons behind MPNNs' effectiveness in emulating linear optimization remain largely unclear. Here, we show that MPNNs can simulate standard interior-point methods for LPs, explaining their practical success. Furthermore, we highlight how MPNNs can serve as a lightweight proxy for solving LPs, adapting to a given problem instance distribution. Empirically, we show that MPNNs solve LP relaxations of standard combinatorial optimization problems close to optimality, often surpassing conventional solvers and competing approaches in solving time.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2210.16934 [pdf, other]

Learning to Compare Nodes in Branch and Bound with Graph Neural Networks

Authors: Abdel Ghani Labassi, Didier Chételat, Andrea Lodi

Abstract: Branch-and-bound approaches in integer programming require ordering portions of the space to explore next, a problem known as node comparison. We propose a new siamese graph neural network model to tackle this problem, where the nodes are represented as bipartite graphs with attributes. Similar to prior work, we train our model to imitate a diving oracle that plunges towards the optimal solution. We evaluate our method by solving the instances in a plain framework where the nodes are explored according to their rank. On three NP-hard benchmarks chosen to be particularly primal-difficult, our approach leads to faster solving and smaller branch-and-bound trees than the default ranking function of the open-source solver SCIP, as well as competing machine learning methods. Moreover, these results generalize to instances larger than used for training. Code for reproducing the experiments can be found at https://github.com/ds4dm/learn2comparenodes.

Submitted 30 October, 2022; originally announced October 2022.

Comments: 7 pages, 3 figures, 2 tables

arXiv:2206.14987 [pdf, other]

Lookback for Learning to Branch

Authors: Prateek Gupta, Elias B. Khalil, Didier Chetélat, Maxime Gasse, Yoshua Bengio, Andrea Lodi, M. Pawan Kumar

Abstract: The expressive and computationally inexpensive bipartite Graph Neural Networks (GNN) have been shown to be an important component of deep learning based Mixed-Integer Linear Program (MILP) solvers. Recent works have demonstrated the effectiveness of such GNNs in replacing the branching (variable selection) heuristic in branch-and-bound (B&B) solvers. These GNNs are trained, offline and on a collection of MILPs, to imitate a very good but computationally expensive branching heuristic, strong branching. Given that B&B results in a tree of sub-MILPs, we ask (a) whether there are strong dependencies exhibited by the target heuristic among the neighboring nodes of the B&B tree, and (b) if so, whether we can incorporate them in our training procedure. Specifically, we find that with the strong branching heuristic, a child node's best choice was often the parent's second-best choice. We call this the "lookback" phenomenon. Surprisingly, the typical branching GNN of Gasse et al. (2019) often misses this simple "answer". To imitate the target behavior more closely by incorporating the lookback phenomenon in GNNs, we propose two methods: (a) target smoothing for the standard cross-entropy loss function, and (b) adding a Parent-as-Target (PAT) Lookback regularizer term. Finally, we propose a model selection framework to incorporate harder-to-formulate objectives such as solving time in the final models. Through extensive experimentation on standard benchmark instances, we show that our proposal results in up to 22% decrease in the size of the B&B tree and up to 15% improvement in the solving times.

Submitted 29 December, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

Comments: Published in Transactions on Machine Learning Research (TMLR)

arXiv:2205.11107 [pdf, other]

Learning to branch with Tree MDPs

Authors: Lara Scavuzzo, Feng Yang Chen, Didier Chételat, Maxime Gasse, Andrea Lodi, Neil Yorke-Smith, Karen Aardal

Abstract: State-of-the-art Mixed Integer Linear Program (MILP) solvers combine systematic tree search with a plethora of hard-coded heuristics, such as the branching rule. The idea of learning branching rules from data has received increasing attention recently, and promising results have been obtained by learning fast approximations of the strong branching expert. In this work, we instead propose to learn branching rules from scratch via Reinforcement Learning (RL). We revisit the work of Etheve et al. (2020) and propose tree Markov Decision Processes, or tree MDPs, a generalization of temporal MDPs that provides a more suitable framework for learning to branch. We derive a tree policy gradient theorem, which exhibits a better credit assignment compared to its temporal counterpart. We demonstrate through computational experiments that tree MDPs improve the learning convergence, and offer a promising framework for tackling the learning-to-branch problem in MILPs.

Submitted 13 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: 10 pages, 2 figures, plus supplementary material

arXiv:2204.09122 [pdf, other]

Continuous cutting plane algorithms in integer programming

Authors: Didier Chételat, Andrea Lodi

Abstract: Cutting planes for mixed-integer linear programs (MILPs) are typically computed in rounds by iteratively solving optimization problems, the so-called separation. Instead, we reframe the problem of finding good cutting planes as a continuous optimization problem over weights parametrizing families of valid inequalities. This problem can also be interpreted as optimizing a neural network to solve an optimization problem over subadditive functions, which we call the subadditive primal problem of the MILP. To do so, we propose a concrete two-step algorithm, and demonstrate empirical gains when optimizing generalized Gomory mixed-integer inequalities over various classes of MILPs. Code for reproducing the experiments can be found at https://github.com/dchetelat/subadditive.

Submitted 6 July, 2023; v1 submitted 19 April, 2022; originally announced April 2022.

Comments: To be published in Operations Research Letters

MSC Class: 90C11

arXiv:2203.02433 [pdf, ps, other]

The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights

Authors: Maxime Gasse, Quentin Cappart, Jonas Charfreitag, Laurent Charlin, Didier Chételat, Antonia Chmiela, Justin Dumouchelle, Ambros Gleixner, Aleksandr M. Kazachkov, Elias Khalil, Pawel Lichocki, Andrea Lodi, Miles Lubin, Chris J. Maddison, Christopher Morris, Dimitri J. Papageorgiou, Augustin Parjadis, Sebastian Pokutta, Antoine Prouvost, Lara Scavuzzo, Giulia Zarpellon, Linxin Yang, Sha Lai, Akang Wang, Xiaodong Luo, et al. (16 additional authors not shown)

Abstract: Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either directly as solvers or by enhancing exact solvers. Based on this context, the ML4CO aims at improving state-of-the-art combinatorial optimization solvers by replacing key heuristic components. The competition featured three challenging tasks: finding the best feasible solution, producing the tightest optimality certificate, and giving an appropriate solver configuration. Three realistic datasets were considered: balanced item placement, workload apportionment, and maritime inventory routing. This last dataset was kept anonymous for the contestants.

Submitted 17 March, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: NeurIPS 2021 competition. arXiv admin note: text overlap with arXiv:2112.12251 by other authors

arXiv:2104.02828 [pdf, ps, other]

Ecole: A Library for Learning Inside MILP Solvers

Authors: Antoine Prouvost, Justin Dumouchelle, Maxime Gasse, Didier Chételat, Andrea Lodi

Abstract: In this paper we describe Ecole (Extensible Combinatorial Optimization Learning Environments), a library to facilitate integration of machine learning in combinatorial optimization solvers. It exposes sequential decision making that must be performed in the process of solving as Markov decision processes. This means that, rather than trying to predict solutions to combinatorial optimization problems directly, Ecole allows machine learning to work in cooperation with a state-of-the-art mixed-integer linear programming solver that acts as a controllable algorithm. Ecole provides a collection of computationally efficient, ready-to-use learning environments, which are also easy to extend to define novel training tasks. Documentation and code can be found at https://www.ecole.ai.

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2102.09544 [pdf, ps, other]

Combinatorial optimization and reasoning with graph neural networks

Authors: Quentin Cappart, Didier Chételat, Elias Khalil, Andrea Lodi, Christopher Morris, Petar Veličković

Abstract: Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning, especially graph neural networks (GNNs), as a key building block for combinatorial tasks, either directly as solvers or by enhancing exact solvers. The inductive bias of GNNs effectively encodes combinatorial and relational input due to their invariance to permutations and awareness of input sparsity. This paper presents a conceptual review of recent key advancements in this emerging field, aiming at optimization and machine learning researchers.

Submitted 23 September, 2022; v1 submitted 18 February, 2021; originally announced February 2021.

Journal ref: Journal of Machine Learning Research, 24(130):1-61, 2023

arXiv:2011.06069 [pdf, other]

Ecole: A Gym-like Library for Machine Learning in Combinatorial Optimization Solvers

Authors: Antoine Prouvost, Justin Dumouchelle, Lara Scavuzzo, Maxime Gasse, Didier Chételat, Andrea Lodi

Abstract: We present Ecole, a new library to simplify machine learning research for combinatorial optimization. Ecole exposes several key decision tasks arising in general-purpose combinatorial optimization solvers as control problems over Markov decision processes. Its interface mimics the popular OpenAI Gym library and is both extensible and intuitive to use. We aim at making this library a standardized platform that will lower the bar of entry and accelerate innovation in the field. Documentation and code can be found at https://www.ecole.ai.

Submitted 24 November, 2020; v1 submitted 11 November, 2020; originally announced November 2020.

Comments: Published at the 1st Workshop on Learning Meets Combinatorial Algorithms @ NeurIPS 2020, Vancouver, Canada

arXiv:2009.01358 [pdf, ps, other]

Change Point Detection by Cross-Entropy Maximization

Authors: Aurélien Serre, Didier Chételat, Andrea Lodi

Abstract: Many offline unsupervised change point detection algorithms rely on minimizing a penalized sum of segment-wise costs. We extend this framework by proposing to minimize a sum of discrepancies between segments. In particular, we propose to select the change points so as to maximize the cross-entropy between successive segments, balanced by a penalty for introducing new change points. We propose a dynamic programming algorithm to solve this problem and analyze its complexity. Experiments on two challenging datasets demonstrate the advantages of our method compared to three state-of-the-art approaches.

Submitted 2 September, 2020; originally announced September 2020.

Comments: Preprint

arXiv:1906.01629 [pdf, other]

Exact Combinatorial Optimization with Graph Convolutional Neural Networks

Authors: Maxime Gasse, Didier Chételat, Nicola Ferroni, Laurent Charlin, Andrea Lodi

Abstract: Combinatorial optimization problems are typically tackled by the branch-and-bound paradigm. We propose a new graph convolutional neural network model for learning branch-and-bound variable selection policies, which leverages the natural variable-constraint bipartite graph representation of mixed-integer linear programs. We train our model via imitation learning from the strong branching expert rule, and demonstrate on a series of hard problems that our approach produces policies that improve upon state-of-the-art machine-learning methods for branching and generalize to instances significantly larger than seen during training. Moreover, we improve for the first time over expert-designed branching rules implemented in a state-of-the-art solver on large problems. Code for reproducing all the experiments can be found at https://github.com/ds4dm/learn2branch.

Submitted 30 October, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: Accepted paper at the NeurIPS 2019 conference

arXiv:1705.03510 [pdf, other]

The middle-scale asymptotics of Wishart matrices

Authors: Didier Chételat, Martin T. Wells

Abstract: We study the behavior of a real $p$-dimensional Wishart random matrix with $n$ degrees of freedom when $n,p\rightarrow\infty$ but $p/n\rightarrow 0$. We establish the existence of phase transitions when $p$ grows at the order $n^{(K+1)/(K+3)}$ for every $K\in\mathbb{N}$, and derive expressions for approximating densities between every two phase transitions. To do this, we make use of a novel tool we call the G-transform of a distribution, which is closely related to the characteristic function. We also derive an extension of the $t$-distribution to the real symmetric matrices, which naturally appears as the conjugate distribution to the Wishart under a G-transformation, and show its empirical spectral distribution obeys a semicircle law when $p/n\rightarrow 0$. Finally, we discuss how the phase transitions of the Wishart distribution might originate from changes in rates of convergence of symmetric $t$ statistics.

Submitted 9 May, 2017; originally announced May 2017.

MSC Class: 60B20; 60B10 (Primary); 60E10 (Secondary)

arXiv:1510.08873 [pdf, other]

On the Domain of Attraction of a Tracy-Widom Law with Applications to Testing Multiple Largest Roots

Authors: Didier Chételat, Rajendran Narayanan, Martin T. Wells

Abstract: The greatest root statistic arises as the test statistic in several multivariate analysis settings. Suppose there is a global null hypothesis that consists of different independent sub-null hypotheses, and suppose the greatest root statistic is used as the test statistic for each sub-null hypothesis. Such problems may arise when conducting a batch MANOVA or several batches of pairwise testing for equality of covariance matrices. Using the union-intersection testing approach and by letting the problem dimension tend to infinity faster than the number of batches, we show that the global null can be tested using a Gumbel distribution to approximate the critical values. Although the theoretical results are asymptotic, simulation studies indicate that the approximations are very good even for small to moderate dimensions. The results are general and can be applied in any setting where the greatest root statistic is used, not just for the two methods we use for illustrative purposes.

Submitted 29 October, 2015; originally announced October 2015.

MSC Class: 60G70; 62E20; 62H15

arXiv:1509.02451 [pdf, other]

Improved Second Order Estimation in the Singular Multivariate Normal Model

Authors: Didier Chételat, Martin T. Wells

Abstract: We consider the problem of estimating covariance and precision matrices, and their associated discriminant coefficients, from normal data when the rank of the covariance matrix is strictly smaller than its dimension and the available sample size. Using unbiased risk estimation, we construct novel estimators by minimizing upper bounds on the difference in risk over several classes. Our proposed estimators are empirically demonstrated to offer substantial improvement over classical approaches.

Submitted 8 September, 2015; originally announced September 2015.

Comments: 34 pages

MSC Class: Primary 62C15; secondary 62F10; 62H12

arXiv:1410.5014 [pdf, other]

Optimal Two-Step Prediction in Regression

Authors: Didier Chételat, Johannes Lederer, Joseph Salmon

Abstract: High-dimensional prediction typically comprises two steps: variable selection and subsequent least-squares refitting on the selected variables. However, the standard variable selection procedures, such as the lasso, hinge on tuning parameters that need to be calibrated. Cross-validation, the most popular calibration scheme, is computationally costly and lacks finite sample guarantees. In this paper, we introduce an alternative scheme, easy to implement and both computationally and theoretically efficient.

Submitted 5 June, 2017; v1 submitted 18 October, 2014; originally announced October 2014.

arXiv:1408.6440 [pdf, other]

Noise Estimation in the Spiked Covariance Model

Authors: Didier Chételat, Martin T. Wells

Abstract: The problem of estimating a spiked covariance matrix in high dimensions under Frobenius loss, and the parallel problem of estimating the noise in spiked PCA, are investigated. We propose an estimator of the noise parameter by minimizing an unbiased estimator of the invariant Frobenius risk using calculus of variations. The resulting estimator is shown, using random matrix theory, to be strongly consistent and essentially asymptotically normal and minimax for the noise estimation problem. We apply the construction to obtain a robust spiked covariance matrix estimator with consistent eigenvalues.

Submitted 27 August, 2014; originally announced August 2014.

arXiv:1302.6746 [pdf, ps, other]

doi 10.1214/12-AOS1067

Improved multivariate normal mean estimation with unknown covariance when p is greater than n

Authors: Didier Chételat, Martin T. Wells

Abstract: We consider the problem of estimating the mean vector of a p-variate normal $(θ,Σ)$ distribution under invariant quadratic loss, $(δ-θ)'Σ^{-1}(δ-θ)$, when the covariance is unknown. We propose a new class of estimators that dominate the usual estimator $δ^0(X)=X$. The proposed estimators of $θ$ depend upon X and an independent Wishart matrix S with n degrees of freedom; however, S is singular almost surely when p>n. The proof of domination involves the development of some new unbiased estimators of risk for the p>n setting. We also find some relationships between the amount of domination and the magnitudes of n and p.

Submitted 27 February, 2013; originally announced February 2013.

Comments: Published at http://dx.doi.org/10.1214/12-AOS1067 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1067

Journal ref: Annals of Statistics 2012, Vol. 40, No. 6, 3137-3160

Showing 1–21 of 21 results for author: Chételat, D