Search | arXiv e-print repository

Heavy Lasso: sparse penalized regression under heavy-tailed noise via data-augmented soft-thresholding

Abstract: High-dimensional linear regression is a fundamental tool in modern statistics, particularly when the number of predictors exceeds the sample size. The classical Lasso, which relies on the squared loss, performs well under Gaussian noise assumptions but often deteriorates in the presence of heavy-tailed errors or outliers commonly encountered in real data applications such as genomics, finance, and signal processing. To address these challenges, we propose a novel robust regression method, termed Heavy Lasso, which incorporates a loss function inspired by the Student's t-distribution within a Lasso penalization framework. This loss retains the desirable quadratic behavior for small residuals while adaptively downweighting large deviations, thus enhancing robustness to heavy-tailed noise and outliers. Heavy Lasso is computationally efficient, leveraging a data augmentation scheme and a soft-thresholding algorithm that integrate seamlessly with classical Lasso solvers. Theoretically, we establish non-asymptotic bounds under both $\ell_1$ and $\ell_2$ norms by employing the framework of localized convexity, showing that the Heavy Lasso estimator achieves rates comparable to those of the Huber loss. Extensive numerical studies demonstrate Heavy Lasso's superior performance over classical Lasso and other robust variants, highlighting its effectiveness in challenging noisy settings. Our method is implemented in the R package heavylasso, available on GitHub.

Submitted 9 June, 2025; originally announced June 2025.

arXiv:2505.22211 [pdf, ps, other]

Handling bounded response in high dimensions: a Horseshoe prior Bayesian Beta regression approach

Authors: The Tien Mai

Abstract: Bounded continuous responses -- such as proportions -- arise frequently in diverse scientific fields including climatology, biostatistics, and finance. Beta regression is a widely adopted framework for modeling such data, due to the flexibility of the Beta distribution over the unit interval. While Bayesian extensions of Beta regression have shown promise, existing methods are limited to low-dimensional settings and lack theoretical guarantees. In this work, we propose a novel Bayesian approach for high-dimensional sparse Beta regression that employs a tempered posterior. Our method incorporates the Horseshoe prior for effective shrinkage and variable selection. Most notably, we propose a novel Gibbs sampling algorithm using Pólya-Gamma augmentation for efficient inference in the Beta regression model. We also provide the first theoretical results establishing posterior consistency and convergence rates for Bayesian Beta regression. Through extensive simulation studies in both low- and high-dimensional scenarios, we demonstrate that our approach outperforms existing alternatives, offering improved estimation accuracy and model interpretability. Our method is implemented in the R package betaregbayes, available on GitHub.

Submitted 28 May, 2025; originally announced May 2025.

arXiv:2505.08288 [pdf, ps, other]

High-dimensional Bayesian Tobit regression for censored response with Horseshoe prior

Authors: The Tien Mai

Abstract: Censored response variables--where outcomes are only partially observed due to known bounds--arise in numerous scientific domains and present serious challenges for regression analysis. The Tobit model, a classical solution for handling left-censoring, has been widely used in economics and beyond. However, with the increasing prevalence of high-dimensional data, where the number of covariates exceeds the sample size, traditional Tobit methods become inadequate. While frequentist approaches for high-dimensional Tobit regression have recently been developed, notably through Lasso-based estimators, the Bayesian literature remains sparse and lacks theoretical guarantees. In this work, we propose a novel Bayesian framework for high-dimensional Tobit regression that addresses both censoring and sparsity. Our method leverages the Horseshoe prior to induce shrinkage and employs a data augmentation strategy to facilitate efficient posterior computation via Gibbs sampling. We establish posterior consistency and derive concentration rates under sparsity, providing the first theoretical results for Bayesian Tobit models in high dimensions. Numerical experiments show that our approach compares favorably with the recent Lasso-Tobit method. Our method is implemented in the R package tobitbayes, which can be found on GitHub.

Submitted 13 May, 2025; originally announced May 2025.

arXiv:2505.04341 [pdf, ps, other]

PAC-Bayesian risk bounds for fully connected deep neural network with Gaussian priors

Authors: The Tien Mai

Abstract: Deep neural networks (DNNs) have emerged as a powerful methodology with significant practical successes in fields such as computer vision and natural language processing. Recent works have demonstrated that sparsely connected DNNs with carefully designed architectures can achieve minimax estimation rates under classical smoothness assumptions. However, subsequent studies revealed that simple fully connected DNNs can achieve comparable convergence rates, challenging the necessity of sparsity. Theoretical advances in Bayesian neural networks (BNNs) have been more fragmented. Much of this work has concentrated on sparse networks, leaving the theoretical properties of fully connected BNNs underexplored. In this paper, we address this gap by investigating fully connected Bayesian DNNs with Gaussian priors using PAC-Bayes bounds. We establish upper bounds on the prediction risk for a probabilistic deep neural network method, showing that these bounds match (up to logarithmic factors) the minimax-optimal rates in Besov space, for both nonparametric regression and binary classification with logistic loss. Importantly, our results hold for a broad class of practical activation functions that are Lipschitz continuous.

Submitted 7 May, 2025; originally announced May 2025.

arXiv:2504.10171 [pdf, ps, other]

Kullback-Leibler excess risk bounds for exponential weighted aggregation in Generalized linear models

Authors: The Tien Mai

Abstract: Aggregation methods have emerged as a powerful and flexible framework in statistical learning, providing unified solutions across diverse problems such as regression, classification, and density estimation. In the context of generalized linear models (GLMs), where responses follow exponential family distributions, aggregation offers an attractive alternative to classical parametric modeling. This paper investigates the problem of sparse aggregation in GLMs, aiming to approximate the true parameter vector by a sparse linear combination of predictors. We prove that an exponential weighted aggregation scheme yields a sharp oracle inequality for the Kullback-Leibler risk with leading constant equal to one, while also attaining the minimax-optimal rate of aggregation. These results are further enhanced by establishing high-probability bounds on the excess risk.

Submitted 14 April, 2025; originally announced April 2025.

arXiv:2504.09509 [pdf, ps, other]

Optimal sparse phase retrieval via a quasi-Bayesian approach

Authors: The Tien Mai

Abstract: This paper addresses the problem of sparse phase retrieval, a fundamental inverse problem in applied mathematics, physics, and engineering, where a signal needs to be reconstructed using only the magnitude of its transformation while phase information remains inaccessible. Leveraging the inherent sparsity of many real-world signals, we introduce a novel sparse quasi-Bayesian approach and provide the first theoretical guarantees for such an approach. Specifically, we employ a scaled Student distribution as a continuous shrinkage prior to enforce sparsity and analyze the method using the PAC-Bayesian inequality framework. Our results establish that the proposed Bayesian estimator achieves minimax-optimal convergence rates under sub-exponential noise, matching those of state-of-the-art frequentist methods. To ensure computational feasibility, we develop an efficient Langevin Monte Carlo sampling algorithm. Through numerical experiments, we demonstrate that our method performs comparably to existing frequentist techniques, highlighting its potential as a principled alternative for sparse phase retrieval in noisy settings.

Submitted 13 April, 2025; originally announced April 2025.

arXiv:2410.15381 [pdf, ps, other]

High-dimensional prediction for count response via sparse exponential weights

Authors: The Tien Mai

Abstract: Count data is prevalent in various fields like ecology, medical research, and genomics. In high-dimensional settings, where the number of features exceeds the sample size, feature selection becomes essential. While frequentist methods like Lasso have advanced in handling high-dimensional count data, Bayesian approaches remain under-explored, with no theoretical results on prediction performance. This paper introduces a novel probabilistic machine learning framework for high-dimensional count data prediction. We propose a pseudo-Bayesian method that integrates a scaled Student prior to promote sparsity and uses an exponential weight aggregation procedure. A key contribution is a novel risk measure tailored to count data prediction, with theoretical guarantees for prediction risk using PAC-Bayesian bounds. Our results include non-asymptotic oracle inequalities, demonstrating rate-optimal prediction error without prior knowledge of sparsity. We implement this approach efficiently using the Langevin Monte Carlo method. Simulations and a real data application highlight the strong performance of our method compared to the Lasso in various settings.

Submitted 20 October, 2024; originally announced October 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2409.01687

arXiv:2409.01687 [pdf, ps, other]

A sparse PAC-Bayesian approach for high-dimensional quantile prediction

Authors: The Tien Mai

Abstract: Quantile regression, a robust method for estimating conditional quantiles, has advanced significantly in fields such as econometrics, statistics, and machine learning. In high-dimensional settings, where the number of covariates exceeds the sample size, penalized methods like the Lasso have been developed to address sparsity challenges. Bayesian methods, initially connected to quantile regression via the asymmetric Laplace likelihood, have also evolved, though issues with posterior variance have led to new approaches, including pseudo/score likelihoods. This paper presents a novel probabilistic machine learning approach for high-dimensional quantile prediction. It uses a pseudo-Bayesian framework with a scaled Student-t prior and Langevin Monte Carlo for efficient computation. The method comes with strong theoretical guarantees: PAC-Bayes bounds establish non-asymptotic oracle inequalities, showing minimax-optimal prediction error and adaptability to unknown sparsity. Its effectiveness is validated through simulations and real-world data, where it performs competitively against established frequentist and Bayesian techniques.

Submitted 3 September, 2024; originally announced September 2024.

arXiv:2408.08675 [pdf, ps, other]

Misclassification excess risk bounds for PAC-Bayesian classification via convexified loss

Authors: The Tien Mai

Abstract: PAC-Bayesian bounds have proven to be a valuable tool for deriving generalization bounds and for designing new learning algorithms in machine learning. However, they typically focus on providing generalization bounds with respect to a chosen loss function. In classification tasks, due to the non-convex nature of the 0-1 loss, a convex surrogate loss is often used, and thus current PAC-Bayesian bounds are primarily specified for this convex surrogate. This work shifts its focus to providing misclassification excess risk bounds for PAC-Bayesian classification when using a convex surrogate loss. Our key ingredient here is to leverage PAC-Bayesian relative bounds in expectation rather than relying on PAC-Bayesian bounds in probability. We demonstrate our approach in several important applications.

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2406.14269 [pdf, ps, other]

doi 10.1002/sta4.70008

Concentration of a sparse Bayesian model with Horseshoe prior in estimating high-dimensional precision matrix

Authors: The Tien Mai

Abstract: Precision matrices are crucial in many fields such as social networks, neuroscience, and economics, representing the edge structure of Gaussian graphical models (GGMs), where a zero in an off-diagonal position of the precision matrix indicates conditional independence between nodes. In high-dimensional settings where the dimension of the precision matrix $p$ exceeds the sample size $n$ and the matrix is sparse, methods like graphical Lasso, graphical SCAD, and CLIME are popular for estimating GGMs. While frequentist methods are well-studied, Bayesian approaches for (unstructured) sparse precision matrices are less explored. The graphical horseshoe estimate by Li et al. (2019), applying the global-local horseshoe prior, shows superior empirical performance, but theoretical work for sparse precision matrix estimation using shrinkage priors is limited. This paper addresses these gaps by providing concentration results for the tempered posterior with the fully specified horseshoe prior in high-dimensional settings. Moreover, we provide novel theoretical results for model misspecification, offering a general oracle inequality for the posterior. A concise set of simulations is performed to validate our theoretical findings.

Submitted 13 September, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Journal ref: Stat 2024

arXiv:2405.19016 [pdf, ps, other]

Adaptive posterior concentration rates for sparse high-dimensional linear regression with random design and unknown error variance

Authors: The Tien Mai

Abstract: This paper investigates sparse high-dimensional linear regression, particularly examining the properties of the posterior under conditions of random design and unknown error variance. We provide consistency results for the posterior and analyze its concentration rates, demonstrating adaptiveness to the unknown sparsity level of the regression coefficient vector. Furthermore, we extend our investigation to establish concentration outcomes for parameter estimation using specific distance measures. These findings are in line with recent discoveries in frequentist studies. Additionally, by employing techniques to address model misspecification through a fractional posterior, we broaden our analysis through oracle inequalities to encompass the critical aspect of model misspecification for the regular posterior. Our novel findings are demonstrated using two different types of sparsity priors: a shrinkage prior and a spike-and-slab prior.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.01304 [pdf, ps, other]

doi 10.1007/s10994-024-06690-0

Misclassification bounds for PAC-Bayesian sparse deep learning

Authors: The Tien Mai

Abstract: Recently, there has been a significant focus on exploring the theoretical aspects of deep learning, especially regarding its performance in classification tasks. Bayesian deep learning has emerged as a unified probabilistic framework, seeking to integrate deep learning with Bayesian methodologies seamlessly. However, there exists a gap in the theoretical understanding of Bayesian approaches in deep learning for classification. This study presents an attempt to bridge that gap. By leveraging PAC-Bayes bounds techniques, we present theoretical results on the prediction or misclassification error of a probabilistic approach utilizing Spike-and-Slab priors for sparse deep learning in classification. We establish non-asymptotic results for the prediction error. Additionally, we demonstrate that, by considering different architectures, our results can achieve minimax optimal rates in both low and high-dimensional settings, up to a logarithmic factor. Moreover, our additional logarithmic term yields slight improvements over previous works. Additionally, we propose and analyze an automated model selection approach aimed at optimally choosing a network architecture with guaranteed optimality.

Submitted 2 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:1908.04847 by other authors

Journal ref: Machine Learning 2025

arXiv:2405.01140 [pdf, other]

Tracking and classifying objects with DAS data along railway

Authors: Simon L. B. Fredriksen, The Tien Mai, Kevin Growe, Jo Eidsvik

Abstract: Distributed acoustic sensing through fiber-optical cables can contribute to traffic monitoring systems. Using data from a day of field testing on a 50 km long fiber-optic cable along a railroad track in Norway, we detect and track cars and trains along a segment of the fiber-optic cable where the road runs parallel to the railroad tracks. We develop a method for automatic detection of events and then use these in a Kalman filter variant known as joint probabilistic data association for object tracking and classification. Model parameters are specified using in-situ log data along with the fiber-optic signals. Running the algorithm over an entire day, we highlight results of counting cars and trains over time and their estimated velocities.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.08969 [pdf, ps, other]

doi 10.1007/s10994-024-06691-z

Concentration properties of fractional posterior in 1-bit matrix completion

Authors: The Tien Mai

Abstract: The problem of estimating a matrix based on a set of its observed entries is commonly referred to as the matrix completion problem. In this work, we specifically address the scenario of binary observations, often termed 1-bit matrix completion. While numerous studies have explored Bayesian and frequentist methods for real-valued matrix completion, there has been a lack of theoretical exploration regarding Bayesian approaches in 1-bit matrix completion. We tackle this gap by considering a general, non-uniform sampling scheme and providing theoretical assurances on the efficacy of the fractional posterior. Our contributions include obtaining concentration results for the fractional posterior and demonstrating its effectiveness in recovering the underlying parameter matrix. We accomplish this using two distinct types of prior distributions: low-rank factorization priors and a spectral scaled Student prior, with the latter requiring fewer assumptions. Importantly, our results exhibit an adaptive nature by not mandating prior knowledge of the rank of the parameter matrix. Our findings are comparable to those found in the frequentist literature, yet demand fewer restrictive assumptions.

Submitted 13 April, 2024; originally announced April 2024.

Journal ref: Machine Learning 2025

arXiv:2403.03656 [pdf, other]

A practical and efficient approach for Bayesian reservoir inversion: Insights from the Alvheim field data

Authors: Karen S Auestad, The Tien Mai, Mina Spremic, Jo Eidsvik

Abstract: Stochastic reservoir characterization, a critical aspect of subsurface exploration for oil and gas reservoirs, relies on stochastic methods to model and understand subsurface properties using seismic data. This paper addresses the computational challenges associated with Bayesian reservoir inversion methods, focusing on two key obstacles: the demanding forward model and the high dimensionality of Gaussian random fields. Leveraging the generalized Bayesian approach, we replace the intricate forward function with a computationally efficient multivariate adaptive regression splines method, resulting in a 34 acceleration in computational efficiency. For handling high-dimensional Gaussian random fields, we employ a fast Fourier transform (FFT) technique. Additionally, we explore the preconditioned Crank-Nicolson method for sampling, providing a more efficient exploration of high-dimensional parameter spaces. The practicality and efficacy of our approach are tested extensively in simulations and its validity is demonstrated in application to the Alvheim field data.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2312.12952 [pdf, ps, other]

doi 10.1111/stan.12342

High-dimensional sparse classification using exponential weighting with empirical hinge loss

Authors: The Tien Mai

Abstract: In this study, we address the problem of high-dimensional binary classification. Our proposed solution involves employing an aggregation technique founded on exponential weights and empirical hinge loss. Through the employment of a suitable sparsity-inducing prior distribution, we demonstrate that our method yields favorable theoretical results on prediction error. The efficiency of our procedure is achieved through the utilization of Langevin Monte Carlo, a gradient-based sampling approach. To illustrate the effectiveness of our approach, we conduct comparisons with the logistic Lasso on simulated data and a real dataset. Our method frequently demonstrates superior performance compared to the logistic Lasso.

Submitted 6 March, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Journal ref: Statistica Neerlandica 2024

arXiv:2306.05829 [pdf, ps, other]

doi 10.1007/s11222-023-10314-3

A reduced-rank approach to predicting multiple binary responses through machine learning

Authors: The Tien Mai

Abstract: This paper investigates the problem of simultaneously predicting multiple binary responses by utilizing a shared set of covariates. Our approach incorporates machine learning techniques for binary classification, without making assumptions about the underlying observations. Instead, our focus lies on a group of predictors, aiming to identify the one that minimizes prediction error. Unlike previous studies that primarily address estimation error, we directly analyze the prediction error of our method using PAC-Bayesian bounds techniques. In this paper, we introduce a pseudo-Bayesian approach capable of handling incomplete response data. Our strategy is efficiently implemented using the Langevin Monte Carlo method. Through simulation studies and a practical application using real data, we demonstrate the effectiveness of our proposed method, producing comparable or sometimes superior results compared to the current state-of-the-art method.

Submitted 6 March, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

Journal ref: Statistics and Computing, 2023, 33

arXiv:2306.04606 [pdf, other]

Network-based Representations and Dynamic Discrete Choice Models for Multiple Discrete Choice Analysis

Authors: Hung Tran, Tien Mai

Abstract: In many choice modeling applications, people's demand is frequently characterized as multiple discrete, which means that people choose multiple items simultaneously. The analysis and prediction of people's behavior in multiple discrete choice situations pose several challenges. In this paper, to address this, we propose a random utility maximization (RUM) based model that considers each subset of choice alternatives as a composite alternative, where individuals choose a subset according to the RUM framework. While this offers a natural and intuitive modeling approach for multiple-choice analysis, the large number of subsets of choices in the formulation makes its estimation and application intractable. To overcome this challenge, we introduce directed acyclic graph (DAG) based representations of choices where each node of the DAG is associated with an elemental alternative and additional information such as the number of selected elemental alternatives. Our innovation is to show that the multi-choice model is equivalent to a recursive route choice model on the DAG, leading to the development of new efficient estimation algorithms based on dynamic programming. In addition, the DAG representations enable us to bring in some advanced route choice models to capture the correlation between subset choice alternatives. Numerical experiments based on synthetic and real datasets show many advantages of our modeling approach and the proposed estimation algorithms.

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2304.02261 [pdf, other]

Optimal Sketching Bounds for Sparse Linear Regression

Authors: Tung Mai, Alexander Munteanu, Cameron Musco, Anup B. Rao, Chris Schwiegelshohn, David P. Woodruff

Abstract: We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $Θ(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression. For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $o(d)$ rows showing that $O(μ^2 k\log(μn d/\varepsilon)/\varepsilon^2)$ rows suffice, where $μ$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower order terms and the dependence on $μ$. Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, where one aims to minimize $\|Ax-b\|_2^2+λ\|x\|_1$ over $x\in\mathbb{R}^d$. We show that sketching dimension $O(\log(d)/(λ\varepsilon)^2)$ suffices and that the dependence on $d$ and $λ$ is tight.

Submitted 5 April, 2023; originally announced April 2023.

Comments: AISTATS 2023

arXiv:2210.15290 [pdf, ps, other]

doi 10.3390/e25020333

From bilinear regression to inductive matrix completion: a quasi-Bayesian analysis

Authors: The Tien Mai

Abstract: In this paper we study the problem of bilinear regression and we further address the case when the response matrix contains missing data, which is referred to as the problem of inductive matrix completion. We propose a quasi-Bayesian approach first to the problem of bilinear regression where a quasi-likelihood is employed. Then, we adapt this approach to the context of inductive matrix completion. Under a low-rankness assumption and leveraging the PAC-Bayes bound technique, we provide statistical properties for our proposed estimators and for the quasi-posteriors. We propose a Langevin Monte Carlo method to approximately compute the proposed estimators. Some numerical studies are conducted to demonstrate our methods.

Submitted 27 October, 2022; originally announced October 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2206.08619

Journal ref: Entropy 2023

arXiv:2210.06594 [pdf, other]

Sample Constrained Treatment Effect Estimation

Authors: Raghavendra Addanki, David Arbour, Tung Mai, Cameron Musco, Anup Rao

Abstract: Treatment effect estimation is a fundamental problem in causal inference. We focus on designing efficient randomized controlled trials, to accurately estimate the effect of some treatment on a population of $n$ individuals. In particular, we study sample-constrained treatment effect estimation, where we must select a subset of $s \ll n$ individuals from the population to experiment on. This subset must be further partitioned into treatment and control groups. Algorithms for partitioning the entire population into treatment and control groups, or for choosing a single representative subset, have been well-studied. The key challenge in our setting is jointly choosing a representative subset and a partition for that set. We focus on both individual and average treatment effect estimation, under a linear effects model. We give provably efficient experimental designs and corresponding estimators, by identifying connections to discrepancy minimization and leverage-score-based sampling used in randomized numerical linear algebra. Our theoretical results obtain a smooth transition to known guarantees when $s$ equals the population size. We also empirically demonstrate the performance of our algorithms.

Submitted 12 October, 2022; originally announced October 2022.

Comments: Conference on Neural Information Processing Systems (NeurIPS) 2022

arXiv:2208.04797 [pdf, ps, other]

doi 10.1093/bioadv/vbad027

Inferring the heritability of bacterial traits in the era of machine learning

Authors: The Tien Mai, John A Lees, Rebecca A Gladstone, Jukka Corander

Abstract: Quantification of heritability is a fundamental desideratum in genetics, which allows an assessment of the contribution of additive genetic variation to the variability of a trait of interest. The traditional computational approaches for assessing the heritability of a trait have been developed in the field of quantitative genetics. However, the rise of modern population genomics with large sample sizes has led to the development of several new machine learning based approaches to inferring heritability. In this paper, we systematically summarize recent advances in machine learning which can be used to infer heritability. We focus on an application of these methods to bacterial genomes, where heritability plays a key role in understanding phenotypes such as antibiotic resistance and virulence, which are particularly important due to the rising frequency of antimicrobial resistance. By designing a heritability model incorporating realistic patterns of genome-wide linkage disequilibrium for a frequently recombining bacterial pathogen, we test the performance of a wide spectrum of different inference methods, including also GCTA. In addition to the synthetic data benchmark, we present a comparison of the methods for antibiotic resistance traits for multiple bacterial pathogens. Insights from the benchmarking and real data analyses indicate a highly variable performance of the different methods and suggest that heritability inference would likely benefit from tailoring of the methods to the specific genetic architecture of the target organism.

Submitted 16 January, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

Journal ref: Bioinformatics Advances 2023

arXiv:2206.08619 [pdf, ps, other]

Optimal quasi-Bayesian reduced rank regression with incomplete response

Authors: The Tien Mai, Pierre Alquier

Abstract: The aim of reduced rank regression is to connect multiple response variables to multiple predictors. This model is very popular, especially in biostatistics where multiple measurements on individuals can be re-used to predict multiple outputs. Unfortunately, there are often missing data in such datasets, making it difficult to use standard estimation tools. In this paper, we study the problem of reduced rank regression where the response matrix is incomplete. We propose a quasi-Bayesian approach to this problem, in the sense that the likelihood is replaced by a quasi-likelihood. We provide a tight oracle inequality, proving that our method is adaptive to the rank of the coefficient matrix. We describe a Langevin Monte Carlo algorithm for the computation of the posterior mean. Numerical comparisons on synthetic and real data show that our method is competitive with the state-of-the-art method where the rank is chosen by cross validation, and sometimes leads to an improvement.

Submitted 17 June, 2022; originally announced June 2022.

arXiv:2203.02025 [pdf, other]

Online Balanced Experimental Design

Authors: David Arbour, Drew Dimmery, Tung Mai, Anup Rao

Abstract: We consider the experimental design problem in an online environment, an important practical task for reducing the variance of estimates in randomized experiments which allows for greater precision, and in turn, improved decision making. In this work, we present algorithms that build on recent advances in online discrepancy minimization which accommodate both arbitrary treatment probabilities and multiple treatments. The proposed algorithms are computationally efficient, minimize covariate imbalance, and include randomization which enables robustness to misspecification. We provide worst case bounds on the expected mean squared error of the causal estimate and show that the proposed estimator is no worse than an implicit ridge regression, which are within a logarithmic factor of the best known results for offline experimental design. We conclude with a detailed simulation study showing favorable results relative to complete randomization as well as to offline methods for experimental design with time complexities exceeding our algorithm.

Submitted 3 March, 2022; originally announced March 2022.

arXiv:2111.14674 [pdf, ps, other]

Online MAP Inference and Learning for Nonsymmetric Determinantal Point Processes

Authors: Aravind Reddy, Ryan A. Rossi, Zhao Song, Anup Rao, Tung Mai, Nedim Lipka, Gang Wu, Eunyee Koh, Nesreen Ahmed

Abstract: In this paper, we introduce the online and streaming MAP inference and learning problems for Non-symmetric Determinantal Point Processes (NDPPs) where data points arrive in an arbitrary order and the algorithms are constrained to use a single-pass over the data as well as sub-linear memory. The online setting has an additional requirement of maintaining a valid solution at any point in time. For solving these new problems, we propose algorithms with theoretical guarantees, evaluate them on several real-world datasets, and show that they give comparable performance to state-of-the-art offline algorithms that store the entire data in memory and take multiple passes over it.

Submitted 29 November, 2021; originally announced November 2021.

arXiv:2108.05655 [pdf, other]

doi 10.11159/icsta22.114

Understanding the population structure correction regression

Authors: The Tien Mai, Pierre Alquier

Abstract: Although genome-wide association studies (GWAS) on complex traits have achieved great successes, the current leading GWAS approaches simply test each genotype-phenotype association separately for each genetic variant. Curiously, the statistical properties of these approaches are not known when a joint model for all genetic variants is considered. Here we advance the understanding of the statistical properties of the "population structure correction" (PSC) approach, a standard univariate approach in GWAS. We further propose and analyse a correction to the PSC approach, termed "corrected population correction" (CPC). Together with the theoretical results, numerical simulations show that CPC is always comparable to or better than PSC, with a dramatic improvement in some special cases.

Submitted 12 August, 2021; originally announced August 2021.

arXiv:2106.00577 [pdf, ps, other]

doi 10.1007/s00180-022-01264-x

An efficient adaptive MCMC algorithm for Pseudo-Bayesian quantum tomography

Authors: The Tien Mai

Abstract: We revisit the Pseudo-Bayesian approach to the problem of estimating the density matrix in quantum state tomography in this paper. Pseudo-Bayesian inference has been shown to offer a powerful paradigm for quantum tomography with attractive theoretical and empirical results. However, the computation of (Pseudo-)Bayesian estimators, due to sampling from a complex and high-dimensional distribution, poses significant challenges that hamper their use in practical settings. To overcome this problem, we present an efficient adaptive MCMC sampling method for the Pseudo-Bayesian estimator. We show in simulations that our approach is substantially faster than the previous implementation, by at least two orders of magnitude, which is significant for practical quantum tomography.

Submitted 14 September, 2023; v1 submitted 1 June, 2021; originally announced June 2021.

arXiv:2104.08191 [pdf, ps, other]

PAC-Bayesian Matrix Completion with a Spectral Scaled Student Prior

Authors: The Tien Mai

Abstract: We study the problem of matrix completion in this paper. A spectral scaled Student prior is exploited to favour the underlying low-rank structure of the data matrix. We provide a thorough theoretical investigation for our approach through PAC-Bayesian bounds. More precisely, our PAC-Bayesian approach enjoys a minimax-optimal oracle inequality which guarantees that our method works well under model misspecification and under a general sampling distribution. Interestingly, we also provide efficient gradient-based sampling implementations for our approach by using Langevin Monte Carlo. More specifically, we show that our algorithms are significantly faster than the Gibbs sampler in this problem. To illustrate the attractive features of our inference strategy, some numerical simulations are conducted and an application to image inpainting is demonstrated.

Submitted 7 January, 2022; v1 submitted 16 April, 2021; originally announced April 2021.

arXiv:2103.11749 [pdf, ps, other]

doi 10.1007/s40300-023-00239-2

Simulation comparisons between Bayesian and de-biased estimators in low-rank matrix completion

Authors: The Tien Mai

Abstract: In this paper, we study the low-rank matrix completion problem, a class of machine learning problems, that aims at the prediction of missing entries in a partially observed matrix. Such problems appear in several challenging applications such as collaborative filtering, image processing, and genotype imputation. We compare the Bayesian approaches and a recently introduced de-biased estimator which provides a useful way to build confidence intervals of interest. From a theoretical viewpoint, the de-biased estimator comes with a sharp minimax-optimal rate of estimation error whereas the Bayesian approach reaches this rate with an additional logarithmic factor. Our simulation studies show the interesting result that the de-biased estimator is just as good as the Bayesian estimators. Moreover, Bayesian approaches are much more stable and can outperform the de-biased estimator in the case of small samples. In addition, we also find that the empirical coverage rate of the confidence intervals obtained by the de-biased estimator for an entry is absolutely lower than that of the considered credible interval. These results suggest further theoretical studies on the estimation error and the concentration of Bayesian methods, which remain quite limited at present.

Submitted 1 September, 2023; v1 submitted 22 March, 2021; originally announced March 2021.

Journal ref: Metron 2023

arXiv:2103.04557 [pdf, other]

Asymptotics of Ridge Regression in Convolutional Models

Authors: Mojtaba Sahraee-Ardakan, Tung Mai, Anup Rao, Ryan Rossi, Sundeep Rangan, Alyson K. Fletcher

Abstract: Understanding generalization and estimation error of estimators for simple models such as linear and generalized linear models has attracted a lot of attention recently. This is in part due to an interesting observation made in the machine learning community that highly over-parameterized neural networks achieve zero training error, and yet they are able to generalize well over the test samples. This phenomenon is captured by the so-called double descent curve, where the generalization error starts decreasing again after the interpolation threshold. A series of recent works tried to explain this phenomenon for simple models. In this work, we analyze the asymptotics of estimation error in ridge estimators for convolutional linear models. These convolutional inverse problems, also known as deconvolution, naturally arise in different fields such as seismology, imaging, and acoustics among others. Our results hold for a large class of input distributions that include i.i.d. features as a special case. We derive exact formulae for estimation error of ridge estimators that hold in a certain high-dimensional regime. We show the double descent phenomenon in our experiments for convolutional models and show that our theoretical results match the experiments.

Submitted 8 March, 2021; originally announced March 2021.

arXiv:2102.13179 [pdf, other]

Machine Unlearning via Algorithmic Stability

Authors: Enayat Ullah, Tung Mai, Anup Rao, Ryan Rossi, Raman Arora

Abstract: We study the problem of machine unlearning and identify a notion of algorithmic stability, Total Variation (TV) stability, which we argue, is suitable for the goal of exact unlearning. For convex risk minimization problems, we design TV-stable algorithms based on noisy Stochastic Gradient Descent (SGD). Our key contribution is the design of corresponding efficient unlearning algorithms, which are based on constructing a (maximal) coupling of Markov chains for the noisy SGD procedure. To understand the trade-offs between accuracy and unlearning efficiency, we give upper and lower bounds on excess empirical and population risk of TV-stable algorithms for convex risk minimization. Our techniques generalize to arbitrary non-convex functions, and our algorithms are differentially private as well.

Submitted 25 February, 2021; originally announced February 2021.

arXiv:2102.12961 [pdf, other]

doi 10.1007/978-3-031-10461-9_37

On regret bounds for continual single-index learning

Authors: The Tien Mai

Abstract: In this paper, we generalize the single-index model problem to the context of continual learning in which a learner is challenged with a sequence of tasks one by one and the dataset of each task is revealed in an online fashion. We propose a randomized strategy that is able to learn a common single-index (meta-parameter) for all tasks and a specific link function for each task. The common single-index allows the transfer of information gained from the previous tasks to a new one. We provide a rigorous theoretical analysis of our proposed strategy by proving some regret bounds under different assumptions on the loss function.

Submitted 25 November, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

arXiv:2102.07579 [pdf, ps, other]

Efficient Bayesian reduced rank regression using Langevin Monte Carlo approach

Authors: The Tien Mai

Abstract: The problem of Bayesian reduced rank regression is considered in this paper. We propose, for the first time, to use the Langevin Monte Carlo method in this problem. A spectral scaled Student prior distribution is used to exploit the underlying low-rank structure of the coefficient matrix. We show that our algorithms are significantly faster than the Gibbs sampler in high-dimensional settings. Simulation results show that our proposed algorithms for Bayesian reduced rank regression are comparable to the state-of-the-art method where the rank is chosen by cross validation.

Submitted 15 February, 2021; originally announced February 2021.

arXiv:2101.06309 [pdf, other]

Fundamental Tradeoffs in Distributionally Adversarial Training

Authors: Mohammad Mehrabi, Adel Javanmard, Ryan A. Rossi, Anup Rao, Tung Mai

Abstract: Adversarial training is among the most effective techniques to improve the robustness of models against adversarial perturbations. However, the full effect of this approach on models is not well understood. For example, while adversarial training can reduce the adversarial risk (prediction error against an adversary), it sometimes increases standard risk (generalization error when there is no adversary). Even more, such behavior is impacted by various elements of the learning problem, including the size and quality of training data, specific forms of adversarial perturbations in the input, model overparameterization, and adversary's power, among others. In this paper, we focus on the \emph{distribution perturbing} adversary framework wherein the adversary can change the test distribution within a neighborhood of the training data distribution. The neighborhood is defined via Wasserstein distance between distributions and the radius of the neighborhood is a measure of adversary's manipulative power. We study the tradeoff between standard risk and adversarial risk and derive the Pareto-optimal tradeoff, achievable over specific classes of models, in the infinite data limit with features dimension kept fixed. We consider three learning settings: 1) Regression with the class of linear models; 2) Binary classification under the Gaussian mixtures data model, with the class of linear classifiers; 3) Regression with the class of random features model (which can be equivalently represented as two-layer neural network with random first-layer weights).
We show that a tradeoff between standard and adversarial risk is manifested in all three settings. We further characterize the Pareto-optimal tradeoff curves and discuss how a variety of factors, such as features correlation, adversary's power or the width of two-layer neural network would affect this tradeoff.

Submitted 15 January, 2021; originally announced January 2021.

Comments: 23 pages, 3 figures

arXiv:2009.13566 [pdf, other]

Graph Neural Networks with Heterophily

Authors: Jiong Zhu, Ryan A. Rossi, Anup Rao, Tung Mai, Nedim Lipka, Nesreen K. Ahmed, Danai Koutra

Abstract: Graph Neural Networks (GNNs) have proven to be useful for many different practical applications. However, many existing GNN models have implicitly assumed homophily among the nodes connected in the graph, and therefore have largely overlooked the important setting of heterophily, where most connected nodes are from different classes. In this work, we propose a novel framework called CPGNN that generalizes GNNs for graphs with either homophily or heterophily. The proposed framework incorporates an interpretable compatibility matrix for modeling the heterophily or homophily level in the graph, which can be learned in an end-to-end fashion, enabling it to go beyond the assumption of strong homophily. Theoretically, we show that replacing the compatibility matrix in our framework with the identity (which represents pure homophily) reduces to GCN. Our extensive experiments demonstrate the effectiveness of our approach in more realistic and challenging experimental settings with significantly less training data compared to previous works: CPGNN variants achieve state-of-the-art results in heterophily settings with or without contextual node features, while maintaining comparable performance in homophily settings.

Submitted 14 June, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: Proceedings version of AAAI 2021 with appendix and additional typo fixes; 12 pages, 4 figures

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence. 35, 12 (May 2021), 11168-11176

arXiv:2005.05783 [pdf]

Modeling Route Choice with Real-Time Information: Comparing the Recursive and Non-Recursive Models

Authors: Xinlian Yu, Tien Mai, Jing Ding-Mastera, Song Gao, Emma Frejinger

Abstract: We study the routing policy choice problems in a stochastic time-dependent (STD) network. A routing policy is defined as a decision rule applied at the end of each link that maps the realized traffic condition to the decision on the link to take next. Two types of routing policy choice models are formulated with perfect online information (POI): recursive logit model and non-recursive logit model. In the non-recursive model, a choice set of routing policies between an origin-destination (OD) pair is generated, and a probabilistic choice is modeled at the origin, while the choice of the next link at each link is a deterministic execution of the chosen routing policy. In the recursive model, the probabilistic choice of the next link is modeled at each link, following the framework of dynamic discrete choice models. The two models are further compared in terms of computational efficiency in estimation and prediction, and flexibility in systematic utility specification and modeling correlation.

Submitted 4 June, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

arXiv:1911.06930 [pdf, other]

Inverse Reinforcement Learning with Missing Data

Authors: Tien Mai, Quoc Phong Nguyen, Kian Hsiang Low, Patrick Jaillet

Abstract: We consider the problem of recovering an expert's reward function with inverse reinforcement learning (IRL) when there are missing/incomplete state-action pairs or observations in the demonstrated trajectories. This issue of missing trajectory data or information occurs in many situations, e.g., GPS signals from vehicles moving on a road network are intermittent. In this paper, we propose a tractable approach to directly compute the log-likelihood of demonstrated trajectories with incomplete/missing data. Our algorithm is efficient in handling a large number of missing segments in the demonstrated trajectories, as it performs the training with incomplete data by solving a sequence of systems of linear equations, and the number of such systems to be solved does not depend on the number of missing segments. Empirical evaluation on a real-world dataset shows that our training algorithm outperforms other conventional techniques.

Submitted 15 November, 2019; originally announced November 2019.

arXiv:1911.06928

Generalized Maximum Causal Entropy for Inverse Reinforcement Learning

Authors: Tien Mai, Kennard Chan, Patrick Jaillet

Abstract: We consider the problem of learning from demonstrated trajectories with inverse reinforcement learning (IRL). Motivated by a limitation of the classical maximum entropy model in capturing the structure of the network of states, we propose an IRL model based on a generalized version of the causal entropy maximization problem, which allows us to generate a class of maximum entropy IRL models. Our generalized model has an advantage of being able to recover, in addition to a reward function, another expert's function that would (partially) capture the impact of the connecting structure of the states on experts' decisions. Empirical evaluation on a real-world dataset and a grid-world dataset shows that our generalized model outperforms the classical ones, in terms of recovering reward functions and demonstrated trajectories.

Submitted 18 August, 2020; v1 submitted 15 November, 2019; originally announced November 2019.

arXiv:1910.11743

doi 10.1186/s12859-021-04079-7

Boosting heritability: estimating the genetic component of phenotypic variation with multiple sample splitting

Authors: The Tien Mai, Paul Turner, Jukka Corander

Abstract: Background: Heritability is a central measure in genetics quantifying how much of the variability observed in a trait is attributable to genetic differences. Existing methods for estimating heritability are most often based on random-effect models, typically for computational reasons. The alternative of using a fixed-effect model has received much more limited attention in the literature. Results: In this paper, we propose a generic strategy for heritability inference, termed "boosting heritability", by combining the advantageous features of different recent methods to produce an estimate of the heritability with a high-dimensional linear model. Boosting heritability uses in particular a multiple sample splitting strategy which leads in general to a stable and accurate estimate. We use both simulated data and real antibiotic resistance data from a major human pathogen, Streptococcus pneumoniae, to demonstrate the attractive features of our inference strategy. Conclusions: Boosting is shown to offer a reliable and practically useful tool for inference about heritability.

Submitted 15 March, 2021; v1 submitted 25 October, 2019; originally announced October 2019.

arXiv:1905.00095

doi 10.1007/978-3-030-63061-4_7

Composite local low-rank structure in learning drug sensitivity

Authors: The Tien Mai, Leiv Rønneberg, Zhi Zhao, Manuela Zucknick, Jukka Corander

Abstract: The molecular characterization of tumor samples by multiple omics data sets of different types or modalities (e.g. gene expression, mutation, CpG methylation) has become an invaluable source of information for assessing the expected performance of individual drugs and their combinations. Merging the relevant information from the omics data modalities provides the statistical basis for determining suitable therapies for specific cancer patients. Different data modalities may each have their specific structures that need to be taken into account during inference. In this paper, we assume that each omics data modality has a low-rank structure with only a few relevant features that affect the prediction, and we propose to use a composite local nuclear norm penalization for learning drug sensitivity. Numerical results show that the composite low-rank structure can improve the prediction performance compared to using a global low-rank approach or elastic net regression.

Submitted 5 September, 2019; v1 submitted 30 April, 2019; originally announced May 2019.

Journal ref: CIBB 2019, http://www.cibb2019.it/

arXiv:1610.08628

Regret Bounds for Lifelong Learning

Authors: Pierre Alquier, The Tien Mai, Massimiliano Pontil

Abstract: We consider the problem of transfer learning in an online setting. Different tasks are presented sequentially and processed by a within-task algorithm. We propose a lifelong learning strategy which refines the underlying data representation used by the within-task algorithm, thereby transferring information from one task to the next. We show that when the within-task algorithm comes with some regret bound, our strategy inherits this good property. Our bounds are in expectation for a general loss function, and uniform for a convex loss. We discuss applications to dictionary learning and to a finite set of predictors. In the latter case, we improve previous $O(1/\sqrt{m})$ bounds to $O(1/m)$, where $m$ is the per-task sample size.

Submitted 27 October, 2016; originally announced October 2016.

Journal ref: Proceedings of Machine Learning Research, 2017, vol. 54 (AISTATS 2017), pp. 261-269

arXiv:1509.07900

Bayesian sequential parameter estimation with a Laplace type approximation

Authors: Tiep Mai, Simon Wilson

Abstract: A method for sequential inference of the fixed parameters of dynamic latent Gaussian models is proposed and evaluated; it is based on the iterated Laplace approximation. The method provides a useful trade-off between computational performance and the accuracy of the approximation to the true posterior distribution. Approximation corrections are shown to improve the accuracy of the approximation in simulation studies. A population-based approach is also shown to provide a more robust inference method.

Submitted 25 September, 2015; originally announced September 2015.

arXiv:1509.06492

Modifying iterated Laplace approximations

Authors: Tiep Mai, Simon Wilson

Abstract: In this paper, several modifications are introduced to the functional approximation method iterLap to reduce the approximation error, including stopping rule adjustment, a new residual function, starting point selection for numerical optimisation, and scaling of the Hessian matrix. Illustrative examples are also provided to show the trade-off between running time and accuracy of the original and modified methods.

Submitted 22 September, 2015; originally announced September 2015.

Showing 1–43 of 43 results for author: Mai, T