Search | arXiv e-print repository

An Easily Tunable Approach to Robust and Sparse High-Dimensional Linear Regression

Authors: Takeyuki Sasai, Hironori Fujisawa

Abstract: Sparse linear regression methods such as Lasso require a tuning parameter that depends on the noise variance, which is typically unknown and difficult to estimate in practice. In the presence of heavy-tailed noise or adversarial outliers, this problem becomes more challenging. In this paper, we propose an estimator for robust and sparse linear regression that eliminates the need for explicit prior knowledge of the noise scale. Our method builds on the Huber loss and incorporates an iterative scheme that alternates between coefficient estimation and adaptive noise calibration via median-of-means. The approach is theoretically grounded and achieves sharp non-asymptotic error bounds under both sub-Gaussian and heavy-tailed noise assumptions. Moreover, the proposed method accommodates arbitrary outlier contamination in the response without requiring prior knowledge of the number of outliers or the sparsity level. While previous robust estimators avoid tuning parameters related to the noise scale or sparsity, our procedure achieves comparable error bounds when the number of outliers is unknown, and improved bounds when it is known. In particular, the improved bounds match the known minimax lower bounds up to constant factors.

Submitted 14 June, 2025; originally announced June 2025.

MSC Class: 62J07
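
Note: the abstract above mentions noise calibration via median-of-means. The following is a minimal sketch of a generic median-of-means estimator only, not the authors' iterative procedure; the function name `median_of_means`, the `n_blocks` parameter, and the random shuffling step are assumptions made for illustration.

```python
import numpy as np


def median_of_means(values, n_blocks, seed=0):
    """Generic median-of-means estimate of the mean of `values`.

    Illustrative sketch only: the name, signature, and shuffling step
    are assumptions, not taken from the paper.
    """
    values = np.asarray(values, dtype=float)
    if not 1 <= n_blocks <= values.size:
        raise ValueError("n_blocks must be between 1 and len(values)")
    # Shuffle so that block membership does not depend on input order.
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(values)
    # Split into roughly equal blocks and average within each block.
    blocks = np.array_split(shuffled, n_blocks)
    block_means = np.array([block.mean() for block in blocks])
    # The median across block means tolerates a minority of corrupted blocks.
    return float(np.median(block_means))
```

Applied to squared residuals from a current coefficient estimate, a routine like this could give a rough, outlier-tolerant estimate of the noise variance; how the paper actually couples this step with the Huber-loss fit is not specified in the abstract.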

arXiv:2308.15838 [pdf, other]

Adaptive Lasso, Transfer Lasso, and Beyond: An Asymptotic Perspective

Authors: Masaaki Takada, Hironori Fujisawa

Abstract: This paper presents a comprehensive exploration of the theoretical properties inherent in the Adaptive Lasso and the Transfer Lasso. The Adaptive Lasso, a well-established method, employs regularization divided by initial estimators and is characterized by asymptotic normality and variable selection consistency. In contrast, the recently proposed Transfer Lasso employs regularization subtracted by initial estimators with the demonstrated capacity to curtail non-asymptotic estimation errors. A pivotal question thus emerges: Given the distinct ways the Adaptive Lasso and the Transfer Lasso employ initial estimators, what benefits or drawbacks does this disparity confer upon each method? This paper conducts a theoretical examination of the asymptotic properties of the Transfer Lasso, thereby elucidating its differentiation from the Adaptive Lasso. Informed by the findings of this analysis, we introduce a novel method, one that amalgamates the strengths and compensates for the weaknesses of both methods. The paper concludes with validations of our theory and comparisons of the methods via simulation experiments.

Submitted 17 April, 2024; v1 submitted 30 August, 2023; originally announced August 2023.
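
Note: the phrases "divided by" and "subtracted by" initial estimators in the abstract above can be made concrete with the forms in which the two penalties are commonly written. This is a sketch under assumptions: $\tilde\beta$ denotes an initial estimator, $\lambda > 0$, $\gamma > 0$ and $\alpha \in [0, 1]$ are tuning parameters, and the normalization used in the paper itself may differ.

$$
\hat\beta^{\mathrm{Adaptive}} \in \arg\min_{\beta} \; \frac{1}{2n}\|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} \frac{|\beta_j|}{|\tilde\beta_j|^{\gamma}},
$$

$$
\hat\beta^{\mathrm{Transfer}} \in \arg\min_{\beta} \; \frac{1}{2n}\|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} \Big( \alpha\,|\beta_j| + (1-\alpha)\,|\beta_j - \tilde\beta_j| \Big).
$$

In the first form the initial estimator rescales each coordinate's penalty, while in the second it shifts the point toward which the coefficients are shrunk.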

arXiv:2208.11592 [pdf, ps, other]

Outlier Robust and Sparse Estimation of Linear Regression Coefficients

Authors: Takeyuki Sasai, Hironori Fujisawa

Abstract: We consider outlier-robust and sparse estimation of linear regression coefficients, when the covariates and the noises are contaminated by adversarial outliers and noises are sampled from a heavy-tailed distribution. Our results present sharper error bounds under weaker assumptions than prior studies that share similar interests with this study. Our analysis relies on some sharp concentration inequalities resulting from generic chaining.

Submitted 24 May, 2024; v1 submitted 24 August, 2022; originally announced August 2022.

MSC Class: 62J07; 62F35

arXiv:2102.11120 [pdf, ps, other]

Adversarial robust weighted Huber regression

Authors: Takeyuki Sasai, Hironori Fujisawa

Abstract: We consider a robust estimation of linear regression coefficients. In this note, we focus on the case where the covariates are sampled from an $L$-subGaussian distribution with unknown covariance, the noises are sampled from a distribution with a bounded absolute moment and both covariates and noises may be contaminated by an adversary. We derive an estimation error bound, which depends on the stable rank and the condition number of the covariance matrix of covariates with a polynomial computational complexity of estimation.

Submitted 24 May, 2024; v1 submitted 22 February, 2021; originally announced February 2021.

Comments: The case of sparse coefficients is investigated in arXiv:2208.11592. This manuscript will not be submitted for publications

MSC Class: 62G35; 62G05

arXiv:2010.13018 [pdf, ps, other]

Adversarial Robust Low Rank Matrix Estimation: Compressed Sensing and Matrix Completion

Authors: Takeyuki Sasai, Hironori Fujisawa

Abstract: We consider robust low rank matrix estimation as a trace regression when outputs are contaminated by adversaries. The adversaries are allowed to add arbitrary values to arbitrary outputs. Such values can depend on any samples. We deal with matrix compressed sensing, including lasso as a partial problem, and matrix completion, and then we obtain sharp estimation error bounds. To obtain the error bounds for different models such as matrix compressed sensing and matrix completion, we propose a simple unified approach based on a combination of the Huber loss function and the nuclear norm penalization, which is a different approach from the conventional ones. Some error bounds obtained in the present paper are sharper than the past ones.

Submitted 24 May, 2024; v1 submitted 24 October, 2020; originally announced October 2020.

Comments: The lasso part of this manuscript with contaminated input as well as output is investigated in arXiv:2208.11592. This manuscript will not be submitted for publications

MSC Class: 62G35; 62G05

arXiv:2004.05990 [pdf, ps, other]

Robust estimation with Lasso when outputs are adversarially contaminated

Authors: Takeyuki Sasai, Hironori Fujisawa

Abstract: We consider robust estimation when outputs are adversarially contaminated. Nguyen and Tran (2012) proposed an extended Lasso for robust parameter estimation and then they showed the convergence rate of the estimation error. Recently, Dalalyan and Thompson (2019) gave some useful inequalities and then they showed a faster convergence rate than Nguyen and Tran (2012). They focused on the fact that the minimization problem of the extended Lasso can become that of the penalized Huber loss function with $L_1$ penalty. The distinguishing point is that the Huber loss function includes an extra tuning parameter, which is different from the conventional method. We give the proof, which is different from Dalalyan and Thompson (2019) and then we give the same convergence rate as Dalalyan and Thompson (2019). The significance of our proof is to use some specific properties of the Huber function. Such techniques have not been used in the past proofs.

Submitted 24 May, 2024; v1 submitted 13 April, 2020; originally announced April 2020.

Comments: The case of contaminated inputs as well as outputs is investigated in arXiv:2208.11592. This manuscript will not be submitted for publications
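
Note: the reduction from the extended Lasso to an $L_1$-penalized Huber loss mentioned in the abstract above can be sketched as follows. The symbols $\theta$ (outlier vector), $\lambda_o$, $\lambda_s$ and the unscaled squared loss are assumptions for illustration; the paper's normalization may differ. For a single residual $r$, minimizing over a scalar outlier term gives

$$
\min_{\theta \in \mathbb{R}} \Big\{ \tfrac{1}{2}(r - \theta)^2 + \lambda_o |\theta| \Big\} = H_{\lambda_o}(r),
\qquad
H_{\delta}(t) =
\begin{cases}
\tfrac{1}{2} t^2, & |t| \le \delta, \\
\delta |t| - \tfrac{1}{2}\delta^2, & |t| > \delta,
\end{cases}
$$

with the minimum attained at the soft-thresholded value $\theta = \operatorname{sign}(r)\max(|r| - \lambda_o, 0)$. Applying this coordinate-wise with $r_i = y_i - x_i^\top \beta$ yields

$$
\min_{\beta,\,\theta} \; \tfrac{1}{2}\|y - X\beta - \theta\|_2^2 + \lambda_o \|\theta\|_1 + \lambda_s \|\beta\|_1
\;=\;
\min_{\beta} \; \sum_{i=1}^{n} H_{\lambda_o}\big(y_i - x_i^\top \beta\big) + \lambda_s \|\beta\|_1,
$$

so the outlier penalty level $\lambda_o$ plays the role of the Huber threshold, which is the extra tuning parameter the abstract refers to.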

arXiv:1805.07960

Stochastic Gradient Descent for Stochastic Doubly-Nonconvex Composite Optimization

Authors: Takayuki Kawashima, Hironori Fujisawa

Abstract: The stochastic gradient descent has been widely used for solving composite optimization problems in big data analyses. Many algorithms and convergence properties have been developed. The composite functions were originally convex, and nonconvex composite functions have gradually been adopted to obtain more desirable properties. The convergence properties have been investigated, but only when one of the composite functions is nonconvex. There is no convergence property when both composite functions are nonconvex, which is named the doubly-nonconvex case. To overcome this difficulty, we assume a simple and weak condition that the penalty function is quasiconvex and then we obtain convergence properties for the stochastic doubly-nonconvex composite optimization problem. The convergence rate obtained here is of the same order as the existing work. We deeply analyze the convergence rate with the constant step size and mini-batch size and give the optimal convergence rate with appropriate sizes, which is superior to the existing work. Experimental results illustrate that our method is superior to existing methods.

Submitted 1 March, 2020; v1 submitted 21 May, 2018; originally announced May 2018.

Comments: There is a mistake in the proof of Proposition 3.2. related to the Euclidean projection with stochastic gradients

arXiv:1805.06144 [pdf, ps, other]

On Difference Between Two Types of $γ$-divergence for Regression

Authors: Takayuki Kawashima, Hironori Fujisawa

Abstract: The $γ$-divergence is well-known for having strong robustness against heavy contamination. By virtue of this property, many applications via the $γ$-divergence have been proposed. There are two types of $γ$-divergence for regression problems, in which the treatments of the base measure are different. In this paper, we compare them and point out a distinct difference between these two divergences under heterogeneous contamination where the outlier ratio depends on the explanatory variable. One divergence has the strong robustness under heterogeneous contamination. The other does not have it in general, but does have it when the parametric model of the response variable belongs to a location-scale family in which the scale does not depend on the explanatory variables or under homogeneous contamination where the outlier ratio does not depend on the explanatory variable. Hung et al. (2017) discussed the strong robustness in a logistic regression model with an additional assumption that the tuning parameter $γ$ is sufficiently large. The results obtained in this paper hold for any parametric model without such an additional assumption.

Submitted 16 May, 2018; originally announced May 2018.

arXiv:1505.05257 [pdf, other]

Sparse and Robust Linear Regression: An Optimization Algorithm and Its Statistical Properties

Authors: Shota Katayama, Hironori Fujisawa

Abstract: This paper studies sparse linear regression analysis with outliers in the responses. A parameter vector for modeling outliers is added to the standard linear regression model and then the sparse estimation problem for both coefficients and outliers is considered. The $\ell_{1}$ penalty is imposed for the coefficients, while various penalties including redescending type penalties are for the outliers. To solve the sparse estimation problem, we introduce an optimization algorithm. Under some conditions, we show the algorithmic and statistical convergence property for the coefficients obtained by the algorithm. Moreover, it is shown that the algorithm can recover the true support of the coefficients with probability going to one.

Submitted 20 May, 2015; originally announced May 2015.

Comments: 23 pages, 2 figures

MSC Class: 62J05 (Primary); 62F35 (Secondary)

arXiv:1412.1411 [pdf, other]

On the Weak Convergence and Central Limit Theorem of Blurring and Nonblurring Processes with Application to Robust Location Estimation

Authors: Ting-Li Chen, Hironori Fujisawa, Su-Yun Huang, Chii-Ruey Hwang

Abstract: This article studies the weak convergence and associated Central Limit Theorem for blurring and nonblurring processes. Then, they are applied to the estimation of location parameter. Simulation studies show that the location estimation based on the convergence point of blurring process is more robust and often more efficient than that of nonblurring process.

Submitted 27 January, 2015; v1 submitted 3 December, 2014; originally announced December 2014.

arXiv:1311.5301 [pdf, other]

Robust Estimation under Heavy Contamination using Enlarged Models

Authors: Takafumi Kanamori, Hironori Fujisawa

Abstract: In data analysis, contamination caused by outliers is inevitable, and robust statistical methods are strongly demanded. In this paper, our concern is to develop a new approach for robust data analysis based on scoring rules. The scoring rule is a discrepancy measure to assess the quality of probabilistic forecasts. We propose a simple way of estimating not only the parameter in the statistical model but also the contamination ratio of outliers. Estimating the contamination ratio is important, since one can detect outliers out of the training samples based on the estimated contamination ratio. For this purpose, we use scoring rules with extended statistical models, which are called enlarged models. Also, the regression problems are considered. We study a complex heterogeneous contamination, in which the contamination ratio of outliers in the dependent variable may depend on the independent variable. We propose a simple method to obtain a robust regression estimator under heterogeneous contamination. In addition, we show that our method also provides an estimator of the expected contamination ratio that is available to detect the outliers out of training samples. Numerical experiments demonstrate the effectiveness of our methods compared to the conventional estimators.

Submitted 20 November, 2013; originally announced November 2013.

Comments: 32 pages, 3 figures, 3 tables

arXiv:1305.2473 [pdf, ps, other]

Affine Invariant Divergences associated with Composite Scores and its Applications

Authors: Takafumi Kanamori, Hironori Fujisawa

Abstract: In statistical analysis, measuring a score of predictive performance is an important task. In many scientific fields, appropriate scores were tailored to tackle the problems at hand. A proper score is a popular tool to obtain statistically consistent forecasts. Furthermore, a mathematical characterization of the proper score was studied. As a result, it was revealed that the proper score corresponds to a Bregman divergence, which is an extension of the squared distance over the set of probability distributions. In the present paper, we introduce composite scores as an extension of the typical scores in order to obtain a wider class of probabilistic forecasting. Then, we propose a class of composite scores, named Holder scores, that induce equivariant estimators. The equivariant estimators have a favorable property, implying that the estimator is transformed in a consistent way, when the data is transformed. In particular, we deal with the affine transformation of the data. By using the equivariant estimators under the affine transformation, one can obtain estimators that do not essentially depend on the choice of the system of units in the measurement. Conversely, we prove that the Holder score is characterized by the invariance property under the affine transformations. Furthermore, we investigate statistical properties of the estimators using Holder scores for the statistical problems including estimation of regression functions and robust parameter estimation, and illustrate the usefulness of the newly introduced scores for statistical forecasting.

Submitted 11 May, 2013; originally announced May 2013.

Comments: 24 pages

Showing 1–12 of 12 results for author: Fujisawa, H