Semi-supervised linear regression: enhancing efficiency and robustness in high dimensions

Chen, Kai; Zhang, Yuqian

doi:10.1093/biomtc/ujaf113


arXiv:2311.17685 (stat)

[Submitted on 29 Nov 2023 (v1), last revised 2 Sep 2025 (this version, v2)]



Abstract: In semi-supervised learning, the prevailing understanding suggests that observing additional unlabeled samples improves estimation accuracy for linear parameters only in the case of model misspecification. In this work, we challenge such a claim and show that additional unlabeled samples are beneficial in high-dimensional settings. Initially focusing on a dense scenario, we introduce robust semi-supervised estimators for the regression coefficient without relying on sparse structures in the population slope. Even when the true underlying model is linear, we show that leveraging information from large-scale unlabeled data helps reduce estimation bias, thereby improving both estimation accuracy and inference robustness. Moreover, we propose semi-supervised methods with further enhanced efficiency in scenarios with a sparse linear slope. The performance of the proposed methods is demonstrated through extensive numerical studies.

Subjects:	Methodology (stat.ME)
Cite as:	arXiv:2311.17685 [stat.ME]
	(or arXiv:2311.17685v2 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2311.17685
Journal reference:	Biometrics, Volume 81, Issue 3, September 2025, ujaf113
Related DOI:	https://doi.org/10.1093/biomtc/ujaf113

Submission history

From: Kai Chen
[v1] Wed, 29 Nov 2023 14:47:16 UTC (84 KB)
[v2] Tue, 2 Sep 2025 04:17:03 UTC (701 KB)
