Compression, Generalization and Learning

Campi, Marco C.; Garatti, Simone

Computer Science > Machine Learning

arXiv:2301.12767 (cs)

[Submitted on 30 Jan 2023 (v1), last revised 8 Jan 2024 (this version, v2)]

Abstract: A compression function is a map that slims down an observational set into a subset of reduced size while preserving its informational content. In multiple applications, the event that one new observation makes the compressed set change is interpreted as meaning that this observation brings in extra information; in learning theory, this corresponds to misclassification or misprediction. In this paper, we lay the foundations of a new theory that allows one to keep control of the probability of change of compression (which maps into the statistical "risk" in learning applications). Under suitable conditions, the cardinality of the compressed set is shown to be a consistent estimator of the probability of change of compression (without any upper limit on the size of the compressed set); moreover, unprecedentedly tight finite-sample bounds to evaluate the probability of change of compression are obtained under a generally applicable condition of preference. All results are usable in a fully agnostic setup, i.e., without requiring any a priori knowledge of the probability distribution of the observations. Not only do these results offer valid support for developing trust in observation-driven methodologies, they also play a fundamental role in learning techniques as a tool for hyper-parameter tuning.
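
To make the notions of "compression function" and "change of compression" concrete, the following is a minimal toy sketch, not the authors' construction or any of their bounds. It assumes scalar observations drawn i.i.d. from a uniform distribution and a hypothetical compression function that keeps only the smallest and largest observation. All function names are illustrative assumptions. A new observation changes the compression exactly when it falls outside the current range, and by exchangeability this happens with probability 2/(n+1) for n continuous i.i.d. observations.

```python
import random


def compress(points):
    """Toy compression (an assumption for illustration): keep only the extremes."""
    return (min(points), max(points))


def changes_compression(points, new_point):
    """Return True if adding new_point alters the compressed set."""
    return compress(points + [new_point]) != compress(points)


def estimate_change_probability(n, trials, seed=0):
    """Monte Carlo estimate of the probability of change of compression."""
    rng = random.Random(seed)
    changes = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        # Draw one fresh observation and check whether the compression changes.
        if changes_compression(sample, rng.random()):
            changes += 1
    return changes / trials


if __name__ == "__main__":
    n = 19
    print("empirical estimate:", estimate_change_probability(n, trials=20000))
    print("exchangeability value 2/(n+1):", 2 / (n + 1))
```

In this toy case the compressed set always has two elements, so the example only illustrates the definitions; the distribution-free guarantees described in the abstract concern far more general compression schemes.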

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as:	arXiv:2301.12767 [cs.LG]
	(or arXiv:2301.12767v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2301.12767
Journal reference:	Journal of Machine Learning Research, 24(339):1-74, 2023

Submission history

From: Simone Garatti
[v1] Mon, 30 Jan 2023 10:27:45 UTC (500 KB)
[v2] Mon, 8 Jan 2024 11:20:43 UTC (1,129 KB)
