dreaMLearning: Data Compression Assisted Machine Learning

Zhao, Xiaobo; Hurst, Aaron; Karras, Panagiotis; Lucani, Daniel E.

arXiv:2506.22190 (cs)

[Submitted on 27 Jun 2025]

Abstract: Despite rapid advancements, machine learning, particularly deep learning, is hindered by the need for large amounts of labeled data to learn meaningful patterns without overfitting and by immense demands for computation and storage, which motivate research into architectures that can achieve good performance with fewer resources. This paper introduces dreaMLearning, a novel framework that enables learning from compressed data without decompression, built upon Entropy-based Generalized Deduplication (EntroGeDe), an entropy-driven lossless compression method that consolidates information into a compact set of representative samples. DreaMLearning accommodates a wide range of data types, tasks, and model architectures. Extensive experiments on regression and classification tasks with tabular and image data demonstrate that dreaMLearning accelerates training by up to 8.8x, reduces memory usage by 10x, and cuts storage by 42%, with a minimal impact on model performance. These advancements enhance diverse ML applications, including distributed and federated learning, and tinyML on resource-constrained edge devices, unlocking new possibilities for efficient and scalable learning.

Comments:	18 pages, 11 figures
Subjects:	Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP)
Cite as:	arXiv:2506.22190 [cs.LG]
	(or arXiv:2506.22190v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2506.22190

Submission history

From: Xiaobo Zhao
[v1] Fri, 27 Jun 2025 12:57:22 UTC (284 KB)
