Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance

Kaushik, Chiraag; Liu, Ran; Lin, Chi-Heng; Khera, Amrit; Jin, Matthew Y; Ma, Wenrui; Muthukumar, Vidya; Dyer, Eva L

Computer Science > Machine Learning

arXiv:2402.11742 (cs)

[Submitted on 18 Feb 2024 (v1), last revised 3 Jun 2024 (this version, v2)]

Authors:Chiraag Kaushik, Ran Liu, Chi-Heng Lin, Amrit Khera, Matthew Y Jin, Wenrui Ma, Vidya Muthukumar, Eva L Dyer

Abstract: Classification models are expected to perform equally well for different classes, yet in practice there are often large gaps in their performance. This issue of class bias is widely studied for datasets with sample imbalance, but is relatively overlooked for balanced datasets. In this work, we introduce the concept of spectral imbalance in features as a potential source of class disparities and study the connections between spectral imbalance and class bias in both theory and practice. To connect spectral imbalance to the class gap, we develop a theoretical framework for studying class disparities and derive exact expressions for the per-class error in a high-dimensional mixture model setting. We then study this phenomenon in 11 different state-of-the-art pretrained encoders and show how our proposed framework can be used to compare the quality of encoders, as well as to evaluate and combine data augmentation strategies that mitigate the issue. Our work sheds light on the class-dependent effects of learning and provides new insights into how state-of-the-art pretrained features may have unknown biases that can be diagnosed through their spectra.
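The abstract does not spell out the paper's estimator or model, so the following is only a toy illustration under our own assumptions, not the authors' method. It computes per-class eigenspectra of feature covariances and simulates a balanced two-class Gaussian mixture in which class 1's covariance spectrum is a scaled-up copy of class 0's. The helper names (`class_spectra`, `sample_mixture`, `fit_linear`, `per_class_error`), the isotropic covariances, the least-squares linear classifier, and all parameter values are hypothetical choices made for illustration. With equal sample sizes, the class with the larger spectrum should show a noticeably higher test error.

```python
import numpy as np

# Hypothetical toy setup: NOT the paper's model, metric, or classifier.


def class_spectra(features, labels):
    """Eigenvalues (largest first) of each class's feature covariance matrix."""
    spectra = {}
    for c in np.unique(labels):
        x = features[labels == c]
        x = x - x.mean(axis=0, keepdims=True)
        cov = x.T @ x / (len(x) - 1)
        # eigvalsh is for symmetric matrices and returns ascending eigenvalues
        spectra[int(c)] = np.linalg.eigvalsh(cov)[::-1]
    return spectra


def sample_mixture(n_per_class, mean, scales, rng):
    """Balanced mixture: class c ~ N(+/-mean, scales[c]**2 * I), n_per_class each."""
    d = mean.shape[0]
    xs, ys = [], []
    for c, s in enumerate(scales):
        sign = 1.0 if c == 1 else -1.0
        xs.append(sign * mean + s * rng.standard_normal((n_per_class, d)))
        ys.append(np.full(n_per_class, c))
    return np.vstack(xs), np.concatenate(ys)


def fit_linear(x, y):
    """Least-squares linear classifier on +/-1 targets, with an intercept column."""
    design = np.hstack([x, np.ones((len(x), 1))])
    targets = 2.0 * y - 1.0
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def per_class_error(coef, x, y):
    """Misclassification rate computed separately for each class."""
    design = np.hstack([x, np.ones((len(x), 1))])
    pred = (design @ coef > 0).astype(int)
    return {int(c): float(np.mean(pred[y == c] != c)) for c in np.unique(y)}


rng = np.random.default_rng(0)
d = 50
mean = np.full(d, 1.5 / np.sqrt(d))  # ||mean|| = 1.5
scales = (1.0, 2.0)                  # class 1 spectrum is 4x class 0's

x_train, y_train = sample_mixture(2000, mean, scales, rng)
x_test, y_test = sample_mixture(5000, mean, scales, rng)

spectra = class_spectra(x_train, y_train)
coef = fit_linear(x_train, y_train)
errors = per_class_error(coef, x_test, y_test)

for c in (0, 1):
    print(f"class {c}: top eigenvalue {spectra[c][0]:.2f}, "
          f"total variance {spectra[c].sum():.1f}, test error {errors[c]:.3f}")
```

Both classes have the same sample count and mean norm, so the gap in per-class error here comes only from the difference in covariance spectra. Real encoder features would replace `sample_mixture`, and the paper's own spectral measures may differ from the simple eigenvalue summaries printed above.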

Comments:	25 pages, 9 figures
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2402.11742 [cs.LG]
	(or arXiv:2402.11742v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.11742

Submission history

From: Ran Liu
[v1] Sun, 18 Feb 2024 23:59:54 UTC (23,246 KB)
[v2] Mon, 3 Jun 2024 14:09:10 UTC (23,440 KB)