Deep Discrete Encoders: Identifiable Deep Generative Models for Rich Data with Discrete Latent Layers

Lee, Seunghyun; Gu, Yuqi

Abstract:In the era of generative AI, deep generative models (DGMs) with latent representations have gained tremendous popularity. Despite their impressive empirical performance, the statistical properties of these models remain underexplored. DGMs are often overparametrized, non-identifiable, and uninterpretable black boxes, raising serious concerns when deploying them in high-stakes applications. Motivated by this, we propose interpretable deep generative models for rich data types with discrete latent layers, called Deep Discrete Encoders (DDEs). A DDE is a directed graphical model with multiple binary latent layers. Theoretically, we propose transparent identifiability conditions for DDEs, which imply progressively smaller sizes of the latent layers as they go deeper. Identifiability ensures consistent parameter estimation and inspires an interpretable design of the deep architecture. Computationally, we propose a scalable estimation pipeline of a layerwise nonlinear spectral initialization followed by a penalized stochastic approximation EM algorithm. This procedure can efficiently estimate models with exponentially many latent components. Extensive simulation studies for high-dimensional data and deep architectures validate our theoretical results and demonstrate the excellent performance of our algorithms. We apply DDEs to three diverse real datasets with different data types to perform hierarchical topic modeling, image representation learning, and response time modeling in educational testing.

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
Cite as:	arXiv:2501.01414 [stat.ML]
	(or arXiv:2501.01414v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2501.01414

Statistics > Machine Learning

Title:Deep Discrete Encoders: Identifiable Deep Generative Models for Rich Data with Discrete Latent Layers

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators