Learned Disentangled Latent Representations for Scalable Image Coding for Humans and Machines

Ozyilkan, Ezgi; Ulhaq, Mateen; Choi, Hyomin; Racape, Fabien

arXiv:2301.04183v1 (eess)

[Submitted on 10 Jan 2023]

Abstract: As an increasing amount of image and video content will be analyzed by machines, there is demand for a new codec paradigm that compresses visual input primarily for computer vision inference while secondarily supporting input reconstruction. In this work, we propose a learned compression architecture that can be used to build such a codec. We introduce a novel variational formulation that explicitly takes feature data relevant to the desired inference task as input at the encoder side. As a result, our learned scalable image codec encodes and transmits two disentangled latent representations for object detection and input reconstruction. Compared to relevant benchmarks, our proposed scheme yields a more compact latent representation that is specialized for the inference task. Our experiments show that the proposed system achieves bit rate savings of 40.6% on the primary object detection task compared to the current state of the art, albeit with some degradation in performance on the secondary input reconstruction task.

Comments:	Accepted as a paper for DCC 2023
Subjects:	Image and Video Processing (eess.IV)
Cite as:	arXiv:2301.04183 [eess.IV]
	(or arXiv:2301.04183v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2301.04183

Submission history

From: Ezgi Ozyilkan
[v1] Tue, 10 Jan 2023 19:29:41 UTC (3,531 KB)
