Search | arXiv e-print repository

The First Principles of Deep Learning and Compression

Abstract: The deep learning revolution incited by the 2012 Alexnet paper has been transformative for the field of computer vision. Many problems which were severely limited using classical solutions are now seeing unprecedented success. The rapid proliferation of deep learning methods has led to a sharp increase in their use in consumer and embedded applications. One consequence of consumer and embedded app… ▽ More The deep learning revolution incited by the 2012 Alexnet paper has been transformative for the field of computer vision. Many problems which were severely limited using classical solutions are now seeing unprecedented success. The rapid proliferation of deep learning methods has led to a sharp increase in their use in consumer and embedded applications. One consequence of consumer and embedded applications is lossy multimedia compression which is required to engineer the efficient storage and transmission of data in these real-world scenarios. As such, there has been increased interest in a deep learning solution for multimedia compression which would allow for higher compression ratios and increased visual quality. The deep learning approach to multimedia compression, so called Learned Multimedia Compression, involves computing a compressed representation of an image or video using a deep network for the encoder and the decoder. While these techniques have enjoyed impressive academic success, their industry adoption has been essentially non-existent. Classical compression techniques like JPEG and MPEG are too entrenched in modern computing to be easily replaced. This dissertation takes an orthogonal approach and leverages deep learning to improve the compression fidelity of these classical algorithms. This allows the incredible advances in deep learning to be used for multimedia compression without threatening the ubiquity of the classical methods. The key insight of this work is that methods which are motivated by first principles, i.e., the underlying engineering decisions that were made when the compression algorithms were developed, are more effective than general methods. By encoding prior knowledge into the design of the algorithm, the flexibility, performance, and/or accuracy are improved at the cost of generality... △ Less

Submitted 4 April, 2022; originally announced April 2022.

Comments: Doctoral Dissertation, more information at https://maxehr.umiacs.io/dissertation

arXiv:2202.00011 [pdf, other]

Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement

Authors: Max Ehrlich, Jon Barker, Namitha Padmanabhan, Larry Davis, Andrew Tao, Bryan Catanzaro, Abhinav Shrivastava

Abstract: Video compression is a central feature of the modern internet powering technologies from social media to video conferencing. While video compression continues to mature, for many compression settings, quality loss is still noticeable. These settings nevertheless have important applications to the efficient transmission of videos over bandwidth constrained or otherwise unstable connections. In this… ▽ More Video compression is a central feature of the modern internet powering technologies from social media to video conferencing. While video compression continues to mature, for many compression settings, quality loss is still noticeable. These settings nevertheless have important applications to the efficient transmission of videos over bandwidth constrained or otherwise unstable connections. In this work, we develop a deep learning architecture capable of restoring detail to compressed videos which leverages the underlying structure and motion information embedded in the video bitstream. We show that this improves restoration accuracy compared to prior compression correction methods and is competitive when compared with recent deep-learning-based video compression methods on rate-distortion while achieving higher throughput. Furthermore, we condition our model on quantization data which is readily available in the bitstream. This allows our single model to handle a variety of different compression quality settings which required an ensemble of models in prior work. △ Less

Submitted 30 October, 2023; v1 submitted 31 January, 2022; originally announced February 2022.

Comments: WACV 2024

arXiv:2109.02436 [pdf, other]

ReLaX: Retinal Layer Attribution for Guided Explanations of Automated Optical Coherence Tomography Classification

Authors: Evan Wen, Rebecca Sorenson, Max Ehrlich

Abstract: 30 million Optical Coherence Tomography (OCT) imaging tests are issued annually to diagnose various retinal diseases, but accurate diagnosis of OCT scans requires trained eye care professionals who are still prone to making errors. With better systems for diagnosis, many cases of vision loss caused by retinal disease could be entirely avoided. In this work, we present ReLaX, a novel deep learning… ▽ More 30 million Optical Coherence Tomography (OCT) imaging tests are issued annually to diagnose various retinal diseases, but accurate diagnosis of OCT scans requires trained eye care professionals who are still prone to making errors. With better systems for diagnosis, many cases of vision loss caused by retinal disease could be entirely avoided. In this work, we present ReLaX, a novel deep learning framework for explainable, accurate classification of retinal pathologies which achieves state-of-the-art accuracy. Furthermore, we emphasize producing both qualitative and quantitative explanations of the model's decisions. While previous works use pixel-level attribution methods for generating model explanations, our work uses a novel retinal layer attribution method for producing rich qualitative and quantitative model explanations. ReLaX determines the importance of each retinal layer by combining heatmaps with an OCT segmentation model. Our work is the first to produce detailed quantitative explanations of a model's predictions in this way. The combination of accuracy and interpretability can be clinically applied for accessible, high-quality patient care. △ Less

Submitted 1 October, 2022; v1 submitted 3 September, 2021; originally announced September 2021.

Comments: ECCV 2022 Medical Computer Vision Workshop

arXiv:2105.07322 [pdf, other]

doi 10.1109/IGARSS.2019.8900639

Unsupervised Super-Resolution of Satellite Imagery for High Fidelity Material Label Transfer

Authors: Arthita Ghosh, Max Ehrlich, Larry Davis, Rama Chellappa

Abstract: Urban material recognition in remote sensing imagery is a highly relevant, yet extremely challenging problem due to the difficulty of obtaining human annotations, especially on low resolution satellite images. To this end, we propose an unsupervised domain adaptation based approach using adversarial learning. We aim to harvest information from smaller quantities of high resolution data (source dom… ▽ More Urban material recognition in remote sensing imagery is a highly relevant, yet extremely challenging problem due to the difficulty of obtaining human annotations, especially on low resolution satellite images. To this end, we propose an unsupervised domain adaptation based approach using adversarial learning. We aim to harvest information from smaller quantities of high resolution data (source domain) and utilize the same to super-resolve low resolution imagery (target domain). This can potentially aid in semantic as well as material label transfer from a richly annotated source to a target domain. △ Less

Submitted 15 May, 2021; originally announced May 2021.

Comments: Published in the proceedings of the 2019 IEEE International Geoscience and Remote Sensing Symposium

Journal ref: IGARSS (2019), 5144-5147

arXiv:2004.09320 [pdf, other]

Quantization Guided JPEG Artifact Correction

Authors: Max Ehrlich, Larry Davis, Ser-Nam Lim, Abhinav Shrivastava

Abstract: The JPEG image compression algorithm is the most popular method of image compression because of its ability for large compression ratios. However, to achieve such high compression, information is lost. For aggressive quantization settings, this leads to a noticeable reduction in image quality. Artifact correction has been studied in the context of deep neural networks for some time, but the curren… ▽ More The JPEG image compression algorithm is the most popular method of image compression because of its ability for large compression ratios. However, to achieve such high compression, information is lost. For aggressive quantization settings, this leads to a noticeable reduction in image quality. Artifact correction has been studied in the context of deep neural networks for some time, but the current state-of-the-art methods require a different model to be trained for each quality setting, greatly limiting their practical application. We solve this problem by creating a novel architecture which is parameterized by the JPEG files quantization matrix. This allows our single model to achieve state-of-the-art performance over models trained for specific quality settings. △ Less

Submitted 16 July, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

Comments: Published in the proceedings of ECCV 2020, please see our released code and models at https://gitlab.com/Queuecumber/quantization-guided-ac

Showing 1–5 of 5 results for author: Ehrlich, M