-
The First Principles of Deep Learning and Compression
Authors:
Max Ehrlich
Abstract:
The deep learning revolution incited by the 2012 AlexNet paper has been transformative for the field of computer vision. Many problems which were severely limited using classical solutions are now seeing unprecedented success. The rapid proliferation of deep learning methods has led to a sharp increase in their use in consumer and embedded applications. One consequence of consumer and embedded applications is a reliance on lossy multimedia compression, which is required to store and transmit data efficiently in these real-world scenarios. As such, there has been increased interest in a deep learning solution for multimedia compression which would allow for higher compression ratios and increased visual quality.
The deep learning approach to multimedia compression, so-called Learned Multimedia Compression, involves computing a compressed representation of an image or video using a deep network for the encoder and the decoder. While these techniques have enjoyed impressive academic success, their industry adoption has been essentially non-existent. Classical compression techniques like JPEG and MPEG are too entrenched in modern computing to be easily replaced. This dissertation takes an orthogonal approach and leverages deep learning to improve the compression fidelity of these classical algorithms. This allows the incredible advances in deep learning to be used for multimedia compression without threatening the ubiquity of the classical methods.
The key insight of this work is that methods which are motivated by first principles, i.e., the underlying engineering decisions that were made when the compression algorithms were developed, are more effective than general methods. By encoding prior knowledge into the design of the algorithm, the flexibility, performance, and/or accuracy are improved at the cost of generality...
Submitted 4 April, 2022;
originally announced April 2022.
-
Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement
Authors:
Max Ehrlich,
Jon Barker,
Namitha Padmanabhan,
Larry Davis,
Andrew Tao,
Bryan Catanzaro,
Abhinav Shrivastava
Abstract:
Video compression is a central feature of the modern internet, powering technologies from social media to video conferencing. While video compression continues to mature, quality loss is still noticeable for many compression settings. These settings nevertheless have important applications to the efficient transmission of videos over bandwidth-constrained or otherwise unstable connections. In this work, we develop a deep learning architecture capable of restoring detail to compressed videos which leverages the underlying structure and motion information embedded in the video bitstream. We show that this improves restoration accuracy compared to prior compression correction methods and is competitive when compared with recent deep-learning-based video compression methods on rate-distortion while achieving higher throughput. Furthermore, we condition our model on quantization data which is readily available in the bitstream. This allows our single model to handle a variety of different compression quality settings, which required an ensemble of models in prior work.
Submitted 30 October, 2023; v1 submitted 31 January, 2022;
originally announced February 2022.
-
ReLaX: Retinal Layer Attribution for Guided Explanations of Automated Optical Coherence Tomography Classification
Authors:
Evan Wen,
Rebecca Sorenson,
Max Ehrlich
Abstract:
30 million Optical Coherence Tomography (OCT) imaging tests are issued annually to diagnose various retinal diseases, but accurate diagnosis of OCT scans requires trained eye care professionals who are still prone to making errors. With better systems for diagnosis, many cases of vision loss caused by retinal disease could be entirely avoided. In this work, we present ReLaX, a novel deep learning framework for explainable, accurate classification of retinal pathologies which achieves state-of-the-art accuracy. Furthermore, we emphasize producing both qualitative and quantitative explanations of the model's decisions. While previous works use pixel-level attribution methods for generating model explanations, our work uses a novel retinal layer attribution method for producing rich qualitative and quantitative model explanations. ReLaX determines the importance of each retinal layer by combining heatmaps with an OCT segmentation model. Our work is the first to produce detailed quantitative explanations of a model's predictions in this way. The combination of accuracy and interpretability can be clinically applied for accessible, high-quality patient care.
Submitted 1 October, 2022; v1 submitted 3 September, 2021;
originally announced September 2021.
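The ReLaX abstract above says that layer importance comes from combining heatmaps with an OCT segmentation model, but it does not say how the two are combined. The sketch below shows one plausible reading and is an assumption, not the authors' method: sum the heatmap values that fall inside each segmented layer, then normalize by the total. The function name `layer_attribution` and the toy arrays are hypothetical.

```python
from collections import defaultdict


def layer_attribution(heatmap, segmentation):
    """Return each layer's share of the total heatmap mass.

    heatmap: 2D list of non-negative attribution values (hypothetical input).
    segmentation: 2D list of integer layer labels with the same shape.
    Assumption: importance = per-layer sum of heat / total heat.
    """
    totals = defaultdict(float)
    for heat_row, seg_row in zip(heatmap, segmentation):
        for heat, layer in zip(heat_row, seg_row):
            totals[layer] += heat
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No attribution anywhere: every layer gets a zero score.
        return {layer: 0.0 for layer in totals}
    return {layer: value / grand_total for layer, value in totals.items()}


# Toy 2x2 example: three layers labelled 0, 1, and 2.
heatmap = [
    [0.0, 1.0],
    [3.0, 0.0],
]
segmentation = [
    [0, 0],
    [1, 2],
]
scores = layer_attribution(heatmap, segmentation)
print(scores)
```

A per-layer score of this kind gives a quantitative explanation (one number per retinal layer) in addition to the qualitative heatmap.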
-
Unsupervised Super-Resolution of Satellite Imagery for High Fidelity Material Label Transfer
Authors:
Arthita Ghosh,
Max Ehrlich,
Larry Davis,
Rama Chellappa
Abstract:
Urban material recognition in remote sensing imagery is a highly relevant yet extremely challenging problem due to the difficulty of obtaining human annotations, especially on low-resolution satellite images. To this end, we propose an unsupervised domain adaptation approach based on adversarial learning. We aim to harvest information from smaller quantities of high-resolution data (source domain) and use it to super-resolve low-resolution imagery (target domain). This can potentially aid in semantic as well as material label transfer from a richly annotated source to a target domain.
Submitted 15 May, 2021;
originally announced May 2021.
-
Quantization Guided JPEG Artifact Correction
Authors:
Max Ehrlich,
Larry Davis,
Ser-Nam Lim,
Abhinav Shrivastava
Abstract:
The JPEG image compression algorithm is the most popular method of image compression because of its ability to achieve large compression ratios. However, to achieve such high compression, information is lost. For aggressive quantization settings, this leads to a noticeable reduction in image quality. Artifact correction has been studied in the context of deep neural networks for some time, but the current state-of-the-art methods require a different model to be trained for each quality setting, greatly limiting their practical application. We solve this problem by creating a novel architecture which is parameterized by the JPEG file's quantization matrix. This allows our single model to achieve state-of-the-art performance over models trained for specific quality settings.
Submitted 16 July, 2020; v1 submitted 16 April, 2020;
originally announced April 2020.
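To illustrate why the quantization matrix mentioned in the abstract above is an informative signal, the sketch below shows the standard JPEG quantization step on a handful of transform coefficients: each coefficient is divided by its table entry and rounded, and the decoder multiplies back. The coefficient values and both tables are made up for illustration, and the code is not taken from the paper. Larger table entries lose more information, so the table itself indicates how degraded the image is.

```python
def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [round(c / q) for c, q in zip(coeffs, table)]


def dequantize(levels, table):
    """Multiply each quantized level back by its table entry."""
    return [level * q for level, q in zip(levels, table)]


def max_error(coeffs, table):
    """Largest absolute reconstruction error after a quantize/dequantize round trip."""
    restored = dequantize(quantize(coeffs, table), table)
    return max(abs(c - r) for c, r in zip(coeffs, restored))


# Hypothetical DCT coefficients and two hypothetical quantization tables.
coeffs = [-415.0, 31.0, -61.0, 27.0]
fine_table = [4, 4, 4, 4]        # mild quantization (high quality)
coarse_table = [16, 11, 10, 16]  # aggressive quantization (low quality)

print(quantize(coeffs, coarse_table))
print(max_error(coeffs, fine_table), max_error(coeffs, coarse_table))
```

Because the rounding error for each coefficient is at most half its table entry, a model that is given the table knows roughly how far each coefficient may be from its original value, which is one intuition for conditioning a single network on it instead of training one network per quality setting.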