-
Saliency Driven Perceptual Image Compression
Authors:
Yash Patel,
Srikar Appalaraju,
R. Manmatha
Abstract:
This paper proposes a new end-to-end trainable model for lossy image compression, which includes several novel components. The method incorporates 1) an adequate perceptual similarity metric; 2) saliency in the images; 3) a hierarchical auto-regressive model. This paper demonstrates that the popularly used evaluations metrics such as MS-SSIM and PSNR are inadequate for judging the performance of i…
▽ More
This paper proposes a new end-to-end trainable model for lossy image compression, which includes several novel components. The method incorporates 1) an adequate perceptual similarity metric; 2) saliency in the images; 3) a hierarchical auto-regressive model. This paper demonstrates that the popularly used evaluations metrics such as MS-SSIM and PSNR are inadequate for judging the performance of image compression techniques as they do not align with the human perception of similarity. Alternatively, a new metric is proposed, which is learned on perceptual similarity data specific to image compression. The proposed compression model incorporates the salient regions and optimizes on the proposed perceptual similarity metric. The model not only generates images which are visually better but also gives superior performance for subsequent computer vision tasks such as object detection and segmentation when compared to existing engineered or learned compression techniques.
△ Less
Submitted 8 November, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
Human Perceptual Evaluations for Image Compression
Authors:
Yash Patel,
Srikar Appalaraju,
R. Manmatha
Abstract:
Recently, there has been much interest in deep learning techniques to do image compression and there have been claims that several of these produce better results than engineered compression schemes (such as JPEG, JPEG2000 or BPG). A standard way of comparing image compression schemes today is to use perceptual similarity metrics such as PSNR or MS-SSIM (multi-scale structural similarity). This ha…
▽ More
Recently, there has been much interest in deep learning techniques to do image compression and there have been claims that several of these produce better results than engineered compression schemes (such as JPEG, JPEG2000 or BPG). A standard way of comparing image compression schemes today is to use perceptual similarity metrics such as PSNR or MS-SSIM (multi-scale structural similarity). This has led to some deep learning techniques which directly optimize for MS-SSIM by choosing it as a loss function. While this leads to a higher MS-SSIM for such techniques, we demonstrate using user studies that the resulting improvement may be misleading. Deep learning techniques for image compression with a higher MS-SSIM may actually be perceptually worse than engineered compression schemes with a lower MS-SSIM.
△ Less
Submitted 9 August, 2019;
originally announced August 2019.
-
Deep Perceptual Compression
Authors:
Yash Patel,
Srikar Appalaraju,
R. Manmatha
Abstract:
Several deep learned lossy compression techniques have been proposed in the recent literature. Most of these are optimized by using either MS-SSIM (multi-scale structural similarity) or MSE (mean squared error) as a loss function. Unfortunately, neither of these correlate well with human perception and this is clearly visible from the resulting compressed images. In several cases, the MS-SSIM for…
▽ More
Several deep learned lossy compression techniques have been proposed in the recent literature. Most of these are optimized by using either MS-SSIM (multi-scale structural similarity) or MSE (mean squared error) as a loss function. Unfortunately, neither of these correlate well with human perception and this is clearly visible from the resulting compressed images. In several cases, the MS-SSIM for deep learned techniques is higher than say a conventional, non-deep learned codec such as JPEG-2000 or BPG. However, the images produced by these deep learned techniques are in many cases clearly worse to human eyes than those produced by JPEG-2000 or BPG.
We propose the use of an alternative, deep perceptual metric, which has been shown to align better with human perceptual similarity. We then propose Deep Perceptual Compression (DPC) which makes use of an encoder-decoder based image compression model to jointly optimize on the deep perceptual metric and MS-SSIM. Via extensive human evaluations, we show that the proposed method generates visually better results than previous learning based compression methods and JPEG-2000, and is comparable to BPG. Furthermore, we demonstrate that for tasks like object-detection, images compressed with DPC give better accuracy.
△ Less
Submitted 31 July, 2019; v1 submitted 18 July, 2019;
originally announced July 2019.
-
Searching for Apparel Products from Images in the Wild
Authors:
Son Tran,
Ming Du,
Sampath Chanda,
R. Manmatha,
Cj Taylor
Abstract:
In this age of social media, people often look at what others are wearing. In particular, Instagram and Twitter influencers often provide images of themselves wearing different outfits and their followers are often inspired to buy similar clothes.We propose a system to automatically find the closest visually similar clothes in the online Catalog (street-to-shop searching). The problem is challengi…
▽ More
In this age of social media, people often look at what others are wearing. In particular, Instagram and Twitter influencers often provide images of themselves wearing different outfits and their followers are often inspired to buy similar clothes.We propose a system to automatically find the closest visually similar clothes in the online Catalog (street-to-shop searching). The problem is challenging since the original images are taken under different pose and lighting conditions. The system initially localizes high-level descriptive regions (top, bottom, wristwear. . . ) using multiple CNN detectors such as YOLO and SSD that are trained specifically for apparel domain. It then classifies these regions into more specific regions such as t-shirts, tunic or dresses. Finally, a feature embedding learned using a multi-task function is recovered for every item and then compared with corresponding items in the online Catalog database and ranked according to distance. We validate our approach component-wise using benchmark datasets and end-to-end using human evaluation.
△ Less
Submitted 7 April, 2022; v1 submitted 4 July, 2019;
originally announced July 2019.