
ELF-VC: Efficient Learned Flexible-Rate Video Coding

Authors: Oren Rippel, Alexander G. Anderson, Kedar Tatwawadi, Sanjay Nair, Craig Lytle, Lubomir Bourdev

Abstract: While learned video codecs have demonstrated great promise, they have yet to achieve sufficient efficiency for practical deployment. In this work, we propose several novel ideas for learned video compression which allow for improved performance for the low-latency mode (I- and P-frames only) along with a considerable increase in computational efficiency. In this setting, for natural videos our approach compares favorably across the entire R-D curve under metrics PSNR, MS-SSIM and VMAF against all mainstream video standards (H.264, H.265, AV1) and all ML codecs. At the same time, our approach runs at least 5x faster and has fewer parameters than all ML codecs which report these figures. Our contributions include a flexible-rate framework allowing a single model to cover a large and dense range of bitrates, at a negligible increase in computation and parameter count; an efficient backbone optimized for ML-based codecs; and a novel in-loop flow prediction scheme which leverages prior information towards more efficient compression. We benchmark our method, which we call ELF-VC (Efficient, Learned and Flexible Video Coding) on popular video test sets UVG and MCL-JCV under metrics PSNR, MS-SSIM and VMAF. For example, on UVG under PSNR, it reduces the BD-rate by 44% against H.264, 26% against H.265, 15% against AV1, and 35% against the current best ML codec. At the same time, on an NVIDIA Titan V GPU our approach encodes/decodes VGA at 49/91 FPS, HD 720 at 19/35 FPS, and HD 1080 at 10/18 FPS.

Submitted 29 April, 2021; originally announced April 2021.

Journal ref: International Conference on Computer Vision, 2021

arXiv:1911.03572 [pdf, other]

DZip: improved general-purpose lossless compression based on novel neural network modeling

Authors: Mohit Goyal, Kedar Tatwawadi, Shubham Chandak, Idoia Ochoa

Abstract: We consider lossless compression based on statistical data modeling followed by prediction-based encoding, where an accurate statistical model for the input data leads to substantial improvements in compression. We propose DZip, a general-purpose compressor for sequential data that exploits the well-known modeling capabilities of neural networks (NNs) for prediction, followed by arithmetic coding. DZip uses a novel hybrid architecture based on adaptive and semi-adaptive training. Unlike most NN-based compressors, DZip does not require additional training data and is not restricted to specific data types, only needing the alphabet size of the input data. The proposed compressor outperforms general-purpose compressors such as Gzip (on average 26% reduction) on a variety of real datasets, achieves near-optimal compression on synthetic datasets, and performs close to specialized compressors for large sequence lengths, without any human input. The main limitation of DZip in its current implementation is the encoding/decoding time, which limits its practicality. Nevertheless, the results showcase the potential of developing improved general-purpose compressors based on neural networks and hybrid modeling.

Submitted 18 September, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: Updated manuscript and an efficient implementation added

arXiv:1911.00208 [pdf, other]

LFZip: Lossy compression of multivariate floating-point time series data via improved prediction

Authors: Shubham Chandak, Kedar Tatwawadi, Chengtao Wen, Lingyun Wang, Juan Aparicio, Tsachy Weissman

Abstract: Time series data compression is emerging as an important problem with the growth in IoT devices and sensors. Due to the presence of noise in these datasets, lossy compression can often provide significant compression gains without impacting the performance of downstream applications. In this work, we propose an error-bounded lossy compressor, LFZip, for multivariate floating-point time series data that provides guaranteed reconstruction up to user-specified maximum absolute error. The compressor is based on the prediction-quantization-entropy coder framework and benefits from improved prediction using linear models and neural networks. We evaluate the compressor on several time series datasets where it outperforms the existing state-of-the-art error-bounded lossy compressors. The code and data are available at https://github.com/shubhamchandak94/LFZip

Submitted 13 January, 2020; v1 submitted 1 November, 2019; originally announced November 2019.
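
The prediction-quantization step described in the LFZip abstract above can be illustrated with a minimal sketch. It assumes a univariate series and a trivial predictor (the previous reconstructed value) in place of the linear and neural predictors the abstract mentions, and it omits the entropy coder entirely. The function names `quantize_series` and `reconstruct_series` are hypothetical and are not taken from the LFZip repository.

```python
def quantize_series(series, max_error):
    """Map each sample to an integer index of its quantized prediction residual."""
    step = 2.0 * max_error          # uniform quantizer with bin width 2 * max_error
    indices = []
    prev_recon = 0.0                # assumed predictor: previous reconstructed value
    for x in series:
        q = round((x - prev_recon) / step)
        indices.append(q)
        # Track the decoder's reconstruction so that quantization errors do not accumulate.
        prev_recon = prev_recon + q * step
    return indices


def reconstruct_series(indices, max_error):
    """Invert quantize_series by replaying the same predictor on the decoder side."""
    step = 2.0 * max_error
    recon = []
    prev = 0.0
    for q in indices:
        prev = prev + q * step
        recon.append(prev)
    return recon
```

Because the encoder predicts from reconstructed values rather than from the original samples, each reconstructed sample stays within `max_error` of the original (up to floating-point rounding). The integer indices are what an entropy coder would then compress.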

arXiv:1906.07887 [pdf, ps, other]

Tutorial on algebraic deletion correction codes

Authors: Kedar Tatwawadi, Shubham Chandak

Abstract: The deletion channel is known to be a notoriously difficult channel to design error-correction codes for. In spite of this difficulty, there are some beautiful code constructions which give some intuition about the channel and about what good deletion codes look like. In this tutorial we will take a look at some of them. This document is a transcript of my talk at the coding theory reading group on some interesting works on the deletion channel. It is not intended to be an exhaustive survey of works on the deletion channel, but more as a tutorial to some of the important and cute ideas in this area. For a comprehensive survey, we refer the reader to the cited sources and surveys. We also provide an implementation of VT codes that correct single insertion/deletion errors for general alphabets at https://github.com/shubhamchandak94/VT_codes/.

Submitted 18 June, 2019; originally announced June 2019.
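
As a rough illustration of the VT codes mentioned in the abstract above, the sketch below covers only the classical binary Varshamov-Tenengolts code and single-deletion correction, not the general-alphabet construction in the linked repository. A binary word x of length n belongs to VT_a(n) when the sum of i * x_i (positions counted from 1) is congruent to a modulo n + 1. The function names are hypothetical.

```python
def vt_syndrome(bits):
    """Return sum(i * x_i) mod (n + 1), with positions i counted from 1."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def correct_single_deletion(received, n, a):
    """Recover the length-n codeword of VT_a(n) from a word with one bit deleted."""
    assert len(received) == n - 1
    weight = sum(received)
    partial = sum(i * b for i, b in enumerate(received, start=1))
    deficiency = (a - partial) % (n + 1)
    if deficiency <= weight:
        # A 0 was deleted: reinsert it so that `deficiency` ones lie to its right.
        pos = len(received)
        ones_right = 0
        while ones_right < deficiency:
            pos -= 1
            ones_right += received[pos]
        return received[:pos] + [0] + received[pos:]
    # A 1 was deleted: reinsert it so that (deficiency - weight - 1) zeros lie to its left.
    zeros_needed = deficiency - weight - 1
    pos = 0
    zeros_left = 0
    while zeros_left < zeros_needed:
        zeros_left += 1 - received[pos]
        pos += 1
    return received[:pos] + [1] + received[pos:]
```

The decoder works because deleting a 0 lowers the checksum by the number of ones to its right, while deleting a 1 lowers it by the weight of the received word plus one plus the number of zeros to its left, so the two cases can be told apart by comparing the deficiency with the weight.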

arXiv:1904.03271 [pdf, ps, other]

doi 10.1109/TIT.2021.3119976

Optimal Communication Rates and Combinatorial Properties for Common Randomness Generation

Authors: Yanjun Han, Kedar Tatwawadi, Gowtham R. Kurri, Zhengqing Zhou, Vinod M. Prabhakaran, Tsachy Weissman

Abstract: We study common randomness generation problems where $n$ players aim to generate the same sequences of random coin flips where some subsets of the players share an independent common coin which can be tossed multiple times, and there is a publicly seen blackboard through which the players communicate with each other. We provide a tight representation of the optimal communication rates via linear programming, and more importantly, propose explicit algorithms for the optimal distributed simulation for a wide class of hypergraphs. In particular, the optimal communication rate in complete hypergraphs is still achievable in sparser hypergraphs containing a path-connected cycle-free cluster of topologically connected components. Some key steps in analyzing the upper bounds rely on two different definitions of connectivity in hypergraphs, which may be of independent interest.

Submitted 6 October, 2021; v1 submitted 5 April, 2019; originally announced April 2019.

Comments: 17 pages, 10 figures

arXiv:1811.08162 [pdf, other]

DeepZip: Lossless Data Compression using Recurrent Neural Networks

Authors: Mohit Goyal, Kedar Tatwawadi, Shubham Chandak, Idoia Ochoa

Abstract: Sequential data is being generated at an unprecedented pace in various forms, including text and genomic data. This creates the need for efficient compression mechanisms to enable better storage, transmission and processing of such data. To solve this problem, many of the existing compressors attempt to learn models for the data and perform prediction-based compression. Since neural networks are known as universal function approximators with the capability to learn arbitrarily complex mappings, and in practice show excellent performance in prediction tasks, we explore and devise methods to compress sequential data using neural network predictors. We combine recurrent neural network predictors with an arithmetic coder and losslessly compress a variety of synthetic, text and genomic datasets. The proposed compressor outperforms Gzip on the real datasets and achieves near-optimal compression for the synthetic datasets. The results also help understand why and where neural networks are good alternatives for traditional finite context models.

Submitted 20 November, 2018; originally announced November 2018.
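
The link between prediction quality and compressed size in the predictor-plus-arithmetic-coder design described above can be sketched numerically. The example assumes a simple adaptive frequency-count model with add-one smoothing as a stand-in for the recurrent neural network predictor, and it computes only the ideal code length (the sum of -log2 of each predicted probability), which an idealized arithmetic coder approaches to within a small constant number of bits. The function name `ideal_code_length_bits` is hypothetical.

```python
import math
from collections import Counter


def ideal_code_length_bits(symbols, alphabet_size):
    """Sum of -log2 p(symbol | past) under an adaptive add-one-smoothed count model."""
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for s in symbols:
        # Probability assigned to the symbol before it is observed.
        p = (counts[s] + 1) / (seen + alphabet_size)
        total_bits += -math.log2(p)
        # Update the model afterwards; a decoder can mirror this update exactly.
        counts[s] += 1
        seen += 1
    return total_bits
```

A skewed sequence costs fewer bits than one bit per binary symbol under this model, which is the sense in which a better predictor yields better compression; only the alphabet size needs to be known in advance.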

arXiv:1811.07557 [pdf, other]

Neural Joint Source-Channel Coding

Authors: Kristy Choi, Kedar Tatwawadi, Aditya Grover, Tsachy Weissman, Stefano Ermon

Abstract: For reliable transmission across a noisy communication channel, classical results from information theory show that it is asymptotically optimal to separate out the source and channel coding processes. However, this decomposition can fall short in the finite bit-length regime, as it requires non-trivial tuning of hand-crafted codes and assumes infinite computational power for decoding. In this work, we propose to jointly learn the encoding and decoding processes using a new discrete variational autoencoder model. By adding noise into the latent codes to simulate the channel during training, we learn to both compress and error-correct given a fixed bit-length and computational budget. We obtain codes that are not only competitive against several separation schemes, but also learn useful robust representations of the data for downstream tasks such as classification. Finally, inference amortization yields an extremely fast neural decoder, almost an order of magnitude faster compared to standard decoding methods based on iterative belief propagation.

Submitted 14 May, 2019; v1 submitted 19 November, 2018; originally announced November 2018.

arXiv:1810.11137 [pdf, other]

Towards improved lossy image compression: Human image reconstruction with public-domain images

Authors: Ashutosh Bhown, Soham Mukherjee, Sean Yang, Shubham Chandak, Irena Fischer-Hwang, Kedar Tatwawadi, Judith Fan, Tsachy Weissman

Abstract: Lossy image compression has been studied extensively in the context of typical loss functions such as RMSE, MS-SSIM, etc. However, compression at low bitrates generally produces unsatisfying results. Furthermore, the availability of massive public image datasets appears to have hardly been exploited in image compression. Here, we present a paradigm for eliciting human image reconstruction in order to perform lossy image compression. In this paradigm, one human describes images to a second human, whose task is to reconstruct the target image using publicly available images and text instructions. The resulting reconstructions are then evaluated by human raters on the Amazon Mechanical Turk platform and compared to reconstructions obtained using state-of-the-art compressor WebP. Our results suggest that prioritizing semantic visual elements may be key to achieving significant improvements in image compression, and that our paradigm can be used to develop a more human-centric loss function. The images, results and additional data are available at https://compression.stanford.edu/human-compression

Submitted 24 June, 2019; v1 submitted 25 October, 2018; originally announced October 2018.

arXiv:1809.05054 [pdf, other]

IncSQL: Training Incremental Text-to-SQL Parsers with Non-Deterministic Oracles

Authors: Tianze Shi, Kedar Tatwawadi, Kaushik Chakrabarti, Yi Mao, Oleksandr Polozov, Weizhu Chen

Abstract: We present a sequence-to-action parsing approach for the natural language to SQL task that incrementally fills the slots of a SQL query with feasible actions from a pre-defined inventory. To account for the fact that typically there are multiple correct SQL queries with the same or very similar semantics, we draw inspiration from syntactic parsing techniques and propose to train our sequence-to-action models with non-deterministic oracles. We evaluate our models on the WikiSQL dataset and achieve an execution accuracy of 83.7% on the test set, a 2.1% absolute improvement over the models trained with traditional static oracles assuming a single correct target SQL query. When further combined with the execution-guided decoding strategy, our model sets a new state-of-the-art performance at an execution accuracy of 87.1%.

Submitted 1 October, 2018; v1 submitted 13 September, 2018; originally announced September 2018.

arXiv:1807.03100 [pdf, other]

Robust Text-to-SQL Generation with Execution-Guided Decoding

Authors: Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, Rishabh Singh

Abstract: We consider the problem of neural semantic parsing, which translates natural language questions into executable SQL queries. We introduce a new mechanism, execution guidance, to leverage the semantics of SQL. It detects and excludes faulty programs during the decoding procedure by conditioning on the execution of a partially generated program. The mechanism can be used with any autoregressive generative model, which we demonstrate on four state-of-the-art recurrent or template-based semantic parsing models. We demonstrate that execution guidance universally improves model performance on various text-to-SQL datasets with different scales and query complexity: WikiSQL, ATIS, and GeoQuery. As a result, we achieve new state-of-the-art execution accuracy of 83.8% on WikiSQL.

Submitted 12 September, 2018; v1 submitted 9 July, 2018; originally announced July 2018.

arXiv:1805.01355 [pdf, ps, other]

Minimax redundancy for Markov chains with large state space

Authors: Kedar Shriram Tatwawadi, Jiantao Jiao, Tsachy Weissman

Abstract: For any Markov source, there exist universal codes whose normalized codelength approaches the Shannon limit asymptotically as the number of samples goes to infinity. This paper investigates how fast the gap between the normalized codelength of the "best" universal compressor and the Shannon limit (i.e. the compression redundancy) vanishes non-asymptotically in terms of the alphabet size and mixing time of the Markov source. We show that, for Markov sources whose relaxation time is at least $1 + \frac{(2+c)}{\sqrt{k}}$, where $k$ is the state space size (and $c>0$ is a constant), the phase transition for the number of samples required to achieve vanishing compression redundancy is precisely $\Theta(k^2)$.

Submitted 5 May, 2018; v1 submitted 1 May, 2018; originally announced May 2018.

Comments: 22 pages, 1 figure

Showing 1–11 of 11 results for author: Tatwawadi, K