-
From Variational to Deterministic Autoencoders
Authors:
Partha Ghosh,
Mehdi S. M. Sajjadi,
Antonio Vergari,
Michael Black,
Bernhard Schölkopf
Abstract:
Variational Autoencoders (VAEs) provide a theoretically-backed and popular framework for deep generative models. However, learning a VAE from data poses still unanswered theoretical questions and considerable practical challenges. In this work, we propose an alternative framework for generative modeling that is simpler, easier to train, and deterministic, yet has many of the advantages of VAEs. We observe that sampling a stochastic encoder in a Gaussian VAE can be interpreted as simply injecting noise into the input of a deterministic decoder. We investigate how substituting this kind of stochasticity, with other explicit and implicit regularization schemes, can lead to an equally smooth and meaningful latent space without forcing it to conform to an arbitrarily chosen prior. To retrieve a generative mechanism to sample new data, we introduce an ex-post density estimation step that can be readily applied also to existing VAEs, improving their sample quality. We show, in a rigorous empirical study, that the proposed regularized deterministic autoencoders are able to generate samples that are comparable to, or better than, those of VAEs and more powerful alternatives when applied to images as well as to structured data such as molecules. An implementation is available at: https://github.com/ParthaEth/Regularized_autoencoders-RAE-
Submitted 29 May, 2020; v1 submitted 29 March, 2019;
originally announced March 2019.
-
Assessing Generative Models via Precision and Recall
Authors:
Mehdi S. M. Sajjadi,
Olivier Bachem,
Mario Lucic,
Olivier Bousquet,
Sylvain Gelly
Abstract:
Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison. Commonly used evaluation methods, such as the Frechet Inception Distance (FID), correlate well with the perceived quality of samples and are sensitive to mode dropping. However, these metrics are unable to distinguish between different failure cases since they only yield one-dimensional scores. We propose a novel definition of precision and recall for distributions which disentangles the divergence into two separate dimensions. The proposed notion is intuitive, retains desirable properties, and naturally leads to an efficient algorithm that can be used to evaluate generative models. We relate this notion to total variation as well as to recent evaluation metrics such as Inception Score and FID. To demonstrate the practical utility of the proposed approach we perform an empirical study on several variants of Generative Adversarial Networks and Variational Autoencoders. In an extensive set of experiments we show that the proposed metric is able to disentangle the quality of generated samples from the coverage of the target distribution.
Submitted 28 October, 2018; v1 submitted 31 May, 2018;
originally announced June 2018.
-
Tempered Adversarial Networks
Authors:
Mehdi S. M. Sajjadi,
Giambattista Parascandolo,
Arash Mehrjou,
Bernhard Schölkopf
Abstract:
Generative adversarial networks (GANs) have been shown to produce realistic samples from high-dimensional distributions, but training them is considered hard. A possible explanation for training instabilities is the inherent imbalance between the networks: While the discriminator is trained directly on both real and fake samples, the generator only has control over the fake samples it produces since the real data distribution is fixed by the choice of a given dataset. We propose a simple modification that gives the generator control over the real samples which leads to a tempered learning process for both generator and discriminator. The real data distribution passes through a lens before being revealed to the discriminator, balancing the generator and discriminator by gradually revealing more detailed features necessary to produce high-quality results. The proposed module automatically adjusts the learning process to the current strength of the networks, yet is generic and easy to add to any GAN variant. In a number of experiments, we show that this can improve quality, stability and/or convergence speed across a range of different GAN architectures (DCGAN, LSGAN, WGAN-GP).
Submitted 11 July, 2018; v1 submitted 12 February, 2018;
originally announced February 2018.
-
Frame-Recurrent Video Super-Resolution
Authors:
Mehdi S. M. Sajjadi,
Raviteja Vemulapalli,
Matthew Brown
Abstract:
Recent advances in video super-resolution have shown that convolutional neural networks combined with motion compensation are able to merge information from multiple low-resolution (LR) frames to generate high-quality images. Current state-of-the-art methods process a batch of LR frames to generate a single high-resolution (HR) frame and run this scheme in a sliding window fashion over the entire video, effectively treating the problem as a large number of separate multi-frame super-resolution tasks. This approach has two main weaknesses: 1) Each input frame is processed and warped multiple times, increasing the computational cost, and 2) each output frame is estimated independently conditioned on the input frames, limiting the system's ability to produce temporally consistent results.
In this work, we propose an end-to-end trainable frame-recurrent video super-resolution framework that uses the previously inferred HR estimate to super-resolve the subsequent frame. This naturally encourages temporally consistent results and reduces the computational cost by warping only one image in each step. Furthermore, due to its recurrent nature, the proposed method has the ability to assimilate a large number of previous frames without increased computational demands. Extensive evaluations and comparisons with previous methods validate the strengths of our approach and demonstrate that the proposed framework is able to significantly outperform the current state of the art.
Submitted 25 March, 2018; v1 submitted 14 January, 2018;
originally announced January 2018.
-
AirDraw: Leveraging Smart Watch Motion Sensors for Mobile Human Computer Interactions
Authors:
Seyed A Sajjadi,
Danial Moazen,
Ani Nahapetian
Abstract:
Wearable computing is one of the fastest growing technologies today. Smart watches are poised to take over at least half of the wearable devices market in the near future. Smart watch screen size, however, is a limiting factor for growth, as it restricts practical text input. On the other hand, wearable devices have some features, such as consistent user interaction and hands-free, heads-up operations, which pave the way for gesture recognition methods of text entry. This paper proposes a new text input method for smart watches, which utilizes motion sensor data and machine learning approaches to detect letters written in the air by a user. This method is less computationally intensive and less expensive when compared to computer vision approaches. It is also not affected by lighting factors, which limit computer vision solutions. The AirDraw system prototype developed to test this approach is presented. Additionally, experimental results close to 71% accuracy are presented.
Submitted 7 May, 2017;
originally announced May 2017.
-
Finding Bottlenecks: Predicting Student Attrition with Unsupervised Classifier
Authors:
Seyed Sajjadi,
Bruce Shapiro,
Christopher McKinlay,
Allen Sarkisyan,
Carol Shubin,
Efunwande Osoba
Abstract:
With pressure to increase graduation rates and reduce time to degree in higher education, it is important to identify at-risk students early. Automated early warning systems are therefore highly desirable. In this paper, we use unsupervised clustering techniques to predict the graduation status of declared majors in five departments at California State University Northridge (CSUN), based on a minimal number of lower division courses in each major. In addition, we use the detected clusters to identify hidden bottleneck courses.
Submitted 7 May, 2017;
originally announced May 2017.
-
EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis
Authors:
Mehdi S. M. Sajjadi,
Bernhard Schölkopf,
Michael Hirsch
Abstract:
Single image super-resolution is the task of inferring a high-resolution image from a single low-resolution input. Traditionally, the performance of algorithms for this task is measured using pixel-wise reconstruction measures such as peak signal-to-noise ratio (PSNR) which have been shown to correlate poorly with the human perception of image quality. As a result, algorithms minimizing these metrics tend to produce over-smoothed images that lack high-frequency textures and do not look natural despite yielding high PSNR values.
We propose a novel application of automated texture synthesis in combination with a perceptual loss focusing on creating realistic textures rather than optimizing for a pixel-accurate reproduction of ground truth images during training. By using feed-forward fully convolutional neural networks in an adversarial training setting, we achieve a significant boost in image quality at high magnification ratios. Extensive experiments on a number of datasets show the effectiveness of our approach, yielding state-of-the-art results in both quantitative and qualitative benchmarks.
Submitted 30 July, 2017; v1 submitted 23 December, 2016;
originally announced December 2016.
-
Peer Grading in a Course on Algorithms and Data Structures: Machine Learning Algorithms do not Improve over Simple Baselines
Authors:
Mehdi S. M. Sajjadi,
Morteza Alamgir,
Ulrike von Luxburg
Abstract:
Peer grading is the process of students reviewing each other's work, such as homework submissions, and has lately become a popular mechanism used in massive open online courses (MOOCs). Intrigued by this idea, we used it in a course on algorithms and data structures at the University of Hamburg. Throughout the whole semester, students repeatedly handed in submissions to exercises, which were then evaluated both by teaching assistants and by a peer grading mechanism, yielding a large dataset of teacher and peer grades. We applied different statistical and machine learning methods to aggregate the peer grades in order to come up with accurate final grades for the submissions (supervised and unsupervised, methods based on numeric scores and ordinal rankings). Surprisingly, none of them improves over the baseline of using the mean peer grade as the final grade. We discuss a number of possible explanations for these results and present a thorough analysis of the generated dataset.
Submitted 10 February, 2016; v1 submitted 2 June, 2015;
originally announced June 2015.