-
Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives
Authors:
Shrinivas Ramasubramanian,
Harsh Rangwani,
Sho Takemori,
Kunal Samanta,
Yuhei Umeda,
Venkatesh Babu Radhakrishnan
Abstract:
The rise in internet usage has led to the generation of massive amounts of data, resulting in the adoption of various supervised and semi-supervised machine learning algorithms, which can effectively utilize the colossal amount of data to train models. However, before deploying these models in the real world, these must be strictly evaluated on performance measures like worst-case recall and satis…
▽ More
The rise in internet usage has led to the generation of massive amounts of data, resulting in the adoption of various supervised and semi-supervised machine learning algorithms, which can effectively utilize the colossal amount of data to train models. However, before deploying these models in the real world, these must be strictly evaluated on performance measures like worst-case recall and satisfy constraints such as fairness. We find that current state-of-the-art empirical techniques offer sub-optimal performance on these practical, non-decomposable performance objectives. On the other hand, the theoretical techniques necessitate training a new model from scratch for each performance objective. To bridge the gap, we propose SelMix, a selective mixup-based inexpensive fine-tuning technique for pre-trained models, to optimize for the desired objective. The core idea of our framework is to determine a sampling distribution to perform a mixup of features between samples from particular classes such that it optimizes the given objective. We comprehensively evaluate our technique against the existing empirical and theoretically principled methods on standard benchmark datasets for imbalanced classification. We find that proposed SelMix fine-tuning significantly improves the performance for various practical non-decomposable objectives across benchmarks.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
GrainGNN: A dynamic graph neural network for predicting 3D grain microstructure
Authors:
Yigong Qin,
Stephen DeWitt,
Balasubramaniam Radhakrishnan,
George Biros
Abstract:
We propose GrainGNN, a surrogate model for the evolution of polycrystalline grain structure under rapid solidification conditions in metal additive manufacturing. High fidelity simulations of solidification microstructures are typically performed using multicomponent partial differential equations (PDEs) with moving interfaces. The inherent randomness of the PDE initial conditions (grain seeds) ne…
▽ More
We propose GrainGNN, a surrogate model for the evolution of polycrystalline grain structure under rapid solidification conditions in metal additive manufacturing. High fidelity simulations of solidification microstructures are typically performed using multicomponent partial differential equations (PDEs) with moving interfaces. The inherent randomness of the PDE initial conditions (grain seeds) necessitates ensemble simulations to predict microstructure statistics, e.g., grain size, aspect ratio, and crystallographic orientation. Currently such ensemble simulations are prohibitively expensive and surrogates are necessary.
In GrainGNN, we use a dynamic graph to represent interface motion and topological changes due to grain coarsening. We use a reduced representation of the microstructure using hand-crafted features; we combine pattern finding and altering graph algorithms with two neural networks, a classifier (for topological changes) and a regressor (for interface motion). Both networks have an encoder-decoder architecture; the encoder has a multi-layer transformer long-short-term-memory architecture; the decoder is a single layer perceptron.
We evaluate GrainGNN by comparing it to high-fidelity phase field simulations for in-distribution and out-of-distribution grain configurations for solidification under laser power bed fusion conditions. GrainGNN results in 80\%--90\% pointwise accuracy; and nearly identical distributions of scalar quantities of interest (QoI) between phase field and GrainGNN simulations compared using Kolmogorov-Smirnov test. GrainGNN's inference speedup (PyTorch on single x86 CPU) over a high-fidelity phase field simulation (CUDA on a single NVIDIA A100 GPU) is 150$\times$--2000$\times$ for 100-initial grain problem. Further, using GrainGNN, we model the formation of 11,600 grains in 220 seconds on a single CPU core.
△ Less
Submitted 31 January, 2024; v1 submitted 7 January, 2024;
originally announced January 2024.
-
Cost-Sensitive Self-Training for Optimizing Non-Decomposable Metrics
Authors:
Harsh Rangwani,
Shrinivas Ramasubramanian,
Sho Takemori,
Kato Takashi,
Yuhei Umeda,
Venkatesh Babu Radhakrishnan
Abstract:
Self-training based semi-supervised learning algorithms have enabled the learning of highly accurate deep neural networks, using only a fraction of labeled data. However, the majority of work on self-training has focused on the objective of improving accuracy, whereas practical machine learning systems can have complex goals (e.g. maximizing the minimum of recall across classes, etc.) that are non…
▽ More
Self-training based semi-supervised learning algorithms have enabled the learning of highly accurate deep neural networks, using only a fraction of labeled data. However, the majority of work on self-training has focused on the objective of improving accuracy, whereas practical machine learning systems can have complex goals (e.g. maximizing the minimum of recall across classes, etc.) that are non-decomposable in nature. In this work, we introduce the Cost-Sensitive Self-Training (CSST) framework which generalizes the self-training-based methods for optimizing non-decomposable metrics. We prove that our framework can better optimize the desired non-decomposable metric utilizing unlabeled data, under similar data distribution assumptions made for the analysis of self-training. Using the proposed CSST framework, we obtain practical self-training methods (for both vision and NLP tasks) for optimizing different non-decomposable metrics using deep neural networks. Our results demonstrate that CSST achieves an improvement over the state-of-the-art in majority of the cases across datasets and objectives.
△ Less
Submitted 28 April, 2023;
originally announced April 2023.
-
Detection and Classification of mental illnesses on social media using RoBERTa
Authors:
Ankit Murarka,
Balaji Radhakrishnan,
Sushma Ravichandran
Abstract:
Given the current social distancing regulations across the world, social media has become the primary mode of communication for most people. This has resulted in the isolation of many people suffering from mental illnesses who are unable to receive assistance in person. They have increasingly turned to social media to express themselves and to look for guidance in dealing with their illnesses. Kee…
▽ More
Given the current social distancing regulations across the world, social media has become the primary mode of communication for most people. This has resulted in the isolation of many people suffering from mental illnesses who are unable to receive assistance in person. They have increasingly turned to social media to express themselves and to look for guidance in dealing with their illnesses. Keeping this in mind, we propose a solution to detect and classify mental illness posts on social media thereby enabling users to seek appropriate help. In this work, we detect and classify five prominent kinds of mental illnesses: depression, anxiety, bipolar disorder, ADHD and PTSD by analyzing unstructured user data on social media platforms. In addition, we are sharing a new high-quality dataset to drive research on this topic. We believe that our work is the first multi-class model that uses a Transformer-based architecture such as RoBERTa to analyze people's emotions and psychology. We also demonstrate how we stress-test our model using behavioral testing. With this research, we hope to be able to contribute to the public health system by automating some of the detection and classification process.
△ Less
Submitted 23 November, 2020;
originally announced November 2020.
-
From Image Collections to Point Clouds with Self-supervised Shape and Pose Networks
Authors:
K L Navaneet,
Ansu Mathew,
Shashank Kashyap,
Wei-Chih Hung,
Varun Jampani,
R. Venkatesh Babu
Abstract:
Reconstructing 3D models from 2D images is one of the fundamental problems in computer vision. In this work, we propose a deep learning technique for 3D object reconstruction from a single image. Contrary to recent works that either use 3D supervision or multi-view supervision, we use only single view images with no pose information during training as well. This makes our approach more practical r…
▽ More
Reconstructing 3D models from 2D images is one of the fundamental problems in computer vision. In this work, we propose a deep learning technique for 3D object reconstruction from a single image. Contrary to recent works that either use 3D supervision or multi-view supervision, we use only single view images with no pose information during training as well. This makes our approach more practical requiring only an image collection of an object category and the corresponding silhouettes. We learn both 3D point cloud reconstruction and pose estimation networks in a self-supervised manner, making use of differentiable point cloud renderer to train with 2D supervision. A key novelty of the proposed technique is to impose 3D geometric reasoning into predicted 3D point clouds by rotating them with randomly sampled poses and then enforcing cycle consistency on both 3D reconstructions and poses. In addition, using single-view supervision allows us to do test-time optimization on a given test image. Experiments on the synthetic ShapeNet and real-world Pix3D datasets demonstrate that our approach, despite using less supervision, can achieve competitive performance compared to pose-supervised and multi-view supervised approaches.
△ Less
Submitted 5 May, 2020;
originally announced May 2020.
-
Adversarial Fooling Beyond "Flipping the Label"
Authors:
Konda Reddy Mopuri,
Vaisakh Shaj,
R. Venkatesh Babu
Abstract:
Recent advancements in CNNs have shown remarkable achievements in various CV/AI applications. Though CNNs show near human or better than human performance in many critical tasks, they are quite vulnerable to adversarial attacks. These attacks are potentially dangerous in real-life deployments. Though there have been many adversarial attacks proposed in recent years, there is no proper way of quant…
▽ More
Recent advancements in CNNs have shown remarkable achievements in various CV/AI applications. Though CNNs show near human or better than human performance in many critical tasks, they are quite vulnerable to adversarial attacks. These attacks are potentially dangerous in real-life deployments. Though there have been many adversarial attacks proposed in recent years, there is no proper way of quantifying the effectiveness of these attacks. As of today, mere fooling rate is used for measuring the susceptibility of the models, or the effectiveness of adversarial attacks. Fooling rate just considers label flipping and does not consider the cost of such flipping, for instance, in some deployments, flipping between two species of dogs may not be as severe as confusing a dog category with that of a vehicle. Therefore, the metric to quantify the vulnerability of the models should capture the severity of the flipping as well. In this work we first bring out the drawbacks of the existing evaluation and propose novel metrics to capture various aspects of the fooling. Further, for the first time, we present a comprehensive analysis of several important adversarial attacks over a set of distinct CNN architectures. We believe that the presented analysis brings valuable insights about the current adversarial attacks and the CNN models.
△ Less
Submitted 27 April, 2020;
originally announced April 2020.
-
Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis
Authors:
Jogendra Nath Kundu,
Siddharth Seth,
Varun Jampani,
Mugalodi Rakesh,
R. Venkatesh Babu,
Anirban Chakraborty
Abstract:
Camera captured human pose is an outcome of several sources of variation. Performance of supervised 3D pose estimation approaches comes at the cost of dispensing with variations, such as shape and appearance, that may be useful for solving other related tasks. As a result, the learned model not only inculcates task-bias but also dataset-bias because of its strong reliance on the annotated samples,…
▽ More
Camera captured human pose is an outcome of several sources of variation. Performance of supervised 3D pose estimation approaches comes at the cost of dispensing with variations, such as shape and appearance, that may be useful for solving other related tasks. As a result, the learned model not only inculcates task-bias but also dataset-bias because of its strong reliance on the annotated samples, which also holds true for weakly-supervised models. Acknowledging this, we propose a self-supervised learning framework to disentangle such variations from unlabeled video frames. We leverage the prior knowledge on human skeleton and poses in the form of a single part-based 2D puppet model, human pose articulation constraints, and a set of unpaired 3D poses. Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, not only facilitates discovery of interpretable pose disentanglement but also allows us to operate on videos with diverse camera movements. Qualitative results on unseen in-the-wild datasets establish our superior generalization across multiple tasks beyond the primary tasks of 3D pose estimation and part segmentation. Furthermore, we demonstrate state-of-the-art weakly-supervised 3D pose estimation performance on both Human3.6M and MPI-INF-3DHP datasets.
△ Less
Submitted 9 April, 2020;
originally announced April 2020.
-
Universal Source-Free Domain Adaptation
Authors:
Jogendra Nath Kundu,
Naveen Venkat,
Rahul M V,
R. Venkatesh Babu
Abstract:
There is a strong incentive to develop versatile learning techniques that can transfer the knowledge of class-separability from a labeled source domain to an unlabeled target domain in the presence of a domain-shift. Existing domain adaptation (DA) approaches are not equipped for practical DA scenarios as a result of their reliance on the knowledge of source-target label-set relationship (e.g. Clo…
▽ More
There is a strong incentive to develop versatile learning techniques that can transfer the knowledge of class-separability from a labeled source domain to an unlabeled target domain in the presence of a domain-shift. Existing domain adaptation (DA) approaches are not equipped for practical DA scenarios as a result of their reliance on the knowledge of source-target label-set relationship (e.g. Closed-set, Open-set or Partial DA). Furthermore, almost all prior unsupervised DA works require coexistence of source and target samples even during deployment, making them unsuitable for real-time adaptation. Devoid of such impractical assumptions, we propose a novel two-stage learning process. 1) In the Procurement stage, we aim to equip the model for future source-free deployment, assuming no prior knowledge of the upcoming category-gap and domain-shift. To achieve this, we enhance the model's ability to reject out-of-source distribution samples by leveraging the available source data, in a novel generative classifier framework. 2) In the Deployment stage, the goal is to design a unified adaptation algorithm capable of operating across a wide range of category-gaps, with no access to the previously seen source samples. To this end, in contrast to the usage of complex adversarial training regimes, we define a simple yet effective source-free adaptation objective by utilizing a novel instance-level weighting mechanism, named as Source Similarity Metric (SSM). A thorough evaluation shows the practical usability of the proposed learning framework with superior DA performance even over state-of-the-art source-dependent approaches.
△ Less
Submitted 9 April, 2020;
originally announced April 2020.
-
Towards Inheritable Models for Open-Set Domain Adaptation
Authors:
Jogendra Nath Kundu,
Naveen Venkat,
Ambareesh Revanur,
Rahul M V,
R. Venkatesh Babu
Abstract:
There has been a tremendous progress in Domain Adaptation (DA) for visual recognition tasks. Particularly, open-set DA has gained considerable attention wherein the target domain contains additional unseen categories. Existing open-set DA approaches demand access to a labeled source dataset along with unlabeled target instances. However, this reliance on co-existing source and target data is highl…
▽ More
There has been a tremendous progress in Domain Adaptation (DA) for visual recognition tasks. Particularly, open-set DA has gained considerable attention wherein the target domain contains additional unseen categories. Existing open-set DA approaches demand access to a labeled source dataset along with unlabeled target instances. However, this reliance on co-existing source and target data is highly impractical in scenarios where data-sharing is restricted due to its proprietary nature or privacy concerns. Addressing this, we introduce a practical DA paradigm where a source-trained model is used to facilitate adaptation in the absence of the source dataset in future. To this end, we formalize knowledge inheritability as a novel concept and propose a simple yet effective solution to realize inheritable models suitable for the above practical paradigm. Further, we present an objective way to quantify inheritability to enable the selection of the most suitable source model for a given target domain, even in the absence of the source data. We provide theoretical insights followed by a thorough empirical evaluation demonstrating state-of-the-art open-set domain adaptation performance.
△ Less
Submitted 9 April, 2020;
originally announced April 2020.
-
Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes
Authors:
Sravanti Addepalli,
Vivek B. S.,
Arya Baburaj,
Gaurang Sriramanan,
R. Venkatesh Babu
Abstract:
As humans, we inherently perceive images based on their predominant features, and ignore noise embedded within lower bit planes. On the contrary, Deep Neural Networks are known to confidently misclassify images corrupted with meticulously crafted perturbations that are nearly imperceptible to the human eye. In this work, we attempt to address this problem by training networks to form coarse impres…
▽ More
As humans, we inherently perceive images based on their predominant features, and ignore noise embedded within lower bit planes. On the contrary, Deep Neural Networks are known to confidently misclassify images corrupted with meticulously crafted perturbations that are nearly imperceptible to the human eye. In this work, we attempt to address this problem by training networks to form coarse impressions based on the information in higher bit planes, and use the lower bit planes only to refine their prediction. We demonstrate that, by imposing consistency on the representations learned across differently quantized images, the adversarial robustness of networks improves significantly when compared to a normally trained model. Present state-of-the-art defenses against adversarial attacks require the networks to be explicitly trained using adversarial samples that are computationally expensive to generate. While such methods that use adversarial training continue to achieve the best results, this work paves the way towards achieving robustness without having to explicitly train on adversarial samples. The proposed approach is therefore faster, and also closer to the natural learning process in humans.
△ Less
Submitted 1 April, 2020;
originally announced April 2020.
-
GAN-Tree: An Incrementally Learned Hierarchical Generative Framework for Multi-Modal Data Distributions
Authors:
Jogendra Nath Kundu,
Maharshi Gor,
Dakshit Agrawal,
R. Venkatesh Babu
Abstract:
Despite the remarkable success of generative adversarial networks, their performance seems less impressive for diverse training sets, requiring learning of discontinuous mapping functions. Though multi-mode prior or multi-generator models have been proposed to alleviate this problem, such approaches may fail depending on the empirically chosen initial mode components. In contrast to such bottom-up…
▽ More
Despite the remarkable success of generative adversarial networks, their performance seems less impressive for diverse training sets, requiring learning of discontinuous mapping functions. Though multi-mode prior or multi-generator models have been proposed to alleviate this problem, such approaches may fail depending on the empirically chosen initial mode components. In contrast to such bottom-up approaches, we present GAN-Tree, which follows a hierarchical divisive strategy to address such discontinuous multi-modal data. Devoid of any assumption on the number of modes, GAN-Tree utilizes a novel mode-splitting algorithm to effectively split the parent mode to semantically cohesive children modes, facilitating unsupervised clustering. Further, it also enables incremental addition of new data modes to an already trained GAN-Tree, by updating only a single branch of the tree structure. As compared to prior approaches, the proposed framework offers a higher degree of flexibility in choosing a large variety of mutually exclusive and exhaustive tree nodes called GAN-Set. Extensive experiments on synthetic and natural image datasets including ImageNet demonstrate the superiority of GAN-Tree against the prior state-of-the-arts.
△ Less
Submitted 16 September, 2019; v1 submitted 11 August, 2019;
originally announced August 2019.
-
UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation
Authors:
Jogendra Nath Kundu,
Nishank Lakkakula,
R. Venkatesh Babu
Abstract:
Aiming towards human-level generalization, there is a need to explore adaptable representation learning methods with greater transferability. Most existing approaches independently address task-transferability and cross-domain adaptation, resulting in limited generalization. In this paper, we propose UM-Adapt - a unified framework to effectively perform unsupervised domain adaptation for spatially…
▽ More
Aiming towards human-level generalization, there is a need to explore adaptable representation learning methods with greater transferability. Most existing approaches independently address task-transferability and cross-domain adaptation, resulting in limited generalization. In this paper, we propose UM-Adapt - a unified framework to effectively perform unsupervised domain adaptation for spatially-structured prediction tasks, simultaneously maintaining a balanced performance across individual tasks in a multi-task setting. To realize this, we propose two novel regularization strategies; a) Contour-based content regularization (CCR) and b) exploitation of inter-task coherency using a cross-task distillation module. Furthermore, avoiding a conventional ad-hoc domain discriminator, we re-utilize the cross-task distillation loss as output of an energy function to adversarially minimize the input domain discrepancy. Through extensive experiments, we demonstrate superior generalizability of the learned representations simultaneously for multiple tasks under domain-shifts from synthetic to natural environments. UM-Adapt yields state-of-the-art transfer learning results on ImageNet classification and comparable performance on PASCAL VOC 2007 detection task, even with a smaller backbone-net. Moreover, the resulting semi-supervised framework outperforms the current fully-supervised multi-task learning state-of-the-art on both NYUD and Cityscapes dataset.
△ Less
Submitted 16 September, 2019; v1 submitted 11 August, 2019;
originally announced August 2019.
-
Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detection
Authors:
Deepak Babu Sam,
Skand Vishwanath Peri,
Mukuntha Narayanan Sundararaman,
Amogh Kamath,
R. Venkatesh Babu
Abstract:
We introduce a detection framework for dense crowd counting and eliminate the need for the prevalent density regression paradigm. Typical counting models predict crowd density for an image as opposed to detecting every person. These regression methods, in general, fail to localize persons accurate enough for most applications other than counting. Hence, we adopt an architecture that locates every…
▽ More
We introduce a detection framework for dense crowd counting and eliminate the need for the prevalent density regression paradigm. Typical counting models predict crowd density for an image as opposed to detecting every person. These regression methods, in general, fail to localize persons accurate enough for most applications other than counting. Hence, we adopt an architecture that locates every person in the crowd, sizes the spotted heads with bounding box and then counts them. Compared to normal object or face detectors, there exist certain unique challenges in designing such a detection system. Some of them are direct consequences of the huge diversity in dense crowds along with the need to predict boxes contiguously. We solve these issues and develop our LSC-CNN model, which can reliably detect heads of people across sparse to dense crowds. LSC-CNN employs a multi-column architecture with top-down feedback processing to better resolve persons and produce refined predictions at multiple resolutions. Interestingly, the proposed training regime requires only point head annotation, but can estimate approximate size information of heads. We show that LSC-CNN not only has superior localization than existing density regressors, but outperforms in counting as well. The code for our approach is available at https://github.com/val-iisc/lsc-cnn.
△ Less
Submitted 15 February, 2020; v1 submitted 18 June, 2019;
originally announced June 2019.
-
Enhancing Salient Object Segmentation Through Attention
Authors:
Anuj Pahuja,
Avishek Majumder,
Anirban Chakraborty,
R. Venkatesh Babu
Abstract:
Segmenting salient objects in an image is an important vision task with ubiquitous applications. The problem becomes more challenging in the presence of a cluttered and textured background, low resolution and/or low contrast images. Even though existing algorithms perform well in segmenting most of the object(s) of interest, they often end up segmenting false positives due to resembling salient ob…
▽ More
Segmenting salient objects in an image is an important vision task with ubiquitous applications. The problem becomes more challenging in the presence of a cluttered and textured background, low resolution and/or low contrast images. Even though existing algorithms perform well in segmenting most of the object(s) of interest, they often end up segmenting false positives due to resembling salient objects in the background. In this work, we tackle this problem by iteratively attending to image patches in a recurrent fashion and subsequently enhancing the predicted segmentation mask. Saliency features are estimated independently for every image patch, which are further combined using an aggregation strategy based on a Convolutional Gated Recurrent Unit (ConvGRU) network. The proposed approach works in an end-to-end manner, removing background noise and false positives incrementally. Through extensive evaluation on various benchmark datasets, we show superior performance to the existing approaches without any post-processing.
△ Less
Submitted 27 May, 2019;
originally announced May 2019.
-
Zero-Shot Knowledge Distillation in Deep Networks
Authors:
Gaurav Kumar Nayak,
Konda Reddy Mopuri,
Vaisakh Shaj,
R. Venkatesh Babu,
Anirban Chakraborty
Abstract:
Knowledge distillation deals with the problem of training a smaller model (Student) from a high capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible if the dataset is very l…
▽ More
Knowledge distillation deals with the problem of training a smaller model (Student) from a high capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible if the dataset is very large or it poses privacy or safety concerns (e.g., bio-metric or medical data). Hence, in this paper, we propose a novel data-free method to train the Student from the Teacher. Without even using any meta-data, we synthesize the Data Impressions from the complex Teacher model and utilize these as surrogates for the original training data samples to transfer its learning to Student via knowledge distillation. We, therefore, dub our method "Zero-Shot Knowledge Distillation" and demonstrate that our framework results in competitive generalization performance as achieved by distillation using the actual training data samples on multiple benchmark datasets.
△ Less
Submitted 20 May, 2019;
originally announced May 2019.
-
Dense 3D Point Cloud Reconstruction Using a Deep Pyramid Network
Authors:
Priyanka Mandikal,
R. Venkatesh Babu
Abstract:
Reconstructing a high-resolution 3D model of an object is a challenging task in computer vision. Designing scalable and light-weight architectures is crucial while addressing this problem. Existing point-cloud based reconstruction approaches directly predict the entire point cloud in a single stage. Although this technique can handle low-resolution point clouds, it is not a viable solution for gen…
▽ More
Reconstructing a high-resolution 3D model of an object is a challenging task in computer vision. Designing scalable and light-weight architectures is crucial while addressing this problem. Existing point-cloud based reconstruction approaches directly predict the entire point cloud in a single stage. Although this technique can handle low-resolution point clouds, it is not a viable solution for generating dense, high-resolution outputs. In this work, we introduce DensePCR, a deep pyramidal network for point cloud reconstruction that hierarchically predicts point clouds of increasing resolution. Towards this end, we propose an architecture that first predicts a low-resolution point cloud, and then hierarchically increases the resolution by aggregating local and global point features to deform a grid. Our method generates point clouds that are accurate, uniform and dense. Through extensive quantitative and qualitative evaluation on synthetic and real datasets, we demonstrate that DensePCR outperforms the existing state-of-the-art point cloud reconstruction works, while also providing a light-weight and scalable architecture for predicting high-resolution outputs.
△ Less
Submitted 25 January, 2019;
originally announced January 2019.
-
BiHMP-GAN: Bidirectional 3D Human Motion Prediction GAN
Authors:
Jogendra Nath Kundu,
Maharshi Gor,
R. Venkatesh Babu
Abstract:
Human motion prediction model has applications in various fields of computer vision. Without taking into account the inherent stochasticity in the prediction of future pose dynamics, such methods often converges to a deterministic undesired mean of multiple probable outcomes. Devoid of this, we propose a novel probabilistic generative approach called Bidirectional Human motion prediction GAN, or B…
▽ More
Human motion prediction model has applications in various fields of computer vision. Without taking into account the inherent stochasticity in the prediction of future pose dynamics, such methods often converges to a deterministic undesired mean of multiple probable outcomes. Devoid of this, we propose a novel probabilistic generative approach called Bidirectional Human motion prediction GAN, or BiHMP-GAN. To be able to generate multiple probable human-pose sequences, conditioned on a given starting sequence, we introduce a random extrinsic factor r, drawn from a predefined prior distribution. Furthermore, to enforce a direct content loss on the predicted motion sequence and also to avoid mode-collapse, a novel bidirectional framework is incorporated by modifying the usual discriminator architecture. The discriminator is trained also to regress this extrinsic factor r, which is used alongside with the intrinsic factor (encoded starting pose sequence) to generate a particular pose sequence. To further regularize the training, we introduce a novel recursive prediction strategy. In spite of being in a probabilistic framework, the enhanced discriminator architecture allows predictions of an intermediate part of pose sequence to be used as a conditioning for prediction of the latter part of the sequence. The bidirectional setup also provides a new direction to evaluate the prediction quality against a given test sequence. For a fair assessment of BiHMP-GAN, we report performance of the generated motion sequence using (i) a critic model trained to discriminate between real and fake motion sequence, and (ii) an action classifier trained on real human motion dynamics. Outcomes of both qualitative and quantitative evaluations, on the probabilistic generations of the model, demonstrate the superiority of BiHMP-GAN over previously available methods.
△ Less
Submitted 6 December, 2018;
originally announced December 2018.
-
CAPNet: Continuous Approximation Projection For 3D Point Cloud Reconstruction Using 2D Supervision
Authors:
Navaneet K L,
Priyanka Mandikal,
Mayank Agarwal,
R. Venkatesh Babu
Abstract:
Knowledge of 3D properties of objects is a necessity in order to build effective computer vision systems. However, lack of large scale 3D datasets can be a major constraint for data-driven approaches in learning such properties. We consider the task of single image 3D point cloud reconstruction, and aim to utilize multiple foreground masks as our supervisory data to alleviate the need for large sc…
▽ More
Knowledge of 3D properties of objects is a necessity in order to build effective computer vision systems. However, lack of large scale 3D datasets can be a major constraint for data-driven approaches in learning such properties. We consider the task of single image 3D point cloud reconstruction, and aim to utilize multiple foreground masks as our supervisory data to alleviate the need for large scale 3D datasets. A novel differentiable projection module, called 'CAPNet', is introduced to obtain such 2D masks from a predicted 3D point cloud. The key idea is to model the projections as a continuous approximation of the points in the point cloud. To overcome the challenges of sparse projection maps, we propose a loss formulation termed 'affinity loss' to generate outlier-free reconstructions. We significantly outperform the existing projection based approaches on a large-scale synthetic dataset. We show the utility and generalizability of such a 2D supervised approach through experiments on a real-world dataset, where lack of 3D data can be a serious concern. To further enhance the reconstructions, we also propose a test stage optimization procedure to obtain reconstructions that display high correspondence with the observed input image.
△ Less
Submitted 28 November, 2018;
originally announced November 2018.
-
3D-PSRNet: Part Segmented 3D Point Cloud Reconstruction From a Single Image
Authors:
Priyanka Mandikal,
Navaneet K L,
R. Venkatesh Babu
Abstract:
We propose a mechanism to reconstruct part annotated 3D point clouds of objects given just a single input image. We demonstrate that jointly training for both reconstruction and segmentation leads to improved performance in both the tasks, when compared to training for each task individually. The key idea is to propagate information from each task so as to aid the other during the training procedu…
▽ More
We propose a mechanism to reconstruct part annotated 3D point clouds of objects given just a single input image. We demonstrate that jointly training for both reconstruction and segmentation leads to improved performance in both the tasks, when compared to training for each task individually. The key idea is to propagate information from each task so as to aid the other during the training procedure. Towards this end, we introduce a location-aware segmentation loss in the training regime. We empirically show the effectiveness of the proposed loss in generating more faithful part reconstructions while also improving segmentation accuracy. We thoroughly evaluate the proposed approach on different object categories from the ShapeNet dataset to obtain improved results in reconstruction as well as segmentation.
△ Less
Submitted 30 September, 2018;
originally announced October 2018.
-
Object Pose Estimation from Monocular Image using Multi-View Keypoint Correspondence
Authors:
Jogendra Nath Kundu,
Rahul M. V.,
Aditya Ganeshan,
R. Venkatesh Babu
Abstract:
Understanding the geometry and pose of objects in 2D images is a fundamental necessity for a wide range of real world applications. Driven by deep neural networks, recent methods have brought significant improvements to object pose estimation. However, they suffer due to scarcity of keypoint/pose-annotated real images and hence can not exploit the object's 3D structural information effectively. In…
▽ More
Understanding the geometry and pose of objects in 2D images is a fundamental necessity for a wide range of real world applications. Driven by deep neural networks, recent methods have brought significant improvements to object pose estimation. However, they suffer due to scarcity of keypoint/pose-annotated real images and hence can not exploit the object's 3D structural information effectively. In this work, we propose a data-efficient method which utilizes the geometric regularity of intraclass objects for pose estimation. First, we learn pose-invariant local descriptors of object parts from simple 2D RGB images. These descriptors, along with keypoints obtained from renders of a fixed 3D template model are then used to generate keypoint correspondence maps for a given monocular real image. Finally, a pose estimation network predicts 3D pose of the object using these correspondence maps. This pipeline is further extended to a multi-view approach, which assimilates keypoint information from correspondence sets generated from multiple views of the 3D template model. Fusion of multi-view information significantly improves geometric comprehension of the system which in turn enhances the pose estimation performance. Furthermore, use of correspondence framework responsible for the learning of pose invariant keypoint descriptor also allows us to effectively alleviate the data-scarcity problem. This enables our method to achieve state-of-the-art performance on multiple real-image viewpoint estimation datasets, such as Pascal3D+ and ObjectNet3D. To encourage reproducible research, we have released the codes for our proposed approach.
△ Less
Submitted 3 September, 2018;
originally announced September 2018.
-
Ask, Acquire, and Attack: Data-free UAP Generation using Class Impressions
Authors:
Konda Reddy Mopuri,
Phani Krishna Uppala,
R. Venkatesh Babu
Abstract:
Deep learning models are susceptible to input specific noise, called adversarial perturbations. Moreover, there exist input-agnostic noise, called Universal Adversarial Perturbations (UAP) that can affect inference of the models over most input samples. Given a model, there exist broadly two approaches to craft UAPs: (i) data-driven: that require data, and (ii) data-free: that do not require data…
▽ More
Deep learning models are susceptible to input specific noise, called adversarial perturbations. Moreover, there exist input-agnostic noise, called Universal Adversarial Perturbations (UAP) that can affect inference of the models over most input samples. Given a model, there exist broadly two approaches to craft UAPs: (i) data-driven: that require data, and (ii) data-free: that do not require data samples. Data-driven approaches require actual samples from the underlying data distribution and craft UAPs with high success (fooling) rate. However, data-free approaches craft UAPs without utilizing any data samples and therefore result in lesser success rates. In this paper, for data-free scenarios, we propose a novel approach that emulates the effect of data samples with class impressions in order to craft UAPs using data-driven objectives. Class impression for a given pair of category and model is a generic representation (in the input space) of the samples belonging to that category. Further, we present a neural network based generative model that utilizes the acquired class impressions to learn crafting UAPs. Experimental evaluation demonstrates that the learned generative model, (i) readily crafts UAPs via simple feed-forwarding through neural network layers, and (ii) achieves state-of-the-art success rates for data-free scenario and closer to that for data-driven setting without actually utilizing any data samples.
△ Less
Submitted 3 August, 2018;
originally announced August 2018.
-
iSPA-Net: Iterative Semantic Pose Alignment Network
Authors:
Jogendra Nath Kundu,
Aditya Ganeshan,
Rahul M. V.,
Aditya Prakash,
R. Venkatesh Babu
Abstract:
Understanding and extracting 3D information of objects from monocular 2D images is a fundamental problem in computer vision. In the task of 3D object pose estimation, recent data driven deep neural network based approaches suffer from scarcity of real images with 3D keypoint and pose annotations. Drawing inspiration from human cognition, where the annotators use a 3D CAD model as structural refere…
▽ More
Understanding and extracting 3D information of objects from monocular 2D images is a fundamental problem in computer vision. In the task of 3D object pose estimation, recent data driven deep neural network based approaches suffer from scarcity of real images with 3D keypoint and pose annotations. Drawing inspiration from human cognition, where the annotators use a 3D CAD model as structural reference to acquire ground-truth viewpoints for real images; we propose an iterative Semantic Pose Alignment Network, called iSPA-Net. Our approach focuses on exploiting semantic 3D structural regularity to solve the task of fine-grained pose estimation by predicting viewpoint difference between a given pair of images. Such image comparison based approach also alleviates the problem of data scarcity and hence enhances scalability of the proposed approach for novel object categories with minimal annotation. The fine-grained object pose estimator is also aided by correspondence of learned spatial descriptor of the input image pair. The proposed pose alignment framework enjoys the faculty to refine its initial pose estimation in consecutive iterations by utilizing an online rendering setup along with effectiveness of a non-uniform bin classification of pose-difference. This enables iSPA-Net to achieve state-of-the-art performance on various real image viewpoint estimation datasets. Further, we demonstrate effectiveness of the approach for multiple applications. First, we show results for active object viewpoint localization to capture images from similar pose considering only a single image as pose reference. Second, we demonstrate the ability of the learned semantic correspondence to perform unsupervised part-segmentation transfer using only a single part-annotated 3D template model per object class. To encourage reproducible research, we have released the codes for our proposed algorithm.
△ Less
Submitted 3 August, 2018;
originally announced August 2018.
-
3D-LMNet: Latent Embedding Matching for Accurate and Diverse 3D Point Cloud Reconstruction from a Single Image
Authors:
Priyanka Mandikal,
K L Navaneet,
Mayank Agarwal,
R. Venkatesh Babu
Abstract:
3D reconstruction from single view images is an ill-posed problem. Inferring the hidden regions from self-occluded images is both challenging and ambiguous. We propose a two-pronged approach to address these issues. To better incorporate the data prior and generate meaningful reconstructions, we propose 3D-LMNet, a latent embedding matching approach for 3D reconstruction. We first train a 3D point…
▽ More
3D reconstruction from single view images is an ill-posed problem. Inferring the hidden regions from self-occluded images is both challenging and ambiguous. We propose a two-pronged approach to address these issues. To better incorporate the data prior and generate meaningful reconstructions, we propose 3D-LMNet, a latent embedding matching approach for 3D reconstruction. We first train a 3D point cloud auto-encoder and then learn a mapping from the 2D image to the corresponding learnt embedding. To tackle the issue of uncertainty in the reconstruction, we predict multiple reconstructions that are consistent with the input view. This is achieved by learning a probablistic latent space with a novel view-specific diversity loss. Thorough quantitative and qualitative analysis is performed to highlight the significance of the proposed approach. We outperform state-of-the-art approaches on the task of single-view 3D reconstruction on both real and synthetic datasets while generating multiple plausible reconstructions, demonstrating the generalizability and utility of our approach.
△ Less
Submitted 26 March, 2019; v1 submitted 20 July, 2018;
originally announced July 2018.
-
Operator-in-the-Loop Deep Sequential Multi-camera Feature Fusion for Person Re-identification
Authors:
K L Navaneet,
Ravi Kiran Sarvadevabhatla,
Shashank Shekhar,
R. Venkatesh Babu,
Anirban Chakraborty
Abstract:
Given a target image as query, person re-identification systems retrieve a ranked list of candidate matches on a per-camera basis. In deployed systems, a human operator scans these lists and labels sighted targets by touch or mouse-based selection. However, classical re-id approaches generate per-camera lists independently. Therefore, target identifications by operator in a subset of cameras canno…
▽ More
Given a target image as query, person re-identification systems retrieve a ranked list of candidate matches on a per-camera basis. In deployed systems, a human operator scans these lists and labels sighted targets by touch or mouse-based selection. However, classical re-id approaches generate per-camera lists independently. Therefore, target identifications by operator in a subset of cameras cannot be utilized to improve ranking of the target in remaining set of network cameras. To address this shortcoming, we propose a novel sequential multi-camera re-id approach. The proposed approach can accommodate human operator inputs and provides early gains via a monotonic improvement in target ranking. At the heart of our approach is a fusion function which operates on deep feature representations of query and candidate matches. We formulate an optimization procedure custom-designed to incrementally improve query representation. Since existing evaluation methods cannot be directly adopted to our setting, we also propose two novel evaluation protocols. The results on two large-scale re-id datasets (Market-1501, DukeMTMC-reID) demonstrate that our multi-camera method significantly outperforms baselines and other popular feature fusion schemes. Additionally, we conduct a comparative subject-based study of human operator performance. The superior operator performance enabled by our approach makes a compelling case for its integration into deployable video-surveillance systems.
△ Less
Submitted 5 December, 2019; v1 submitted 19 July, 2018;
originally announced July 2018.
-
AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation
Authors:
Jogendra Nath Kundu,
Phani Krishna Uppala,
Anuj Pahuja,
R. Venkatesh Babu
Abstract:
Supervised deep learning methods have shown promising results for the task of monocular depth estimation; but acquiring ground truth is costly, and prone to noise as well as inaccuracies. While synthetic datasets have been used to circumvent above problems, the resultant models do not generalize well to natural scenes due to the inherent domain shift. Recent adversarial approaches for domain adapt…
▽ More
Supervised deep learning methods have shown promising results for the task of monocular depth estimation; but acquiring ground truth is costly, and prone to noise as well as inaccuracies. While synthetic datasets have been used to circumvent above problems, the resultant models do not generalize well to natural scenes due to the inherent domain shift. Recent adversarial approaches for domain adaption have performed well in mitigating the differences between the source and target domains. But these methods are mostly limited to a classification setup and do not scale well for fully-convolutional architectures. In this work, we propose AdaDepth - an unsupervised domain adaptation strategy for the pixel-wise regression task of monocular depth estimation. The proposed approach is devoid of above limitations through a) adversarial learning and b) explicit imposition of content consistency on the adapted target representation. Our unsupervised approach performs competitively with other established approaches on depth estimation tasks and achieves state-of-the-art results in a semi-supervised setting.
△ Less
Submitted 7 June, 2018; v1 submitted 5 March, 2018;
originally announced March 2018.
-
Game of Sketches: Deep Recurrent Models of Pictionary-style Word Guessing
Authors:
Ravi Kiran Sarvadevabhatla,
Shiv Surya,
Trisha Mittal,
Venkatesh Babu Radhakrishnan
Abstract:
The ability of intelligent agents to play games in human-like fashion is popularly considered a benchmark of progress in Artificial Intelligence. Similarly, performance on multi-disciplinary tasks such as Visual Question Answering (VQA) is considered a marker for gauging progress in Computer Vision. In our work, we bring games and VQA together. Specifically, we introduce the first computational mo…
▽ More
The ability of intelligent agents to play games in human-like fashion is popularly considered a benchmark of progress in Artificial Intelligence. Similarly, performance on multi-disciplinary tasks such as Visual Question Answering (VQA) is considered a marker for gauging progress in Computer Vision. In our work, we bring games and VQA together. Specifically, we introduce the first computational model aimed at Pictionary, the popular word-guessing social game. We first introduce Sketch-QA, an elementary version of Visual Question Answering task. Styled after Pictionary, Sketch-QA uses incrementally accumulated sketch stroke sequences as visual data. Notably, Sketch-QA involves asking a fixed question ("What object is being drawn?") and gathering open-ended guess-words from human guessers. We analyze the resulting dataset and present many interesting findings therein. To mimic Pictionary-style guessing, we subsequently propose a deep neural model which generates guess-words in response to temporally evolving human-drawn sketches. Our model even makes human-like mistakes while guessing, thus amplifying the human mimicry factor. We evaluate our model on the large-scale guess-word dataset generated via Sketch-QA task and compare with various baselines. We also conduct a Visual Turing Test to obtain human impressions of the guess-words generated by humans and our model. Experimental results demonstrate the promise of our approach for Pictionary and similarly themed games.
△ Less
Submitted 28 January, 2018;
originally announced January 2018.
-
Generalizable Data-free Objective for Crafting Universal Adversarial Perturbations
Authors:
Konda Reddy Mopuri,
Aditya Ganeshan,
R. Venkatesh Babu
Abstract:
Machine learning models are susceptible to adversarial perturbations: small changes to input that can cause large changes in output. It is also demonstrated that there exist input-agnostic perturbations, called universal adversarial perturbations, which can change the inference of target model on most of the data samples. However, existing methods to craft universal perturbations are (i) task spec…
▽ More
Machine learning models are susceptible to adversarial perturbations: small changes to input that can cause large changes in output. It is also demonstrated that there exist input-agnostic perturbations, called universal adversarial perturbations, which can change the inference of target model on most of the data samples. However, existing methods to craft universal perturbations are (i) task specific, (ii) require samples from the training data distribution, and (iii) perform complex optimizations. Additionally, because of the data dependence, fooling ability of the crafted perturbations is proportional to the available training data. In this paper, we present a novel, generalizable and data-free approaches for crafting universal adversarial perturbations. Independent of the underlying task, our objective achieves fooling via corrupting the extracted features at multiple layers. Therefore, the proposed objective is generalizable to craft image-agnostic perturbations across multiple vision tasks such as object recognition, semantic segmentation, and depth estimation. In the practical setting of black-box attack scenario (when the attacker does not have access to the target model and it's training data), we show that our objective outperforms the data dependent objectives to fool the learned models. Further, via exploiting simple priors related to the data distribution, our objective remarkably boosts the fooling ability of the crafted perturbations. Significant fooling rates achieved by our objective emphasize that the current deep learning models are now at an increased risk, since our objective generalizes across multiple tasks without the requirement of training data for crafting the perturbations. To encourage reproducible research, we have released the codes for our proposed algorithm.
△ Less
Submitted 24 July, 2018; v1 submitted 24 January, 2018;
originally announced January 2018.
-
DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs
Authors:
K. Ram Prabhakar,
V. Sai Srikar,
R. Venkatesh Babu
Abstract:
We present a novel deep learning architecture for fusing static multi-exposure images. Current multi-exposure fusion (MEF) approaches use hand-crafted features to fuse input sequence. However, the weak hand-crafted representations are not robust to varying input conditions. Moreover, they perform poorly for extreme exposure image pairs. Thus, it is highly desirable to have a method that is robust…
▽ More
We present a novel deep learning architecture for fusing static multi-exposure images. Current multi-exposure fusion (MEF) approaches use hand-crafted features to fuse input sequence. However, the weak hand-crafted representations are not robust to varying input conditions. Moreover, they perform poorly for extreme exposure image pairs. Thus, it is highly desirable to have a method that is robust to varying input conditions and capable of handling extreme exposure without artifacts. Deep representations have known to be robust to input conditions and have shown phenomenal performance in a supervised setting. However, the stumbling block in using deep learning for MEF was the lack of sufficient training data and an oracle to provide the ground-truth for supervision. To address the above issues, we have gathered a large dataset of multi-exposure image stacks for training and to circumvent the need for ground truth images, we propose an unsupervised deep learning framework for MEF utilizing a no-reference quality metric as loss function. The proposed approach uses a novel CNN architecture trained to learn the fusion operation without reference ground truth image. The model fuses a set of common low level features extracted from each image to generate artifact-free perceptually pleasing results. We perform extensive quantitative and qualitative evaluation and show that the proposed technique outperforms existing state-of-the-art approaches for a variety of natural images.
△ Less
Submitted 20 December, 2017;
originally announced December 2017.
-
NAG: Network for Adversary Generation
Authors:
Konda Reddy Mopuri,
Utkarsh Ojha,
Utsav Garg,
R. Venkatesh Babu
Abstract:
Adversarial perturbations can pose a serious threat for deploying machine learning systems. Recent works have shown existence of image-agnostic perturbations that can fool classifiers over most natural images. Existing methods present optimization approaches that solve for a fooling objective with an imperceptibility constraint to craft the perturbations. However, for a given classifier, they gene…
▽ More
Adversarial perturbations can pose a serious threat for deploying machine learning systems. Recent works have shown existence of image-agnostic perturbations that can fool classifiers over most natural images. Existing methods present optimization approaches that solve for a fooling objective with an imperceptibility constraint to craft the perturbations. However, for a given classifier, they generate one perturbation at a time, which is a single instance from the manifold of adversarial perturbations. Also, in order to build robust models, it is essential to explore the manifold of adversarial perturbations. In this paper, we propose for the first time, a generative approach to model the distribution of adversarial perturbations. The architecture of the proposed model is inspired from that of GANs and is trained using fooling and diversity objectives. Our trained generator network attempts to capture the distribution of adversarial perturbations for a given classifier and readily generates a wide variety of such perturbations. Our experimental evaluation demonstrates that perturbations crafted by our model (i) achieve state-of-the-art fooling rates, (ii) exhibit wide variety and (iii) deliver excellent cross model generalizability. Our work can be deemed as an important step in the process of inferring about the complex manifolds of adversarial perturbations.
△ Less
Submitted 28 March, 2018; v1 submitted 9 December, 2017;
originally announced December 2017.
-
SketchParse : Towards Rich Descriptions for Poorly Drawn Sketches using Multi-Task Hierarchical Deep Networks
Authors:
Ravi Kiran Sarvadevabhatla,
Isht Dwivedi,
Abhijat Biswas,
Sahil Manocha,
R. Venkatesh Babu
Abstract:
The ability to semantically interpret hand-drawn line sketches, although very challenging, can pave way for novel applications in multimedia. We propose SketchParse, the first deep-network architecture for fully automatic parsing of freehand object sketches. SketchParse is configured as a two-level fully convolutional network. The first level contains shared layers common to all object categories.…
▽ More
The ability to semantically interpret hand-drawn line sketches, although very challenging, can pave way for novel applications in multimedia. We propose SketchParse, the first deep-network architecture for fully automatic parsing of freehand object sketches. SketchParse is configured as a two-level fully convolutional network. The first level contains shared layers common to all object categories. The second level contains a number of expert sub-networks. Each expert specializes in parsing sketches from object categories which contain structurally similar parts. Effectively, the two-level configuration enables our architecture to scale up efficiently as additional categories are added. We introduce a router layer which (i) relays sketch features from shared layers to the correct expert (ii) eliminates the need to manually specify object category during inference. To bypass laborious part-level annotation, we sketchify photos from semantic object-part image datasets and use them for training. Our architecture also incorporates object pose prediction as a novel auxiliary task which boosts overall performance while providing supplementary information regarding the sketch. We demonstrate SketchParse's abilities (i) on two challenging large-scale sketch datasets (ii) in parsing unseen, semantically related object categories (iii) in improving fine-grained sketch-based image retrieval. As a novel application, we also outline how SketchParse's output can be used to generate caption-style descriptions for hand-drawn sketches.
△ Less
Submitted 5 September, 2017;
originally announced September 2017.
-
CNN Fixations: An unraveling approach to visualize the discriminative image regions
Authors:
Konda Reddy Mopuri,
Utsav Garg,
R. Venkatesh Babu
Abstract:
Deep convolutional neural networks (CNN) have revolutionized various fields of vision research and have seen unprecedented adoption for multiple tasks such as classification, detection, captioning, etc. However, they offer little transparency into their inner workings and are often treated as black boxes that deliver excellent performance. In this work, we aim at alleviating this opaqueness of CNN…
▽ More
Deep convolutional neural networks (CNN) have revolutionized various fields of vision research and have seen unprecedented adoption for multiple tasks such as classification, detection, captioning, etc. However, they offer little transparency into their inner workings and are often treated as black boxes that deliver excellent performance. In this work, we aim at alleviating this opaqueness of CNNs by providing visual explanations for the network's predictions. Our approach can analyze variety of CNN based models trained for vision applications such as object recognition and caption generation. Unlike existing methods, we achieve this via unraveling the forward pass operation. Proposed method exploits feature dependencies across the layer hierarchy and uncovers the discriminative image locations that guide the network's predictions. We name these locations CNN-Fixations, loosely analogous to human eye fixations.
Our approach is a generic method that requires no architectural changes, additional training or gradient computation and computes the important image locations (CNN Fixations). We demonstrate through a variety of applications that our approach is able to localize the discriminative image locations across different network architectures, diverse vision tasks and data modalities.
△ Less
Submitted 12 December, 2018; v1 submitted 22 August, 2017;
originally announced August 2017.
-
Image Denoising via CNNs: An Adversarial Approach
Authors:
Nithish Divakar,
R. Venkatesh Babu
Abstract:
Is it possible to recover an image from its noisy version using convolutional neural networks? This is an interesting problem as convolutional layers are generally used as feature detectors for tasks like classification, segmentation and object detection. We present a new CNN architecture for blind image denoising which synergically combines three architecture components, a multi-scale feature ext…
▽ More
Is it possible to recover an image from its noisy version using convolutional neural networks? This is an interesting problem as convolutional layers are generally used as feature detectors for tasks like classification, segmentation and object detection. We present a new CNN architecture for blind image denoising which synergically combines three architecture components, a multi-scale feature extraction layer which helps in reducing the effect of noise on feature maps, an l_p regularizer which helps in selecting only the appropriate feature maps for the task of reconstruction, and finally a three step training approach which leverages adversarial training to give the final performance boost to the model. The proposed model shows competitive denoising performance when compared to the state-of-the-art approaches.
△ Less
Submitted 1 August, 2017;
originally announced August 2017.
-
Confidence estimation in Deep Neural networks via density modelling
Authors:
Akshayvarun Subramanya,
Suraj Srinivas,
R. Venkatesh Babu
Abstract:
State-of-the-art Deep Neural Networks can be easily fooled into providing incorrect high-confidence predictions for images with small amounts of adversarial noise. Does this expose a flaw with deep neural networks, or do we simply need a better way to estimate confidence? In this paper we consider the problem of accurately estimating predictive confidence. We formulate this problem as that of dens…
▽ More
State-of-the-art Deep Neural Networks can be easily fooled into providing incorrect high-confidence predictions for images with small amounts of adversarial noise. Does this expose a flaw with deep neural networks, or do we simply need a better way to estimate confidence? In this paper we consider the problem of accurately estimating predictive confidence. We formulate this problem as that of density modelling, and show how traditional methods such as softmax produce poor estimates. To address this issue, we propose a novel confidence measure based on density modelling approaches. We test these measures on images distorted by blur, JPEG compression, random noise and adversarial noise. Experiments show that our confidence measure consistently shows reduced confidence scores in the presence of such distortions - a property which softmax often lacks.
△ Less
Submitted 21 July, 2017;
originally announced July 2017.
-
Fast Feature Fool: A data independent approach to universal adversarial perturbations
Authors:
Konda Reddy Mopuri,
Utsav Garg,
R. Venkatesh Babu
Abstract:
State-of-the-art object recognition Convolutional Neural Networks (CNNs) are shown to be fooled by image agnostic perturbations, called universal adversarial perturbations. It is also observed that these perturbations generalize across multiple networks trained on the same target data. However, these algorithms require training data on which the CNNs were trained and compute adversarial perturbati…
▽ More
State-of-the-art object recognition Convolutional Neural Networks (CNNs) are shown to be fooled by image agnostic perturbations, called universal adversarial perturbations. It is also observed that these perturbations generalize across multiple networks trained on the same target data. However, these algorithms require training data on which the CNNs were trained and compute adversarial perturbations via complex optimization. The fooling performance of these approaches is directly proportional to the amount of available training data. This makes them unsuitable for practical attacks since its unreasonable for an attacker to have access to the training data. In this paper, for the first time, we propose a novel data independent approach to generate image agnostic perturbations for a range of CNNs trained for object recognition. We further show that these perturbations are transferable across multiple network architectures trained either on same or different data. In the absence of data, our method generates universal adversarial perturbations efficiently via fooling the features learned at multiple layers thereby causing CNNs to misclassify. Experiments demonstrate impressive fooling rates and surprising transferability for the proposed universal perturbations generated without any training data.
△ Less
Submitted 18 July, 2017;
originally announced July 2017.
-
DeLiGAN : Generative Adversarial Networks for Diverse and Limited Data
Authors:
Swaminathan Gurumurthy,
Ravi Kiran Sarvadevabhatla,
Venkatesh Babu Radhakrishnan
Abstract:
A class of recent approaches for generating images, called Generative Adversarial Networks (GAN), have been used to generate impressively realistic images of objects, bedrooms, handwritten digits and a variety of other image modalities. However, typical GAN-based approaches require large amounts of training data to capture the diversity across the image modality. In this paper, we propose DeLiGAN…
▽ More
A class of recent approaches for generating images, called Generative Adversarial Networks (GAN), have been used to generate impressively realistic images of objects, bedrooms, handwritten digits and a variety of other image modalities. However, typical GAN-based approaches require large amounts of training data to capture the diversity across the image modality. In this paper, we propose DeLiGAN -- a novel GAN-based architecture for diverse and limited training data scenarios. In our approach, we reparameterize the latent generative space as a mixture model and learn the mixture model's parameters along with those of GAN. This seemingly simple modification to the GAN framework is surprisingly effective and results in models which enable diversity in generated samples although trained with limited data. In our work, we show that DeLiGAN can generate images of handwritten digits, objects and hand-drawn sketches, all using limited amounts of data. To quantitatively characterize intra-class diversity of generated samples, we also introduce a modified version of "inception-score", a measure which has been found to correlate well with human assessment of generated samples.
△ Less
Submitted 7 June, 2017;
originally announced June 2017.
-
Deep image representations using caption generators
Authors:
Konda Reddy Mopuri,
Vishal B. Athreya,
R. Venkatesh Babu
Abstract:
Deep learning exploits large volumes of labeled data to learn powerful models. When the target dataset is small, it is a common practice to perform transfer learning using pre-trained models to learn new task specific representations. However, pre-trained CNNs for image recognition are provided with limited information about the image during training, which is label alone. Tasks such as scene retr…
▽ More
Deep learning exploits large volumes of labeled data to learn powerful models. When the target dataset is small, it is a common practice to perform transfer learning using pre-trained models to learn new task specific representations. However, pre-trained CNNs for image recognition are provided with limited information about the image during training, which is label alone. Tasks such as scene retrieval suffer from features learned from this weak supervision and require stronger supervision to better understand the contents of the image. In this paper, we exploit the features learned from caption generating models to learn novel task specific image representations. In particular, we consider the state-of-the art captioning system Show and Tell~\cite{SnT-pami-2016} and the dense region description model DenseCap~\cite{densecap-cvpr-2016}. We demonstrate that, owing to richer supervision provided during the process of training, the features learned by the captioning system perform better than those of CNNs. Further, we train a siamese network with a modified pair-wise loss to fuse the features learned by~\cite{SnT-pami-2016} and~\cite{densecap-cvpr-2016} and learn image representations suitable for retrieval. Experiments show that the proposed fusion exploits the complementary nature of the individual features and yields state-of-the art retrieval results on benchmark datasets.
△ Less
Submitted 25 May, 2017;
originally announced May 2017.
-
Object category understanding via eye fixations on freehand sketches
Authors:
Ravi Kiran Sarvadevabhatla,
Sudharshan Suresh,
R. Venkatesh Babu
Abstract:
The study of eye gaze fixations on photographic images is an active research area. In contrast, the image subcategory of freehand sketches has not received as much attention for such studies. In this paper, we analyze the results of a free-viewing gaze fixation study conducted on 3904 freehand sketches distributed across 160 object categories. Our analysis shows that fixation sequences exhibit mar…
▽ More
The study of eye gaze fixations on photographic images is an active research area. In contrast, the image subcategory of freehand sketches has not received as much attention for such studies. In this paper, we analyze the results of a free-viewing gaze fixation study conducted on 3904 freehand sketches distributed across 160 object categories. Our analysis shows that fixation sequences exhibit marked consistency within a sketch, across sketches of a category and even across suitably grouped sets of categories. This multi-level consistency is remarkable given the variability in depiction and extreme image content sparsity that characterizes hand-drawn object sketches. In our paper, we show that the multi-level consistency in the fixation data can be exploited to (a) predict a test sketch's category given only its fixation sequence and (b) build a computational model which predicts part-labels underlying fixations on objects. We hope that our findings motivate the community to deem sketch-like representations worthy of gaze-based studies vis-a-vis photographic images.
△ Less
Submitted 19 March, 2017;
originally announced March 2017.
-
'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems
Authors:
Ravi Kiran Sarvadevabhatla,
Shanthakumar Venkatraman,
R. Venkatesh Babu
Abstract:
An examination of object recognition challenge leaderboards (ILSVRC, PASCAL-VOC) reveals that the top-performing classifiers typically exhibit small differences amongst themselves in terms of error rate/mAP. To better differentiate the top performers, additional criteria are required. Moreover, the (test) images, on which the performance scores are based, predominantly contain fully visible object…
▽ More
An examination of object recognition challenge leaderboards (ILSVRC, PASCAL-VOC) reveals that the top-performing classifiers typically exhibit small differences amongst themselves in terms of error rate/mAP. To better differentiate the top performers, additional criteria are required. Moreover, the (test) images, on which the performance scores are based, predominantly contain fully visible objects. Therefore, `harder' test images, mimicking the challenging conditions (e.g. occlusion) in which humans routinely recognize objects, need to be utilized for benchmarking. To address the concerns mentioned above, we make two contributions. First, we systematically vary the level of local object-part content, global detail and spatial context in images from PASCAL VOC 2010 to create a new benchmarking dataset dubbed PPSS-12. Second, we propose an object-part based benchmarking procedure which quantifies classifiers' robustness to a range of visibility and contextual settings. The benchmarking procedure relies on a semantic similarity measure that naturally addresses potential semantic granularity differences between the category labels in training and test datasets, thus eliminating manual mapping. We use our procedure on the PPSS-12 dataset to benchmark top-performing classifiers trained on the ILSVRC-2012 dataset. Our results show that the proposed benchmarking procedure enables additional differentiation among state-of-the-art object classifiers in terms of their ability to handle missing content and insufficient object detail. Given this capability for additional differentiation, our approach can potentially supplement existing benchmarking procedures used in object recognition challenge leaderboards.
△ Less
Submitted 24 November, 2016; v1 submitted 23 November, 2016;
originally announced November 2016.
-
Generalized Dropout
Authors:
Suraj Srinivas,
R. Venkatesh Babu
Abstract:
Deep Neural Networks often require good regularizers to generalize well. Dropout is one such regularizer that is widely used among Deep Learning practitioners. Recent work has shown that Dropout can also be viewed as performing Approximate Bayesian Inference over the network parameters. In this work, we generalize this notion and introduce a rich family of regularizers which we call Generalized Dr…
▽ More
Deep Neural Networks often require good regularizers to generalize well. Dropout is one such regularizer that is widely used among Deep Learning practitioners. Recent work has shown that Dropout can also be viewed as performing Approximate Bayesian Inference over the network parameters. In this work, we generalize this notion and introduce a rich family of regularizers which we call Generalized Dropout. One set of methods in this family, called Dropout++, is a version of Dropout with trainable parameters. Classical Dropout emerges as a special case of this method. Another member of this family selects the width of neural network layers. Experiments show that these methods help in improving generalization performance over Dropout.
△ Less
Submitted 21 November, 2016;
originally announced November 2016.
-
Training Sparse Neural Networks
Authors:
Suraj Srinivas,
Akshayvarun Subramanya,
R. Venkatesh Babu
Abstract:
Deep neural networks with lots of parameters are typically used for large-scale computer vision tasks such as image classification. This is a result of using dense matrix multiplications and convolutions. However, sparse computations are known to be much more efficient. In this work, we train and build neural networks which implicitly use sparse computations. We introduce additional gate variables…
▽ More
Deep neural networks with lots of parameters are typically used for large-scale computer vision tasks such as image classification. This is a result of using dense matrix multiplications and convolutions. However, sparse computations are known to be much more efficient. In this work, we train and build neural networks which implicitly use sparse computations. We introduce additional gate variables to perform parameter selection and show that this is equivalent to using a spike-and-slab prior. We experimentally validate our method on both small and large networks and achieve state-of-the-art compression results for sparse neural network models.
△ Less
Submitted 21 November, 2016;
originally announced November 2016.
-
Compensating for Large In-Plane Rotations in Natural Images
Authors:
Lokesh Boominathan,
Suraj Srinivas,
R. Venkatesh Babu
Abstract:
Rotation invariance has been studied in the computer vision community primarily in the context of small in-plane rotations. This is usually achieved by building invariant image features. However, the problem of achieving invariance for large rotation angles remains largely unexplored. In this work, we tackle this problem by directly compensating for large rotations, as opposed to building invarian…
▽ More
Rotation invariance has been studied in the computer vision community primarily in the context of small in-plane rotations. This is usually achieved by building invariant image features. However, the problem of achieving invariance for large rotation angles remains largely unexplored. In this work, we tackle this problem by directly compensating for large rotations, as opposed to building invariant features. This is inspired by the neuro-scientific concept of mental rotation, which humans use to compare pairs of rotated objects. Our contributions here are three-fold. First, we train a Convolutional Neural Network (CNN) to detect image rotations. We find that generic CNN architectures are not suitable for this purpose. To this end, we introduce a convolutional template layer, which learns representations for canonical 'unrotated' images. Second, we use Bayesian Optimization to quickly sift through a large number of candidate images to find the canonical 'unrotated' image. Third, we use this method to achieve robustness to large angles in an image retrieval scenario. Our method is task-agnostic, and can be used as a pre-processing step in any computer vision system.
△ Less
Submitted 17 November, 2016;
originally announced November 2016.
-
CrowdNet: A Deep Convolutional Network for Dense Crowd Counting
Authors:
Lokesh Boominathan,
Srinivas S S Kruthiventi,
R. Venkatesh Babu
Abstract:
Our work proposes a novel deep learning framework for estimating crowd density from static images of highly dense crowds. We use a combination of deep and shallow, fully convolutional networks to predict the density map for a given crowd image. Such a combination is used for effectively capturing both the high-level semantic information (face/body detectors) and the low-level features (blob detect…
▽ More
Our work proposes a novel deep learning framework for estimating crowd density from static images of highly dense crowds. We use a combination of deep and shallow, fully convolutional networks to predict the density map for a given crowd image. Such a combination is used for effectively capturing both the high-level semantic information (face/body detectors) and the low-level features (blob detectors), that are necessary for crowd counting under large scale variations. As most crowd datasets have limited training samples (<100 images) and deep learning based approaches require large amounts of training data, we perform multi-scale data augmentation. Augmenting the training samples in such a manner helps in guiding the CNN to learn scale invariant representations. Our method is tested on the challenging UCF_CC_50 dataset, and shown to outperform the state of the art methods.
△ Less
Submitted 22 August, 2016;
originally announced August 2016.
-
Enabling My Robot To Play Pictionary : Recurrent Neural Networks For Sketch Recognition
Authors:
Ravi Kiran Sarvadevabhatla,
Jogendra Kundu,
Babu R. Venkatesh
Abstract:
Freehand sketching is an inherently sequential process. Yet, most approaches for hand-drawn sketch recognition either ignore this sequential aspect or exploit it in an ad-hoc manner. In our work, we propose a recurrent neural network architecture for sketch object recognition which exploits the long-term sequential and structural regularities in stroke data in a scalable manner. Specifically, we i…
▽ More
Freehand sketching is an inherently sequential process. Yet, most approaches for hand-drawn sketch recognition either ignore this sequential aspect or exploit it in an ad-hoc manner. In our work, we propose a recurrent neural network architecture for sketch object recognition which exploits the long-term sequential and structural regularities in stroke data in a scalable manner. Specifically, we introduce a Gated Recurrent Unit based framework which leverages deep sketch features and weighted per-timestep loss to achieve state-of-the-art results on a large database of freehand object sketches across a large number of object categories. The inherently online nature of our framework is especially suited for on-the-fly recognition of objects as they are being drawn. Thus, our framework can enable interesting applications such as camera-equipped robots playing the popular party game Pictionary with human players and generating sparsified yet recognizable sketches of objects.
△ Less
Submitted 11 August, 2016;
originally announced August 2016.
-
SwiDeN : Convolutional Neural Networks For Depiction Invariant Object Recognition
Authors:
Ravi Kiran Sarvadevabhatla,
Shiv Surya,
Srinivas S S Kruthiventi,
Venkatesh Babu R
Abstract:
Current state of the art object recognition architectures achieve impressive performance but are typically specialized for a single depictive style (e.g. photos only, sketches only). In this paper, we present SwiDeN : our Convolutional Neural Network (CNN) architecture which recognizes objects regardless of how they are visually depicted (line drawing, realistic shaded drawing, photograph etc.). I…
▽ More
Current state of the art object recognition architectures achieve impressive performance but are typically specialized for a single depictive style (e.g. photos only, sketches only). In this paper, we present SwiDeN : our Convolutional Neural Network (CNN) architecture which recognizes objects regardless of how they are visually depicted (line drawing, realistic shaded drawing, photograph etc.). In SwiDeN, we utilize a novel `deep' depictive style-based switching mechanism which appropriately addresses the depiction-specific and depiction-invariant aspects of the problem. We compare SwiDeN with alternative architectures and prior work on a 50-category Photo-Art dataset containing objects depicted in multiple styles. Experimental results show that SwiDeN outperforms other approaches for the depiction-invariant object recognition problem.
△ Less
Submitted 29 July, 2016;
originally announced July 2016.
-
A Taxonomy of Deep Convolutional Neural Nets for Computer Vision
Authors:
Suraj Srinivas,
Ravi Kiran Sarvadevabhatla,
Konda Reddy Mopuri,
Nikita Prabhu,
Srinivas S S Kruthiventi,
R. Venkatesh Babu
Abstract:
Traditional architectures for solving computer vision problems and the degree of success they enjoyed have been heavily reliant on hand-crafted features. However, of late, deep learning techniques have offered a compelling alternative -- that of automatically learning problem-specific features. With this new paradigm, every problem in computer vision is now being re-examined from a deep learning p…
▽ More
Traditional architectures for solving computer vision problems and the degree of success they enjoyed have been heavily reliant on hand-crafted features. However, of late, deep learning techniques have offered a compelling alternative -- that of automatically learning problem-specific features. With this new paradigm, every problem in computer vision is now being re-examined from a deep learning perspective. Therefore, it has become important to understand what kind of deep networks are suitable for a given problem. Although general surveys of this fast-moving paradigm (i.e. deep-networks) exist, a survey specific to computer vision is missing. We specifically consider one form of deep networks widely used in computer vision - convolutional neural networks (CNNs). We start with "AlexNet" as our base CNN and then examine the broad variations proposed over time to suit different applications. We hope that our recipe-style survey will serve as a guide, particularly for novice practitioners intending to use deep-learning techniques for computer vision.
△ Less
Submitted 25 January, 2016;
originally announced January 2016.
-
Learning Neural Network Architectures using Backpropagation
Authors:
Suraj Srinivas,
R. Venkatesh Babu
Abstract:
Deep neural networks with millions of parameters are at the heart of many state of the art machine learning models today. However, recent works have shown that models with much smaller number of parameters can also perform just as well. In this work, we introduce the problem of architecture-learning, i.e; learning the architecture of a neural network along with weights. We introduce a new trainabl…
▽ More
Deep neural networks with millions of parameters are at the heart of many state of the art machine learning models today. However, recent works have shown that models with much smaller number of parameters can also perform just as well. In this work, we introduce the problem of architecture-learning, i.e; learning the architecture of a neural network along with weights. We introduce a new trainable parameter called tri-state ReLU, which helps in eliminating unnecessary neurons. We also propose a smooth regularizer which encourages the total number of neurons after elimination to be small. The resulting objective is differentiable and simple to optimize. We experimentally validate our method on both small and large networks, and show that it can learn models with a considerably small number of parameters without affecting prediction accuracy.
△ Less
Submitted 2 August, 2016; v1 submitted 17 November, 2015;
originally announced November 2015.
-
DeepFix: A Fully Convolutional Neural Network for predicting Human Eye Fixations
Authors:
Srinivas S. S. Kruthiventi,
Kumar Ayush,
R. Venkatesh Babu
Abstract:
Understanding and predicting the human visual attentional mechanism is an active area of research in the fields of neuroscience and computer vision. In this work, we propose DeepFix, a first-of-its-kind fully convolutional neural network for accurate saliency prediction. Unlike classical works which characterize the saliency map using various hand-crafted features, our model automatically learns f…
▽ More
Understanding and predicting the human visual attentional mechanism is an active area of research in the fields of neuroscience and computer vision. In this work, we propose DeepFix, a first-of-its-kind fully convolutional neural network for accurate saliency prediction. Unlike classical works which characterize the saliency map using various hand-crafted features, our model automatically learns features in a hierarchical fashion and predicts saliency map in an end-to-end manner. DeepFix is designed to capture semantics at multiple scales while taking global context into account using network layers with very large receptive fields. Generally, fully convolutional nets are spatially invariant which prevents them from modeling location dependent patterns (e.g. centre-bias). Our network overcomes this limitation by incorporating a novel Location Biased Convolutional layer. We evaluate our model on two challenging eye fixation datasets -- MIT300, CAT2000 and show that it outperforms other recent approaches by a significant margin.
△ Less
Submitted 10 October, 2015;
originally announced October 2015.
-
Attribute-Graph: A Graph based approach to Image Ranking
Authors:
Nikita Prabhu,
R. Venkatesh Babu
Abstract:
We propose a novel image representation, termed Attribute-Graph, to rank images by their semantic similarity to a given query image. An Attribute-Graph is an undirected fully connected graph, incorporating both local and global image characteristics. The graph nodes characterise objects as well as the overall scene context using mid-level semantic attributes, while the edges capture the object top…
▽ More
We propose a novel image representation, termed Attribute-Graph, to rank images by their semantic similarity to a given query image. An Attribute-Graph is an undirected fully connected graph, incorporating both local and global image characteristics. The graph nodes characterise objects as well as the overall scene context using mid-level semantic attributes, while the edges capture the object topology. We demonstrate the effectiveness of Attribute-Graphs by applying them to the problem of image ranking. We benchmark the performance of our algorithm on the 'rPascal' and 'rImageNet' datasets, which we have created in order to evaluate the ranking performance on complex queries containing multiple objects. Our experimental evaluation shows that modelling images as Attribute-Graphs results in improved ranking performance over existing techniques.
△ Less
Submitted 8 October, 2015; v1 submitted 22 September, 2015;
originally announced September 2015.
-
Analyzing structural characteristics of object category representations from their semantic-part distributions
Authors:
Ravi Kiran Sarvadevabhatla,
Venkatesh Babu R
Abstract:
Studies from neuroscience show that part-mapping computations are employed by human visual system in the process of object recognition. In this work, we present an approach for analyzing semantic-part characteristics of object category representations. For our experiments, we use category-epitome, a recently proposed sketch-based spatial representation for objects. To enable part-importance analys…
▽ More
Studies from neuroscience show that part-mapping computations are employed by human visual system in the process of object recognition. In this work, we present an approach for analyzing semantic-part characteristics of object category representations. For our experiments, we use category-epitome, a recently proposed sketch-based spatial representation for objects. To enable part-importance analysis, we first obtain semantic-part annotations of hand-drawn sketches originally used to construct the corresponding epitomes. We then examine the extent to which the semantic-parts are present in the epitomes of a category and visualize the relative importance of parts as a word cloud. Finally, we show how such word cloud visualizations provide an intuitive understanding of category-level structural trends that exist in the category-epitome object representations.
△ Less
Submitted 15 September, 2015;
originally announced September 2015.
-
Data-free parameter pruning for Deep Neural Networks
Authors:
Suraj Srinivas,
R. Venkatesh Babu
Abstract:
Deep Neural nets (NNs) with millions of parameters are at the heart of many state-of-the-art computer vision systems today. However, recent works have shown that much smaller models can achieve similar levels of performance. In this work, we address the problem of pruning parameters in a trained NN model. Instead of removing individual weights one at a time as done in previous works, we remove one…
▽ More
Deep Neural nets (NNs) with millions of parameters are at the heart of many state-of-the-art computer vision systems today. However, recent works have shown that much smaller models can achieve similar levels of performance. In this work, we address the problem of pruning parameters in a trained NN model. Instead of removing individual weights one at a time as done in previous works, we remove one neuron at a time. We show how similar neurons are redundant, and propose a systematic way to remove them. Our experiments in pruning the densely connected layers show that we can remove upto 85\% of the total parameters in an MNIST-trained network, and about 35\% for AlexNet without significantly affecting performance. Our method can be applied on top of most networks with a fully connected layer to give a smaller network.
△ Less
Submitted 22 July, 2015;
originally announced July 2015.