-
FlowCut: Unsupervised Video Instance Segmentation via Temporal Mask Matching
Authors:
Alp Eren Sari,
Paolo Favaro
Abstract:
We propose FlowCut, a simple and capable method for unsupervised video instance segmentation consisting of a three-stage framework to construct a high-quality video dataset with pseudo labels. To our knowledge, our work is the first attempt to curate a video dataset with pseudo-labels for unsupervised video instance segmentation. In the first stage, we generate pseudo-instance masks by exploiting the affinities of features from both images and optical flows. In the second stage, we construct short video segments containing high-quality, consistent pseudo-instance masks by temporally matching them across the frames. In the third stage, we use the YouTubeVIS-2021 video dataset to extract our training instance segmentation set, and then train a video segmentation model. FlowCut achieves state-of-the-art performance on the YouTubeVIS-2019, YouTubeVIS-2021, DAVIS-2017, and DAVIS-2017 Motion benchmarks.
Submitted 19 May, 2025;
originally announced May 2025.
-
Boosting Unsupervised Segmentation Learning
Authors:
Alp Eren Sari,
Francesco Locatello,
Paolo Favaro
Abstract:
We present two practical improvement techniques for unsupervised segmentation learning. These techniques address limitations in the resolution and accuracy of predicted segmentation maps of recent state-of-the-art methods. Firstly, we leverage image post-processing techniques such as guided filtering to refine the output masks, improving accuracy while avoiding substantial computational costs. Secondly, we introduce a multi-scale consistency criterion, based on a teacher-student training scheme. This criterion matches segmentation masks predicted from regions of the input image extracted at different resolutions to each other. Experimental results on several benchmarks used in unsupervised segmentation learning demonstrate the effectiveness of our proposed techniques.
Submitted 28 November, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
Efficient Training Under Limited Resources
Authors:
Mahdi Zolnouri,
Dounia Lakhmiri,
Christophe Tribes,
Eyyüb Sari,
Sébastien Le Digabel
Abstract:
Training time budget and size of the dataset are among the factors affecting the performance of a Deep Neural Network (DNN). This paper shows that Neural Architecture Search (NAS), Hyperparameter Optimization (HPO), and Data Augmentation help DNNs perform much better when these two factors are limited. However, searching for an optimal architecture and the best hyperparameter values, alongside a good combination of data augmentation techniques, under low resources requires many experiments. We present our approach to achieving such a goal in three steps: reducing training epoch time by compressing the model while maintaining the performance compared to the original model, preventing model overfitting when the dataset is small, and performing the hyperparameter tuning. We used NOMAD, a blackbox optimization software package based on a derivative-free algorithm, to perform NAS and HPO. Our work achieved an accuracy of 86.0% on a tiny subset of Mini-ImageNet at the ICLR 2021 Hardware Aware Efficient Training (HAET) Challenge and won second place in the competition. The competition results can be found at haet2021.github.io/challenge and our source code can be found at github.com/DouniaLakhmiri/ICLR_HAET2021.
Submitted 22 January, 2023;
originally announced January 2023.
-
Training Integer-Only Deep Recurrent Neural Networks
Authors:
Vahid Partovi Nia,
Eyyüb Sari,
Vanessa Courville,
Masoud Asgharian
Abstract:
Recurrent neural networks (RNNs) are the backbone of many text and speech applications. These architectures are typically made up of several computationally complex components such as non-linear activation functions, normalization, bi-directional dependence, and attention. In order to maintain good accuracy, these components are frequently run using full-precision floating-point computation, making them slow, inefficient, and difficult to deploy on edge devices. In addition, the complex nature of these operations makes them challenging to quantize using standard quantization methods without a significant performance drop. We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN). Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions, to serve a wide range of state-of-the-art RNNs. The proposed method enables RNN-based language models to run on edge devices with a $2\times$ improvement in runtime and a $4\times$ reduction in model size, while maintaining accuracy similar to that of their full-precision counterparts.
Submitted 22 December, 2022;
originally announced December 2022.
-
Demystifying and Generalizing BinaryConnect
Authors:
Tim Dockhorn,
Yaoliang Yu,
Eyyüb Sari,
Mahdi Zolnouri,
Vahid Partovi Nia
Abstract:
BinaryConnect (BC) and its many variations have become the de facto standard for neural network quantization. However, our understanding of the inner workings of BC is still quite limited. We attempt to close this gap in four different aspects: (a) we show that existing quantization algorithms, including post-training quantization, are surprisingly similar to each other; (b) we argue for proximal maps as a natural family of quantizers that is both easy to design and analyze; (c) we refine the observation that BC is a special case of dual averaging, which itself is a special case of the generalized conditional gradient algorithm; (d) consequently, we propose ProxConnect (PC) as a generalization of BC and we prove its convergence properties by exploiting the established connections. We conduct experiments on CIFAR-10 and ImageNet, and verify that PC achieves competitive performance.
Submitted 25 October, 2021;
originally announced October 2021.
-
iRNN: Integer-only Recurrent Neural Network
Authors:
Eyyüb Sari,
Vanessa Courville,
Vahid Partovi Nia
Abstract:
Recurrent neural networks (RNNs) are used in many real-world text and speech applications. They include complex modules such as recurrence, exponential-based activation, gate interaction, unfoldable normalization, bi-directional dependence, and attention. The interaction between these elements prevents running them on integer-only operations without a significant performance drop. Deploying RNNs that include layer normalization and attention on integer-only arithmetic is still an open problem. We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN). Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activations, to serve a wide range of RNNs on various applications. The proposed method is proven to work on RNN-based language models and challenging automatic speech recognition, enabling AI applications on the edge. Our iRNN maintains performance similar to that of its full-precision counterpart; deployment on smartphones improves runtime performance by $2\times$ and reduces the model size by $4\times$.
Submitted 14 February, 2022; v1 submitted 20 September, 2021;
originally announced September 2021.
-
Generative Adversarial Learning via Kernel Density Discrimination
Authors:
Abdelhak Lemkhenter,
Adam Bielski,
Alp Eren Sari,
Paolo Favaro
Abstract:
We introduce Kernel Density Discrimination GAN (KDD GAN), a novel method for generative adversarial learning. KDD GAN formulates the training as a likelihood ratio optimization problem where the data distributions are written explicitly via (local) Kernel Density Estimates (KDE). This is inspired by the recent progress in contrastive learning and its relation to KDE. We define the KDEs directly in feature space and forgo the requirement of invertibility of the kernel feature mappings. In our approach, features are no longer optimized for linear separability, as in the original GAN formulation, but for the more general discrimination of distributions in the feature space. We analyze the gradient of our loss with respect to the feature representation and show that it is better behaved than that of the original hinge loss. We perform experiments with the proposed KDE-based loss, used either as a training loss or a regularization term, on both CIFAR10 and scaled versions of ImageNet. We use BigGAN/SA-GAN as a backbone and baseline, since our focus is not to design the architecture of the networks. We show a boost in the quality of generated samples with respect to FID from 10% to 40% compared to the baseline. Code will be made available.
Submitted 13 July, 2021;
originally announced July 2021.
-
Batch Normalization in Quantized Networks
Authors:
Eyyüb Sari,
Vahid Partovi Nia
Abstract:
Implementation of quantized neural networks on computing hardware leads to considerable speed-up and memory savings. However, quantized deep networks are difficult to train, and the batch normalization (BatchNorm) layer plays an important role in training both full-precision and quantized networks. Most studies on BatchNorm focus on full-precision networks, and there is little research on understanding the effect of BatchNorm in quantized training, which we address here. We show that BatchNorm avoids gradient explosion, which is counter-intuitive and was recently observed in numerical experiments by other researchers.
Submitted 29 April, 2020;
originally announced April 2020.
-
SelfVIO: Self-Supervised Deep Monocular Visual-Inertial Odometry and Depth Estimation
Authors:
Yasin Almalioglu,
Mehmet Turan,
Alp Eren Sari,
Muhamad Risqi U. Saputra,
Pedro P. B. de Gusmão,
Andrew Markham,
Niki Trigoni
Abstract:
In the last decade, numerous supervised deep learning approaches requiring large amounts of labeled data have been proposed for visual-inertial odometry (VIO) and depth map estimation. To overcome the data limitation, self-supervised learning has emerged as a promising alternative, exploiting constraints such as geometric and photometric consistency in the scene. In this study, we introduce a novel self-supervised deep learning-based VIO and depth map recovery approach (SelfVIO) using adversarial training and self-adaptive visual-inertial sensor fusion. SelfVIO learns to jointly estimate 6 degrees-of-freedom (6-DoF) ego-motion and a depth map of the scene from unlabeled monocular RGB image sequences and inertial measurement unit (IMU) readings. The proposed approach is able to perform VIO without the need for IMU intrinsic parameters and/or the extrinsic calibration between the IMU and the camera. We provide comprehensive quantitative and qualitative evaluations of the proposed framework, comparing its performance with state-of-the-art VIO, VO, and visual simultaneous localization and mapping (VSLAM) approaches on the KITTI, EuRoC, and Cityscapes datasets. Detailed comparisons prove that SelfVIO outperforms state-of-the-art VIO approaches in terms of pose estimation and depth recovery, making it a promising approach among existing methods in the literature.
Submitted 23 July, 2020; v1 submitted 22 November, 2019;
originally announced November 2019.
-
Adaptive Binary-Ternary Quantization
Authors:
Ryan Razani,
Grégoire Morin,
Vahid Partovi Nia,
Eyyüb Sari
Abstract:
Neural network models are resource hungry. It is difficult to deploy such deep networks on devices with limited resources, like smart wearables, cellphones, drones, and autonomous vehicles. Low-bit quantization, such as binary and ternary quantization, is a common approach to alleviate these resource requirements. Ternary quantization provides a more flexible model and outperforms binary quantization in terms of accuracy, but it doubles the memory footprint and increases the computational cost. Contrary to these approaches, mixed quantized models allow a trade-off between accuracy and memory footprint. In such models, quantization depth is often chosen manually, or is tuned using a separate optimization routine. The latter requires training a quantized network multiple times. Here, we propose an adaptive combination of binary and ternary quantization, namely Smart Quantization (SQ), in which the quantization depth is modified directly via a regularization function, so that the model is trained only once. Our experimental results show that the proposed method adapts quantization depth successfully while keeping the model accuracy high on the MNIST and CIFAR10 benchmarks.
Submitted 13 September, 2021; v1 submitted 26 September, 2019;
originally announced September 2019.
-
How Does Batch Normalization Help Binary Training?
Authors:
Eyyüb Sari,
Mouloud Belbahri,
Vahid Partovi Nia
Abstract:
Binary Neural Networks (BNNs) are difficult to train and suffer from a drop in accuracy. It appears in practice that BNNs fail to train in the absence of a Batch Normalization (BatchNorm) layer. We find that the main role of BatchNorm is to avoid exploding gradients in the case of BNNs. This finding suggests that the common initialization methods developed for full-precision networks are irrelevant to BNNs. We build a theoretical study on the role of BatchNorm in binary training, backed up by numerical experiments.
Submitted 29 April, 2020; v1 submitted 18 September, 2019;
originally announced September 2019.
-
Differentiable Mask for Pruning Convolutional and Recurrent Networks
Authors:
Ramchalam Kinattinkara Ramakrishnan,
Eyyüb Sari,
Vahid Partovi Nia
Abstract:
Pruning is one of the most effective model reduction techniques. Deep networks require massive computation, and such models need to be compressed to bring them to edge devices. Most existing pruning techniques are focused on vision-based models like convolutional networks, while text-based models are still evolving. The emergence of multi-modal multi-task learning calls for a general method that works on vision and text architectures simultaneously. To fill this gap, we introduce a differentiable mask that induces sparsity at various granularities. We apply our method successfully to prune weights, filters, and subnetworks of a convolutional architecture, as well as nodes of a recurrent network.
Submitted 29 April, 2020; v1 submitted 10 September, 2019;
originally announced September 2019.
-
Foothill: A Quasiconvex Regularization for Edge Computing of Deep Neural Networks
Authors:
Mouloud Belbahri,
Eyyüb Sari,
Sajad Darabi,
Vahid Partovi Nia
Abstract:
Deep neural networks (DNNs) have demonstrated success for many supervised learning tasks, ranging from voice recognition and object detection to image classification. However, their increasing complexity might yield poor generalization error and makes them hard to deploy on edge devices. Quantization is an effective approach to compress DNNs in order to meet these constraints. Using a quasiconvex base function to construct a binary quantizer helps in training binary neural networks (BNNs), and adding noise to the input data or using a concrete regularization function helps to improve generalization error. Here we introduce the foothill function, an infinitely differentiable quasiconvex function. This regularizer is flexible enough to deform towards $L_1$ and $L_2$ penalties. Foothill can be used as a binary quantizer, as a regularizer, or as a loss. In particular, we show that this regularizer reduces the accuracy gap between BNNs and their full-precision counterparts for image classification on ImageNet.
Submitted 23 May, 2019; v1 submitted 18 January, 2019;
originally announced January 2019.
-
Thinning CsPb2Br5 Perovskite Down to Monolayers: Cs-dependent Stability
Authors:
Fadil Iyikanat,
Emre Sari,
Hasan Sahin
Abstract:
Using first-principles density functional theory calculations, we systematically investigate the structural, electronic and vibrational properties of bulk and potential single-layer structures of the perovskite-like CsPb2Br5 crystal. It is found that while Cs atoms have no effect on the electronic structure, their presence is essential for the formation of stable CsPb2Br5 crystals. Calculated vibrational spectra of the crystal reveal that not only the bulk form but also the single-layer forms of CsPb2Br5 are dynamically stable. Predicted single-layer forms can exhibit either semiconducting or metallic character. Moreover, modification of the structural, electronic and magnetic properties of single-layer CsPb2Br5 upon formation of vacancy defects is investigated. It is found that the formation of a Br vacancy (i) has the lowest formation energy, (ii) significantly changes the electronic structure, and (iii) leads to a ferromagnetic ground state in single-layer CsPb2Br5. However, the formation of Pb and Cs vacancies leads to p-type doping of the single-layer structure. Results reported herein reveal that the single-layer CsPb2Br5 crystal is a novel stable perovskite with enhanced functionality and a promising candidate for nanodevice applications.
Submitted 24 October, 2017;
originally announced October 2017.
-
Endo-VMFuseNet: Deep Visual-Magnetic Sensor Fusion Approach for Uncalibrated, Unsynchronized and Asymmetric Endoscopic Capsule Robot Localization Data
Authors:
Mehmet Turan,
Yasin Almalioglu,
Hunter Gilbert,
Alp Eren Sari,
Ufuk Soylu,
Metin Sitti
Abstract:
In the last decade, researchers and medical device companies have made major advances towards transforming passive capsule endoscopes into active medical robots. One of the major challenges is to endow capsule robots with accurate perception of the environment inside the human body, which will provide necessary information and enable improved medical procedures. We extend the success of deep learning approaches from various research fields to the problem of uncalibrated, asynchronous, and asymmetric sensor fusion for endoscopic capsule robots. Experiments performed on real pig stomach datasets show that our method achieves sub-millimeter precision for both translational and rotational movements and offers various advantages over traditional sensor fusion techniques.
Submitted 22 September, 2017; v1 submitted 18 September, 2017;
originally announced September 2017.
-
Proc. of the 9th Workshop on Semantic Ambient Media Experiences (SAME'2016/2): Visualisation, Emerging Media, and User-Experience: International Series on Information Systems and Management in Creative eMedia (CreMedia)
Authors:
Artur Lugmayr,
Richard Seale,
Andrew Woods,
Eunice Sari,
Adi Tedjasaputra
Abstract:
The 9th Semantic Ambient Media Experience (SAME) proceedings were based on the academic contributions to a two-day workshop that was held at Curtin University, Perth, WA, Australia. The symposium was held to discuss visualisation, emerging media, and user-experience from various angles. The papers of this workshop are freely available through http://www.ambientmediaassociation.org/Journal under open access as provided by the International Ambient Media Association (iAMEA) Ry. iAMEA is hosting the international open access journal entitled "International Journal on Information Systems and Management in Creative eMedia", and the series entitled "International Series on Information Systems and Management in Creative eMedia". For any further information, please visit the website of the Association: http://www.ambientmediaassociation.org.
Submitted 28 July, 2017;
originally announced August 2017.
-
Proceedings of the 8th Workshop on Semantic Ambient Media Experiences (SAME 2016): Smart Cities for Better Living with HCI and UX (SEACHI), International Series on Information Systems and Management in Creative eMedia (CreMedia)
Authors:
Eunice Sari,
Adi Tedjasaputra,
Do Yi Luen Ellen,
Henry Duh,
Artur Lugmayr
Abstract:
Digital and interactive technologies are becoming increasingly embedded in everyday lives of people around the world. Application of technologies such as real-time, context-aware, and interactive technologies; augmented and immersive realities; social media; and location-based services has been particularly evident in urban environments where technological and sociocultural infrastructures enable easier deployment and adoption as compared to non-urban areas. There has been growing consumer demand for new forms of experiences and services enabled through these emerging technologies. We call this ambient media, as the media is embedded in the natural human living environment.
The 8th Semantic Ambient Media Workshop Experience (SAME) Proceedings were based on a collaboration with the SEACHI Workshop Smart Cities for Better Living with HCI and UX, which was organized by UX Indonesia and held in conjunction with Computers and Human-Computer Interaction (CHI) 2016 in San Jose, CA, USA.
The extended versions of the workshop papers are freely available through www.ambientmediaassociation.org/Journal under open access by the International Ambient Media Association (iAMEA). iAMEA is hosting the international open access journal entitled "International Journal on Information Systems and Management in Creative eMedia", and the international open access series "International Series on Information Systems and Management in Creative eMedia" (see http://www.ambientmediaassociation.org).
Submitted 28 July, 2017; v1 submitted 27 July, 2017;
originally announced July 2017.