-
Affine Transformation-Based Deep Frame Prediction
Authors:
Hyomin Choi,
Ivan V. Bajić
Abstract:
We propose a neural network model to estimate the current frame from two reference frames, using affine transformation and adaptive spatially-varying filters. The estimated affine transformation allows for using shorter filters compared to existing approaches for deep frame prediction. The predicted frame is used as a reference for coding the current frame. Since the proposed model is available at both encoder and decoder, there is no need to code or transmit motion information for the predicted frame. By making use of dilated convolutions and reduced filter length, our model is significantly smaller, yet more accurate, than any of the neural networks in prior works on this topic. Two versions of the proposed model - one for uni-directional, and one for bi-directional prediction - are trained using a combination of Discrete Cosine Transform (DCT)-based l1-loss with various transform sizes, multi-scale Mean Squared Error (MSE) loss, and an object context reconstruction loss. The trained models are integrated with the HEVC video coding pipeline. The experiments show that the proposed models achieve about 7.3%, 5.4%, and 4.2% bit savings for the luminance component on average in the Low delay P, Low delay, and Random access configurations, respectively.
Submitted 16 February, 2021; v1 submitted 11 September, 2020;
originally announced September 2020.
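The DCT-based l1-loss mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a single square block, an orthonormal DCT-II, and a plain sum of absolute coefficient differences; the function names (`dct_1d`, `dct_2d`, `dct_l1_loss`) are hypothetical, and the paper's actual block sizes, weighting, and normalization are not stated in the abstract.

```python
import math


def dct_1d(x):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # Transpose back so the result is indexed [row][column].
    return [list(r) for r in zip(*cols)]


def dct_l1_loss(pred, target):
    """Sum of absolute DCT coefficients of the prediction error.

    The DCT is linear, so transforming (pred - target) equals the
    difference of the two transforms. Summing (rather than averaging)
    is an assumption made for this sketch.
    """
    diff = [[p - t for p, t in zip(pr, tr)] for pr, tr in zip(pred, target)]
    return sum(abs(c) for row in dct_2d(diff) for c in row)
```

A loss over "various transform sizes" could, under the same assumptions, be obtained by tiling the frame into blocks of each size and adding up the per-block values.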
-
PowerGAN: Synthesizing Appliance Power Signatures Using Generative Adversarial Networks
Authors:
Alon Harell,
Richard Jones,
Stephen Makonin,
Ivan V. Bajic
Abstract:
Non-intrusive load monitoring (NILM) allows users and energy providers to gain insight into home appliance electricity consumption using only the building's smart meter. Most current techniques for NILM are trained using significant amounts of labeled appliance power data. The collection of such data is challenging, making data a major bottleneck in creating well-generalizing NILM solutions. To help mitigate the data limitations, we present the first truly synthetic appliance power signature generator. Our solution, PowerGAN, is based on a conditional, progressively growing, 1-D Wasserstein generative adversarial network (GAN). Using PowerGAN, we are able to synthesise truly random and realistic appliance power data signatures. We evaluate the samples generated by PowerGAN in a qualitative way as well as numerically by using traditional GAN evaluation methods such as the Inception score.
Submitted 20 July, 2020;
originally announced July 2020.
-
Soft Video Multicasting Using Adaptive Compressed Sensing
Authors:
Hadi Hadizadeh,
Ivan V. Bajic
Abstract:
Recently, soft video multicasting has gained a lot of attention, especially in broadcast and mobile scenarios where the bit rate supported by the channel may differ across receivers, and may vary quickly over time. Unlike the conventional designs that force the source to use a single bit rate according to the receiver with the worst channel quality, soft video delivery schemes transmit the video such that the video quality at each receiver is commensurate with its specific instantaneous channel quality. In this paper, we present a soft video multicasting system using an adaptive block-based compressed sensing (BCS) method. The proposed system consists of an encoder, a transmission system, and a decoder. At the encoder side, each block in each frame of the input video is adaptively sampled with a rate that depends on the texture complexity and visual saliency of the block. The obtained BCS samples are then placed into several packets, and the packets are transmitted via a channel-aware OFDM (orthogonal frequency division multiplexing) transmission system with a number of subchannels. At the decoder side, the received BCS samples are first used to build an initial approximation of the transmitted frame. To further improve the reconstruction quality, an iterative BCS reconstruction algorithm is then proposed that uses an adaptive transform and an adaptive soft-thresholding operator, which exploits the temporal similarity between adjacent frames to achieve better reconstruction quality. The extensive objective and subjective experimental results indicate the superiority of the proposed system over the state-of-the-art soft video multicasting systems.
Submitted 6 March, 2020;
originally announced March 2020.
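The soft-thresholding operator used in iterative compressed-sensing reconstruction, as referred to in the abstract above, can be sketched as follows. This shows only the standard fixed-threshold form; the abstract describes an adaptive variant whose threshold selection is not specified there, so the `threshold` argument and the function name are assumptions made for illustration.

```python
def soft_threshold(values, threshold):
    """Shrink each value toward zero by `threshold`.

    Values whose magnitude does not exceed the threshold become 0.0;
    larger values keep their sign and lose `threshold` in magnitude.
    """
    out = []
    for v in values:
        magnitude = abs(v) - threshold
        if magnitude <= 0:
            out.append(0.0)
        else:
            out.append(magnitude if v > 0 else -magnitude)
    return out


# Small coefficients are zeroed, large ones are shrunk.
print(soft_threshold([3.0, -0.5, -2.0, 0.2], 1.0))  # [2.0, 0.0, -1.0, 0.0]
```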
-
Bit Allocation for Multi-Task Collaborative Intelligence
Authors:
Saeed Ranjbar Alvar,
Ivan V. Bajić
Abstract:
Recent studies have shown that collaborative intelligence (CI) is a promising framework for deployment of Artificial Intelligence (AI)-based services on mobile devices. In CI, a deep neural network is split between the mobile device and the cloud. Deep features obtained at the mobile are compressed and transferred to the cloud to complete the inference. So far, the methods in the literature focused on transferring a single deep feature tensor from the mobile to the cloud. Such methods are not applicable to some recent, high-performance networks with multiple branches and skip connections. In this paper, we propose the first bit allocation method for multi-stream, multi-task CI. We first establish a model for the joint distortion of the multiple tasks as a function of the bit rates assigned to different deep feature tensors. Then, using the proposed model, we solve the rate-distortion optimization problem under a total rate constraint to obtain the best rate allocation among the tensors to be transferred. Experimental results illustrate the efficacy of the proposed scheme compared to several alternative bit allocation methods.
Submitted 13 February, 2020;
originally announced February 2020.
-
Back-and-Forth prediction for deep tensor compression
Authors:
Hyomin Choi,
Robert A. Cohen,
Ivan V. Bajic
Abstract:
Recent AI applications such as Collaborative Intelligence with neural networks involve transferring deep feature tensors between various computing devices. This necessitates tensor compression in order to optimize the usage of bandwidth-constrained channels between devices. In this paper we present a prediction scheme called Back-and-Forth (BaF) prediction, developed for deep feature tensors, which allows us to dramatically reduce tensor size and improve its compressibility. Our experiments with a state-of-the-art object detector demonstrate that the proposed method allows us to significantly reduce the number of bits needed for compressing feature tensors extracted from deep within the model, with negligible degradation of the detection performance and without requiring any retraining of the network weights. We achieve a 62% and 75% reduction in tensor size while keeping the loss in accuracy of the network to less than 1% and 2%, respectively.
Submitted 13 February, 2020;
originally announced February 2020.
-
Shared Mobile-Cloud Inference for Collaborative Intelligence
Authors:
Mateen Ulhaq,
Ivan V. Bajić
Abstract:
As AI applications for mobile devices become more prevalent, there is an increasing need for faster execution and lower energy consumption for neural model inference. Historically, the models run on mobile devices have been smaller and simpler in comparison to large state-of-the-art research models, which can only run on the cloud. However, cloud-only inference has drawbacks such as increased network bandwidth consumption and higher latency. In addition, cloud-only inference requires the input data (images, audio) to be fully transferred to the cloud, creating concerns about potential privacy breaches. We demonstrate an alternative approach: shared mobile-cloud inference. Partial inference is performed on the mobile in order to reduce the dimensionality of the input data and arrive at a compact feature tensor, which is a latent space representation of the input signal. The feature tensor is then transmitted to the server for further inference. This strategy can improve inference latency, energy consumption, and network bandwidth usage, as well as provide privacy protection, because the original signal never leaves the mobile. Further performance gain can be achieved by compressing the feature tensor before its transmission.
Submitted 1 February, 2020;
originally announced February 2020.
-
Towards Automated Swimming Analytics Using Deep Neural Networks
Authors:
Timothy Woinoski,
Alon Harell,
Ivan V. Bajic
Abstract:
Methods for creating a system to automate the collection of swimming analytics on a pool-wide scale are considered in this paper. There has not been much work on swimmer tracking or the creation of a swimmer database for machine learning purposes. Consequently, methods for collecting swimmer data from videos of swim competitions are explored and analyzed. The result is a guide to the creation of a comprehensive collection of swimming data suitable for training swimmer detection and tracking systems. With this database in place, systems can then be created to automate the collection of swimming analytics.
Submitted 13 January, 2020;
originally announced January 2020.
-
3D Point Cloud Super-Resolution via Graph Total Variation on Surface Normals
Authors:
Chinthaka Dinesh,
Gene Cheung,
Ivan V. Bajic
Abstract:
Point cloud is a collection of 3D coordinates that are discrete geometric samples of an object's 2D surfaces. Using a low-cost 3D scanner to acquire data means that point clouds are often in lower resolution than desired for rendering on high-resolution displays. Building on recent advances in graph signal processing, we design a local algorithm for 3D point cloud super-resolution (SR). First, we initialize new points at centroids of local triangles formed using the low-resolution point cloud, and connect all points using a k-nearest-neighbor graph. Then, to establish a linear relationship between surface normals and 3D point coordinates, we perform bipartite graph approximation to divide all nodes into two disjoint sets, which are optimized alternately until convergence. For each node set, to promote piecewise smooth (PWS) 2D surfaces, we design a graph total variation (GTV) objective for nearby surface normals, under the constraint that coordinates of the original points are preserved. We pursue an augmented Lagrangian approach to tackle the optimization, and solve the unconstrained equivalent using the alternating direction method of multipliers (ADMM). Extensive experiments show that our proposed point cloud SR algorithm outperforms competing schemes objectively and subjectively for a large variety of point clouds.
Submitted 17 August, 2019;
originally announced August 2019.
-
Datasets for Face and Object Detection in Fisheye Images
Authors:
Jianglin Fu,
Ivan V. Bajic,
Rodney G. Vaughan
Abstract:
We present two new fisheye image datasets for training face and object detection models: VOC-360 and Wider-360. The fisheye images are created by post-processing regular images collected from two well-known datasets, VOC2012 and Wider Face, using a model for mapping regular to fisheye images implemented in Matlab. VOC-360 contains 39,575 fisheye images for object detection, segmentation, and classification. Wider-360 contains 63,897 fisheye images for face detection. These datasets will be useful for developing face and object detectors as well as segmentation modules for fisheye images while the efforts to collect and manually annotate true fisheye images are underway.
Submitted 27 June, 2019;
originally announced June 2019.
-
Wavenilm: A causal neural network for power disaggregation from the complex power signal
Authors:
Alon Harell,
Stephen Makonin,
Ivan V. Bajić
Abstract:
Non-intrusive load monitoring (NILM) helps meet energy conservation goals by estimating individual appliance power usage from a single aggregate measurement. Deep neural networks have become increasingly popular in attempting to solve NILM problems; however, many of them are not causal, a property that is important for real-time application. We present a causal 1-D convolutional neural network inspired by WaveNet for NILM on low-frequency data. We also study using various components of the complex power signal for NILM, and demonstrate that using all four components available in a popular NILM dataset (current, active power, reactive power, and apparent power) we achieve faster convergence and higher performance than state-of-the-art results for the same dataset.
Submitted 18 June, 2019; v1 submitted 23 February, 2019;
originally announced February 2019.
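The causality property emphasized in the abstract above can be illustrated with a minimal sketch of a causal, dilated 1-D convolution of the kind used in WaveNet-style networks. This is not the WaveNILM architecture itself: the function name, the zero-padding of samples before the start of the signal, and the single-channel setting are assumptions made for illustration.

```python
def causal_dilated_conv1d(signal, kernel, dilation=1):
    """Causal 1-D convolution: output[t] depends only on signal[0..t].

    kernel[k] multiplies the sample k * dilation steps in the past.
    Samples before the start of the signal are treated as zero.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += weight * signal[idx]
        out.append(acc)
    return out


# With dilation 2, each output adds the current sample and the one two steps back.
print(causal_dilated_conv1d([1, 2, 3, 4], [1, 1], dilation=2))  # [1.0, 2.0, 4.0, 6.0]
```

Because no output index ever reads a later input index, changing a future sample leaves all earlier outputs unchanged, which is what makes such a layer usable on a live measurement stream.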
-
Multi-task learning with compressible features for Collaborative Intelligence
Authors:
Saeed Ranjbar Alvar,
Ivan V. Bajić
Abstract:
A promising way to deploy Artificial Intelligence (AI)-based services on mobile devices is to run a part of the AI model (a deep neural network) on the mobile itself, and the rest in the cloud. This is sometimes referred to as collaborative intelligence. In this framework, intermediate features from the deep network need to be transmitted to the cloud for further processing. We study the case where such features are used for multiple purposes in the cloud (multi-tasking) and where they need to be compressible in order to allow efficient transmission to the cloud. To this end, we introduce a new loss function that encourages feature compressibility while improving system performance on multiple tasks. Experimental results show that with the compression-friendly loss, one can achieve around 20% bitrate reduction without sacrificing the performance on several vision-related tasks.
Submitted 15 May, 2019; v1 submitted 13 February, 2019;
originally announced February 2019.
-
FDDB-360: Face Detection in 360-degree Fisheye Images
Authors:
Jianglin Fu,
Saeed Ranjbar Alvar,
Ivan V. Bajic,
Rodney G. Vaughan
Abstract:
360-degree cameras offer the possibility to cover a large area, for example an entire room, without using multiple distributed vision sensors. However, geometric distortions introduced by their lenses make computer vision problems more challenging. In this paper we address face detection in 360-degree fisheye images. We show how a face detector trained on regular images can be re-trained for this purpose, and we also provide a 360-degree fisheye-like version of the popular FDDB face detection dataset, which we call FDDB-360.
Submitted 7 February, 2019;
originally announced February 2019.
-
Deep Frame Prediction for Video Coding
Authors:
Hyomin Choi,
Ivan V. Bajic
Abstract:
We propose a novel frame prediction method using a deep neural network (DNN), with the goal of improving video coding efficiency. The proposed DNN makes use of decoded frames, at both encoder and decoder, to predict textures of the current coding block. Unlike conventional inter-prediction, the proposed method does not require any motion information to be transferred between the encoder and the decoder. Still, both uni-directional and bi-directional prediction are possible using the proposed DNN, which is enabled by the use of the temporal index channel, in addition to color channels. In this study, we developed a jointly trained DNN for both uni- and bi-directional prediction, as well as separate networks for uni- and bi-directional prediction, and compared the efficacy of both approaches. The proposed DNNs were compared with the conventional motion-compensated prediction in the latest video coding standard, HEVC, in terms of BD-Bitrate. The experiments show that the proposed joint DNN (for both uni- and bi-directional prediction) reduces the luminance bitrate by about 4.4%, 2.4%, and 2.3% in the Low delay P, Low delay, and Random access configurations, respectively. In addition, using the separately trained DNNs brings further bit savings of about 0.3%-0.5%.
Submitted 20 June, 2019; v1 submitted 31 December, 2018;
originally announced January 2019.
-
3D Point Cloud Denoising via Bipartite Graph Approximation and Reweighted Graph Laplacian
Authors:
Chinthaka Dinesh,
Gene Cheung,
Ivan V. Bajic
Abstract:
Point cloud is a collection of 3D coordinates that are discrete geometric samples of an object's 2D surfaces. Imperfection in the acquisition process means that point clouds are often corrupted with noise. Building on recent advances in graph signal processing, we design local algorithms for 3D point cloud denoising. Specifically, we design a reweighted graph Laplacian regularizer (RGLR) for surface normals and demonstrate its merits in rotation invariance, promotion of piecewise smoothness, and ease of optimization. Using RGLR as a signal prior, we formulate an optimization problem with a general lp-norm fidelity term that can explicitly model two types of independent noise: small but non-sparse noise (using l2 fidelity term) and large but sparser noise (using l1 fidelity term).
To establish a linear relationship between normals and 3D point coordinates, we first perform bipartite graph approximation to divide the point cloud into two disjoint node sets (red and blue). We then optimize the red and blue nodes' coordinates alternately. For l2-norm fidelity term, we iteratively solve an unconstrained quadratic programming (QP) problem, efficiently computed using conjugate gradient with a bounded condition number to ensure numerical stability. For l1-norm fidelity term, we iteratively minimize an l1-l2 cost function using accelerated proximal gradient (APG), where a good step size is chosen via Lipschitz continuity analysis. Finally, we propose simple mean and median filters for flat patches of a given point cloud to estimate the noise variance given the noise type, which in turn is used to compute a weight parameter trading off the fidelity term and signal prior in the problem formulation. Extensive experiments show state-of-the-art denoising performance among local methods using our proposed algorithms.
Submitted 18 December, 2018;
originally announced December 2018.
-
MV-YOLO: Motion Vector-aided Tracking by Semantic Object Detection
Authors:
Saeed Ranjbar Alvar,
Ivan V. Bajić
Abstract:
Object tracking is the cornerstone of many visual analytics systems. While considerable progress has been made in this area in recent years, robust, efficient, and accurate tracking in real-world video remains a challenge. In this paper, we present a hybrid tracker that leverages motion information from the compressed video stream and a general-purpose semantic object detector acting on decoded frames to construct a fast and efficient tracking engine. The proposed approach is compared with several well-known recent trackers on the OTB tracking dataset. The results indicate advantages of the proposed method in terms of speed and/or accuracy. Other desirable features of the proposed method are its simplicity and deployment efficiency, which stems from the fact that it reuses the resources and information that may already exist in the system for other reasons.
Submitted 15 June, 2018; v1 submitted 30 April, 2018;
originally announced May 2018.
-
Fast 3D Point Cloud Denoising via Bipartite Graph Approximation & Total Variation
Authors:
Chinthaka Dinesh,
Gene Cheung,
Ivan V. Bajic,
Cheng Yang
Abstract:
Acquired 3D point cloud data, whether from active sensors directly or from stereo-matching algorithms indirectly, typically contain non-negligible noise. To address the point cloud denoising problem, we propose a fast graph-based local algorithm. Specifically, given a k-nearest-neighbor graph of the 3D points, we first approximate it with a bipartite graph (independent sets of red and blue nodes) using a KL divergence criterion. For each partite of nodes (say red), we first define surface normal of each red node using 3D coordinates of neighboring blue nodes, so that red node normals n can be written as a linear function of red node coordinates p. We then formulate a convex optimization problem, with a quadratic fidelity term ||p-q||_2^2 given noisy observed red coordinates q and a graph total variation (GTV) regularization term for surface normals of neighboring red nodes. We minimize the resulting l2-l1-norm using alternating direction method of multipliers (ADMM) and proximal gradient descent. The two partites of nodes are alternately optimized until convergence. Experimental results show that compared to state-of-the-art schemes with similar complexity, our proposed algorithm achieves the best overall denoising performance objectively and subjectively.
Submitted 28 April, 2018;
originally announced April 2018.
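The objective described in the abstract above combines a quadratic fidelity term with a graph total variation (GTV) term on surface normals. A minimal sketch of evaluating such an objective is given below. It assumes that GTV is the weighted sum of l1 differences between normals on connected nodes and that the two terms are balanced by a scalar `gamma`; the function names, the edge-list format, and the weighting are hypothetical, and the sketch takes the normals as given rather than deriving them from coordinates as the paper does.

```python
def graph_total_variation(normals, edges):
    """Weighted l1 variation of per-node normals over graph edges.

    normals: list of 3-tuples, one surface normal per node.
    edges: list of (i, j, weight) tuples connecting node indices.
    """
    total = 0.0
    for i, j, weight in edges:
        diff = sum(abs(a - b) for a, b in zip(normals[i], normals[j]))
        total += weight * diff
    return total


def denoising_objective(points, observed, normals, edges, gamma):
    """Quadratic fidelity ||p - q||_2^2 plus gamma times the GTV term."""
    fidelity = sum(
        (a - b) ** 2
        for p, q in zip(points, observed)
        for a, b in zip(p, q)
    )
    return fidelity + gamma * graph_total_variation(normals, edges)
```

Evaluating the objective this way is only useful for checking a candidate solution; the minimization itself, which the abstract attributes to ADMM and proximal gradient descent, is not shown.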
-
Adaptive Non-Rigid Inpainting of 3D Point Cloud Geometry
Authors:
Chinthaka Dinesh,
Ivan V. Bajic,
Gene Cheung
Abstract:
In this letter, we introduce several algorithms for geometry inpainting of 3D point clouds with large holes. The algorithms are exemplar-based: hole filling is performed iteratively using templates near the hole boundary to find the best matching regions elsewhere in the cloud, from where existing points are transferred to the hole. We propose two improvements over the previous work on exemplar-based hole filling. The first one is adaptive template size selection in each iteration, which simultaneously leads to higher accuracy and lower execution time. The second improvement is a non-rigid transformation to better align the candidate set of points with the template before the point transfer, which leads to even higher accuracy. We demonstrate the algorithms' ability to fill holes that are difficult or impossible to fill by existing methods.
Submitted 27 April, 2018;
originally announced April 2018.
-
Near-Lossless Deep Feature Compression for Collaborative Intelligence
Authors:
Hyomin Choi,
Ivan V. Bajic
Abstract:
Collaborative intelligence is a new paradigm for efficient deployment of deep neural networks across the mobile-cloud infrastructure. By dividing the network between the mobile and the cloud, it is possible to distribute the computational workload such that the overall energy and/or latency of the system is minimized. However, this necessitates sending deep feature data from the mobile to the cloud in order to perform inference. In this work, we examine the differences between the deep feature data and natural image data, and propose a simple and effective near-lossless deep feature compressor. The proposed method achieves up to 5% bit rate reduction compared to HEVC-Intra and even more against other popular image codecs. Finally, we suggest an approach for reconstructing the input image from compressed deep features in the cloud, that could serve to supplement the inference performed by the deep model.
Submitted 15 June, 2018; v1 submitted 26 April, 2018;
originally announced April 2018.
-
Deep feature compression for collaborative object detection
Authors:
Hyomin Choi,
Ivan V. Bajic
Abstract:
Recent studies have shown that the efficiency of deep neural networks in mobile applications can be significantly improved by distributing the computational workload between the mobile device and the cloud. This paradigm, termed collaborative intelligence, involves communicating feature data between the mobile and the cloud. The efficiency of such an approach can be further improved by lossy compression of feature data, which has not been examined to date. In this work we focus on collaborative object detection and study the impact of both near-lossless and lossy compression of feature data on its accuracy. We also propose a strategy for improving the accuracy under lossy feature compression. Experiments indicate that using this strategy, the communication overhead can be reduced by up to 70% without sacrificing accuracy.
Submitted 12 February, 2018;
originally announced February 2018.
-
High efficiency compression for object detection
Authors:
Hyomin Choi,
Ivan V. Bajic
Abstract:
Image and video compression has traditionally been tailored to human vision. However, modern applications such as visual analytics and surveillance rely on computers seeing and analyzing the images before (or instead of) humans. For these applications, it is important to adjust compression to computer vision. In this paper we present a bit allocation and rate control strategy that is tailored to object detection. Using the initial convolutional layers of a state-of-the-art object detector, we create an importance map that can guide bit allocation to areas that are important for object detection. The proposed method enables bit rate savings of 7% or more compared to default HEVC, at the equivalent object detection rate.
Submitted 15 February, 2018; v1 submitted 30 October, 2017;
originally announced October 2017.
-
Can you find a face in a HEVC bitstream?
Authors:
Saeed Ranjbar Alvar,
Hyomin Choi,
Ivan V. Bajic
Abstract:
Finding faces in images is one of the most important tasks in computer vision, with applications in biometrics, surveillance, human-computer interaction, and other areas. In our earlier work, we demonstrated that it is possible to tell whether or not an image contains a face by only examining the HEVC syntax, without fully reconstructing the image. In the present work, we move further in this direction by showing how to localize faces in HEVC-coded images, without full reconstruction. We also demonstrate the benefits that such an approach can have in privacy-friendly face localization.
Submitted 23 February, 2018; v1 submitted 29 October, 2017;
originally announced October 2017.
-
Can you tell a face from a HEVC bitstream?
Authors:
Saeed Ranjbar Alvar,
Hyomin Choi,
Ivan V. Bajic
Abstract:
Image and video analytics are being increasingly used on a massive scale. Not only is the amount of data growing, but the complexity of the data processing pipelines is also increasing, thereby exacerbating the problem. It is becoming increasingly important to save computational resources wherever possible. We focus on one of the poster problems of visual analytics -- face detection -- and approach the issue of reducing the computation by asking: Is it possible to detect a face without full image reconstruction from the High Efficiency Video Coding (HEVC) bitstream? We demonstrate that this is indeed possible, with accuracy comparable to conventional face detection, by training a Convolutional Neural Network on the output of the HEVC entropy decoder.
Submitted 9 September, 2017;
originally announced September 2017.
-
Compressed-domain visual saliency models: A comparative study
Authors:
Sayed Hossein Khatoonabadi,
Ivan V. Bajic,
Yufeng Shan
Abstract:
Computational modeling of visual saliency has become an important research problem in recent years, with applications in video quality estimation, video compression, object tracking, retargeting, summarization, and so on. While most visual saliency models for dynamic scenes operate on raw video, several models have been developed for use with compressed-domain information such as motion vectors and transform coefficients. This paper presents a comparative study of eleven such models as well as two high-performing pixel-domain saliency models on two eye-tracking datasets using several comparison metrics. The results indicate that highly accurate saliency estimation is possible based only on a partially decoded video bitstream. The strategies that have shown success in compressed-domain saliency modeling are highlighted, and certain challenges are identified as potential avenues for further improvement.
Submitted 25 April, 2016;
originally announced April 2016.
-
Load Disaggregation Based on Aided Linear Integer Programming
Authors:
Md. Zulfiquar Ali Bhotto,
Stephen Makonin,
Ivan V. Bajic
Abstract:
Load disaggregation based on aided linear integer programming (ALIP) is proposed. We start with a conventional linear integer programming (IP) based disaggregation and enhance it in several ways. The enhancements include additional constraints, correction based on a state diagram, median filtering, and linear programming-based refinement. With the aid of these enhancements, the performance of IP-based disaggregation is significantly improved. The proposed ALIP system relies only on the instantaneous load samples instead of waveform signatures, and hence does not crucially depend on high sampling frequency. Experimental results show that the proposed ALIP system performs better than the conventional IP-based load disaggregation system.
Submitted 30 August, 2016; v1 submitted 23 March, 2016;
originally announced March 2016.