-
A Study on the Effect of Color Spaces in Learned Image Compression
Authors:
Srivatsa Prativadibhayankaram,
Mahadev Prasad Panda,
Jürgen Seiler,
Thomas Richter,
Heiko Sparenberg,
Siegfried Fößel,
André Kaup
Abstract:
In this work, we present a comparison between color spaces namely YUV, LAB, RGB and their effect on learned image compression. For this we use the structure and color based learned image codec (SLIC) from our prior work, which consists of two branches - one for the luminance component (Y or L) and another for chrominance components (UV or AB). However, for the RGB variant we input all 3 channels i…
▽ More
In this work, we present a comparison between color spaces namely YUV, LAB, RGB and their effect on learned image compression. For this we use the structure and color based learned image codec (SLIC) from our prior work, which consists of two branches - one for the luminance component (Y or L) and another for chrominance components (UV or AB). However, for the RGB variant we input all 3 channels in a single branch, similar to most learned image codecs operating in RGB. The models are trained for multiple bitrate configurations in each color space. We report the findings from our experiments by evaluating them on various datasets and compare the results to state-of-the-art image codecs. The YUV model performs better than the LAB variant in terms of MS-SSIM with a Bjøntegaard delta bitrate (BD-BR) gain of 7.5\% using VTM intra-coding mode as the baseline. Whereas the LAB variant has a better performance than YUV model in terms of CIEDE2000 having a BD-BR gain of 8\%. Overall, the RGB variant of SLIC achieves the best performance with a BD-BR gain of 13.14\% in terms of MS-SSIM and a gain of 17.96\% in CIEDE2000 at the cost of a higher model complexity.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Achieving RGB-D level Segmentation Performance from a Single ToF Camera
Authors:
Pranav Sharma,
Jigyasa Singh Katrolia,
Jason Rambach,
Bruno Mirbach,
Didier Stricker,
Juergen Seiler
Abstract:
Depth is a very important modality in computer vision, typically used as complementary information to RGB, provided by RGB-D cameras. In this work, we show that it is possible to obtain the same level of accuracy as RGB-D cameras on a semantic segmentation task using infrared (IR) and depth images from a single Time-of-Flight (ToF) camera. In order to fuse the IR and depth modalities of the ToF ca…
▽ More
Depth is a very important modality in computer vision, typically used as complementary information to RGB, provided by RGB-D cameras. In this work, we show that it is possible to obtain the same level of accuracy as RGB-D cameras on a semantic segmentation task using infrared (IR) and depth images from a single Time-of-Flight (ToF) camera. In order to fuse the IR and depth modalities of the ToF camera, we introduce a method utilizing depth-specific convolutions in a multi-task learning framework. In our evaluation on an in-car segmentation dataset, we demonstrate the competitiveness of our method against the more costly RGB-D approaches.
△ Less
Submitted 30 June, 2023;
originally announced June 2023.
-
End-to-End Lidar-Camera Self-Calibration for Autonomous Vehicles
Authors:
Arya Rachman,
Jürgen Seiler,
André Kaup
Abstract:
Autonomous vehicles are equipped with a multi-modal sensor setup to enable the car to drive safely. The initial calibration of such perception sensors is a highly matured topic and is routinely done in an automated factory environment. However, an intriguing question arises on how to maintain the calibration quality throughout the vehicle's operating duration. Another challenge is to calibrate mul…
▽ More
Autonomous vehicles are equipped with a multi-modal sensor setup to enable the car to drive safely. The initial calibration of such perception sensors is a highly matured topic and is routinely done in an automated factory environment. However, an intriguing question arises on how to maintain the calibration quality throughout the vehicle's operating duration. Another challenge is to calibrate multiple sensors jointly to ensure no propagation of systemic errors. In this paper, we propose CaLiCa, an end-to-end deep self-calibration network which addresses the automatic calibration problem for pinhole camera and Lidar. We jointly predict the camera intrinsic parameters (focal length and distortion) as well as Lidar-Camera extrinsic parameters (rotation and translation), by regressing feature correlation between the camera image and the Lidar point cloud. The network is arranged in a Siamese-twin structure to constrain the network features learning to a mutually shared feature in both point cloud and camera (Lidar-camera constraint). Evaluation using KITTI datasets shows that we achieve 0.154 ° and 0.059 m accuracy with a reprojection error of 0.028 pixel with a single-pass inference. We also provide an ablative study of how our end-to-end learning architecture offers lower terminal loss (21% decrease in rotation loss) compared to isolated calibration
△ Less
Submitted 27 April, 2023; v1 submitted 24 April, 2023;
originally announced April 2023.
-
A hybrid motion estimation technique for fisheye video sequences based on equisolid re-projection
Authors:
Andrea Eichenseer,
Michel Bätz,
Jürgen Seiler,
André Kaup
Abstract:
Capturing large fields of view with only one camera is an important aspect in surveillance and automotive applications, but the wide-angle fisheye imagery thus obtained exhibits very special characteristics that may not be very well suited for typical image and video processing methods such as motion estimation. This paper introduces a motion estimation method that adapts to the typical radial cha…
▽ More
Capturing large fields of view with only one camera is an important aspect in surveillance and automotive applications, but the wide-angle fisheye imagery thus obtained exhibits very special characteristics that may not be very well suited for typical image and video processing methods such as motion estimation. This paper introduces a motion estimation method that adapts to the typical radial characteristics of fisheye video sequences by making use of an equisolid re-projection after moving part of the motion vector search into the perspective domain via a corresponding back-projection. By combining this approach with conventional translational motion estimation and compensation, average gains in luminance PSNR of up to 1.14 dB are achieved for synthetic fish-eye sequences and up to 0.96 dB for real-world data. Maximum gains for selected frame pairs amount to 2.40 dB and 1.39 dB for synthetic and real-world data, respectively.
△ Less
Submitted 30 November, 2022;
originally announced November 2022.
-
On Benefits and Challenges of Conditional Interframe Video Coding in Light of Information Theory
Authors:
Fabian Brand,
Jürgen Seiler,
André Kaup
Abstract:
The rise of variational autoencoders for image and video compression has opened the door to many elaborate coding techniques. One example here is the possibility of conditional interframe coding. Here, instead of transmitting the residual between the original frame and the predicted frame (often obtained by motion compensation), the current frame is transmitted under the condition of knowing the p…
▽ More
The rise of variational autoencoders for image and video compression has opened the door to many elaborate coding techniques. One example here is the possibility of conditional interframe coding. Here, instead of transmitting the residual between the original frame and the predicted frame (often obtained by motion compensation), the current frame is transmitted under the condition of knowing the prediction signal. In practice, conditional coding can be straightforwardly implemented using a conditional autoencoder, which has also shown good results in recent works. In this paper, we provide an information theoretical analysis of conditional coding for inter frames and show in which cases gains compared to traditional residual coding can be expected. We also show the effect of information bottlenecks which can occur in practical video coders in the prediction signal path due to the network structure, as a consequence of the data-processing theorem or due to quantization. We demonstrate that conditional coding has theoretical benefits over residual coding but that there are cases in which the benefits are quickly canceled by small information bottlenecks of the prediction signal.
△ Less
Submitted 13 December, 2022; v1 submitted 14 October, 2022;
originally announced October 2022.
-
3D Rendering Framework for Data Augmentation in Optical Character Recognition
Authors:
Andreas Spruck,
Maximiliane Hawesch,
Anatol Maier,
Christian Riess,
Jürgen Seiler,
André Kaup
Abstract:
In this paper, we propose a data augmentation framework for Optical Character Recognition (OCR). The proposed framework is able to synthesize new viewing angles and illumination scenarios, effectively enriching any available OCR dataset. Its modular structure allows to be modified to match individual user requirements. The framework enables to comfortably scale the enlargement factor of the availa…
▽ More
In this paper, we propose a data augmentation framework for Optical Character Recognition (OCR). The proposed framework is able to synthesize new viewing angles and illumination scenarios, effectively enriching any available OCR dataset. Its modular structure allows to be modified to match individual user requirements. The framework enables to comfortably scale the enlargement factor of the available dataset. Furthermore, the proposed method is not restricted to single frame OCR but can also be applied to video OCR. We demonstrate the performance of our framework by augmenting a 15% subset of the common Brno Mobile OCR dataset. Our proposed framework is capable of leveraging the performance of OCR applications especially for small datasets. Applying the proposed method, improvements of up to 2.79 percentage points in terms of Character Error Rate (CER), and up to 7.88 percentage points in terms of Word Error Rate (WER) are achieved on the subset. Especially the recognition of challenging text lines can be improved. The CER may be decreased by up to 14.92 percentage points and the WER by up to 18.19 percentage points for this class. Moreover, we are able to achieve smaller error rates when training on the 15% subset augmented with the proposed method than on the original non-augmented full dataset.
△ Less
Submitted 27 September, 2022;
originally announced September 2022.
-
Synthesizing Annotated Image and Video Data Using a Rendering-Based Pipeline for Improved License Plate Recognition
Authors:
Andreas Spruck,
Maximilane Gruber,
Anatol Maier,
Denise Moussa,
Jürgen Seiler,
Christian Riess,
André Kaup
Abstract:
An insufficient number of training samples is a common problem in neural network applications. While data augmentation methods require at least a minimum number of samples, we propose a novel, rendering-based pipeline for synthesizing annotated data sets. Our method does not modify existing samples but synthesizes entirely new samples. The proposed rendering-based pipeline is capable of generating…
▽ More
An insufficient number of training samples is a common problem in neural network applications. While data augmentation methods require at least a minimum number of samples, we propose a novel, rendering-based pipeline for synthesizing annotated data sets. Our method does not modify existing samples but synthesizes entirely new samples. The proposed rendering-based pipeline is capable of generating and annotating synthetic and partly-real image and video data in a fully automatic procedure. Moreover, the pipeline can aid the acquisition of real data. The proposed pipeline is based on a rendering process. This process generates synthetic data. Partly-real data bring the synthetic sequences closer to reality by incorporating real cameras during the acquisition process. The benefits of the proposed data generation pipeline, especially for machine learning scenarios with limited available training data, are demonstrated by an extensive experimental validation in the context of automatic license plate recognition. The experiments demonstrate a significant reduction of the character error rate and miss rate from 73.74% and 100% to 14.11% and 41.27% respectively, compared to an OCR algorithm trained on a real data set solely. These improvements are achieved by training the algorithm on synthesized data solely. When additionally incorporating real data, the error rates can be decreased further. Thereby, the character error rate and miss rate can be reduced to 11.90% and 39.88% respectively. All data used during the experiments as well as the proposed rendering-based pipeline for the automated data generation is made publicly available under (URL will be revealed upon publication).
△ Less
Submitted 28 September, 2022;
originally announced September 2022.
-
Forensic License Plate Recognition with Compression-Informed Transformers
Authors:
Denise Moussa,
Anatol Maier,
Andreas Spruck,
Jürgen Seiler,
Christian Riess
Abstract:
Forensic license plate recognition (FLPR) remains an open challenge in legal contexts such as criminal investigations, where unreadable license plates (LPs) need to be deciphered from highly compressed and/or low resolution footage, e.g., from surveillance cameras. In this work, we propose a side-informed Transformer architecture that embeds knowledge on the input compression level to improve reco…
▽ More
Forensic license plate recognition (FLPR) remains an open challenge in legal contexts such as criminal investigations, where unreadable license plates (LPs) need to be deciphered from highly compressed and/or low resolution footage, e.g., from surveillance cameras. In this work, we propose a side-informed Transformer architecture that embeds knowledge on the input compression level to improve recognition under strong compression. We show the effectiveness of Transformers for license plate recognition (LPR) on a low-quality real-world dataset. We also provide a synthetic dataset that includes strongly degraded, illegible LP images and analyze the impact of knowledge embedding on it. The network outperforms existing FLPR methods and standard state-of-the art image recognition models while requiring less parameters. For the severest degraded images, we can improve recognition by up to 8.9 percent points.
△ Less
Submitted 3 May, 2024; v1 submitted 29 July, 2022;
originally announced July 2022.
-
Reusing the H.264/AVC deblocking filter for efficient spatio-temporal prediction in video coding
Authors:
Jürgen Seiler,
André Kaup
Abstract:
The prediction step is a very important part of hybrid video codecs for effectively compressing video sequences. While existing video codecs predict either in temporal or in spatial direction only, the compression efficiency can be increased by a combined spatio-temporal prediction. In this paper we propose an algorithm for reusing the H.264/AVC deblocking filter for spatio-temporal prediction. Re…
▽ More
The prediction step is a very important part of hybrid video codecs for effectively compressing video sequences. While existing video codecs predict either in temporal or in spatial direction only, the compression efficiency can be increased by a combined spatio-temporal prediction. In this paper we propose an algorithm for reusing the H.264/AVC deblocking filter for spatio-temporal prediction. Reusing this highly op timized filter allows for a very low computational complexity of this prediction mode and an average rate reduction of up to 7.2% can be achieved.
△ Less
Submitted 4 July, 2022;
originally announced July 2022.
-
Motion Compensated Frequency Selective Extrapolation for Error Concealment in Video Coding
Authors:
Jürgen Seiler,
André Kaup
Abstract:
Although wireless and IP-based access to video content gives a new degree of freedom to the viewers, the risk of severe block losses caused by transmission errors is always present. The purpose of this paper is to present a new method for concealing block losses in erroneously received video sequences. For this, a motion compensated data set is generated around the lost block. Based on this aligne…
▽ More
Although wireless and IP-based access to video content gives a new degree of freedom to the viewers, the risk of severe block losses caused by transmission errors is always present. The purpose of this paper is to present a new method for concealing block losses in erroneously received video sequences. For this, a motion compensated data set is generated around the lost block. Based on this aligned data set, a model of the signal is created that continues the signal into the lost areas. Since spatial as well as temporal informations are used for the model generation, the proposed method is superior to methods that use either spatial or temporal information for concealment. Furthermore it outperforms current state of the art spatio-temporal concealment algorithms by up to 1.4 dB in PSNR.
△ Less
Submitted 1 July, 2022;
originally announced July 2022.
-
Scalable Kernel-Based Minimum Mean Square Error Estimator for Accelerated Image Error Concealment
Authors:
Ján Koloda,
Jürgen Seiler,
Antonio M. Peinado,
André Kaup
Abstract:
Error concealment is of great importance for block-based video systems, such as DVB or video streaming services. In this paper, we propose a novel scalable spatial error concealment algorithm that aims at obtaining high quality reconstructions with reduced computational burden. The proposed technique exploits the excellent reconstructing abilities of the kernel-based minimum mean square error K-MM…
▽ More
Error concealment is of great importance for block-based video systems, such as DVB or video streaming services. In this paper, we propose a novel scalable spatial error concealment algorithm that aims at obtaining high quality reconstructions with reduced computational burden. The proposed technique exploits the excellent reconstructing abilities of the kernel-based minimum mean square error K-MMSE estimator. We propose to decompose this approach into a set of hierarchically stacked layers. The first layer performs the basic reconstruction that the subsequent layers can eventually refine. In addition, we design a layer management mechanism, based on profiles, that dynamically adapts the use of higher layers to the visual complexity of the area being reconstructed. The proposed technique outperforms other state-of-the-art algorithms and produces high quality reconstructions, equivalent to K-MMSE, while requiring around one tenth of its computational time.
△ Less
Submitted 23 May, 2022;
originally announced May 2022.
-
Denoising-based image reconstruction from pixels located at non-integer positions
Authors:
Ján Koloda,
Jürgen Seiler,
André Kaup
Abstract:
Digital images are commonly represented as regular 2D arrays, so pixels are organized in form of a matrix addressed by integers. However, there are many image processing operations, such as rotation or motion compensation, that produce pixels at non-integer positions. Typically, image reconstruction techniques cannot handle samples at non-integer positions. In this paper, we propose to use triangu…
▽ More
Digital images are commonly represented as regular 2D arrays, so pixels are organized in form of a matrix addressed by integers. However, there are many image processing operations, such as rotation or motion compensation, that produce pixels at non-integer positions. Typically, image reconstruction techniques cannot handle samples at non-integer positions. In this paper, we propose to use triangulation-based reconstruction as initial estimate that is later refined by a novel adaptive denoising framework. Simulations reveal that improvements of up to more than 1.8 dB (in terms of PSNR) are achieved with respect to the initial estimate.
△ Less
Submitted 23 May, 2022;
originally announced May 2022.
-
Reliability-based Mesh-to-Grid Image Reconstruction
Authors:
Ján Koloda,
Jürgen Seiler,
André Kaup
Abstract:
This paper presents a novel method for the reconstruction of images from samples located at non-integer positions, called mesh. This is a common scenario for many image processing applications, such as super-resolution, warping or virtual view generation in multi-camera systems. The proposed method relies on a set of initial estimates that are later refined by a new reliability-based content-adapt…
▽ More
This paper presents a novel method for the reconstruction of images from samples located at non-integer positions, called mesh. This is a common scenario for many image processing applications, such as super-resolution, warping or virtual view generation in multi-camera systems. The proposed method relies on a set of initial estimates that are later refined by a new reliability-based content-adaptive framework that employs denoising in order to reduce the reconstruction error. The reliability of the initial estimate is computed so stronger denoising is applied to less reliable estimates. The proposed technique can improve the reconstruction quality by more than 2 dB (in terms of PSNR) with respect to the initial estimate and it outperforms the state-of-the-art denoising-based refinement by up to 0.7 dB.
△ Less
Submitted 20 May, 2022;
originally announced May 2022.
-
Frequency selective extrapolation with residual filtering for image error concealment
Authors:
Ján Koloda,
Jürgen Seiler,
André Kaup,
Victoria Sánchez,
Antonio M. Peinado
Abstract:
The purpose of signal extrapolation is to estimate unknown signal parts from known samples. This task is especially important for error concealment in image and video communication. For obtaining a high quality reconstruction, assumptions have to be made about the underlying signal in order to solve this underdetermined problem. Among existent reconstruction algorithms, frequency selective extrapo…
▽ More
The purpose of signal extrapolation is to estimate unknown signal parts from known samples. This task is especially important for error concealment in image and video communication. For obtaining a high quality reconstruction, assumptions have to be made about the underlying signal in order to solve this underdetermined problem. Among existent reconstruction algorithms, frequency selective extrapolation (FSE) achieves high performance by assuming that image signals can be sparsely represented in the frequency domain. However, FSE does not take into account the low-pass behaviour of natural images. In this paper, we propose a modified FSE that takes this prior knowledge into account for the modelling, yielding significant PSNR gains.
△ Less
Submitted 16 May, 2022;
originally announced May 2022.
-
Declipping of Speech Signals Using Frequency Selective Extrapolation
Authors:
Markus Jonscher,
Jürgen Seiler,
André Kaup
Abstract:
The reconstruction of clipped speech signals is an important task in audio signal processing to achieve an enhanced audio quality for further processing. In this paper, Frequency Selective Extrapolation (FSE), which is commonly used for error concealment or the reconstruction of incomplete image data, is adapted to be able to restore audio signals which are distorted from clipping. For this, FSE g…
▽ More
The reconstruction of clipped speech signals is an important task in audio signal processing to achieve an enhanced audio quality for further processing. In this paper, Frequency Selective Extrapolation (FSE), which is commonly used for error concealment or the reconstruction of incomplete image data, is adapted to be able to restore audio signals which are distorted from clipping. For this, FSE generates a model of the signal as an iterative superposition of Fourier basis functions. Clipped samples can then be replaced by estimated samples from the model. The performance of the proposed algorithm is evaluated by using different speech test data sets. Compared to other state-of-the-art declipping algorithms, this leads to a maximum gain in SNR of up to 3:5 dB and an average gain of 1 dB.
△ Less
Submitted 7 April, 2022;
originally announced April 2022.
-
Novel Consistency Check For Fast Recursive Reconstruction Of Non-Regularly Sampled Video Data
Authors:
Simon Grosche,
Jürgen Seiler,
André Kaup
Abstract:
Quarter sampling is a novel sensor design that allows for an acquisition of higher resolution images without increasing the number of pixels. When being used for video data, one out of four pixels is measured in each frame. Effectively, this leads to a non-regular spatio-temporal sub-sampling. Compared to purely spatial or temporal sub-sampling, this allows for an increased reconstruction quality,…
▽ More
Quarter sampling is a novel sensor design that allows for an acquisition of higher resolution images without increasing the number of pixels. When being used for video data, one out of four pixels is measured in each frame. Effectively, this leads to a non-regular spatio-temporal sub-sampling. Compared to purely spatial or temporal sub-sampling, this allows for an increased reconstruction quality, as aliasing artifacts can be reduced. For the fast reconstruction of such sensor data with a fixed mask, recursive variant of frequency selective reconstruction (FSR) was proposed. Here, pixels measured in previous frames are projected into the current frame to support its reconstruction. In doing so, the motion between the frames is computed using template matching. Since some of the motion vectors may be erroneous, it is important to perform a proper consistency checking. In this paper, we propose faster consistency checking methods as well as a novel recursive FSR that uses the projected pixels different than in literature and can handle dynamic masks. Altogether, we are able to significantly increase the reconstruction quality by + 1.01 dB compared to the state-of-the-art recursive reconstruction method using a fixed mask. Compared to a single frame reconstruction, an average gain of about + 1.52 dB is achieved for dynamic masks. At the same time, the computational complexity of the consistency checks is reduced by a factor of 13 compared to the literature algorithm.
△ Less
Submitted 17 March, 2022;
originally announced March 2022.
-
Generalized Difference Coder: A Novel Conditional Autoencoder Structure for Video Compression
Authors:
Fabian Brand,
Jürgen Seiler,
André Kaup
Abstract:
Motion compensated inter prediction is a common component of all video coders. The concept was established in traditional hybrid coding and successfully transferred to learning-based video compression. To compress the residual signal after prediction, usually the difference of the two signals is compressed using a standard autoencoder. However, information theory tells us that a general conditiona…
▽ More
Motion compensated inter prediction is a common component of all video coders. The concept was established in traditional hybrid coding and successfully transferred to learning-based video compression. To compress the residual signal after prediction, usually the difference of the two signals is compressed using a standard autoencoder. However, information theory tells us that a general conditional coder is more efficient. In this paper, we provide a solid foundation based on information theory and Shannon entropy to show the potentials but also the limits of conditional coding. Building on those results, we then propose the generalized difference coder, a special case of a conditional coder designed to avoid limiting bottlenecks. With this coder, we are able to achieve average rate savings of 27.8% compared to a standard autoencoder, by only adding a moderate complexity overhead of less than 7%.
△ Less
Submitted 15 December, 2021;
originally announced December 2021.
-
Image Super-Resolution Using T-Tetromino Pixels
Authors:
Simon Grosche,
Andy Regensky,
Jürgen Seiler,
André Kaup
Abstract:
For modern high-resolution imaging sensors, pixel binning is performed in low-lighting conditions and in case high frame rates are required. To recover the original spatial resolution, single-image super-resolution techniques can be applied for upscaling. To achieve a higher image quality after upscaling, we propose a novel binning concept using tetromino-shaped pixels. It is embedded into the fie…
▽ More
For modern high-resolution imaging sensors, pixel binning is performed in low-lighting conditions and in case high frame rates are required. To recover the original spatial resolution, single-image super-resolution techniques can be applied for upscaling. To achieve a higher image quality after upscaling, we propose a novel binning concept using tetromino-shaped pixels. It is embedded into the field of compressed sensing and the coherence is calculated to motivate the sensor layouts used. Next, we investigate the reconstruction quality using tetromino pixels for the first time in literature. Instead of using different types of tetrominoes as proposed elsewhere, we show that using a small repeating cell consisting of only four T-tetrominoes is sufficient. For reconstruction, we use a locally fully connected reconstruction (LFCR) network as well as two classical reconstruction methods from the field of compressed sensing. Using the LFCR network in combination with the proposed tetromino layout, we achieve superior image quality in terms of PSNR, SSIM, and visually compared to conventional single-image super-resolution using the very deep super-resolution (VDSR) network. For PSNR, a gain of up to \SI[retain-explicit-plus]{+1.92}{dB} is achieved.
△ Less
Submitted 22 March, 2023; v1 submitted 17 November, 2021;
originally announced November 2021.
-
Incorporating Prior Information in Compressive Online Robust Principal Component Analysis
Authors:
Huynh Van Luong,
Nikos Deligiannis,
Jurgen Seiler,
Soren Forchhammer,
Andre Kaup
Abstract:
We consider an online version of the robust Principle Component Analysis (PCA), which arises naturally in time-varying source separations such as video foreground-background separation. This paper proposes a compressive online robust PCA with prior information for recursively separating a sequences of frames into sparse and low-rank components from a small set of measurements. In contrast to conve…
▽ More
We consider an online version of the robust Principle Component Analysis (PCA), which arises naturally in time-varying source separations such as video foreground-background separation. This paper proposes a compressive online robust PCA with prior information for recursively separating a sequences of frames into sparse and low-rank components from a small set of measurements. In contrast to conventional batch-based PCA, which processes all the frames directly, the proposed method processes measurements taken from each frame. Moreover, this method can efficiently incorporate multiple prior information, namely previous reconstructed frames, to improve the separation and thereafter, update the prior information for the next frame. We utilize multiple prior information by solving $n\text{-}\ell_{1}$ minimization for incorporating the previous sparse components and using incremental singular value decomposition ($\mathrm{SVD}$) for exploiting the previous low-rank components. We also establish theoretical bounds on the number of measurements required to guarantee successful separation under assumptions of static or slowly-changing low-rank components. Using numerical experiments, we evaluate our bounds and the performance of the proposed algorithm. In addition, we apply the proposed algorithm to online video foreground and background separation from compressive measurements. Experimental results show that the proposed method outperforms the existing methods.
△ Less
Submitted 27 May, 2017; v1 submitted 24 January, 2017;
originally announced January 2017.
-
Sparse Signal Reconstruction with Multiple Side Information using Adaptive Weights for Multiview Sources
Authors:
Huynh Van Luong,
Jürgen Seiler,
André Kaup,
Søren Forchhammer
Abstract:
This work considers reconstructing a target signal in a context of distributed sparse sources. We propose an efficient reconstruction algorithm with the aid of other given sources as multiple side information (SI). The proposed algorithm takes advantage of compressive sensing (CS) with SI and adaptive weights by solving a proposed weighted $n$-$\ell_{1}$ minimization. The proposed algorithm comput…
▽ More
This work considers reconstructing a target signal in a context of distributed sparse sources. We propose an efficient reconstruction algorithm with the aid of other given sources as multiple side information (SI). The proposed algorithm takes advantage of compressive sensing (CS) with SI and adaptive weights by solving a proposed weighted $n$-$\ell_{1}$ minimization. The proposed algorithm computes the adaptive weights in two levels, first each individual intra-SI and then inter-SI weights are iteratively updated at every reconstructed iteration. This two-level optimization leads the proposed reconstruction algorithm with multiple SI using adaptive weights (RAMSIA) to robustly exploit the multiple SIs with different qualities. We experimentally perform our algorithm on generated sparse signals and also correlated feature histograms as multiview sparse sources from a multiview image database. The results show that RAMSIA significantly outperforms both classical CS and CS with single SI, and RAMSIA with higher number of SIs gained more than the one with smaller number of SIs.
△ Less
Submitted 22 May, 2016;
originally announced May 2016.
-
Measurement Bounds for Sparse Signal Reconstruction with Multiple Side Information
Authors:
Huynh Van Luong,
Jurgen Seiler,
Andre Kaup,
Soren Forchhammer,
Nikos Deligiannis
Abstract:
In the context of compressed sensing (CS), this paper considers the problem of reconstructing sparse signals with the aid of other given correlated sources as multiple side information. To address this problem, we theoretically study a generic \textcolor{black}{weighted $n$-$\ell_{1}$ minimization} framework and propose a reconstruction algorithm that leverages multiple side information signals (R…
▽ More
In the context of compressed sensing (CS), this paper considers the problem of reconstructing sparse signals with the aid of other given correlated sources as multiple side information. To address this problem, we theoretically study a generic \textcolor{black}{weighted $n$-$\ell_{1}$ minimization} framework and propose a reconstruction algorithm that leverages multiple side information signals (RAMSI). The proposed RAMSI algorithm computes adaptively optimal weights among the side information signals at every reconstruction iteration. In addition, we establish theoretical bounds on the number of measurements that are required to successfully reconstruct the sparse source by using \textcolor{black}{weighted $n$-$\ell_{1}$ minimization}. The analysis of the established bounds reveal that \textcolor{black}{weighted $n$-$\ell_{1}$ minimization} can achieve sharper bounds and significant performance improvements compared to classical CS. We evaluate experimentally the proposed RAMSI algorithm and the established bounds using synthetic sparse signals as well as correlated feature histograms, extracted from a multiview image database for object recognition. The obtained results show clearly that the proposed algorithm outperforms state-of-the-art algorithms---\textcolor{black}{including classical CS, $\ell_1\text{-}\ell_1$ minimization, Modified-CS, regularized Modified-CS, and weighted $\ell_1$ minimization}---in terms of both the theoretical bounds and the practical performance.
△ Less
Submitted 18 January, 2017; v1 submitted 10 May, 2016;
originally announced May 2016.
-
Concept for a CMOS Image Sensor Suited for Analog Image Pre-Processing
Authors:
Lan Shi,
Christopher Soell,
Andreas Baenisch,
Robert Weigel,
Jürgen Seiler,
Thomas Ussmueller
Abstract:
A concept for a novel CMOS image sensor suited for analog image pre-processing is presented in this paper. As an example, an image restoration algorithm for reducing image noise is applied as image pre-processing in the analog domain. To supply low-latency data input for analog image preprocessing, the proposed concept for a CMOS image sensor offers a new sensor signal acquisition method in 2D. In…
▽ More
A concept for a novel CMOS image sensor suited for analog image pre-processing is presented in this paper. As an example, an image restoration algorithm for reducing image noise is applied as image pre-processing in the analog domain. To supply low-latency data input for analog image preprocessing, the proposed concept for a CMOS image sensor offers a new sensor signal acquisition method in 2D. In comparison to image pre-processing in the digital domain, the proposed analog image pre-processing promises an improved image quality. Furthermore, the image noise at the stage of analog sensor signal acquisition can be used to select the most effective restoration algorithm applied to the analog circuit due to image processing prior to the A/D converter.
△ Less
Submitted 26 February, 2015;
originally announced February 2015.