Search | arXiv e-print repository

doi 10.1109/CVPRW63382.2024.00582

Deep Video Codec Control for Vision Models

Authors: Christoph Reich, Biplob Debnath, Deep Patel, Tim Prangemeier, Daniel Cremers, Srimat Chakradhar

Abstract: Standardized lossy video coding is at the core of almost all real-world video processing pipelines. Rate control is used to enable standard codecs to adapt to different network bandwidth conditions or storage constraints. However, standard video codecs (e.g., H.264) and their rate control modules aim to minimize video distortion w.r.t. human quality assessment. We demonstrate empirically that stan… ▽ More Standardized lossy video coding is at the core of almost all real-world video processing pipelines. Rate control is used to enable standard codecs to adapt to different network bandwidth conditions or storage constraints. However, standard video codecs (e.g., H.264) and their rate control modules aim to minimize video distortion w.r.t. human quality assessment. We demonstrate empirically that standard-coded videos vastly deteriorate the performance of deep vision models. To overcome the deterioration of vision performance, this paper presents the first end-to-end learnable deep video codec control that considers both bandwidth constraints and downstream deep vision performance, while adhering to existing standardization. We demonstrate that our approach better preserves downstream deep vision performance than traditional standard video coding. △ Less

Submitted 16 April, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Accepted at CVPR 2024 Workshop on AI for Streaming (AIS)

arXiv:2109.01733 [pdf, other]

doi 10.1109/SMARTCOMP52413.2021.00060

F3S: Free Flow Fever Screening

Authors: Kunal Rao, Giuseppe Coviello, Min Feng, Biplob Debnath, Wang-Pin Hsiung, Murugan Sankaradas, Yi Yang, Oliver Po, Utsav Drolia, Srimat Chakradhar

Abstract: Identification of people with elevated body temperature can reduce or dramatically slow down the spread of infectious diseases like COVID-19. We present a novel fever-screening system, F3S, that uses edge machine learning techniques to accurately measure core body temperatures of multiple individuals in a free-flow setting. F3S performs real-time sensor fusion of visual camera with thermal camera… ▽ More Identification of people with elevated body temperature can reduce or dramatically slow down the spread of infectious diseases like COVID-19. We present a novel fever-screening system, F3S, that uses edge machine learning techniques to accurately measure core body temperatures of multiple individuals in a free-flow setting. F3S performs real-time sensor fusion of visual camera with thermal camera data streams to detect elevated body temperature, and it has several unique features: (a) visual and thermal streams represent very different modalities, and we dynamically associate semantically-equivalent regions across visual and thermal frames by using a new, dynamic alignment technique that analyzes content and context in real-time, (b) we track people through occlusions, identify the eye (inner canthus), forehead, face and head regions where possible, and provide an accurate temperature reading by using a prioritized refinement algorithm, and (c) we robustly detect elevated body temperature even in the presence of personal protective equipment like masks, or sunglasses or hats, all of which can be affected by hot weather and lead to spurious temperature readings. F3S has been deployed at over a dozen large commercial establishments, providing contact-less, free-flow, real-time fever screening for thousands of employees and customers in indoors and outdoor settings. △ Less

Submitted 3 September, 2021; originally announced September 2021.

arXiv:2105.08524 [pdf, other]

doi 10.1364/OSAC.425499

Imaging through fog using quadrature lock-in discrimination

Authors: Shashank Kumar, Bapan Debnath, Meena M. S., Julien Fade, Sankar Dhar, Mehdi Alouini, Fabien Bretenaker, Hema Ramachandran

Abstract: We report experiments conducted in the field in the presence of fog, that were aimed at imaging under poor visibility. By means of intensity modulation at the source and two-dimensional quadrature lock-in detection by software at the receiver, a significant enhancement of the contrast-to-noise ratio was achieved in the imaging of beacons over hectometric distances. Further by illuminating the fiel… ▽ More We report experiments conducted in the field in the presence of fog, that were aimed at imaging under poor visibility. By means of intensity modulation at the source and two-dimensional quadrature lock-in detection by software at the receiver, a significant enhancement of the contrast-to-noise ratio was achieved in the imaging of beacons over hectometric distances. Further by illuminating the field of view with a modulated source, the technique helped reveal objects that were earlier obscured due to multiple scattering of light. This method, thus, holds promise of aiding in various forms of navigation under poor visibility due to fog. △ Less

Submitted 17 May, 2021; originally announced May 2021.

Journal ref: OSA Continuum Vol. 4, Issue 5, pp. 1649-1657 (2021)

arXiv:2009.14326 [pdf, other]

Attention-Driven Body Pose Encoding for Human Activity Recognition

Authors: B Debnath, M O'brien, S Kumar, A Behera

Abstract: This article proposes a novel attention-based body pose encoding for human activity recognition that presents a enriched representation of body-pose that is learned. The enriched data complements the 3D body joint position data and improves model performance. In this paper, we propose a novel approach that learns enhanced feature representations from a given sequence of 3D body joints. To achieve… ▽ More This article proposes a novel attention-based body pose encoding for human activity recognition that presents a enriched representation of body-pose that is learned. The enriched data complements the 3D body joint position data and improves model performance. In this paper, we propose a novel approach that learns enhanced feature representations from a given sequence of 3D body joints. To achieve this encoding, the approach exploits 1) a spatial stream which encodes the spatial relationship between various body joints at each time point to learn spatial structure involving the spatial distribution of different body joints 2) a temporal stream that learns the temporal variation of individual body joints over the entire sequence duration to present a temporally enhanced representation. Afterwards, these two pose streams are fused with a multi-head attention mechanism. % adapted from neural machine translation. We also capture the contextual information from the RGB video stream using a Inception-ResNet-V2 model combined with a multi-head attention and a bidirectional Long Short-Term Memory (LSTM) network. %Moreover, we whose performance is enhanced through the multi-head attention mechanism. Finally, the RGB video stream is combined with the fused body pose stream to give a novel end-to-end deep model for effective human activity recognition. △ Less

Submitted 2 October, 2020; v1 submitted 29 September, 2020; originally announced September 2020.

Comments: This paper has been accepted for publication at the IAPR IEEE/Computer Society International Conference on Pattern Recognition (ICPR), Milan, 2021

Journal ref: IAPR IEEE/Computer Society International Conference on Pattern Recognition (ICPR), Milan, 2021

Showing 1–4 of 4 results for author: Debnath, B