-
Deep Video Codec Control for Vision Models
Authors:
Christoph Reich,
Biplob Debnath,
Deep Patel,
Tim Prangemeier,
Daniel Cremers,
Srimat Chakradhar
Abstract:
Standardized lossy video coding is at the core of almost all real-world video processing pipelines. Rate control is used to enable standard codecs to adapt to different network bandwidth conditions or storage constraints. However, standard video codecs (e.g., H.264) and their rate control modules aim to minimize video distortion w.r.t. human quality assessment. We demonstrate empirically that standard-coded videos vastly deteriorate the performance of deep vision models. To overcome the deterioration of vision performance, this paper presents the first end-to-end learnable deep video codec control that considers both bandwidth constraints and downstream deep vision performance, while adhering to existing standardization. We demonstrate that our approach better preserves downstream deep vision performance than traditional standard video coding.
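The bandwidth/accuracy trade-off that codec control has to resolve can be made concrete with a toy sketch. It is not the authors' method. The paper learns codec control end-to-end. This example instead does an exhaustive search over a hypothetical table of H.264 quantization parameter (QP) settings, each with invented bit costs and downstream scores, and picks the best-scoring setting that fits a bandwidth budget. `CodecSetting`, `choose_setting`, and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CodecSetting:
    qp: int             # H.264 quantization parameter (lower QP = higher quality, more bits)
    kbits: float        # hypothetical estimated size of the encoded clip
    task_score: float   # hypothetical estimated downstream accuracy (e.g., mAP)


def choose_setting(candidates: List[CodecSetting],
                   budget_kbits: float) -> Optional[CodecSetting]:
    """Return the setting with the best downstream score that fits the budget."""
    feasible = [c for c in candidates if c.kbits <= budget_kbits]
    if not feasible:
        return None  # no setting satisfies the bandwidth constraint
    # Prefer higher task score; break ties toward fewer bits.
    return max(feasible, key=lambda c: (c.task_score, -c.kbits))


# Hypothetical per-QP estimates for one clip.
candidates = [
    CodecSetting(qp=22, kbits=950.0, task_score=0.71),
    CodecSetting(qp=28, kbits=480.0, task_score=0.66),
    CodecSetting(qp=34, kbits=240.0, task_score=0.52),
    CodecSetting(qp=40, kbits=120.0, task_score=0.31),
]
best = choose_setting(candidates, budget_kbits=500.0)
print(best.qp)  # 28
```

A learned controller replaces this lookup table with predictions made per frame or region. The selection criterion is the same: maximize downstream performance subject to the bandwidth constraint.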
Submitted 16 April, 2024; v1 submitted 30 August, 2023;
originally announced August 2023.
-
F3S: Free Flow Fever Screening
Authors:
Kunal Rao,
Giuseppe Coviello,
Min Feng,
Biplob Debnath,
Wang-Pin Hsiung,
Murugan Sankaradas,
Yi Yang,
Oliver Po,
Utsav Drolia,
Srimat Chakradhar
Abstract:
Identification of people with elevated body temperature can reduce or dramatically slow down the spread of infectious diseases like COVID-19. We present a novel fever-screening system, F3S, that uses edge machine learning techniques to accurately measure the core body temperatures of multiple individuals in a free-flow setting. F3S performs real-time sensor fusion of visual and thermal camera data streams to detect elevated body temperature, and it has several unique features: (a) visual and thermal streams represent very different modalities, so we dynamically associate semantically equivalent regions across visual and thermal frames using a new dynamic alignment technique that analyzes content and context in real time; (b) we track people through occlusions, identify the eye (inner canthus), forehead, face, and head regions where possible, and provide an accurate temperature reading using a prioritized refinement algorithm; and (c) we robustly detect elevated body temperature even in the presence of personal protective equipment such as masks, sunglasses, or hats, all of which can be affected by hot weather and lead to spurious temperature readings. F3S has been deployed at over a dozen large commercial establishments, providing contactless, free-flow, real-time fever screening for thousands of employees and customers in indoor and outdoor settings.
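The prioritized refinement in (b) can be pictured as a fallback over facial regions. This is an illustration built on assumptions, not the F3S algorithm. It takes the priority order to be the order in which the abstract lists the regions (inner canthus, forehead, face, head). It also uses hypothetical per-region readings in °C and a hypothetical 38.0 °C screening threshold. `refine_temperature`, `is_elevated`, and `FEVER_THRESHOLD_C` are names introduced only for this example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical priority order, following the order of regions listed in the abstract.
REGION_PRIORITY = ("inner_canthus", "forehead", "face", "head")
# Hypothetical screening threshold in degrees Celsius.
FEVER_THRESHOLD_C = 38.0


def refine_temperature(visible: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (region, temperature) from the highest-priority visible region.

    `visible` maps region names to temperature estimates for regions that were
    detected and not occluded (e.g., eyes hidden by sunglasses are omitted).
    """
    for region in REGION_PRIORITY:
        if region in visible:
            return region, visible[region]
    return None  # no usable region in this frame


def is_elevated(reading: Optional[Tuple[str, float]]) -> bool:
    """Flag a reading at or above the (hypothetical) threshold."""
    return reading is not None and reading[1] >= FEVER_THRESHOLD_C


# Person wearing sunglasses: no inner-canthus reading, so fall back to the forehead.
reading = refine_temperature({"forehead": 38.3, "face": 37.6, "head": 36.9})
print(reading, is_elevated(reading))  # ('forehead', 38.3) True
```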
Submitted 3 September, 2021;
originally announced September 2021.
-
Imaging through fog using quadrature lock-in discrimination
Authors:
Shashank Kumar,
Bapan Debnath,
Meena M. S.,
Julien Fade,
Sankar Dhar,
Mehdi Alouini,
Fabien Bretenaker,
Hema Ramachandran
Abstract:
We report field experiments, conducted in the presence of fog, that were aimed at imaging under poor visibility. By means of intensity modulation at the source and two-dimensional quadrature lock-in detection by software at the receiver, a significant enhancement of the contrast-to-noise ratio was achieved in the imaging of beacons over hectometric distances. Further, by illuminating the field of view with a modulated source, the technique helped reveal objects that were previously obscured due to multiple scattering of light. This method thus holds promise for aiding various forms of navigation under poor visibility due to fog.
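The following sketch shows software quadrature lock-in detection applied to a stack of camera frames. For each pixel, the intensity time series is multiplied by cosine and sine references at the modulation frequency. The two products are averaged, and the modulation amplitude is recovered from the in-phase (I) and quadrature (Q) components. This amplitude does not depend on the unknown phase of the beacon, and the unmodulated background averages out. The example makes several assumptions: the frame rate and modulation frequency are made-up values, the window spans an integer number of modulation periods, plain averaging serves as the low-pass filter, and the data are synthetic. The paper's actual processing chain may differ, and `lockin_amplitude` is a name introduced here.

```python
import math


def lockin_amplitude(frames, f_mod, f_sample):
    """Per-pixel quadrature lock-in amplitude for a stack of frames.

    frames: list of N frames, each a 2D list (rows x cols) of intensities.
    f_mod: modulation frequency of the source (Hz).
    f_sample: camera frame rate (Hz).
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    amp = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            i_sum = q_sum = 0.0
            for k, frame in enumerate(frames):
                phase = 2.0 * math.pi * f_mod * k / f_sample
                i_sum += frame[r][c] * math.cos(phase)  # in-phase reference
                q_sum += frame[r][c] * math.sin(phase)  # quadrature reference
            # Averaging over whole periods acts as the low-pass filter;
            # the factor 2/N converts the sums back to the modulation amplitude.
            amp[r][c] = 2.0 / n * math.hypot(i_sum, q_sum)
    return amp


F_SAMPLE, F_MOD, N = 100.0, 10.0, 100  # hypothetical: 10 full modulation periods
frames = []
for k in range(N):
    # Modulated beacon: background 50, modulation amplitude 5, arbitrary phase 0.3 rad.
    beacon = 50.0 + 5.0 * math.cos(2 * math.pi * F_MOD * k / F_SAMPLE + 0.3)
    frames.append([[beacon, 80.0]])  # 1x2 image: beacon pixel, unmodulated bright pixel
amp = lockin_amplitude(frames, F_MOD, F_SAMPLE)
print([round(a, 6) for a in amp[0]])  # [5.0, 0.0]
```

The unmodulated pixel is brighter than the beacon in raw intensity, yet its lock-in amplitude is zero. This is the contrast mechanism the abstract relies on.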
Submitted 17 May, 2021;
originally announced May 2021.
-
Attention-Driven Body Pose Encoding for Human Activity Recognition
Authors:
B Debnath,
M O'brien,
S Kumar,
A Behera
Abstract:
This article proposes a novel attention-based body pose encoding for human activity recognition that provides a learned, enriched representation of body pose. The enriched data complements the 3D body joint position data and improves model performance. In this paper, we propose a novel approach that learns enhanced feature representations from a given sequence of 3D body joints. To achieve this encoding, the approach exploits 1) a spatial stream, which encodes the spatial relationship between various body joints at each time point to learn the spatial structure of the joint distribution, and 2) a temporal stream, which learns the temporal variation of individual body joints over the entire sequence duration to present a temporally enhanced representation. Afterwards, these two pose streams are fused with a multi-head attention mechanism. We also capture the contextual information from the RGB video stream using an Inception-ResNet-V2 model combined with a multi-head attention mechanism and a bidirectional Long Short-Term Memory (LSTM) network. Finally, the RGB video stream is combined with the fused body pose stream to give a novel end-to-end deep model for effective human activity recognition.
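The fusion of the two pose streams can be illustrated with scaled dot-product attention split across several heads. This is a simplified sketch, not the paper's architecture. It assumes cross-attention, where the spatial stream supplies the queries and the temporal stream supplies the keys and values. It also omits the learned query, key, value, and output projections of a full multi-head attention layer, so each head simply attends over its own slice of the feature vector. `multi_head_fuse` and the toy inputs are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out


def multi_head_fuse(spatial: Matrix, temporal: Matrix, num_heads: int) -> Matrix:
    """Fuse streams: spatial rows are queries, temporal rows are keys/values.

    Simplification: no learned projections; each head uses one feature slice.
    """
    d_model = len(spatial[0])
    if d_model % num_heads:
        raise ValueError("feature size must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        q = [row[sl] for row in spatial]
        kv = [row[sl] for row in temporal]
        heads.append(attention(q, kv, kv))
    # Concatenate the head outputs along the feature axis.
    return [sum((head[t] for head in heads), []) for t in range(len(spatial))]


spatial = [[0.1, 0.4, -0.2, 0.3], [0.5, -0.1, 0.2, 0.0]]  # 2 time steps, 4 features
temporal = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, -0.5, 0.5]]
fused = multi_head_fuse(spatial, temporal, num_heads=2)
print(len(fused), len(fused[0]))  # 2 4
```

Each head outputs a softmax-weighted average of the temporal features in its slice. A trained layer would add learned projections before and after this step.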
Submitted 2 October, 2020; v1 submitted 29 September, 2020;
originally announced September 2020.