-
LAPS-Diff: A Diffusion-Based Framework for Singing Voice Synthesis With Language Aware Prosody-Style Guided Learning
Authors:
Sandipan Dhar,
Mayank Gupta,
Preeti Rao
Abstract:
The field of Singing Voice Synthesis (SVS) has seen significant advancements in recent years due to the rapid progress of diffusion-based approaches. However, capturing vocal style, genre-specific pitch inflections, and language-dependent characteristics remains challenging, particularly in low-resource scenarios. To address this, we propose LAPS-Diff, a diffusion model integrated with language-aware embeddings and a vocal-style guided learning mechanism, specifically designed for Bollywood Hindi singing style. We curate a Hindi SVS dataset and leverage pre-trained language models to extract word and phone-level embeddings for an enriched lyrics representation. Additionally, we incorporated a style encoder and a pitch extraction model to compute style and pitch losses, capturing features essential to the naturalness and expressiveness of the synthesized singing, particularly in terms of vocal style and pitch variations. Furthermore, we utilize MERT and IndicWav2Vec models to extract musical and contextual embeddings, serving as conditional priors to refine the acoustic feature generation process further. Based on objective and subjective evaluations, we demonstrate that LAPS-Diff significantly improves the quality of the generated samples compared to the considered state-of-the-art (SOTA) model for our constrained dataset that is typical of the low resource scenario.
Submitted 7 July, 2025;
originally announced July 2025.
-
Multi-modal brain encoding models for multi-modal stimuli
Authors:
Subba Reddy Oota,
Khushbu Pahwa,
Mounika Marreddy,
Maneesh Singh,
Manish Gupta,
Bapi S. Raju
Abstract:
Despite participants engaging in unimodal stimuli, such as watching images or silent videos, recent work has demonstrated that multi-modal Transformer models can predict visual brain activity impressively well, even with incongruent modality representations. This raises the question of how accurately these multi-modal models can predict brain activity when participants are engaged in multi-modal stimuli. As these models grow increasingly popular, their use in studying neural activity provides insights into how our brains respond to such multi-modal naturalistic stimuli, i.e., where the brain separates and integrates information across modalities through a hierarchy of early sensory regions to higher cognition. We investigate this question by using multiple unimodal and two types of multi-modal models (cross-modal and jointly pretrained) to determine which type of model is more relevant to fMRI brain activity when participants are engaged in watching movies. We observe that both types of multi-modal models show improved alignment in several language and visual regions. This study also helps in identifying which brain regions process unimodal versus multi-modal information. We further investigate the contribution of each modality to multi-modal alignment by carefully removing unimodal features one by one from multi-modal representations, and find that there is additional information beyond the unimodal embeddings that is processed in the visual and language regions. Based on this investigation, we find that for cross-modal models, brain alignment is partially attributed to the video modality, whereas for jointly pretrained models it is partially attributed to both the video and audio modalities. This serves as a strong motivation for the neuroscience community to investigate the interpretability of these models for deepening our understanding of multi-modal information processing in the brain.
Submitted 26 May, 2025;
originally announced May 2025.
-
CurviTrack: Curvilinear Trajectory Tracking for High-speed Chase of a USV
Authors:
Parakh M. Gupta,
Ondřej Procházka,
Tiago Nascimento,
Martin Saska
Abstract:
Heterogeneous robot teams used in marine environments incur time-and-energy penalties when the marine vehicle has to halt the mission to allow the autonomous aerial vehicle to land for recharging. In this paper, we present a solution to this problem using a novel drag-aware model formulation coupled with MPC, which enables tracking and landing during high-speed curvilinear trajectories of a USV without any communication. Compared to the state-of-the-art, our approach yields a 40% decrease in prediction errors and provides a 3-fold increase in certainty of predictions. Consequently, this leads to a 30% improvement in tracking performance and 40% higher success in landing on a moving USV, even during aggressive turns that are unfeasible for conventional marine missions. We test our approach in two different real-world scenarios with marine vessels of two different sizes and further solidify our results through statistical analysis in simulation to demonstrate the robustness of our method.
Submitted 28 February, 2025;
originally announced February 2025.
-
Model predictive control-based trajectory generation for agile landing of unmanned aerial vehicle on a moving boat
Authors:
Ondřej Procházka,
Filip Novák,
Tomáš Báča,
Parakh M. Gupta,
Robert Pěnička,
Martin Saska
Abstract:
This paper proposes a novel trajectory generation method based on Model Predictive Control (MPC) for agile landing of an Unmanned Aerial Vehicle (UAV) onto an Unmanned Surface Vehicle (USV)'s deck in harsh conditions. The trajectory generation exploits the state predictions of the USV to create periodically updated trajectories for a multirotor UAV to precisely land on the deck of a moving USV even in cases where the deck's inclination is continuously changing. We use an MPC-based scheme to create trajectories that consider both the UAV dynamics and the predicted states of the USV up to the first derivative of position and orientation. Compared to existing approaches, our method dynamically modifies the penalization matrices to precisely follow the corresponding states with respect to the flight phase. Especially during the landing maneuver, the UAV synchronizes attitude with the USV's, allowing for fast landing on a tilted deck. Simulations show the method's reliability in various sea conditions up to Rough sea (wave height 4 m), outperforming state-of-the-art methods in landing speed and accuracy, with twice the precision on average. Finally, real-world experiments validate the simulation results, demonstrating robust landings on a moving USV, while all computations are performed in real-time onboard the UAV.
Submitted 10 December, 2024;
originally announced December 2024.
-
Direct Speech-to-Speech Neural Machine Translation: A Survey
Authors:
Mahendra Gupta,
Maitreyee Dutta,
Chandresh Kumar Maurya
Abstract:
Speech-to-Speech Translation (S2ST) models transform speech from one language to another target language with the same linguistic information. S2ST is important for bridging the communication gap among communities and has diverse applications. In recent years, researchers have introduced direct S2ST models, which have the potential to translate speech without relying on intermediate text generation, have better decoding latency, and the ability to preserve paralinguistic and non-linguistic features. However, direct S2ST has yet to achieve quality performance for seamless communication and still lags behind the cascade models in terms of performance, especially in real-world translation. To the best of our knowledge, no comprehensive survey is available on the direct S2ST system, which beginners and advanced researchers can look upon for a quick survey. The present work provides a comprehensive review of direct S2ST models, data and application issues, and performance metrics. We critically analyze the models' performance over the benchmark datasets and provide research challenges and future directions.
Submitted 13 November, 2024;
originally announced November 2024.
-
GPT-4o System Card
Authors:
OpenAI,
Aaron Hurst,
Adam Lerer,
Adam P. Goucher,
Adam Perelman,
Aditya Ramesh,
Aidan Clark,
AJ Ostrow,
Akila Welihinda,
Alan Hayes,
Alec Radford,
Aleksander Mądry,
Alex Baker-Whitcomb,
Alex Beutel,
Alex Borzunov,
Alex Carney,
Alex Chow,
Alex Kirillov,
Alex Nichol,
Alex Paino,
Alex Renzin,
Alex Tachard Passos,
Alexander Kirillov,
Alexi Christakis
, et al. (395 additional authors not shown)
Abstract:
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.
Submitted 25 October, 2024;
originally announced October 2024.
-
Towards ubiquitous radio access using nanodiamond based quantum receivers
Authors:
Qunsong Zeng,
Jiahua Zhang,
Madhav Gupta,
Zhiqin Chu,
Kaibin Huang
Abstract:
The development of sixth-generation (6G) wireless communication systems demands innovative solutions to address challenges in the deployment of a large number of base stations and the detection of multi-band signals. Quantum technology, specifically nitrogen vacancy (NV) centers in diamonds, offers promising potential for the development of compact, robust receivers capable of supporting multiple users. For the first time, we propose a multiple access scheme using fluorescent nanodiamonds (FNDs) containing NV centers as nano-antennas. The unique response of each FND to applied microwaves allows for distinguishable patterns of fluorescence intensities, enabling multi-user signal demodulation. We demonstrate the effectiveness of our FNDs-implemented receiver by simultaneously transmitting two uncoded digitally modulated information bit streams from two separate transmitters, achieving a low bit error ratio. Moreover, our design supports tunable frequency band communication and reference-free signal decoupling, reducing communication overhead. Furthermore, we implement a miniaturized device comprising all essential components, highlighting its practicality as a receiver serving multiple users simultaneously. This approach paves the way for the integration of quantum sensing technologies in future 6G wireless communication networks.
Submitted 28 September, 2024;
originally announced September 2024.
-
Photon Inhibition for Energy-Efficient Single-Photon Imaging
Authors:
Lucas J. Koerner,
Shantanu Gupta,
Atul Ingle,
Mohit Gupta
Abstract:
Single-photon cameras (SPCs) are emerging as sensors of choice for various challenging imaging applications. One class of SPCs based on the single-photon avalanche diode (SPAD) detects individual photons using an avalanche process; the raw photon data can then be processed to extract scene information under extremely low light, high dynamic range, and rapid motion. Yet, single-photon sensitivity in SPADs comes at a cost: each photon detection consumes more energy than that of a CMOS camera. This avalanche power significantly limits sensor resolution and could restrict widespread adoption of SPAD-based SPCs. We propose a computational-imaging approach called photon inhibition to address this challenge. Photon inhibition strategically allocates detections in space and time based on downstream inference task goals and resource constraints. We develop lightweight, on-sensor computational inhibition policies that use past photon data to disable SPAD pixels in real-time, to select the most informative future photons. As case studies, we design policies tailored for image reconstruction and edge detection, and demonstrate, both via simulations and real SPC captured data, considerable reduction in photon detections (over 90% of photons) while maintaining task performance metrics. Our work raises the question of "which photons should be detected?", and paves the way for future energy-efficient single-photon imaging.
Submitted 26 September, 2024;
originally announced September 2024.
-
Radiance Fields from Photons
Authors:
Sacha Jungerman,
Aryan Garg,
Mohit Gupta
Abstract:
Neural radiance fields, or NeRFs, have become the de facto approach for high-quality view synthesis from a collection of images captured from multiple viewpoints. However, many issues remain when capturing images in-the-wild under challenging conditions, such as low light, high dynamic range, or rapid motion leading to smeared reconstructions with noticeable artifacts. In this work, we introduce quanta radiance fields, a novel class of neural radiance fields that are trained at the granularity of individual photons using single-photon cameras (SPCs). We develop theory and practical computational techniques for building radiance fields and estimating dense camera poses from unconventional, stochastic, and high-speed binary frame sequences captured by SPCs. We demonstrate, both via simulations and a SPC hardware prototype, high-fidelity reconstructions under high-speed motion, in low light, and for extreme dynamic range settings.
Submitted 3 December, 2024; v1 submitted 12 July, 2024;
originally announced July 2024.
-
Generalized Event Cameras
Authors:
Varun Sundar,
Matthew Dutson,
Andrei Ardelean,
Claudio Bruschini,
Edoardo Charbon,
Mohit Gupta
Abstract:
Event cameras capture the world at high time resolution and with minimal bandwidth requirements. However, event streams, which only encode changes in brightness, do not contain sufficient scene information to support a wide variety of downstream tasks. In this work, we design generalized event cameras that inherently preserve scene intensity in a bandwidth-efficient manner. We generalize event cameras in terms of when an event is generated and what information is transmitted. To implement our designs, we turn to single-photon sensors that provide digital access to individual photon detections; this modality gives us the flexibility to realize a rich space of generalized event cameras. Our single-photon event cameras are capable of high-speed, high-fidelity imaging at low readout rates. Consequently, these event cameras can support plug-and-play downstream inference, without capturing new event datasets or designing specialized event-vision models. As a practical implication, our designs, which involve lightweight and near-sensor-compatible computations, provide a way to use single-photon sensors without exorbitant bandwidth costs.
Submitted 2 July, 2024;
originally announced July 2024.
-
Streaming quanta sensors for online, high-performance imaging and vision
Authors:
Tianyi Zhang,
Matthew Dutson,
Vivek Boominathan,
Mohit Gupta,
Ashok Veeraraghavan
Abstract:
Recently, quanta image sensors (QIS), which are ultra-fast, zero-read-noise binary image sensors, have demonstrated remarkable imaging capabilities in many challenging scenarios. Despite their potential, the adoption of these sensors is severely hampered by (a) high data rates and (b) the need for new computational pipelines to handle the unconventional raw data. We introduce a simple, low-bandwidth computational pipeline to address these challenges. Our approach is based on a novel streaming representation with a small memory footprint, efficiently capturing intensity information at multiple temporal scales. Updating the representation requires only 16 floating-point operations/pixel, which can be efficiently computed online at the native frame rate of the binary frames. We use a neural network operating on this representation to reconstruct videos in real-time (10-30 fps). We illustrate why such a representation is well-suited for these emerging sensors, and how it offers low latency and high frame rate while retaining flexibility for downstream computer vision. Our approach results in significant data bandwidth reductions (~100x) and real-time image reconstruction and computer vision, with a $10^4$-$10^5\times$ reduction in computation compared with the existing state-of-the-art approach while maintaining comparable quality. To the best of our knowledge, our approach is the first to achieve online, real-time image reconstruction on QIS.
Submitted 2 June, 2024;
originally announced June 2024.
-
Damage identification of offshore jacket platforms in a digital twin framework considering optimal sensor placement
Authors:
Mengmeng Wang,
Atilla Incecik,
Shizhe Feng,
M. K. Gupta,
Grzegorz Krlolczyk,
Z Li
Abstract:
A new digital twin (DT) framework with optimal sensor placement (OSP) is proposed to accurately calculate the modal responses and identify the damage ratios of the offshore jacket platforms. The proposed damage identification framework consists of two models (namely one OSP model and one damage identification model). The OSP model adopts the multi-objective Lichtenberg algorithm (MOLA) to perform the sensor number/location optimization to make a good balance between the sensor cost and the modal calculation accuracy. In the damage identification model, the Markov Chain Monte Carlo (MCMC)-Bayesian method is developed to calculate the structural damage ratios based on the modal information obtained from the sensory measurements, where the uncertainties of the structural parameters are quantified. The proposed method is validated using an offshore jacket platform, and the analysis results demonstrate efficient identification of the structural damage location and severity.
Submitted 26 March, 2024;
originally announced April 2024.
-
Towards 3D Vision with Low-Cost Single-Photon Cameras
Authors:
Fangzhou Mu,
Carter Sifferman,
Sacha Jungerman,
Yiquan Li,
Mark Han,
Michael Gleicher,
Mohit Gupta,
Yin Li
Abstract:
We present a method for reconstructing 3D shape of arbitrary Lambertian objects based on measurements by miniature, energy-efficient, low-cost single-photon cameras. These cameras, operating as time resolved image sensors, illuminate the scene with a very fast pulse of diffuse light and record the shape of that pulse as it returns back from the scene at a high temporal resolution. We propose to model this image formation process, account for its non-idealities, and adapt neural rendering to reconstruct 3D geometry from a set of spatially distributed sensors with known poses. We show that our approach can successfully recover complex 3D shapes from simulated data. We further demonstrate 3D object reconstruction from real-world captures, utilizing measurements from a commodity proximity sensor. Our work draws a connection between image-based modeling and active range scanning and is a step towards 3D vision with single-photon cameras.
Submitted 29 March, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders
Authors:
Soumen Basu,
Mayuna Gupta,
Chetan Madan,
Pankaj Gupta,
Chetan Arora
Abstract:
In recent years, automated Gallbladder Cancer (GBC) detection has gained the attention of researchers. Current state-of-the-art (SOTA) methodologies relying on ultrasound sonography (US) images exhibit limited generalization, emphasizing the need for transformative approaches. We observe that individual US frames may lack sufficient information to capture disease manifestation. This study advocates for a paradigm shift towards video-based GBC detection, leveraging the inherent advantages of spatiotemporal representations. Employing the Masked Autoencoder (MAE) for representation learning, we address shortcomings in conventional image-based methods. We propose a novel design called FocusMAE to systematically bias the selection of masking tokens from high-information regions, fostering a more refined representation of malignancy. Additionally, we contribute the most extensive US video dataset for GBC detection. We also note that this is the first study on US video-based GBC detection. We validate the proposed methods on the curated dataset, and report a new SOTA accuracy of 96.4% for the GBC detection problem, against an accuracy of 84% by the current image-based SOTA methods, GBCNet and RadFormer, and 94.7% by the video-based SOTA method, AdaMAE. We further demonstrate the generality of the proposed FocusMAE on a public CT-based Covid detection dataset, reporting an improvement in accuracy of 3.3% over current baselines. The source code and pretrained models are available at: https://gbc-iitd.github.io/focusmae
Submitted 29 March, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
SoDaCam: Software-defined Cameras via Single-Photon Imaging
Authors:
Varun Sundar,
Andrei Ardelean,
Tristan Swedish,
Claudio Bruschini,
Edoardo Charbon,
Mohit Gupta
Abstract:
Reinterpretable cameras are defined by their post-processing capabilities that exceed traditional imaging. We present "SoDaCam" that provides reinterpretable cameras at the granularity of photons, from photon-cubes acquired by single-photon devices. Photon-cubes represent the spatio-temporal detections of photons as a sequence of binary frames, at frame-rates as high as 100 kHz. We show that simple transformations of the photon-cube, or photon-cube projections, provide the functionality of numerous imaging systems including: exposure bracketing, flutter shutter cameras, video compressive systems, event cameras, and even cameras that move during exposure. Our photon-cube projections offer the flexibility of being software-defined constructs that are only limited by what is computable, and shot-noise. We exploit this flexibility to provide new capabilities for the emulated cameras. As an added benefit, our projections provide camera-dependent compression of photon-cubes, which we demonstrate using an implementation of our projections on a novel compute architecture that is designed for single-photon imaging.
Submitted 8 September, 2023; v1 submitted 31 August, 2023;
originally announced September 2023.
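The photon-cube projections mentioned in the SoDaCam abstract above can be illustrated with a minimal sketch. It assumes a photon-cube stored as a binary NumPy array of shape (frames, height, width); the function names `exposure_projection` and `coded_projection` are hypothetical and are not taken from the paper, which may define its projections differently.

```python
import numpy as np

def exposure_projection(photon_cube, start, length):
    """Emulate a conventional exposure by summing a window of binary frames.

    Calling this with several window lengths gives a rough analogue of
    exposure bracketing (an assumption for illustration only).
    """
    return photon_cube[start:start + length].sum(axis=0)

def coded_projection(photon_cube, code):
    """Emulate a flutter-shutter-style capture.

    Each binary frame is kept or dropped according to a per-frame binary
    code, and the kept frames are summed along the time axis.
    """
    code = np.asarray(code).reshape(-1, 1, 1)
    return (photon_cube * code).sum(axis=0)

# Toy photon-cube: 8 binary frames of 2x2 pixels, every pixel detecting a photon.
cube = np.ones((8, 2, 2), dtype=np.uint8)
short_exposure = exposure_projection(cube, start=0, length=2)
long_exposure = exposure_projection(cube, start=0, length=8)
fluttered = coded_projection(cube, [1, 0, 1, 0, 1, 0, 1, 0])
```

Both projections are plain reductions over the time axis, which is one way to read the abstract's description of them as software-defined constructs.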
-
Design and Characterization of Crossbar architecture Velostat-based Flexible Writing Pad
Authors:
Mohee Datta Gupta
Abstract:
Pressure sensors are popular in a large variety of industries. For some applications, it is critical for these sensors to come in a flexible form factor. With the development of new synthetic polymers and novel fabrication techniques, flexible pressure sensing arrays are more easily accessible and can serve a variety of applications. As part of this dissertation, we demonstrate one such application by developing a low-cost flexible writing pad and performing crosstalk analysis on sensors with similar working principles. We present a low-cost, flexible writing pad that uses a 16x16 pressure sensing matrix based on a piezoresistive thin film of velostat. The writing area is 5 cm x 5 cm with an effective pixel area of 0.06 mm^2. A read-out circuit is designed to detect the change in resistance of the velostat pixel using a voltage divider. A microprocessor raster scans through the sensor pixel matrix to obtain a data frame of 256 numbers. This data is processed using techniques like squaring and normalising (S&N), Gaussian blurring, and adaptive thresholding to generate a more readable output. The writing pad is able to resolve characters larger than 2 cm in length. The flexible writing pad produces legible output while flexed at a bending radius of up to 4 cm. Such flexibility promises to enhance the usability and portability of the writing pad significantly. We noticed that the raw data produced by the writing pad had a lot of crosstalk, which we were subsequently able to resolve using the algorithms mentioned above. Such crosstalk has been reported in the literature multiple times and is common, especially for sensors of the crossbar architecture. Crosstalk, in a sensor matrix, is the unwanted signal obtained at a sensor pixel that is not directly related to the stimulus. This paper presents a novel approach towards quantifying the crosstalk characteristics of a sensor matrix.
Submitted 4 July, 2023;
originally announced July 2023.
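The processing chain named in the writing-pad abstract above (squaring and normalising, Gaussian blurring, adaptive thresholding) could look roughly like the following sketch for one 16x16 frame. The kernel size, block size, and threshold offset are assumptions chosen for illustration, and all function names are hypothetical; the dissertation's actual parameters are not given in the abstract.

```python
import numpy as np

def square_and_normalise(frame):
    """Square each reading to suppress weak values, then scale to [0, 1]."""
    squared = frame.astype(float) ** 2
    peak = squared.max()
    return squared / peak if peak > 0 else squared

def filter2d(image, kernel):
    """Same-size 2D filtering with edge-replicated padding."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def gaussian_kernel(size=3, sigma=1.0):
    """Normalised 2D Gaussian kernel (size and sigma are assumed values)."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def adaptive_threshold(image, block=5, offset=0.02):
    """Mark pixels that exceed their local mean by an assumed offset."""
    local_mean = filter2d(image, np.full((block, block), 1.0 / block ** 2))
    return image > local_mean + offset

def process_frame(raw):
    """Raw 16x16 readings -> boolean stroke mask."""
    frame = square_and_normalise(raw)
    blurred = filter2d(frame, gaussian_kernel())
    return adaptive_threshold(blurred)

# Toy frame: uniform low-level background (crosstalk-like) plus one pressed stroke.
raw = np.full((16, 16), 100.0)
raw[8, 3:13] = 1000.0
mask = process_frame(raw)
```

Squaring widens the gap between pressed pixels and the weaker background before thresholding, which is one plausible reason such a step helps with crosstalk in raw crossbar readings.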
-
Adoption of Blockchain Platform for Security Enhancement in Energy Transaction
Authors:
Madhuresh Gupta,
Soumyakanti Giri,
Prabhakar Karthikeyan Shanmugam,
Mahajan Sagar Bhaskar,
Jens Bo Holm-Nielsen,
Sanjeevikumar Padmanaban
Abstract:
Renewable energy has become a reality in the present and is being preferred by countries to become a considerable part of the central grid. With the increasing adoption of renewables it will soon become crucial to have a platform which would facilitate secure transaction of energy for consumers as well as producers. This paper discusses and implements a Blockchain based platform which enhances and establishes a secure method to exchange energy. It would also lower the operation costs and accommodate other technologies like the IoT. A basic market mechanism has been developed for peer-to-peer (P2P) transaction of energy where different types of entities can be directly involved. Another concept which is discussed in the paper is the consensus mechanism and whether the model market could hold the security and privacy of the individual users.
Submitted 30 May, 2023;
originally announced May 2023.
-
Forecaster-aided User Association and Load Balancing in Multi-band Mobile Networks
Authors:
Manan Gupta,
Sandeep Chinchali,
Paul Varkey,
Jeffrey G. Andrews
Abstract:
Cellular networks are becoming increasingly heterogeneous with higher base station (BS) densities and ever more frequency bands, making BS selection and band assignment key decisions in terms of rate and coverage. In this paper, we decompose the mobility-aware user association task into (i) forecasting of user rate and then (ii) convex utility maximization for user association accounting for the effects of BS load and handover overheads. Using a linear combination of normalized mean-squared error and normalized discounted cumulative gain as a novel loss function, a recurrent deep neural network is trained to reliably forecast the mobile users' future rates. Based on the forecast, the controller optimizes the association decisions to maximize the service rate-based network utility using our computationally efficient (speed up of 100x versus generic convex solver) algorithm based on the Frank-Wolfe method. Using an industry-grade network simulator developed by Meta, we show that the proposed model predictive control (MPC) approach improves the 5th percentile service rate by 3.5x compared to the traditional signal strength-based association, reduces the median number of handovers by 7x compared to a handover agnostic strategy, and achieves service rates close to a genie-aided scheme. Furthermore, our model-based approach is significantly more sample-efficient (needs 100x less training data) compared to model-free reinforcement learning (RL), and generalizes well across different user drop scenarios.
Submitted 23 January, 2023;
originally announced January 2023.
-
3D Scene Inference from Transient Histograms
Authors:
Sacha Jungerman,
Atul Ingle,
Yin Li,
Mohit Gupta
Abstract:
Time-resolved image sensors that capture light at pico-to-nanosecond timescales were once limited to niche applications but are now rapidly becoming mainstream in consumer devices. We propose low-cost and low-power imaging modalities that capture scene information from minimal time-resolved image sensors with as few as one pixel. The key idea is to flood illuminate large scene patches (or the entire scene) with a pulsed light source and measure the time-resolved reflected light by integrating over the entire illuminated area. The one-dimensional measured temporal waveform, called a transient, encodes both distances and albedos at all visible scene points and as such is an aggregate proxy for the scene's 3D geometry. We explore the viability and limitations of the transient waveforms by themselves for recovering scene information, and also when combined with traditional RGB cameras. We show that plane estimation can be performed from a single transient and that using only a few more it is possible to recover a depth map of the whole scene. We also show two proof-of-concept hardware prototypes that demonstrate the feasibility of our approach for compact, mobile, and budget-limited applications.
Submitted 9 November, 2022;
originally announced November 2022.
-
Unsupervised Contrastive Learning of Image Representations from Ultrasound Videos with Hard Negative Mining
Authors:
Soumen Basu,
Somanshu Singla,
Mayank Gupta,
Pratyaksha Rana,
Pankaj Gupta,
Chetan Arora
Abstract:
Rich temporal information and variations in viewpoints make video data an attractive choice for learning image representations using unsupervised contrastive learning (UCL) techniques. State-of-the-art (SOTA) contrastive learning techniques consider frames within a video as positives in the embedding space, whereas the frames from other videos are considered negatives. We observe that unlike multiple views of an object in natural scene videos, an Ultrasound (US) video captures different 2D slices of an organ. Hence, there is almost no similarity between the temporally distant frames of even the same US video. In this paper we propose to instead utilize such frames as hard negatives. We advocate mining both intra-video and cross-video negatives in a hardness-sensitive negative mining curriculum in a UCL framework to learn rich image representations. We deploy our framework to learn the representations of Gallbladder (GB) malignancy from US videos. We also construct the first large-scale US video dataset containing 64 videos and 15,800 frames for learning GB representations. We show that the standard ResNet50 backbone trained with our framework improves the accuracy of models pretrained with SOTA UCL techniques as well as supervised pretrained models on ImageNet for the GB malignancy detection task by 2-6%. We further validate the generalizability of our method on a publicly available lung US image dataset of COVID-19 pathologies and show an improvement of 1.5% compared to SOTA. Source code, dataset, and models are available at https://gbc-iitd.github.io/usucl.
Submitted 26 July, 2022;
originally announced July 2022.
-
Classification of COVID-19 Patients with their Severity Level from Chest CT Scans using Transfer Learning
Authors:
Mansi Gupta,
Aman Swaraj,
Karan Verma
Abstract:
Background and Objective: During pandemics, the use of artificial intelligence (AI) approaches combined with biomedical science plays a significant role in reducing the burden on healthcare systems and physicians. The rapid rise in cases of COVID-19 has led to an increase in demand for hospital beds and other medical equipment. However, since medical facilities are limited, it is recommended to diagnose patients as per the severity of the infection. Keeping this in mind, we share our research in detecting COVID-19 as well as assessing its severity using chest-CT scans and Deep Learning pre-trained models. Dataset: We have collected a total of 1966 CT Scan images for three different class labels, namely, Non-COVID, Severe COVID, and Non-Severe COVID, out of which 714 CT images belong to the Non-COVID category, 713 CT images are for the Non-Severe COVID category and 539 CT images are of the Severe COVID category. Methods: All of the images are initially pre-processed using the Contrast Limited Adaptive Histogram Equalization (CLAHE) approach. The pre-processed images are then fed into the VGG-16 network for extracting features. Finally, the retrieved characteristics are categorized and the accuracy is evaluated using a support vector machine (SVM) with 10-fold cross-validation (CV). Result and Conclusion: In our study, we have combined well-known strategies for pre-processing, feature extraction, and classification, which brings us to a remarkable success rate of disease and severity recognition with an accuracy of 96.05% (97.7% for Non-Severe COVID-19 images and 93% for Severe COVID-19 images). Our model can therefore help radiologists detect COVID-19 and the extent of its severity.
Submitted 27 May, 2022;
originally announced May 2022.
-
Over-the-Air Design of GAN Training for mmWave MIMO Channel Estimation
Authors:
Akash Doshi,
Manan Gupta,
Jeffrey G. Andrews
Abstract:
Future wireless systems are trending towards higher carrier frequencies that offer larger communication bandwidth but necessitate the use of large antenna arrays. Existing signal processing techniques for channel estimation do not scale well to this "high-dimensional" regime in terms of performance and pilot overhead. Meanwhile, training deep learning based approaches for channel estimation requires large labeled datasets mapping pilot measurements to clean channel realizations, which can only be generated offline using simulated channels. In this paper, we develop a novel unsupervised over-the-air (OTA) algorithm that utilizes noisy received pilot measurements to train a deep generative model to output beamspace MIMO channel realizations. Our approach leverages Generative Adversarial Networks (GAN), while using a conditional input to distinguish between Line-of-Sight (LOS) and Non-Line-of-Sight (NLOS) channel realizations. We also present a federated implementation of the OTA algorithm that distributes the GAN training over multiple users and greatly reduces the user side computation. We then formulate channel estimation from a limited number of pilot measurements as an inverse problem and reconstruct the channel by optimizing the input vector of the trained generative model. Our proposed approach significantly outperforms Orthogonal Matching Pursuit on both LOS and NLOS channel models, and EM-GM-AMP -- an Approximate Message Passing algorithm -- on LOS channel models, while achieving comparable performance on NLOS channel models in terms of the normalized channel reconstruction error. More importantly, our proposed framework has the potential to be trained online using real noisy pilot measurements, is not restricted to a specific channel model and can even be utilized for a federated OTA design of a dataset generator from noisy data.
Submitted 24 May, 2022;
originally announced May 2022.
-
BlueSky: Activity Control: A Vision for "Active" Security Models for Smart Collaborative Systems
Authors:
Tanjila Mawla,
Maanak Gupta,
Ravi Sandhu
Abstract:
A cyber-physical ecosystem connects different intelligent devices over heterogeneous networks. Various operations are performed on smart objects to ensure efficiency and to support automation in smart environments. An Activity (defined by Gupta and Sandhu) reflects the current state of an object, which changes in response to requested operations. Because multiple activities run on different objects, it is critical to secure collaborative systems by considering run-time decisions affected by related activities (and other parameters), supporting active enforcement of access control decisions. Recently, Gupta and Sandhu proposed Activity-Centric Access Control (ACAC) and discussed the notion of activity as a prime abstraction for access control in collaborative systems. The model provides an active security approach that considers activity decision factors such as authorizations, obligations, conditions, and dependencies among related device activities. This paper takes a step forward and presents the core components of an ACAC model and compares it with other security models, differentiating the novel properties of ACAC. We highlight how existing models do not (or only in limited scope) support 'active' decision and enforcement of authorization in collaborative systems. We propose a hierarchical structure for a family of ACAC models by gradually adding the properties related to the notion of activity and discuss the states of an activity. We highlight the convergence of ACAC with Zero Trust tenets to reflect how ACAC supports the necessary security posture of distributed and connected smart ecosystems. This paper aims to gain a better understanding of ACAC in collaborative systems supporting novel abstractions, properties and requirements.
Submitted 18 May, 2022;
originally announced May 2022.
-
Cross-view Brain Decoding
Authors:
Subba Reddy Oota,
Jashn Arora,
Manish Gupta,
Raju S. Bapi
Abstract:
How the brain captures the meaning of linguistic stimuli across multiple views is still a critical open question in neuroscience. Consider three different views of the concept apartment: (1) picture (WP) presented with the target word label, (2) sentence (S) using the target word, and (3) word cloud (WC) containing the target word along with other semantically related words. Unlike previous efforts, which focus only on single view analysis, in this paper, we study the effectiveness of brain decoding in a zero-shot cross-view learning setup. Further, we propose brain decoding in the novel context of cross-view-translation tasks like image captioning (IC), image tagging (IT), keyword extraction (KE), and sentence formation (SF). Using extensive experiments, we demonstrate that cross-view zero-shot brain decoding is practical, leading to ~0.68 average pairwise accuracy across view pairs. Also, the decoded representations are sufficiently detailed to enable high accuracy for cross-view-translation tasks, with the following pairwise accuracies: IC (78.0), IT (83.0), KE (83.7) and SF (74.5). Analysis of the contribution of different brain networks reveals exciting cognitive insights: (1) A high percentage of visual voxels are involved in image captioning and image tagging tasks, and a high percentage of language voxels are involved in the sentence formation and keyword extraction tasks. (2) Zero-shot accuracy of the model trained on the S view and tested on the WC view is better than the same-view accuracy of the model trained and tested on the WC view.
Submitted 18 April, 2022;
originally announced April 2022.
-
Similarity Learning based Few Shot Learning for ECG Time Series Classification
Authors:
Priyanka Gupta,
Sathvik Bhaskarpandit,
Manik Gupta
Abstract:
Using deep learning models to classify time series data generated from Internet of Things (IoT) devices requires a large amount of labeled data. However, due to the constrained resources available in IoT devices, it is often difficult to accommodate training using large data sets. This paper proposes and demonstrates a Similarity Learning-based Few Shot Learning method for ECG arrhythmia classification using Siamese Convolutional Neural Networks (SCNN). Few shot learning resolves the data scarcity issue by identifying novel classes from very few labeled examples. Few Shot Learning relies first on pretraining the model on a related, relatively large database, and then the learning is used for further adaptation towards the few examples available per class. Our experiments evaluate the performance accuracy with respect to K (number of instances per class) for ECG time series data classification. The accuracy with 5-shot learning is 92.25%, which marginally improves with further increase in K. We also compare the performance of our method against other well-established similarity learning techniques such as Dynamic Time Warping (DTW), Euclidean Distance (ED), and a deep learning model - Long Short Term Memory Fully Convolutional Network (LSTM-FCN) with the same amount of data and conclude that our method outperforms them for a limited dataset size. For K=5, the accuracies obtained are 57%, 54%, 33%, and 92% approximately for ED, DTW, LSTM-FCN, and SCNN, respectively.
Submitted 31 January, 2022;
originally announced February 2022.
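Dynamic Time Warping, one of the baselines named in the abstract above, can be sketched as a textbook dynamic program combined with a 1-nearest-neighbour rule over a small labelled support set. This is an illustrative sketch only, not the paper's code: the function names, the absolute-difference local cost, and the toy series are assumptions.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matched to an earlier b
                cost[i][j - 1],      # b[j-1] also matched to an earlier a
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


def nearest_label(query, support):
    """Return the label of the support series closest to query under DTW."""
    best_series, best_label = min(
        support, key=lambda pair: dtw_distance(query, pair[0])
    )
    return best_label


# Hypothetical support set with one example per class.
support = [([1, 2, 3], "rising"), ([3, 2, 1], "falling")]
label = nearest_label([1, 2, 2, 3], support)
```

Because DTW allows one sample to align with several, the stretched query `[1, 2, 2, 3]` matches `[1, 2, 3]` at zero cost, which a point-by-point Euclidean comparison of unequal-length series could not do directly.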
-
Detecting Anomalies using Overlapping Electrical Measurements in Smart Power Grids
Authors:
Sina Sontowski,
Nigel Lawrence,
Deepjyoti Deka,
Maanak Gupta
Abstract:
As cyber-attacks against critical infrastructure become more frequent, it is increasingly important to be able to rapidly identify and respond to these threats. This work investigates two independent systems with overlapping electrical measurements with the goal of more rapidly identifying anomalies. The independent systems include HIST, a SCADA historian, and ION, an automatic meter reading (AMR) system. While prior research has explored the benefits of fusing measurements, the possibility of overlapping measurements from an existing electrical system has not been investigated. To that end, we explore the potential benefits of combining overlapping measurements both to improve the speed/accuracy of anomaly detection and to provide additional validation of the collected measurements. In this paper, we show that merging overlapping measurements provides a more holistic picture of the observed systems. By applying Dynamic Time Warping, more anomalies were found -- specifically, an average of 349 times more anomalies, when considering anomalies from both overlapping measurements. When merging the overlapping measurements, a percent change in anomalies of up to 785% can be achieved compared to not merging the data, as reflected by experimental results.
Submitted 6 January, 2022;
originally announced January 2022.
-
System-Level Analysis of Full-Duplex Self-Backhauled Millimeter Wave Networks
Authors:
Manan Gupta,
Ian P. Roberts,
Jeffrey G. Andrews
Abstract:
Integrated access and backhaul (IAB) facilitates cost-effective deployment of millimeter wave (mmWave) cellular networks through multihop self-backhauling. Full-duplex (FD) technology, particularly for mmWave systems, is a potential means to overcome latency and throughput challenges faced by IAB networks. We derive practical and tractable throughput and latency constraints using queueing theory and formulate a network utility maximization problem to evaluate both FD-IAB and half-duplex IAB (HD-IAB) networks. We use this to characterize the network-level improvements seen when upgrading from conventional HD IAB nodes to FD ones by deriving closed-form expressions for (i) latency gain of FD-IAB over HD-IAB and (ii) the maximum number of hops that an HD- and FD-IAB network can support while satisfying latency and throughput targets. Extensive simulations illustrate that FD-IAB can facilitate reduced latency, higher throughput, deeper networks, and fairer service. Compared to HD-IAB, FD-IAB can improve throughput by 8x and reduce latency by 4x for a fourth-hop user. In fact, upgrading IAB nodes with FD capability can allow the network to support latency and throughput targets that its HD counterpart fundamentally cannot meet. The gains are more profound for users further from the donor and can be achieved even when residual self-interference is significantly above the noise floor.
Submitted 9 December, 2021;
originally announced December 2021.
-
Photon-Starved Scene Inference using Single Photon Cameras
Authors:
Bhavya Goyal,
Mohit Gupta
Abstract:
Scene understanding under low-light conditions is a challenging problem. This is due to the small number of photons captured by the camera and the resulting low signal-to-noise ratio (SNR). Single-photon cameras (SPCs) are an emerging sensing modality capable of capturing images with high sensitivity. Despite having minimal read-noise, images captured by SPCs in photon-starved conditions still suffer from strong shot noise, preventing reliable scene inference. We propose photon scale-space, a collection of high-SNR images spanning a wide range of photons-per-pixel (PPP) levels (but same scene content), as guides to train inference models on low photon flux images. We develop training techniques that push images with different illumination levels closer to each other in feature representation space. The key idea is that having a spectrum of different brightness levels during training enables effective guidance, and increases robustness to shot noise even in extreme noise cases. Based on the proposed approach, we demonstrate, via simulations and real experiments with a SPAD camera, high performance on various inference tasks such as image classification and monocular depth estimation under ultra low-light, down to < 1 PPP.
Submitted 16 August, 2021; v1 submitted 22 July, 2021;
originally announced July 2021.
-
Reconfigurable Architecture for Spatial Sensing in Wideband Radio Front-End
Authors:
M. Gupta,
S. Sharma,
H. Joshi,
S. J. Darak
Abstract:
The deployment of cellular spectrum in licensed, shared and unlicensed spectrum demands wideband sensing over non-contiguous sub-6 GHz spectrum. To improve the spectrum and energy efficiency, beamforming and massive multi-antenna systems are being explored, which demand spatial sensing, i.e., blind identification of vacant frequency bands and direction-of-arrival (DoA) of the occupied bands. We propose a reconfigurable architecture to perform spatial sensing of multi-band spectrum digitized via a wideband radio front-end comprising a sparse antenna array (SAA) and Sub-Nyquist Sampling (SNS). The proposed architecture comprises SAA pre-processing and algorithms to perform spatial sensing directly on SNS samples. The proposed architecture is realized on a Zynq System on Chip (SoC), consisting of an ARM processor and FPGA, via hardware-software co-design (HSCD). Using dynamic partial reconfiguration (DPR), on-the-fly switching between algorithms depending on the number of active signals in the sensed spectrum is enabled. The functionality, resource utilization, and execution time of the proposed architecture are analyzed for various HSCD configurations, word-lengths, numbers of digitized samples, signal-to-noise ratios (SNR), and antenna arrays (sparse/non-sparse).
Submitted 26 May, 2021;
originally announced May 2021.
-
Music Generation using Three-layered LSTM
Authors:
Vaishali Ingale,
Anush Mohan,
Divit Adlakha,
Krishan Kumar,
Mohit Gupta
Abstract:
This paper explores the idea of utilising Long Short-Term Memory neural networks (LSTMNN) for the generation of musical sequences in ABC notation. The proposed approach takes ABC notations from the Nottingham dataset and encodes them to be fed as input to the neural networks. The primary objective is to feed the neural networks an arbitrary note and let the network process and augment a sequence based on the note until a good piece of music is produced. Multiple calibrations have been done to amend the parameters of the network for optimal generation. The output is assessed on the basis of rhythm, harmony, and grammar accuracy.
Submitted 9 June, 2021; v1 submitted 19 May, 2021;
originally announced May 2021.
-
Passive Inter-Photon Imaging
Authors:
Atul Ingle,
Trevor Seets,
Mauro Buttafava,
Shantanu Gupta,
Alberto Tosi,
Mohit Gupta,
Andreas Velten
Abstract:
Digital camera pixels measure image intensities by converting incident light energy into an analog electrical current, and then digitizing it into a fixed-width binary representation. This direct measurement method, while conceptually simple, suffers from limited dynamic range and poor performance under extreme illumination -- electronic noise dominates under low illumination, and pixel full-well capacity results in saturation under bright illumination. We propose a novel intensity cue based on measuring inter-photon timing, defined as the time delay between detection of successive photons. Based on the statistics of inter-photon times measured by a time-resolved single-photon sensor, we develop theory and algorithms for a scene brightness estimator which works over extreme dynamic range; we experimentally demonstrate imaging scenes with a dynamic range of over ten million to one. The proposed techniques, aided by the emergence of single-photon sensors such as single-photon avalanche diodes (SPADs) with picosecond timing resolution, will have implications for a wide range of imaging applications: robotics, consumer photography, astronomy, microscopy and biomedical imaging.
Submitted 10 April, 2021; v1 submitted 31 March, 2021;
originally announced April 2021.
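The inter-photon timing cue described in the abstract above can be illustrated with a deliberately simplified sketch. It assumes ideal Poisson photon arrivals, for which the gaps between successive photons are exponentially distributed with mean 1/flux, so the reciprocal of the mean gap is the maximum-likelihood flux estimate. It ignores sensor dead time, quantum efficiency, and dark counts, and it is not the estimator developed in the paper; the function name and the timestamps are hypothetical.

```python
def flux_from_timestamps(timestamps):
    """Estimate photon flux (photons per second) from sorted arrival times.

    Assumes an ideal Poisson arrival process with no dead time, so the
    estimate is simply 1 / (mean inter-photon time).
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two photon detections")
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap <= 0:
        raise ValueError("timestamps must be strictly increasing overall")
    return 1.0 / mean_gap


# Hypothetical detections, one photon every 0.5 s.
estimated_flux = flux_from_timestamps([0.0, 0.5, 1.0, 1.5, 2.0])
```

Under these assumptions, brighter pixels produce shorter gaps rather than larger accumulated charge, which gives an intuition for why a timing-based cue need not saturate in the way a full-well capacity does.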
-
iToF2dToF: A Robust and Flexible Representation for Data-Driven Time-of-Flight Imaging
Authors:
Felipe Gutierrez-Barragan,
Huaijin Chen,
Mohit Gupta,
Andreas Velten,
Jinwei Gu
Abstract:
Indirect Time-of-Flight (iToF) cameras are a promising depth sensing technology. However, they are prone to errors caused by multi-path interference (MPI) and low signal-to-noise ratio (SNR). Traditional methods, after denoising, mitigate MPI by estimating a transient image that encodes depths. Recently, data-driven methods that jointly denoise and mitigate MPI have become state-of-the-art without using the intermediate transient representation. In this paper, we propose to revisit the transient representation. Using data-driven priors, we interpolate/extrapolate iToF frequencies and use them to estimate the transient image. Given direct ToF (dToF) sensors capture transient images, we name our method iToF2dToF. The transient representation is flexible. It can be integrated with different rule-based depth sensing algorithms that are robust to low SNR and can deal with ambiguous scenarios that arise in practice (e.g., specular MPI, optical cross-talk). We demonstrate the benefits of iToF2dToF over previous methods in real depth sensing scenarios.
Submitted 21 December, 2021; v1 submitted 11 March, 2021;
originally announced March 2021.
-
A Comparative Study Between a Classical and Optimal Controller for a Quadrotor
Authors:
Prathamesh Saraf,
Manan Gupta,
Alivelu Manga Parimi
Abstract:
This paper presents a simulation-based comparison between the two controllers, Proportional Integral Derivative (PID), a classical controller and Linear Quadratic Regulator (LQR), an optimal controller, for a linearized quadrotor model. To simplify an otherwise complicated dynamic model of a quadrotor, we derive a linear mathematical model using Newtonian and Euler's laws and applying basic principles of physics. This derivation gives the equations that govern the motion of a quadrotor, both concerning the body frame and the inertial frame. A state-space model is developed, which is then used to simulate the control algorithms for the quadrotor. Apart from the classic PID control algorithm, LQR is an optimal control regulator, and it is more robust for a quadrotor. Both the controllers are simulated in Simulink under the same initial conditions and show a satisfactory response.
Submitted 28 September, 2020;
originally announced September 2020.
-
Quanta Burst Photography
Authors:
Sizhuo Ma,
Shantanu Gupta,
Arin C. Ulku,
Claudio Bruschini,
Edoardo Charbon,
Mohit Gupta
Abstract:
Single-photon avalanche diodes (SPADs) are an emerging sensor technology capable of detecting individual incident photons, and capturing their time-of-arrival with high timing precision. While these sensors were limited to single-pixel or low-resolution devices in the past, recently, large (up to 1 MPixel) SPAD arrays have been developed. These single-photon cameras (SPCs) are capable of capturing high-speed sequences of binary single-photon images with no read noise. We present quanta burst photography, a computational photography technique that leverages SPCs as passive imaging devices for photography in challenging conditions, including ultra low-light and fast motion. Inspired by recent success of conventional burst photography, we design algorithms that align and merge binary sequences captured by SPCs into intensity images with minimal motion blur and artifacts, high signal-to-noise ratio (SNR), and high dynamic range. We theoretically analyze the SNR and dynamic range of quanta burst photography, and identify the imaging regimes where it provides significant benefits. We demonstrate, via a recently developed SPAD array, that the proposed method is able to generate high-quality images for scenes with challenging lighting, complex geometries, high dynamic range and moving objects. With the ongoing development of SPAD arrays, we envision quanta burst photography finding applications in both consumer and scientific photography.
Submitted 21 June, 2020;
originally announced June 2020.
-
Recurrent Transform Learning
Authors:
Megha Gupta,
Angshul Majumdar
Abstract:
The objective of this work is to improve the accuracy of building demand forecasting, a more challenging task than grid-level forecasting. For this purpose, we develop a new technique called recurrent transform learning (RTL). Two versions are proposed. The first one (RTL) is unsupervised; it is used as a feature extraction tool whose output is then fed into a regression model. The second formulation embeds regression into the RTL framework, leading to regressing recurrent transform learning (R2TL). Forecasting experiments have been carried out on three popular publicly available datasets. Both of our proposed techniques yield results superior to state-of-the-art methods such as long short-term memory networks, echo state networks, and sparse coding regression.
Submitted 11 December, 2019;
originally announced December 2019.
-
Asynchronous Single-Photon 3D Imaging
Authors:
Anant Gupta,
Atul Ingle,
Mohit Gupta
Abstract:
Single-photon avalanche diodes (SPADs) are becoming popular in time-of-flight depth-ranging due to their unique ability to capture individual photons with picosecond timing resolution. However, ambient light (e.g., sunlight) incident on a SPAD-based 3D camera leads to severe non-linear distortions (pileup) in the measured waveform, resulting in large depth errors. We propose asynchronous single-photon 3D imaging, a family of acquisition schemes to mitigate pileup during data acquisition itself. Asynchronous acquisition temporally misaligns SPAD measurement windows and the laser cycles through deterministically predefined or randomized offsets. Our key insight is that pileup distortions can be "averaged out" by choosing a sequence of offsets that span the entire depth range. We develop a generalized image formation model and perform theoretical analysis to explore the space of asynchronous acquisition schemes and design high-performance schemes. Our simulations and experiments demonstrate an improvement in depth accuracy of up to an order of magnitude as compared to the state-of-the-art, across a wide range of imaging scenarios, including those with high ambient flux.
Submitted 18 August, 2019;
originally announced August 2019.
-
Differential Scene Flow from Light Field Gradients
Authors:
Sizhuo Ma,
Brandon M. Smith,
Mohit Gupta
Abstract:
This paper presents novel techniques for recovering 3D dense scene flow, based on differential analysis of 4D light fields. The key enabling result is a per-ray linear equation, called the ray flow equation, that relates 3D scene flow to 4D light field gradients. The ray flow equation is invariant to 3D scene structure and applicable to a general class of scenes, but is under-constrained (3 unknowns per equation). Thus, additional constraints must be imposed to recover motion. We develop two families of scene flow algorithms by leveraging the structural similarity between ray flow and optical flow equations: local 'Lucas-Kanade' ray flow and global 'Horn-Schunck' ray flow, inspired by corresponding optical flow methods. We also develop a combined local-global method by utilizing the correspondence structure in the light fields. We demonstrate high precision 3D scene flow recovery for a wide range of scenarios, including rotation and non-rigid motion. We analyze the theoretical and practical performance limits of the proposed techniques via the light field structure tensor, a 3x3 matrix that encodes the local structure of light fields. We envision that the proposed analysis and algorithms will lead to design of future light-field cameras that are optimized for motion sensing, in addition to depth sensing.
Submitted 29 July, 2019; v1 submitted 26 July, 2019;
originally announced July 2019.
-
Photon-Flooded Single-Photon 3D Cameras
Authors:
Anant Gupta,
Atul Ingle,
Andreas Velten,
Mohit Gupta
Abstract:
Single-photon avalanche diodes (SPADs) are starting to play a pivotal role in the development of photon-efficient, long-range LiDAR systems. However, due to non-linearities in their image formation model, a high photon flux (e.g., due to strong sunlight) leads to distortion of the incident temporal waveform and, potentially, large depth errors. Operating SPADs in low flux regimes can mitigate these distortions, but often requires attenuating the signal and thus results in low signal-to-noise ratio. In this paper, we address the following basic question: what is the optimal photon flux at which a SPAD-based LiDAR should be operated? We derive a closed-form expression for the optimal flux, which is quasi-depth-invariant and depends on the ambient light strength. The optimal flux is lower than what a SPAD typically measures in real-world scenarios but, surprisingly, considerably higher than what is conventionally suggested for avoiding distortions. We propose a simple, adaptive approach for achieving the optimal flux by attenuating incident flux based on an estimate of ambient light strength. Using extensive simulations and a hardware prototype, we show that the optimal flux criterion holds for several depth estimators, under a wide range of illumination conditions.
Submitted 29 April, 2019; v1 submitted 20 March, 2019;
originally announced March 2019.
-
High Flux Passive Imaging with Single-Photon Sensors
Authors:
Atul Ingle,
Andreas Velten,
Mohit Gupta
Abstract:
Single-photon avalanche diodes (SPADs) are an emerging technology with a unique capability of capturing individual photons with high timing precision. SPADs are being used in several active imaging systems (e.g., fluorescence lifetime microscopy and LiDAR), albeit mostly limited to low photon flux settings. We propose passive free-running SPAD (PF-SPAD) imaging, an imaging modality that uses SPADs for capturing 2D intensity images with unprecedented dynamic range under ambient lighting, without any active light source. Our key observation is that the precise inter-photon timing measured by a SPAD can be used for estimating scene brightness under ambient lighting conditions, even for very bright scenes. We develop a theoretical model for PF-SPAD imaging, and derive a scene brightness estimator based on the average time of darkness between successive photons detected by a PF-SPAD pixel. Our key insight is that due to the stochastic nature of photon arrivals, this estimator does not suffer from a hard saturation limit. Coupled with high sensitivity at low flux, this enables a PF-SPAD pixel to measure a wide range of scene brightness, from very low to very high, thereby achieving extreme dynamic range. We demonstrate an improvement of over 2 orders of magnitude over conventional sensors by imaging scenes spanning a dynamic range of 1,000,000:1.
Submitted 23 April, 2019; v1 submitted 26 February, 2019;
originally announced February 2019.
-
Cascade Markov Decision Processes: Theory and Applications
Authors:
Manish Gupta
Abstract:
This paper considers the optimal control of time varying continuous time Markov chains whose transition rates are themselves Markov processes. In one set of problems the solution of an ordinary differential equation is shown to determine the optimal performance and feedback controls, while some other cases are shown to lead to singular optimal control problems which are more difficult to solve. Solution techniques are demonstrated using examples from finance to behavioral decision making.
Submitted 1 September, 2015;
originally announced September 2015.