Showing 1–26 of 26 results for author: Agrawal, P

Searching in archive eess.
  1. arXiv:2503.07522  [pdf, other]

    eess.AS cs.CL

    Building English ASR model with regional language support

    Authors: Purvi Agrawal, Vikas Joshi, Bharati Patidar, Ankur Gupta, Rupesh Kumar Mehta

    Abstract: In this paper, we present a novel approach to developing an English Automatic Speech Recognition (ASR) system that can effectively handle Hindi queries, without compromising its performance on English. We propose a novel acoustic model (AM), referred to as the SplitHead with Attention (SHA) model, which features shared hidden layers across languages and language-specific projection layers combined via a sel…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 5 pages, 3 figures

  2. arXiv:2502.12355  [pdf, other]

    cs.RO cs.LG eess.SY

    Hovering Flight of Soft-Actuated Insect-Scale Micro Aerial Vehicles using Deep Reinforcement Learning

    Authors: Yi-Hsuan Hsiao, Wei-Tung Chen, Yun-Sheng Chang, Pulkit Agrawal, YuFeng Chen

    Abstract: Soft-actuated insect-scale micro aerial vehicles (IMAVs) pose unique challenges for designing robust and computationally efficient controllers. At the millimeter scale, fast robot dynamics ($\sim$ms), together with system delay, model uncertainty, and external disturbances significantly affect flight performances. Here, we design a deep reinforcement learning (RL) controller that addresses system…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 7 pages, 7 figures, accepted to 2025 IEEE International Conference on Soft Robotics (RoboSoft)

  3. arXiv:2501.07197  [pdf]

    eess.IV cs.CV cs.LG

    Lung Cancer detection using Deep Learning

    Authors: Aryan Chaudhari, Ankush Singh, Sanchi Gajbhiye, Pratham Agrawal

    Abstract: In this paper we discuss lung cancer detection using a hybrid model of Convolutional Neural Networks (CNNs) and Support Vector Machines (SVMs) to achieve early detection of tumors, benign or malignant. The hybrid model is trained on a dataset of Computed Tomography (CT) scans. Using deep learning for detecting lung cancer early is a cutting-edge method.

    Submitted 13 January, 2025; originally announced January 2025.

  4. arXiv:2409.16342  [pdf]

    eess.SY cs.LG

    Transformer based time series prediction of the maximum power point for solar photovoltaic cells

    Authors: Palaash Agrawal, Hari Om Bansal, Aditya R. Gautam, Om Prakash Mahela, Baseem Khan

    Abstract: This paper proposes an improved deep learning based maximum power point tracking (MPPT) in solar photovoltaic cells considering various time series based environmental inputs. Generally, artificial neural network based MPPT algorithms use basic neural network architectures and inputs which do not represent the ambient conditions in a comprehensive manner. In this article, the ambient conditions of…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: Published June 2022, in Energy Science and Engineering, Volume 10, Issue 9, Pages 3397-3410

    Journal ref: Energy Sci Eng. 2022; 10: 3397-3410

  5. arXiv:2407.07884  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Vegetable Peeling: A Case Study in Constrained Dexterous Manipulation

    Authors: Tao Chen, Eric Cousineau, Naveen Kuppuswamy, Pulkit Agrawal

    Abstract: Recent studies have made significant progress in addressing dexterous manipulation problems, particularly in in-hand object reorientation. However, there are few existing works that explore the potential utilization of developed dexterous manipulation controllers for downstream tasks. In this study, we focus on constrained dexterous manipulation for food peeling. Food peeling presents various cons…

    Submitted 10 July, 2024; originally announced July 2024.

  6. arXiv:2405.01402  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Learning Force Control for Legged Manipulation

    Authors: Tifanny Portela, Gabriel B. Margolis, Yandong Ji, Pulkit Agrawal

    Abstract: Controlling contact forces during interactions is critical for locomotion and manipulation tasks. While sim-to-real reinforcement learning (RL) has succeeded in many contact-rich problems, current RL methods achieve forceful interactions implicitly without explicitly regulating forces. We propose a method for training RL policies for direct force control without requiring access to force sensing.…

    Submitted 20 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: This work has been accepted to ICRA24, as well as the Loco-manipulation workshop at ICRA24

  7. Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax

    Authors: Aditya Patil, Vikas Joshi, Purvi Agrawal, Rupesh Mehta

    Abstract: Even with several advancements in multilingual modeling, it is challenging to recognize multiple languages using a single neural model without knowing the input language, and most multilingual models assume the availability of the input language. In this work, we propose a novel bilingual end-to-end (E2E) modeling approach, where a single neural model can recognize both languages and also support…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: Published in IEEE's Spoken Language Technology (SLT) 2022, 8 pages (6 + 2 for references), 5 figures

    Journal ref: 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023, pp. 252-259

  8. arXiv:2401.10460  [pdf, other]

    cs.SD cs.LG eess.AS

    Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis

    Authors: Prabhav Agrawal, Thilo Koehler, Zhiping Xiu, Prashant Serai, Qing He

    Abstract: Neural vocoders model the raw audio waveform and synthesize high-quality audio, but even the highly efficient ones, like MB-MelGAN and LPCNet, fail to run in real time on a low-end device like a smartglass. A pure digital signal processing (DSP) based vocoder can be implemented via lightweight fast Fourier transforms (FFT), and is therefore an order of magnitude faster than any neural vocoder. A DSP vocoder o…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted for ICASSP 2024

  9. arXiv:2212.05909   

    cs.CV eess.IV

    NFResNet: Multi-scale and U-shaped Networks for Deblurring

    Authors: Tanish Mittal, Preyansh Agrawal, Esha Pahwa, Aarya Makwana

    Abstract: Multi-Scale and U-shaped Networks are widely used in various image restoration problems, including deblurring. Keeping in mind the wide range of applications, we present a comparison of these architectures and their effects on image deblurring. We also introduce a new block called NFResblock. It consists of a Fast Fourier Transformation layer and a series of modified Non-Linear Activation Free…

    Submitted 12 December, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: Due to limitations in GPU compute, we weren't able to test the paper on the popularly used GoPro dataset, which is mostly used for testing image deblurring problems. After the submission on arXiv, we observed that we missed comparing our results with some state-of-the-art papers like ARVo & Gated Spatio-Temporal Attention-Guided Video Deblurring

  10. arXiv:2212.03238  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior

    Authors: Gabriel B Margolis, Pulkit Agrawal

    Abstract: Learned locomotion policies can rapidly adapt to diverse environments similar to those experienced during training but lack a mechanism for fast tuning when they fail in an out-of-distribution test environment. This necessitates a slow and iterative cycle of reward and environment redesign to achieve good performance on a new task. As an alternative, we propose learning a single policy that encode…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Oral presentation at CoRL 2022. Website at https://gmargo11.github.io/walk-these-ways/

  11. arXiv:2211.11744  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    Visual Dexterity: In-Hand Reorientation of Novel and Complex Object Shapes

    Authors: Tao Chen, Megha Tippur, Siyang Wu, Vikash Kumar, Edward Adelson, Pulkit Agrawal

    Abstract: In-hand object reorientation is necessary for performing many dexterous manipulation tasks, such as tool use in less structured environments that remain beyond the reach of current robots. Prior works built reorientation systems assuming one or many of the following: reorienting only specific objects with simple shapes, limited range of reorientation, slow or quasistatic manipulation, simulation-o…

    Submitted 24 November, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Published in Science Robotics: https://www.science.org/doi/10.1126/scirobotics.adc9244

    Journal ref: Science Robotics, 8(84): eadc9244, 2023

  12. arXiv:2210.16045  [pdf, other]

    cs.SD cs.CL eess.AS

    Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders

    Authors: Jason Fong, Yun Wang, Prabhav Agrawal, Vimal Manohar, Jilong Wu, Thilo Köhler, Qing He

    Abstract: Text-based voice editing (TBVE) uses synthetic output from text-to-speech (TTS) systems to replace words in an original recording. Recent work has used neural models to produce edited speech that is similar to the original speech in terms of clarity, speaker identity, and prosody. However, one limitation of prior work is the usage of finetuning to optimise performance: this requires further model…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  13. arXiv:2111.03274  [pdf]

    eess.IV cs.CV cs.LG q-bio.QM

    Pathological Analysis of Blood Cells Using Deep Learning Techniques

    Authors: Virender Ranga, Shivam Gupta, Priyansh Agrawal, Jyoti Meena

    Abstract: Pathology deals with the practice of discovering the reasons for disease by analyzing body samples. The most common approach in this field is histology, which is the study of the microscopic structures of cells and tissues. The slide viewing method is widely used and is being converted into digital form to produce high resolution images. This enabled the area of deep learning and mac…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: 6 Pages, 3 Tables and 6 Figures

    Journal ref: Recent Advances in Computer Science and Communications (Formerly Recent Patents on Computer Science), 04 September, 2020, Article ID e140921185564

  14. arXiv:2111.03270  [pdf]

    cs.LG eess.SP q-bio.NC

    Automated Human Mind Reading Using EEG Signals for Seizure Detection

    Authors: Virender Ranga, Shivam Gupta, Jyoti Meena, Priyansh Agrawal

    Abstract: Epilepsy, one of the most commonly occurring neurological diseases globally, emerged back in 4000 BC. It affects around 50 million people of all ages these days. The trait of this disease is recurrent seizures. In the past few decades, the treatments available for seizure control have improved a lot with the advancements in the field of medical science and technology. Electroencephalogram (EEG) is a w…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: 11 Pages, 12 Figures, 5 Tables

    Journal ref: Journal of Medical Engineering & Technology, 2020, 44:5, 237-246

  15. EpilNet: A Novel Approach to IoT based Epileptic Seizure Prediction and Diagnosis System using Artificial Intelligence

    Authors: Shivam Gupta, Virender Ranga, Priyansh Agrawal

    Abstract: Epilepsy is one of the most commonly occurring neurological diseases. The main characteristic of this disease is frequent seizures, which are electrical imbalances in the brain. They are generally accompanied by shaking of body parts and can even lead to fainting. In the past few years, many treatments have come up. These mainly involve the use of anti-seizure drugs for controlling seizures. But in 70% of cases…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: 12 Pages, 12 Figures, 2 Tables

    Journal ref: ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, Issue, Vol. 10 N. 4 (2021), 429-446

  16. arXiv:2111.00899  [pdf, other]

    cs.CV cs.LG eess.IV physics.app-ph

    Equivariant Contrastive Learning

    Authors: Rumen Dangovski, Li Jing, Charlotte Loh, Seungwook Han, Akash Srivastava, Brian Cheung, Pulkit Agrawal, Marin Soljačić

    Abstract: In state-of-the-art self-supervised learning (SSL), pre-training produces semantically good representations by encouraging them to be invariant under meaningful transformations prescribed from human knowledge. In fact, the property of invariance is a trivial instance of a broader class called equivariance, which can be intuitively understood as the property that representations transform according…

    Submitted 14 March, 2022; v1 submitted 28 October, 2021; originally announced November 2021.

    Comments: Camera Ready Revision. ICLR 2022. Discussion: https://openreview.net/forum?id=gKLAAfiytI Code: https://github.com/rdangovs/essl

  17. arXiv:2110.09748  [pdf, other]

    eess.SY

    User Based Design and Evaluation Pipeline for Indoor Airships

    Authors: Zhaoliang Zheng, Jiahao Li, Parth Agrawal, Zhao Lei, Aaron John-Sabu, Ankur Mehta

    Abstract: Designing a controllable airship for non-expert users or preemptively evaluating the performance of desired airships has always been a very challenging problem. This paper explores the blimp design parameter space from the aspect of the user by considering various distributions of thrust, combinations of propulsive mechanisms, and balloon shapes. We provide open-source modular hardware and reconfi…

    Submitted 23 November, 2021; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Submitting to ICRA 2022

  18. arXiv:2108.01626  [pdf, other]

    cs.RO eess.SY

    CPPNet: A Coverage Path Planning Network

    Authors: Zongyuan Shen, Palash Agrawal, James P. Wilson, Ryan Harvey, Shalabh Gupta

    Abstract: This paper presents a deep-learning based CPP algorithm, called Coverage Path Planning Network (CPPNet). CPPNet is built using a convolutional neural network (CNN) whose input is a graph-based representation of the occupancy grid map while its output is an edge probability heat graph, where the value of each edge is the probability of belonging to the optimal TSP tour. Finally, a greedy search is…

    Submitted 3 August, 2021; originally announced August 2021.

  19. arXiv:2107.14793  [pdf, other]

    eess.AS cs.SD eess.SP

    A Multi-Head Relevance Weighting Framework For Learning Raw Waveform Audio Representations

    Authors: Debottam Dutta, Purvi Agrawal, Sriram Ganapathy

    Abstract: In this work, we propose a multi-head relevance weighting framework to learn audio representations from raw waveforms. The audio waveform, split into windows of short duration, is processed with a 1-D convolutional layer of cosine modulated Gaussian filters acting as a learnable filterbank. The key novelty of the proposed framework is the introduction of multi-head relevance on the learnt filterb…

    Submitted 30 July, 2021; originally announced July 2021.

    Comments: Submitted to 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2021)

  20. arXiv:2104.00631  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Residual Model Learning for Microrobot Control

    Authors: Joshua Gruenstein, Tao Chen, Neel Doshi, Pulkit Agrawal

    Abstract: A majority of microrobots are constructed using compliant materials that are difficult to model analytically, limiting the utility of traditional model-based controllers. Challenges in data collection on microrobots and large errors between simulated models and real robots make current model-based learning and sim-to-real transfer methods difficult to apply. We propose a novel framework residual m…

    Submitted 7 September, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

  21. arXiv:2102.07390  [pdf, other]

    eess.AS

    Representation Learning For Speech Recognition Using Feedback Based Relevance Weighting

    Authors: Purvi Agrawal, Sriram Ganapathy

    Abstract: In this work, we propose an acoustic embedding based approach for representation learning in speech recognition. The proposed approach involves two stages comprising acoustic filterbank learning from raw waveform, followed by modulation filterbank learning. In each stage, a relevance weighting operation is employed that acts as a feature selection module. In particular, the relevance weighting…

    Submitted 15 February, 2021; originally announced February 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2011.00721, arXiv:2011.02136, arXiv:2001.07067

    Journal ref: IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2021

  22. Interpretable Representation Learning for Speech and Audio Signals Based on Relevance Weighting

    Authors: Purvi Agrawal, Sriram Ganapathy

    Abstract: The learning of interpretable representations from raw data presents significant challenges for time series data like speech. In this work, we propose a relevance weighting scheme that allows the interpretation of the speech representations during the forward propagation of the model itself. The relevance weighting is achieved using a sub-network approach that performs the task of feature selectio…

    Submitted 29 October, 2020; originally announced November 2020.

    Comments: arXiv admin note: text overlap with arXiv:2011.00721

    Journal ref: IEEE Transactions on Audio, Speech and Language Processing, Vol. 28, pp. 2823 - 2836, 2020

  23. Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations

    Authors: Purvi Agrawal, Sriram Ganapathy

    Abstract: Speech recognition in noisy and channel distorted scenarios is often challenging as the current acoustic modeling schemes are not adaptive to the changes in the signal distribution in the presence of noise. In this work, we develop a novel acoustic modeling framework for noise robust speech recognition based on relevance weighting mechanism. The relevance weighting is achieved using a sub-network…

    Submitted 29 October, 2020; originally announced November 2020.

    Comments: arXiv admin note: text overlap with arXiv:2001.07067

    Journal ref: Proc. Interspeech 2020, 1649-1653 (2020)

  24. arXiv:2001.07067  [pdf, other]

    eess.AS

    Interpretable Filter Learning Using Soft Self-attention For Raw Waveform Speech Recognition

    Authors: Purvi Agrawal, Sriram Ganapathy

    Abstract: Speech recognition from raw waveform involves learning the spectral decomposition of the signal in the first layer of the neural acoustic model using a convolution layer. In this work, we propose a raw waveform convolutional filter learning approach using soft self-attention. The acoustic filter bank in the proposed model is implemented using a parametric cosine-modulated Gaussian filter bank whos…

    Submitted 20 January, 2020; originally announced January 2020.

  25. arXiv:1910.12579  [pdf, other]

    cs.CR cs.CY cs.DC cs.MA eess.SP eess.SY

    Safe and Private Forward-Trading Platform for Transactive Microgrids

    Authors: Scott Eisele, Taha Eghtesad, Keegan Campanelli, Prakhar Agrawal, Aron Laszka, Abhishek Dubey

    Abstract: Transactive microgrids have emerged as a transformative solution for the problems faced by distribution system operators due to an increase in the use of distributed energy resources and rapid growth in renewable energy generation. Transactive microgrids are tightly coupled cyber and physical systems, which require resilient and robust financial markets where transactions can be submitted and clea…

    Submitted 11 October, 2019; originally announced October 2019.

  26. arXiv:1910.08930  [pdf, other]

    cs.CV cs.LG eess.IV

    Sketch2Code: Transformation of Sketches to UI in Real-time Using Deep Neural Network

    Authors: Vanita Jain, Piyush Agrawal, Subham Banga, Rishabh Kapoor, Shashwat Gulyani

    Abstract: User Interface (UI) prototyping is a necessary step in the early stages of application development. Transforming sketches of a Graphical User Interface (UI) into a coded UI application is an uninspired but time-consuming task performed by a UI designer. An automated system that can replace human efforts for straightforward implementation of UI designs will greatly speed up this procedure. The work…

    Submitted 20 October, 2019; originally announced October 2019.