
Showing 1–50 of 126 results for author: Prashant

Searching in archive eess.
  1. arXiv:2506.14973  [pdf, ps, other]

    eess.AS cs.AI

    Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition

    Authors: Jiamin Xie, Ju Lin, Yiteng Huang, Tyler Vuong, Zhaojiang Lin, Zhaojun Yang, Peng Su, Prashant Rawat, Sangeeta Srivastava, Ming Sun, Florian Metze

    Abstract: Recent studies have demonstrated that prompting large language models (LLM) with audio encodings enables effective speech recognition capabilities. However, the ability of Speech LLMs to comprehend and process multi-channel audio with spatial cues remains a relatively uninvestigated area of research. In this work, we present directional-SpeechLlama, a novel approach that leverages the microphone a…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  2. arXiv:2506.14294  [pdf, ps, other]

    cs.RO cs.AI eess.SP

    Uncertainty-Driven Radar-Inertial Fusion for Instantaneous 3D Ego-Velocity Estimation

    Authors: Prashant Kumar Rai, Elham Kowsari, Nataliya Strokina, Reza Ghabcheloo

    Abstract: We present a method for estimating ego-velocity in autonomous navigation by integrating high-resolution imaging radar with an inertial measurement unit. The proposed approach addresses the limitations of traditional radar-based ego-motion estimation techniques by employing a neural network to process complex-valued raw radar data and estimate instantaneous linear ego-velocity along with its associ…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: This paper has been accepted for presentation at the 28th International Conference on Information Fusion (Fusion 2025)

  3. arXiv:2505.24115  [pdf]

    cs.SD cs.HC eess.AS

    FeatureSense: Protecting Speaker Attributes in Always-On Audio Sensing System

    Authors: Bhawana Chhaglani, Sarmistha Sarna Gomasta, Yuvraj Agarwal, Jeremy Gummeson, Prashant Shenoy

    Abstract: Audio is a rich sensing modality that is useful for a variety of human activity recognition tasks. However, the ubiquitous nature of smartphones and smart speakers with always-on microphones has led to numerous privacy concerns and a lack of trust in deploying these audio-based sensing systems. This paper addresses this critical challenge of preserving user privacy when using audio for sensing app…

    Submitted 29 May, 2025; originally announced May 2025.

  4. arXiv:2505.00818  [pdf, other]

    cs.LG eess.SY math.PR

    Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures

    Authors: Heng-Sheng Chang, Prashant G. Mehta

    Abstract: This paper presents a mathematical framework for causal nonlinear prediction in settings where observations are generated from an underlying hidden Markov model (HMM). Both the problem formulation and the proposed solution are motivated by the decoder-only transformer architecture, in which a finite sequence of observations (tokens) is mapped to the conditional probability of the next token. Our o…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 49 pages, 6 figures

  5. arXiv:2504.10686  [pdf, other]

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  6. arXiv:2504.02198  [pdf, other]

    eess.SY math.NA math.OC

    Error Analysis of Sampling Algorithms for Approximating Stochastic Optimal Control

    Authors: Anant A. Joshi, Amirhossein Taghvaei, Prashant G. Mehta

    Abstract: This paper is concerned with the error analysis of two types of sampling algorithms, namely model predictive path integral (MPPI) and an interacting particle system (IPS) algorithm, that have been proposed in the literature for numerical approximation of the stochastic optimal control. The analysis is presented through the lens of Gibbs variational principle. For an illustrative example of a sing…

    Submitted 2 April, 2025; originally announced April 2025.

  7. arXiv:2503.23912  [pdf, other]

    eess.SY cs.LG math.OC

    Certified Approximate Reachability (CARe): Formal Error Bounds on Deep Learning of Reachable Sets

    Authors: Prashant Solanki, Nikolaus Vertovec, Yannik Schnitzer, Jasper Van Beers, Coen de Visser, Alessandro Abate

    Abstract: Recent approaches to leveraging deep learning for computing reachable sets of continuous-time dynamical systems have gained popularity over traditional level-set methods, as they overcome the curse of dimensionality. However, as with level-set methods, considerable care needs to be taken in limiting approximation errors, particularly since no guarantees are provided during training on the accuracy…

    Submitted 31 March, 2025; originally announced March 2025.

  8. arXiv:2503.13928  [pdf, other]

    eess.IV cs.CV

    Fibonacci-Net: A Lightweight CNN model for Automatic Brain Tumor Classification

    Authors: Santanu Roy, Ashvath Suresh, Archit Gupta, Shubhi Tiwari, Palak Sahu, Prashant Adhikari, Yuvraj S. Shekhawat

    Abstract: This research proposes a very lightweight model, "Fibonacci-Net", along with a novel pooling technique, for automatic brain tumor classification from imbalanced Magnetic Resonance Imaging (MRI) datasets. Automatic brain tumor detection from MRI datasets has garnered significant attention in the research community since the inception of Convolutional Neural Network (CNN) models. However, the performa…

    Submitted 18 March, 2025; originally announced March 2025.

  9. arXiv:2503.02904  [pdf, other]

    eess.IV cs.CV cs.LG

    Surgical Vision World Model

    Authors: Saurabh Koju, Saurav Bastola, Prashant Shrestha, Sanskar Amgain, Yash Raj Shrestha, Rudra P. K. Poudel, Binod Bhattarai

    Abstract: Realistic and interactive surgical simulation has the potential to facilitate crucial applications, such as medical professional training and autonomous surgical agent training. In the natural visual domain, world models have enabled action-controlled data generation, demonstrating the potential to train autonomous agents in interactive simulated environments when large-scale real data acquisition…

    Submitted 3 March, 2025; originally announced March 2025.

  10. arXiv:2502.10072  [pdf, other]

    eess.SY

    LifeSaver: Predictive Load Limit Estimation for Transport Vehicles in Hilly Areas

    Authors: Chanakya Rao, Vaibhav Chopra, Moksh Soni, Prashant Mishra

    Abstract: The transportation of essential goods in mountainous regions faces severe logistical challenges and frequent disruptions. To mitigate these difficulties, transport companies often overload trucks, which, though cost-saving, significantly heightens the risk of accidents and mechanical failures. This paper presents the development of a device that detects overloaded and insecurely fastened loads on…

    Submitted 14 February, 2025; originally announced February 2025.

    Comments: Accepted at SCEECS 2025 at MANIT Bhopal

  11. arXiv:2412.18566  [pdf, other]

    cs.CL eess.AS

    Zero-resource Speech Translation and Recognition with LLMs

    Authors: Karel Mundnich, Xing Niu, Prashant Mathur, Srikanth Ronanki, Brady Houston, Veera Raghavendra Elluru, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Anshu Bhatia, Daniel Garcia-Romero, Kyu J. Han, Katrin Kirchhoff

    Abstract: Despite recent advancements in speech processing, zero-resource speech translation (ST) and automatic speech recognition (ASR) remain challenging problems. In this work, we propose to leverage a multilingual Large Language Model (LLM) to perform ST and ASR in languages for which the model has never seen paired audio-text data. We achieve this by using a pre-trained multilingual speech encoder, a m…

    Submitted 30 December, 2024; v1 submitted 24 December, 2024; originally announced December 2024.

    Comments: ICASSP 2025, 5 pages, 2 figures, 2 tables

  12. arXiv:2412.16530  [pdf, other]

    cs.SD cs.CL cs.CV cs.MM eess.AS

    Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation

    Authors: Lucas Goncalves, Prashant Mathur, Xing Niu, Brady Houston, Chandrashekhar Lavania, Srikanth Vishnubhotla, Lijia Sun, Anthony Ferritto

    Abstract: Audio-Visual Speech-to-Speech Translation typically prioritizes improving translation quality and naturalness. However, an equally critical aspect in audio-visual content is lip-synchrony (ensuring that the movements of the lips match the spoken content), which is essential for maintaining realism in dubbed videos. Despite its importance, the inclusion of lip-synchrony constraints in AVS2S models has been lar…

    Submitted 21 December, 2024; originally announced December 2024.

    Comments: Accepted at ICASSP, 4 pages

  13. arXiv:2411.15576  [pdf, other]

    eess.IV cs.CV

    MulModSeg: Enhancing Unpaired Multi-Modal Medical Image Segmentation with Modality-Conditioned Text Embedding and Alternating Training

    Authors: Chengyin Li, Hui Zhu, Rafi Ibn Sultan, Hassan Bagher Ebadian, Prashant Khanduri, Chetty Indrin, Kundan Thind, Dongxiao Zhu

    Abstract: In the diverse field of medical imaging, automatic segmentation has numerous applications and must handle a wide variety of input domains, such as different types of Computed Tomography (CT) scans and Magnetic Resonance (MR) images. This heterogeneity challenges automatic segmentation algorithms to maintain consistent performance across different modalities due to the requirement for spatially ali…

    Submitted 23 November, 2024; originally announced November 2024.

    Comments: Accepted by WACV-2025

  14. arXiv:2411.09653  [pdf, other]

    eess.SY math.OC

    How to implement the Bayes' formula in the age of ML?

    Authors: Amirhossein Taghvaei, Prashant G. Mehta

    Abstract: This chapter contains a self-contained introduction to the significance of Bayes' formula in the context of nonlinear filtering problems. Both discrete-time and continuous-time settings of the problem are considered in a unified manner. In control theory, the focus on optimization-based solution approaches is stressed together with a discussion of historical developments in this area (from 1960s o…

    Submitted 14 November, 2024; originally announced November 2024.

  15. Memristors based Computation and Synthesis

    Authors: Prashant Gupta, Priscilla Jennifer

    Abstract: The memristor was identified as the fourth fundamental circuit element by Dr. Leon Chua in 1971, and since then it has attracted considerable interest because of its non-volatility and is considered a viable solution for beyond-CMOS computation. Recently, memristors have been used to perform basic logic operations such as AND, OR, NAND, NOR, and XOR, and are also used in applications like Dot Pr…

    Submitted 4 September, 2024; originally announced September 2024.

  16. arXiv:2408.10201  [pdf, other]

    eess.SY

    LEAD: Towards Learning-Based Equity-Aware Decarbonization in Ridesharing Platforms

    Authors: Mahsa Sahebdel, Ali Zeynali, Noman Bashir, Prashant Shenoy, Mohammad Hajiesmaili

    Abstract: Ridesharing platforms such as Uber, Lyft, and DiDi have grown in popularity due to their on-demand availability, ease of use, and commute cost reductions, among other benefits. However, not all ridesharing promises have panned out. Recent studies demonstrate that the expected drop in traffic congestion and reduction in greenhouse gas (GHG) emissions have not materialized. This is primarily due to…

    Submitted 12 April, 2025; v1 submitted 19 August, 2024; originally announced August 2024.

  17. A novel metric for detecting quadrotor loss-of-control

    Authors: Jasper van Beers, Prashant Solanki, Coen de Visser

    Abstract: Unmanned aerial vehicles (UAVs) are becoming an integral part of both industry and society. In particular, the quadrotor is now invaluable across a plethora of fields, and recent developments, such as the inclusion of aerial manipulators, only extend its versatility. As UAVs become more widespread, preventing loss-of-control (LOC) is an ever-growing concern. Unfortunately, LOC is not clearly def…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Presented at the International Conference on Robotics and Automation (ICRA) 2024 in Yokohama, Japan

    Journal ref: 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 2024, pp. 15570-15576

  18. arXiv:2406.11057  [pdf, other]

    eess.SY

    Design of Interacting Particle Systems for Fast Linear Quadratic RL

    Authors: Anant A Joshi, Heng-Sheng Chang, Amirhossein Taghvaei, Prashant G Mehta, Sean P. Meyn

    Abstract: This paper is concerned with the design of algorithms based on systems of interacting particles to represent, approximate, and learn the optimal control law for reinforcement learning (RL). The primary contribution is that convergence rates are greatly accelerated by the interactions between particles. Theory focuses on the linear quadratic stochastic optimal control problem for which a complete a…

    Submitted 1 December, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  19. arXiv:2406.09345  [pdf, other]

    cs.CL cs.SD eess.AS

    DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding

    Authors: Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, Karen Livescu

    Abstract: The integration of pre-trained text-based large language models (LLM) with speech input has enabled instruction-following capabilities for diverse speech tasks. This integration requires the use of a speech encoder, a speech adapter, and an LLM, trained on diverse tasks. We propose the use of discrete speech units (DSU), rather than continuous-valued speech encoder outputs, that are converted to t…

    Submitted 13 June, 2024; originally announced June 2024.

  20. arXiv:2405.08295  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sravan Bodapati, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel…

    Submitted 24 March, 2025; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single Column, 13 page

  21. arXiv:2404.18002  [pdf, other]

    cs.SD eess.AS

    Towards Privacy-Preserving Audio Classification Systems

    Authors: Bhawana Chhaglani, Jeremy Gummeson, Prashant Shenoy

    Abstract: Audio signals can reveal intimate details about a person's life, including their conversations, health status, emotions, location, and personal preferences. Unauthorized access or misuse of this information can have profound personal and social implications. In an era increasingly populated by devices capable of audio recording, safeguarding user privacy is a critical obligation. This work studies…

    Submitted 7 June, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

  22. arXiv:2404.07336  [pdf, other]

    cs.CV cs.MM eess.AS

    PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores

    Authors: Lucas Goncalves, Prashant Mathur, Chandrashekhar Lavania, Metehan Cekic, Marcello Federico, Kyu J. Han

    Abstract: Recent advancements in audio-visual generative modeling have been propelled by progress in deep learning and the availability of data-rich benchmarks. However, the growth is not attributed solely to models and benchmarks. Universally accepted evaluation metrics also play an important role in advancing the field. While there are many metrics available to evaluate audio and visual content separately…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 24 pages

  23. arXiv:2404.06696  [pdf, other]

    eess.SY

    Dual Ensemble Kalman Filter for Stochastic Optimal Control

    Authors: Anant A. Joshi, Amirhossein Taghvaei, Prashant G. Mehta, Sean P. Meyn

    Abstract: In this paper, stochastic optimal control problems in continuous time and space are considered. In recent years, such problems have received renewed attention from the lens of reinforcement learning (RL), which is also one of our motivations. The main contribution is a simulation-based algorithm -- dual ensemble Kalman filter (EnKF) -- to numerically approximate the solution of these problems. The p…

    Submitted 26 October, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted to IEEE Conference on Decision and Control, 2024

  24. arXiv:2402.16734  [pdf, other]

    eess.IV cs.CV cs.LG

    Investigating the Robustness of Vision Transformers against Label Noise in Medical Image Classification

    Authors: Bidur Khanal, Prashant Shrestha, Sanskar Amgain, Bishesh Khanal, Binod Bhattarai, Cristian A. Linte

    Abstract: Label noise in medical image classification datasets significantly hampers the training of supervised deep learning methods, undermining their generalizability. The test performance of a model tends to decrease as the label noise rate increases. Over recent years, several methods have been proposed to mitigate the impact of label noise in medical image classification and enhance the robustness of…

    Submitted 26 February, 2024; originally announced February 2024.

  25. arXiv:2402.01644  [pdf, other]

    eess.SY

    A Holistic Approach for Equity-aware Carbon Reduction of Ridesharing Platforms

    Authors: Mahsa Sahebdel, Ali Zeynali, Noman Bashir, Prashant Shenoy, Mohammad Hajiesmaili

    Abstract: Ridesharing services have revolutionized personal mobility, offering convenient on-demand transportation anytime. While early proponents of ridesharing suggested that these services would reduce the overall carbon emissions of the transportation sector, recent studies reported a type of rebound effect showing substantial carbon emissions of ridesharing platforms, mainly due to their deadhead miles…

    Submitted 16 February, 2024; v1 submitted 2 January, 2024; originally announced February 2024.

  26. arXiv:2402.01074  [pdf, other]

    eess.SY cs.RO physics.bio-ph

    Neural Models and Algorithms for Sensorimotor Control of an Octopus Arm

    Authors: Tixian Wang, Udit Halder, Ekaterina Gribkova, Rhanor Gillette, Mattia Gazzola, Prashant G. Mehta

    Abstract: In this article, a biophysically realistic model of a soft octopus arm with internal musculature is presented. The modeling is motivated by experimental observations of sensorimotor control where an arm localizes and reaches a target. Major contributions of this article are: (i) development of models to capture the mechanical properties of arm musculature, the electrical properties of the arm peri…

    Submitted 27 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  27. arXiv:2402.00671  [pdf, other]

    eess.SY

    Uncertainty-Aware Guidance for Target Tracking subject to Intermittent Measurements using Motion Model Learning

    Authors: Andres Pulido, Kyle Volle, Kristy Waters, Zachary I. Bell, Prashant Ganesh, Jane Shin

    Abstract: This paper presents a novel guidance law for target tracking applications where the target motion model is unknown and sensor measurements are intermittent due to unknown environmental conditions and low measurement update rate. In this work, the target motion model is represented by a transformer neural network and trained by previous target position measurements. This transformer motion model se…

    Submitted 20 March, 2025; v1 submitted 1 February, 2024; originally announced February 2024.

  28. arXiv:2401.10460  [pdf, other]

    cs.SD cs.LG eess.AS

    Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis

    Authors: Prabhav Agrawal, Thilo Koehler, Zhiping Xiu, Prashant Serai, Qing He

    Abstract: Neural vocoders model the raw audio waveform and synthesize high-quality audio, but even the highly efficient ones, like MB-MelGAN and LPCNet, fail to run real-time on a low-end device like a smartglass. A pure digital signal processing (DSP) based vocoder can be implemented via lightweight fast Fourier transforms (FFT), and therefore, is a magnitude faster than any neural vocoder. A DSP vocoder o…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted for ICASSP 2024

  29. arXiv:2401.08835  [pdf, other]

    cs.CL eess.AS

    Improving ASR Contextual Biasing with Guided Attention

    Authors: Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, Shinji Watanabe

    Abstract: In this paper, we propose a Guided Attention (GA) auxiliary training loss, which improves the effectiveness and robustness of automatic speech recognition (ASR) contextual biasing without introducing additional parameters. A common challenge in previous literature is that the word error rate (WER) reduction brought by contextual biasing diminishes as the number of bias phrases increases. To addres…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  30. arXiv:2312.09895  [pdf, other]

    cs.CL cs.SD eess.AS

    Generative Context-aware Fine-tuning of Self-supervised Speech Models

    Authors: Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu

    Abstract: When performing tasks like automatic speech recognition or spoken language understanding for a given utterance, access to preceding text or audio provides contextual information that can improve performance. Considering the recent advances in generative large language models (LLM), we hypothesize that an LLM could generate useful context information using the preceding text. With appropriate prompts, L…

    Submitted 15 December, 2023; originally announced December 2023.

  31. arXiv:2311.00697  [pdf, other]

    cs.CL eess.AS

    End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

    Authors: Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico

    Abstract: Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers. In this paper, we tackle single-channel multi-speaker conversational ST with an end-to-end and multi-task training model, named Speaker-Turn Aware Conversational Speech Translation, that combin…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP 2023. Code: https://github.com/amazon-science/stac-speech-translation

  32. arXiv:2310.09502  [pdf, ps, other]

    eess.SY

    Deep Nonlinear Adaptive Control for Unmanned Aerial Systems Operating under Dynamic Uncertainties

    Authors: Zachary Lamb, Zachary I. Bell, Matthew Longmire, Jared Paquet, Prashant Ganesh, Ricardo Sanfelice

    Abstract: Recent literature in the field of machine learning (ML) control has shown promising theoretical results for a Deep Neural Network (DNN) based Nonlinear Adaptive Controller (DNAC) capable of achieving trajectory tracking for nonlinear systems. Expanding on this work, this paper applies DNAC to the Attitude Control System (ACS) of a quadrotor and shows improvement to attitude control performance und…

    Submitted 14 October, 2023; originally announced October 2023.

  33. arXiv:2309.14477  [pdf, other]

    cs.DC cs.ET cs.OS cs.PF eess.SY

    Carbon Containers: A System-level Facility for Managing Application-level Carbon Emissions

    Authors: John Thiede, Noman Bashir, David Irwin, Prashant Shenoy

    Abstract: To reduce their environmental impact, cloud datacenters are increasingly focused on optimizing applications' carbon-efficiency, or work done per mass of carbon emitted. To facilitate such optimizations, we present Carbon Containers, a simple system-level facility, which extends prior work on power containers, that automatically regulates applications' carbon emissions in response to variations in…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: ACM Symposium on Cloud Computing (SoCC)

  34. arXiv:2308.09046  [pdf]

    eess.SY

    Fault Detection and Classification using Wavelet and ANN in DFIG and TCSC Connected Transmission Line

    Authors: Satya Vikram Pratap Singh, Tanu Prasad, Siddharth Kamila, Prashant Agnihotri

    Abstract: This paper presents fault detection and classification using Wavelet and ANN based methods in a DFIG-based series compensated system. The state-of-the-art methods include Wavelet transform, Fourier transform, and Wavelet-neuro fuzzy methods-based system for fault detection and classification. However, the accuracy of these state-of-the-art methods diminishes during variable conditions such as chan…

    Submitted 17 August, 2023; originally announced August 2023.

  35. arXiv:2306.06502  [pdf, other]

    cs.DC cs.CY eess.SY

    On the Limitations of Carbon-Aware Temporal and Spatial Workload Shifting in the Cloud

    Authors: Thanathorn Sukprasert, Abel Souza, Noman Bashir, David Irwin, Prashant Shenoy

    Abstract: Cloud platforms have been focusing on reducing their carbon emissions by shifting workloads across time and locations to when and where low-carbon energy is available. Despite the prominence of this idea, prior work has only quantified the potential of spatiotemporal workload shifting in narrow settings, i.e., for specific workloads in select regions. In particular, there has been limited work on…

    Submitted 10 March, 2024; v1 submitted 10 June, 2023; originally announced June 2023.

    Comments: EuroSys'24: Nineteenth European Conference on Computer Systems, 2024

  36. arXiv:2306.01252  [pdf, other]

    eess.IV physics.med-ph

    Deep Learning based Skin-layer Segmentation for Characterizing Cutaneous Wounds from Optical Coherence Tomography Images

    Authors: Prashant Kumar, Swatantra Dhara, Ayan Gope, Jyotirmoy Chatterjee, Subhamoy Mandal

    Abstract: Optical coherence tomography (OCT) is a medical imaging modality that allows us to probe deeper substructures of skin. The state-of-the-art wound care prediction and monitoring methods are based on visual evaluation and focus on surface information. However, research studies have shown that sub-surface information of the wound is critical for understanding the wound healing progression. This work…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted

    Journal ref: 45th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2023

  37. Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech

    Authors: Shashi Kant Gupta, Sushant Hiray, Prashant Kukde

    Abstract: This work focuses on improving the Spoken Language Identification (LangId) system for a challenge that focuses on developing robust language identification systems that are reliable for non-standard, accented (Singaporean accent), spontaneous code-switched, and child-directed speech collected via Zoom. We propose a two-stage Encoder-Decoder-based E2E model. The encoder module consists of 1D depth-…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted by Interspeech 2023, 5 pages, 1 figure, 4 tables

    Journal ref: Proc. INTERSPEECH 2023, 4114--4118

  38. arXiv:2305.13204  [pdf, other]

    cs.CL cs.SD eess.AS

    Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters

    Authors: Proyag Pal, Brian Thompson, Yogesh Virkar, Prashant Mathur, Alexandra Chronopoulou, Marcello Federico

    Abstract: To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e. translated speech needs to be aligned with the source in terms of speech durations. We introduce target factors in a transformer model to predict durations jointly with target language phoneme sequences. We also introduce auxiliary counters to help the decoder to keep track of the timing information while…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at INTERSPEECH 2023

  39. arXiv:2305.11073  [pdf, other]

    cs.CL cs.SD eess.AS

    A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

    Authors: Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe

    Abstract: Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST) and spoken language understanding (SLU). Recently, a new encoder called E-Branchformer has outperformed Conformer in the LibriSpeech ASR benchmark, making it…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted at INTERSPEECH 2023. Code: https://github.com/espnet/espnet

  40. arXiv:2305.03272  [pdf, other]

    eess.SY

    Robust Model Predictive Techno-Economic Control of Active Distribution Networks

    Authors: Salish Maharjan, Prashant Tiwari, Rui Cheng, Zhaoyu Wang

    Abstract: Stochastic controllers are perceived as a promising solution for techno-economic operation of distribution networks having higher generation uncertainties at large penetration of renewables. These controllers are supported by forecasters capable of predicting generation uncertainty by means of lower/upper bounds rather than by probability density function (PDF). Hence, the stochastic controller as…

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: Submitted to PESGM 2023

  41. arXiv:2305.00855  [pdf, other]

    cs.DC cs.CY eess.SY

    Jointly Managing Electrical and Thermal Energy in Solar- and Battery-powered Computer Systems

    Authors: Noman Bashir, Yasra Chandio, David Irwin, Fatima M. Anwar, Jeremy Gummeson, Prashant Shenoy

    Abstract: Environmentally-powered computer systems operate on renewable energy harvested from their environment, such as solar or wind, and stored in batteries. While harvesting environmental energy has long been necessary for small-scale embedded systems without access to external power sources, it is also increasingly important in designing sustainable larger-scale systems for edge applications. For susta…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: The 14th ACM International Conference on Future Energy Systems (e-Energy '23), June 20-23, 2023, Orlando, FL, USA

    Journal ref: In The 14th ACM International Conference on Future Energy Systems (e-Energy '23), June 20-23, 2023, Orlando, FL, USA. ACM, New York, NY, USA, 12 pages

  42. arXiv:2304.08413  [pdf, other]

    cs.RO eess.SY physics.bio-ph

    Topology, dynamics, and control of an octopus-analog muscular hydrostat

    Authors: Arman Tekinalp, Noel Naughton, Seung-Hyun Kim, Udit Halder, Rhanor Gillette, Prashant G. Mehta, William Kier, Mattia Gazzola

    Abstract: Muscular hydrostats, such as octopus arms or elephant trunks, lack bones entirely, endowing them with exceptional dexterity and reconfigurability. Key to their unmatched ability to control nearly infinite degrees of freedom is the architecture into which muscle fibers are weaved. Their arrangement is, effectively, the instantiation of a sophisticated mechanical program that mediates, and likely fa…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: 8 pages, 4 figures

  43. arXiv:2303.00802  [pdf, other]

    cs.CL cs.SD eess.AS

    Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition

    Authors: Philipp Klumpp, Pooja Chitkara, Leda Sarı, Prashant Serai, Jilong Wu, Irina-Elena Veliche, Rongqing Huang, Qing He

    Abstract: The awareness for biased ASR datasets or models has increased notably in recent years. Even for English, despite a vast amount of available training data, systems perform worse for non-native speakers. In this work, we improve an accent-conversion model (ACM) which transforms native US-English speech into accented pronunciation. We include phonetic knowledge in the ACM training to provide accurate…

    Submitted 1 March, 2023; originally announced March 2023.

  44. arXiv:2302.14132  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding

    Authors: Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe

    Abstract: Self-supervised speech representation learning (SSL) has shown to be effective in various downstream tasks, but SSL models are usually large and slow. Model compression techniques such as pruning aim to reduce the model size and computation without degradation in accuracy. Prior studies focus on the pruning of Transformers; however, speech models not only utilize a stack of Transformer blocks, but…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: Accepted at ICASSP 2023

  45. arXiv:2302.12979  [pdf, other]

    cs.CL cs.SD eess.AS

    Jointly Optimizing Translations and Speech Timing to Improve Isochrony in Automatic Dubbing

    Authors: Alexandra Chronopoulou, Brian Thompson, Prashant Mathur, Yogesh Virkar, Surafel M. Lakew, Marcello Federico

    Abstract: Automatic dubbing (AD) is the task of translating the original speech in a video into target language speech. The new target language speech should satisfy isochrony; that is, the new speech should be time aligned with the original video, including mouth movements, pauses, hand gestures, etc. In this paper, we propose training a model that directly optimizes both the translation as well as the spe…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: 5 pages

  46. arXiv:2302.05811  [pdf, other]

    cs.RO eess.SY

    Hierarchical control and learning of a foraging CyberOctopus

    Authors: Chia-Hsien Shih, Noel Naughton, Udit Halder, Heng-Sheng Chang, Seung Hyun Kim, Rhanor Gillette, Prashant G. Mehta, Mattia Gazzola

    Abstract: Inspired by the unique neurophysiology of the octopus, we propose a hierarchical framework that simplifies the coordination of multiple soft arms by decomposing control into high-level decision making, low-level motor activation, and local reflexive behaviors via sensory feedback. When evaluated in the illustrative problem of a model octopus foraging for food, this hierarchical decomposition resul…

    Submitted 11 February, 2023; originally announced February 2023.

    Comments: 16 pages, 7 figures

  47. Equitable Network-Aware Decarbonization of Residential Heating at City Scale

    Authors: Adam Lechowicz, Noman Bashir, John Wamburu, Mohammad Hajiesmaili, Prashant Shenoy

    Abstract: Residential heating, primarily powered by natural gas, accounts for a significant portion of residential sector energy use and carbon emissions in many parts of the world. Hence, there is a push towards decarbonizing residential heating by transitioning to energy-efficient heat pumps powered by an increasingly greener and less carbon-intensive electric grid. However, such a transition will add add…

    Submitted 11 January, 2023; originally announced January 2023.

    Comments: Accepted to e-Energy 2023. 12 pages, 10 figures

  48. arXiv:2301.00935  [pdf, other]

    eess.SY math.OC

    A Survey of Feedback Particle Filter and related Controlled Interacting Particle Systems (CIPS)

    Authors: Amirhossein Taghvaei, Prashant G. Mehta

    Abstract: In this survey, we describe controlled interacting particle systems (CIPS) to approximate the solution of the optimal filtering and the optimal control problems. Part I of the survey is focussed on the feedback particle filter (FPF) algorithm, its derivation based on optimal transportation theory, and its relationship to the ensemble Kalman filter (EnKF) and the conventional sequential importance…

    Submitted 20 March, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

  49. arXiv:2212.08542  [pdf, other]

    eess.AS cs.CL

    Context-aware Fine-tuning of Self-supervised Speech Models

    Authors: Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe

    Abstract: Self-supervised pre-trained transformers have improved the state of the art on a variety of speech tasks. Due to the quadratic time and space complexity of self-attention, they usually operate at the level of relatively short (e.g., utterance) segments. In this paper, we study the use of context, i.e., surrounding segments, during fine-tuning and propose a new approach called context-aware fine-tu…

    Submitted 28 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

  50. arXiv:2212.04229  [pdf, other]

    cs.CR eess.SY

    ICSPatch: Automated Vulnerability Localization and Non-Intrusive Hotpatching in Industrial Control Systems using Data Dependence Graphs

    Authors: Prashant Hari Narayan Rajput, Constantine Doumanidis, Michail Maniatakos

    Abstract: The paradigm shift of enabling extensive intercommunication between the Operational Technology (OT) and Information Technology (IT) devices allows vulnerabilities typical to the IT world to propagate to the OT side. Therefore, the security layer offered in the past by air gapping is removed, making security patching for OT devices a hard requirement. Conventional patching involves a device reboot…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: To appear in the 32nd USENIX Security Symposium, August 2023, Anaheim, CA, USA [16 pages, 12 figures, 5 tables, code available at https://github.com/momalab/ICSPatch]