Showing 1–47 of 47 results for author: Joshi, S

  1. arXiv:2506.09653  [pdf, ps, other]

    eess.AS

    Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women

    Authors: Sakshi Joshi, Eldho Ittan George, Tahir Javed, Kaushal Bhogale, Nikhil Narasimhan, Mitesh M. Khapra

    Abstract: Digital inclusion remains a challenge for marginalized communities, especially rural women in low-resource language regions like Bhojpuri. Voice-based access to agricultural services, financial transactions, government schemes, and healthcare is vital for their empowerment, yet existing ASR systems for this group remain largely untested. To address this gap, we create SRUTI, a benchmark consisting…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted at Interspeech 2025

  2. arXiv:2503.01075  [pdf, other]

    eess.IV cs.AI cs.CV

    Tackling Hallucination from Conditional Models for Medical Image Reconstruction with DynamicDPS

    Authors: Seunghoi Kim, Henry F. J. Tregidgo, Matteo Figini, Chen Jin, Sarang Joshi, Daniel C. Alexander

    Abstract: Hallucinations are spurious structures not present in the ground truth, posing a critical challenge in medical image reconstruction, especially for data-driven conditional models. We hypothesize that combining an unconditional diffusion model with data consistency, trained on a diverse dataset, can reduce these hallucinations. Based on this, we propose DynamicDPS, a diffusion-based framework that…

    Submitted 2 March, 2025; originally announced March 2025.

  3. arXiv:2501.15310  [pdf, other]

    cs.CL cs.SD eess.AS

    The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders?

    Authors: Ayo Adedeji, Mardhiyah Sanni, Emmanuel Ayodele, Sarita Joshi, Tobi Olatunji

    Abstract: The global adoption of Large Language Models (LLMs) in healthcare shows promise to enhance clinical workflows and improve patient outcomes. However, Automatic Speech Recognition (ASR) errors in critical medical terms remain a significant challenge. These errors can compromise patient care and safety if not detected. This study investigates the prevalence and impact of ASR errors in medical transcr…

    Submitted 25 January, 2025; originally announced January 2025.

    Comments: 15 pages, 8 figures

  4. arXiv:2501.00961  [pdf, ps, other]

    cs.LG cs.AI cs.CV eess.IV

    Uncovering Memorization Effect in the Presence of Spurious Correlations

    Authors: Chenyu You, Haocheng Dai, Yifei Min, Jasjeet S. Sekhon, Sarang Joshi, James S. Duncan

    Abstract: Machine learning models often rely on simple spurious features -- patterns in training data that correlate with targets but are not causally related to them, like image backgrounds in foreground classification. This reliance typically leads to imbalanced test performance across minority and majority groups. In this work, we take a closer look at the fundamental cause of such imbalanced performance…

    Submitted 4 June, 2025; v1 submitted 1 January, 2025; originally announced January 2025.

    Comments: Accepted by Nature Communications

  5. arXiv:2412.00538  [pdf, other]

    cs.RO cs.LG eess.SY stat.AP

    Prognostic Framework for Robotic Manipulators Operating Under Dynamic Task Severities

    Authors: Ayush Mohanty, Jason Dekarske, Stephen K. Robinson, Sanjay Joshi, Nagi Gebraeel

    Abstract: Robotic manipulators are critical in many applications but are known to degrade over time. This degradation is influenced by the nature of the tasks performed by the robot. Tasks with higher severity, such as handling heavy payloads, can accelerate the degradation process. One way this degradation is reflected is in the position accuracy of the robot's end-effector. In this paper, we present a pro…

    Submitted 30 November, 2024; originally announced December 2024.

  6. arXiv:2409.18872  [pdf, other]

    eess.IV cs.CV cs.LG

    Simulating Dynamic Tumor Contrast Enhancement in Breast MRI using Conditional Generative Adversarial Networks

    Authors: Richard Osuala, Smriti Joshi, Apostolia Tsirikoglou, Lidia Garrucho, Walter H. L. Pinaya, Daniel M. Lang, Julia A. Schnabel, Oliver Diaz, Karim Lekadir

    Abstract: This paper presents a method for virtual contrast enhancement in breast MRI, offering a promising non-invasive alternative to traditional contrast agent-based DCE-MRI acquisition. Using a conditional generative adversarial network, we predict DCE-MRI images, including jointly-generated sequences of multiple corresponding DCE-MRI timepoints, from non-contrast-enhanced MRIs, enabling tumor localizat…

    Submitted 14 May, 2025; v1 submitted 27 September, 2024; originally announced September 2024.

  7. arXiv:2409.08985  [pdf, other]

    cs.CR cs.LG eess.AS

    Clean Label Attacks against SLU Systems

    Authors: Henry Li Xinyuan, Sonal Joshi, Thomas Thebaud, Jesus Villalba, Najim Dehak, Sanjeev Khudanpur

    Abstract: Poisoning backdoor attacks involve an adversary manipulating the training data to induce certain behaviors in the victim model by inserting a trigger in the signal at inference time. We adapted clean label backdoor (CLBD)-data poisoning attacks, which do not modify the training labels, on state-of-the-art speech recognition models that support/perform a Spoken Language Understanding task, achievin…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: Accepted at IEEE SLT 2024

  8. arXiv:2407.15964  [pdf, other]

    cs.CV eess.IV

    FDWST: Fingerphoto Deblurring using Wavelet Style Transfer

    Authors: David Keaton, Amol S. Joshi, Jeremy Dawson, Nasser M. Nasrabadi

    Abstract: The challenge of deblurring fingerphoto images, or generating a sharp fingerphoto from a given blurry one, is a significant problem in the realm of computer vision. To address this problem, we propose a fingerphoto deblurring architecture referred to as Fingerphoto Deblurring using Wavelet Style Transfer (FDWST), which aims to utilize the information transmission of Style Transfer techniques to de…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCB 2024

  9. arXiv:2403.13890  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Towards Learning Contrast Kinetics with Multi-Condition Latent Diffusion Models

    Authors: Richard Osuala, Daniel M. Lang, Preeti Verma, Smriti Joshi, Apostolia Tsirikoglou, Grzegorz Skorupko, Kaisar Kushibar, Lidia Garrucho, Walter H. L. Pinaya, Oliver Diaz, Julia A. Schnabel, Karim Lekadir

    Abstract: Contrast agents in dynamic contrast enhanced magnetic resonance imaging allow to localize tumors and observe their contrast kinetics, which is essential for cancer characterization and respective treatment decision-making. However, contrast agent administration is not only associated with adverse health risks, but also restricted for patients during pregnancy, and for those with kidney malfunction…

    Submitted 17 July, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Early Accept at MICCAI2024

  10. arXiv:2402.19355  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification

    Authors: Sonal Joshi, Thomas Thebaud, Jesús Villalba, Najim Dehak

    Abstract: Adversarial examples have proven to threaten speaker identification systems, and several countermeasures against them have been proposed. In this paper, we propose a method to detect the presence of adversarial examples, i.e., a binary classifier distinguishing between benign and adversarial examples. We build upon and extend previous work on attack type classification by exploring new architectur…

    Submitted 29 February, 2024; originally announced February 2024.

  11. arXiv:2402.07658  [pdf, other]

    cs.CL cs.SD eess.AS

    The Sound of Healthcare: Improving Medical Transcription ASR Accuracy with Large Language Models

    Authors: Ayo Adedeji, Sarita Joshi, Brendan Doohan

    Abstract: In the rapidly evolving landscape of medical documentation, transcribing clinical dialogues accurately is increasingly paramount. This study explores the potential of Large Language Models (LLMs) to enhance the accuracy of Automatic Speech Recognition (ASR) systems in medical transcription. Utilizing the PriMock57 dataset, which encompasses a diverse range of primary care consultations, we apply a…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: 31 pages, 17 figures

  12. arXiv:2311.10879  [pdf, other]

    eess.IV cs.CV cs.LG

    Pre- to Post-Contrast Breast MRI Synthesis for Enhanced Tumour Segmentation

    Authors: Richard Osuala, Smriti Joshi, Apostolia Tsirikoglou, Lidia Garrucho, Walter H. L. Pinaya, Oliver Diaz, Karim Lekadir

    Abstract: Despite its benefits for tumour detection and treatment, the administration of contrast agents in dynamic contrast-enhanced MRI (DCE-MRI) is associated with a range of issues, including their invasiveness, bioaccumulation, and a risk of nephrogenic systemic fibrosis. This study explores the feasibility of producing synthetic contrast enhancements by translating pre-contrast T1-weighted fat-saturat…

    Submitted 31 May, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: Accepted as oral presentation at SPIE Medical Imaging 2024 (Image Processing)

  13. arXiv:2305.15760  [pdf, other]

    cs.CL cs.SD eess.AS

    Svarah: Evaluating English ASR Systems on Indian Accents

    Authors: Tahir Javed, Sakshi Joshi, Vignesh Nagarajan, Sai Sundaresan, Janki Nawale, Abhigyan Raman, Kaushal Bhogale, Pratyush Kumar, Mitesh M. Khapra

    Abstract: India is the second largest English-speaking country in the world with a speaker base of roughly 130 million. Thus, it is imperative that automatic speech recognition (ASR) systems for English should be evaluated on Indian accents. Unfortunately, Indian speakers find a very poor representation in existing English ASR benchmarks such as LibriSpeech, Switchboard, Speech Accent Archive, etc. In this…

    Submitted 25 May, 2023; originally announced May 2023.

  14. arXiv:2305.06025  [pdf]

    eess.IV cs.CV

    Brain Tumor Detection using Swin Transformers

    Authors: Prateek A. Meshram, Suraj Joshi, Devarshi Mahajan

    Abstract: The first MRI scan was done in the year 1978 by researchers at EML Laboratories. As per an estimate, approximately 251,329 people died due to primary cancerous brain and CNS (Central Nervous System) Tumors in the year 2020. It has been recommended by various medical professionals that brain tumor detection at an early stage would help in saving many lives. Whenever radiologists deal with a brain M…

    Submitted 10 May, 2023; originally announced May 2023.

  15. arXiv:2304.03297  [pdf, other]

    eess.IV cs.CV cs.LG

    Neural Operator Learning for Ultrasound Tomography Inversion

    Authors: Haocheng Dai, Michael Penwarden, Robert M. Kirby, Sarang Joshi

    Abstract: Neural operator learning as a means of mapping between complex function spaces has garnered significant attention in the field of computational science and engineering (CS&E). In this paper, we apply Neural operator learning to the time-of-flight ultrasound computed tomography (USCT) problem. We learn the mapping between time-of-flight (TOF) data and the heterogeneous sound speed field using a ful…

    Submitted 28 May, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: 4 pages, 1 figure

  16. arXiv:2210.15756  [pdf, other]

    quant-ph eess.SY physics.app-ph

    Scaling up Superconducting Quantum Computers with Cryogenic RF-photonics

    Authors: Sanskriti Joshi, Sajjad Moazeni

    Abstract: Today's hundred-qubit quantum computers require a dramatic scale up to millions of qubits to become practical for solving real-world problems. Although a variety of qubit technologies have been demonstrated, scalability remains a major hurdle. Superconducting (SC) qubits are one of the most mature and promising technologies to overcome this challenge. However, these qubits reside in a millikelvin…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 10 pages, 8 figures

  17. arXiv:2209.14472  [pdf, other]

    eess.IV cs.CV cs.LG

    medigan: a Python library of pretrained generative models for medical image synthesis

    Authors: Richard Osuala, Grzegorz Skorupko, Noussair Lazrak, Lidia Garrucho, Eloy García, Smriti Joshi, Socayna Jouide, Michael Rutherford, Fred Prior, Kaisar Kushibar, Oliver Diaz, Karim Lekadir

    Abstract: Synthetic data generated by generative models can enhance the performance and capabilities of data-hungry deep learning models in medical imaging. However, there is (1) limited availability of (synthetic) datasets and (2) generative models are complex to train, which hinders their adoption in research and clinical applications. To reduce this entry barrier, we propose medigan, a one-stop shop for…

    Submitted 23 February, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

    Comments: 32 pages, 7 figures

    ACM Class: I.4.0; I.2.0; I.5.1

    Journal ref: Journal of Medical Imaging 10.6 (2023) 061403

  18. arXiv:2209.01498  [pdf, other]

    q-bio.QM cs.LG eess.IV

    StreamNet: A WAE for White Matter Streamline Analysis

    Authors: Andrew Lizarraga, Katherine L. Narr, Kirsten A. Donald, Shantanu H. Joshi

    Abstract: We present StreamNet, an autoencoder architecture for the analysis of the highly heterogeneous geometry of large collections of white matter streamlines. This proposed framework takes advantage of geometry-preserving properties of the Wasserstein-1 metric in order to achieve direct encoding and reconstruction of entire bundles of streamlines. We show that the model not only accurately captures the…

    Submitted 19 October, 2022; v1 submitted 3 September, 2022; originally announced September 2022.

  19. arXiv:2208.01408  [pdf, other]

    eess.SY

    A Python-based Mixed Discrete-Continuous Simulation Framework for Digital Twins

    Authors: Neha Karanjkar, Subodh M. Joshi

    Abstract: The use of Digital Twins is set to transform the manufacturing sector by aiding monitoring and real-time decision making. For several applications in this sector, the system to be modeled consists of a mix of discrete-event and continuous processes interacting with each other. Building simulation-based Digital Twins of such systems necessitates an open, flexible simulation framework which can supp…

    Submitted 31 July, 2022; originally announced August 2022.

    Comments: Under review for publication in Springer Lecture Notes in Networks and Systems (LNNS)

  20. arXiv:2205.08419  [pdf, other]

    eess.SP cs.AI cs.HC cs.LG

    Human Emotion Classification based on EEG Signals Using Recurrent Neural Network And KNN

    Authors: Shashank Joshi, Falak Joshi

    Abstract: In human contact, emotion is very crucial. Attributes like words, voice intonation, facial expressions, and kinesics can all be used to portray one's feelings. However, brain-computer interface (BCI) devices have not yet reached the level required for emotion interpretation. With the rapid development of machine learning algorithms, dry electrode techniques, and different real-world applications o…

    Submitted 10 May, 2022; originally announced May 2022.

  21. arXiv:2204.03851  [pdf, other]

    eess.AS cs.CR cs.SD

    Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser

    Authors: Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak

    Abstract: Adversarial attacks are a threat to automatic speech recognition (ASR) systems, and it becomes imperative to propose defenses to protect them. In this paper, we perform experiments to show that K2 conformer hybrid ASR is strongly affected by white-box adversarial attacks. We propose three defenses--denoiser pre-processor, adversarially fine-tuning ASR model, and adversarially fine-tuning joint mod…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  22. arXiv:2204.03848  [pdf, ps, other]

    eess.AS cs.CR cs.SD

    AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification

    Authors: Sonal Joshi, Saurabh Kataria, Jesus Villalba, Najim Dehak

    Abstract: Adversarial attacks pose a severe security threat to the state-of-the-art speaker identification systems, thereby making it vital to propose countermeasures against them. Building on our previous work that used representation learning to classify and detect adversarial attacks, we propose an improvement to it using AdvEst, a method to estimate adversarial perturbation. First, we prove our claim th…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to InterSpeech 2022

  23. arXiv:2203.06122  [pdf, other]

    q-bio.NC cs.CV eess.IV

    Modeling the Shape of the Brain Connectome via Deep Neural Networks

    Authors: Haocheng Dai, Martin Bauer, P. Thomas Fletcher, Sarang Joshi

    Abstract: The goal of diffusion-weighted magnetic resonance imaging (DWI) is to infer the structural connectivity of an individual subject's brain in vivo. To statistically study the variability and differences between normal and abnormal brain connectomes, a mathematical model of the neural connections is required. In this paper, we represent the brain connectome as a Riemannian manifold, which allows us t…

    Submitted 3 March, 2023; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: 12 pages, 5 figures

  24. CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation

    Authors: Reuben Dorent, Aaron Kujawa, Marina Ivory, Spyridon Bakas, Nicola Rieke, Samuel Joutard, Ben Glocker, Jorge Cardoso, Marc Modat, Kayhan Batmanghelich, Arseniy Belkov, Maria Baldeon Calisto, Jae Won Choi, Benoit M. Dawant, Hexin Dong, Sergio Escalera, Yubo Fan, Lasse Hansen, Mattias P. Heinrich, Smriti Joshi, Victoriya Kashtanova, Hyeon Gyu Kim, Satoshi Kondo, Christian N. Kruse, Susana K. Lai-Yuen , et al. (15 additional authors not shown)

    Abstract: Domain Adaptation (DA) has recently raised strong interests in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality…

    Submitted 14 December, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

    Comments: In Medical Image Analysis

  25. Latency-Aware Multi-antenna SWIPT System with Battery-Constrained Receivers

    Authors: Dileep Kumar, Onel L. Alcaraz López, Satya Krishna Joshi, Antti Tölli

    Abstract: Power splitting (PS) based simultaneous wireless information and power transfer (SWIPT) is considered in a multi-user multiple-input-single-output broadcast scenario. Specifically, we focus on jointly configuring the transmit beamforming vectors and receive PS ratios to minimize the total transmit energy of the base station under the user-specific latency and energy harvesting (EH) requirements. T…

    Submitted 22 October, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: 34 pages, 12 figures

  26. arXiv:2110.13023  [pdf, other]

    cs.LG cs.SD eess.AS

    ML-Based Analysis to Identify Speech Features Relevant in Predicting Alzheimer's Disease

    Authors: Yash Kumar, Piyush Maheshwari, Shreyansh Joshi, Veeky Baths

    Abstract: Alzheimer's disease (AD) is a neurodegenerative disease that affects nearly 50 million individuals across the globe and is one of the leading causes of deaths globally. It is projected that by 2050, the number of people affected by the disease would more than double. Consequently, the growing advancements in technology beg the question, can technology be used to predict Alzheimer's for a better an…

    Submitted 25 October, 2021; originally announced October 2021.

  27. arXiv:2110.12503  [pdf, other]

    cs.LG eess.SP

    Deep Neural Networks on EEG Signals to Predict Auditory Attention Score Using Gramian Angular Difference Field

    Authors: Mahak Kothari, Shreyansh Joshi, Adarsh Nandanwar, Aadetya Jaiswal, Veeky Baths

    Abstract: Auditory attention is a selective type of hearing in which people focus their attention intentionally on a specific source of a sound or spoken words whilst ignoring or inhibiting other auditory stimuli. In some sense, the auditory attention score of an individual shows the focus the person can have in auditory tasks. The recent advancements in deep learning and in the non-invasive technologies re…

    Submitted 24 October, 2021; originally announced October 2021.

    Comments: 7 pages, 3 figures

  28. arXiv:2109.11139  [pdf, other]

    eess.IV

    Source Printer Identification using Printer Specific Pooling of Letter Descriptors

    Authors: Sharad Joshi, Yogesh Kumar Gupta, Nitin Khanna

    Abstract: The digital revolution has replaced the use of printed documents with their digital counterparts. However, many applications require the use of both due to several factors, including challenges of digital security, installation costs, ease of use, and lack of digital expertise. Technological developments in the digital domain have also resulted in the easy availability of high-quality scanners, pr…

    Submitted 23 September, 2021; originally announced September 2021.

    Comments: 34 pages, 5 figures, Journal

  29. Latency-Constrained Highly-Reliable mmWave Communication via Multi-point Connectivity

    Authors: Dileep Kumar, Satya Joshi, Antti Tölli

    Abstract: The sensitivity of millimeter-wave (mmWave) radio channel to blockage is a fundamental challenge in achieving low-latency and ultra-reliable connectivity. In this paper, we explore the viability of using coordinated multi-point (CoMP) transmission for a delay bounded and reliable mmWave communication. We propose a novel blockage-aware algorithm for the sum-power minimization problem under the user…

    Submitted 20 August, 2021; originally announced August 2021.

    Comments: 12 pages, 10 figures

  30. arXiv:2108.08405  [pdf, other]

    cs.CL cs.SD eess.AS

    Integrating Dialog History into End-to-End Spoken Language Understanding Systems

    Authors: Jatin Ganhotra, Samuel Thomas, Hong-Kwang J. Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury

    Abstract: End-to-end spoken language understanding (SLU) systems that process human-human or human-computer interactions are often context independent and process each turn of a conversation independently. Spoken conversations on the other hand, are very much context dependent, and dialog history contains useful information that can improve the processing of each conversational turn. In this paper, we inves…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: Interspeech 2021

  31. arXiv:2107.05578  [pdf, other]

    physics.ao-ph astro-ph.EP astro-ph.IM eess.IV stat.AP

    Impact of Scene-Specific Enhancement Spectra on Matched Filter Greenhouse Gas Retrievals from Imaging Spectroscopy

    Authors: Markus D. Foote, Philip E. Dennison, Patrick R. Sullivan, Kelly B. O'Neill, Andrew K. Thorpe, David R. Thompson, Daniel H. Cusworth, Riley Duren, Sarang C. Joshi

    Abstract: Matched filter (MF) techniques have been widely used for retrieval of greenhouse gas enhancements (enh.) from imaging spectroscopy datasets. While multiple algorithmic techniques and refinements have been proposed, the greenhouse gas target spectrum used for concentration enh. estimation has remained largely unaltered since the introduction of quantitative MF retrievals. The magnitude of retrieved…

    Submitted 10 August, 2021; v1 submitted 25 June, 2021; originally announced July 2021.

    Comments: 13 pages, 5 figures, 3 tables

    Journal ref: Remote Sensing of Environment, Volume 264, October 2021, 112574

  32. arXiv:2107.04448  [pdf, other]

    eess.AS

    Representation Learning to Classify and Detect Adversarial Attacks against Speaker and Speech Recognition Systems

    Authors: Jesús Villalba, Sonal Joshi, Piotr Żelasko, Najim Dehak

    Abstract: Adversarial attacks have become a major threat for machine learning applications. There is a growing interest in studying these attacks in the audio domain, e.g, speech and speaker recognition; and find defenses against them. In this work, we focus on using representation learning to classify/detect attacks w.r.t. the attack algorithm, threat model or signal-to-adversarial-noise ratio. We found th…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted at Interspeech 2021

  33. arXiv:2103.17122  [pdf, ps, other]

    eess.AS cs.CR cs.SD

    Adversarial Attacks and Defenses for Speech Recognition Systems

    Authors: Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

    Abstract: The ubiquitous presence of machine learning systems in our lives necessitates research into their vulnerabilities and appropriate countermeasures. In particular, we investigate the effectiveness of adversarial attacks and defenses against automatic speech recognition (ASR) systems. We select two ASR models - a thoroughly studied DeepSpeech model and a more recent Espresso framework Transformer enc…

    Submitted 31 March, 2021; originally announced March 2021.

    Comments: This work has been submitted to the IEEE for possible publication

  34. arXiv:2103.13497  [pdf, other]

    eess.IV cs.CV

    3D Reasoning for Unsupervised Anomaly Detection in Pediatric WbMRI

    Authors: Alex Chang, Vinith Suriyakumar, Abhishek Moturu, James Tu, Nipaporn Tewattanarat, Sayali Joshi, Andrea Doria, Anna Goldenberg

    Abstract: Modern deep unsupervised learning methods have shown great promise for detecting diseases across a variety of medical imaging modalities. While previous generative modeling approaches successfully perform anomaly detection by learning the distribution of healthy 2D image slices, they process such slices independently and ignore the fact that they are correlated, all being sampled from a 3D volume.…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: 10 pages, 2 tables, 3 figures, in submission

  35. The Generalized Fourier Transform: A Unified Framework for the Fourier, Laplace, Mellin and $Z$ Transforms

    Authors: Pushpendra Singh, Anubha Gupta, Shiv Dutt Joshi

    Abstract: This paper introduces Generalized Fourier transform (GFT) that is an extension or the generalization of the Fourier transform (FT). The Unilateral Laplace transform (LT) is observed to be the special case of GFT. GFT, as proposed in this work, contributes significantly to the scholarly literature. There are many salient contribution of this work. Firstly, GFT is applicable to a much larger class o…

    Submitted 12 February, 2021; originally announced March 2021.

    Comments: 18 pages

  36. arXiv:2101.08909  [pdf, other]

    eess.AS cs.SD

    Study of Pre-processing Defenses against Adversarial Attacks on State-of-the-art Speaker Recognition Systems

    Authors: Sonal Joshi, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak

    Abstract: Adversarial examples to speaker recognition (SR) systems are generated by adding a carefully crafted noise to the speech signal to make the system fail while being imperceptible to humans. Such attacks pose severe security risks, making it vital to deep-dive and understand how much the state-of-the-art SR systems are vulnerable to these attacks. Moreover, it is of greater importance to propose def…

    Submitted 25 June, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

    Comments: This work has been submitted to the IEEE for possible publication

  37. arXiv:2011.10708  [pdf, other]

    eess.IV physics.med-ph

    Histology to 3D In Vivo MR Registration for Volumetric Evaluation of MRgFUS Treatment Assessment Biomarkers

    Authors: Blake E. Zimmerman, Sara L. Johnson, Henrik A. Odéen, Jill E. Shea, Rachel E. Factor, Sarang C. Joshi, Allison H. Payne

    Abstract: Advances in imaging and early cancer detection have increased interest in magnetic resonance (MR) guided focused ultrasound (MRgFUS) technologies for cancer treatment. MRgFUS ablation treatments could reduce surgical risks, preserve organ tissue/function, and improve patient quality of life. However, surgical resection and histological analysis remain the gold standard to assess cancer treatment r…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: 12 pages, 5 figures, 2 tables

  38. Analog vs. Digital Spatial Transforms: A Throughput, Power, and Area Comparison

    Authors: Zephan M. Enciso, Seyed Hadi Mirfarshbafan, Oscar Castañeda, Clemens JS. Schaefer, Christoph Studer, Siddharth Joshi

    Abstract: Spatial linear transforms that process multiple parallel analog signals to simplify downstream signal processing find widespread use in multi-antenna communication systems, machine learning inference, data compression, audio and ultrasound applications, among many others. In the past, a wide range of mixed-signal as well as digital spatial transform circuits have been proposed---it is, however, a…

    Submitted 15 September, 2020; originally announced September 2020.

    Comments: 2020 IEEE 63rd International Midwest Symposium on Circuits and Systems (MWSCAS), Springfield, MA, USA, 2020, pp. 125-128, doi: 10.1109/MWSCAS48704.2020.9184566

  39. arXiv:2005.00447  [pdf, other]

    stat.ML cs.LG eess.IV

    Image fusion using symmetric skip autoencoder via an Adversarial Regulariser

    Authors: Snigdha Bhagat, S. D. Joshi, Brejesh Lall

    Abstract: It is a challenging task to extract the best of both worlds by combining the spatial characteristics of a visible image and the spectral content of an infrared image. In this work, we propose a spatially constrained adversarial autoencoder that extracts deep features from the infrared and visible images to obtain a more exhaustive and global representation. In this paper, we propose a residual aut…

    Submitted 4 June, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

  40. arXiv:2004.06094  [pdf, other]

    cs.ET eess.SP

    A Device Non-Ideality Resilient Approach for Mapping Neural Networks to Crossbar Arrays

    Authors: Arman Kazemi, Cristobal Alessandri, Alan C. Seabaugh, X. Sharon Hu, Michael Niemier, Siddharth Joshi

    Abstract: We propose a technology-independent method, referred to as adjacent connection matrix (ACM), to efficiently map signed weight matrices to non-negative crossbar arrays. When compared to same-hardware-overhead mapping methods, using ACM leads to improvements of up to 20% in training accuracy for ResNet-20 with the CIFAR-10 dataset when training with 5-bit precision crossbar arrays or lower. When com…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: Accepted at DAC'20

  41. arXiv:2004.01929  [pdf, other]

    eess.IV cs.CV

    Empirical Evaluation of PRNU Fingerprint Variation for Mismatched Imaging Pipelines

    Authors: Sharad Joshi, Pawel Korus, Nitin Khanna, Nasir Memon

    Abstract: We assess the variability of PRNU-based camera fingerprints with mismatched imaging pipelines (e.g., different camera ISP or digital darkroom software). We show that camera fingerprints exhibit non-negligible variations in this setup, which may lead to unexpected degradation of detection statistics in real-world use-cases. We tested 13 different pipelines, including standard digital darkroom softw…

    Submitted 9 October, 2020; v1 submitted 4 April, 2020; originally announced April 2020.

    Comments: 6 pages and 3 pages supplemental file

  42. arXiv:2003.12602  [pdf, other]

    cs.CV eess.IV

    Source Printer Identification from Document Images Acquired using Smartphone

    Authors: Sharad Joshi, Suraj Saxena, Nitin Khanna

    Abstract: Vast volumes of printed documents continue to be used for various important as well as trivial applications. Such applications often rely on the information provided in the form of printed text documents whose integrity verification poses a challenge due to time constraints and lack of resources. Source printer identification provides essential information about the origin and integrity of a print…

    Submitted 27 March, 2020; originally announced March 2020.

    Comments: 10 pages

  43. arXiv:2003.02978  [pdf, other]

    eess.IV cs.DC physics.ao-ph stat.AP

    Fast and Accurate Retrieval of Methane Concentration from Imaging Spectrometer Data Using Sparsity Prior

    Authors: Markus D. Foote, Philip E. Dennison, Andrew K. Thorpe, David R. Thompson, Siraput Jongaramrungruang, Christian Frankenberg, Sarang C. Joshi

    Abstract: The strong radiative forcing by atmospheric methane has stimulated interest in identifying natural and anthropogenic sources of this potent greenhouse gas. Point sources are important targets for quantification, and anthropogenic targets have potential for emissions reduction. Methane point source plume detection and concentration retrieval have been previously demonstrated using data from the Air…

    Submitted 5 March, 2020; originally announced March 2020.

    Comments: 13 pages, 11 figures

    Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2020, pp. 1-13

  44. arXiv:2002.02116  [pdf, other]

    math.ST eess.SP

    Quantification of Differential Information using Matrix Pencil

    Authors: Snigdha Bhagat, S. D. Joshi

    Abstract: Any traditional classification problem in general involves modelling individual classes and in turn classification by evaluating the similarity of the test set with the modelled classes. In this paper, we introduce another approach that would find the differential information between two classes rather than modelling individual classes separately. The classes are viewed on a common frame of refere…

    Submitted 6 February, 2020; originally announced February 2020.

  45. arXiv:1910.10769  [pdf, other]

    eess.IV cs.LG physics.med-ph stat.ML

    Learning Multiparametric Biomarkers for Assessing MR-Guided Focused Ultrasound Treatment of Malignant Tumors

    Authors: Blake E. Zimmerman, Sara Johnson, Henrik Odéen, Jill Shea, Markus D. Foote, Nicole Winkler, Sarang C. Joshi, Allison Payne

    Abstract: Noninvasive MR-guided focused ultrasound (MRgFUS) treatments are promising alternatives to the surgical removal of malignant tumors. A significant challenge is assessing the viability of treated tissue during and immediately after MRgFUS procedures. Current clinical assessment uses the nonperfused volume (NPV) biomarker immediately after treatment from contrast-enhanced MRI. The NPV has variable a…

    Submitted 29 September, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: 11 pages, 12 figures

  46. Rank Constrained Diffeomorphic Density Motion Estimation for Respiratory Correlated Computed Tomography

    Authors: Markus D. Foote, Pouya Sabouri, Amit Sawant, Sarang C. Joshi

    Abstract: Motion estimation of organs in a sequence of images is important in numerous medical imaging applications. The focus of this paper is the analysis of 4D Respiratory Correlated Computed Tomography (RCCT) Imaging. It is hypothesized that the quasi-periodic breathing induced motion of organs in the thorax can be represented by deformations spanning a very low dimension subspace of the full infinite d…

    Submitted 25 September, 2019; originally announced September 2019.

    Journal ref: In: MFCA 2017. Lecture Notes in Computer Science, vol 10551. Springer, Cham

  47. arXiv:1710.10227  [pdf, other]

    eess.SP cs.IT math.CT

    Unified Functorial Signal Representation III: Foundations, Redundancy, $L^0$ and $L^2$ functors

    Authors: Salil Samant, Shiv Dutt Joshi

    Abstract: In this paper we propose and lay the foundations of a functorial framework for representing signals. By incorporating an additional category-theoretic relative and generative perspective alongside the classic set-theoretic measure theory, the fundamental concepts of redundancy and compression are formulated in a novel, authentic arrow-theoretic way. The existing classic framework representing a signal as…

    Submitted 27 October, 2017; originally announced October 2017.

    Comments: First draft version