-
Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women
Authors:
Sakshi Joshi,
Eldho Ittan George,
Tahir Javed,
Kaushal Bhogale,
Nikhil Narasimhan,
Mitesh M. Khapra
Abstract:
Digital inclusion remains a challenge for marginalized communities, especially rural women in regions where low-resource languages such as Bhojpuri are spoken. Voice-based access to agricultural services, financial transactions, government schemes, and healthcare is vital for their empowerment, yet existing ASR systems remain largely untested for this group. To address this gap, we create SRUTI, a benchmark of speech from rural Bhojpuri women. Evaluating current ASR models on SRUTI reveals poor performance caused by data scarcity, which is difficult to overcome because social and cultural barriers hinder large-scale data collection. To overcome this, we propose generating synthetic speech from just 25-30 seconds of audio per speaker, collected from approximately 100 rural women. Augmenting existing datasets with this synthetic data improves WER by 4.7, providing a scalable, minimally intrusive way to enhance ASR and promote digital inclusion in low-resource languages.
Submitted 11 June, 2025;
originally announced June 2025.
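The SRUTI result is stated as a change in word error rate (WER). As a reminder of how that metric is usually computed, here is a minimal sketch: the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. It assumes plain whitespace tokenization, which may differ from the text normalization used in the paper, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using word-level Levenshtein distance.

    Tokenization is plain whitespace splitting (an assumption; real
    evaluations usually normalize punctuation and casing first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution in a three-word reference gives a WER of 1/3.
print(round(word_error_rate("the cat sat", "the bat sat"), 4))  # 0.3333
```

WER is often reported as a percentage, and an improvement is the difference between two such scores; the abstract does not state whether its 4.7 figure is absolute or relative.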
-
Tackling Hallucination from Conditional Models for Medical Image Reconstruction with DynamicDPS
Authors:
Seunghoi Kim,
Henry F. J. Tregidgo,
Matteo Figini,
Chen Jin,
Sarang Joshi,
Daniel C. Alexander
Abstract:
Hallucinations are spurious structures not present in the ground truth, posing a critical challenge in medical image reconstruction, especially for data-driven conditional models. We hypothesize that combining an unconditional diffusion model with data consistency, trained on a diverse dataset, can reduce these hallucinations. Based on this, we propose DynamicDPS, a diffusion-based framework that integrates conditional and unconditional diffusion models to enhance low-quality medical images while systematically reducing hallucinations. Our approach first generates an initial reconstruction using a conditional model, then refines it with an adaptive diffusion-based inverse problem solver. DynamicDPS skips the early stages of the reverse process by selecting an optimal starting time point per sample and applies Wolfe's line search for adaptive step sizes, improving both efficiency and image fidelity. Using diffusion priors and data consistency, our method effectively reduces hallucinations in the output of any conditional model. We validate its effectiveness on Image Quality Transfer for low-field MRI enhancement. Extensive evaluations on synthetic and real MR scans, including a downstream task for tissue volume estimation, show that DynamicDPS reduces hallucinations, improving relative volume estimation by over 15% for critical tissues while using only 5% of the sampling steps required by baseline diffusion models. As a model-agnostic and fine-tuning-free approach, DynamicDPS offers a robust solution for hallucination reduction in medical imaging. The code will be made publicly available upon publication.
Submitted 2 March, 2025;
originally announced March 2025.
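The DynamicDPS abstract mentions Wolfe's line search for adaptive step sizes but gives no details. For readers unfamiliar with it, the sketch below shows a generic bracketing/bisection search for a step satisfying the weak Wolfe conditions on a plain differentiable function. How DynamicDPS defines its objective and uses the step inside the reverse diffusion process is not described here and is not reproduced; `c1=1e-4` and `c2=0.9` are conventional defaults, not values from the paper, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def wolfe_line_search(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], List[float]],
    x: Vector,
    d: Vector,
    c1: float = 1e-4,
    c2: float = 0.9,
    alpha: float = 1.0,
    max_iter: int = 50,
) -> float:
    """Bracketing/bisection search for a step size satisfying the weak Wolfe conditions.

    Sufficient decrease: f(x + a*d) <= f(x) + c1 * a * grad(x).d
    Curvature:           grad(x + a*d).d >= c2 * grad(x).d
    """
    f0 = f(x)
    slope0 = _dot(grad(x), d)
    if slope0 >= 0:
        raise ValueError("d must be a descent direction")
    lo, hi = 0.0, float("inf")
    for _ in range(max_iter):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) > f0 + c1 * alpha * slope0:
            hi = alpha  # too long: sufficient decrease violated
        elif _dot(grad(x_new), d) < c2 * slope0:
            lo = alpha  # too short: function still decreasing steeply
        else:
            return alpha
        # Bisect once an upper bound exists, otherwise keep expanding.
        alpha = 0.5 * (lo + hi) if hi < float("inf") else 2.0 * lo
    return alpha  # best effort after max_iter; may not satisfy both conditions


# Quadratic f(x) = x1^2 + x2^2, searched along the steepest-descent direction.
f = lambda v: sum(t * t for t in v)
g = lambda v: [2.0 * t for t in v]
x0 = [1.0, -2.0]
print(wolfe_line_search(f, g, x0, [-t for t in g(x0)]))  # 0.5
```

Bisection keeps the sketch short; production implementations typically use polynomial interpolation inside the bracket and often enforce the strong Wolfe conditions instead.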
-
The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders?
Authors:
Ayo Adedeji,
Mardhiyah Sanni,
Emmanuel Ayodele,
Sarita Joshi,
Tobi Olatunji
Abstract:
The global adoption of Large Language Models (LLMs) in healthcare shows promise for enhancing clinical workflows and improving patient outcomes. However, Automatic Speech Recognition (ASR) errors in critical medical terms remain a significant challenge. These errors can compromise patient care and safety if not detected. This study investigates the prevalence and impact of ASR errors in medical transcription in Nigeria, the United Kingdom, and the United States. By evaluating raw and LLM-corrected transcriptions of accented English from these regions, we assess the potential and limitations of LLMs in addressing challenges related to accents and medical terminology in ASR. Our findings highlight significant disparities in ASR accuracy across regions and identify specific conditions under which LLM corrections are most effective.
Submitted 25 January, 2025;
originally announced January 2025.
-
Uncovering Memorization Effect in the Presence of Spurious Correlations
Authors:
Chenyu You,
Haocheng Dai,
Yifei Min,
Jasjeet S. Sekhon,
Sarang Joshi,
James S. Duncan
Abstract:
Machine learning models often rely on simple spurious features -- patterns in training data that correlate with targets but are not causally related to them, like image backgrounds in foreground classification. This reliance typically leads to imbalanced test performance across minority and majority groups. In this work, we take a closer look at the fundamental cause of such imbalanced performance through the lens of memorization, which refers to a model predicting accurately on atypical examples (minority groups) in the training set but failing to achieve the same accuracy on the test set. This paper systematically shows the ubiquitous existence of spurious features in a small set of neurons within the network, providing the first-ever evidence that memorization may contribute to imbalanced group performance. Three converging sources of experimental evidence show that a small subset of neurons or channels memorizes minority group information. Inspired by these findings, we hypothesize that spurious memorization, concentrated within a small subset of neurons, plays a key role in driving imbalanced group performance. To further substantiate this hypothesis, we show that eliminating these unnecessary spurious memorization patterns during training, via a novel framework, can significantly affect model performance on minority groups. Our experimental results across various architectures and benchmarks offer new insights into how neural networks encode core and spurious knowledge, laying the groundwork for future research on demystifying robustness to spurious correlations.
Submitted 4 June, 2025; v1 submitted 1 January, 2025;
originally announced January 2025.
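The abstract frames the problem as imbalanced performance across majority and minority groups and defines memorization as accurate prediction on atypical training examples that does not carry over to the test set. The sketch below computes per-group accuracy, worst-group accuracy, and that per-group train/test gap. It is a generic evaluation helper with hypothetical names and toy group labels, not the paper's neuron-level analysis or its training framework.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def group_accuracies(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Accuracy computed separately for each group label."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> float:
    """Lowest per-group accuracy; the usual headline metric for group robustness."""
    return min(group_accuracies(y_true, y_pred, groups).values())


def train_test_gap(
    train_acc: Dict[Hashable, float], test_acc: Dict[Hashable, float]
) -> Dict[Hashable, float]:
    """Per-group drop from train to test accuracy; a large drop concentrated on
    minority groups is the memorization symptom the abstract describes."""
    return {g: train_acc[g] - test_acc[g] for g in train_acc if g in test_acc}


# Toy example: perfect training accuracy, but the minority group degrades at test time.
groups = ["maj", "maj", "min", "min"]
train = group_accuracies([1, 1, 0, 0], [1, 1, 0, 0], groups)
test = group_accuracies([1, 1, 0, 0], [1, 1, 0, 1], groups)
print(train_test_gap(train, test))  # {'maj': 0.0, 'min': 0.5}
```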
-
Prognostic Framework for Robotic Manipulators Operating Under Dynamic Task Severities
Authors:
Ayush Mohanty,
Jason Dekarske,
Stephen K. Robinson,
Sanjay Joshi,
Nagi Gebraeel
Abstract:
Robotic manipulators are critical in many applications but are known to degrade over time. This degradation is influenced by the nature of the tasks performed by the robot. Tasks with higher severity, such as handling heavy payloads, can accelerate the degradation process. One way this degradation manifests is as reduced position accuracy of the robot's end-effector. In this paper, we present a prognostic modeling framework that predicts a robotic manipulator's Remaining Useful Life (RUL) while accounting for the effects of task severity. Our framework represents the robot's position accuracy as a Brownian motion process with a random drift parameter that is influenced by task severity. The dynamic nature of task severity is modeled using a continuous-time Markov chain (CTMC). To evaluate RUL, we discuss two approaches -- (1) a novel closed-form expression for the Remaining Lifetime Distribution (RLD), and (2) Monte Carlo simulations, commonly used in the prognostics literature. Theoretical results establish the equivalence between these RUL computation approaches. We validate our framework through experiments using two distinct physics-based simulators for planar and spatial robot fleets. Our findings show that robots in both fleets experience shorter RUL when handling a higher proportion of high-severity tasks.
Submitted 30 November, 2024;
originally announced December 2024.
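The framework models position-accuracy degradation as Brownian motion whose drift depends on task severity, with severity switching according to a CTMC, and estimates RUL either in closed form or by Monte Carlo simulation. The sketch below illustrates only the Monte Carlo route, under simplifying assumptions that are not from the paper: a fixed (not random) drift per severity state, a known generator matrix `q`, failure defined as the degradation signal first reaching a threshold, and time-to-failure from t = 0 standing in for RUL. All function names and parameter values are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def _holding_time(q: Sequence[Sequence[float]], state: int, rng: random.Random) -> float:
    rate = -q[state][state]  # total exit rate of the current severity state
    return rng.expovariate(rate) if rate > 0 else math.inf


def simulate_failure_time(
    drift: Sequence[float],
    q: Sequence[Sequence[float]],
    sigma: float,
    threshold: float,
    state: int = 0,
    dt: float = 0.01,
    t_max: float = 1_000.0,
    rng: Optional[random.Random] = None,
) -> float:
    """One Monte Carlo path: Euler-simulated Brownian motion whose drift depends on a
    CTMC severity state; returns the first time the degradation signal reaches threshold."""
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    next_switch = _holding_time(q, state, rng)
    while t < t_max:
        if t >= next_switch:  # switches are applied on the dt grid (an approximation)
            # Jump with probability proportional to the off-diagonal rates of row `state`.
            weights = [r if j != state else 0.0 for j, r in enumerate(q[state])]
            state = rng.choices(range(len(q)), weights=weights)[0]
            next_switch = t + _holding_time(q, state, rng)
        x += drift[state] * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if x >= threshold:
            return t
    return t_max  # censored: threshold not reached within t_max


def mean_failure_time(n_paths: int, seed: int = 0, **kwargs) -> float:
    """Average simulated time-to-failure over n_paths independent paths."""
    rng = random.Random(seed)
    times: List[float] = [simulate_failure_time(rng=rng, **kwargs) for _ in range(n_paths)]
    return sum(times) / n_paths


# State 0 = low severity, state 1 = high severity (drift doubles in state 1).
q_mostly_low = [[-0.1, 0.1], [0.9, -0.9]]   # long stays in state 0
q_mostly_high = [[-0.9, 0.9], [0.1, -0.1]]  # long stays in state 1
params = dict(drift=[0.1, 0.2], sigma=0.05, threshold=1.0)
print(mean_failure_time(100, q=q_mostly_low, **params))
print(mean_failure_time(100, q=q_mostly_high, **params))
```

With these illustrative numbers the second estimate should come out clearly smaller, mirroring the qualitative finding that a higher share of high-severity tasks shortens RUL.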
-
Simulating Dynamic Tumor Contrast Enhancement in Breast MRI using Conditional Generative Adversarial Networks
Authors:
Richard Osuala,
Smriti Joshi,
Apostolia Tsirikoglou,
Lidia Garrucho,
Walter H. L. Pinaya,
Daniel M. Lang,
Julia A. Schnabel,
Oliver Diaz,
Karim Lekadir
Abstract:
This paper presents a method for virtual contrast enhancement in breast MRI, offering a promising non-invasive alternative to traditional contrast agent-based DCE-MRI acquisition. Using a conditional generative adversarial network, we predict DCE-MRI images, including jointly generated sequences of multiple corresponding DCE-MRI timepoints, from non-contrast-enhanced MRIs, enabling tumor localization and characterization without the associated health risks. Furthermore, we evaluate the synthetic DCE-MRI images qualitatively and quantitatively, propose a multi-metric Scaled Aggregate Measure (SAMe), assess the images' utility in a downstream tumor segmentation task, and conclude with an analysis of the temporal patterns in multi-sequence DCE-MRI generation. Our approach demonstrates promising results in generating realistic and useful DCE-MRI sequences, highlighting the potential of virtual contrast enhancement for improving breast cancer diagnosis and treatment, particularly for patients in whom contrast agent administration is contraindicated.
Submitted 14 May, 2025; v1 submitted 27 September, 2024;
originally announced September 2024.
-
Clean Label Attacks against SLU Systems
Authors:
Henry Li Xinyuan,
Sonal Joshi,
Thomas Thebaud,
Jesus Villalba,
Najim Dehak,
Sanjeev Khudanpur
Abstract:
Poisoning backdoor attacks involve an adversary manipulating the training data so that a trigger inserted into the signal at inference time induces certain behaviors in the victim model. We adapted clean label backdoor (CLBD) data poisoning attacks, which do not modify the training labels, to state-of-the-art speech recognition models that support or perform a Spoken Language Understanding task, achieving a 99.8% attack success rate by poisoning 10% of the training data. We analyzed how the signal strength of the poison, the percentage of samples poisoned, and the choice of trigger affect the attack. We also found that CLBD attacks are most successful when applied to training samples that are inherently hard for a proxy model. Using this strategy, we achieved an attack success rate of 99.3% by poisoning only 1.5% of the training data. Finally, we applied two previously developed defenses against gradient-based attacks and found that they achieve mixed success against poisoning.
Submitted 13 September, 2024;
originally announced September 2024.
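The abstract reports that CLBD attacks work best on training samples that are hard for a proxy model. A minimal sketch of that selection step, assuming "hardness" means per-sample proxy-model loss and that only samples already carrying the attacker's target label are poisoned (the usual clean-label setup, since labels are never changed). The function name and inputs are hypothetical, and adding the trigger to the selected audio is not shown.

```python
from typing import List, Sequence


def select_hard_poison_indices(
    proxy_losses: Sequence[float],
    labels: Sequence[int],
    target_label: int,
    budget_fraction: float,
) -> List[int]:
    """Pick the target-class training samples that a proxy model finds hardest.

    Clean-label poisoning leaves labels untouched, so only samples already labelled
    with the target class are candidates; hardness is taken to be the proxy loss.
    """
    # Poison budget expressed as a share of the whole training set, rounded to whole samples.
    budget = round(budget_fraction * len(labels))
    candidates = [i for i, y in enumerate(labels) if y == target_label]
    candidates.sort(key=lambda i: proxy_losses[i], reverse=True)  # hardest first
    return sorted(candidates[:budget])


losses = [0.1, 0.9, 0.5, 2.0, 3.0, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(select_hard_poison_indices(losses, labels, target_label=1, budget_fraction=0.34))  # [1, 2]
```

Note that samples 3 and 4 have the highest losses overall but are never chosen, because they do not belong to the target class.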
-
FDWST: Fingerphoto Deblurring using Wavelet Style Transfer
Authors:
David Keaton,
Amol S. Joshi,
Jeremy Dawson,
Nasser M. Nasrabadi
Abstract:
The challenge of deblurring fingerphoto images, or generating a sharp fingerphoto from a given blurry one, is a significant problem in computer vision. To address this problem, we propose a fingerphoto deblurring architecture referred to as Fingerphoto Deblurring using Wavelet Style Transfer (FDWST), which leverages the information transfer of Style Transfer techniques to deblur fingerphotos. Additionally, we incorporate the Discrete Wavelet Transform (DWT) for its ability to split images into different frequency bands. By combining these two techniques, we can perform Style Transfer over a wide array of wavelet frequency bands, thereby increasing the quality and variety of sharpness information transferred from sharp to blurry images. Using this technique, our model drastically increases the quality of the generated fingerphotos compared to their originals and achieves a peak matching accuracy of 0.9907 when matching a deblurred fingerphoto to its sharp counterpart, outperforming multiple other state-of-the-art deblurring and style transfer techniques.
Submitted 22 July, 2024;
originally announced July 2024.
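FDWST uses the Discrete Wavelet Transform to split images into frequency bands before style transfer. The abstract does not name the wavelet or the number of decomposition levels, so the sketch below assumes a single-level orthonormal Haar transform in pure Python, only to show what the sub-bands contain; it is not the FDWST architecture, and the function name is hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]


def haar_dwt2(image: Image) -> Tuple[Image, Image, Image, Image]:
    """Single-level orthonormal 2D Haar DWT of an image with even height and width.

    Returns (LL, LH, HL, HH), each half the size of the input in both dimensions:
    LL is the low-frequency approximation; LH responds to horizontal edges,
    HL to vertical edges and HH to diagonal detail (naming conventions vary).
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = image[i][j], image[i][j + 1]          # top-left, top-right
            c, d = image[i + 1][j], image[i + 1][j + 1]  # bottom-left, bottom-right
            row_ll.append((a + b + c + d) / 2)  # local average (scaled)
            row_lh.append((a + b - c - d) / 2)  # top row minus bottom row
            row_hl.append((a - b + c - d) / 2)  # left column minus right column
            row_hh.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


# A vertical edge (dark left, bright right) shows up only in LL and HL.
ll, lh, hl, hh = haar_dwt2([[0.0, 1.0], [0.0, 1.0]])
print(ll, lh, hl, hh)  # [[1.0]] [[0.0]] [[-1.0]] [[0.0]]
```

Because the four basis patterns are orthonormal, the transform preserves total energy, so sharpness information moved between images band by band stays on a comparable scale.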
-
Towards Learning Contrast Kinetics with Multi-Condition Latent Diffusion Models
Authors:
Richard Osuala,
Daniel M. Lang,
Preeti Verma,
Smriti Joshi,
Apostolia Tsirikoglou,
Grzegorz Skorupko,
Kaisar Kushibar,
Lidia Garrucho,
Walter H. L. Pinaya,
Oliver Diaz,
Julia A. Schnabel,
Karim Lekadir
Abstract:
Contrast agents in dynamic contrast-enhanced magnetic resonance imaging make it possible to localize tumors and observe their contrast kinetics, which is essential for cancer characterization and the corresponding treatment decisions. However, contrast agent administration is not only associated with adverse health risks but also restricted during pregnancy and for patients with kidney malfunction or other adverse reactions. Because contrast uptake is a key biomarker for lesion malignancy, cancer recurrence risk, and treatment response, it is pivotal to reduce the dependency on intravenous contrast agent administration. To this end, we propose a multi-conditional latent diffusion model capable of acquisition-time-conditioned image synthesis of DCE-MRI temporal sequences. To evaluate medical image synthesis, we additionally propose and validate the Fréchet radiomics distance as an image quality measure based on biomarker variability between synthetic and real imaging data. Our results demonstrate our method's ability to generate realistic multi-sequence fat-saturated breast DCE-MRI and uncover the emerging potential of deep-learning-based contrast kinetics simulation. We publicly share our accessible codebase at https://github.com/RichardObi/ccnet and provide a user-friendly library for Fréchet radiomics distance calculation at https://pypi.org/project/frd-score.
Submitted 17 July, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
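As its name suggests, the Fréchet radiomics distance compares distributions of radiomics features from real and synthetic images. Under the Gaussian assumption familiar from the Fréchet Inception Distance, the squared distance between N(mu_a, Sigma_a) and N(mu_b, Sigma_b) is ||mu_a - mu_b||^2 + Tr(Sigma_a + Sigma_b - 2 (Sigma_a Sigma_b)^(1/2)). The sketch below additionally assumes diagonal covariances, which reduces the trace term to a sum of squared differences of per-feature standard deviations. It is a simplified illustration with a hypothetical function name, not the API or exact computation of the frd-score package linked above, which should be preferred for real evaluations.

```python
from statistics import mean, pstdev
from typing import Sequence

FeatureRows = Sequence[Sequence[float]]


def frechet_distance_diagonal(real: FeatureRows, synthetic: FeatureRows) -> float:
    """Squared Fréchet distance between two feature sets, each modelled as a
    Gaussian with diagonal covariance (one row per image, one column per feature)."""
    n_features = len(real[0])
    if any(len(row) != n_features for row in list(real) + list(synthetic)):
        raise ValueError("all feature vectors must have the same length")
    total = 0.0
    for k in range(n_features):
        col_real = [row[k] for row in real]
        col_syn = [row[k] for row in synthetic]
        # With diagonal covariance, var_a + var_b - 2*sqrt(var_a*var_b) = (sd_a - sd_b)^2.
        total += (mean(col_real) - mean(col_syn)) ** 2
        total += (pstdev(col_real) - pstdev(col_syn)) ** 2
    return total


# The first feature's mean shifts by 1 while spreads match, so the distance is 1.
print(frechet_distance_diagonal([[0.0, 1.0], [2.0, 3.0]], [[1.0, 1.0], [3.0, 3.0]]))  # 1.0
```

Real radiomics features live on very different scales, so some normalization is usually needed before such a distance is meaningful; how the paper handles this is not stated in the abstract.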
-
Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification
Authors:
Sonal Joshi,
Thomas Thebaud,
Jesús Villalba,
Najim Dehak
Abstract:
Adversarial examples have proven to threaten speaker identification systems, and several countermeasures against them have been proposed. In this paper, we propose a method to detect the presence of adversarial examples, i.e., a binary classifier distinguishing between benign and adversarial examples. We build upon and extend previous work on attack type classification by exploring new architectures. Additionally, we introduce a method for identifying the victim model on which the adversarial attack is carried out. To achieve this, we generate a new dataset containing multiple attacks performed against various victim models. We achieve an AUC of 0.982 for attack detection, with no more than a 0.03 drop in performance for unknown attacks. Our attack classification accuracy (excluding benign) reaches 86.48% across eight attack types using our LightResNet34 architecture, while our victim model classification accuracy reaches 72.28% across four victim models.
Submitted 29 February, 2024;
originally announced February 2024.
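Attack detection is reported as an AUC, the area under the ROC curve of a binary benign-vs-adversarial classifier. A minimal sketch of how that number can be computed from detector scores, using the equivalent pairwise (Mann-Whitney) formulation: the probability that a randomly chosen adversarial example scores higher than a randomly chosen benign one, with ties counted as half. The labelling convention (1 = adversarial, 0 = benign) and the function name are assumptions.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5.

    Quadratic in the number of samples, which is fine for a sketch; rank-based
    implementations scale better.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```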
-
The Sound of Healthcare: Improving Medical Transcription ASR Accuracy with Large Language Models
Authors:
Ayo Adedeji,
Sarita Joshi,
Brendan Doohan
Abstract:
In the rapidly evolving landscape of medical documentation, transcribing clinical dialogues accurately is increasingly important. This study explores the potential of Large Language Models (LLMs) to enhance the accuracy of Automatic Speech Recognition (ASR) systems in medical transcription. Utilizing the PriMock57 dataset, which encompasses a diverse range of primary care consultations, we apply advanced LLMs to refine ASR-generated transcripts. Our research is multifaceted, focusing on improvements in general Word Error Rate (WER), Medical Concept WER (MC-WER) for the accurate transcription of essential medical terms, and speaker diarization accuracy. Additionally, we assess the role of LLM post-processing in improving semantic textual similarity, thereby preserving the contextual integrity of clinical dialogues. Through a series of experiments, we compare the efficacy of zero-shot and Chain-of-Thought (CoT) prompting techniques in enhancing diarization and correction accuracy. Our findings demonstrate that LLMs, particularly through CoT prompting, not only improve the diarization accuracy of existing ASR systems but also achieve state-of-the-art performance in this domain. This improvement extends to more accurately capturing medical concepts and enhancing the overall semantic coherence of the transcribed dialogues. These findings illustrate the dual role of LLMs in augmenting ASR outputs and independently excelling in transcription tasks, holding significant promise for transforming medical ASR systems and leading to more accurate and reliable patient records in healthcare settings.
Submitted 12 February, 2024;
originally announced February 2024.
-
Pre- to Post-Contrast Breast MRI Synthesis for Enhanced Tumour Segmentation
Authors:
Richard Osuala,
Smriti Joshi,
Apostolia Tsirikoglou,
Lidia Garrucho,
Walter H. L. Pinaya,
Oliver Diaz,
Karim Lekadir
Abstract:
Despite its benefits for tumour detection and treatment, the administration of contrast agents in dynamic contrast-enhanced MRI (DCE-MRI) is associated with a range of issues, including invasiveness, bioaccumulation, and a risk of nephrogenic systemic fibrosis. This study explores the feasibility of producing synthetic contrast enhancements by using a generative adversarial network (GAN) to translate pre-contrast T1-weighted fat-saturated breast MRI into the corresponding first DCE-MRI sequence. Additionally, we introduce a Scaled Aggregate Measure (SAMe) designed for quantitatively evaluating the quality of synthetic data in a principled manner and serving as a basis for selecting the optimal generative model. We assess the generated DCE-MRI data using quantitative image quality metrics and apply them to the downstream task of 3D breast tumour segmentation. Our results highlight the potential of post-contrast DCE-MRI synthesis in enhancing the robustness of breast tumour segmentation models via data augmentation. Our code is available at https://github.com/RichardObi/pre_post_synthesis.
Submitted 31 May, 2024; v1 submitted 17 November, 2023;
originally announced November 2023.
-
Svarah: Evaluating English ASR Systems on Indian Accents
Authors:
Tahir Javed,
Sakshi Joshi,
Vignesh Nagarajan,
Sai Sundaresan,
Janki Nawale,
Abhigyan Raman,
Kaushal Bhogale,
Pratyush Kumar,
Mitesh M. Khapra
Abstract:
India is the second largest English-speaking country in the world, with a speaker base of roughly 130 million. Thus, it is imperative that automatic speech recognition (ASR) systems for English be evaluated on Indian accents. Unfortunately, Indian speakers are poorly represented in existing English ASR benchmarks such as LibriSpeech, Switchboard, and the Speech Accent Archive. In this work, we address this gap by creating Svarah, a benchmark that contains 9.6 hours of transcribed English audio from 117 speakers across 65 geographic locations throughout India, resulting in a diverse range of accents. Svarah comprises both read speech and spontaneous conversational data covering various domains, such as history, culture, and tourism, ensuring a diverse vocabulary. We evaluate 6 open source ASR models and 2 commercial ASR systems on Svarah and show that there is clear scope for improvement on Indian accents. Svarah as well as all our code will be publicly available.
Submitted 25 May, 2023;
originally announced May 2023.
-
Brain Tumor Detection using Swin Transformers
Authors:
Prateek A. Meshram,
Suraj Joshi,
Devarshi Mahajan
Abstract:
The first MRI scan was performed in 1978 by researchers at EML Laboratories. According to one estimate, approximately 251,329 people died of primary cancerous brain and CNS (Central Nervous System) tumors in 2020. Various medical professionals have noted that detecting brain tumors at an early stage would help save many lives. When radiologists examine a brain MRI, they try to diagnose the tumor's histological subtype, which is quite subjective; this is the major issue. Moreover, in developing countries like India, where there is 1 doctor for every 1151 people, efficient diagnostic support for radiologists and doctors is needed. In our approach, we use Swin Transformers and deep learning to detect, classify, and locate the tumor in a given MRI scan and to estimate its size, helping doctors and radiologists work more efficiently. Finally, medics can download the predictions and measurements as a PDF (Portable Document Format). Keywords: brain tumor, transformers, classification, medical, deep learning, detection
Submitted 10 May, 2023;
originally announced May 2023.
-
Neural Operator Learning for Ultrasound Tomography Inversion
Authors:
Haocheng Dai,
Michael Penwarden,
Robert M. Kirby,
Sarang Joshi
Abstract:
Neural operator learning, as a means of mapping between complex function spaces, has garnered significant attention in the field of computational science and engineering (CS&E). In this paper, we apply neural operator learning to the time-of-flight ultrasound computed tomography (USCT) problem. We learn the mapping between time-of-flight (TOF) data and the heterogeneous sound speed field, using a full-wave solver to generate the training data. This novel application of operator learning circumvents the need to solve the computationally intensive iterative inverse problem. The operator learns the non-linear mapping offline and predicts the heterogeneous sound speed field with a single forward pass through the model. This is the first time operator learning has been used for ultrasound tomography and is the first step toward potential real-time prediction of soft tissue distribution for tumor identification in breast imaging.
Submitted 28 May, 2023; v1 submitted 6 April, 2023;
originally announced April 2023.
-
Scaling up Superconducting Quantum Computers with Cryogenic RF-photonics
Authors:
Sanskriti Joshi,
Sajjad Moazeni
Abstract:
Today's hundred-qubit quantum computers must scale up dramatically, to millions of qubits, to become practical for solving real-world problems. Although a variety of qubit technologies have been demonstrated, scalability remains a major hurdle. Superconducting (SC) qubits are one of the most mature and promising technologies to overcome this challenge. However, these qubits reside in a millikelvin cryogenic dilution fridge that isolates them from thermal and electrical noise, and they are controlled by a rack of external electronics through extremely complex wiring and cables. Although thousands of qubits can be fabricated on a single chip and cooled to millikelvin temperatures, scaling up the control and readout electronics remains an elusive goal. This is mainly because the limited cooling power available in cryogenic systems constrains the wiring capacity and the management of the cabling heat load.
In this paper, we focus on scaling up the number of XY-control lines by using cryogenic RF-photonic links. Scaling the XY-control lines is one of the major roadblocks to building a thousand-qubit superconducting QC. We will first review and study the challenges of state-of-the-art proposed approaches to scaling up the control interface for SC qubit systems, including cryogenic CMOS and deep-cryogenic photonic methods. We will discuss in detail their limitations due to active power dissipation and passive heat leakage. By analytically modeling the noise sources and thermal budget limits, we will show that our solution can scale up to a thousand qubits. Our proposed method can be seamlessly implemented using advanced silicon photonic processes, and the number of required optical fibers can be further reduced by using wavelength division multiplexing (WDM).
Submitted 27 October, 2022;
originally announced October 2022.
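The closing point of the abstract, that WDM reduces the number of optical fibers, comes down to simple arithmetic if each XY-control line is assumed to occupy one wavelength channel: the fiber count is the number of lines divided by the channels per fiber, rounded up. The one-line-per-wavelength mapping, the function name, and the channel counts below are illustrative assumptions, not figures from the paper.

```python
def fibers_needed(control_lines: int, wavelengths_per_fiber: int) -> int:
    """Optical fibers required when each control line uses its own wavelength channel."""
    if control_lines < 0 or wavelengths_per_fiber < 1:
        raise ValueError("need control_lines >= 0 and wavelengths_per_fiber >= 1")
    return -(-control_lines // wavelengths_per_fiber)  # integer ceiling division


# Hypothetical example: 1000 XY-control lines without WDM vs. 16 wavelengths per fiber.
print(fibers_needed(1000, 1), fibers_needed(1000, 16))  # 1000 63
```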
-
medigan: a Python library of pretrained generative models for medical image synthesis
Authors:
Richard Osuala,
Grzegorz Skorupko,
Noussair Lazrak,
Lidia Garrucho,
Eloy García,
Smriti Joshi,
Socayna Jouide,
Michael Rutherford,
Fred Prior,
Kaisar Kushibar,
Oliver Diaz,
Karim Lekadir
Abstract:
Synthetic data generated by generative models can enhance the performance and capabilities of data-hungry deep learning models in medical imaging. However, (1) the availability of (synthetic) datasets is limited and (2) generative models are complex to train, which hinders their adoption in research and clinical applications. To lower this entry barrier, we propose medigan, a one-stop shop for pretrained generative models implemented as an open-source framework-agnostic Python library. medigan allows researchers and developers to create, increase, and domain-adapt their training data in just a few lines of code. Guided by design decisions based on gathered end-user requirements, we implement medigan based on modular components for generative model (i) execution, (ii) visualisation, (iii) search & ranking, and (iv) contribution. The library's scalability and design are demonstrated by its growing number of integrated and readily usable pretrained generative models, consisting of 21 models utilising 9 different Generative Adversarial Network architectures trained on 11 datasets from 4 domains, namely, mammography, endoscopy, x-ray, and MRI. Furthermore, 3 applications of medigan are analysed in this work, which include (a) enabling community-wide sharing of restricted data, (b) investigating generative model evaluation metrics, and (c) improving clinical downstream tasks. In (b), extending common medical image synthesis assessment and reporting standards, we show Fréchet Inception Distance variability based on image normalisation and radiology-specific feature extraction.
Submitted 23 February, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.
-
StreamNet: A WAE for White Matter Streamline Analysis
Authors:
Andrew Lizarraga,
Katherine L. Narr,
Kirsten A. Donald,
Shantanu H. Joshi
Abstract:
We present StreamNet, an autoencoder architecture for the analysis of the highly heterogeneous geometry of large collections of white matter streamlines. This proposed framework takes advantage of geometry-preserving properties of the Wasserstein-1 metric in order to achieve direct encoding and reconstruction of entire bundles of streamlines. We show that the model not only accurately captures the distributive structures of streamlines in the population, but also achieves superior reconstruction performance between real and synthetic streamlines. Experimental model performance is evaluated on white matter streamlines resulting from T1-weighted diffusion imaging of 40 healthy controls using a recent state-of-the-art bundle comparison metric that measures fiber-shape similarities.
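The Wasserstein-1 (earth mover's) distance underlying StreamNet is easiest to see in one dimension: for two empirical samples of equal size with uniform weights, the optimal transport plan pairs the sorted values, so W1 is the mean absolute difference of the order statistics. This is a minimal sketch of that 1D case only; it is not StreamNet's loss on bundles of 3D streamlines.

```python
import numpy as np

def wasserstein1_1d(a, b):
    """W1 distance between two equal-size 1D empirical samples (uniform weights).

    In 1D the optimal coupling matches sorted samples, so W1 is the mean
    absolute difference of the order statistics.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(a - b)))

# Shifting every point by a constant c moves the distribution by exactly |c|.
x = np.array([0.0, 1.0, 3.0])
print(wasserstein1_1d(x, x + 2.0))
```

Sorting makes the computation O(n log n); in higher dimensions no such closed form exists, which is why learned or approximate transport formulations are used there.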
Submitted 19 October, 2022; v1 submitted 3 September, 2022;
originally announced September 2022.
-
A Python-based Mixed Discrete-Continuous Simulation Framework for Digital Twins
Authors:
Neha Karanjkar,
Subodh M. Joshi
Abstract:
The use of Digital Twins is set to transform the manufacturing sector by aiding monitoring and real-time decision making. For several applications in this sector, the system to be modeled consists of a mix of discrete-event and continuous processes interacting with each other. Building simulation-based Digital Twins of such systems necessitates an open, flexible simulation framework that can support easy modeling and fast simulation of both continuous and discrete-event components, and their interactions. In this paper, we present an outline and key design aspects of a Python-based framework for performing mixed discrete-continuous simulations. The continuous processes in the system are assumed to be loosely coupled to other components via pre-defined events. For example, a continuous state variable crossing a threshold may trigger an external event. Similarly, external events may lead to a sudden change in the trajectory, state value, or boundary conditions of a continuous process. We first present a systematic events-based interface through which such interactions can be modeled and simulated. We then discuss implementation details of the framework along with a detailed example. In our implementation, the advancement of time is controlled and performed using the event-stepped engine of SimPy (a popular discrete-event simulation library in Python). The continuous processes are modeled using existing frameworks, with a Python wrapper providing the events interface. We discuss possible improvements to the time advancement scheme, a roadmap, and use cases for the framework.
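The two interactions described above (a threshold crossing that triggers an event, and an external event that changes a continuous trajectory) can be illustrated with a small event loop. This is a hedged sketch, not the authors' framework and not SimPy: it uses a hand-rolled `heapq` event queue and a hypothetical tank whose level rises linearly, so the crossing time can be computed exactly and rescheduled whenever an external event changes the slope.

```python
import heapq
import itertools

def simulate(threshold=10.0, until=20.0):
    """Event-stepped loop with one continuous state (a tank level with slope `rate`)."""
    level, rate, t_last = 0.0, 1.0, 0.0    # continuous state, its slope, time of last update
    generation = 0                         # bumped whenever a crossing prediction goes stale
    seq = itertools.count()                # tie-breaker for simultaneous events
    queue, log = [], []

    def push(t, kind, gen=None):
        heapq.heappush(queue, (t, next(seq), kind, gen))

    def schedule_crossing():
        # Linear trajectory: solve level + rate * (t - t_last) == threshold for t.
        if rate > 0 and level < threshold:
            push(t_last + (threshold - level) / rate, "threshold", generation)

    push(4.0, "rate_change")               # hypothetical external discrete event at t = 4
    schedule_crossing()

    while queue:
        t, _, kind, gen = heapq.heappop(queue)
        if t > until:
            break
        level += rate * (t - t_last)       # advance the continuous state exactly to t
        t_last = t
        if kind == "rate_change":
            rate *= 2.0                    # external event changes the trajectory
            generation += 1                # the earlier crossing prediction is now invalid
            schedule_crossing()
            log.append((t, kind, level))
        elif kind == "threshold" and gen == generation:
            rate = 0.0                     # crossing triggers an event, e.g. a valve closes
            log.append((t, kind, level))
    return log

print(simulate())
```

Invalidating stale predictions with a generation counter is one simple design choice; continuous processes with nonlinear dynamics would need crossing times found numerically rather than in closed form.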
Submitted 31 July, 2022;
originally announced August 2022.
-
Human Emotion Classification based on EEG Signals Using Recurrent Neural Network And KNN
Authors:
Shashank Joshi,
Falak Joshi
Abstract:
Emotion plays a crucial role in human interaction. Words, voice intonation, facial expressions, and kinesics can all convey one's feelings. However, brain-computer interface (BCI) devices have not yet reached the level required for emotion interpretation. With the rapid development of machine learning algorithms, dry electrode techniques, and real-world applications of brain-computer interfaces for healthy individuals, emotion classification from EEG data has recently received considerable attention. Electroencephalogram (EEG) signals are a critical resource for these systems. The primary benefit of employing EEG signals is that they reflect genuine emotion and are readily processed by computer systems. In this work, EEG signals associated with positive, neutral, and negative emotions were identified using channel-selection preprocessing. However, until now researchers have had only a limited understanding of the specific links between various emotional states. To classify EEG signals, we used the discrete wavelet transform together with machine learning techniques, namely a recurrent neural network (RNN) and the k-nearest neighbor (kNN) algorithm. The classifiers were first used for channel selection. The final feature vectors were then created by integrating the features of EEG segments from the selected channels. Using the RNN and kNN algorithms, the final feature vectors with associated positive, neutral, and negative emotions were classified independently. The classification performance of both techniques is computed and compared. The average overall accuracies were 94.844% with RNN and 93.438% with kNN.
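The wavelet-features-plus-kNN part of this pipeline can be sketched as follows. Assumptions not taken from the paper: a Haar wavelet, three decomposition levels, mean sub-band energy as the feature, k = 3, and two hypothetical synthetic classes ("slow" and "fast") standing in for emotion labels; channel selection and the RNN are not shown.

```python
import numpy as np
from collections import Counter

def haar_dwt(x, levels=3):
    """Multi-level Haar DWT. Returns [detail_1, ..., detail_L, approx_L]."""
    a = np.asarray(x, dtype=float)
    coeffs = []
    for _ in range(levels):
        if len(a) % 2:                              # pad odd lengths by repeating the last sample
            a = np.append(a, a[-1])
        even, odd = a[0::2], a[1::2]
        coeffs.append((even - odd) / np.sqrt(2))    # detail (high-pass) band
        a = (even + odd) / np.sqrt(2)               # approximation (low-pass) band
    coeffs.append(a)
    return coeffs

def band_energies(x, levels=3):
    """Feature vector: mean squared coefficient in each wavelet sub-band."""
    return np.array([np.mean(c ** 2) for c in haar_dwt(x, levels)])

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training vectors closest in Euclidean distance."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Toy usage with two hypothetical classes of synthetic 64-sample segments.
n = np.arange(64)
segments = [np.sin(2 * np.pi * n / 64 + p) for p in (0.0, 0.5, 1.0)] \
         + [a * np.cos(np.pi * n) for a in (0.9, 1.0, 1.1)]
labels = ["slow"] * 3 + ["fast"] * 3
train_X = np.stack([band_energies(s) for s in segments])
print(knn_predict(train_X, labels, band_energies(0.95 * np.cos(np.pi * n))))
```

Because the Haar transform is orthonormal, the sub-band coefficients preserve the signal's total energy; the features simply record how that energy is distributed across frequency bands.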
Submitted 10 May, 2022;
originally announced May 2022.
-
Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser
Authors:
Sonal Joshi,
Saurabh Kataria,
Yiwen Shao,
Piotr Zelasko,
Jesus Villalba,
Sanjeev Khudanpur,
Najim Dehak
Abstract:
Adversarial attacks are a threat to automatic speech recognition (ASR) systems, making it imperative to develop defenses against them. In this paper, we perform experiments showing that the K2 conformer hybrid ASR is strongly affected by white-box adversarial attacks. We propose three defenses: a denoiser pre-processor, adversarially fine-tuning the ASR model, and adversarially fine-tuning a joint model of ASR and denoiser. Our evaluation shows that the denoiser pre-processor (trained on offline adversarial examples) fails to defend against adaptive white-box attacks. However, adversarially fine-tuning the denoiser using a tandem model of denoiser and ASR offers more robustness. We evaluate two variants of this defense: one updating the parameters of both models and the other keeping the ASR frozen. The joint model offers a mean absolute decrease of 19.3% ground truth (GT) WER with reference to baseline against fast gradient sign method (FGSM) attacks with different L∞ norms. The joint model with frozen ASR parameters gives the best defense against projected gradient descent (PGD) with 7 iterations, yielding a mean absolute increase of 22.3% GT WER with reference to baseline; and against PGD with 500 iterations, yielding a mean absolute decrease of 45.08% GT WER and an increase of 68.05% adversarial target WER.
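FGSM and PGD both step along the sign of the loss gradient with respect to the input; PGD repeats the step and projects back onto an L∞ ball of radius ε around the clean input. The sketch below uses a hypothetical logistic-regression "model" so the input gradient is analytic; it is not the K2 conformer ASR attacked in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_grad(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input x for p = sigmoid(w.x + b)."""
    return (sigmoid(w @ x + b) - y) * w

def fgsm(x, y, w, b, eps):
    """Single L-inf step of size eps along the sign of the loss gradient."""
    return x + eps * np.sign(input_grad(x, y, w, b))

def pgd(x, y, w, b, eps, alpha, steps):
    """Iterated signed-gradient steps, each projected back onto the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(input_grad(x_adv, y, w, b))
        x_adv = np.clip(x_adv, x - eps, x + eps)   # projection onto the L-inf ball
    return x_adv

w, b = np.array([1.0, -2.0]), 0.0
x, y = np.array([0.5, -0.5]), 1
print(fgsm(x, y, w, b, eps=0.1), pgd(x, y, w, b, eps=0.1, alpha=0.03, steps=10))
```

For a linear model the gradient sign never changes, so PGD converges to the FGSM point; on deep models the repeated steps usually find stronger perturbations, which is why PGD with more iterations is the harder attack.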
Submitted 8 April, 2022;
originally announced April 2022.
-
AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification
Authors:
Sonal Joshi,
Saurabh Kataria,
Jesus Villalba,
Najim Dehak
Abstract:
Adversarial attacks pose a severe security threat to state-of-the-art speaker identification systems, making it vital to develop countermeasures against them. Building on our previous work that used representation learning to classify and detect adversarial attacks, we propose an improvement based on AdvEst, a method to estimate adversarial perturbations. First, we show that training the representation learning network on adversarial perturbations, as opposed to adversarial examples (the combination of clean signal and adversarial perturbation), is beneficial because it eliminates nuisance information. At inference time, we use a time-domain denoiser to estimate the adversarial perturbations from adversarial examples. Using our improved representation learning approach to obtain attack embeddings (signatures), we evaluate their performance on three applications: known attack classification, attack verification, and unknown attack detection. We show that common attacks in the literature (Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), Carlini-Wagner (CW) with different Lp threat models) can be classified with an accuracy of ~96%. We also detect unknown attacks with an equal error rate (EER) of ~9%, an absolute improvement of ~12% over our previous work.
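The equal error rate (EER) is the operating point where the false-rejection and false-acceptance rates coincide. A minimal threshold-sweep sketch, assuming higher scores mean "attack detected" (or "same attack"); toolkits commonly interpolate along the ROC curve instead of picking the closest observed threshold.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping every observed score as a decision threshold."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, None
    for thr in np.unique(np.concatenate([tgt, non])):
        frr = np.mean(tgt < thr)      # targets wrongly rejected
        far = np.mean(non >= thr)     # non-targets wrongly accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer

print(equal_error_rate([0.6, 0.9, 0.3, 0.8], [0.1, 0.4, 0.7, 0.2]))
```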
Submitted 8 April, 2022;
originally announced April 2022.
-
Modeling the Shape of the Brain Connectome via Deep Neural Networks
Authors:
Haocheng Dai,
Martin Bauer,
P. Thomas Fletcher,
Sarang Joshi
Abstract:
The goal of diffusion-weighted magnetic resonance imaging (DWI) is to infer the structural connectivity of an individual subject's brain in vivo. To statistically study the variability and differences between normal and abnormal brain connectomes, a mathematical model of the neural connections is required. In this paper, we represent the brain connectome as a Riemannian manifold, which allows us to model neural connections as geodesics. This leads to the challenging problem of estimating a Riemannian metric that is compatible with the DWI data, i.e., a metric such that the geodesic curves represent individual fiber tracts of the connectome. We reduce this problem to that of solving a highly nonlinear set of partial differential equations (PDEs) and study the applicability of convolutional encoder-decoder neural networks (CEDNNs) for solving this geometrically motivated PDE. Our method achieves excellent performance in the alignment of geodesics with white matter pathways and tackles a long-standing issue in previous geodesic tractography methods: the inability to recover crossing fibers with high fidelity.
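For reference, modeling fiber tracts as geodesics means each tract γ(t) satisfies the standard geodesic equation of the metric g, with Christoffel symbols computed from g (summation over repeated indices). This is textbook Riemannian geometry, included only to make the modeling assumption concrete; the specific PDE the paper solves for g is not reproduced here.

```latex
\ddot{\gamma}^{k}(t) + \Gamma^{k}_{ij}\big(\gamma(t)\big)\,\dot{\gamma}^{i}(t)\,\dot{\gamma}^{j}(t) = 0,
\qquad
\Gamma^{k}_{ij} = \tfrac{1}{2}\, g^{kl}\left(\partial_{i} g_{jl} + \partial_{j} g_{il} - \partial_{l} g_{ij}\right)
```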
Submitted 3 March, 2023; v1 submitted 6 March, 2022;
originally announced March 2022.
-
CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation
Authors:
Reuben Dorent,
Aaron Kujawa,
Marina Ivory,
Spyridon Bakas,
Nicola Rieke,
Samuel Joutard,
Ben Glocker,
Jorge Cardoso,
Marc Modat,
Kayhan Batmanghelich,
Arseniy Belkov,
Maria Baldeon Calisto,
Jae Won Choi,
Benoit M. Dawant,
Hexin Dong,
Sergio Escalera,
Yubo Fan,
Lasse Hansen,
Mattias P. Heinrich,
Smriti Joshi,
Victoriya Kashtanova,
Hyeon Gyu Kim,
Satoshi Kondo,
Christian N. Kruse,
Susana K. Lai-Yuen
, et al. (15 additional authors not shown)
Abstract:
Domain Adaptation (DA) has recently attracted strong interest in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance of patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 scans (N=105) and unpaired, non-annotated hrT2 scans (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on the hrT2 scans provided in the testing set (N=137). A total of 16 teams submitted their algorithms for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice: VS 88.4%, cochleas 85.7%) and close to full supervision (median Dice: VS 92.5%, cochleas 87.7%). All top-performing methods used an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source images.
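The Dice scores above compare a predicted mask P with a reference mask T as 2|P∩T| / (|P| + |T|), and the challenge summarises per-structure scores by their median over cases. A minimal NumPy sketch, assuming binary masks; returning 1.0 when both masks are empty is a convention chosen here, not necessarily the challenge's rule.

```python
import numpy as np

def dice(pred, ref):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0                      # both masks empty: treated as perfect agreement
    return 2.0 * np.logical_and(pred, ref).sum() / denom

# Hypothetical per-case scores for one structure, summarised by their median.
case_scores = [dice([1, 1, 0, 0], [1, 0, 1, 0]), dice([1, 1], [1, 1]), dice([1, 0], [0, 1])]
print(np.median(case_scores))
```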
Submitted 14 December, 2022; v1 submitted 8 January, 2022;
originally announced January 2022.
-
Latency-Aware Multi-antenna SWIPT System with Battery-Constrained Receivers
Authors:
Dileep Kumar,
Onel L. Alcaraz López,
Satya Krishna Joshi,
Antti Tölli
Abstract:
Power splitting (PS) based simultaneous wireless information and power transfer (SWIPT) is considered in a multi-user multiple-input single-output broadcast scenario. Specifically, we focus on jointly configuring the transmit beamforming vectors and receive PS ratios to minimize the total transmit energy of the base station under user-specific latency and energy harvesting (EH) requirements. Battery depletion is avoided by preemptively incorporating information about the receivers' battery state and EH fluctuations into the resource allocation design. The resulting time-average sum-power minimization problem is temporally correlated, non-convex (including mutually coupled latency-battery queue dynamics), and in general intractable. We use the Lyapunov optimization framework and derive a dynamic control algorithm to transform the original problem into a sequence of deterministic and independent subproblems, which are then solved via two alternative approaches: i) semidefinite relaxation combined with fractional programming, and ii) successive convex approximation. Furthermore, we design a low-complexity closed-form iterative algorithm exploiting the Karush-Kuhn-Tucker optimality conditions for a specific scenario with delay-bounded batteryless receivers. Numerical results provide insights into the robustness of the proposed design in realizing an energy-efficient SWIPT system while ensuring latency and EH requirements in a time-dynamic mobile access network.
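The Lyapunov step turns a time-average objective into per-slot subproblems by tracking a queue and minimizing a drift-plus-penalty expression each slot. The scalar sketch below is a generic illustration under made-up assumptions (one user, a discrete power set, service rate log2(1 + g·p), trade-off weight V); it is not the paper's beamforming or power-splitting algorithm.

```python
import numpy as np

def drift_plus_penalty(arrivals, gains, powers, V):
    """Greedy per-slot control for one data queue.

    Each slot picks the power p minimising V*p - Q*rate(p), then applies the
    queue update Q(t+1) = max(Q(t) - rate, 0) + arrival.
    """
    powers = np.asarray(powers, dtype=float)
    Q, used, backlog = 0.0, [], []
    for a, g in zip(arrivals, gains):
        rates = np.log2(1.0 + g * powers)           # service offered by each candidate power
        k = int(np.argmin(V * powers - Q * rates))  # drift-plus-penalty minimiser
        used.append(powers[k])
        Q = max(Q - rates[k], 0.0) + a
        backlog.append(Q)
    return np.array(used), np.array(backlog)

# A larger V trades a longer backlog (latency) for lower transmit power.
for V in (1.0, 10.0):
    used, backlog = drift_plus_penalty([1.0] * 20, [1.0] * 20, [0, 1, 3, 7], V)
    print(V, used.mean(), backlog[-1])
```

The weight V is the knob behind the energy-latency trade-off: the queue term pushes for service only once the backlog is large enough to outweigh the power penalty.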
Submitted 22 October, 2022; v1 submitted 10 December, 2021;
originally announced December 2021.
-
ML-Based Analysis to Identify Speech Features Relevant in Predicting Alzheimer's Disease
Authors:
Yash Kumar,
Piyush Maheshwari,
Shreyansh Joshi,
Veeky Baths
Abstract:
Alzheimer's disease (AD) is a neurodegenerative disease that affects nearly 50 million individuals across the globe and is one of the leading causes of death worldwide. By 2050, the number of people affected by the disease is projected to more than double. Consequently, advances in technology raise the question of whether technology can be used to predict Alzheimer's for better and earlier diagnosis. In this paper, we focus on this problem. Specifically, we trained both ML models and neural networks to predict and classify participants based on their speech patterns. We computed a number of linguistic variables using DementiaBank's Pitt Corpus, a database of transcripts of interviews with subjects suffering from multiple neurodegenerative diseases. We then trained both binary and multiclass classifiers to distinguish AD from normal aging and other neurodegenerative diseases. We also sought to identify specific speech factors that can help determine the onset of AD. Confusion matrices and feature importance graphs were plotted for each model to compare their performance. In both multiclass and binary classification, neural networks outperformed the other models, with testing accuracies of 76.44% and 92.05%, respectively. For feature importance, '%_PRESP' (present participle) and '%_3S' (3rd person present tense markers) were two of the most important speech features for our classifiers in predicting AD.
Submitted 25 October, 2021;
originally announced October 2021.
-
Deep Neural Networks on EEG Signals to Predict Auditory Attention Score Using Gramian Angular Difference Field
Authors:
Mahak Kothari,
Shreyansh Joshi,
Adarsh Nandanwar,
Aadetya Jaiswal,
Veeky Baths
Abstract:
Auditory attention is a selective type of hearing in which people intentionally focus their attention on a specific source of sound or spoken words while ignoring or inhibiting other auditory stimuli. In some sense, an individual's auditory attention score reflects the focus the person can maintain in auditory tasks. Recent advances in deep learning and in non-invasive technologies for recording neural activity raise the question of whether deep learning, combined with technologies such as electroencephalography (EEG), can predict an individual's auditory attention score. In this paper, we focus on this problem of estimating a person's auditory attention level from their brain's electrical activity, captured as 14-channel EEG signals. More specifically, we treat attention estimation as a regression problem. The work has been performed on the publicly available Phyaat dataset. The Gramian Angular Difference Field (GADF) has been used to convert time-series EEG data into an image with 14 channels, enabling us to train various deep learning models such as 2D CNNs, 3D CNNs, and convolutional autoencoders. Their performances have been compared with one another as well as with previous work. Among the models we tried, the 2D CNN performed best. It outperformed the existing methods by a margin of 0.22 mean absolute error (MAE).
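The Gramian Angular Difference Field maps a 1D series to a 2D image: rescale the series to [−1, 1], take φ = arccos(x), and set GADF[i, j] = sin(φᵢ − φⱼ). A NumPy sketch, assuming per-channel min-max rescaling and one GADF image per EEG channel stacked into a 14-channel array; the paper's exact preprocessing and segment length may differ, and the input below is random placeholder data.

```python
import numpy as np

def gadf(series):
    """Gramian Angular Difference Field of a 1D series -> (T, T) array."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    # Min-max rescale to [-1, 1]; a constant series maps to all zeros.
    x = 2.0 * (x - lo) / (hi - lo) - 1.0 if hi > lo else np.zeros_like(x)
    phi = np.arccos(np.clip(x, -1.0, 1.0))   # clip guards against rounding outside [-1, 1]
    return np.sin(phi[:, None] - phi[None, :])

def eeg_to_gadf_image(eeg):
    """(channels, T) EEG segment -> (channels, T, T) multi-channel image."""
    return np.stack([gadf(channel) for channel in eeg])

segment = np.random.default_rng(0).normal(size=(14, 32))   # placeholder 14-channel segment
print(eeg_to_gadf_image(segment).shape)
```

Since sin(φᵢ − φⱼ) = −sin(φⱼ − φᵢ), each GADF image is antisymmetric with a zero diagonal, and the stacked result can be fed to a 2D CNN as a 14-channel input.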
Submitted 24 October, 2021;
originally announced October 2021.
-
Source Printer Identification using Printer Specific Pooling of Letter Descriptors
Authors:
Sharad Joshi,
Yogesh Kumar Gupta,
Nitin Khanna
Abstract:
The digital revolution has replaced the use of printed documents with their digital counterparts. However, many applications require the use of both due to several factors, including challenges of digital security, installation costs, ease of use, and lack of digital expertise. Technological developments in the digital domain have also made high-quality scanners, printers, and image editing software easily available at lower prices. Miscreants leverage such technology to create forged documents that may go undetected in vast volumes of printed documents. These developments motivate research into fast and accurate digital systems for source printer identification of printed documents. We extensively analyze and propose a printer-specific pooling that improves the performance of printer-specific local texture descriptors on two datasets. The proposed pooling performs well using a simple correlation-based prediction instead of a complex machine learning-based classifier, achieving improved performance under cross-font scenarios. The proposed system achieves average classification accuracies of 93.5%, 94.3%, and 60.3% on documents printed in Arial, Times New Roman, and Comic Sans font types, respectively, when only documents printed in Cambria font are available for training.
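A correlation-based prediction can be as simple as assigning a query document to the printer whose reference feature vector it correlates with most. This is a hedged sketch: the per-printer templates (for example, averaged training descriptors) and printer names are hypothetical, and the paper's printer-specific pooling of letter descriptors is not reproduced.

```python
import numpy as np

def predict_printer(query, templates):
    """Return the printer whose template has the highest Pearson correlation with query."""
    names = list(templates)
    scores = [np.corrcoef(query, templates[name])[0, 1] for name in names]
    return names[int(np.argmax(scores))]

# Hypothetical 4-dimensional templates for two printers.
templates = {"printer_A": [1.0, 2.0, 3.0, 4.0], "printer_B": [4.0, 3.0, 2.0, 1.0]}
print(predict_printer([2.0, 4.1, 5.9, 8.2], templates))
```

Pearson correlation ignores overall offset and scale of the feature vector, which is one plausible reason such a simple rule can transfer across fonts better than a classifier fitted to a single font.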
Submitted 23 September, 2021;
originally announced September 2021.
-
Latency-Constrained Highly-Reliable mmWave Communication via Multi-point Connectivity
Authors:
Dileep Kumar,
Satya Joshi,
Antti Tölli
Abstract:
The sensitivity of the millimeter-wave (mmWave) radio channel to blockage is a fundamental challenge in achieving low-latency and ultra-reliable connectivity. In this paper, we explore the viability of using coordinated multi-point (CoMP) transmission for delay-bounded and reliable mmWave communication. We propose a novel blockage-aware algorithm for the sum-power minimization problem under user-specific latency requirements in a dynamic mobile access network. We use the Lyapunov optimization framework and provide a dynamic control algorithm, which efficiently transforms a time-average stochastic problem into a sequence of deterministic subproblems. A robust beamformer design is then proposed by exploiting the queue backlogs and channel information; it efficiently allocates the required radio and cooperation resources and proactively leverages multi-antenna spatial diversity according to the instantaneous needs of the users. Further, to adapt to the uncertainties of the mmWave channel, we consider a pessimistic estimate of the rates over link blockage combinations and an adaptive selection of the CoMP serving set from the available remote radio units (RRUs). Moreover, after relaxing the coupled and non-convex constraints via Fractional Program (FP) techniques, a low-complexity closed-form iterative algorithm is provided by solving a system of Karush-Kuhn-Tucker (KKT) optimality conditions. The simulation results show that, in the presence of random blockages, the proposed methods outperform the baseline scenarios and provide power-efficient, highly reliable, and low-latency mmWave communication.
Submitted 20 August, 2021;
originally announced August 2021.
-
Integrating Dialog History into End-to-End Spoken Language Understanding Systems
Authors:
Jatin Ganhotra,
Samuel Thomas,
Hong-Kwang J. Kuo,
Sachindra Joshi,
George Saon,
Zoltán Tüske,
Brian Kingsbury
Abstract:
End-to-end spoken language understanding (SLU) systems that process human-human or human-computer interactions are often context independent, processing each turn of a conversation independently. Spoken conversations, on the other hand, are highly context dependent, and dialog history contains useful information that can improve the processing of each conversational turn. In this paper, we investigate the importance of dialog history and how it can be effectively integrated into end-to-end SLU systems. While processing a spoken utterance, our proposed RNN transducer (RNN-T) based SLU model has access to its dialog history in the form of decoded transcripts and SLU labels of previous turns. We encode the dialog history as BERT embeddings, and use them as an additional input to the SLU model along with the speech features for the current utterance. We evaluate our approach on a recently released spoken dialog data set, the HarperValleyBank corpus. We observe significant improvements: 8% for dialog action and 30% for caller intent recognition tasks, compared to a competitive context-independent end-to-end baseline system.
Submitted 18 August, 2021;
originally announced August 2021.
-
Impact of Scene-Specific Enhancement Spectra on Matched Filter Greenhouse Gas Retrievals from Imaging Spectroscopy
Authors:
Markus D. Foote,
Philip E. Dennison,
Patrick R. Sullivan,
Kelly B. O'Neill,
Andrew K. Thorpe,
David R. Thompson,
Daniel H. Cusworth,
Riley Duren,
Sarang C. Joshi
Abstract:
Matched filter (MF) techniques have been widely used for retrieval of greenhouse gas enhancements (enh.) from imaging spectroscopy datasets. While multiple algorithmic techniques and refinements have been proposed, the greenhouse gas target spectrum used for concentration enh. estimation has remained largely unaltered since the introduction of quantitative MF retrievals. The magnitude of retrieved methane and carbon dioxide enh., and thereby integrated mass enh. (IME) and estimated flux of point-source emitters, is heavily dependent on this target spectrum. The current standard use of molecular absorption coefficients to create unit enh. target spectra does not account for absorption by background concentrations of greenhouse gases, solar and sensor geometry, or atmospheric water vapor absorption. We introduce geometric and atmospheric parameters into the generation of scene-specific (SS) unit enh. spectra to provide target spectra that are compatible with all greenhouse gas retrieval MF techniques. For methane plumes, IME resulting from use of standard, generic enh. spectra varied from -22 to +28.7% compared to SS enh. spectra. Due to differences in spectral shape between the generic and SS enh. spectra, differences in methane plume IME were linked to surface spectral characteristics in addition to geometric and atmospheric parameters. IME differences were larger for carbon dioxide plumes, with generic enh. spectra producing integrated mass enh. of -76.1 to -48.1% compared to SS enh. spectra. Fluxes calculated from these integrated enh. would vary by the same percentages, assuming equivalent wind conditions. Methane and carbon dioxide IME were most sensitive to changes in solar zenith angle and ground elevation. SS target spectra can improve confidence in greenhouse gas retrievals and flux estimates across collections of scenes with diverse geometric and atmospheric conditions.
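The classic matched filter estimates a per-pixel enhancement α from background statistics and a target spectrum t. The sketch below assumes the common form α = (x − μ)ᵀC⁻¹t / (tᵀC⁻¹t), with μ and C the scene mean and covariance; it does not reproduce the paper's scene-specific construction of t, and the synthetic scene is made up. It does show why the target spectrum matters: scaling t by a factor s scales every retrieved α by 1/s.

```python
import numpy as np

def matched_filter(X, target):
    """Per-pixel enhancement estimates.

    X: (n_pixels, n_bands) radiances; target: (n_bands,) unit-enhancement spectrum.
    """
    mu = X.mean(axis=0)                       # background mean spectrum
    C = np.cov(X, rowvar=False)               # background covariance
    Cinv_t = np.linalg.solve(C, target)       # C^{-1} t without forming the inverse
    return (X - mu) @ Cinv_t / (target @ Cinv_t)

# Synthetic scene: Gaussian background plus a hypothetical plume in the first 20 pixels.
rng = np.random.default_rng(0)
t = np.array([0.1, 0.5, 1.0, 0.5, 0.1])
X = rng.normal(loc=10.0, scale=1.0, size=(500, 5))
X[:20] += 3.0 * t
alpha = matched_filter(X, t)
print(alpha[:20].mean(), alpha[20:].mean())
```

Because the retrieved α is inversely proportional to the magnitude of t, any bias in the target spectrum propagates directly into IME and flux estimates, which is the sensitivity the abstract quantifies.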
Submitted 10 August, 2021; v1 submitted 25 June, 2021;
originally announced July 2021.
-
Representation Learning to Classify and Detect Adversarial Attacks against Speaker and Speech Recognition Systems
Authors:
Jesús Villalba,
Sonal Joshi,
Piotr Żelasko,
Najim Dehak
Abstract:
Adversarial attacks have become a major threat to machine learning applications. There is growing interest in studying these attacks in the audio domain, e.g., speech and speaker recognition, and in finding defenses against them. In this work, we focus on using representation learning to classify/detect attacks w.r.t. the attack algorithm, threat model, or signal-to-adversarial-noise ratio. We found that common attacks in the literature can be classified with accuracies as high as 90%. Moreover, representations trained to classify attacks against speaker identification can also be used to classify attacks against speaker verification and speech recognition. We also tested an attack verification task, where we need to decide whether two speech utterances contain the same attack. We observed that our models did not generalize well to attack algorithms not included in the attack representation model training. Motivated by this, we evaluated an unknown attack detection task. We were able to detect unknown attacks with equal error rates of about 19%, which is promising.
Submitted 9 July, 2021;
originally announced July 2021.
-
Adversarial Attacks and Defenses for Speech Recognition Systems
Authors:
Piotr Żelasko,
Sonal Joshi,
Yiwen Shao,
Jesus Villalba,
Jan Trmal,
Najim Dehak,
Sanjeev Khudanpur
Abstract:
The ubiquitous presence of machine learning systems in our lives necessitates research into their vulnerabilities and appropriate countermeasures. In particular, we investigate the effectiveness of adversarial attacks and defenses against automatic speech recognition (ASR) systems. We select two ASR models: a thoroughly studied DeepSpeech model and a more recent Espresso framework Transformer encoder-decoder model. We investigate two threat models: a denial-of-service scenario where fast gradient-sign method (FGSM) or weak projected gradient descent (PGD) attacks are used to degrade the model's word error rate (WER); and a targeted scenario where a more potent imperceptible attack forces the system to recognize a specific phrase. We find that the attack transferability across the investigated ASR systems is limited. To defend the model, we use two preprocessing defenses, randomized smoothing and a WaveGAN-based vocoder, and find that they significantly improve the model's adversarial robustness. We show that a WaveGAN vocoder can be a useful countermeasure to adversarial attacks on ASR systems: even when it is attacked jointly with the ASR, the word error rate on the target phrases remains high.
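As a rough illustration of the fast gradient-sign method named above, the sketch below perturbs the input of a toy logistic classifier. The classifier, its weights and all function names are hypothetical stand-ins for the DeepSpeech and Espresso models; in the denial-of-service setting the attacker moves each input sample by eps in the direction of the sign of the loss gradient, which increases the loss.

```python
import math


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def logistic_loss_grad(w, b, x, y):
    """Gradient w.r.t. x of log(1 + exp(-y * (w.x + b))), with label y in {-1, +1}."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    coeff = -y / (1.0 + math.exp(margin))  # equals -y * sigmoid(-margin)
    return [coeff * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """One fast gradient-sign step: move every input sample by eps along sign(grad)."""
    g = logistic_loss_grad(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


# Hypothetical toy model and a two-sample "waveform".
w, b = [1.0, -2.0], 0.0
x_adv = fgsm(w, b, [0.5, 0.1], 1, eps=0.1)
print(x_adv)  # each sample moved by 0.1 in the loss-increasing direction
```

Only the sign of the gradient is used, so every sample moves by exactly eps; for a real ASR model the gradient would come from automatic differentiation of the recognition loss.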
Submitted 31 March, 2021;
originally announced March 2021.
-
3D Reasoning for Unsupervised Anomaly Detection in Pediatric WbMRI
Authors:
Alex Chang,
Vinith Suriyakumar,
Abhishek Moturu,
James Tu,
Nipaporn Tewattanarat,
Sayali Joshi,
Andrea Doria,
Anna Goldenberg
Abstract:
Modern deep unsupervised learning methods have shown great promise for detecting diseases across a variety of medical imaging modalities. While previous generative modeling approaches successfully perform anomaly detection by learning the distribution of healthy 2D image slices, they process such slices independently and ignore the fact that they are correlated, all being sampled from a 3D volume. We show that incorporating the 3D context and processing whole-body MRI volumes is beneficial for distinguishing anomalies from their benign counterparts. In our work, we introduce a multi-channel sliding window generative model to perform lesion detection in whole-body MRI (wbMRI). Our experiments demonstrate that our proposed method significantly outperforms processing individual images in isolation, and our ablations clearly show the importance of 3D reasoning. Moreover, our work also shows that it is beneficial to include additional patient-specific features to further improve anomaly detection in pediatric scans.
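A multi-channel sliding window, as described above, can be read as feeding the generative model each slice together with its neighbours along the scan axis as extra input channels. The sketch below builds such stacks; the function name, the window half-width and the edge handling are assumptions, not details from the paper.

```python
def sliding_window_stacks(volume, half_width):
    """Group each slice with its neighbours into a (2*half_width + 1)-channel input.

    volume: list of 2D slices ordered along the scan axis (any slice objects work).
    Slices past either end of the volume are replaced by the nearest edge slice
    (replicate padding, an assumed choice).
    """
    n = len(volume)
    stacks = []
    for i in range(n):
        neighbours = range(i - half_width, i + half_width + 1)
        idx = [min(max(j, 0), n - 1) for j in neighbours]  # clamp to valid slice indices
        stacks.append([volume[j] for j in idx])
    return stacks


slices = ["s0", "s1", "s2"]  # stand-ins for 2D image arrays
print(sliding_window_stacks(slices, 1))
# [['s0', 's0', 's1'], ['s0', 's1', 's2'], ['s1', 's2', 's2']]
```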
Submitted 24 March, 2021;
originally announced March 2021.
-
The Generalized Fourier Transform: A Unified Framework for the Fourier, Laplace, Mellin and $Z$ Transforms
Authors:
Pushpendra Singh,
Anubha Gupta,
Shiv Dutt Joshi
Abstract:
This paper introduces the Generalized Fourier transform (GFT), an extension, or generalization, of the Fourier transform (FT). The unilateral Laplace transform (LT) is shown to be a special case of GFT. GFT, as proposed in this work, contributes significantly to the scholarly literature. There are many salient contributions of this work. Firstly, GFT is applicable to a much larger class of signals, some of which cannot be analyzed with FT and LT. For example, we have shown the applicability of GFT to polynomially decaying functions and super-exponentials. Secondly, we demonstrate the efficacy of GFT in solving initial value problems (IVPs). Thirdly, the generalization presented for FT is extended to other integral transforms, with examples shown for the wavelet transform and the cosine transform. Likewise, a generalized Gamma function is presented. One interesting application of GFT is the computation of generalized moments for random variables whose ordinary moments are not finite, such as the Cauchy random variable. Fourthly, we introduce the Fourier scale transform (FST), which utilizes GFT with the topological isomorphism of an exponential map. Lastly, we propose the Generalized Discrete-Time Fourier transform (GDTFT). The DTFT and the unilateral $z$-transform are shown to be special cases of the proposed GDTFT. The properties of GFT and GDTFT are also discussed.
Submitted 12 February, 2021;
originally announced March 2021.
-
Study of Pre-processing Defenses against Adversarial Attacks on State-of-the-art Speaker Recognition Systems
Authors:
Sonal Joshi,
Jesús Villalba,
Piotr Żelasko,
Laureano Moro-Velázquez,
Najim Dehak
Abstract:
Adversarial examples to speaker recognition (SR) systems are generated by adding carefully crafted noise to the speech signal so that the system fails while the perturbation remains imperceptible to humans. Such attacks pose severe security risks, making it vital to understand in depth how vulnerable state-of-the-art SR systems are to these attacks. Moreover, it is even more important to propose defenses that can protect the systems against these attacks. Addressing these concerns, this paper first investigates how state-of-the-art x-vector based SR systems are affected by white-box adversarial attacks, i.e., when the adversary has full knowledge of the system. x-Vector based SR systems are evaluated against white-box adversarial attacks common in the literature, such as the fast gradient sign method (FGSM), the basic iterative method (BIM, a.k.a. iterative FGSM), projected gradient descent (PGD), and the Carlini-Wagner (CW) attack. To mitigate these attacks, the paper proposes four pre-processing defenses. It evaluates them against powerful adaptive white-box adversarial attacks, i.e., when the adversary has full knowledge of the system, including the defense. The four pre-processing defenses, viz. randomized smoothing, DefenseGAN, variational autoencoder (VAE), and Parallel WaveGAN vocoder (PWG), are compared against the baseline defense of adversarial training. The results indicate that SR systems were extremely vulnerable under BIM, PGD, and CW attacks. Among the proposed pre-processing defenses, PWG combined with randomized smoothing offers the most protection against the attacks, with accuracy averaging 93% compared to 52% in the undefended system, and an absolute improvement of more than 90% for BIM attacks with $L_\infty>0.001$ and for the CW attack.
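The BIM (iterative FGSM) and PGD attacks listed above repeat small gradient-sign steps and project back onto an $L_\infty$ ball of radius eps around the clean signal; the $L_\infty$ values quoted in the results presumably refer to such a budget. A minimal sketch, assuming a caller-supplied gradient function (`grad_fn` is hypothetical); PGD typically differs only by starting from a random point inside the ball.

```python
def bim_attack(grad_fn, x, eps, alpha, steps):
    """Basic iterative method (iterative FGSM) under an L-infinity budget eps.

    grad_fn(x) must return the gradient of the attacker's loss w.r.t. x
    (a hypothetical callable supplied by the caller). Each step moves every
    sample by alpha along the gradient sign, then clips back into
    [x0 - eps, x0 + eps], so the final perturbation satisfies ||x_adv - x||_inf <= eps.
    """
    x_adv = list(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        x_adv = [xa + alpha * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv


# Toy gradient that always points up for sample 0 and down for sample 1.
adv = bim_attack(lambda v: [1.0, -1.0], [0.0, 0.0], eps=0.05, alpha=0.02, steps=5)
print(adv)  # [0.05, -0.05]: both samples saturate at the budget
```

With steps=1 and alpha=eps this reduces to a single FGSM step.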
Submitted 25 June, 2021; v1 submitted 21 January, 2021;
originally announced January 2021.
-
Histology to 3D In Vivo MR Registration for Volumetric Evaluation of MRgFUS Treatment Assessment Biomarkers
Authors:
Blake E. Zimmerman,
Sara L. Johnson,
Henrik A. Odéen,
Jill E. Shea,
Rachel E. Factor,
Sarang C. Joshi,
Allison H. Payne
Abstract:
Advances in imaging and early cancer detection have increased interest in magnetic resonance (MR) guided focused ultrasound (MRgFUS) technologies for cancer treatment. MRgFUS ablation treatments could reduce surgical risks, preserve organ tissue/function, and improve patient quality of life. However, surgical resection and histological analysis remain the gold standard to assess cancer treatment response. For non-invasive ablation therapies such as MRgFUS, the treatment response must be determined through MR imaging biomarkers. However, current MR biomarkers are inconclusive and have not been rigorously evaluated against histology via accurate registration. Existing registration methods rely on anatomical features to directly register in vivo MR and histology. For MRgFUS applications in anatomies such as the liver, kidney, or breast, anatomical features independent of treatment features are often insufficient to perform direct registration. We present a novel MR-to-histology registration workflow that utilizes intermediate imaging and does not rely on these independent features. The presented workflow yields an overall registration accuracy of 1.00 ± 0.13 mm. The developed registration pipeline is used to evaluate a common MRgFUS treatment assessment biomarker against histology. Evaluating MR biomarkers against histology using this registration pipeline will facilitate validating novel MRgFUS biomarkers to improve treatment assessment without surgical intervention.
Submitted 20 November, 2020;
originally announced November 2020.
-
Analog vs. Digital Spatial Transforms: A Throughput, Power, and Area Comparison
Authors:
Zephan M. Enciso,
Seyed Hadi Mirfarshbafan,
Oscar Castañeda,
Clemens JS. Schaefer,
Christoph Studer,
Siddharth Joshi
Abstract:
Spatial linear transforms that process multiple parallel analog signals to simplify downstream signal processing find widespread use in multi-antenna communication systems, machine learning inference, data compression, audio and ultrasound applications, among many others. In the past, a wide range of mixed-signal as well as digital spatial transform circuits have been proposed; however, it is a longstanding question whether analog or digital transforms are superior in terms of throughput, power, and area. In this paper, we focus on Hadamard transforms and perform a systematic comparison of state-of-the-art analog and digital circuits implementing spatial transforms in the same 65 nm CMOS technology. We analyze the trade-offs between throughput, power, and area, and we identify regimes in which mixed-signal or digital Hadamard transforms are preferable. Our comparison reveals that (i) there is no clear winner and (ii) analog-to-digital conversion, rather than the spatial transform, often dominates area and energy efficiency.
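For reference, a digital Hadamard transform of the kind being compared needs only additions and subtractions. The sketch below is the textbook fast Walsh-Hadamard transform in Sylvester (natural) ordering; it illustrates the arithmetic only and says nothing about the circuit implementations or the ADC costs evaluated in the paper.

```python
def fwht(a):
    """Fast Walsh-Hadamard transform (Sylvester/natural ordering, unnormalized).

    Computes H_n @ a for len(a) == 2**k using k butterfly stages of additions
    and subtractions only, i.e. O(n log n) instead of O(n^2) operations.
    """
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(a)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y  # butterfly: sum and difference
        h *= 2
    return a


print(fwht([1, 2, 3, 4]))  # [10, -2, -4, 0]
```

Because H_n @ H_n = n * I, applying `fwht` twice returns the input scaled by n, which is a convenient self-check.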
Submitted 15 September, 2020;
originally announced September 2020.
-
Image fusion using symmetric skip autoencoder via an Adversarial Regulariser
Authors:
Snigdha Bhagat,
S. D. Joshi,
Brejesh Lall
Abstract:
It is a challenging task to extract the best of both worlds by combining the spatial characteristics of a visible image and the spectral content of an infrared image. In this work, we propose a spatially constrained adversarial autoencoder that extracts deep features from the infrared and visible images to obtain a more exhaustive and global representation. Specifically, we propose a residual autoencoder architecture, regularised by a residual adversarial network, to generate a more realistic fused image. The residual module serves as the primary building block for the encoder, decoder and adversarial network; in addition, symmetric skip connections embed the spatial characteristics directly from the initial layers of the encoder into the decoder part of the network. The spectral information in the infrared image is incorporated by adding the feature maps over several layers in the encoder part of the fusion structure, which performs inference on the visual and infrared images separately. To optimize the network parameters efficiently, we propose an adversarial regulariser network that performs supervised learning on the fused image and the original visual image.
Submitted 4 June, 2020; v1 submitted 1 May, 2020;
originally announced May 2020.
-
A Device Non-Ideality Resilient Approach for Mapping Neural Networks to Crossbar Arrays
Authors:
Arman Kazemi,
Cristobal Alessandri,
Alan C. Seabaugh,
X. Sharon Hu,
Michael Niemier,
Siddharth Joshi
Abstract:
We propose a technology-independent method, referred to as adjacent connection matrix (ACM), to efficiently map signed weight matrices to non-negative crossbar arrays. When compared to same-hardware-overhead mapping methods, using ACM leads to improvements of up to 20% in training accuracy for ResNet-20 with the CIFAR-10 dataset when training with 5-bit precision crossbar arrays or lower. When compared with strategies that use two elements to represent a weight, ACM achieves comparable training accuracies, while also offering area and read energy reductions of 2.3x and 7x, respectively. ACM also has a mild regularization effect that improves inference accuracy in crossbar arrays without any retraining or costly device/variation-aware training.
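The abstract compares ACM with "strategies that use two elements to represent a weight". ACM itself is not specified here, so the sketch below shows only that baseline, assuming the common differential-pair scheme w = (G+ - G-) / scale with non-negative conductances; all names and the linear scaling are illustrative assumptions.

```python
def differential_pair_map(weights, g_max):
    """Map a signed weight matrix onto two non-negative conductance arrays (G+, G-).

    Baseline scheme only (not ACM): w = (g_pos - g_neg) / scale, with weights
    scaled so that the largest |w| uses the full conductance range g_max.
    """
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    scale = g_max / w_max
    g_pos = [[max(w, 0.0) * scale for w in row] for row in weights]
    g_neg = [[max(-w, 0.0) * scale for w in row] for row in weights]
    return g_pos, g_neg, scale


def crossbar_mvm(g_pos, g_neg, scale, x):
    """Ideal read-out of y = W x as the difference of the two accumulated currents, rescaled."""
    return [
        (sum(gp * xi for gp, xi in zip(rp, x)) - sum(gn * xi for gn, xi in zip(rn, x))) / scale
        for rp, rn in zip(g_pos, g_neg)
    ]


W = [[1.0, -2.0], [0.5, 0.0]]  # hypothetical signed weights
gp, gn, s = differential_pair_map(W, g_max=1.0)
print(crossbar_mvm(gp, gn, s, [1.0, 1.0]))  # approximately W @ [1, 1] = [-1.0, 0.5]
```

This scheme doubles the number of devices per weight, which is the hardware overhead the abstract's area and read-energy comparison refers to.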
Submitted 1 April, 2020;
originally announced April 2020.
-
Empirical Evaluation of PRNU Fingerprint Variation for Mismatched Imaging Pipelines
Authors:
Sharad Joshi,
Pawel Korus,
Nitin Khanna,
Nasir Memon
Abstract:
We assess the variability of PRNU-based camera fingerprints with mismatched imaging pipelines (e.g., different camera ISP or digital darkroom software). We show that camera fingerprints exhibit non-negligible variations in this setup, which may lead to unexpected degradation of detection statistics in real-world use cases. We tested 13 different pipelines, including standard digital darkroom software and recent neural networks. We observed that the correlation between fingerprints from mismatched pipelines drops on average to 0.38 and the PCE detection statistic drops by over 40%. The degradation in error rates is strongest for the small patches commonly used in photo manipulation detection, and when neural networks are used for photo development. At a fixed 0.5% FPR setting, the TPR drops by 17 ppt (percentage points) for 128 px and 256 px patches.
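The fingerprint correlation quoted above is assumed here, as is usual in PRNU work, to be the normalized (Pearson-style) correlation between two flattened fingerprint estimates. The sketch shows that computation only; it omits the PCE statistic, which additionally normalizes the cross-correlation peak by the energy of the surrounding cross-correlation surface. The function name is hypothetical.

```python
import math


def normalized_correlation(k1, k2):
    """Normalized (Pearson-style) correlation between two flattened fingerprints.

    Both inputs must have the same length and non-zero variance.
    """
    n = len(k1)
    m1, m2 = sum(k1) / n, sum(k2) / n
    d1 = [a - m1 for a in k1]  # remove the mean of each fingerprint
    d2 = [b - m2 for b in k2]
    num = sum(a * b for a, b in zip(d1, d2))
    den = math.sqrt(sum(a * a for a in d1) * sum(b * b for b in d2))
    return num / den


print(normalized_correlation([0.1, -0.2, 0.3, 0.0], [0.1, -0.2, 0.3, 0.0]))
```

A value of 1 means identical noise patterns up to gain and offset; the 0.38 average reported for mismatched pipelines indicates a substantially altered fingerprint.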
Submitted 9 October, 2020; v1 submitted 4 April, 2020;
originally announced April 2020.
-
Source Printer Identification from Document Images Acquired using Smartphone
Authors:
Sharad Joshi,
Suraj Saxena,
Nitin Khanna
Abstract:
Vast volumes of printed documents continue to be used for various important as well as trivial applications. Such applications often rely on the information provided in the form of printed text documents whose integrity verification poses a challenge due to time constraints and lack of resources. Source printer identification provides essential information about the origin and integrity of a printed document in a fast and cost-effective manner. Even when fraudulent documents are identified, information about their origin can help stop future frauds. If a smartphone camera replaces the scanner in the document acquisition process, document forensics would be more economical, user-friendly, and even faster in many applications where remote and distributed analysis is beneficial. Building on existing methods, we propose to learn a single CNN model from the fusion of letter images and their printer-specific noise residuals. In the absence of any publicly available dataset, we created a new dataset consisting of 2250 document images of text documents printed by eighteen printers and acquired by a smartphone camera at five acquisition settings. The proposed method achieves 98.42% document classification accuracy using images of the letter 'e' under a 5x2 cross-validation approach. Further, when tested using about half a million letters of all types, it achieves 90.33% and 98.01% letter and document classification accuracies, respectively, highlighting its ability to learn a discriminative model without dependence on a single letter type. Classification accuracies are also encouraging under various acquisition settings, including low illumination and a change in angle between the document and camera planes.
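The 5x2 cross-validation protocol mentioned above repeats a random 50/50 split five times and uses each half once for testing, giving ten train/test folds. A minimal sketch follows; the function name and seed are hypothetical, and it does not stratify by printer, which the actual experiments may have done.

```python
import random


def five_by_two_splits(items, seed=0):
    """Yield (train, test) pairs for 5x2 cross-validation.

    Five times: shuffle, split into two halves, and use each half once as the
    test set, giving 10 train/test folds in total.
    """
    rng = random.Random(seed)
    for _ in range(5):
        shuffled = list(items)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        a, b = shuffled[:half], shuffled[half:]
        yield a, b
        yield b, a


folds = list(five_by_two_splits(range(10)))
print(len(folds))  # 10
```

Reported accuracies under this protocol are typically the mean over the ten folds.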
Submitted 27 March, 2020;
originally announced March 2020.
-
Fast and Accurate Retrieval of Methane Concentration from Imaging Spectrometer Data Using Sparsity Prior
Authors:
Markus D. Foote,
Philip E. Dennison,
Andrew K. Thorpe,
David R. Thompson,
Siraput Jongaramrungruang,
Christian Frankenberg,
Sarang C. Joshi
Abstract:
The strong radiative forcing by atmospheric methane has stimulated interest in identifying natural and anthropogenic sources of this potent greenhouse gas. Point sources are important targets for quantification, and anthropogenic targets have potential for emissions reduction. Methane point source plume detection and concentration retrieval have been previously demonstrated using data from the Airborne Visible InfraRed Imaging Spectrometer Next Generation (AVIRIS-NG). Current quantitative methods have tradeoffs between computational requirements and retrieval accuracy, creating obstacles for processing real-time data or large datasets from flight campaigns. We present a new computationally efficient algorithm that applies sparsity and an albedo correction to matched filter retrieval of trace gas concentration-pathlength. The new algorithm was tested using AVIRIS-NG data acquired over several point source plumes in Ahmedabad, India. The algorithm was validated using simulated AVIRIS-NG data including synthetic plumes of known methane concentration. Sparsity and albedo correction together reduced the root mean squared error of retrieved methane concentration-pathlength enhancement by 60.7% compared with a previous robust matched filter method. Background noise was reduced by a factor of 2.64. The new algorithm was able to process the entire 300-flightline 2016 AVIRIS-NG India campaign in just over 8 hours on a desktop computer with GPU acceleration.
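The classic matched filter that the new algorithm builds on estimates, for each pixel spectrum x, an enhancement alpha = (x - mu)^T C^-1 t / (t^T C^-1 t), where mu and C are the background mean and covariance and t is the target signature (typically mu multiplied elementwise by the gas's unit absorption spectrum). The sketch below implements that formula with an optional albedo scaling r = x^T mu / mu^T mu; the exact form of the paper's albedo correction is an assumption here, the sparsity prior is not included, and the function names are hypothetical.

```python
def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def matched_filter(x, mu, cov, target, albedo_correction=True):
    """Matched-filter enhancement estimate for one pixel spectrum x.

    alpha = (x - mu)^T C^-1 t / (r * t^T C^-1 t), where r = x^T mu / mu^T mu
    is an assumed albedo scaling (r = 1 when albedo_correction is False).
    """
    ci_t = solve(cov, target)  # C^-1 t
    num = sum((xi - mi) * ct for xi, mi, ct in zip(x, mu, ci_t))
    den = sum(ti * ct for ti, ct in zip(target, ci_t))
    if albedo_correction:
        r = sum(xi * mi for xi, mi in zip(x, mu)) / sum(mi * mi for mi in mu)
    else:
        r = 1.0
    return num / (r * den)


mu = [1.0, 1.0, 1.0]
cov = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [-0.1, -0.2, 0.0]
x = [0.8, 0.6, 1.0]  # background plus 2 units of the target signature
print(matched_filter(x, mu, cov, t, albedo_correction=False))  # approximately 2.0
```

In practice C^-1 t is shared by every pixel in a flightline column, so it would be computed once rather than per pixel as in this toy version.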
Submitted 5 March, 2020;
originally announced March 2020.
-
Quantification of Differential Information using Matrix Pencil
Authors:
Snigdha Bhagat,
S. D. Joshi
Abstract:
A traditional classification problem generally involves modelling each class individually and then classifying by evaluating the similarity of the test set to the modelled classes. In this paper, we introduce another approach that finds the differential information between two classes rather than modelling each class separately. The classes are viewed in a common frame of reference in which one class has constant variance, while the other has unequal variance along its basis vectors; these variances capture the differential information of one class over the other. When formulated mathematically, this leads to the solution of a matrix pencil equation. The theory of binary classification is then extended to a multi-class scenario. This is borne out by illustrative examples on the classification of the MNIST database.
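One way to read the construction above, assuming the two classes are summarized by covariance matrices A and B: whitening class B makes its variance constant (the identity), and the variances of class A along the resulting basis are the generalized eigenvalues lambda of the pencil A v = lambda B v, so values far from 1 mark directions carrying differential information. The sketch solves the 2x2 case in closed form from det(A - lambda B) = 0; it is illustrative only, and the function name is hypothetical.

```python
import math


def generalized_eigvals_2x2(a, b):
    """Eigenvalues lambda of the symmetric pencil A v = lambda B v (2x2, B positive definite).

    Expands det(A - lambda*B) = 0 into a quadratic qa*lambda^2 + qb*lambda + qc = 0.
    """
    (a11, a12), (_, a22) = a
    (b11, b12), (_, b22) = b
    qa = b11 * b22 - b12 * b12
    qb = -(a11 * b22 + a22 * b11 - 2.0 * a12 * b12)
    qc = a11 * a22 - a12 * a12
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))  # real for a symmetric-definite pencil
    return sorted(((-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)))


print(generalized_eigvals_2x2([[4.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]]))  # [1.0, 2.0]
```

Here class A has twice the variance of class B along the first axis and the same variance along the second, so only the first direction carries differential information.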
Submitted 6 February, 2020;
originally announced February 2020.
-
Learning Multiparametric Biomarkers for Assessing MR-Guided Focused Ultrasound Treatment of Malignant Tumors
Authors:
Blake E. Zimmerman,
Sara Johnson,
Henrik Odéen,
Jill Shea,
Markus D. Foote,
Nicole Winkler,
Sarang C. Joshi,
Allison Payne
Abstract:
Noninvasive MR-guided focused ultrasound (MRgFUS) treatments are promising alternatives to the surgical removal of malignant tumors. A significant challenge is assessing the viability of treated tissue during and immediately after MRgFUS procedures. Current clinical assessment uses the nonperfused volume (NPV) biomarker immediately after treatment from contrast-enhanced MRI. The NPV has variable accuracy, and the use of contrast agent prevents continuing MRgFUS treatment if tumor coverage is inadequate. This work presents a novel, noncontrast, learned multiparametric MR biomarker that can be used during treatment for intratreatment assessment, validated in a VX2 rabbit tumor model. A deep convolutional neural network was trained on noncontrast multiparametric MR images using the NPV biomarker from follow-up MR imaging (3-5 days after MRgFUS treatment) as the accurate label of nonviable tissue. A novel volume-conserving registration algorithm yielded a voxel-wise correlation between treatment and follow-up NPV, providing a rigorous validation of the biomarker. The learned noncontrast multiparametric MR biomarker predicted the follow-up NPV with an average DICE coefficient of 0.71, substantially outperforming the current clinical standard (DICE coefficient = 0.53). Noncontrast multiparametric MR imaging integrated with a deep convolutional neural network provides a more accurate prediction of MRgFUS treatment outcome than current contrast-based techniques.
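The DICE coefficient used above to score the predicted against the follow-up NPV is 2|P ∩ R| / (|P| + |R|) for binary masks P and R. A minimal sketch over flattened voxel masks; the function name is hypothetical, and returning 1.0 when both masks are empty is an assumed convention.

```python
def dice(pred, ref):
    """Dice coefficient 2|P & R| / (|P| + |R|) for two binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    return 2.0 * inter / total if total else 1.0  # both masks empty: treat as perfect overlap


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```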
Submitted 29 September, 2020; v1 submitted 23 October, 2019;
originally announced October 2019.
-
Rank Constrained Diffeomorphic Density Motion Estimation for Respiratory Correlated Computed Tomography
Authors:
Markus D. Foote,
Pouya Sabouri,
Amit Sawant,
Sarang C. Joshi
Abstract:
Motion estimation of organs in a sequence of images is important in numerous medical imaging applications. The focus of this paper is the analysis of 4D Respiratory Correlated Computed Tomography (RCCT) imaging. It is hypothesized that the quasi-periodic, breathing-induced motion of organs in the thorax can be represented by deformations spanning a very low-dimensional subspace of the full, infinite-dimensional space of diffeomorphic transformations. This paper presents a novel motion estimation algorithm that includes a low-rank constraint on the motion between the different phases of the RCCT images. Low-rank deformation solutions are necessary for efficient statistical analysis and for improved treatment planning and delivery. Although the application focus of this paper is RCCT, the algorithm is quite general and applicable to various motion estimation problems in medical imaging.
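To make the low-rank hypothesis concrete: stacking the displacement field of each breathing phase as a column of a matrix D, a rank-k constraint says D is well approximated by k outer products. The sketch below is an illustration rather than the paper's diffeomorphic algorithm; it computes the best rank-1 approximation by power iteration on D^T D, and the function name and iteration count are hypothetical.

```python
import math


def rank1_approx(d, iters=100):
    """Best rank-1 approximation of matrix d (rows = voxel coordinates, cols = phases).

    Power iteration on d^T d converges to the dominant right singular vector v;
    the approximation is then (d v) v^T.
    """
    n_rows, n_cols = len(d), len(d[0])
    v = [1.0] * n_cols
    for _ in range(iters):
        dv = [sum(r[j] * v[j] for j in range(n_cols)) for r in d]              # d v
        w = [sum(d[i][j] * dv[i] for i in range(n_rows)) for j in range(n_cols)]  # d^T d v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # d is the zero matrix
            return [[0.0] * n_cols for _ in range(n_rows)]
        v = [x / norm for x in w]
    u = [sum(r[j] * v[j] for j in range(n_cols)) for r in d]  # d v = sigma * left singular vector
    return [[ui * vj for vj in v] for ui in u]


# Hypothetical displacements: two coordinates moving in lockstep over three phases.
print(rank1_approx([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]))
```

A matrix that is already rank 1, as in the usage line, is reproduced up to rounding; higher ranks would repeat the procedure after subtracting each recovered component.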
Submitted 25 September, 2019;
originally announced September 2019.
-
Unified Functorial Signal Representation III: Foundations, Redundancy, $L^0$ and $L^2$ functors
Authors:
Salil Samant,
Shiv Dutt Joshi
Abstract:
In this paper, we propose and lay the foundations of a functorial framework for representing signals. By incorporating an additional category-theoretic relative and generative perspective alongside classic set-theoretic measure theory, the fundamental concepts of redundancy and compression are formulated in a novel, authentic arrow-theoretic way. The existing classic framework, which represents a signal as a vector in an appropriate linear space, is shown to be a special case of the proposed framework.
Next, in the context of signal spaces as categories, we study the various covariant and contravariant forms of $L^0$ and $L^2$ functors using categories of measurable or measure spaces and their opposites, involving Boolean and measure algebras along with partial extension. Finally, we contribute a novel definition of intra-signal redundancy using the general concept of an isomorphism arrow in a category, covering the translation case and others as special cases. Through category theory, we provide a simple yet precise explanation for the well-known heuristic that lossless differential encoding standards yield better compression for image types such as line drawings, iconic images and text than classic representation techniques such as JPEG, which choose bases or frames in a global Hilbert space.
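The differential-encoding heuristic mentioned above can be illustrated numerically: in a row of a line drawing, which consists mostly of constant runs, successive differences are almost all zero, so their empirical (zeroth-order) entropy is far lower than that of the raw samples, while the coding stays lossless. This is a numerical illustration only and does not reflect the category-theoretic formulation; the pixel row and function names are hypothetical.

```python
import math
from collections import Counter


def empirical_entropy(symbols):
    """Zeroth-order empirical entropy of a symbol sequence, in bits per symbol."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def differential_encode(row):
    """Lossless differential coding: first sample, then successive differences."""
    return [row[0]] + [b - a for a, b in zip(row, row[1:])]


def differential_decode(codes):
    """Invert differential_encode by a running sum."""
    out = [codes[0]]
    for d in codes[1:]:
        out.append(out[-1] + d)
    return out


row = [0] * 10 + [255] * 10 + [0] * 12  # hypothetical line-drawing scanline
codes = differential_encode(row)
print(empirical_entropy(row), empirical_entropy(codes))  # the second value is smaller
```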
Submitted 27 October, 2017;
originally announced October 2017.