-
Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding
Authors:
Haoran Zhou,
Xingchen Song,
Brendan Fahy,
Qiaochu Song,
Binbin Zhang,
Zhendong Peng,
Anshul Wadhawan,
Denglin Jiang,
Apurv Verma,
Vinay Ramesh,
Srivas Prasad,
Michele M. Franceschini
Abstract:
OpenAI Whisper is a family of robust Automatic Speech Recognition (ASR) models trained on 680,000 hours of audio. However, its encoder-decoder architecture, trained with a sequence-to-sequence objective, lacks native support for streaming ASR. In this paper, we fine-tune Whisper for streaming ASR using the WeNet toolkit by adopting a Unified Two-pass (U2) structure. We introduce an additional Conn…
▽ More
OpenAI Whisper is a family of robust Automatic Speech Recognition (ASR) models trained on 680,000 hours of audio. However, its encoder-decoder architecture, trained with a sequence-to-sequence objective, lacks native support for streaming ASR. In this paper, we fine-tune Whisper for streaming ASR using the WeNet toolkit by adopting a Unified Two-pass (U2) structure. We introduce an additional Connectionist Temporal Classification (CTC) decoder trained with causal attention masks to generate streaming partial transcripts, while the original Whisper decoder reranks these partial outputs. Our experiments on LibriSpeech and an earnings call dataset demonstrate that, with adequate fine-tuning data, Whisper can be adapted into a capable streaming ASR model. We also introduce a hybrid tokenizer approach, which uses a smaller token space for the CTC decoder while retaining Whisper's original token space for the attention decoder, resulting in improved data efficiency and generalization.
△ Less
Submitted 13 June, 2025;
originally announced June 2025.
-
Olfactory Inertial Odometry: Sensor Calibration and Drift Compensation
Authors:
Kordel K. France,
Ovidiu Daescu,
Anirban Paul,
Shalini Prasad
Abstract:
Visual inertial odometry (VIO) is a process for fusing visual and kinematic data to understand a machine's state in a navigation task. Olfactory inertial odometry (OIO) is an analog to VIO that fuses signals from gas sensors with inertial data to help a robot navigate by scent. Gas dynamics and environmental factors introduce disturbances into olfactory navigation tasks that can make OIO difficult…
▽ More
Visual inertial odometry (VIO) is a process for fusing visual and kinematic data to understand a machine's state in a navigation task. Olfactory inertial odometry (OIO) is an analog to VIO that fuses signals from gas sensors with inertial data to help a robot navigate by scent. Gas dynamics and environmental factors introduce disturbances into olfactory navigation tasks that can make OIO difficult to facilitate. With our work here, we define a process for calibrating a robot for OIO that generalizes to several olfaction sensor types. Our focus is specifically on calibrating OIO for centimeter-level accuracy in localizing an odor source on a slow-moving robot platform to demonstrate use cases in robotic surgery and touchless security screening. We demonstrate our process for OIO calibration on a real robotic arm and show how this calibration improves performance over a cold-start olfactory navigation task.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
Study on Downlink CSI compression: Are Neural Networks the Only Solution?
Authors:
K. Sai Praneeth,
Anil Kumar Yerrapragada,
Achyuth Sagireddi,
Sai Prasad,
Radha Krishna Ganti
Abstract:
Massive Multi Input Multi Output (MIMO) systems enable higher data rates in the downlink (DL) with spatial multiplexing achieved by forming narrow beams. The higher DL data rates are achieved by effective implementation of spatial multiplexing and beamforming which is subject to availability of DL channel state information (CSI) at the base station. For Frequency Division Duplexing (FDD) systems,…
▽ More
Massive Multi Input Multi Output (MIMO) systems enable higher data rates in the downlink (DL) with spatial multiplexing achieved by forming narrow beams. The higher DL data rates are achieved by effective implementation of spatial multiplexing and beamforming which is subject to availability of DL channel state information (CSI) at the base station. For Frequency Division Duplexing (FDD) systems, the DL CSI has to be transmitted by User Equipment (UE) to the gNB and it constitutes a significant overhead which scales with the number of transmitter antennas and the granularity of the CSI. To address the overhead issue, AI/ML methods using auto-encoders have been investigated, where an encoder neural network model at the UE compresses the CSI and a decoder neural network model at the gNB reconstructs it. However, the use of AI/ML methods has a number of challenges related to (1) model complexity, (2) model generalization across channel scenarios and (3) inter-vendor compatibility of the two sides of the model. In this work, we investigate a more traditional dimensionality reduction method that uses Principal Component Analysis (PCA) and therefore does not suffer from the above challenges. Simulation results show that PCA based CSI compression actually achieves comparable reconstruction performance to commonly used deep neural networks based models.
△ Less
Submitted 10 February, 2025;
originally announced February 2025.
-
A Novel Breast Ultrasound Image Augmentation Method Using Advanced Neural Style Transfer: An Efficient and Explainable Approach
Authors:
Lipismita Panigrahi,
Prianka Rani Saha,
Jurdana Masuma Iqrah,
Sushil Prasad
Abstract:
Clinical diagnosis of breast malignancy (BM) is a challenging problem in the recent era. In particular, Deep learning (DL) models have continued to offer important solutions for early BM diagnosis but their performance experiences overfitting due to the limited volume of breast ultrasound (BUS) image data. Further, large BUS datasets are difficult to manage due to privacy and legal concerns. Hence…
▽ More
Clinical diagnosis of breast malignancy (BM) is a challenging problem in the recent era. In particular, Deep learning (DL) models have continued to offer important solutions for early BM diagnosis but their performance experiences overfitting due to the limited volume of breast ultrasound (BUS) image data. Further, large BUS datasets are difficult to manage due to privacy and legal concerns. Hence, image augmentation is a necessary and challenging step to improve the performance of the DL models. However, the current DL-based augmentation models are inadequate and operate as a black box resulting lack of information and justifications about their suitability and efficacy. Additionally, pre and post-augmentation need high-performance computational resources and time to produce the augmented image and evaluate the model performance. Thus, this study aims to develop a novel efficient augmentation approach for BUS images with advanced neural style transfer (NST) and Explainable AI (XAI) harnessing GPU-based parallel infrastructure. We scale and distribute the training of the augmentation model across 8 GPUs using the Horovod framework on a DGX cluster, achieving a 5.09 speedup while maintaining the model's accuracy. The proposed model is evaluated on 800 (348 benign and 452 malignant) BUS images and its performance is analyzed with other progressive techniques, using different quantitative analyses. The result indicates that the proposed approach can successfully augment the BUS images with 92.47% accuracy.
△ Less
Submitted 31 October, 2024;
originally announced November 2024.
-
On the Application of Deep Learning for Precise Indoor Positioning in 6G
Authors:
Sai Prasanth Kotturi,
Anil Kumar Yerrapragada,
Sai Prasad,
Radha Krishna Ganti
Abstract:
Accurate localization in indoor environments is a challenge due to the Non Line of Sight (NLoS) nature of the signaling. In this paper, we explore the use of AI/ML techniques for positioning accuracy enhancement in Indoor Factory (InF) scenarios. The proposed neural network, which we term LocNet, is trained on measurements such as Channel Impulse Response (CIR) and Reference Signal Received Power…
▽ More
Accurate localization in indoor environments is a challenge due to the Non Line of Sight (NLoS) nature of the signaling. In this paper, we explore the use of AI/ML techniques for positioning accuracy enhancement in Indoor Factory (InF) scenarios. The proposed neural network, which we term LocNet, is trained on measurements such as Channel Impulse Response (CIR) and Reference Signal Received Power (RSRP) from multiple Transmit Receive Points (TRPs). Simulation results show that when using measurements from 18 TRPs, LocNet achieves a 9 cm positioning accuracy at the 90th percentile. Additionally, we demonstrate that the same model generalizes effectively even when measurements from some TRPs randomly become unavailable. Lastly, we provide insights on the robustness of the trained model to the errors in ground truth labels used for training.
△ Less
Submitted 25 October, 2024;
originally announced October 2024.
-
Gaze-Vector Estimation in the Dark with Temporally Encoded Event-driven Neural Networks
Authors:
Abeer Banerjee,
Naval K. Mehta,
Shyam S. Prasad,
Himanshu,
Sumeet Saurav,
Sanjay Singh
Abstract:
In this paper, we address the intricate challenge of gaze vector prediction, a pivotal task with applications ranging from human-computer interaction to driver monitoring systems. Our innovative approach is designed for the demanding setting of extremely low-light conditions, leveraging a novel temporal event encoding scheme, and a dedicated neural network architecture. The temporal encoding metho…
▽ More
In this paper, we address the intricate challenge of gaze vector prediction, a pivotal task with applications ranging from human-computer interaction to driver monitoring systems. Our innovative approach is designed for the demanding setting of extremely low-light conditions, leveraging a novel temporal event encoding scheme, and a dedicated neural network architecture. The temporal encoding method seamlessly integrates Dynamic Vision Sensor (DVS) events with grayscale guide frames, generating consecutively encoded images for input into our neural network. This unique solution not only captures diverse gaze responses from participants within the active age group but also introduces a curated dataset tailored for low-light conditions. The encoded temporal frames paired with our network showcase impressive spatial localization and reliable gaze direction in their predictions. Achieving a remarkable 100-pixel accuracy of 100%, our research underscores the potency of our neural network to work with temporally consecutive encoded images for precise gaze vector predictions in challenging low-light videos, contributing to the advancement of gaze prediction technologies.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Toward Polar Sea-Ice Classification using Color-based Segmentation and Auto-labeling of Sentinel-2 Imagery to Train an Efficient Deep Learning Model
Authors:
Jurdana Masuma Iqrah,
Younghyun Koo,
Wei Wang,
Hongjie Xie,
Sushil Prasad
Abstract:
Global warming is an urgent issue that is generating catastrophic environmental changes, such as the melting of sea ice and glaciers, particularly in the polar regions. The melting pattern and retreat of polar sea ice cover is an essential indicator of global warming. The Sentinel-2 satellite (S2) captures high-resolution optical imagery over the polar regions. This research aims at developing a r…
▽ More
Global warming is an urgent issue that is generating catastrophic environmental changes, such as the melting of sea ice and glaciers, particularly in the polar regions. The melting pattern and retreat of polar sea ice cover is an essential indicator of global warming. The Sentinel-2 satellite (S2) captures high-resolution optical imagery over the polar regions. This research aims at developing a robust and effective system for classifying polar sea ice as thick or snow-covered, young or thin, or open water using S2 images. A key challenge is the lack of labeled S2 training data to serve as the ground truth. We demonstrate a method with high precision to segment and automatically label the S2 images based on suitably determined color thresholds and employ these auto-labeled data to train a U-Net machine model (a fully convolutional neural network), yielding good classification accuracy. Evaluation results over S2 data from the polar summer season in the Ross Sea region of the Antarctic show that the U-Net model trained on auto-labeled data has an accuracy of 90.18% over the original S2 images, whereas the U-Net model trained on manually labeled data has an accuracy of 91.39%. Filtering out the thin clouds and shadows from the S2 images further improves U-Net's accuracy, respectively, to 98.97% for auto-labeled and 98.40% for manually labeled training datasets.
△ Less
Submitted 8 March, 2023;
originally announced March 2023.
-
Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition
Authors:
Ashish Seth,
Lodagala V S V Durga Prasad,
Sreyan Ghosh,
S. Umesh
Abstract:
Self-supervised learning (SSL) to learn high-level speech representations has been a popular approach to building Automatic Speech Recognition (ASR) systems in low-resource settings. However, the common assumption made in literature is that a considerable amount of unlabeled data is available for the same domain or language that can be leveraged for SSL pre-training, which we acknowledge is not fe…
▽ More
Self-supervised learning (SSL) to learn high-level speech representations has been a popular approach to building Automatic Speech Recognition (ASR) systems in low-resource settings. However, the common assumption made in literature is that a considerable amount of unlabeled data is available for the same domain or language that can be leveraged for SSL pre-training, which we acknowledge is not feasible in a real-world setting. In this paper, as part of the Interspeech Gram Vaani ASR challenge, we try to study the effect of domain, language, dataset size, and other aspects of our upstream pre-training SSL data on the final performance low-resource downstream ASR task. We also build on the continued pre-training paradigm to study the effect of prior knowledge possessed by models trained using SSL. Extensive experiments and studies reveal that the performance of ASR systems is susceptible to the data used for SSL pre-training. Their performance improves with an increase in similarity and volume of pre-training data. We believe our work will be helpful to the speech community in building better ASR systems in low-resource settings and steer research towards improving generalization in SSL-based pre-training for speech systems.
△ Less
Submitted 17 May, 2023; v1 submitted 31 March, 2022;
originally announced March 2022.
-
PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations
Authors:
Lodagala V S V Durga Prasad,
Sreyan Ghosh,
S. Umesh
Abstract:
While self-supervised speech representation learning (SSL) models serve a variety of downstream tasks, these models have been observed to overfit to the domain from which the unlabelled data originates. To alleviate this issue, we propose PADA (Pruning Assisted Domain Adaptation) and zero out redundant weights from models pre-trained on large amounts of out-of-domain (OOD) data. Intuitively, this…
▽ More
While self-supervised speech representation learning (SSL) models serve a variety of downstream tasks, these models have been observed to overfit to the domain from which the unlabelled data originates. To alleviate this issue, we propose PADA (Pruning Assisted Domain Adaptation) and zero out redundant weights from models pre-trained on large amounts of out-of-domain (OOD) data. Intuitively, this helps to make space for the target-domain ASR finetuning. The redundant weights can be identified through various pruning strategies which have been discussed in detail as a part of this work. Specifically, we investigate the effect of the recently discovered Task-Agnostic and Task-Aware pruning on PADA and propose a new pruning paradigm based on the latter, which we call Cross-Domain Task-Aware Pruning (CD-TAW). CD-TAW obtains the initial pruning mask from a well fine-tuned OOD model, which makes it starkly different from the rest of the pruning strategies discussed in the paper. Our proposed CD-TAW methodology achieves up to 20.6% relative WER improvement over our baseline when fine-tuned on a 2-hour subset of Switchboard data without language model (LM) decoding. Furthermore, we conduct a detailed analysis to highlight the key design choices of our proposed method.
△ Less
Submitted 13 May, 2023; v1 submitted 31 March, 2022;
originally announced March 2022.
-
Attacks as Defenses: Designing Robust Audio CAPTCHAs Using Attacks on Automatic Speech Recognition Systems
Authors:
Hadi Abdullah,
Aditya Karlekar,
Saurabh Prasad,
Muhammad Sajidur Rahman,
Logan Blue,
Luke A. Bauer,
Vincent Bindschaedler,
Patrick Traynor
Abstract:
Audio CAPTCHAs are supposed to provide a strong defense for online resources; however, advances in speech-to-text mechanisms have rendered these defenses ineffective. Audio CAPTCHAs cannot simply be abandoned, as they are specifically named by the W3C as important enablers of accessibility. Accordingly, demonstrably more robust audio CAPTCHAs are important to the future of a secure and accessible…
▽ More
Audio CAPTCHAs are supposed to provide a strong defense for online resources; however, advances in speech-to-text mechanisms have rendered these defenses ineffective. Audio CAPTCHAs cannot simply be abandoned, as they are specifically named by the W3C as important enablers of accessibility. Accordingly, demonstrably more robust audio CAPTCHAs are important to the future of a secure and accessible Web. We look to recent literature on attacks on speech-to-text systems for inspiration for the construction of robust, principle-driven audio defenses. We begin by comparing 20 recent attack papers, classifying and measuring their suitability to serve as the basis of new "robust to transcription" but "easy for humans to understand" CAPTCHAs. After showing that none of these attacks alone are sufficient, we propose a new mechanism that is both comparatively intelligible (evaluated through a user study) and hard to automatically transcribe (i.e., $P({\rm transcription}) = 4 \times 10^{-5}$). Finally, we demonstrate that our audio samples have a high probability of being detected as CAPTCHAs when given to speech-to-text systems ($P({\rm evasion}) = 1.77 \times 10^{-4}$). In so doing, we not only demonstrate a CAPTCHA that is approximately four orders of magnitude more difficult to crack, but that such systems can be designed based on the insights gained from attack papers using the differences between the ways that humans and computers process audio.
△ Less
Submitted 10 March, 2022;
originally announced March 2022.
-
Design and experimental investigation of a vibro-impact self-propelled capsule robot with orientation control
Authors:
Jiajia Zhang,
Jiyuan Tian,
Dibin Zhu,
Yang Liu,
Shyam Prasad
Abstract:
This paper presents a novel design and experimental investigation for a self-propelled capsule robot that can be used for painless colonoscopy during a retrograde progression from the patient's rectum. The steerable robot is driven forward and backward via its internal vibration and impact with orientation control by using an electromagnetic actuator. The actuator contains four sets of coils and a…
▽ More
This paper presents a novel design and experimental investigation for a self-propelled capsule robot that can be used for painless colonoscopy during a retrograde progression from the patient's rectum. The steerable robot is driven forward and backward via its internal vibration and impact with orientation control by using an electromagnetic actuator. The actuator contains four sets of coils and a shaft made by permanent magnet. The shaft can be excited linearly in a controllable and tilted angle, so guide the progression orientation of the robot. Two control strategies are studied in this work and compared via simulation and experiment. Extensive results are presented to demonstrate the progression efficiency of the robot and its potential for robotic colonoscopy.
△ Less
Submitted 1 March, 2022; v1 submitted 23 February, 2022;
originally announced February 2022.
-
Mean Square Performance of a family of Adaptive Algorithms for colored noise
Authors:
R Sankara Prasad
Abstract:
In real-time applications the characteristics and properties of a signal vary inconsistently. So, to maintain the integrity of such signals there is a need for effective adaptive filters. The conventional Least Mean Squared(LMS) algorithm is widely used because of its computational simplicity and ease of implementation. But, its convergence speed rapidly reduces when colored noise is present in th…
▽ More
In real-time applications the characteristics and properties of a signal vary inconsistently. So, to maintain the integrity of such signals there is a need for effective adaptive filters. The conventional Least Mean Squared(LMS) algorithm is widely used because of its computational simplicity and ease of implementation. But, its convergence speed rapidly reduces when colored noise is present in the signal. Affine projection(AP) algorithms are used to speed up the convergence but have high computational costs. In this paper, the mean square performance of LMS and AP algorithms is analyzed when subject to white noise and colored noise.
△ Less
Submitted 21 November, 2021;
originally announced November 2021.
-
Analysis of bio-electro-chemical signals from passive sweat-based wearable electro-impedance spectroscopy (EIS) towards assessing blood glucose modulations
Authors:
Devangsingh Sankhala,
Madhavi Pali,
Kai-Chun Lin,
Badrinath Jagannath,
Sriram Muthukumar,
Shalini Prasad
Abstract:
There has been a recent tremendous interest in label-free detection of biomarkers which is a critical enabler of point-of-need diagnostics. A low-power, small form factor, multiplexed wearable system is proposed for continuous detection of glucose in passively expressed sweat using electrochemical impedance spectroscopy (EIS) measurement. The wearable EIS system consists of a sensing analog front…
▽ More
There has been a recent tremendous interest in label-free detection of biomarkers which is a critical enabler of point-of-need diagnostics. A low-power, small form factor, multiplexed wearable system is proposed for continuous detection of glucose in passively expressed sweat using electrochemical impedance spectroscopy (EIS) measurement. The wearable EIS system consists of a sensing analog front end integrated with low-volume (1-5 $μ$L) ultra-sensitive flexible biosensors. A passive sweat sensor was designed to integrate a glucose oxidase electrochemical system on active semiconducting material. The non-faradaic EIS response of the biosensor was used to calibrate the analog front end response using ratiometric Discrete Fourier Transform (DFT) for a shorter measurement time. In this work, a stringent assessment of a continuous glucose sensing platform is performed in a bottom-up approach, going from the biosensor to the system to the interaction with a human subject. The active semiconductor-based biosensors are dosed with glucose concentrations ranging from 5-200 mg/dL and detection is performed using the analog front end. In addition, a detailed analysis of battery life and performance of a wearable EIS system is discussed to define a figure of merit for an optimally integrated design. Moreover, a continuous glucose detection test is performed on a healthy human subject cohort to investigate the stability of the sensor-system mechanism for an 8-hour period, and a time-series-based, auto-regressive (AR) model was created for the system.
△ Less
Submitted 5 April, 2021;
originally announced April 2021.
-
Advances in Deep Learning for Hyperspectral Image Analysis--Addressing Challenges Arising in Practical Imaging Scenarios
Authors:
Xiong Zhou,
Saurabh Prasad
Abstract:
Deep neural networks have proven to be very effective for computer vision tasks, such as image classification, object detection, and semantic segmentation -- these are primarily applied to color imagery and video. In recent years, there has been an emergence of deep learning algorithms being applied to hyperspectral and multispectral imagery for remote sensing and biomedicine tasks. These multi-ch…
▽ More
Deep neural networks have proven to be very effective for computer vision tasks, such as image classification, object detection, and semantic segmentation -- these are primarily applied to color imagery and video. In recent years, there has been an emergence of deep learning algorithms being applied to hyperspectral and multispectral imagery for remote sensing and biomedicine tasks. These multi-channel images come with their own unique set of challenges that must be addressed for effective image analysis. Challenges include limited ground truth (annotation is expensive and extensive labeling is often not feasible), and high dimensional nature of the data (each pixel is represented by hundreds of spectral bands), despite being presented by a large amount of unlabeled data and the potential to leverage multiple sensors/sources that observe the same scene. In this chapter, we will review recent advances in the community that leverage deep learning for robust hyperspectral image analysis despite these unique challenges -- specifically, we will review unsupervised, semi-supervised and active learning approaches to image analysis, as well as transfer learning approaches for multi-source (e.g. multi-sensor, or multi-temporal) image analysis.
△ Less
Submitted 16 July, 2020;
originally announced July 2020.
-
Point Spread Function Engineering for 3D Imaging of Space Debris using a Continuous Exact l0 Penalty (CEL0) Based Algorithm
Authors:
Chao Wang,
Raymond H. Chan,
Robert J. Plemmons,
Sudhakar Prasad
Abstract:
We consider three-dimensional (3D) localization and imaging of space debris from only one two-dimensional (2D) snapshot image. The technique involves an optical imager that exploits off-center image rotation to encode both the lateral and depth coordinates of point sources, with the latter being encoded in the angle of rotation of the PSF. We formulate 3D localization into a large-scale sparse 3D…
▽ More
We consider three-dimensional (3D) localization and imaging of space debris from only one two-dimensional (2D) snapshot image. The technique involves an optical imager that exploits off-center image rotation to encode both the lateral and depth coordinates of point sources, with the latter being encoded in the angle of rotation of the PSF. We formulate 3D localization into a large-scale sparse 3D inverse problem in the discretized form. A recently developed penalty called continuous exact l0 (CEL0) is applied in this problem for the Gaussian noise model. Numerical experiments and comparisons illustrate the efficiency of the algorithm.
△ Less
Submitted 2 June, 2020;
originally announced June 2020.
-
Automated Diabetic Retinopathy Grading using Deep Convolutional Neural Network
Authors:
Saket S. Chaturvedi,
Kajol Gupta,
Vaishali Ninawe,
Prakash S. Prasad
Abstract:
Diabetic Retinopathy is a global health problem, influences 100 million individuals worldwide, and in the next few decades, these incidences are expected to reach epidemic proportions. Diabetic Retinopathy is a subtle eye disease that can cause sudden, irreversible vision loss. The early-stage Diabetic Retinopathy diagnosis can be challenging for human experts, considering the visual complexity of…
▽ More
Diabetic Retinopathy is a global health problem, influences 100 million individuals worldwide, and in the next few decades, these incidences are expected to reach epidemic proportions. Diabetic Retinopathy is a subtle eye disease that can cause sudden, irreversible vision loss. The early-stage Diabetic Retinopathy diagnosis can be challenging for human experts, considering the visual complexity of fundus photography retinal images. However, Early Stage detection of Diabetic Retinopathy can significantly alter the severe vision loss problem. The competence of computer-aided detection systems to accurately detect the Diabetic Retinopathy had popularized them among researchers. In this study, we have utilized a pre-trained DenseNet121 network with several modifications and trained on APTOS 2019 dataset. The proposed method outperformed other state-of-the-art networks in early-stage detection and achieved 96.51% accuracy in severity grading of Diabetic Retinopathy for multi-label classification and achieved 94.44% accuracy for single-class classification method. Moreover, the precision, recall, f1-score, and quadratic weighted kappa for our network was reported as 86%, 87%, 86%, and 91.96%, respectively. Our proposed architecture is simultaneously very simple, accurate, and efficient concerning computational time and space.
△ Less
Submitted 14 April, 2020;
originally announced April 2020.
-
Two Channel Audio Zooming System For Smartphone
Authors:
Anant Khandelwal,
E. B. Goud,
Y. Chand,
L. Kumar,
S. Prasad,
N. Agarwala,
R. Singh
Abstract:
In this paper, two microphone based systems for audio zooming is proposed for the first time. The audio zooming application allows sound capture and enhancement from the front direction while attenuating interfering sources from all other directions. The complete audio zooming system utilizes beamforming based target extraction. In particular, Minimum Power Distortionless Response (MPDR) beamforme…
▽ More
In this paper, two microphone based systems for audio zooming is proposed for the first time. The audio zooming application allows sound capture and enhancement from the front direction while attenuating interfering sources from all other directions. The complete audio zooming system utilizes beamforming based target extraction. In particular, Minimum Power Distortionless Response (MPDR) beamformer and Griffith Jim Beamformer (GJBF) are explored. This is followed by block thresholding for residual noise and interference suppression, and zooming effect creation. A number of simulation and real life experiments using Samsung smartphone (Samsung Galaxy A5) were conducted. Objective and subjective measures confirm the rich user experience.
△ Less
Submitted 13 January, 2020;
originally announced January 2020.
-
Skin Lesion Analyser: An Efficient Seven-Way Multi-Class Skin Cancer Classification Using MobileNet
Authors:
Saket S. Chaturvedi,
Kajol Gupta,
Prakash. S. Prasad
Abstract:
Skin cancer, a major form of cancer, is a critical public health problem with 123,000 newly diagnosed melanoma cases and between 2 and 3 million non-melanoma cases worldwide each year. The leading cause of skin cancer is high exposure of skin cells to UV radiation, which can damage the DNA inside skin cells leading to uncontrolled growth of skin cells. Skin cancer is primarily diagnosed visually e…
▽ More
Skin cancer, a major form of cancer, is a critical public health problem with 123,000 newly diagnosed melanoma cases and between 2 and 3 million non-melanoma cases worldwide each year. The leading cause of skin cancer is high exposure of skin cells to UV radiation, which can damage the DNA inside skin cells leading to uncontrolled growth of skin cells. Skin cancer is primarily diagnosed visually employing clinical screening, a biopsy, dermoscopic analysis, and histopathological examination. It has been demonstrated that the dermoscopic analysis in the hands of inexperienced dermatologists may cause a reduction in diagnostic accuracy. Early detection and screening of skin cancer have the potential to reduce mortality and morbidity. Previous studies have shown Deep Learning ability to perform better than human experts in several visual recognition tasks. In this paper, we propose an efficient seven-way automated multi-class skin cancer classification system having performance comparable with expert dermatologists. We used a pretrained MobileNet model to train over HAM10000 dataset using transfer learning. The model classifies skin lesion image with a categorical accuracy of 83.1 percent, top2 accuracy of 91.36 percent and top3 accuracy of 95.34 percent. The weighted average of precision, recall, and f1-score were found to be 0.89, 0.83, and 0.83 respectively. The model has been deployed as a web application for public use at (https://saketchaturvedi.github.io). This fast, expansible method holds the potential for substantial clinical impact, including broadening the scope of primary care practice and augmenting clinical decision-making for dermatology specialists.
△ Less
Submitted 27 May, 2020; v1 submitted 7 July, 2019;
originally announced July 2019.
-
Explainable Anatomical Shape Analysis through Deep Hierarchical Generative Models
Authors:
Carlo Biffi,
Juan J. Cerrolaza,
Giacomo Tarroni,
Wenjia Bai,
Antonio de Marvao,
Ozan Oktay,
Christian Ledig,
Loic Le Folgoc,
Konstantinos Kamnitsas,
Georgia Doumou,
Jinming Duan,
Sanjay K. Prasad,
Stuart A. Cook,
Declan P. O'Regan,
Daniel Rueckert
Abstract:
Quantification of anatomical shape changes currently relies on scalar global indexes which are largely insensitive to regional or asymmetric modifications. Accurate assessment of pathology-driven anatomical remodeling is a crucial step for the diagnosis and treatment of many conditions. Deep learning approaches have recently achieved wide success in the analysis of medical images, but they lack in…
▽ More
Quantification of anatomical shape changes currently relies on scalar global indexes which are largely insensitive to regional or asymmetric modifications. Accurate assessment of pathology-driven anatomical remodeling is a crucial step for the diagnosis and treatment of many conditions. Deep learning approaches have recently achieved wide success in the analysis of medical images, but they lack interpretability in the feature extraction and decision processes. In this work, we propose a new interpretable deep learning model for shape analysis. In particular, we exploit deep generative networks to model a population of anatomical segmentations through a hierarchy of conditional latent variables. At the highest level of this hierarchy, a two-dimensional latent space is simultaneously optimised to discriminate distinct clinical conditions, enabling the direct visualisation of the classification space. Moreover, the anatomical variability encoded by this discriminative latent space can be visualised in the segmentation space thanks to the generative properties of the model, making the classification task transparent. This approach yielded high accuracy in the categorisation of healthy and remodelled left ventricles when tested on unseen segmentations from our own multi-centre dataset as well as in an external validation set, and on hippocampi from healthy controls and patients with Alzheimer's disease when tested on ADNI data. More importantly, it enabled the visualisation in three-dimensions of both global and regional anatomical features which better discriminate between the conditions under exam. The proposed approach scales effectively to large populations, facilitating high-throughput analysis of normal anatomy and pathology in large-scale studies of volumetric imaging.
△ Less
Submitted 4 January, 2020; v1 submitted 28 June, 2019;
originally announced July 2019.
-
Joint 3D Localization and Classification of Space Debris using a Multispectral Rotating Point Spread Function
Authors:
Chao Wang,
Grey Ballard,
Robert Plemmons,
Sudhakar Prasad
Abstract:
We consider the problem of joint three-dimensional (3D) localization and material classification of unresolved space debris using a multispectral rotating point spread function (RPSF). The use of RPSF allows one to estimate the 3D locations of point sources from their rotated images acquired by a single 2D sensor array, since the amount of rotation of each source image about its x, y location depe…
▽ More
We consider the problem of joint three-dimensional (3D) localization and material classification of unresolved space debris using a multispectral rotating point spread function (RPSF). The use of RPSF allows one to estimate the 3D locations of point sources from their rotated images acquired by a single 2D sensor array, since the amount of rotation of each source image about its x, y location depends on its axial distance z. Using multi-spectral images, with one RPSF per spectral band, we are able not only to localize the 3D positions of the space debris but also classify their material composition. We propose a three-stage method for achieving joint localization and classification. In Stage 1, we adopt an optimization scheme for localization in which the spectral signature of each material is assumed to be uniform, which significantly improves efficiency and yields better localization results than possible with a single spectral band. In Stage 2, we estimate the spectral signature and refine the localization result via an alternating approach. We process classification in the final stage. Both Poisson noise and Gaussian noise models are considered, and the implementation of each is discussed. Numerical tests using multispectral data from NASA show the efficiency of our three-stage approach and illustrate the improvement of point source localization and spectral classification from using multiple bands over a single band.
△ Less
Submitted 11 June, 2019;
originally announced June 2019.
-
Direction of Arrival Estimation for Nanoscale Sensor Networks
Authors:
Shree M. Prasad,
Trilochan Panigrahi,
Mahbub Hassan
Abstract:
Nanoscale wireless sensor networks (NWSNs) could be within reach soon using graphene-based antennas, which resonate in 0.1-10 terahertz band. To conserve the limited energy available at nanoscale, it is expected that NWSNs will communicate using extremely short pulses on the order of femtoseconds. Accurate estimation of direction of arrival (DOA) for such terahertz pulses will help realize many us…
▽ More
Nanoscale wireless sensor networks (NWSNs) could be within reach soon using graphene-based antennas, which resonate in 0.1-10 terahertz band. To conserve the limited energy available at nanoscale, it is expected that NWSNs will communicate using extremely short pulses on the order of femtoseconds. Accurate estimation of direction of arrival (DOA) for such terahertz pulses will help realize many useful applications for NWSNs. In this paper, using the well-known MUltiple SIgnal Classification (MUSIC) algorithm, we study DOA estimation for NWSNs for different energy levels, distances, pulse shapes, and frequencies. Our analyses reveal that the best DOA estimation is achieved with the first order Gaussian pulses, which emit their peak energy at 6 THz. Based on Monte Carlo simulations, we demonstrate that MUSIC algorithm is capable of estimating DOA with root mean square error less than one degree from a distance of around 6 meter for pulse energy as little as 1 atto Joule.
△ Less
Submitted 12 July, 2018;
originally announced July 2018.
-
Non-convex optimization for 3D point source localization using a rotating point spread function
Authors:
Chao Wang,
Raymond Chan,
Mila Nikolova,
Robert Plemmons,
Sudhakar Prasad
Abstract:
We consider the high-resolution imaging problem of 3D point source image recovery from 2D data using a method based on point spread function (PSF) engineering. The method involves a new technique, recently proposed by S.~Prasad, based on the use of a rotating PSF with a single lobe to obtain depth from defocus. The amount of rotation of the PSF encodes the depth position of the point source. Appli…
▽ More
We consider the high-resolution imaging problem of 3D point source image recovery from 2D data using a method based on point spread function (PSF) engineering. The method involves a new technique, recently proposed by S.~Prasad, based on the use of a rotating PSF with a single lobe to obtain depth from defocus. The amount of rotation of the PSF encodes the depth position of the point source. Applications include high-resolution single molecule localization microscopy as well as the problem addressed in this paper on localization of space debris using a space-based telescope. The localization problem is discretized on a cubical lattice where the coordinates of nonzero entries represent the 3D locations and the values of these entries the fluxes of the point sources. Finding the locations and fluxes of the point sources is a large-scale sparse 3D inverse problem. A new nonconvex regularization method with a data-fitting term based on Kullback-Leibler (KL) divergence is proposed for 3D localization for the Poisson noise model. In addition, we propose a new scheme of estimation of the source fluxes from the KL data-fitting term. Numerical experiments illustrate the efficiency and stability of the algorithms that are trained on a random subset of image data before being applied to other images. Our 3D localization algorithms can be readily applied to other kinds of depth-encoding PSFs as well.
△ Less
Submitted 27 September, 2018; v1 submitted 10 April, 2018;
originally announced April 2018.
-
Optimization of LMS Algorithm for System Identification
Authors:
Saurabh R. Prasad,
Bhalchandra B. Godbole
Abstract:
An adaptive filter is defined as a digital filter that has the capability of self adjusting its transfer function under the control of some optimizing algorithms. Most common optimizing algorithms are Least Mean Square (LMS) and Recursive Least Square (RLS). Although RLS algorithm perform superior to LMS algorithm, it has very high computational complexity so not useful in most of the practical sc…
▽ More
An adaptive filter is defined as a digital filter that has the capability of self adjusting its transfer function under the control of some optimizing algorithms. Most common optimizing algorithms are Least Mean Square (LMS) and Recursive Least Square (RLS). Although RLS algorithm perform superior to LMS algorithm, it has very high computational complexity so not useful in most of the practical scenario. So most feasible choice of the adaptive filtering algorithm is the LMS algorithm including its various variants. The LMS algorithm uses transversal FIR filter as underlying digital filter. This paper is based on implementation and optimization of LMS algorithm for the application of unknown system identification. Keywords- Adaptive Filtering, LMS Algorithm, Optimization, System Identification, MATLAB
△ Less
Submitted 3 June, 2017;
originally announced June 2017.