Search | arXiv e-print repository

Direct Image Classification from Fourier Ptychographic Microscopy Measurements without Reconstruction

Authors: Navya Sonal Agarwal, Jan Philipp Schneider, Kanchana Vaishnavi Gandikota, Syed Muhammad Kazim, John Meshreki, Ivo Ihrke, Michael Moeller

Abstract: The computational imaging technique of Fourier Ptychographic Microscopy (FPM) enables high-resolution imaging with a wide field of view and can serve as an extremely valuable tool, e.g., in the classification of cells in medical applications. However, reconstructing a high-resolution image from tens or even hundreds of measurements is computationally expensive, particularly for a wide field of view. Therefore, in this paper, we investigate classifying the image content directly from the FPM measurements, without first performing a reconstruction step. We show that Convolutional Neural Networks (CNNs) can extract meaningful information from measurement sequences, significantly outperforming classification on a single band-limited image (by up to 12%) while being significantly more efficient than reconstructing a high-resolution image. Furthermore, we demonstrate that a learned multiplexing of several raw measurements maintains the classification accuracy while significantly reducing the amount of data (and consequently the acquisition time).

Submitted 8 May, 2025; originally announced May 2025.

Comments: ISCS 2025

arXiv:2412.16137 [pdf, ps, other]

Camera-Based Localization and Enhanced Normalized Mutual Information

Authors: Vishnu Teja Kunde, Jean-Francois Chamberland, Siddharth Agarwal

Abstract: Robust and fine localization algorithms are crucial for autonomous driving. For the production of such vehicles as a commodity, affordable sensing solutions and reliable localization algorithms must be designed. This work considers scenarios where the sensor data comes from images captured by an inexpensive camera mounted on the vehicle and where the vehicle contains a fine global map. Such localization algorithms typically involve finding the section in the global map that best matches the captured image. In harsh environments, both the global map and the captured image can be noisy. Because of physical constraints on camera placement, the image captured by the camera can be viewed as a noisy, perspective-transformed version of the road in the global map. Thus, an optimal algorithm should take into account the unequal noise power in various regions of the captured image, and the intrinsic uncertainty in the global map due to environmental variations. This article briefly reviews two matching methods: (i) standard inner product (SIP) and (ii) normalized mutual information (NMI). It then proposes novel and principled modifications that significantly improve the performance of these algorithms in noisy environments. These enhancements are inspired by the physical constraints associated with autonomous vehicles. They are grounded in statistical signal processing and, in some contexts, are provably better. Numerical simulations demonstrate the effectiveness of these modifications.

Submitted 20 December, 2024; originally announced December 2024.
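The NMI matching baseline mentioned in this abstract can be illustrated with a minimal sketch, assuming grayscale patches given as flat lists of 0 to 255 intensities, 8 histogram bins, and the (H(A) + H(B)) / H(A, B) normalization (one of several NMI definitions in use). The names `normalized_mutual_information` and `best_match` are hypothetical, and none of the paper's proposed noise-aware modifications are included.

```python
import math
from collections import Counter


def quantize(pixels, levels=8, max_value=255):
    """Map intensities in [0, max_value] to integer bins 0..levels-1."""
    return [min(levels - 1, int(p * levels / (max_value + 1))) for p in pixels]


def entropy(counts, total):
    """Shannon entropy (in nats) of a histogram stored in a Counter."""
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def normalized_mutual_information(a, b, levels=8):
    """NMI = (H(A) + H(B)) / H(A, B): 2 for a perfect match, 1 for independence."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    qa, qb = quantize(a, levels), quantize(b, levels)
    n = len(qa)
    h_joint = entropy(Counter(zip(qa, qb)), n)
    if h_joint == 0.0:
        return 2.0  # both patches constant: treat as a perfect match
    return (entropy(Counter(qa), n) + entropy(Counter(qb), n)) / h_joint


def best_match(candidates, captured):
    """Index of the map section with the highest NMI against the captured image."""
    return max(range(len(candidates)),
               key=lambda i: normalized_mutual_information(candidates[i], captured))
```

Unlike a plain inner product, NMI rewards any consistent intensity relationship between the two patches, so an intensity-inverted copy of a patch can still score as a perfect match.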

arXiv:2411.07444 [pdf, other]

Input-Based Ensemble-Learning Method for Dynamic Memory Configuration of Serverless Computing Functions

Authors: Siddharth Agarwal, Maria A. Rodriguez, Rajkumar Buyya

Abstract: In today's Function-as-a-Service offerings, a programmer is usually responsible for configuring function memory for its successful execution, which allocates proportional function resources such as CPU and network. However, right-sizing the function memory forces developers to speculate about performance and make ad-hoc configuration decisions. Recent research has highlighted that a function's input characteristics, such as input size, type and number of inputs, significantly impact its resource demand, run-time performance and costs under fluctuating workloads. This correlation makes memory configuration a non-trivial task. Accordingly, an input-aware function memory allocator not only improves developer productivity by completely hiding resource-related decisions but also creates an opportunity to reduce resource wastage and offer a finer-grained, cost-optimised pricing scheme. Therefore, we present MemFigLess, a serverless solution that estimates the memory requirement of a serverless function with input-awareness. The framework profiles functions in an offline stage and trains a multi-output Random Forest Regression model on the collected metrics to invoke input-aware optimal configurations. We evaluate our work against state-of-the-art approaches on the AWS Lambda service and find that MemFigLess is able to capture the input-aware resource relationships, allocating up to 82% fewer resources and saving up to 87% of run-time costs.

Submitted 11 November, 2024; originally announced November 2024.

Comments: 10 pages, 2 tables, 28 figures, accepted conference paper - UCC'24

Journal ref: 17th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2024)

arXiv:2410.21276 [pdf, other]

GPT-4o System Card

Authors: OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, et al. (395 additional authors not shown)

Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is notably better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.

Submitted 25 October, 2024; originally announced October 2024.

arXiv:2410.12652 [pdf, other]

Constrained Posterior Sampling: Time Series Generation with Hard Constraints

Authors: Sai Shankar Narasimhan, Shubhankar Agarwal, Litu Rout, Sanjay Shakkottai, Sandeep P. Chinchali

Abstract: Generating realistic time series samples is crucial for stress-testing models and protecting user privacy by using synthetic data. In engineering and safety-critical applications, these samples must meet certain hard constraints that are domain-specific or naturally imposed by physics or nature. Consider, for example, generating electricity demand patterns with constraints on peak demand times. This can be used to stress-test the functioning of power grids during adverse weather conditions. Existing approaches for generating constrained time series are either not scalable or degrade sample quality. To address these challenges, we introduce Constrained Posterior Sampling (CPS), a diffusion-based sampling algorithm that aims to project the posterior mean estimate into the constraint set after each denoising update. Notably, CPS scales to a large number of constraints (~100) without requiring additional training. We provide theoretical justifications highlighting the impact of our projection step on sampling. Empirically, CPS outperforms state-of-the-art methods in sample quality and similarity to real time series by around 10% and 42%, respectively, on real-world stocks, traffic, and air quality datasets.

Submitted 16 October, 2024; originally announced October 2024.
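The CPS abstract describes projecting the posterior mean estimate onto the constraint set after each denoising update. The sketch below illustrates only that idea, assuming constraints simple enough to project in closed form (global bounds plus pinned values at chosen time steps), a hypothetical stand-in `denoiser` in place of a trained diffusion model, and a simplified linear noise schedule. It is not the authors' algorithm or sampler.

```python
import random


def project(x, lower, upper, fixed):
    """Euclidean projection onto {lower <= x_i <= upper} with x_t = v for (t, v) in fixed.

    Every constraint acts on a single coordinate, so the feasible set is a box
    and the projection is coordinate-wise clipping plus overwriting pinned values.
    """
    y = [min(max(v, lower), upper) for v in x]
    for t, v in fixed.items():
        if not lower <= v <= upper:
            raise ValueError(f"pinned value {v} at t={t} violates the bounds")
        y[t] = v
    return y


def constrained_sample(length, denoiser, lower, upper, fixed, num_steps=50, seed=0):
    """Toy reverse loop: estimate the clean series, project it, then re-noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for step in range(num_steps, 0, -1):
        # Hypothetical denoiser returns an estimate of the clean series.
        x0_hat = project(denoiser(x, step), lower, upper, fixed)
        sigma = (step - 1) / num_steps  # hypothetical linear noise schedule
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in x0_hat]
    return x  # the last step uses sigma = 0, so x is the projected estimate
```

Because the projection runs inside the loop rather than once at the end, later denoising steps start from a feasible estimate; for constraints that are not coordinate-wise, the closed-form clip would have to be replaced by a general projection.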

arXiv:2409.19829 [pdf, other]

Generalizability of Graph Neural Networks for Decentralized Unlabeled Motion Planning

Authors: Shreyas Muthusamy, Damian Owerko, Charilaos I. Kanatsoulis, Saurav Agarwal, Alejandro Ribeiro

Abstract: Unlabeled motion planning involves assigning a set of robots to target locations while ensuring collision avoidance, aiming to minimize the total distance traveled. The problem forms an essential building block for multi-robot systems in applications such as exploration, surveillance, and transportation. We address this problem in a decentralized setting where each robot knows only the positions of its k-nearest robots and k-nearest targets. This scenario combines elements of combinatorial assignment and continuous-space motion planning, posing significant scalability challenges for traditional centralized approaches. To overcome these challenges, we propose a decentralized policy learned via a Graph Neural Network (GNN). The GNN enables robots to determine (1) what information to communicate to neighbors and (2) how to integrate received information with local observations for decision-making. We train the GNN using imitation learning with the centralized Hungarian algorithm as the expert policy, and further fine-tune it using reinforcement learning to avoid collisions and enhance performance. Extensive empirical evaluations demonstrate the scalability and effectiveness of our approach. The GNN policy trained on 100 robots generalizes to scenarios with up to 500 robots, outperforming state-of-the-art solutions by 8.6% on average and significantly surpassing greedy decentralized methods. This work lays the foundation for solving multi-robot coordination problems in settings where scalability is important.

Submitted 29 September, 2024; originally announced September 2024.

Comments: 6 pages, 6 figures, submitted to ICRA 2025
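The two ingredients named in this abstract, the k-nearest local observation and the centralized minimum-distance assignment used as the expert, can be sketched as follows, assuming 2D positions as tuples. Both function names are hypothetical. The assignment uses brute force over permutations, which solves the same problem as the Hungarian algorithm but only scales to a handful of robots; the GNN policy itself is not shown.

```python
import itertools
import math


def local_observation(i, robots, targets, k):
    """Relative positions of robot i's k nearest robots and k nearest targets."""
    me = robots[i]

    def nearest(points):
        ranked = sorted(points, key=lambda p: math.dist(me, p))[:k]
        # Relative coordinates make the observation invariant to translation.
        return [(p[0] - me[0], p[1] - me[1]) for p in ranked]

    others = [r for j, r in enumerate(robots) if j != i]
    return nearest(others), nearest(targets)


def optimal_assignment(robots, targets):
    """Minimum-total-distance assignment by brute force (tiny instances only).

    Entry r of the result is the index of the target assigned to robot r.
    """
    best = min(itertools.permutations(range(len(targets)), len(robots)),
               key=lambda perm: sum(math.dist(r, targets[j]) for r, j in zip(robots, perm)))
    return list(best)
```

For example, with robots at (0, 0), (1, 0), (5, 0) and targets at (0, 2), (10, 0), (0, -1), the optimal assignment is `[2, 0, 1]` (total distance about 8.24), which a decentralized policy seeing only k = 1 neighbor would have to approximate from local information.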

arXiv:2409.19071 [pdf, other]

Analog fast Fourier transforms for scalable and efficient signal processing

Authors: T. Patrick Xiao, Ben Feinberg, David K. Richardson, Matthew Cannon, Harsha Medu, Vineet Agrawal, Matthew J. Marinella, Sapan Agarwal, Christopher H. Bennett

Abstract: Edge devices are being deployed at increasing volumes to sense and act on information from the physical world. The discrete Fourier transform (DFT) is often necessary to make this sensed data suitable for further processing, such as by artificial intelligence (AI) algorithms, and for transmission over communication networks. Analog in-memory computing has been shown to be a fast and energy-efficient solution for processing edge AI workloads, but not for Fourier transforms. This is because of the existence of the fast Fourier transform (FFT) algorithm, which enormously reduces the complexity of the DFT but has so far belonged only to digital processors. Here, we show that the FFT can be mapped to analog in-memory computing systems, enabling them to efficiently scale to arbitrarily large Fourier transforms without requiring large sizes or large numbers of non-volatile memory arrays. We experimentally demonstrate analog FFTs on 1D audio and 2D image signals, using a large-scale charge-trapping memory array with precisely tunable, low-conductance analog states. The scalability of both the new analog FFT approach and the charge-trapping memory device is leveraged to compute a 65,536-point analog DFT, a scale that is otherwise inaccessible by analog systems and which is >1000× larger than any previous analog DFT demonstration. The analog FFT also provides more numerically precise DFTs with greater tolerance to device and circuit non-idealities than a direct matrix-vector multiplication approach.
We show that the extension of the FFT algorithm to analog in-memory processors leads to design considerations that differ markedly from digital implementations, and that analog Fourier transforms have a substantial power efficiency advantage at all size scales over FFTs implemented on state-of-the-art digital hardware.

Submitted 27 September, 2024; originally announced September 2024.
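The contrast this abstract draws between a direct matrix-vector DFT and the FFT can be seen in a digital reference sketch: a dense O(N^2) DFT next to a recursive radix-2 Cooley-Tukey FFT that reuses half-size transforms. This is a plain software illustration of the factorization, not the paper's mapping onto analog memory arrays; inputs are assumed to have power-of-two length.

```python
import cmath


def dft(x):
    """Direct DFT as a dense matrix-vector product: O(N^2) operations."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT: O(N log N) operations."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # Transform the even- and odd-indexed halves, then combine with twiddle factors.
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out
```

For the 65,536-point transform mentioned above, the direct product needs N^2 ≈ 4.3 × 10^9 complex multiplications, while the radix-2 FFT needs (N/2)·log2 N = 524,288 twiddle multiplications, which is why avoiding one large dense matrix matters for scalability.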

arXiv:2409.11311 [pdf, other]

Constrained Learning for Decentralized Multi-Objective Coverage Control

Authors: Juan Cervino, Saurav Agarwal, Vijay Kumar, Alejandro Ribeiro

Abstract: The multi-objective coverage control problem requires a robot swarm to collaboratively provide sensor coverage to multiple heterogeneous importance density fields (IDFs) simultaneously. We pose this as an optimization problem with constraints and study two different formulations: (1) Fair coverage, where we minimize the maximum coverage cost for any field, promoting equitable resource distribution among all fields; and (2) Constrained coverage, where each field must be covered below a certain cost threshold, ensuring that critical areas receive adequate coverage according to predefined importance levels. We study the decentralized setting where robots have limited communication and local sensing capabilities, making the system more realistic, scalable, and robust. Given the complexity, we propose a novel decentralized constrained learning approach that combines primal-dual optimization with a Learnable Perception-Action-Communication (LPAC) neural network architecture. We show that the Lagrangian of the dual problem can be reformulated as a linear combination of the IDFs, enabling the LPAC policy to serve as a primal solver. We empirically demonstrate that the proposed method (i) significantly outperforms state-of-the-art decentralized controllers by 30% on average in terms of coverage cost, (ii) transfers well to larger environments with more robots, and (iii) scales with the number of IDFs and robots in the swarm.

Submitted 13 March, 2025; v1 submitted 17 September, 2024; originally announced September 2024.
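Why the Lagrangian can be written as a linear combination of the IDFs can be checked numerically with a small sketch, assuming the standard locational coverage cost (density times squared distance to the nearest robot) evaluated on a discrete grid. That cost is linear in the density, so a multiplier-weighted sum of per-field costs equals the cost under the multiplier-weighted density. The multiplier update is a generic projected dual ascent step; all names are hypothetical, and the LPAC primal policy is not shown.

```python
import math


def coverage_cost(robots, grid, density):
    """Discretized locational cost: sum of density * squared distance to the nearest robot."""
    return sum(d * min(math.dist(r, q) ** 2 for r in robots)
               for q, d in zip(grid, density))


def combined_idf(idfs, multipliers):
    """Pointwise linear combination sum_i lambda_i * phi_i of the importance density fields."""
    return [sum(lam * phi[c] for lam, phi in zip(multipliers, idfs))
            for c in range(len(idfs[0]))]


def dual_ascent_step(multipliers, costs, thresholds, step_size):
    """Projected subgradient step on the multipliers of constraints cost_i <= threshold_i."""
    return [max(0.0, lam + step_size * (c - b))
            for lam, c, b in zip(multipliers, costs, thresholds)]
```

In a loop of this kind, a primal solver would move the robots to reduce `coverage_cost` under the single combined field, and `dual_ascent_step` would then raise the weight of any field whose cost exceeds its threshold and lower (never below zero) the weight of fields already covered well enough.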

arXiv:2408.00870 [pdf, other]

Self-Similar Characteristics in Queue Length Dynamics: Insights from Adaptive Signalized Corridor

Authors: Shakib Mustavee, Shaurya Agarwal

Abstract: Self-similarity, a fractal characteristic of traffic flow dynamics, is widely recognized in transportation engineering and physics. However, its practical application in real-world traffic scenarios remains limited. Moreover, the traffic flow dynamics at adaptive signalized intersections are not yet fully understood. This paper addresses this gap by analyzing the queue length time series from an adaptive signalized corridor and characterizing its self-similarity. The findings uncover a 1/f structure in the power spectrum of queue lengths, indicative of self-similarity. Furthermore, the paper estimates local scaling exponents (α), a measure of self-similarity computed via detrended fluctuation analysis (DFA), and identifies a positive correlation with congestion patterns. Additionally, the study examines the fractal dynamics of queue length through the evolution of the scaling exponent. As a result, the paper offers new insights into the queue length dynamics of signalized intersections, which may help clarify the impact of adaptivity within the system.

Submitted 1 August, 2024; originally announced August 2024.
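The DFA scaling exponent α used in this abstract can be sketched as first-order DFA: integrate the mean-removed series, remove a least-squares line from each non-overlapping window of size n, take the RMS residual F(n), and fit the slope of log F(n) against log n. This computes one global α over the whole series; the paper's local exponents would repeat the computation over sliding segments, and the window sizes here are assumptions.

```python
import math
import random


def _fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def dfa_alpha(series, scales):
    """First-order DFA: slope of log F(n) versus log n."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for v in series:                      # 1. integrate the mean-removed series
        total += v - mean
        profile.append(total)
    log_n, log_f = [], []
    for n in scales:
        windows = len(profile) // n
        if n < 3 or windows < 1:
            raise ValueError(f"scale {n} is invalid for a series of length {len(series)}")
        xs = list(range(n))
        sq = 0.0
        for w in range(windows):          # 2. detrend each window with a fitted line
            seg = profile[w * n:(w + 1) * n]
            slope, icpt = _fit_line(xs, seg)
            sq += sum((y - (icpt + slope * x)) ** 2 for x, y in zip(xs, seg))
        log_n.append(math.log(n))         # 3. log of the RMS fluctuation F(n)
        log_f.append(0.5 * math.log(sq / (windows * n)))
    return _fit_line(log_n, log_f)[0]     # 4. scaling exponent alpha


rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
alpha = dfa_alpha(noise, [16, 32, 64, 128, 256])  # theory predicts 0.5 for white noise
```

For stationary long-range correlated signals, a power spectrum proportional to 1/f^β corresponds to β = 2α - 1, so the 1/f structure reported above is consistent with α near 1, between white noise (α = 0.5) and a random walk (α = 1.5).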

arXiv:2408.00867 [pdf, other]

An Extreme Value Theory Approach for Understanding Queue Length Dynamics in Adaptive Corridors

Authors: Shakib Mustavee, Pushkin Kachroo, Shaurya Agarwal

Abstract: This paper introduces a novel approach employing extreme value theory to analyze queue lengths within a corridor controlled by adaptive controllers. We consider the maximum queue lengths, taken every two minutes (roughly one cycle length), at each of the nine intersections of a signalized corridor. Our research shows that the maximum queue lengths at all the intersections follow extreme value distributions. To the best of the authors' knowledge, this is the first attempt to characterize queue length time series using extreme value analysis. These findings are significant as they offer a mechanism to assess the extremity of queue lengths, thereby aiding in evaluating the effectiveness of the adaptive signal controllers and corridor management. Given that extreme queue lengths often precipitate spillover effects, this insight can be instrumental in preempting such scenarios.

Submitted 1 August, 2024; originally announced August 2024.
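The block-maxima idea behind this abstract can be sketched in a few lines, assuming queue-length samples in a flat list with a fixed number of samples per two-minute block. The abstract does not name a distribution family; this sketch assumes a Gumbel distribution (the zero-shape case of the generalized extreme value family) fitted by the method of moments, whereas a full analysis would typically fit all three GEV parameters, for example by maximum likelihood. All names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def block_maxima(samples, block_size):
    """Maximum of each complete block; a trailing partial block is dropped."""
    return [max(samples[i:i + block_size])
            for i in range(0, len(samples) - block_size + 1, block_size)]


def fit_gumbel_moments(maxima):
    """Method-of-moments Gumbel fit: mean = mu + gamma*beta, var = (pi*beta)^2 / 6."""
    n = len(maxima)
    if n < 2:
        raise ValueError("need at least two block maxima")
    mean = sum(maxima) / n
    var = sum((m - mean) ** 2 for m in maxima) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    return mean - EULER_GAMMA * beta, beta


def exceedance_probability(x, mu, beta):
    """P(block maximum > x) under Gumbel(mu, beta), from the CDF exp(-exp(-(x - mu)/beta))."""
    return 1.0 - math.exp(-math.exp(-(x - mu) / beta))
```

An exceedance probability of this kind is what would let an operator ask how often a cycle's maximum queue is expected to exceed the storage length of a link, the spillover scenario mentioned above.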

arXiv:2405.05658 [pdf]

Artificial intelligence for abnormality detection in high volume neuroimaging: a systematic review and meta-analysis

Authors: Siddharth Agarwal, David A. Wood, Mariusz Grzeda, Chandhini Suresh, Munaib Din, James Cole, Marc Modat, Thomas C Booth

Abstract: Purpose: Most studies evaluating artificial intelligence (AI) models that detect abnormalities in neuroimaging are either tested on unrepresentative patient cohorts or are insufficiently well-validated, leading to poor generalisability to real-world tasks. The aim was to determine the diagnostic test accuracy and summarise the evidence supporting the use of AI models performing first-line, high-volume neuroimaging tasks. Methods: Medline, Embase, Cochrane Library and Web of Science were searched until September 2021 for studies that temporally or externally validated AI capable of detecting abnormalities in first-line CT or MR neuroimaging. A bivariate random-effects model was used for meta-analysis where appropriate. PROSPERO: CRD42021269563. Results: Only 16 studies were eligible for inclusion. Included studies were not compromised by unrepresentative datasets or inadequate validation methodology. Direct comparison with radiologists was available in 4/16 studies. 15/16 had a high risk of bias. Meta-analysis was only suitable for intracranial haemorrhage detection in CT imaging (10/16 studies), where AI systems had a pooled sensitivity of 0.90 (95% CI 0.85 - 0.94) and a pooled specificity of 0.90 (95% CI 0.83 - 0.95). Other AI studies using CT and MRI detected target conditions other than haemorrhage (2/16), or multiple target conditions (4/16). Only 3/16 studies implemented AI in clinical pathways, either for pre-read triage or as post-read discrepancy identifiers.
Conclusion: The paucity of eligible studies reflects that most abnormality detection AI studies were not adequately validated in representative clinical cohorts. The few studies describing how abnormality detection AI could impact patients and clinicians did not explore the full ramifications of clinical implementation.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2403.02682 [pdf, other]

Time Weaver: A Conditional Time Series Generation Model

Authors: Sai Shankar Narasimhan, Shubhankar Agarwal, Oguzhan Akcin, Sujay Sanghavi, Sandeep Chinchali

Abstract: Imagine generating a city's electricity demand pattern based on weather, the presence of an electric vehicle, and location, which could be used for capacity planning during a winter freeze. Such real-world time series are often enriched with paired heterogeneous contextual metadata (weather, location, etc.). Current approaches to time series generation often ignore this paired metadata, and its heterogeneity poses several practical challenges in adapting existing conditional generation approaches from the image, audio, and video domains to the time series domain. To address this gap, we introduce Time Weaver, a novel diffusion-based model that leverages the heterogeneous metadata in the form of categorical, continuous, and even time-variant variables to significantly improve time series generation. Additionally, we show that naive extensions of standard evaluation metrics from the image to the time series domain are insufficient. These metrics do not penalize conditional generation approaches for their poor specificity in reproducing the metadata-specific features in the generated time series. Thus, we introduce a new evaluation metric that accurately captures the specificity of conditional generation and the realism of the generated time series. We show that Time Weaver outperforms state-of-the-art benchmarks, such as Generative Adversarial Networks (GANs), by up to 27% in downstream classification tasks on real-world energy, medical, air quality, and traffic data sets.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2401.00728 [pdf, other]

MultiFusionNet: Multilayer Multimodal Fusion of Deep Neural Networks for Chest X-Ray Image Classification

Authors: Saurabh Agarwal, K. V. Arya, Yogesh Kumar Meena

Abstract: Chest X-ray imaging is a critical diagnostic tool for identifying pulmonary diseases. However, manual interpretation of these images is time-consuming and error-prone. Automated systems utilizing convolutional neural networks (CNNs) have shown promise in improving the accuracy and efficiency of chest X-ray image classification. While previous work has mainly focused on using feature maps from the final convolution layer, there is a need to explore the benefits of leveraging additional layers for improved disease classification. Extracting robust features from limited medical image datasets remains a critical challenge. In this paper, we propose a novel deep learning-based multilayer multimodal fusion model that emphasizes extracting features from different layers and fusing them. Our disease detection model considers the discriminatory information captured by each layer. Furthermore, we propose the fusion of different-sized feature maps (FDSFM) module to effectively merge feature maps from diverse layers. The proposed model achieves significantly higher accuracies of 97.21% and 99.60% for three-class and two-class classification, respectively. The proposed multilayer multimodal fusion model, along with the FDSFM module, holds promise for accurate disease classification and can also be extended to other disease classifications in chest X-ray images.

Submitted 1 January, 2024; originally announced January 2024.

Comments: 19 pages

arXiv:2310.13259 [pdf]

Domain-specific optimization and diverse evaluation of self-supervised models for histopathology

Authors: Jeremy Lai, Faruk Ahmed, Supriya Vijay, Tiam Jaroensri, Jessica Loo, Saurabh Vyawahare, Saloni Agarwal, Fayaz Jamil, Yossi Matias, Greg S. Corrado, Dale R. Webster, Jonathan Krause, Yun Liu, Po-Hsuan Cameron Chen, Ellery Wulczyn, David F. Steiner

Abstract: Task-specific deep learning models in histopathology offer promising opportunities for improving diagnosis, clinical research, and precision medicine. However, development of such models is often limited by availability of high-quality data. Foundation models in histopathology that learn general representations across a wide range of tissue types, diagnoses, and magnifications offer the potential to reduce the data, compute, and technical expertise necessary to develop task-specific deep learning models with the required level of model performance. In this work, we describe the development and evaluation of foundation models for histopathology via self-supervised learning (SSL). We first establish a diverse set of benchmark tasks involving 17 unique tissue types and 12 unique cancer types and spanning different optimal magnifications and task types. Next, we use this benchmark to explore and evaluate histopathology-specific SSL methods followed by further evaluation on held-out patch-level and weakly supervised tasks. We found that standard SSL methods thoughtfully applied to histopathology images are performant across our benchmark tasks and that domain-specific methodological improvements can further increase performance. Our findings reinforce the value of using domain-specific SSL methods in pathology, and establish a set of high-quality foundation models to enable further research across diverse applications.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 4 main tables, 3 main figures, additional supplemental tables and figures

arXiv:2309.11076 [pdf, other]

Symbolic Regression on Sparse and Noisy Data with Gaussian Processes

Authors: Junette Hsin, Shubhankar Agarwal, Adam Thorpe, Luis Sentis, David Fridovich-Keil

Abstract: In this paper, we address the challenge of deriving dynamical models from sparse and noisy data. High-quality data is crucial for symbolic regression algorithms; limited and noisy data can present modeling challenges. To overcome this, we combine Gaussian process regression with a sparse identification of nonlinear dynamics (SINDy) method to denoise the data and identify nonlinear dynamical equations. Our approach, GPSINDy, offers improved robustness with sparse, noisy data compared to SINDy alone. We demonstrate its effectiveness on simulation data from Lotka-Volterra and unicycle models and hardware data from an NVIDIA JetRacer system. We show superior performance over baselines, including an improvement of more than 50% over SINDy and other baselines, in predicting future trajectories from noise-corrupted and sparse 5 Hz data.

Submitted 10 October, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: Submitted to ACC 2025
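The sparse-regression half of GPSINDy can be sketched with sequentially thresholded least squares (STLSQ), the standard SINDy optimizer: fit derivatives against a library of candidate terms, zero coefficients below a threshold, and refit on the surviving terms. The Gaussian process denoising step is omitted, so this sketch assumes derivatives are already clean; the one-dimensional library [1, x, x^2], the threshold, and all function names are hypothetical.

```python
def _solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def _masked_least_squares(theta, y, active):
    """Least squares on the active library columns via the normal equations."""
    cols = [j for j, on in enumerate(active) if on]
    ata = [[sum(row[i] * row[j] for row in theta) for j in cols] for i in cols]
    aty = [sum(row[i] * target for row, target in zip(theta, y)) for i in cols]
    xi = [0.0] * len(active)
    for j, value in zip(cols, _solve(ata, aty)):
        xi[j] = value
    return xi


def stlsq(theta, dxdt, threshold=0.1, iterations=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    active = [True] * len(theta[0])
    xi = _masked_least_squares(theta, dxdt, active)
    for _ in range(iterations):
        active = [abs(c) >= threshold for c in xi]
        if not any(active):
            return [0.0] * len(xi)
        xi = _masked_least_squares(theta, dxdt, active)
    return xi
```

With noise-free samples of dx/dt = 2x - 0.5x^2 and the library [1, x, x^2], the constant term is pruned and the two remaining coefficients are recovered. With noisy or sparse samples, the derivative estimates feeding `stlsq` degrade quickly, which is the gap the abstract's Gaussian process denoising is meant to close.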

arXiv:2308.07541 [pdf, other]

On-demand Cold Start Frequency Reduction with Off-Policy Reinforcement Learning in Serverless Computing

Authors: Siddharth Agarwal, Maria A. Rodriguez, Rajkumar Buyya

Abstract: Function-as-a-Service (FaaS) is a cloud computing paradigm offering an event-driven execution model to applications. It features serverless attributes by eliminating resource management responsibilities from developers, and offers transparent and on-demand scalability of applications. To provide seamless on-demand scalability, new function instances are prepared to serve the incoming workload in the absence or unavailability of function instances. However, FaaS platforms are known to suffer from cold starts, where this function provisioning process introduces a non-negligible delay in function response and degrades the end-user experience. Therefore, the presented work focuses on reducing the frequent, on-demand cold starts on the platform by using Reinforcement Learning (RL). The proposed approach uses model-free Q-learning that considers function metrics such as CPU utilization, existing function instances, and response failure rate to proactively initialize functions based on the expected demand. The proposed solution is implemented on Kubeless and evaluated using an open-source function invocation trace applied to a matrix multiplication function. The evaluation results demonstrate a favourable performance of the RL-based agent when compared to Kubeless' default policy and a function keep-alive policy, improving throughput by up to 8.81% and reducing computation load and resource wastage by up to 55% and 37%, respectively, as a direct outcome of reduced cold starts.

Submitted 12 November, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: 13 figures, 24 pages, 3 tables

Journal ref: International Conference on Computational Intelligence and Data Analytics (ICCIDA 2024, Springer, Singapore), Hyderabad, India, June 28-29, 2024
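The cold-start work above relies on model-free Q-learning over function metrics. A minimal tabular Q-learning sketch follows to show the off-policy update itself. The state encoding (CPU bucket, warm instances, failure-rate bucket), the two actions, and the reward are hypothetical placeholders rather than the paper's design, and the learning-rate and discount values are arbitrary.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Tabular Q-learning with epsilon-greedy exploration (generic sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)          # (state, action) -> value estimate, default 0
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        """Explore with probability epsilon, otherwise pick the greedy action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Off-policy TD update toward reward + gamma * max_a' Q(next_state, a')."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error


# Hypothetical encoding: state = (cpu_bucket, warm_instances, failure_bucket);
# action 0 = keep current instances, 1 = pre-warm one more instance.
agent = QLearningAgent(actions=[0, 1], alpha=0.5, gamma=0.9, epsilon=0.0)
s = ("cpu_high", 1, "fail_low")
agent.update(s, action=1, reward=1.0, next_state=s)   # pre-warming avoided a cold start
print(agent.q[(s, 1)], agent.act(s))
```

The `max` over next-state values is what makes the update off-policy: it bootstraps from the greedy action even when exploration chose something else.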

arXiv:2308.05937

doi 10.1109/TSC.2024.3387661

A Deep Recurrent-Reinforcement Learning Method for Intelligent AutoScaling of Serverless Functions

Authors: Siddharth Agarwal, Maria A. Rodriguez, Rajkumar Buyya

Abstract: FaaS introduces a lightweight, function-based cloud execution model that finds its relevance in a range of applications like IoT-edge data processing and anomaly detection. While cloud service providers (CSPs) offer near-infinite function elasticity, these applications often experience fluctuating workloads and stricter performance constraints. A typical CSP strategy is to empirically determine and adjust the desired function instances or resources, known as autoscaling, based on monitoring-based thresholds such as CPU or memory, to cope with demand and performance. However, threshold configuration requires either expert knowledge, historical data or a complete view of the environment, making autoscaling a performance bottleneck that lacks an adaptable solution. RL algorithms have proven beneficial in analysing complex cloud environments and result in an adaptable policy that maximizes the expected objectives. Most realistic cloud environments involve operational interference and have limited visibility, making them partially observable. A general solution to tackle observability in highly dynamic settings is to integrate recurrent units with model-free RL algorithms and model the decision process as a POMDP. Therefore, in this paper, we investigate model-free recurrent RL agents for function autoscaling and compare them against the model-free PPO algorithm. We explore the integration of an LSTM network with the state-of-the-art PPO algorithm and find that, under our experimental and evaluation settings, recurrent policies were able to capture the environment parameters and show promising results for function autoscaling. We further compare a PPO-based autoscaling agent with commercially used threshold-based function autoscaling and posit that an LSTM-based autoscaling agent is able to improve throughput by 18%, function execution by 13% and account for 8.4% more function instances.

Submitted 11 November, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

Comments: 12 pages, 15 figures, 4 tables

Journal ref: in IEEE Transactions on Services Computing, vol. 17, no. 5, pp. 1899-1910, Sept.-Oct. 2024

arXiv:2208.12410

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

Authors: Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi

Abstract: In this paper, we propose a model to perform style transfer of speech to singing voice. Unlike previous signal processing-based methods, which require high-quality singing templates or phoneme synchronization, we explore a data-driven approach for the problem of converting natural speech to singing voice. We develop a novel neural network architecture, called SymNet, which models the alignment of the input speech with the target melody while preserving the speaker identity and naturalness. The proposed SymNet model comprises a symmetrical stack of three types of layers: convolutional, transformer, and self-attention layers. The paper also explores novel data augmentation and generative loss annealing methods to facilitate model training. Experiments are performed on the NUS and NHSS datasets, which consist of parallel data of speech and singing voice. In these experiments, we show that the proposed SymNet model improves the objective reconstruction quality significantly over previously published methods and baseline architectures. Further, a subjective listening test confirms the improved quality of the audio obtained using the proposed approach (absolute improvement of 0.37 in mean opinion score over the baseline system).

Submitted 25 August, 2022; originally announced August 2022.

Comments: accepted to INTERSPEECH 2022

arXiv:2204.03573

An optimized hybrid solution for IoT based lifestyle disease classification using stress data

Authors: Sadhana Tiwari, Sonali Agarwal

Abstract: Stress, anxiety, and nervousness are all high-risk health states in everyday life. Previously, stress levels were determined by speaking with people and gaining insight into what they had experienced recently or in the past. Typically, stress is caused by an incident that occurred a long time ago, but sometimes it is triggered by unknown factors. Assessing stress is a challenging and complex task, but recent research advances have provided numerous opportunities to automate it. The fundamental features of most of these techniques are electrodermal activity (EDA) and heart rate values (HRV). We utilized an accelerometer to measure body motions to solve this challenge. The proposed novel method employs a test that measures a subject's electrocardiogram (ECG), galvanic skin values (GSV), HRV values, and body movements in order to provide a low-cost and time-saving solution for detecting stress lifestyle disease in modern times using cyber-physical systems. This study provides a new hybrid model for lifestyle disease classification that decreases execution time while picking the best collection of characteristics and increases classification accuracy. The developed approach is capable of dealing with the class imbalance problem, using the WESAD (wearable stress and affect dataset) dataset. The new model uses the Grid search (GS) method to select an optimized set of hyperparameters, and it uses a combination of the Correlation coefficient based Recursive feature elimination (CoC-RFE) method for optimal feature selection and gradient boosting as an estimator to classify the dataset, which achieves high accuracy and helps to provide smart, accurate, and high-quality healthcare systems. To demonstrate the validity and utility of the proposed methodology, its performance is compared to those of other well-established machine learning models.

Submitted 4 April, 2022; originally announced April 2022.

Comments: Data mining and Data analytics used for healthcare data

arXiv:2203.10014

Parametric Scaling of Preprocessing assisted U-net Architecture for Improvised Retinal Vessel Segmentation

Authors: Kundan Kumar, Sumanshu Agarwal

Abstract: Extracting blood vessels from retinal fundus images plays a decisive role in diagnosing the progression of pertinent diseases. In medical image analysis, vessel extraction is a semantic binary segmentation problem, where blood vasculature needs to be extracted from the background. Here, we present an image enhancement technique based on morphological preprocessing coupled with a scaled U-net architecture. Despite a relatively small number of trainable network parameters, the scaled version of the U-net architecture provides better performance compared to other methods in the domain. We validated the proposed method on retinal fundus images from the DRIVE database. A significant improvement compared to the other algorithms in the domain, in terms of the area under the ROC curve (>0.9762) and classification accuracy (>95.47%), is evident from the results. Furthermore, the proposed method is resistant to the central vessel reflex while being sensitive in detecting blood vessels in the presence of background items, viz. exudates, optic disc, and fovea.

Submitted 18 March, 2022; originally announced March 2022.

Comments: 10 pages, 5 figures, ICAIHC-2022

arXiv:2203.10005

Application of Top-hat Transformation for Enhanced Blood Vessel Extraction

Authors: Tithi Parna Das, Sheetal Praharaj, Sarita Swain, Sumanshu Agarwal, Kundan Kumar

Abstract: In the medical domain, different computer-aided diagnosis systems have been proposed to extract blood vessels from retinal fundus images for the clinical treatment of vascular diseases. Accurate extraction of blood vessels from the fundus images using a computer-generated method can help the clinician to produce timely and accurate reports for patients suffering from these diseases. In this article, we integrate a top-hat based preprocessing approach with a fine-tuned B-COSFIRE filter to achieve more accurate segregation of blood vessel pixels from the background. The use of top-hat transformation in the preprocessing stage enhances the efficacy of the algorithm to extract blood vessels in the presence of structures like fovea, exudates, haemorrhages, etc. Furthermore, to reduce false positives, small clusters of blood vessel pixels are removed in the postprocessing stage. Further, we find that the proposed algorithm is more efficient than various modern algorithms reported in the literature.

Submitted 18 March, 2022; originally announced March 2022.

Comments: 9 pages, 3 figures, ICAIHC-2022
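The top-hat preprocessing used in the abstract above is a standard morphological operation: the white top-hat of an image is the image minus its grayscale opening, which keeps bright structures thinner than the structuring element and suppresses broad ones. The NumPy sketch below assumes a flat square structuring element and edge padding; it does not reproduce the authors' pipeline or the B-COSFIRE filter.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_reduce(img, size, reduce_fn):
    """Apply reduce_fn over every size x size neighbourhood (odd size, edge padding)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))   # shape (H, W, size, size)
    return reduce_fn(windows, axis=(-2, -1))


def erode(img, size):
    return _window_reduce(img, size, np.min)


def dilate(img, size):
    return _window_reduce(img, size, np.max)


def white_tophat(img, size):
    """Image minus its grayscale opening (erosion followed by dilation)."""
    opened = dilate(erode(img, size), size)
    return img - opened


# A thin bright line on a flat background survives; a broad 5x5 block does not.
line = np.full((9, 9), 0.5)
line[4, :] = 1.0
block = np.zeros((9, 9))
block[2:7, 2:7] = 1.0
print(white_tophat(line, 3)[4], white_tophat(block, 3).max())
```

Because vessels appear dark in fundus images, one plausible usage (an assumption, not stated in the abstract) is `white_tophat(1.0 - green, size=11)` on a green channel scaled to [0, 1], with the element size tuned to exceed the widest vessel.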

arXiv:2201.08020

A Deep Learning Approach To Estimation Using Measurements Received Over a Network

Authors: Shivangi Agarwal, Sanjit K. Kaul, Saket Anand, P. B. Sujit

Abstract: We propose a novel deep neural network (DNN) based approximation architecture to learn estimates of measurements. We detail an algorithm that enables training of the DNN. The DNN estimator only uses measurements if and when they are received over a communication network. The measurements are communicated over a network as packets, at a rate unknown to the estimator. Packets may suffer drops and need retransmission. They may suffer waiting delays as they traverse a network path. Existing works on estimation often assume knowledge of the dynamic model of the measured system, which may not be available in practice. The DNN estimator doesn't assume knowledge of the dynamic system model or the communication network. It doesn't require a history of measurements, often used by other works. The DNN estimator results in significantly smaller average estimation error than the commonly used Time-varying Kalman Filter and the Unscented Kalman Filter, in simulations of linear and nonlinear dynamic systems. The DNN need not be trained separately for different communications network settings. It is robust to errors in estimation of network delays that occur due to imperfect time synchronization between the measurement source and the estimator. Last but not least, our simulations shed light on the rate of updates that results in low estimation error.

Submitted 12 September, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

arXiv:2112.03916

BT-Unet: A self-supervised learning framework for biomedical image segmentation using Barlow Twins with U-Net models

Authors: Narinder Singh Punn, Sonali Agarwal

Abstract: Deep learning has brought the most profound contribution towards biomedical image segmentation to automate the process of delineation in medical imaging. To accomplish such a task, the models are required to be trained using huge amounts of annotated or labelled data that highlight the region of interest with a binary mask. However, efficient generation of the annotations for such huge data requires expert biomedical analysts and extensive manual effort. It is a tedious and expensive task, while also being vulnerable to human error. To address this problem, a self-supervised learning framework, BT-Unet, is proposed that uses the Barlow Twins approach to pre-train the encoder of a U-Net model via redundancy reduction in an unsupervised manner to learn data representation. Later, the complete network is fine-tuned to perform actual segmentation. The BT-Unet framework can be trained with a limited number of annotated samples while having a large number of unannotated samples, which is mostly the case in real-world problems. This framework is validated over multiple U-Net models on diverse datasets by generating scenarios with a limited number of labelled samples and using standard evaluation metrics. With exhaustive experiment trials, it is observed that the BT-Unet framework enhances the performance of the U-Net models by a significant margin under such circumstances.

Submitted 23 March, 2022; v1 submitted 7 December, 2021; originally announced December 2021.
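BT-Unet pre-trains the U-Net encoder with the Barlow Twins objective. A NumPy sketch of that loss follows: it pushes the cross-correlation matrix between the embeddings of two augmented views toward the identity, so the diagonal term enforces invariance and the off-diagonal term performs the redundancy reduction the abstract mentions. The weighting `lambd=5e-3` is the value commonly quoted for the original Barlow Twins method and is an assumption here, as is the plain NumPy (non-differentiable) formulation; actual pre-training would use an autodiff framework.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lambd=5e-3, eps=1e-12):
    """Barlow Twins objective for two (batch, dim) embedding matrices of paired views."""
    n = z_a.shape[0]
    # Standardise every embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    c = z_a.T @ z_b / n                                   # (dim, dim) cross-correlation
    diag = np.diag(c)
    invariance = np.sum((1.0 - diag) ** 2)                # pull C_ii toward 1
    redundancy = np.sum(c ** 2) - np.sum(diag ** 2)       # push C_ij (i != j) toward 0
    return invariance + lambd * redundancy


# Two decorrelated, standardised features: identical views give ~0 loss,
# sign-flipped views give C = -I and an invariance penalty of 2 * 2**2 = 8.
z = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
print(barlow_twins_loss(z, z), barlow_twins_loss(z, -z))
```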

arXiv:2108.02508

doi 10.1007/s00138-022-01280-3

RCA-IUnet: A residual cross-spatial attention guided inception U-Net model for tumor segmentation in breast ultrasound imaging

Authors: Narinder Singh Punn, Sonali Agarwal

Abstract: The advancements in deep learning technologies have produced immense contributions to biomedical image analysis applications. With breast cancer being a common and deadly disease among women, early detection is the key means to improve survivability. Medical imaging like ultrasound presents an excellent visual representation of the functioning of the organs; however, analysing such scans is challenging and time-consuming for any radiologist, which delays the diagnosis process. Although various deep learning based approaches that achieved promising results have been proposed, the present article introduces an efficient residual cross-spatial attention guided inception U-Net (RCA-IUnet) model with minimal training parameters for tumor segmentation using breast ultrasound imaging to further improve the segmentation performance for varying tumor sizes. The RCA-IUnet model follows the U-Net topology with residual inception depth-wise separable convolution and hybrid pooling (max pooling and spectral pooling) layers. In addition, cross-spatial attention filters are added to suppress the irrelevant features and focus on the target structure. The segmentation performance of the proposed model is validated on two publicly available datasets using standard segmentation evaluation metrics, where it outperformed the other state-of-the-art segmentation models.

Submitted 2 January, 2022; v1 submitted 5 August, 2021; originally announced August 2021.

Journal ref: Machine Vision and Applications, Springer, 2022
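RCA-IUnet's hybrid pooling combines max pooling and spectral pooling. The sketch below shows each operation on a single 2D map; spectral pooling here follows the usual recipe of cropping the centred DFT to its low-frequency block and inverting. How the model fuses the two outputs is not stated in the abstract, so no fusion is shown, and the mean-preserving rescaling and real-part projection are simplifying assumptions.

```python
import numpy as np


def max_pool2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) map with even H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def spectral_pool(x, out_h, out_w):
    """Keep only the central (low-frequency) out_h x out_w block of the 2D DFT."""
    h, w = x.shape
    spectrum = np.fft.fftshift(np.fft.fft2(x))            # zero frequency moved to the centre
    top, left = h // 2 - out_h // 2, w // 2 - out_w // 2
    cropped = spectrum[top:top + out_h, left:left + out_w]
    y = np.fft.ifft2(np.fft.ifftshift(cropped))
    # Rescale so the mean is preserved; keep the real part, since cropping an
    # even-sized spectrum can slightly break conjugate symmetry.
    return np.real(y) * (out_h * out_w) / (h * w)


x = np.arange(16.0).reshape(4, 4)
print(max_pool2x2(x))                                     # -> [[5, 7], [13, 15]]
print(spectral_pool(np.full((8, 8), 3.0), 4, 4))         # ~3.0 everywhere
```

Max pooling keeps the strongest local response, while spectral pooling keeps the smooth, low-frequency content; a hybrid layer presumably aims to retain both, although the combination rule is not given in the abstract.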

arXiv:2107.12321

doi 10.1007/978-3-030-93620-4_1

MAG-Net: Multi-task attention guided network for brain tumor segmentation and classification

Authors: Sachin Gupta, Narinder Singh Punn, Sanjay Kumar Sonbhadra, Sonali Agarwal

Abstract: Brain tumor is the most common and deadliest disease that can be found in all age groups. Generally, MRI is adopted by radiologists for identifying and diagnosing tumors. The correct identification of tumor regions and their type can aid in diagnosing tumors and planning follow-up treatment. However, analysing such scans is a complex and time-consuming task for any radiologist. Motivated by deep learning based computer-aided-diagnosis systems, this paper proposes a multi-task attention guided encoder-decoder network (MAG-Net) to classify and segment brain tumor regions using MRI images. The MAG-Net is trained and evaluated on the Figshare dataset that includes coronal, axial, and sagittal views with 3 types of tumors: meningioma, glioma, and pituitary tumor. With exhaustive experimental trials, the model achieved promising results compared to existing state-of-the-art models, while having the fewest training parameters among them.

Submitted 6 December, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

arXiv:2107.07380

A Linear Dynamical Perspective on Epidemiology: Interplay Between Early COVID-19 Outbreak and Human Mobility

Authors: Shakib Mustavee, Shaurya Agarwal, Chinwendu Enyioha, Suddhasattwa Das

Abstract: This paper investigates the impact of human activity and mobility (HAM) on the spreading dynamics of an epidemic. Specifically, it explores the interconnections between HAM and its effect on the early spread of the COVID-19 virus. During the early stages of the pandemic, effective reproduction numbers exhibited a high correlation with human mobility patterns, leading to a hypothesis that the HAM system can be studied as a coupled system with disease spread dynamics. This study applies the generalized Koopman framework with control inputs to determine the nonlinear disease spread dynamics and the input-output characteristics as a locally linear controlled dynamical system. The approach relies solely on snapshots of spatiotemporal data and does not require any knowledge of the system's physical laws. We exploit the Koopman operator framework by utilizing the Hankel Dynamic Mode Decomposition with Control (HDMDc) algorithm to obtain a linear disease spread model incorporating human mobility as a control input. The study demonstrated that the proposed methodology could capture the impact of local mobility on the early dynamics of the ongoing global pandemic. The obtained locally linear model can accurately forecast the number of new infections for various prediction windows ranging from two to four weeks. The study corroborates a leader-follower relationship between mobility and disease spread dynamics. In addition, the effect of delay embedding in the HDMDc algorithm is investigated and reported. A case study was performed using COVID-19 infection data from Florida, US, and HAM data extracted from the Google community mobility data report.

Submitted 4 August, 2021; v1 submitted 13 July, 2021; originally announced July 2021.
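HDMDc fits a linear model with human mobility as a control input. Before the Hankel (time-delay) embedding that HDMDc adds, the core of DMD with control reduces to a least-squares fit of $x_{k+1} \approx A x_k + B u_k$. The sketch below solves that fit with a pseudoinverse and no SVD truncation, then checks it on a synthetic two-state system; the system matrices and input signal are invented for the example and are not the paper's COVID-19 or mobility data.

```python
import numpy as np


def dmdc(x, u):
    """Least-squares DMD with control: fit x[:, k+1] ~ A x[:, k] + B u[:, k].

    x: (n, T) state snapshots; u: (q, T-1) inputs applied between snapshots.
    Returns (A, B). No SVD truncation is applied.
    """
    n = x.shape[0]
    X, X_next = x[:, :-1], x[:, 1:]
    omega = np.vstack([X, u])                      # stacked states and inputs
    G = X_next @ np.linalg.pinv(omega)             # G = [A B]
    return G[:, :n], G[:, n:]


# Synthetic 2-state system driven by a random scalar input (illustrative values).
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
rng = np.random.default_rng(1)
T = 50
u = rng.normal(size=(1, T - 1))
x = np.zeros((2, T))
x[:, 0] = [1.0, -1.0]
for k in range(T - 1):
    x[:, k + 1] = A_true @ x[:, k] + B_true @ u[:, k]
A_hat, B_hat = dmdc(x, u)
print(np.round(A_hat, 6), np.round(B_hat, 6))
```

On noisy, high-dimensional data, the pseudoinverse would typically be replaced by a truncated SVD of the stacked matrix to regularize the fit.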

arXiv:2107.06369

Exploring DMD-type Algorithms for Modeling Signalised Intersections

Authors: Kazi Redwan Shabab, Shakib Mustavee, Shaurya Agarwal, Mohamed H. Zaki, Sajal Das

Abstract: This paper explores a novel data-driven approach based on recent developments in Koopman operator theory and dynamic mode decomposition (DMD) for modeling signalized intersections. Vehicular flow and queue formation on signalized intersections have complex nonlinear dynamics, making system identification, modeling, and controller design tasks challenging. We employ a Koopman theoretic approach to transform the original nonlinear dynamics into locally linear infinite-dimensional dynamics. The data-driven approach relies entirely on spatio-temporal snapshots of the traffic data. We investigate several key aspects of the approach and provide insights into the usage of DMD-type algorithms for application in adaptive signalized intersections. To demonstrate the utility of the obtained linearized dynamics, we perform prediction of the queue lengths at the intersection; and compare the results with the state-of-the-art long short term memory (LSTM) method. The case study involves the morning peak vehicle movements and queue lengths at two Orlando area signalized intersections. It is observed that DMD-based algorithms are able to capture complex dynamics with a linear approximation to a reasonable extent.

Submitted 13 July, 2021; originally announced July 2021.

Comments: 11 pages, 8 figures, Submitted to: Journal of Intelligent Transportation Systems

Report number: GITS-2021-0219
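The intersection study relies on DMD-type algorithms. For reference, a sketch of exact DMD (SVD of the snapshot matrix, a reduced operator, its eigendecomposition, and the resulting modes) plus a mode-based forecast follows. It is a generic textbook version rather than the authors' code, and the toy linear system stands in for queue-length snapshots, which are not reproduced here.

```python
import numpy as np


def dmd(x, rank=None):
    """Exact DMD of snapshots x (n, T); returns eigenvalues and modes of x_{k+1} ~ A x_k."""
    X, X_next = x[:, :-1], x[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    if rank is not None:                            # optional truncation
        U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    V = Vh.conj().T
    A_tilde = (U.conj().T @ X_next @ V) / s         # U* X' V S^-1 (column j divided by s_j)
    eigvals, W = np.linalg.eig(A_tilde)
    modes = ((X_next @ V) / s) @ W                  # exact DMD modes X' V S^-1 W
    return eigvals, modes


def dmd_forecast(eigvals, modes, x0, steps):
    """Predict x_k ~ Phi diag(lambda)^k b for k = 0..steps-1, with b fitted to x0."""
    b = np.linalg.lstsq(modes, x0, rcond=None)[0]
    return np.real(np.column_stack([modes @ (b * eigvals ** k) for k in range(steps)]))


# Toy linear system with eigenvalues 0.9 and 0.5 standing in for traffic snapshots.
A = np.array([[0.9, 0.2], [0.0, 0.5]])
x = np.zeros((2, 20))
x[:, 0] = [1.0, 1.0]
for k in range(19):
    x[:, k + 1] = A @ x[:, k]
eigvals, modes = dmd(x)
pred = dmd_forecast(eigvals, modes, x[:, 0], 20)
print(np.sort(eigvals.real), np.abs(pred - x).max())
```

For a truly linear system the reduced operator is similar to $A$, so the eigenvalues are recovered exactly; for nonlinear traffic data the same procedure yields only the best-fit linear approximation, which matches the abstract's "to a reasonable extent".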

arXiv:2107.04537

doi 10.1007/s10462-022-10152-1

Modality specific U-Net variants for biomedical image segmentation: A survey

Authors: Narinder Singh Punn, Sonali Agarwal

Abstract: With recent advancements in deep learning approaches, such as deep convolutional neural networks, residual neural networks and adversarial networks, U-Net architectures have become the most widely utilized in biomedical image segmentation to automate the identification and detection of target regions or sub-regions. In recent studies, U-Net based approaches have illustrated state-of-the-art performance in different applications for the development of computer-aided diagnosis systems for early diagnosis and treatment of diseases such as brain tumor, lung cancer, Alzheimer's disease, breast cancer, etc., using various modalities. This article presents the success of these approaches by describing the U-Net framework, followed by a comprehensive analysis of the U-Net variants by performing 1) inter-modality and 2) intra-modality categorization to establish better insights into the associated challenges and solutions. Besides, this article also highlights the contribution of U-Net based frameworks in the ongoing COVID-19 pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Finally, the strengths and similarities of these U-Net variants are analysed along with the challenges involved in biomedical image segmentation to uncover promising future research directions in this area.

Submitted 27 January, 2022; v1 submitted 9 July, 2021; originally announced July 2021.

Journal ref: Artificial Intelligence Review (2022)

arXiv:2106.08176

Automated triaging of head MRI examinations using convolutional neural networks

Authors: David A. Wood, Sina Kafiabadi, Ayisha Al Busaidi, Emily Guilhem, Antanas Montvila, Siddharth Agarwal, Jeremy Lynch, Matthew Townend, Gareth Barker, Sebastien Ourselin, James H. Cole, Thomas C. Booth

Abstract: The growing demand for head magnetic resonance imaging (MRI) examinations, along with a global shortage of radiologists, has led to an increase in the time taken to report head MRI scans around the world. For many neurological conditions, this delay can result in increased morbidity and mortality. An automated triaging tool could reduce reporting times for abnormal examinations by identifying abnormalities at the time of imaging and prioritizing the reporting of these scans. In this work, we present a convolutional neural network for detecting clinically relevant abnormalities in $\text{T}_2$-weighted head MRI scans. Using a validated neuroradiology report classifier, we generated a labelled dataset of 43,754 scans from two large UK hospitals for model training, and demonstrated accurate classification (area under the receiver operating characteristic curve (AUC) = 0.943) on a test set of 800 scans labelled by a team of neuroradiologists. Importantly, when trained on scans from only a single hospital, the model generalized to scans from the other hospital ($\Delta$AUC $\leq$ 0.02). A simulation study demonstrated that our model would reduce the mean reporting time for abnormal examinations from 28 days to 14 days and from 9 days to 5 days at the two hospitals, demonstrating feasibility for use in a clinical triage environment.

Submitted 28 June, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

Comments: Accepted as an oral presentation at Medical Imaging with Deep Learning (MIDL) 2021

arXiv:2102.08575

End-to-end lyrics Recognition with Voice to Singing Style Transfer

Authors: Sakya Basak, Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi

Abstract: Automatic transcription of monophonic/polyphonic music is a challenging task due to the lack of large amounts of transcribed data. In this paper, we propose a data augmentation method that converts natural speech to singing voice using a vocoder-based speech synthesizer. This approach, called voice to singing (V2S), performs the voice style conversion by modulating the F0 contour of the natural speech with that of a singing voice. The V2S model based style transfer can generate good quality singing voice, thereby enabling the conversion of large corpora of natural speech to singing voice that is useful in building an end-to-end (E2E) lyrics transcription system. In our experiments on monophonic singing voice data, the V2S style transfer provides a significant gain (relative improvement of 21%) for the E2E lyrics transcription system. We also discuss additional components like transfer learning and lyrics based language modeling to improve the performance of the lyrics transcription system.

Submitted 16 February, 2021; originally announced February 2021.

Comments: accepted at ICASSP 2021

arXiv:2012.07079

doi 10.1007/s11063-022-10785-x

CHS-Net: A Deep learning approach for hierarchical segmentation of COVID-19 infected CT images

Authors: Narinder Singh Punn, Sonali Agarwal

Abstract: The pandemic of novel SARS-CoV-2, also known as COVID-19, has been spreading worldwide, causing rampant loss of lives. Medical imaging such as CT, X-ray, etc., plays a significant role in diagnosing patients by presenting a visual representation of the functioning of the organs. However, for any radiologist, analyzing such scans is a tedious and time-consuming task. Emerging deep learning technologies have displayed their strength in analyzing such scans to aid in the faster diagnosis of diseases and viruses such as COVID-19. In the present article, an automated deep learning based model, the COVID-19 hierarchical segmentation network (CHS-Net), is proposed that functions as a semantic hierarchical segmenter to identify the COVID-19 infected regions from the lung contour via CT medical imaging using two cascaded residual attention inception U-Net (RAIU-Net) models. RAIU-Net comprises a residual inception U-Net model with a spectral spatial and depth attention network (SSD) that is developed with the contraction and expansion phases of depthwise separable convolutions and hybrid pooling (max and spectral pooling) to efficiently encode and decode the semantic and varying resolution information. CHS-Net is trained with a segmentation loss function defined as the average of the binary cross entropy loss and the dice loss to penalize false negative and false positive predictions. The approach is compared with recently proposed approaches and evaluated using standard metrics such as accuracy, precision, specificity, recall, dice coefficient and Jaccard similarity, along with visualized interpretations of the model predictions using GradCam++ and uncertainty maps. With extensive trials, it is observed that the proposed approach outperformed the recently proposed approaches and effectively segments the COVID-19 infected regions in the lungs.
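As a rough illustration of the CHS-Net training objective described in this abstract (the average of binary cross entropy and dice loss), the following Python sketch computes it over a flattened mask. The per-pixel averaging of the cross-entropy term, the soft (probability-based) Dice formulation and the smoothing constant `eps` are assumptions, not details taken from the paper.

```python
import math


def bce_dice_loss(pred, target, eps=1e-7):
    """Average of binary cross-entropy and soft Dice loss.

    pred: predicted foreground probabilities in [0, 1] for a flattened mask
    target: ground-truth labels (0 or 1), same length as pred
    eps: assumed smoothing constant that avoids log(0) and 0/0
    """
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equal length")

    # Binary cross-entropy, averaged over pixels
    bce = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    bce /= len(pred)

    # Soft Dice loss: 1 - 2|P and T| / (|P| + |T|)
    intersection = sum(p * t for p, t in zip(pred, target))
    dice = 1.0 - (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)

    # The loss is the average of the two terms
    return 0.5 * (bce + dice)


loss = bce_dice_loss([0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0])
```

The cross-entropy term penalizes each pixel independently, while the Dice term measures overlap over the whole mask, which is one common reason for combining the two.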

Submitted 29 December, 2021; v1 submitted 13 December, 2020; originally announced December 2020.

Journal ref: Neural Processing Letters 2022

arXiv:2010.08115 [pdf, other]

Pinball-OCSVM for early-stage COVID-19 diagnosis with limited posteroanterior chest X-ray images

Authors: Sanjay Kumar Sonbhadra, Sonali Agarwal, P. Nagabhushan

Abstract: The infection of respiratory coronavirus disease 2019 (COVID-19) starts in the upper respiratory tract, and as the virus grows, the infection can progress to the lungs and develop into pneumonia. The conventional way of diagnosing COVID-19 is reverse transcription polymerase chain reaction (RT-PCR), which is less sensitive during early stages, especially if the patient is asymptomatic, which may further lead to more severe pneumonia. In this context, several deep learning models have been proposed to identify pulmonary infections using publicly available chest X-ray (CXR) image datasets for early diagnosis, better treatment and quick cure. In these datasets, the presence of fewer COVID-19 positive samples compared to other classes (normal, pneumonia and tuberculosis) raises the challenge of unbiased learning for deep learning models. All deep learning models opted for class balancing techniques to solve this issue, which, however, should be avoided in any medical diagnosis process. Moreover, deep learning models are also data hungry and need massive computational resources. Therefore, for quicker diagnosis, this research proposes a novel pinball loss function based one-class support vector machine (PB-OCSVM) that can work in the presence of limited COVID-19 positive CXR samples, with the objectives of maximizing learning efficiency and minimizing false predictions. The performance of the proposed model is compared with conventional OCSVM and existing deep learning models, and the experimental results prove that the proposed model outperformed state-of-the-art methods. To validate the robustness of the proposed model, experiments are also performed with noisy CXR images and UCI benchmark datasets.
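For readers unfamiliar with the pinball loss, a minimal Python sketch follows. It uses the standard pinball loss from the pin-SVM literature, L_tau(u) = u for u >= 0 and -tau * u for u < 0, and assumes (hypothetically) that u is the one-class margin violation rho - score; the exact PB-OCSVM formulation may differ, and `mean_pinball_objective` is an illustrative helper, not the paper's.

```python
def pinball_loss(u, tau):
    """Pinball loss: u for u >= 0, -tau * u for u < 0.

    tau = 0 recovers the hinge loss max(u, 0); tau > 0 additionally assigns
    a smaller, tau-scaled penalty to points with u < 0.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    return u if u >= 0 else -tau * u


def mean_pinball_objective(scores, rho, tau):
    """Hypothetical one-class loss term: average pinball loss of the
    margin violation u = rho - score over the training scores."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(pinball_loss(rho - s, tau) for s in scores) / len(scores)
```

With tau = 0 only samples scoring below the offset rho are penalized, as with the hinge loss of a conventional OCSVM.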

Submitted 5 June, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

arXiv:2009.08369 [pdf, other]

doi 10.1007/978-3-030-66665-1_6

Face Mask Detection using Transfer Learning of InceptionV3

Authors: G. Jignesh Chowdary, Narinder Singh Punn, Sanjay Kumar Sonbhadra, Sonali Agarwal

Abstract: The world is facing a huge health crisis due to the rapid transmission of coronavirus (COVID-19). Several guidelines were issued by the World Health Organization (WHO) for protection against the spread of coronavirus. According to WHO, the most effective preventive measure against COVID-19 is wearing a mask in public places and crowded areas. It is very difficult to monitor people manually in these areas. In this paper, a transfer learning model is proposed to automate the process of identifying people who are not wearing masks. The proposed model is built by fine-tuning the pre-trained state-of-the-art deep learning model InceptionV3. The proposed model is trained and tested on the Simulated Masked Face Dataset (SMFD). An image augmentation technique is adopted to address the limited availability of data for better training and testing of the model. The model outperformed other recently proposed approaches by achieving an accuracy of 99.9% during training and 100% during testing.

Submitted 20 October, 2020; v1 submitted 17 September, 2020; originally announced September 2020.

arXiv:2008.10744 [pdf, other]

doi 10.1109/ASPCON49795.2020.9276658

Enhanced Normalized Mutual Information for Localization in Noisy Environments

Authors: Samuel Todd Flanagan, Drupad K. Khublani, J. -F. Chamberland, Siddharth Agarwal, Ankit Vora

Abstract: Fine localization is a crucial task for autonomous vehicles. Although many algorithms have been explored in the literature for this specific task, the goal of getting accurate results from commodity sensors remains a challenge. As autonomous vehicles make the transition from expensive prototypes to production items, the need for inexpensive, yet reliable solutions is increasing rapidly. This article considers scenarios where images are captured with inexpensive cameras and localization takes place using pre-loaded fine maps of local roads as side information. The techniques proposed herein extend schemes based on normalized mutual information by leveraging the likelihood of shades rather than exact sensor readings for localization in noisy environments. This algorithmic enhancement, rooted in statistical signal processing, offers substantial gains in performance. Numerical simulations are used to highlight the benefits of the proposed techniques in representative application scenarios. Analysis of a Ford image set is performed to validate the core findings of this work.
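The abstract builds on normalized mutual information (NMI) matching. The sketch below shows only that baseline, not the enhanced likelihood-of-shades method: it computes NMI as (H(A) + H(B)) / H(A, B), one common normalization chosen here as an assumption, and slides a quantized camera strip along a 1D map row to pick the best-matching offset. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy (nats) of a histogram stored in a Counter."""
    return -sum(c / n * math.log(c / n) for c in counts.values())


def nmi(a, b):
    """Normalized mutual information (H(A) + H(B)) / H(A, B).

    a, b: equal-length sequences of quantized intensity levels (shades).
    The value lies in [1, 2]; 2 means a one-to-one mapping between levels.
    """
    if not a or len(a) != len(b):
        raise ValueError("inputs must be non-empty and equal length")
    n = len(a)
    h_ab = entropy(Counter(zip(a, b)), n)
    if h_ab == 0.0:
        return 1.0  # both inputs constant: treat the match as uninformative
    return (entropy(Counter(a), n) + entropy(Counter(b), n)) / h_ab


def localize(local, map_row):
    """Offset into map_row whose window maximizes NMI with the local strip."""
    w = len(local)
    best_offset, best_score = None, -math.inf
    for off in range(len(map_row) - w + 1):
        score = nmi(local, map_row[off:off + w])
        if score > best_score:  # strict '>' keeps the first best offset
            best_offset, best_score = off, score
    return best_offset
```

Because NMI depends only on the joint histogram of levels, the match is unaffected by a consistent relabeling of shades between camera and map, which is part of what makes it attractive for inexpensive cameras.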

Submitted 24 August, 2020; originally announced August 2020.

Comments: 5 pages, 9 figures, to be published in 2020 IEEE Conference on Applied Signal Processing (ASPCON)

arXiv:2004.14491 [pdf, other]

Detecting Deep-Fake Videos from Appearance and Behavior

Authors: Shruti Agarwal, Tarek El-Gaaly, Hany Farid, Ser-Nam Lim

Abstract: Synthetically generated audio and video, so-called deep fakes, continue to capture the imagination of the computer-graphics and computer-vision communities. At the same time, the democratization of access to technology that can create sophisticated manipulated video of anybody saying anything continues to be of concern because of its power to disrupt democratic elections, commit small- to large-scale fraud, fuel disinformation campaigns, and create non-consensual pornography. We describe a biometric-based forensic technique for detecting face-swap deep fakes. This technique combines a static biometric based on facial recognition with a temporal, behavioral biometric based on facial expressions and head movements, where the behavioral embedding is learned using a CNN with a metric-learning objective function. We show the efficacy of this approach across several large-scale video datasets, as well as on in-the-wild deep fakes.

Submitted 29 April, 2020; originally announced April 2020.

Journal ref: IEEE Workshop on Image Forensics and Security, 2020

arXiv:2004.11676 [pdf, other]

doi 10.1007/s10489-020-01900-3

Automated diagnosis of COVID-19 with limited posteroanterior chest X-ray images using fine-tuned deep neural networks

Authors: Narinder Singh Punn, Sonali Agarwal

Abstract: The novel coronavirus 2019 (COVID-19) is a respiratory syndrome that resembles pneumonia. The current diagnostic procedure for COVID-19 follows a reverse-transcriptase polymerase chain reaction (RT-PCR) based approach, which, however, is less sensitive in identifying the virus at the initial stage. Hence, a more robust alternative diagnosis technique is desirable. Recently, with the release of publicly available datasets of COVID-19 positive patients comprising computed tomography (CT) and chest X-ray (CXR) images, scientists, researchers and healthcare experts have been contributing to faster and automated diagnosis of COVID-19 by identifying pulmonary infections using deep learning approaches to achieve better cure and treatment. These datasets have limited samples of positive COVID-19 cases, which raises the challenge of unbiased learning. In this context, this article presents a random oversampling and weighted class loss function approach for unbiased fine-tuned learning (transfer learning) in various state-of-the-art deep learning architectures such as baseline ResNet, Inception-v3, Inception ResNet-v2, DenseNet169, and NASNetLarge to perform binary classification (normal and COVID-19 cases) and multi-class classification (COVID-19, pneumonia, and normal cases) of posteroanterior CXR images. Accuracy, precision, recall, loss, and area under the curve (AUC) are used to evaluate the performance of the models. Considering the experimental results, the performance of each model is scenario dependent; however, NASNetLarge displayed better scores than the other architectures and is further compared with other recently proposed approaches. This article also adds visual explanations to illustrate the basis of model classification and the perception of COVID-19 in CXR images.
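The two balancing devices named in this abstract, random oversampling and a class-weighted loss, can be sketched as follows. The inverse-frequency weights n / (K * n_c) are an assumed, commonly used choice; the paper's exact weighting is not stated in the abstract, and all function names here are hypothetical.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Class weight n / (K * n_c): rarer classes get larger weights (assumed scheme)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_cross_entropy(probs, label, weights):
    """Class-weighted cross-entropy for one sample; probs maps class to probability."""
    return -weights[label] * math.log(probs[label])


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each minority class until every
    class is as frequent as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, m in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - m):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y
```

Oversampling changes what the network sees during training, whereas weighting changes how much each error costs; either can counter the scarcity of COVID-19 positive images.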

Submitted 21 July, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

Journal ref: Appl Intell (2020)

arXiv:2003.13217 [pdf, other]

Deep Residual Neural Networks for Image in Speech Steganography

Authors: Shivam Agarwal, Siddarth Venkatraman

Abstract: Steganography is the art of hiding a secret message inside a publicly visible carrier message. Ideally, it is done without modifying the carrier, and with minimal loss of information in the secret message. Recently, various deep learning based approaches to steganography have been applied to different message types. We propose a deep learning based technique to hide a source RGB image message inside finite-length speech segments without perceptual loss. To achieve this, we train three neural networks: an encoding network to hide the message in the carrier, a decoding network to reconstruct the message from the carrier, and an additional image enhancer network to further improve the reconstructed message. We also discuss future improvements to the proposed algorithm.

Submitted 30 March, 2020; originally announced March 2020.

arXiv:2001.10482 [pdf, other]

doi 10.1109/GlobalSIP45357.2019.8969453

Localization in Autonomous Vehicles Using a Generalized Inner Product

Authors: Samuel Todd Flanagan, Drupad K. Khublani, Jean-Francois Chamberland, Siddharth Agarwal, Ankit Vora

Abstract: Fine localization in autonomous driving platforms is a task of broad interest, receiving much attention in recent years. Some localization algorithms use the Euclidean distance as a similarity measure between the local image acquired by a camera and a global map, which acts as side information. The global map is typically expressed in terms of the coordinate system of the road plane. Yet, a road image captured by a camera is subject to distortion in that nearby features on the road have much larger footprints on the focal plane of the camera compared with those of equally-sized features that lie farther ahead of the vehicle. Using commodity computational tools, it is straightforward to execute a transformation and, thereby, bring the distorted image into the frame of reference of the global map. However, this nonlinear transformation results in unequal noise amplification. The noise profile induced by this transformation should be accounted for when trying to match an acquired image to a global map, with more reliable regions being given more weight in the process. This physical reality presents an algorithmic opportunity to improve existing localization algorithms, especially in harsh conditions. This article reviews the physics of road feature acquisition through a camera, and it proposes an improved matching method rooted in statistical analysis. Findings are supported by numerical simulations.
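The idea of giving more reliable map regions more weight can be illustrated with an inverse-variance weighted squared distance. This is a hypothetical sketch, not the paper's generalized inner product: the per-pixel noise variances are assumed known (for example, larger for features farther ahead of the vehicle after the perspective transformation), and the map is reduced to a single row for brevity.

```python
def weighted_distance(local, window, noise_var):
    """Inverse-variance weighted squared distance: sum of (a - b)^2 / var.

    Pixels with larger (assumed known) noise variance count for less.
    """
    return sum((a - b) ** 2 / v for a, b, v in zip(local, window, noise_var))


def localize_weighted(local, map_row, noise_var):
    """Offset into map_row that minimizes the weighted distance."""
    w = len(local)
    if not 0 < w <= len(map_row) or len(noise_var) != w:
        raise ValueError("inconsistent input sizes")
    scores = [weighted_distance(local, map_row[o:o + w], noise_var)
              for o in range(len(map_row) - w + 1)]
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical example: the last two pixels (farther ahead) are much noisier.
map_row = [0, 1, 3, 2, 5, 4, 1, 0, 2]
noisy_strip = [3, 2, 7, 1]  # true window starts at offset 2
noise_var = [0.1, 0.1, 10.0, 10.0]
offset = localize_weighted(noisy_strip, map_row, noise_var)
```

Expanding the weighted distance gives a sum of w_i (l_i^2 + g_i^2 - 2 l_i g_i), so when the weighted energy of the map window is the same for every candidate offset, minimizing it is equivalent to maximizing the weighted inner product sum of w_i l_i g_i.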

Submitted 28 January, 2020; originally announced January 2020.

Comments: 5 pages, 7 figures, to be published in 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP)

arXiv:1912.11516 [pdf, other]

PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-efficient ReRAM

Authors: Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Sapan Agarwal, Matthew Marinella, Martin Foltin, John Paul Strachan, Dejan Milojicic, Wen-mei Hwu, Kaushik Roy

Abstract: The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training, both digital and hybrid digital-analog using resistive RAM (ReRAM) crossbars. ReRAM-based accelerators have demonstrated the effectiveness of ReRAM crossbars at performing the matrix-vector multiplication operations that are prevalent in training. However, they still suffer from inefficiency due to the use of serial reads and writes for performing the weight gradient and update step. A few works have demonstrated the possibility of performing outer products in crossbars, which can be used to realize the weight gradient and update step without serial reads and writes. However, these works have been limited to low-precision operations, which are not sufficient for typical training workloads. Moreover, they have been confined to a limited set of training algorithms for fully-connected layers only. To address these limitations, we propose a bit-slicing technique for enhancing the precision of ReRAM-based outer products, which is substantially different from bit-slicing for matrix-vector multiplication only. We incorporate this technique into a crossbar architecture with three variants tailored to different training algorithms. To evaluate our design on different types of layers in neural networks (fully-connected, convolutional, etc.) and training algorithms, we develop PANTHER, an ISA-programmable training accelerator with compiler support. Our evaluation shows that PANTHER achieves up to 8.02×, 54.21×, and 103× energy reductions as well as 7.16×, 4.02×, and 16× execution time reductions compared to digital accelerators, ReRAM-based accelerators, and GPUs, respectively.
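The arithmetic identity behind bit-slicing an outer product can be shown with integers: if x = sum_i 2^(k i) x_i and y = sum_j 2^(k j) y_j with k-bit slices, then x y^T = sum_i sum_j 2^(k(i+j)) x_i y_j^T, so a high-precision outer product is assembled from low-precision ones. This Python sketch assumes unsigned 8-bit operands split into four 2-bit slices; it illustrates only the identity, not the PANTHER crossbar design (signed values, analog accumulation and the choice of which operands are sliced are not modeled).

```python
def to_slices(v, bits, num_slices):
    """Split a non-negative integer into num_slices chunks of `bits` bits, LSB first."""
    if not 0 <= v < 1 << (bits * num_slices):
        raise ValueError("value out of range")
    mask = (1 << bits) - 1
    return [(v >> (bits * i)) & mask for i in range(num_slices)]


def outer(x, y):
    """Plain outer product x y^T as a list of rows."""
    return [[a * b for b in y] for a in x]


def bit_sliced_outer(x, y, bits=2, num_slices=4):
    """Assemble x y^T from low-precision slice outer products:
    x y^T = sum_i sum_j 2^(bits*(i+j)) * x_i y_j^T."""
    if not x or not y:
        raise ValueError("x and y must be non-empty")
    # xs[i] holds slice i of every entry of x (likewise for ys)
    xs = list(zip(*(to_slices(a, bits, num_slices) for a in x)))
    ys = list(zip(*(to_slices(b, bits, num_slices) for b in y)))
    result = [[0] * len(y) for _ in x]
    for i in range(num_slices):
        for j in range(num_slices):
            partial = outer(xs[i], ys[j])  # every entry < 2^(2*bits)
            shift = bits * (i + j)
            for r, row in enumerate(partial):
                for c, val in enumerate(row):
                    result[r][c] += val << shift
    return result
```

Each partial product needs only 2 x `bits` bits of precision, which is the property that makes slicing attractive for devices with limited precision per operation.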

Submitted 24 December, 2019; originally announced December 2019.

Comments: 13 pages, 15 figures

arXiv:1911.06363 [pdf, ps, other]

doi 10.1109/RADAR.2019.8835656

Multiple Patients Behavior Detection in Real-time using mmWave Radar and Deep CNNs

Authors: Feng Jin, Renyuan Zhang, Arindam Sengupta, Siyang Cao, Salim Hariri, Nimit K. Agarwal, Sumit K. Agarwal

Abstract: To address potential gaps noted in patient monitoring in the hospital, a novel patient behavior detection system using mmWave radar and a deep convolutional neural network (CNN), which supports the simultaneous recognition of multiple patients' behaviors in real time, is proposed. In this study, we use an mmWave radar to track multiple patients and detect the scattering point cloud of each one. For each patient, the Doppler pattern of the point cloud over a time period is collected as the behavior signature. A three-layer CNN model is created to classify the behavior of each patient. The tracking and point cloud detection algorithm was also implemented on an mmWave radar hardware platform with an embedded graphics processing unit (GPU) board to collect Doppler patterns and run the CNN model. A training dataset of six types of behavior was collected over a long duration to train the model using the Adam optimizer, with the objective of minimizing the cross-entropy loss function. Lastly, the system was tested for real-time operation and obtained a very good inference accuracy when predicting each patient's behavior in a two-patient scenario.

Submitted 14 November, 2019; originally announced November 2019.

Comments: This paper has been submitted to IEEE Radar Conference 2019

arXiv:1910.12028 [pdf, other]

Blood Vessel Detection using Modified Multiscale MF-FDOG Filters for Diabetic Retinopathy

Authors: Debojyoti Mallick, Kundan Kumar, Sumanshu Agarwal

Abstract: Blindness in diabetic patients caused by retinopathy (characterized by an increase in the diameter and new branches of the blood vessels inside the retina) is a grave concern. Many efforts have been made for the early detection of the disease using various image processing techniques on retinal images. However, most of the methods are plagued by false detection of blood vessel pixels. Given that, here, we propose a modified matched filter with the first derivative of Gaussian. The method uses the top-hat transform and contrast limited histogram equalization. Further, we segment the modified multiscale matched filter response by using a binary threshold obtained from the first derivative of Gaussian. The method was assessed on a publicly available database (the DRIVE database). As anticipated, the proposed method provides higher accuracy than methods in the literature. Moreover, fewer false detections than with the existing matched filters and their variants have been observed.

Submitted 26 October, 2019; originally announced October 2019.

Comments: 5 Pages, 7 Figures, ICAML2019

arXiv:1906.01061 [pdf, other]

doi 10.4271/12-02-03-0012

Localization Requirements for Autonomous Vehicles

Authors: Tyler G. R. Reid, Sarah E. Houts, Robert Cammarata, Graham Mills, Siddharth Agarwal, Ankit Vora, Gaurav Pandey

Abstract: Autonomous vehicles require precise knowledge of their position and orientation in all weather and traffic conditions for path planning, perception, control, and general safe operation. Here we derive these requirements for autonomous vehicles based on first principles. We begin with the safety integrity level, defining the allowable probability of failure per hour of operation based on desired improvements on road safety today. This draws comparisons with the localization integrity levels required in aviation and rail where similar numbers are derived at 10^-8 probability of failure per hour of operation. We then define the geometry of the problem, where the aim is to maintain knowledge that the vehicle is within its lane and to determine what road level it is on. Longitudinal, lateral, and vertical localization error bounds (alert limits) and 95% accuracy requirements are derived based on US road geometry standards (lane width, curvature, and vertical clearance) and allowable vehicle dimensions. For passenger vehicles operating on freeway roads, the result is a required lateral error bound of 0.57 m (0.20 m, 95%), a longitudinal bound of 1.40 m (0.48 m, 95%), a vertical bound of 1.30 m (0.43 m, 95%), and an attitude bound in each direction of 1.50 deg (0.51 deg, 95%). On local streets, the road geometry makes requirements more stringent where lateral and longitudinal error bounds of 0.29 m (0.10 m, 95%) are needed with an orientation requirement of 0.50 deg (0.17 deg, 95%).

Submitted 3 June, 2019; originally announced June 2019.

Comments: Under review with the SAE Journal of Connected and Automated Vehicles

Journal ref: SAE Intl. J CAV 2(3):2019

arXiv:1609.05235 [pdf, other]

RFM-SLAM: Exploiting Relative Feature Measurements to Separate Orientation and Position Estimation in SLAM

Authors: Saurav Agarwal, Vikram Shree, Suman Chakravorty

Abstract: The SLAM problem is known to have a special property that when robot orientation is known, estimating the history of robot poses and feature locations can be posed as a standard linear least squares problem. In this work, we develop a SLAM framework that uses relative feature-to-feature measurements to exploit this structural property of SLAM. Relative feature measurements are used to pose a linear estimation problem for pose-to-pose orientation constraints. This is followed by solving an iterative non-linear on-manifold optimization problem to compute the maximum likelihood estimate for robot orientation given relative rotation constraints. Once the robot orientation is computed, we solve a linear problem for robot position and map estimation. Our approach reduces the computational burden of non-linear optimization by posing a smaller optimization problem as compared to standard graph-based methods for feature-based SLAM. Further, empirical results show our method avoids catastrophic failures that arise in existing methods due to using odometry as an initial guess for non-linear optimization, while its accuracy degrades gracefully as sensor noise is increased. We demonstrate our method through extensive simulations and comparisons with an existing state-of-the-art solver.
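The structural property mentioned in this abstract (with orientation known, position and map estimation is a linear least squares problem) can be sketched in 2D. This hypothetical example assumes known headings, odometry displacements and landmark observations expressed in the robot frame, fixes the first pose at the origin, and solves for the remaining poses and landmarks with `numpy.linalg.lstsq`. It covers only that final linear step, not RFM-SLAM's relative feature-to-feature measurements or its orientation estimation.

```python
import numpy as np


def rot(theta):
    """2D rotation matrix for heading theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def solve_positions(headings, odometry, observations, num_landmarks):
    """Linear least-squares poses and landmarks given known headings.

    headings: known heading theta_t for poses t = 0..T
    odometry: pairs (t, u), u = displacement from pose t to t+1 in frame t
    observations: triples (t, k, z), z = landmark k relative to pose t in frame t
    Pose 0 is fixed at the origin to remove the gauge freedom.
    """
    T = len(headings) - 1
    n = 2 * (T + num_landmarks)

    def pose_col(t):  # column of p_t, valid for t >= 1
        return 2 * (t - 1)

    def lm_col(k):  # column of landmark k
        return 2 * (T + k)

    rows, rhs = [], []

    def add(terms, b):
        block = np.zeros((2, n))
        for col, sign in terms:
            block[:, col:col + 2] += sign * np.eye(2)
        rows.append(block)
        rhs.append(b)

    for t, u in odometry:  # p_{t+1} - p_t = R(theta_t) u
        terms = [(pose_col(t + 1), 1.0)] + ([(pose_col(t), -1.0)] if t > 0 else [])
        add(terms, rot(headings[t]) @ np.asarray(u, dtype=float))
    for t, k, z in observations:  # m_k - p_t = R(theta_t) z
        terms = [(lm_col(k), 1.0)] + ([(pose_col(t), -1.0)] if t > 0 else [])
        add(terms, rot(headings[t]) @ np.asarray(z, dtype=float))

    x, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return x[:2 * T].reshape(T, 2), x[2 * T:].reshape(num_landmarks, 2)
```

Once the rotations are fixed, every constraint is linear in the unknown positions, so a single least-squares solve replaces the iterative non-linear optimization that would otherwise be needed.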

Submitted 16 September, 2016; originally announced September 2016.

Comments: 9 pages, submitted to IEEE ICRA 2017

arXiv:1511.04634 [pdf, other]

Motion Planning for Global Localization in Non-Gaussian Belief Spaces

Authors: Saurav Agarwal, Amirhossein Tamjidi, Suman Chakravorty

Abstract: This paper presents a method for motion planning under uncertainty to deal with situations where ambiguous data associations result in a multimodal hypothesis on the robot state. In the global localization problem, sometimes referred to as the "lost or kidnapped robot problem", given little to no a priori pose information, the localization algorithm should recover the correct pose of a mobile robot with respect to a global reference frame. We present a Receding Horizon approach to plan actions that sequentially disambiguate a multimodal belief to achieve tight localization on the correct pose in finite time, i.e., converge to a unimodal belief. Experimental results are presented using a physical ground robot operating in an artificial maze-like environment. We demonstrate two runs wherein the robot is given no a priori information about its initial pose and the planner is tasked to localize the robot.

Submitted 27 February, 2016; v1 submitted 14 November, 2015; originally announced November 2015.

Comments: extends previous submission with updated figures, analysis and justifications. arXiv admin note: text overlap with arXiv:1506.01780

arXiv:1510.07380 [pdf, other]

SLAP: Simultaneous Localization and Planning Under Uncertainty for Physical Mobile Robots via Dynamic Replanning in Belief Space: Extended version

Authors: Ali-akbar Agha-mohammadi, Saurav Agarwal, Sung-Kyun Kim, Suman Chakravorty, Nancy M. Amato

Abstract: Simultaneous Localization and Planning (SLAP) is a crucial ability for an autonomous robot operating under uncertainty. In its most general form, SLAP induces a continuous POMDP (partially-observable Markov decision process), which needs to be repeatedly solved online. This paper addresses this problem and proposes a dynamic replanning scheme in belief space. The underlying POMDP, which is continuous in state, action, and observation space, is approximated offline via sampling-based methods, but operates in a replanning loop online to admit local improvements to the coarse offline policy. This construct enables the proposed method to combat changing environments and large localization errors, even when the change alters the homotopy class of the optimal trajectory. It further outperforms the state-of-the-art FIRM (Feedback-based Information RoadMap) method by eliminating unnecessary stabilization steps. Applying belief space planning to physical systems brings with it a plethora of challenges. A key focus of this paper is to implement the proposed planner on a physical robot and show the SLAP solution performance under uncertainty, in changing environments and in the presence of large disturbances, such as a kidnapped robot situation.

Submitted 12 May, 2018; v1 submitted 26 October, 2015; originally announced October 2015.

Comments: 20 pages, updated figures, extended theory and simulation results

Showing 1–45 of 45 results for author: Agarwal, S