Search | arXiv e-print repository

H-QuEST: Accelerating Query-by-Example Spoken Term Detection with Hierarchical Indexing

Authors: Akanksha Singh, Yi-Ping Phoebe Chen, Vipul Arora

Abstract: Query-by-example spoken term detection (QbE-STD) searches for matching words or phrases in an audio dataset using a sample spoken query. When annotated data is limited or unavailable, QbE-STD is often done using template matching methods like dynamic time warping (DTW), which are computationally expensive and do not scale well. To address this, we propose H-QuEST (Hierarchical Query-by-Example Spoken Term Detection), a novel framework that accelerates spoken term retrieval by utilizing Term Frequency and Inverse Document Frequency (TF-IDF)-based sparse representations obtained through advanced audio representation learning techniques and Hierarchical Navigable Small World (HNSW) indexing with further refinement. Experimental results show that H-QuEST delivers substantial improvements in retrieval speed without sacrificing accuracy compared to existing methods.

Submitted 20 June, 2025; originally announced June 2025.

Journal ref: Interspeech 2025
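To make the TF-IDF retrieval idea concrete, here is a minimal Python sketch. It assumes each audio segment has already been turned into a sequence of discrete unit IDs; the abstract does not specify the representation, so treating unit bigrams as "terms" is a hypothetical choice made here. An exhaustive cosine scan stands in for the HNSW index, and the refinement stage is omitted.

```python
import math
from collections import Counter

def bigrams(units):
    """Turn a sequence of discrete unit IDs into bigram 'terms' (hypothetical choice)."""
    return list(zip(units, units[1:]))

def tfidf(terms, idf):
    """Sparse TF-IDF vector as a dict mapping term -> weight."""
    counts = Counter(terms)
    return {t: (c / len(terms)) * idf.get(t, 0.0) for t, c in counts.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query_units, segments, k=2):
    """Rank segments by cosine similarity; a brute-force stand-in for an HNSW index."""
    docs = [bigrams(s) for s in segments]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF, one common variant (assumed here, not taken from the paper).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [tfidf(d, idf) for d in docs]
    q = tfidf(bigrams(query_units), idf)
    ranked = sorted(range(n), key=lambda i: cosine(q, vecs[i]), reverse=True)
    return ranked[:k]

segments = [[3, 7, 7, 1, 4], [9, 9, 2, 5, 5], [3, 7, 1, 4, 8]]
top = search([7, 1, 4], segments, k=2)
print(top)
```

Segments 0 and 2 both contain the query's unit pattern and are returned; segment 1 scores zero. In a scaled-up system, the sparse vectors would instead be inserted into an approximate nearest-neighbour index such as HNSW so that a query does not need to touch every segment.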

arXiv:2506.14059 [pdf, ps, other]

A Stochastic Differential Equation Framework for Modeling Queue Length Dynamics Inspired by Self-Similarity

Authors: Shakib Mustavee, Shaurya Agarwal, Arvind Singh

Abstract: This article develops a stochastic differential equation (SDE) for modeling the temporal evolution of queue length dynamics at signalized intersections. Inspired by the observed quasiperiodic and self-similar characteristics of the queue length dynamics, the proposed model incorporates three properties into the SDE: (i) mean reversion with periodic mean, (ii) multiplicative noise, and (iii) fractional Brownian motion. It replicates key statistical features observed in real data, including the probability distribution function (PDF) and power spectral density (PSD) of queue lengths. To our knowledge, this is the first equation-based model for queue dynamics. The proposed approach offers a transparent, data-consistent framework that may help inform and enhance the design of black-box learning algorithms with underlying traffic physics.

Submitted 16 June, 2025; originally announced June 2025.
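The three ingredients listed in the abstract can be combined in an Euler-type discretization. The sketch below assumes one plausible form, dQ = θ(μ(t) − Q)dt + σQ dB^H with a sinusoidal mean; the abstract does not give the exact equation, and every parameter value (cycle length, θ, σ, Hurst exponent) is hypothetical. Fractional Gaussian noise is generated exactly via a Cholesky factor of its covariance, and clamping at zero is an added assumption reflecting that queues cannot be negative.

```python
import numpy as np

def fgn(n, hurst, dt, rng):
    """Fractional Gaussian noise increments via Cholesky of the exact covariance."""
    k = np.arange(n)
    gamma = 0.5 * (np.abs(k + 1) ** (2 * hurst) - 2 * np.abs(k) ** (2 * hurst)
                   + np.abs(k - 1) ** (2 * hurst))
    cov = gamma[np.abs(k[:, None] - k[None, :])]       # Toeplitz covariance matrix
    chol = np.linalg.cholesky(cov)
    return (chol @ rng.standard_normal(n)) * dt ** hurst  # increments over a step dt

def simulate_queue(n=600, dt=1.0, theta=0.2, cycle=60.0, base=10.0, amp=8.0,
                   sigma=0.05, hurst=0.7, q0=10.0, seed=0):
    """Euler scheme for a mean-reverting SDE with periodic mean and multiplicative fBm noise."""
    rng = np.random.default_rng(seed)
    db = fgn(n, hurst, dt, rng)
    q = np.empty(n + 1)
    q[0] = q0
    for i in range(n):
        t = i * dt
        mu = base + amp * np.sin(2 * np.pi * t / cycle)  # periodic mean (e.g. signal cycle)
        drift = theta * (mu - q[i]) * dt                 # mean reversion
        diffusion = sigma * q[i] * db[i]                 # multiplicative noise
        q[i + 1] = max(q[i] + drift + diffusion, 0.0)    # queues cannot be negative
    return q

queue = simulate_queue()
print(queue[:5].round(2))
```

A Hurst exponent above 0.5 gives positively correlated increments, which is one way to encode the self-similarity the abstract refers to; setting `hurst=0.5` recovers ordinary Brownian increments.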

arXiv:2506.10832 [pdf, ps, other]

A novel visual data-based diagnostic approach for estimation of regime transition in pool boiling

Authors: Pranay Nirapure, Ayushman Singh, Srikanth Rangarajan, Bahgat Sammakia

Abstract: This study introduces a novel metric, the Index of Visual Similarity (IVS), to qualitatively characterize boiling heat transfer regimes using only visual data. The IVS is constructed by combining morphological similarity, through SIFT-based feature matching, with physical similarity, via vapor area estimation using Mask R-CNN. High-speed images of pool boiling on two distinct surfaces, polished copper and porous copper foam, are employed to demonstrate the generalizability of the approach. IVS captures critical changes in bubble shape, size, and distribution that correspond to transitions in heat transfer mechanisms. The metric is validated against an equivalent metric, $Φ$, derived from measured heat transfer coefficients (HTC), showing strong correlation and reliability in detecting boiling regime transitions, including the onset of nucleate boiling and proximity to critical heat flux (CHF). Given experimental limitations in precisely measuring changes in HTC, the sensitivity of IVS to surface superheat is also examined to reinforce the credibility of IVS. IVS thus emerges as a powerful, rapid, and non-intrusive tool for real-time, image-based boiling diagnostics, with promising applications in phase change heat transfer.

Submitted 12 June, 2025; originally announced June 2025.

arXiv:2505.19839 [pdf, ps, other]

Chance-constrained Solar PV Hosting Capacity Assessment for Distribution Grids Using Gaussian Process and Logit Learning

Authors: Sel Ly, Anshuman Singh, Petr Vorobev, Yeng Chai Soh, Hung Dinh Nguyen

Abstract: Growing penetration of distributed generation such as solar PV can increase the risk of over-voltage in distribution grids, affecting network security. Therefore, assessing the so-called PV hosting capacity (HC), the maximum amount of PV that a given grid can accommodate, becomes an important practical problem. In this paper, we propose a novel chance-constrained HC estimation framework using Gaussian Process and Logit learning that can account for uncertainty and risk management. We also consider the assessment of HC under different voltage control strategies. Our results demonstrate that the proposed models achieve accuracy of up to 93% in predicting nodal over-voltage events on the IEEE 33-bus and 123-bus test cases. Thus, these models can be effectively employed to estimate the chance-constrained HC at various risk levels. Moreover, the proposed methods have simple forms and low computational cost, requiring only a few seconds.

Submitted 26 May, 2025; originally announced May 2025.
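The chance-constrained idea can be illustrated with the logit part alone. In this minimal sketch (the Gaussian Process model is not shown), a one-dimensional logistic model p(over-voltage | x) = sigmoid(w·x + b) is fitted to synthetic, hypothetical data, and the hosting capacity at risk level ε is the largest PV level x with p ≤ ε, which has the closed form x* = (logit(ε) − b)/w when w > 0.

```python
import numpy as np

def fit_logistic(x, y, iters=25):
    """Fit p(over-voltage | x) = sigmoid(w * x + b) by Newton's method (IRLS)."""
    X = np.column_stack([x, np.ones_like(x)])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                         # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # negative Hessian of the log-likelihood
        beta = beta + np.linalg.solve(hess, grad)
    return beta[0], beta[1]

def chance_constrained_hc(w, b, eps):
    """Largest PV level x with sigmoid(w * x + b) <= eps; assumes w > 0."""
    return (np.log(eps / (1.0 - eps)) - b) / w

# Hypothetical synthetic data: PV injection (MW) vs. observed over-voltage events.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, 2000)
y = (rng.uniform(size=x.size) < 1.0 / (1.0 + np.exp(-(3.0 * x - 9.0)))).astype(float)
w, b = fit_logistic(x, y)
hc = chance_constrained_hc(w, b, eps=0.05)
print(f"HC at 5% risk: {hc:.2f} MW")
```

Lowering ε shifts the estimated capacity down, which is how a single fitted model yields HC values at several risk levels.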

arXiv:2505.08693 [pdf, ps, other]

VIViT: Variable-Input Vision Transformer Framework for 3D MR Image Segmentation

Authors: Badhan Kumar Das, Ajay Singh, Gengyan Zhao, Han Liu, Thomas J. Re, Dorin Comaniciu, Eli Gibson, Andreas Maier

Abstract: Self-supervised pretraining techniques have been widely used to improve performance on downstream tasks. However, real-world magnetic resonance (MR) studies usually consist of different sets of contrasts due to different acquisition protocols. This poses challenges for current deep learning methods, both for large-scale pretraining and for downstream tasks with different input requirements, since these methods typically require a fixed set of input modalities, or contrasts. To address this challenge, we propose variable-input ViT (VIViT), a transformer-based framework designed for self-supervised pretraining and segmentation finetuning with a variable set of contrasts in each study. With this ability, our approach can maximize data availability during pretraining and can transfer the learned knowledge from pretraining to downstream tasks despite variations in input requirements. We validate our method on brain infarct and brain tumor segmentation, where it outperforms current CNN- and ViT-based models with mean Dice scores of 0.624 and 0.883, respectively. These results highlight the efficacy of our design for better adaptability and performance on tasks with real-world heterogeneous MR data.

Submitted 14 June, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

Comments: 9 pages
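The Dice score reported above measures overlap between a predicted and a reference segmentation mask, Dice = 2|A ∩ B| / (|A| + |B|). A minimal NumPy sketch follows; averaging per-study scores to obtain a "mean Dice" is one common convention and is assumed here rather than taken from the paper.

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks of any shape (2-D or 3-D)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    # eps keeps the score defined (and equal to 1) when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)

def mean_dice(preds, targets):
    """Average per-study Dice (assumed convention)."""
    return float(np.mean([dice_score(p, t) for p, t in zip(preds, targets)]))

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_score(pred, target))  # 2*2 / (3 + 3), about 0.667
```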

arXiv:2505.03695 [pdf, other]

Frenet Corridor Planner: An Optimal Local Path Planning Framework for Autonomous Driving

Authors: Faizan M. Tariq, Zheng-Hang Yeh, Avinash Singh, David Isele, Sangjae Bae

Abstract: Motivated by the requirements for effectiveness and efficiency, path-speed decomposition-based trajectory planning methods have been widely adopted for autonomous driving applications. While a global route can be pre-computed offline, real-time generation of adaptive local paths remains crucial. Therefore, we present the Frenet Corridor Planner (FCP), an optimization-based local path planning strategy for autonomous driving that ensures smooth and safe navigation around obstacles. Modeling the vehicles as safety-augmented bounding boxes and pedestrians as convex hulls in the Frenet space, our approach defines a drivable corridor by determining the appropriate deviation side for static obstacles. Thereafter, a modified space-domain bicycle kinematics model enables path optimization for smoothness, boundary clearance, and dynamic obstacle risk minimization. The optimized path is then passed to a speed planner to generate the final trajectory. We validate FCP through extensive simulations and real-world hardware experiments, demonstrating its efficiency and effectiveness.

Submitted 6 May, 2025; originally announced May 2025.

Comments: 8 pages, 10 figures - Presented at 2025 IEEE 36th Intelligent Vehicles Symposium (IV)

arXiv:2505.02529 [pdf, other]

RobSurv: Vector Quantization-Based Multi-Modal Learning for Robust Cancer Survival Prediction

Authors: Aiman Farooq, Azad Singh, Deepak Mishra, Santanu Chaudhury

Abstract: Cancer survival prediction using multi-modal medical imaging presents a critical challenge in oncology, mainly due to the vulnerability of deep learning models to noise and protocol variations across imaging centers. Current approaches struggle to extract consistent features from heterogeneous CT and PET images, limiting their clinical applicability. We address these challenges by introducing RobSurv, a robust deep-learning framework that leverages vector quantization for resilient multi-modal feature learning. The key innovation of our approach lies in its dual-path architecture: one path maps continuous imaging features to learned discrete codebooks for noise-resistant representation, while the parallel path preserves fine-grained details through continuous feature processing. This dual representation is integrated through a novel patch-wise fusion mechanism that maintains local spatial relationships while capturing global context via Transformer-based processing. In extensive evaluations across three diverse datasets (HECKTOR, H&N1, and NSCLC Radiogenomics), RobSurv demonstrates superior performance, achieving concordance indices of 0.771, 0.742, and 0.734, respectively, and significantly outperforming existing methods. Most notably, our model maintains robust performance even under severe noise conditions, with performance degradation of only 3.8-4.5% compared to 8-12% in baseline methods. These results, combined with strong generalization across different cancer types and imaging protocols, establish RobSurv as a promising solution for reliable clinical prognosis that can enhance treatment planning and patient care.

Submitted 5 May, 2025; originally announced May 2025.
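The discrete path described above maps continuous features onto a learned codebook. A minimal NumPy sketch of that nearest-codebook lookup is below, assuming a fixed, hypothetical codebook; training details such as codebook updates or gradient estimators are omitted, as is the parallel continuous path. It shows why quantization can absorb small perturbations: noise that does not cross a decision boundary leaves the output unchanged.

```python
import numpy as np

def vector_quantize(features, codebook):
    """Replace each feature vector by its nearest codebook entry (squared L2 distance).

    features: (N, D) continuous features; codebook: (K, D) code vectors.
    Returns the quantized features (N, D) and the chosen code indices (N,).
    """
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx

codebook = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])  # hypothetical codebook
clean = np.array([[0.2, -0.1], [4.8, 5.3], [-5.1, 4.7]])
noisy = clean + np.random.default_rng(0).normal(0.0, 0.3, clean.shape)

q_clean, idx_clean = vector_quantize(clean, codebook)
q_noisy, idx_noisy = vector_quantize(noisy, codebook)
print(idx_clean, idx_noisy)
```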

arXiv:2505.01670 [pdf, other]

Efficient Multi Subject Visual Reconstruction from fMRI Using Aligned Representations

Authors: Christos Zangos, Danish Ebadulla, Thomas Christopher Sprague, Ambuj Singh

Abstract: This work introduces a novel approach to fMRI-based visual image reconstruction using a subject-agnostic common representation space. We show that the brain signals of the subjects can be aligned in this common space during training to form a semantically aligned common brain. This is leveraged to demonstrate that aligning subject-specific lightweight modules to a reference subject is significantly more efficient than traditional end-to-end training methods. Our approach excels in low-data scenarios. We evaluate our methods on different datasets, demonstrating that the common space is subject- and dataset-agnostic.

Submitted 2 May, 2025; originally announced May 2025.
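As a rough illustration of aligning a new subject to a reference subject, the sketch below fits a linear ridge-regression map from one subject's fMRI features to the reference subject's common-space embeddings for shared stimuli. The linear form is a hypothetical stand-in for the paper's lightweight modules, and the data are synthetic. Fitting such a module is a single closed-form solve, which hints at why adapting a small per-subject module can be much cheaper than end-to-end training.

```python
import numpy as np

def fit_linear_adapter(x_subj, z_ref, lam=1e-3):
    """Ridge regression W = (X^T X + lam I)^{-1} X^T Z mapping one subject's
    fMRI features (N, V) onto the reference subject's embeddings (N, D)."""
    v = x_subj.shape[1]
    gram = x_subj.T @ x_subj + lam * np.eye(v)
    return np.linalg.solve(gram, x_subj.T @ z_ref)

# Hypothetical data: 200 shared stimuli, 30 voxels for the new subject, 8-D common space.
rng = np.random.default_rng(0)
x_new = rng.standard_normal((200, 30))
w_true = rng.standard_normal((30, 8))
z_ref = x_new @ w_true  # reference-subject embeddings for the same stimuli
w = fit_linear_adapter(x_new, z_ref, lam=1e-6)
print(np.abs(w - w_true).max())
```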

arXiv:2504.11045 [pdf, other]

Neural Control Barrier Functions from Physics Informed Neural Networks

Authors: Shreenabh Agrawal, Manan Tayal, Aditya Singh, Shishir Kolathaya

Abstract: As autonomous systems become increasingly prevalent in daily life, ensuring their safety is paramount. Control Barrier Functions (CBFs) have emerged as an effective tool for guaranteeing safety; however, manually designing them for specific applications remains a significant challenge. With the advent of deep learning techniques, recent research has explored synthesizing CBFs using neural networks, commonly referred to as neural CBFs. This paper introduces a novel class of neural CBFs that leverages a physics-inspired neural network framework by incorporating Zubov's Partial Differential Equation (PDE) within the context of safety. This approach provides a scalable methodology for synthesizing neural CBFs applicable to high-dimensional systems. Furthermore, by utilizing reciprocal CBFs instead of zeroing CBFs, the proposed framework allows for the specification of flexible, user-defined safe regions. To validate the effectiveness of the approach, we present case studies on three different systems: an inverted pendulum, autonomous ground navigation, and aerial navigation in obstacle-laden environments.

Submitted 15 April, 2025; originally announced April 2025.

Comments: 8 pages, 5 figures
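To show the role a CBF plays as a safety filter, here is a minimal sketch using a hand-designed zeroing CBF for a single-integrator robot and a known circular obstacle. It is not the paper's learned reciprocal neural CBF; the dynamics, obstacle, and gain α are hypothetical. With one constraint, the usual CBF quadratic program reduces to a closed-form projection onto a half-space.

```python
import numpy as np

def cbf_filter(x, u_ref, center, radius, alpha=1.0):
    """Minimally modify u_ref so that h(x) = |x - c|^2 - r^2 satisfies
    dh/dt + alpha * h >= 0 for the single integrator x_dot = u
    (closed-form solution of the one-constraint CBF quadratic program)."""
    h = float(np.dot(x - center, x - center) - radius ** 2)
    a = 2.0 * (x - center)               # gradient of h, so dh/dt = a @ u
    violation = -alpha * h - a @ u_ref   # > 0 means u_ref breaks the condition
    if violation <= 0.0:
        return u_ref
    return u_ref + (violation / (a @ a)) * a  # project onto a @ u >= -alpha * h

# Drive from (-2, 0.1) toward (2, 0) past a unit-radius obstacle at the origin.
center, radius = np.zeros(2), 1.0
x, goal, dt = np.array([-2.0, 0.1]), np.array([2.0, 0.0]), 0.01
min_h = np.inf
for _ in range(2000):
    u = cbf_filter(x, goal - x, center, radius)
    x = x + dt * u
    min_h = min(min_h, float(np.dot(x - center, x - center) - radius ** 2))
print(f"minimum barrier value along the path: {min_h:.3f}")
```

Because h stays non-negative along the path, the robot never enters the obstacle even though the nominal controller points straight through it.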

arXiv:2504.10962 [pdf, ps, other]

doi 10.1109/LRA.2025.3568323

$π$-MPPI: A Projection-based Model Predictive Path Integral Scheme for Smooth Optimal Control of Fixed-Wing Aerial Vehicles

Authors: Edvin Martin Andrejev, Amith Manoharan, Karl-Eerik Unt, Arun Kumar Singh

Abstract: Model Predictive Path Integral (MPPI) is a popular sampling-based Model Predictive Control (MPC) algorithm for nonlinear systems. It optimizes trajectories by sampling control sequences and averaging them. However, a key issue with MPPI is the non-smoothness of the optimal control sequence, leading to oscillations in systems like fixed-wing aerial vehicles (FWVs). Existing solutions use post-hoc smoothing, which fails to bound control derivatives. This paper introduces a new approach: we add a projection filter $π$ to minimally correct control samples, ensuring bounds on control magnitude and higher-order derivatives. The filtered samples are then averaged using MPPI, leading to our $π$-MPPI approach. We minimize computational overhead by using a neural accelerated custom optimizer for the projection filter. $π$-MPPI offers a simple way to achieve arbitrary smoothness in control sequences. While we focus on FWVs, this projection filter can be integrated into any MPPI pipeline. Applied to FWVs, $π$-MPPI is easier to tune than the baseline, resulting in smoother, more robust performance.

Submitted 16 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

Comments: 8 pages, 4 figures, submitted to IEEE RA-L

Journal ref: IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 10, NO. 6, JUNE 2025
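The sample, filter, and average structure described above can be sketched in a few lines of NumPy. In this minimal version the projection filter is replaced by its simplest special case, the Euclidean projection onto a magnitude box (clipping); unlike the paper's filter, it does not bound derivatives and uses no neural accelerated optimizer. The cost function, horizon, noise level, and temperature λ are hypothetical, and the control-cost term and receding-horizon shift of full MPPI are omitted.

```python
import numpy as np

def mppi_update(u_nominal, cost_fn, n_samples=256, noise_std=0.5, lam=1.0,
                u_max=1.0, seed=0):
    """One MPPI update: perturb, filter, score, and exponentially weight-average."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_std, size=(n_samples, u_nominal.size))
    samples = u_nominal[None, :] + noise
    # Stand-in projection filter: Euclidean projection onto the box |u| <= u_max
    # (i.e. clipping). It bounds magnitudes only, not derivatives.
    samples = np.clip(samples, -u_max, u_max)
    costs = np.array([cost_fn(s) for s in samples])
    weights = np.exp(-(costs - costs.min()) / lam)  # subtract min for numerical stability
    weights /= weights.sum()
    return weights @ samples  # convex combination, so the box bound still holds

def track_cost(u):
    """Hypothetical task: track a constant command of 0.8 over the horizon."""
    return float(np.sum((u - 0.8) ** 2))

u0 = np.zeros(20)
u1 = mppi_update(u0, track_cost)
print(u1.round(2))
```

Filtering before averaging matters: since every filtered sample satisfies the bound and MPPI returns a convex combination of samples, the averaged control inherits the bound, which post-hoc smoothing of an unconstrained average does not guarantee.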

arXiv:2504.04532 [pdf, ps, other]

BrainMRDiff: A Diffusion Model for Anatomically Consistent Brain MRI Synthesis

Authors: Moinak Bhattacharya, Saumya Gupta, Annie Singh, Chao Chen, Gagandeep Singh, Prateek Prasanna

Abstract: Accurate brain tumor diagnosis relies on the assessment of multiple Magnetic Resonance Imaging (MRI) sequences. However, in clinical practice, the acquisition of certain sequences may be affected by factors like motion artifacts or contrast agent contraindications, leading to suboptimal outcomes such as poor image quality. This can then affect image interpretation by radiologists. Synthesizing high-quality MRI sequences has thus become a critical research focus. Though recent advancements in controllable generative AI have facilitated the synthesis of diagnostic-quality MRI, ensuring anatomical accuracy remains a significant challenge. Preserving critical structural relationships between different anatomical regions is essential, as even minor structural or topological inconsistencies can compromise diagnostic validity. In this work, we propose BrainMRDiff, a novel topology-preserving, anatomy-guided diffusion model for synthesizing brain MRI, leveraging brain and tumor anatomies as conditioning inputs. To achieve this, we introduce two key modules: Tumor+Structure Aggregation (TSA) and Topology-Guided Anatomy Preservation (TGAP). TSA integrates diverse anatomical structures with tumor information, forming a comprehensive conditioning mechanism for the diffusion process. TGAP enforces topological consistency during the reverse denoising diffusion process; together, these modules ensure that the generated image respects anatomical integrity. Experimental results demonstrate that BrainMRDiff surpasses existing baselines, achieving performance improvements of 23.33% on the BraTS-AG dataset and 33.33% on the BraTS-Met dataset. Code will be made publicly available soon.

Submitted 29 May, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

arXiv:2503.17395 [pdf, other]

CP-NCBF: A Conformal Prediction-based Approach to Synthesize Verified Neural Control Barrier Functions

Authors: Manan Tayal, Aditya Singh, Pushpak Jagtap, Shishir Kolathaya

Abstract: Control Barrier Functions (CBFs) are a practical approach for designing safety-critical controllers, but constructing them for arbitrary nonlinear dynamical systems remains a challenge. Recent efforts have explored learning-based methods, such as neural CBFs (NCBFs), to address this issue. However, ensuring the validity of NCBFs is difficult due to potential learning errors. In this letter, we propose a novel framework, referred to as CP-NCBF, that leverages split-conformal prediction to generate formally verified neural CBFs with probabilistic guarantees based on a user-defined error rate. Unlike existing methods that impose Lipschitz constraints on neural CBFs, which leads to scalability limitations and overly conservative safe sets, our approach is sample-efficient, scalable, and results in less restrictive safety regions. We validate our framework through case studies on obstacle avoidance in autonomous driving and geo-fencing of aerial vehicles, demonstrating its ability to generate larger and less conservative safe sets compared to conventional techniques.

Submitted 17 May, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

Comments: 17 Pages, 10 Figures. First two authors have contributed equally
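The core of split-conformal prediction is a rank-based quantile of calibration scores: with n scores and user-defined error rate ε, the threshold is the ⌈(n+1)(1−ε)⌉-th smallest score, and a fresh exchangeable score falls below it with probability at least 1−ε. The minimal sketch below shows only that step. The choice of score (for instance, a gap between a learned barrier value and a reference safety value on held-out states) is a hypothetical example, and how CP-NCBF turns the threshold into a verified safe set is not reproduced here.

```python
import math
import numpy as np

def conformal_threshold(scores, eps):
    """Split-conformal quantile: for a fresh score exchangeable with the n
    calibration scores, P(score <= q) >= 1 - eps."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - eps))  # rank of the conformal quantile
    if k > n:
        return math.inf                 # too few calibration samples for this eps
    return float(np.sort(scores)[k - 1])

# Hypothetical nonconformity scores on held-out calibration states.
rng = np.random.default_rng(0)
calib = np.abs(rng.normal(0.0, 0.1, 1000))
q = conformal_threshold(calib, eps=0.05)
fresh = np.abs(rng.normal(0.0, 0.1, 100_000))
print(q, np.mean(fresh <= q))
```

The guarantee needs no Lipschitz constant of the learned function, only exchangeability of calibration and test samples, which is consistent with the abstract's contrast against Lipschitz-constrained approaches.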

arXiv:2503.16798 [pdf, other]

A Pathway to Near Tissue Computing through Processing-in-CTIA Pixels for Biomedical Applications

Authors: Zihan Yin, Subhradip Chakraborty, Ankur Singh, Chengwei Zhou, Gourav Datta, Akhilesh Jaiswal

Abstract: Near-tissue computing requires sensor-level processing of high-resolution images, essential for real-time biomedical diagnostics and surgical guidance. To address this need, we introduce a novel Capacitive Transimpedance Amplifier-based In-Pixel Computing (CTIA-IPC) architecture. Our design leverages CTIA pixels that are widely used for biomedical imaging owing to the inherent advantages of excellent linearity, low noise, and robust operation under low-light conditions. We augment CTIA pixels with IPC to enable precise deep learning computations, including multi-channel, multi-bit convolution operations along with integrated batch normalization (BN) and Rectified Linear Unit (ReLU) functionalities in the peripheral analog-to-digital converters (ADCs). This design improves the linearity of Multiply and Accumulate (MAC) operations while enhancing computational efficiency. Leveraging 3D integration to embed pixel circuitry and weight storage, CTIA-IPC maintains pixel density comparable to standard CTIA designs. Moreover, our algorithm-circuit co-design approach enables efficient real-time diagnostics and AI-driven medical analysis. Evaluated on the EndoVis tissue dataset (1280x1024), CTIA-IPC achieves an approximately 12x reduction in data bandwidth, yielding segmentation IoUs of 75.91% (parts) and 28.58% (instrument), a minimal accuracy reduction (1.3%-2.5%) compared to baseline methods. Achieving 1.98 GOPS throughput and 3.39 GOPS/W efficiency, our CTIA-IPC architecture offers a promising computational framework tailored specifically for biomedical near-tissue computing.

Submitted 20 March, 2025; originally announced March 2025.

arXiv:2502.20636 [pdf, ps, other]

Delayed-Decision Motion Planning in the Presence of Multiple Predictions

Authors: David Isele, Alexandre Miranda Anon, Faizan M. Tariq, Goro Yeh, Avinash Singh, Sangjae Bae

Abstract: Reliable automated driving technology is challenged by various sources of uncertainty, in particular, behavioral uncertainty of traffic agents. It is common for traffic agents to have intentions that are unknown to others, leaving an automated driving car to reason over multiple possible behaviors. This paper formalizes a behavior planning scheme in the presence of multiple possible futures with corresponding probabilities. We present a maximum entropy formulation and show how, under certain assumptions, this allows delayed decision-making to improve safety. The general formulation is then turned into a model predictive control formulation, which is solved as a quadratic program or a set of quadratic programs. We discuss implementation details for improving computation and verify operation in simulation and on a mobile robot.

Submitted 6 June, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

arXiv:2502.11057 [pdf, other]

A Physics-Informed Machine Learning Framework for Safe and Optimal Control of Autonomous Systems

Authors: Manan Tayal, Aditya Singh, Shishir Kolathaya, Somil Bansal

Abstract: As autonomous systems become more ubiquitous in daily life, ensuring high performance with guaranteed safety is crucial. However, safety and performance could be competing objectives, which makes their co-optimization difficult. Learning-based methods, such as Constrained Reinforcement Learning (CRL), achieve strong performance but lack formal safety guarantees due to safety being enforced as soft constraints, limiting their use in safety-critical settings. Conversely, formal methods such as Hamilton-Jacobi (HJ) Reachability Analysis and Control Barrier Functions (CBFs) provide rigorous safety assurances but often neglect performance, resulting in overly conservative controllers. To bridge this gap, we formulate the co-optimization of safety and performance as a state-constrained optimal control problem, where performance objectives are encoded via a cost function and safety requirements are imposed as state constraints. We demonstrate that the resultant value function satisfies a Hamilton-Jacobi-Bellman (HJB) equation, which we approximate efficiently using a novel physics-informed machine learning framework. In addition, we introduce a conformal prediction-based verification strategy to quantify the learning errors, recovering a high-confidence safety value function, along with a probabilistic error bound on performance degradation. Through several case studies, we demonstrate the efficacy of the proposed framework in enabling scalable learning of safe and performant controllers for complex, high-dimensional autonomous systems.

Submitted 28 May, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

Comments: 22 Pages, 12 Figures. First two authors have contributed equally. Accepted at ICML 2025

arXiv:2501.08058 [pdf, other]

Range-Only Dynamic Output Feedback Controller for Safe and Secure Target Circumnavigation

Authors: Anand Singh, Anoop Jain

Abstract: The safety and security of robotic systems are paramount when navigating around a hostile target. This paper addresses the problem of circumnavigating an unknown target by a unicycle robot while ensuring it maintains a desired safe distance and remains within the sensing region around the target throughout its motion. The proposed control design methodology is based on the construction of a joint Lyapunov function that incorporates: (i) a quadratic potential function characterizing the desired target-circumnavigation objective, and (ii) a barrier Lyapunov function-based potential term to enforce safety and sensing constraints on the robot's motion. A notable feature of the proposed control design is its reliance exclusively on local range measurements between the robot and the target, realized using a dynamic output feedback controller that treats the range as the only observable output for feedback. Using the Lyapunov stability theory, we show that the desired equilibrium of the closed-loop system is asymptotically stable, and the prescribed safety and security constraints are met under the proposed controllers. We also obtain restrictive bounds on the post-design signals and provide both simulation and experimental results to validate the theoretical contributions.

Submitted 14 January, 2025; originally announced January 2025.

arXiv:2501.07197 [pdf]

Lung Cancer detection using Deep Learning

Authors: Aryan Chaudhari, Ankush Singh, Sanchi Gajbhiye, Pratham Agrawal

Abstract: In this paper, we discuss lung cancer detection using a hybrid model of Convolutional Neural Networks (CNNs) and Support Vector Machines (SVMs) to enable early detection of tumors, benign or malignant. The hybrid model is trained on a dataset of Computed Tomography (CT) scans. Using deep learning to detect lung cancer early is a cutting-edge approach.

Submitted 13 January, 2025; originally announced January 2025.

arXiv:2501.03765 [pdf, other]

Image Segmentation: Inducing graph-based learning

Authors: Aryan Singh, Pepijn Van de Ven, Ciarán Eising, Patrick Denny

Abstract: This study explores the potential of graph neural networks (GNNs) to enhance semantic segmentation across diverse image modalities. We evaluate the effectiveness of a novel GNN-based U-Net architecture on three distinct datasets: PascalVOC, a standard benchmark for natural image segmentation; WoodScape, a challenging dataset of fisheye images commonly used in autonomous driving, which introduces significant geometric distortions; and ISIC2016, a dataset of dermoscopic images for skin lesion segmentation. We compare our proposed UNet-GNN model against established convolutional neural network (CNN)-based segmentation models, including U-Net and U-Net++, as well as the transformer-based SwinUNet. Unlike these methods, which primarily rely on local convolutional operations or global self-attention, GNNs explicitly model relationships between image regions by constructing and operating on a graph representation of the image features. This approach allows the model to capture long-range dependencies and complex spatial relationships, which we hypothesize will be particularly beneficial for handling geometric distortions present in fisheye imagery and capturing intricate boundaries in medical images. Our analysis demonstrates the versatility of GNNs in addressing diverse segmentation challenges and highlights their potential to improve segmentation accuracy in various applications, including autonomous driving and medical image analysis.

Submitted 19 January, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

arXiv:2412.05216 [pdf, other]

ColonNet: A Hybrid Of DenseNet121 And U-NET Model For Detection And Segmentation Of GI Bleeding

Authors: Ayushman Singh, Sharad Prakash, Aniket Das, Nidhi Kushwaha

Abstract: This study presents an integrated deep learning model for automatic detection and classification of gastrointestinal bleeding in frames extracted from Wireless Capsule Endoscopy (WCE) videos. The dataset was released as part of the Auto-WCBleedGen Challenge Version V2 hosted by the MISAHUB team. Our model attained the highest performance among the 75 teams that took part in this competition. It efficiently utilizes CNN-based models, i.e., DenseNet and UNet, to detect and segment bleeding and non-bleeding areas in a complex real-world dataset. The model achieves an overall accuracy of 80%, which could help a skilled doctor carry out further diagnostics.

Submitted 6 December, 2024; originally announced December 2024.

arXiv:2411.14100 [pdf, other]

BEST-STD: Bidirectional Mamba-Enhanced Speech Tokenization for Spoken Term Detection

Authors: Anup Singh, Kris Demuynck, Vipul Arora

Abstract: Spoken term detection (STD) is often hindered by reliance on frame-level features and computationally intensive DTW-based template matching, limiting its practicality. To address these challenges, we propose a novel approach that encodes speech into discrete, speaker-agnostic semantic tokens. This facilitates fast retrieval using text-based search algorithms and effectively handles out-of-vocabulary terms. Our approach focuses on generating consistent token sequences across varying utterances of the same term. We also propose bidirectional state space modeling within the Mamba encoder, trained in a self-supervised learning framework, to learn contextual frame-level features that are further encoded into discrete tokens. Our analysis shows that our speech tokens exhibit greater speaker invariance than those from existing tokenizers, making them more suitable for STD tasks. Empirical evaluation on the LibriSpeech and TIMIT databases indicates that our method outperforms existing STD baselines while being more efficient.
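The key idea in this abstract is that once speech is mapped to discrete tokens, spotting a term becomes a text-search problem. A minimal sketch of that idea, assuming the utterances are already tokenized, indexes token trigrams in an inverted index and ranks utterances by the number of query trigrams they share. The token values, the trigram length, and the helper names `ngrams`, `build_index`, and `search` are hypothetical; this is a generic text-retrieval technique, not the search procedure used in BEST-STD.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(docs, n=3):
    """Inverted index: map each token n-gram to the ids of utterances containing it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for gram in ngrams(tokens, n):
            index[gram].add(doc_id)
    return index


def search(index, query_tokens, n=3):
    """Rank utterances by how many distinct query n-grams they contain."""
    scores = Counter()
    for gram in set(ngrams(query_tokens, n)):
        for doc_id in index.get(gram, ()):
            scores[doc_id] += 1
    return scores.most_common()


# Hypothetical discrete speech tokens for two utterances and one spoken query.
docs = {"utt1": [5, 17, 17, 42, 8, 3, 99], "utt2": [1, 2, 3, 4, 5, 6]}
index = build_index(docs)
print(search(index, [17, 42, 8, 3]))
```

Exact n-gram matching is the simplest choice; it only works well if the tokenizer emits consistent sequences for different utterances of the same term, which is the property the abstract emphasizes.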

Submitted 21 December, 2024; v1 submitted 21 November, 2024; originally announced November 2024.

Comments: Accepted at ICASSP 2025

arXiv:2411.12681 [pdf]

AI Guided Early Screening of Cervical Cancer

Authors: Dharanidharan S I, Suhitha Renuka S V, Ajishi Singh, Sheena Christabel Pravin

Abstract: To support the creation of reliable machine learning models for anomaly detection, this project focuses on preprocessing, enhancing, and organizing a medical imaging dataset. The dataset has two classes, normal and abnormal, along with additional noise variations. To improve image quality, undesirable artifacts, including visible medical equipment at the edges, were eliminated using central cropping. Additional preprocessing steps included brightness and contrast adjustment, after which the data were normalized. To simplify classification, the dataset was methodically reorganized by combining several image subsets into two primary categories: normal and pathological. To provide a robust training set that adapts well to real-world situations, advanced image preprocessing techniques were used, such as contrast enhancement and real-time augmentation (including rotations, zooms, and brightness modifications). To ensure efficient model evaluation, the data were subsequently divided into training and testing subsets. This thorough approach ensures high-quality input data for building precise and effective machine learning models for medical anomaly detection. The pipeline's flexible and scalable design allows it to be easily integrated with larger clinical decision-support systems.
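The central cropping and normalization steps described above can be sketched with NumPy. The crop fraction of 0.8 and min-max scaling to [0, 1] are assumptions for illustration; the abstract states neither the crop size nor the normalization scheme, and the helper names are hypothetical.

```python
import numpy as np


def center_crop(image, fraction=0.8):
    """Keep the central `fraction` of the height and width, discarding the borders."""
    h, w = image.shape[:2]
    ch, cw = int(round(h * fraction)), int(round(w * fraction))
    top, left = (h - ch) // 2, (w - cw) // 2
    return image[top:top + ch, left:left + cw]


def min_max_normalize(image):
    """Rescale pixel intensities linearly to the range [0, 1]."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


# Hypothetical 8-bit RGB frame standing in for a medical image.
frame = np.random.default_rng(0).integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
x = min_max_normalize(center_crop(frame))
print(x.shape, float(x.min()), float(x.max()))
```

Cropping before normalizing matters: if bright equipment at the edges were kept, it would set the maximum used for scaling.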

Submitted 19 November, 2024; originally announced November 2024.

arXiv:2411.09204 [pdf, other]

RibCageImp: A Deep Learning Framework for 3D Ribcage Implant Generation

Authors: Gyanendra Chaubey, Aiman Farooq, Azad Singh, Deepak Mishra

Abstract: The recovery of damaged or resected ribcage structures requires precise, custom-designed implants to restore the integrity and functionality of the thoracic cavity. Traditional implant design methods rely mainly on manual processes, making them time-consuming and susceptible to variability. In this work, we explore the feasibility of automated ribcage implant generation using deep learning. We present a framework based on the 3D U-Net architecture that processes CT scans to generate patient-specific implant designs. To the best of our knowledge, this is the first investigation into automated thoracic implant generation using deep learning approaches. Our preliminary results, while moderate, highlight both the potential and the significant challenges in this complex domain. These findings establish a foundation for future research in automated ribcage reconstruction and identify key technical challenges that need to be addressed for practical implementation.

Submitted 14 November, 2024; originally announced November 2024.

arXiv:2411.01506 [pdf, other]

Degradation-Infused Energy Portfolio Allocation Framework: Risk-Averse Fair Storage Participation

Authors: Parikshit Pareek, L. P. Mohasha Isuru Sampath, Anshuman Singh, Lalit Goel, Hoay Beng Gooi, Hung Dinh Nguyen

Abstract: This work proposes a novel degradation-infused energy portfolio allocation (DI-EPA) framework for enabling the participation of battery energy storage systems in multi-service electricity markets. The proposed framework attempts to address the challenge of including the rainflow algorithm for cycle counting by directly developing a closed-form expression for marginal degradation as a function of dispatch decisions. This closed-form degradation profile is then embedded into an energy portfolio allocation (EPA) problem designed to make the optimal dispatch decisions for all the batteries together, in a shared-economy manner. We term the entity taking these decisions the 'facilitator', which acts as a link between storage units and market operators. The proposed EPA formulation is equipped with a conditional-value-at-risk (CVaR)-based mechanism to provide risk aversion against uncertainty in market prices. The proposed DI-EPA problem introduces fairness by dividing the profits among the storage units using the idea of marginal contribution. Simulation results regarding the accuracy of the closed-form degradation expression, the effectiveness of CVaR in handling uncertainty within the EPA problem, and fairness in the context of degradation awareness are discussed. Numerical results indicate that the DI-EPA framework improves the net profit of the storage units by considering the effect of degradation in optimal market participation.
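For readers new to CVaR: at confidence level alpha it is the expected loss in the worst (1 - alpha) fraction of outcomes. The sketch below only evaluates empirical CVaR for a given set of equally likely scenarios using the standard Rockafellar-Uryasev expression; it does not reproduce how CVaR enters the DI-EPA optimization, which the abstract does not detail. The profit scenarios and the helper name `empirical_cvar` are hypothetical.

```python
import math


def empirical_cvar(losses, alpha=0.95):
    """Empirical conditional value-at-risk (expected loss in the worst 1 - alpha tail).

    Uses the Rockafellar-Uryasev form CVaR = t + E[max(L - t, 0)] / (1 - alpha),
    evaluated at its minimizer t = VaR_alpha, taken here as the
    ceil(alpha * N)-th smallest of N equally weighted loss samples.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    ordered = sorted(losses)
    n = len(ordered)
    var = ordered[max(math.ceil(alpha * n) - 1, 0)]
    expected_excess = sum(max(loss - var, 0.0) for loss in ordered) / n
    return var + expected_excess / (1.0 - alpha)


# Hypothetical profit scenarios for one battery; losses are negative profits.
profits = [120.0, 95.0, 80.0, 60.0, 40.0, 10.0, -5.0, -20.0, -35.0, -60.0]
print(empirical_cvar([-p for p in profits], alpha=0.8))
```

Inside an optimization model the same expression is usually linearized with one auxiliary variable per scenario, which keeps a risk-averse dispatch problem tractable.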

Submitted 4 November, 2024; v1 submitted 3 November, 2024; originally announced November 2024.

arXiv:2410.19858 [pdf, other]

Enhancing Deep Learning based RMT Data Inversion using Gaussian Random Field

Authors: Koustav Ghosal, Arun Singh, Samir Malakar, Shalivahan Srivastava, Deepak Gupta

Abstract: Deep learning (DL) methods have emerged as a powerful tool for the inversion of geophysical data. When applied to field data, however, these models often struggle without additional fine-tuning of the network, because they are built on the assumption that the statistical patterns in the training and test datasets are the same. To address this, we propose a DL-based inversion scheme for Radio Magnetotelluric (RMT) data in which the subsurface resistivity models are generated using Gaussian Random Fields (GRF). The network's generalization ability was tested with an out-of-distribution (OOD) dataset comprising a homogeneous background and various rectangular anomalous bodies. After end-to-end training with the GRF dataset, the pre-trained network successfully identified anomalies in the OOD dataset. Synthetic experiments confirmed that the GRF dataset enhances generalization compared to a homogeneous background OOD dataset. The network accurately recovered structures in a checkerboard resistivity model and demonstrated robustness to noise, outperforming traditional gradient-based methods. Finally, the developed scheme is tested using exemplary field data from a waste site near Roorkee, India. The proposed scheme enhances generalization in a data-driven supervised learning framework, suggesting a promising direction for OOD generalization in DL methods.
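The abstract does not specify how the GRF resistivity models are generated. One common approach, sketched here as an assumption, is the FFT spectral method with a Gaussian covariance, followed by a log-normal mapping to resistivity. The length scale, the mean log10 resistivity of 2 (100 ohm-m), the spread of 0.5 decades, and the helper names are hypothetical.

```python
import numpy as np


def gaussian_random_field(shape, length_scale, rng):
    """Sample a zero-mean, unit-variance stationary Gaussian random field.

    Spectral (FFT) method: complex white noise is filtered by the square root
    of the power spectrum of a Gaussian covariance exp(-r^2 / (2 * length_scale^2)),
    transformed back to the grid, and standardized.
    """
    ny, nx = shape
    ky = np.fft.fftfreq(ny)[:, None]  # spatial frequencies in cycles per cell
    kx = np.fft.fftfreq(nx)[None, :]
    spectrum = np.exp(-2.0 * (np.pi * length_scale) ** 2 * (kx**2 + ky**2))
    noise = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    field = np.fft.ifft2(noise * np.sqrt(spectrum)).real
    return (field - field.mean()) / field.std()


def resistivity_model(field, log10_mean=2.0, log10_std=0.5):
    """Map a standardized field to resistivities in ohm-m (log-normal transform)."""
    return 10.0 ** (log10_mean + log10_std * field)


rng = np.random.default_rng(0)
rho = resistivity_model(gaussian_random_field((64, 128), length_scale=8.0, rng=rng))
print(rho.shape, bool(rho.min() > 0))
```

Because the FFT assumes periodicity, fields generated this way wrap around at the edges; generating on a padded grid and cropping is one way to avoid that.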

Submitted 22 October, 2024; originally announced October 2024.

arXiv:2410.19151 [pdf, other]

CapsuleNet: A Deep Learning Model To Classify GI Diseases Using EfficientNet-b7

Authors: Aniket Das, Ayushman Singh, Nishant, Sharad Prakash

Abstract: Gastrointestinal (GI) diseases represent a significant global health concern, with Capsule Endoscopy (CE) offering a non-invasive method for diagnosis by capturing a large number of GI tract images. However, the sheer volume of video frames necessitates automated analysis to reduce the workload on doctors and increase diagnostic accuracy. In this paper, we present CapsuleNet, a deep learning model developed for the Capsule Vision 2024 Challenge, aimed at classifying 10 distinct GI abnormalities. Because the dataset was highly imbalanced, we implemented various data augmentation strategies, reducing the imbalance to a manageable level. Our model leverages a pretrained EfficientNet-b7 backbone, extended with additional classification layers and optimized with PReLU activation functions. The model demonstrated superior performance on validation data, achieving a micro accuracy of 84.5% and outperforming the VGG16 baseline across most classes. Despite these advances, challenges remain in classifying certain abnormalities, such as Erythema. Our findings suggest that CNN-based models like CapsuleNet can provide an efficient solution for GI tract disease classification, particularly when inference time is a critical factor.

Submitted 24 October, 2024; originally announced October 2024.

Comments: Capsule Vision 2024 Challenge

arXiv:2410.15321 [pdf, other]

Integrated Design and Control of a Robotic Arm on a Quadcopter for Enhanced Package Delivery

Authors: Animesh Singh, Jason Hillyer, Fariba Ariaei, Hossein Jula

Abstract: This paper presents a comprehensive design process for the integration of a robotic arm into a quadcopter, emphasizing the physical modeling, system integration, and controller development. Utilizing SolidWorks for mechanical design and MATLAB Simscape for simulation and control, this study addresses the challenges encountered in integrating the robotic arm with the drone, encompassing both mechanical and control aspects. Two types of controllers are developed and analyzed: a Proportional-Integral-Derivative (PID) controller and a Model Reference Adaptive Controller (MRAC). The design and tuning of these controllers are key components of this research, with the focus on their application in package delivery tasks. Extensive simulations demonstrate the performance of each controller, with PID controllers exhibiting superior trajectory tracking and lower Root Mean Square (RMS) errors under various payload conditions. The results underscore the efficacy of PID control for stable flight and precise maneuvering, while highlighting the adaptability of MRAC to changing dynamics.
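As a generic reference point for the PID side of the comparison, a textbook discrete-time PID loop is sketched below. The gains, the time step, and the first-order plant `x' = -x + u` are hypothetical and unrelated to the quadcopter-arm model built in Simscape; MRAC is not sketched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook discrete-time PID: u = Kp*e + Ki*integral(e) + Kd*de/dt."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt  # rectangular (Euler) integration
        # Skip the derivative on the first call to avoid a "derivative kick".
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 3000, dt: float = 0.01, setpoint: float = 1.0) -> float:
    """Close the loop around a hypothetical first-order plant x' = -x + u."""
    pid = PID(kp=2.0, ki=1.0, kd=0.05, dt=dt)
    x = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, x)
        x += dt * (-x + u)  # forward-Euler plant update
    return x


print(f"final state: {simulate():.4f}")
```

The integral term is what removes the steady-state error here; with `ki=0` this plant would settle at 2/3 of the setpoint.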

Submitted 20 October, 2024; originally announced October 2024.

arXiv:2410.07393 [pdf, other]

How Much Power Must We Extract From a Receiver Antenna to Effect Communications?

Authors: Thomas L. Marzetta, Brian McMinn, Amritpal Singh, Thorkild B. Hansen

Abstract: Subject to the laws of classical physics, the science that governs the design of today's wireless communication systems, there is no need to extract power from a receiver antenna in order to effect communications. If we dispense with a transmission line and instead co-locate the front-end electronics with the antenna, then a high input-impedance preamplifier can measure the open-circuit voltage directly on the antenna port without drawing either current or power. Neither Friis' concept of noise figure, nor Shannon information theory, nor electronics technology dictates that we must extract power from an antenna.

Submitted 9 October, 2024; originally announced October 2024.

Comments: 10 pages

arXiv:2409.19015 [pdf, other]

Textless NLP -- Zero Resource Challenge with Low Resource Compute

Authors: Krithiga Ramadass, Abrit Pal Singh, Srihari J, Sheetal Kalyani

Abstract: This work addresses the persistent challenges of substantial training time and GPU resource requirements, even when training lightweight encoder-vocoder models for Textless NLP. We reduce training steps significantly while improving performance by (a) leveraging learning rate schedulers for efficient and faster convergence, (b) optimizing hop length, and (c) tuning the interpolation scale factors for better audio quality. Additionally, we explore the latent space representation for Indian languages such as Tamil and Bengali for the acoustic unit discovery and voice conversion tasks. Our approach leverages a quantized encoder architecture in conjunction with a vocoder that uses the proposed combination of optimized hop length, tuned interpolation scale factors, and a cyclic learning rate scheduler. We obtain consistently good results across English, Tamil, and Bengali datasets. The proposed method excels in capturing complex linguistic patterns, resulting in clear reconstructed audio during voice conversion with significantly reduced training time.
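The abstract names a cyclic learning rate scheduler without giving its form. A common variant, assumed here for illustration, is the triangular schedule of Smith (2017), which PyTorch also provides as `torch.optim.lr_scheduler.CyclicLR` with `mode="triangular"`. The bounds and step size below are hypothetical, not the values used in the paper.

```python
import math


def triangular_clr(iteration, base_lr=1e-4, max_lr=1e-3, step_size=2000):
    """Triangular cyclical learning rate (Smith, 2017).

    The rate climbs linearly from base_lr to max_lr over `step_size`
    iterations, falls back over the next `step_size`, and then repeats.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)  # position within the cycle, in [0, 1]
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


for it in (0, 1000, 2000, 3000, 4000):
    print(it, triangular_clr(it))
```

Periodically raising the rate lets training move quickly through flat regions, which is one plausible reason such schedules reduce the number of steps needed.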

Submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.12616 [pdf, other]

Semi-Supervised Safe Visuomotor Policy Synthesis using Barrier Certificates

Authors: Manan Tayal, Aditya Singh, Pushpak Jagtap, Shishir Kolathaya

Abstract: In modern robotics, addressing the lack of accurate state space information in real-world scenarios has led to a significant focus on utilizing visuomotor observations to provide safety assurances. Although supervised learning methods, such as imitation learning, have demonstrated potential in synthesizing control policies based on visuomotor observations, they require ground truth safety labels for the complete dataset and do not provide formal safety assurances. On the other hand, traditional control-theoretic methods like Control Barrier Functions (CBFs) and Hamilton-Jacobi (HJ) Reachability provide formal safety guarantees but depend on accurate knowledge of system dynamics, which is often unavailable for high-dimensional visuomotor data. To overcome these limitations, we propose a novel approach that synthesizes a semi-supervised safe visuomotor policy using barrier certificates, integrating the strengths of model-free supervised learning and model-based control methods. This framework synthesizes a provably safe controller without requiring safety labels for the complete dataset and ensures completeness guarantees for both the barrier certificate and the policy. We validate our approach through two case studies: an inverted pendulum system and obstacle avoidance for an autonomous mobile robot.
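For contrast with the learned barrier certificates, here is the classical model-based CBF safety filter the abstract refers to, assuming known dynamics. The single-integrator model, the circular obstacle, and the gain `gamma` are hypothetical. This is the textbook closed-form CBF quadratic program with one constraint, not the paper's semi-supervised method.

```python
import numpy as np


def cbf_filter(x, u_nom, center, radius, gamma=1.0):
    """Minimally modify u_nom for the single integrator x_dot = u so that the
    barrier h(x) = ||x - c||^2 - r^2 satisfies dh/dt + gamma * h(x) >= 0.

    This is the closed-form solution of the one-constraint QP
        min_u ||u - u_nom||^2   s.t.   a @ u >= b,
    with a = grad h(x) = 2 (x - c) and b = -gamma * h(x).
    """
    x, u_nom, center = (np.asarray(v, dtype=float) for v in (x, u_nom, center))
    diff = x - center
    a = 2.0 * diff
    b = -gamma * (diff @ diff - radius**2)
    violation = b - a @ u_nom
    if violation <= 0.0:  # nominal input already satisfies the constraint
        return u_nom
    return u_nom + (violation / (a @ a)) * a  # project onto the half-space boundary


# Hypothetical rollout: drive straight at a unit-disk obstacle at the origin.
x = np.array([3.0, 0.1])
dt = 0.01
for _ in range(1000):
    u = cbf_filter(x, u_nom=[-1.0, 0.0], center=[0.0, 0.0], radius=1.0)
    x = x + dt * u
print(x, bool(np.linalg.norm(x) > 1.0))
```

The filter needs the gradient of h and the dynamics in closed form; that dependence on an accurate model is exactly the limitation the abstract raises for high-dimensional visuomotor data.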

Submitted 19 September, 2024; originally announced September 2024.

Comments: First two authors have contributed equally. 8 Pages, 3 figures

arXiv:2409.11262 [pdf, other]

The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection

Authors: Gabriel Bibbó, Thomas Deacon, Arshdeep Singh, Mark D. Plumbley

Abstract: This paper presents a residential audio dataset to support sound event detection research for smart home applications aimed at promoting wellbeing for older adults. The dataset is constructed by deploying audio recording systems in the homes of 8 participants aged 55-80 years for a 7-day period. Acoustic characteristics are documented through detailed floor plans and construction material information to enable replication of the recording environments for AI model deployment. A novel automated speech removal pipeline is developed, using pre-trained audio neural networks to detect and remove segments containing spoken voice, while preserving segments containing other sound events. The resulting dataset consists of privacy-compliant audio recordings that accurately capture the soundscapes and activities of daily living within residential spaces. The paper details the dataset creation methodology, the speech removal pipeline utilizing cascaded model architectures, and an analysis of the vocal label distribution to validate the speech removal process. This dataset enables the development and benchmarking of sound event detection models tailored specifically for in-home applications.
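Only the final removal step of such a pipeline is sketched below. It assumes that some pre-trained detector (not shown, and not necessarily the cascaded models used in the paper) has already produced one speech probability per fixed-length frame. The threshold, the one-frame safety margin, and the helper name `remove_speech` are hypothetical.

```python
import numpy as np


def remove_speech(audio, speech_prob, frame_len, threshold=0.5, margin_frames=1):
    """Return `audio` with every frame flagged as speech (plus a margin) removed.

    `speech_prob[i]` is assumed to be a detector's speech probability for the
    samples audio[i * frame_len:(i + 1) * frame_len].
    """
    flagged = np.asarray(speech_prob) > threshold
    mask = flagged.copy()
    # Dilate the mask so frames adjacent to detected speech are removed too.
    for shift in range(1, margin_frames + 1):
        mask[shift:] |= flagged[:-shift]
        mask[:-shift] |= flagged[shift:]
    keep = np.repeat(~mask, frame_len)[: len(audio)]
    return np.asarray(audio)[: len(keep)][keep]


# Hypothetical 10-sample signal split into five 2-sample frames.
audio = np.arange(10)
print(remove_speech(audio, [0.0, 0.1, 0.9, 0.2, 0.0], frame_len=2))
```

The margin trades retained non-speech audio for privacy: widening it removes more context around each detection but makes it less likely that faint speech onsets or tails survive.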

Submitted 4 October, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

arXiv:2409.08384 [pdf, ps, other]

Noisy Low Rank Column-wise Sensing

Authors: Ankit Pratap Singh, Namrata Vaswani

Abstract: This letter studies the AltGDmin algorithm for solving the noisy low rank column-wise sensing (LRCS) problem. Our sample complexity guarantee improves upon the best existing one by a factor $\max(r, \log(1/\epsilon))/r$, where $r$ is the rank of the unknown matrix and $\epsilon$ is the final desired accuracy. A second contribution of this work is a detailed comparison of guarantees from all works that study the exact same mathematical problem as LRCS but refer to it by different names.

Submitted 24 March, 2025; v1 submitted 12 September, 2024; originally announced September 2024.

Comments: 9 pages

arXiv:2407.19229 [pdf, other]

Impact of Transmission Dynamics and Treatment Uptake, Frequency and Timing on the Cost-effectiveness of Directly Acting Antivirals for Hepatitis C Virus Infection

Authors: Soham Das, Ajit Sood, Vandana Midha, Arshdeep Singh, Pranjl Sharma, Varun Ramamohan

Abstract: Cost-effectiveness analyses, based on decision-analytic models of disease progression and treatment, are routinely used to assess the economic value of a new intervention and consequently inform reimbursement decisions for the intervention. Many decision-analytic models developed to assess the economic value of highly effective directly acting antiviral (DAA) treatments for hepatitis C virus (HCV) infection do not incorporate the transmission dynamics of HCV, yet accounting for these dynamics is required to estimate the number of downstream infections prevented by curing an infection. In this study, we develop and validate a comprehensive agent-based simulation (ABS) model of HCV transmission dynamics in the Indian context and use it to: (a) quantify the extent to which the cost-effectiveness of a DAA is underestimated, as a function of its uptake rate, if disease transmission dynamics are not considered in a cost-effectiveness analysis model; and (b) quantify the impact of the frequency and timing of treatment with DAAs within a disease surveillance period, also as a function of their uptake rate, on their cost-effectiveness. Accomplishing these research objectives also motivated the development of a novel random sampling and allocation-based approach, along with its theoretical grounding, to estimate individual-level outcomes within an ABS at substantially lower computational expense than the benchmark incremental accumulation approach.

Submitted 17 September, 2024; v1 submitted 27 July, 2024; originally announced July 2024.

arXiv:2407.15423 [pdf, other]

Integrating IP Broadcasting with Audio Tags: Workflow and Challenges

Authors: Rhys Burchett-Vass, Arshdeep Singh, Gabriel Bibbó, Mark D. Plumbley

Abstract: The broadcasting industry is increasingly adopting IP techniques, revolutionising both live and pre-recorded content production, from news gathering to live music events. IP broadcasting allows for the transport of audio and video signals in an easily configurable way, aligning with modern networking techniques. This shift towards an IP workflow allows for much greater flexibility, not only in routing signals but also in integrating tools using standard web development techniques. One such tool is live audio tagging, which has a number of uses in content production, ranging from automated closed captioning to identifying unwanted sound events within a scene. In this paper, we describe the process of containerising an audio tagging model into a microservice, a small segregated code module that can be integrated into a multitude of different network setups. The goal is to develop a modular, accessible, and flexible tool capable of seamless deployment into broadcasting workflows of all sizes, from small productions to large corporations. Challenges surrounding the latency of the selected audio tagging model and its effect on the usefulness of the end product are discussed.

Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

Comments: Submitted to DCASE 2024 Workshop

arXiv:2406.20005 [pdf, other]

Malaria Cell Detection Using Deep Neural Networks

Authors: Saurabh Sawant, Anurag Singh

Abstract: Malaria remains one of the most pressing public health concerns globally, causing significant morbidity and mortality, especially in sub-Saharan Africa. Rapid and accurate diagnosis is crucial for effective treatment and disease management. Traditional diagnostic methods, such as microscopic examination of blood smears, are labor-intensive and require significant expertise, which may not be readily available in resource-limited settings. This project aims to automate the detection of malaria-infected cells using a deep learning approach. We employed a convolutional neural network (CNN) based on the ResNet50 architecture, leveraging transfer learning to enhance performance. The Malaria Cell Images Dataset from Kaggle, containing 27,558 images categorized into infected and uninfected cells, was used for training and evaluation. Our model demonstrated high accuracy, precision, and recall, indicating its potential as a reliable tool for assisting in malaria diagnosis. Additionally, a web application was developed using Streamlit to allow users to upload cell images and receive predictions about malaria infection, making the technology accessible and user-friendly. This paper provides a comprehensive overview of the methodology, experiments, and results, highlighting the effectiveness of deep learning in medical image analysis.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.17339 [pdf, other]

Optimizing Configuration Selection in Reconfigurable-Antenna MIMO Systems: Physics-Inspired Heuristic Solvers

Authors: I. Krikidis, C. Psomas, A. K. Singh, K. Jamieson

Abstract: Reconfigurable antenna multiple-input multiple-output (MIMO) is a foundational technology for the continuing evolution of cellular systems, including upcoming 6G communication systems. In this paper, we address the problem of flexible/reconfigurable antenna configuration selection for point-to-point MIMO antenna systems by using physics-inspired heuristics. First, we optimize the antenna configuration to maximize the signal-to-noise ratio (SNR) at the receiver by leveraging two basic heuristic solvers: coherent Ising machines (CIMs), which mimic quantum mechanical dynamics, and quantum annealing (QA), where a real-world QA architecture (D-Wave) is considered. A mathematical framework that converts the configuration selection problem into CIM- and QA-compatible unconstrained quadratic formulations is investigated. Numerical and experimental results show that the proposed designs outperform classical counterparts and achieve near-optimal performance (similar to exhaustive search, which has exponential complexity) while ensuring polynomial complexity. Moreover, we study the optimal antenna configuration that maximizes the end-to-end Shannon capacity. A simulated annealing (SA) heuristic that achieves near-optimal performance through appropriate parameterization is adopted. A modified version of the basic SA that exploits parallel tempering to avoid local maxima is also studied, which provides additional performance gains. Extended numerical studies show that the SA solutions outperform conventional heuristics (which are also developed for comparison purposes), whereas employing the SNR-based solutions is highly sub-optimal.
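To make the heuristics concrete, the sketch below runs plain simulated annealing on a generic Ising energy over spins in {-1, +1}, the kind of unconstrained quadratic form the abstract describes. It does not reproduce the paper's mapping from antenna configurations to couplings, its SA parameterization, or the parallel-tempering variant (which would run several replicas at different temperatures and occasionally swap them). The random couplings, the cooling schedule, and the helper names are hypothetical.

```python
import itertools
import math
import random


def ising_energy(s, J, h):
    """E(s) = -sum_{i<j} J[i][j] s_i s_j - sum_i h[i] s_i, spins s_i in {-1, +1}."""
    n = len(s)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -pair - sum(h[i] * s[i] for i in range(n))


def simulated_annealing(J, h, sweeps=500, t_start=5.0, t_end=0.05, seed=0):
    """Single-spin-flip Metropolis annealing with a geometric cooling schedule.

    J must be symmetric with a zero diagonal. Returns the best state seen.
    """
    rng = random.Random(seed)
    n = len(h)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ising_energy(s, J, h)
    best_s, best_e = s[:], energy
    for k in range(sweeps):
        t = t_start * (t_end / t_start) ** (k / max(sweeps - 1, 1))
        for i in range(n):
            local_field = sum(J[i][j] * s[j] for j in range(n) if j != i) + h[i]
            delta = 2.0 * s[i] * local_field  # energy change if s_i is flipped
            if delta <= 0.0 or rng.random() < math.exp(-delta / t):
                s[i] = -s[i]
                energy += delta
                if energy < best_e:
                    best_s, best_e = s[:], energy
    return best_s, best_e


def brute_force(J, h):
    """Exhaustive search over all 2^n spin vectors (feasible only for small n)."""
    return min(ising_energy(list(c), J, h) for c in itertools.product((-1, 1), repeat=len(h)))


# Hypothetical random instance with 10 spins.
rng = random.Random(42)
n = 10
J = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        J[i][j] = J[j][i] = rng.choice((-1, 1))
h = [rng.choice((-1, 0, 1)) for _ in range(n)]
state, e = simulated_annealing(J, h)
print(e, brute_force(J, h))
```

The exhaustive search grows as 2^n, which is why heuristics such as SA, CIMs, or quantum annealing are attractive once the number of configuration bits is large.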

Submitted 25 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2403.12571

Journal ref: IEEE Transactions on Communications, 2004

arXiv:2406.09661 [pdf, other]

Temporal Planning via Interval Logic Satisfiability for Autonomous Systems

Authors: Miquel Ramirez, Anubhav Singh, Peter Stuckey, Chris Manzie

Abstract: Many automated planning methods and formulations rely on suitably designed abstractions or simplifications of the constrained dynamics associated with agents to attain computational scalability. We consider formulations of temporal planning where intervals are associated with both action and fluent atoms, and relations between these are given as sentences in Allen's Interval Logic. We propose a notion of planning graphs that can account for complex concurrency relations between actions and fluents as a Constraint Programming (CP) model. We test an implementation of our algorithm on a state-of-the-art framework for CP and compare it with PDDL 2.1 planners that capture plans requiring complex concurrent interactions between agents. We demonstrate that our algorithm outperforms existing PDDL 2.1 planners in the case studies. Still, scalability remains challenging when plans must comply with intricate concurrent interactions and the sequencing of actions.
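For readers unfamiliar with Allen's Interval Logic, the 13 basic relations between two intervals can be illustrated with a small classifier. It only names the relation that holds between two concrete intervals. In a CP model such as the one the abstract describes, the interval endpoints would instead be decision variables constrained by these relations, which this sketch does not attempt. The function name and relation labels are illustrative choices.

```python
def allen_relation(a, b):
    """Return the Allen relation holding from interval a=(a_start, a_end) to b.

    Assumes proper intervals (start < end). The 13 relations are mutually
    exclusive and jointly exhaustive.
    """
    (a0, a1), (b0, b1) = a, b
    if a0 >= a1 or b0 >= b1:
        raise ValueError("intervals must satisfy start < end")
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Remaining case: partial overlap with all four endpoints distinct.
    return "overlaps" if a0 < b0 else "overlapped-by"


for pair in [((0, 2), (3, 5)), ((0, 3), (3, 5)), ((0, 4), (2, 6)), ((2, 3), (1, 5))]:
    print(pair, allen_relation(*pair))
```

Planning with concurrency needs the finer relations (for example "meets" versus "overlaps") because they distinguish an action that hands off exactly when another starts from one that runs alongside it.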

Submitted 13 June, 2024; originally announced June 2024.

Comments: This publication is an extended version of a manuscript submitted to ICAPS-24 (and rejected). Please contact the first author for queries, comments or discussion of the paper
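Allen's Interval Logic, which the formulation above builds on, distinguishes 13 basic relations between two intervals. A small classifier for intervals given as `(start, end)` pairs with `start < end` makes those relations concrete. This is textbook background only; the names `allen_relation` and `ALLEN_RELATIONS` are illustrative, and the sketch is not the paper's CP encoding or planning-graph construction.

```python
ALLEN_RELATIONS = (
    "before", "after", "meets", "met-by", "overlaps", "overlapped-by",
    "starts", "started-by", "during", "contains", "finishes", "finished-by", "equals",
)


def allen_relation(a, b):
    """Return the Allen relation that holds from interval a to interval b."""
    (a1, a2), (b1, b2) = a, b
    if not (a1 < a2 and b1 < b2):
        raise ValueError("intervals must satisfy start < end")
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: the intervals partially overlap with distinct endpoints.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

For example, an action interval `(0, 4)` and a fluent interval `(2, 6)` are related by "overlaps"; a temporal planner states constraints over such relations rather than over raw timestamps.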

arXiv:2404.03307 [pdf, other]

Bi-level Trajectory Optimization on Uneven Terrains with Differentiable Wheel-Terrain Interaction Model

Authors: Amith Manoharan, Aditya Sharma, Himani Belsare, Kaustab Pal, K. Madhava Krishna, Arun Kumar Singh

Abstract: Navigation of wheeled vehicles on uneven terrain necessitates going beyond the 2D approaches for trajectory planning. Specifically, it is essential to incorporate the full 6-DoF variation of vehicle pose and its associated stability cost in the planning process. To this end, most recent works aim to learn a neural network model to predict the vehicle evolution. However, such approaches are data-intensive and fraught with generalization issues. In this paper, we present a purely model-based approach that just requires the digital elevation information of the terrain. Specifically, we express the wheel-terrain interaction and 6-DoF pose prediction as a non-linear least squares (NLS) problem. As a result, trajectory planning can be viewed as a bi-level optimization. The inner optimization layer predicts the pose on the terrain along a given trajectory, while the outer layer deforms the trajectory itself to reduce the stability and kinematic costs of the pose. We improve the state of the art in the following respects. First, we show that our NLS-based pose prediction closely matches the output from a high-fidelity physics engine. This result, coupled with the fact that we can query gradients of the NLS solver, makes our pose predictor a differentiable wheel-terrain interaction model. We further leverage this differentiability to efficiently solve the proposed bi-level trajectory optimization problem. Finally, we perform extensive experiments and a comparison with a baseline to showcase the effectiveness of our approach in obtaining smooth, stable trajectories.

Submitted 22 November, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: 8 pages, 7 figures, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)
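A heavily simplified sketch of pose prediction posed as least squares. Under a small-angle, rigid-chassis assumption (ours, not the paper's), the height at each wheel is modeled as `z + gx*dx + gy*dy`, where `(dx, dy)` is the wheel offset rotated into the world frame and `gx`, `gy` are the chassis slopes along x and y. Fitting `(z, gx, gy)` to the terrain heights under the wheels is then a linear least-squares problem. The terrain function `h`, the wheel layout, and the helper names are hypothetical; the paper's NLS formulation, full 6-DoF pose, and outer trajectory-optimization layer are not reproduced.

```python
import math


def solve_linear(M, v):
    """Gauss-Jordan elimination with partial pivoting for a small dense system M x = v."""
    n = len(M)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict_pose(h, x, y, yaw, wheel_offsets):
    """Fit (z, gx, gy) so that z + gx*dx + gy*dy matches the terrain height under each wheel."""
    c, s = math.cos(yaw), math.sin(yaw)
    rows, targets = [], []
    for bx, by in wheel_offsets:
        dx, dy = c * bx - s * by, s * bx + c * by  # body-frame offset rotated into the world frame
        rows.append([1.0, dx, dy])
        targets.append(h(x + dx, y + dy))
    # Normal equations (A^T A) p = A^T b for the three unknowns.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    z, gx, gy = solve_linear(ata, atb)
    return z, gx, gy
```

On a planar patch the fit recovers the plane exactly; on curved terrain it gives the best-fitting plane under the wheels, which is where a nonlinear formulation like the paper's becomes necessary.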

arXiv:2404.00814 [pdf, other]

Exact Imposition of Safety Boundary Conditions in Neural Reachable Tubes

Authors: Aditya Singh, Zeyuan Feng, Somil Bansal

Abstract: Hamilton-Jacobi (HJ) reachability analysis is a widely adopted verification tool to provide safety and performance guarantees for autonomous systems. However, it involves solving a partial differential equation (PDE) to compute a safety value function, whose computational and memory complexity scales exponentially with the state dimension, making its direct application to large-scale systems intractable. To overcome these challenges, DeepReach, a recently proposed learning-based approach, approximates high-dimensional reachable tubes using neural networks (NNs). While DeepReach has been shown to be effective, the accuracy of its learned solution decreases with system complexity. One of the reasons for this degradation is that the safety constraints, which correspond to the boundary conditions of the PDE, are only softly imposed during the learning process, resulting in inaccurate value functions. In this work, we propose ExactBC, a variant of DeepReach that imposes safety constraints exactly during the learning process by restructuring the overall value function as a weighted sum of the boundary condition and the NN output. Moreover, the proposed variant no longer needs a boundary loss term during the training process, thus eliminating the need to balance different loss terms. 
We demonstrate the efficacy of the proposed approach in significantly improving the accuracy of the learned value function for four challenging reachability tasks: a rimless wheel system with state resets, collision avoidance in a cluttered environment, autonomous rocket landing, and multi-aircraft collision avoidance.

Submitted 9 May, 2025; v1 submitted 31 March, 2024; originally announced April 2024.

Comments: First two authors have contributed equally. 7 pages, 3 figures. Accepted at ICRA 2025
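A minimal sketch of the exact-imposition idea, assuming `t` denotes time-to-go so that the boundary condition reads V(x, 0) = l(x), and assuming the weight on the network output is simply `t` (a hypothetical choice; the paper's weighting may differ). Because the weight vanishes at t = 0, the boundary condition holds for any network parameters, which is why no boundary loss term is needed. The safety margin `boundary_l` and the affine "network" from `make_toy_network` are toy stand-ins.

```python
import math


def boundary_l(x):
    """Hypothetical safety margin l(x): signed distance to a unit-radius obstacle at the origin."""
    return math.hypot(x[0], x[1]) - 1.0


def make_toy_network(w, b):
    """Stand-in for the NN: an affine map of (x, t), purely illustrative."""
    def nn(x, t):
        return w[0] * x[0] + w[1] * x[1] + w[2] * t + b
    return nn


def exact_bc_value(nn, x, t):
    """V(x, t) = l(x) + t * NN(x, t); the weight t vanishes at t = 0, so V(x, 0) = l(x) exactly."""
    return boundary_l(x) + t * nn(x, t)
```

Whatever the network outputs, `exact_bc_value(nn, x, 0.0)` equals `boundary_l(x)`, so training only has to fit the PDE residual away from the boundary.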

arXiv:2403.12571 [pdf, other]

Optimizing Reconfigurable Antenna MIMO Systems with Coherent Ising Machines

Authors: Ioannis Krikidis, Abhishek Kumar Singh, Kyle Jamieson

Abstract: Reconfigurable antenna multiple-input multiple-output (MIMO) is a promising technology for upcoming 6G communication systems. In this paper, we deal with the problem of configuration selection for reconfigurable antenna MIMO by leveraging Coherent Ising Machines (CIMs). By adopting the CIM as a heuristic solver for the Ising problem, the optimal antenna configuration that maximizes the received signal-to-noise ratio is investigated. A mathematical framework that converts the selection problem into a CIM-compatible unconstrained quadratic formulation is presented. Numerical studies show that the proposed CIM-based design outperforms classical counterparts and achieves near-optimal performance (similar to exponentially complex exhaustive searching) while ensuring polynomial complexity.

Submitted 19 March, 2024; originally announced March 2024.

Journal ref: IEEE International Conference on Communications (ICC), June 2024

arXiv:2403.11504 [pdf, other]

MLVICX: Multi-Level Variance-Covariance Exploration for Chest X-ray Self-Supervised Representation Learning

Authors: Azad Singh, Vandan Gorade, Deepak Mishra

Abstract: Self-supervised learning (SSL) is potentially useful in reducing the need for manual annotation and making deep learning models accessible for medical image analysis tasks. By leveraging the representations learned from unlabeled data, self-supervised models perform well on tasks that require little to no fine-tuning. However, for medical images, like chest X-rays, which are characterized by complex anatomical structures and diverse clinical conditions, there arises a need for representation learning techniques that can encode fine-grained details while preserving the broader contextual information. In this context, we introduce MLVICX (Multi-Level Variance-Covariance Exploration for Chest X-ray Self-Supervised Representation Learning), an approach to capture rich representations in the form of embeddings from chest X-ray images. Central to our approach is a novel multi-level variance and covariance exploration strategy that empowers the model to detect diagnostically meaningful patterns while reducing redundancy effectively. By enhancing the variance and covariance of the learned embeddings, MLVICX promotes the retention of critical medical insights by adapting both global and local contextual details. We demonstrate the performance of MLVICX in advancing self-supervised chest X-ray representation learning through comprehensive experiments. 
The performance enhancements we observe across various downstream tasks highlight the significance of the proposed approach in enhancing the utility of chest X-ray embeddings for precision medical diagnosis and comprehensive image analysis. For pre-training, we used the NIH-Chest X-ray dataset, while for downstream tasks, we utilized NIH-Chest X-ray, Vinbig-CXR, RSNA pneumonia, and SIIM-ACR Pneumothorax datasets. Overall, we observe more than 3% performance gains over SOTA SSL approaches in various downstream tasks.

Submitted 18 March, 2024; originally announced March 2024.
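The abstract does not give the loss, but a variance-covariance objective of the VICReg kind (an assumption here) typically combines two terms on a batch of embeddings: a hinge that keeps each dimension's standard deviation above a margin `gamma`, which prevents collapse, and a penalty on squared off-diagonal covariances, which reduces redundancy between dimensions. A plain-Python sketch of those two terms follows; the function names are illustrative, and the multi-level (global and local) structure of MLVICX is not reproduced.

```python
import math


def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge on per-dimension std: penalizes dimensions whose std falls below gamma."""
    n, d = len(z), len(z[0])
    total = 0.0
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        total += max(0.0, gamma - math.sqrt(var + eps))
    return total / d


def covariance_term(z):
    """Sum of squared off-diagonal covariances divided by d: pushes dimensions to decorrelate."""
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                c = sum((row[i] - means[i]) * (row[j] - means[j]) for row in z) / (n - 1)
                total += c * c
    return total / d
```

A fully collapsed batch (every embedding identical) scores close to `gamma` on the variance term, while a spread-out, decorrelated batch scores zero on both terms.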

arXiv:2403.04168 [pdf, other]

doi 10.1109/ICC51166.2024.10622180

Impact of the Antenna on the Sub-Terahertz Indoor Channel Characteristics: An Experimental Approach

Authors: Priyangshu Sen, Sherif Badran, Vitaly Petrov, Arjun Singh, Josep M. Jornet

Abstract: Terahertz-band (100 GHz-10 THz) communication is a promising radio technology envisioned to enable ultra-high data rate, reliable and low-latency wireless connectivity in next-generation wireless systems. However, the low transmission power of THz transmitters, the need for high gain directional antennas, and the complex interaction of THz radiation with common objects along the propagation path make understanding the THz channel crucial. In this paper, we conduct an extensive channel measurement campaign in an indoor setting (i.e., a conference room) through a channel sounder with 0.1 ns time resolution and 20 GHz bandwidth at 140 GHz. In particular, the impact of different antenna directivities (and, thus, beam widths) on the channel characteristics is extensively studied. The experimentally obtained dataset is processed to develop the path loss model and, subsequently, derive key channel metrics such as the path loss exponent, delay spread, and K-factor. The results highlight the multi-faceted impact of the antenna gain on the channel and, by extension, on the wireless system, and thus show that an antenna-agnostic channel model cannot capture the propagation characteristics of the THz channel.

Submitted 6 March, 2024; originally announced March 2024.

Comments: Accepted and to be published in IEEE ICC 2024. Copyright © 2024 by the Institute of Electrical and Electronics Engineers (IEEE). Permission to make digital or hard copies of portions of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage

Journal ref: ICC 2024 - IEEE International Conference on Communications, Denver, CO, USA, 2024, pp. 2537-2542
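The three metrics named above have standard textbook definitions, sketched below under stated assumptions: a log-distance model PL(d) = PL0 + 10 n log10(d/d0) fitted by ordinary least squares, a power-weighted RMS delay spread over a power delay profile, and a K-factor taken as the power of one designated (LoS) path over the summed power of all other paths. The paper's exact processing of its 140 GHz measurements may differ, and no measured values are reproduced here.

```python
import math


def fit_path_loss_exponent(distances_m, pl_db, d0=1.0):
    """Least-squares fit of PL(d) = PL0 + 10*n*log10(d/d0); returns (PL0, n)."""
    xs = [10 * math.log10(d / d0) for d in distances_m]
    mx = sum(xs) / len(xs)
    my = sum(pl_db) / len(pl_db)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, pl_db))
    n = sxy / sxx
    return my - n * mx, n


def rms_delay_spread(delays_s, powers_lin):
    """Power-weighted RMS delay spread of a power delay profile (linear powers)."""
    p_tot = sum(powers_lin)
    mean = sum(t * p for t, p in zip(delays_s, powers_lin)) / p_tot
    second = sum(t * t * p for t, p in zip(delays_s, powers_lin)) / p_tot
    return math.sqrt(max(second - mean * mean, 0.0))


def k_factor_db(powers_lin, los_index=0):
    """K-factor: power of the designated LoS path over the sum of all other paths, in dB."""
    los = powers_lin[los_index]
    rest = sum(p for i, p in enumerate(powers_lin) if i != los_index)
    return 10 * math.log10(los / rest)
```

Narrower beams suppress off-axis multipath, which, all else being equal, lowers the delay spread and raises the K-factor; this is one way antenna directivity enters the channel metrics.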

arXiv:2312.00698 [pdf, other]

SPIRE-SIES: A Spontaneous Indian English Speech Corpus

Authors: Abhayjeet Singh, Charu Shah, Rajashri Varadaraj, Sonakshi Chauhan, Prasanta Kumar Ghosh

Abstract: In this paper, we present a 170.83-hour Indian English spontaneous speech dataset. The lack of Indian English speech data is one of the major hindrances in developing robust speech systems adapted to the Indian speech style, and this scarcity is even more acute for spontaneous speech. This corpus is crowdsourced over varied Indian nativities, genders and age groups. Traditional spontaneous speech collection strategies involve capturing speech during interviews or conversations. In this study, we use images as stimuli to induce spontaneity in speech. Transcripts for 23 hours are generated and validated, which can serve as a spontaneous speech ASR benchmark. The quality of the corpus is validated with voice activity detection-based segmentation, gender verification, and image semantic correlation. The latter determines a relationship between the image stimulus and the recorded speech using caption keywords derived from an Image2Text model and high-occurrence words derived from Whisper ASR-generated transcripts.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 6 pages, 7 plots, 3 tables, Accepted at O-COCOSDA 2023

arXiv:2311.07068 [pdf, other]

Shannon Theory for Wireless Communication in a Resonant Chamber

Authors: Amritpal Singh, Thomas Marzetta

Abstract: A closed electromagnetic resonant chamber (RC) is a highly favorable artificial environment for wireless communication. A pair of antennas within the chamber constitutes a two-port network described by an impedance matrix. We analyze communication between the two antennas when the RC has perfectly conducting walls and the impedance matrix is imaginary-valued. The transmit antenna is driven by a current source, and the receive antenna is connected to a load resistor whose voltage is measured by an infinite-impedance amplifier. There are a countably infinite number of poles in the channel, associated with resonance in the RC, which migrate towards the real frequency axis as the load resistance increases. There are two sources of receiver noise: the Johnson noise of the load resistor, and the internal amplifier noise. An application of Shannon theory yields the capacity of the link, subject to bandwidth and power constraints on the transmit current. For a constant transmit power, capacity increases without bound as the load resistance increases. Surprisingly, the capacity-attaining allocation of transmit power versus frequency avoids placing power close to the resonant frequencies.

Submitted 14 November, 2023; v1 submitted 12 November, 2023; originally announced November 2023.

Comments: 10 pages, 13 figures. To be published in IEEE Journal on Selected Areas in Communications Special Issue on Electromagnetic Signal and Information Theory
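For background, the textbook way to maximize capacity over parallel Gaussian sub-channels under a total power constraint is water-filling. The sketch below discretizes frequency into bins, each with a hypothetical noise-to-gain ratio a_i = N_i / |H_i|^2, and allocates p_i = max(mu - a_i, 0) with the water level mu chosen so the powers sum to the budget. It does not model the resonant chamber's impedance-matrix channel or its two noise sources, and should not be read as reproducing the paper's capacity-attaining allocation.

```python
import math


def water_filling(noise_to_gain, total_power):
    """Return powers p_i = max(mu - a_i, 0) whose sum equals total_power."""
    a = sorted(noise_to_gain)
    mu = 0.0
    # Try the k best bins; keep the largest k whose water level clears the worst active bin.
    for k in range(len(a), 0, -1):
        mu = (total_power + sum(a[:k])) / k
        if mu > a[k - 1]:
            break
    return [max(mu - x, 0.0) for x in noise_to_gain]


def capacity_bits(noise_to_gain, powers, bin_width_hz=1.0):
    """Sum of per-bin Shannon rates, bin_width * log2(1 + p_i / a_i), in bit/s."""
    return sum(bin_width_hz * math.log2(1 + p / a) for p, a in zip(powers, noise_to_gain))
```

Bins whose noise-to-gain ratio sits above the water level receive no power at all, which is the mechanism by which an optimal allocation can leave parts of the band empty.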

arXiv:2311.06329 [pdf]

doi 10.1109/AIRC57904.2023.10303174

A Survey of AI Text-to-Image and AI Text-to-Video Generators

Authors: Aditi Singh

Abstract: Text-to-Image and Text-to-Video AI generation models are revolutionary technologies that use deep learning and natural language processing (NLP) techniques to create images and videos from textual descriptions. This paper investigates cutting-edge approaches in the discipline of Text-to-Image and Text-to-Video AI generation. The survey provides an overview of the existing literature as well as an analysis of the approaches used in various studies. It covers data preprocessing techniques, neural network types, and evaluation metrics used in the field. In addition, the paper discusses the challenges and limitations of Text-to-Image and Text-to-Video AI generation, as well as future research directions. Overall, these models have promising potential for a wide range of applications such as video production, content creation, and digital marketing.

Submitted 10 November, 2023; originally announced November 2023.

Comments: 4 pages, 2 tables, 4th International Conference on Artificial Intelligence, Robotics and Control (AIRC 2023)

arXiv:2310.08846 [pdf, other]

Speaking rate attention-based duration prediction for speed control TTS

Authors: Jesuraj Bandekar, Sathvik Udupa, Abhayjeet Singh, Anjali Jayakumar, Deekshitha G, Sandhya Badiger, Saurabh Kumar, Pooja VH, Prasanta Kumar Ghosh

Abstract: With the advent of high-quality speech synthesis, there is a lot of interest in controlling various prosodic attributes of speech. Speaking rate is an essential attribute towards modelling the expressivity of speech. In this work, we propose a novel approach to control the speaking rate for non-autoregressive TTS. We achieve this by conditioning the speaking rate inside the duration predictor, allowing implicit speaking rate control. We show the benefits of this approach by synthesising audio at various speaking rate factors and measuring the quality of speaking rate-controlled synthesised speech. Further, we study the effect of the speaking rate distribution of the training data towards effective rate control. Finally, we fine-tune a baseline pretrained TTS model to obtain speaking rate control TTS. We provide various analyses to showcase the benefits of using this proposed approach, along with objective as well as subjective metrics. We find that the proposed methods have higher subjective scores and lower speaker rate errors across many speaking rate factors over the baseline.

Submitted 13 October, 2023; originally announced October 2023.

Comments: © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
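For context, non-autoregressive TTS models commonly expand phoneme-level states to frame level with a length regulator driven by predicted durations (a FastSpeech-style architecture is assumed here). The simplest form of rate control divides every duration by a global factor after prediction, as sketched below with the hypothetical helper `length_regulate`. The paper instead conditions the duration predictor itself on the speaking rate, which this sketch does not reproduce; it only shows where durations act.

```python
def length_regulate(phoneme_states, durations, rate=1.0):
    """Expand phoneme-level states to frame level.

    Each state is repeated round(d / rate) times (at least once), so rate > 1
    shortens the utterance (faster speech) and rate < 1 lengthens it.
    Python's round() uses round-half-to-even for exact .5 cases.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    frames = []
    for state, d in zip(phoneme_states, durations):
        frames.extend([state] * max(1, round(d / rate)))
    return frames
```

Uniform post-hoc scaling stretches every phoneme equally, whereas natural speech tends to change some segments more than others when talkers speed up or slow down; learning the rate inside the duration predictor gives the model room to capture such non-uniform changes.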

arXiv:2310.07727 [pdf, other]

Deep Learning based Systems for Crater Detection: A Review

Authors: Atal Tewari, K Prateek, Amrita Singh, Nitin Khanna

Abstract: Craters are one of the most prominent features on planetary surfaces, used in applications such as age estimation, hazard detection, and spacecraft navigation. Crater detection is a challenging problem due to various aspects, including complex crater characteristics such as varying sizes and shapes, data resolution, and planetary data types. Similar to other computer vision tasks, deep learning-based approaches have significantly impacted research on crater detection in recent years. This survey aims to assist researchers in this field by examining the development of deep learning-based crater detection algorithms (CDAs). The review includes over 140 research works covering diverse crater detection approaches, including planetary data, crater databases, and evaluation metrics. To be specific, we discuss the challenges in crater detection due to the complex properties of the craters and survey the DL-based CDAs by categorizing them into three parts: (a) semantic segmentation-based, (b) object detection-based, and (c) classification-based. Additionally, we have conducted training and testing of all the semantic segmentation-based CDAs on a common dataset to evaluate the effectiveness of each architecture for crater detection and its potential applications. Finally, we have provided recommendations for potential future works.

Submitted 28 September, 2023; originally announced October 2023.

arXiv:2309.05813 [pdf, other]

doi 10.1145/3615360.3625093

Design and Validation of a Metallic Reflectarray for Communications at True Terahertz Frequencies

Authors: Sherif Badran, Arjun Singh, Arpit Jaiswal, Erik Einarsson, Josep M. Jornet

Abstract: Wireless communication in the terahertz band (0.1-10 THz) is a promising key wireless technology enabling ultra-high data rate communication over multi-gigahertz-wide bandwidths, thus fulfilling the demand for denser networks. The complex propagation environment at such high frequencies introduces several challenges, such as high spreading and molecular absorption losses. As such, intelligent reflecting surfaces have been proposed as a promising solution to enable communication in the presence of blockage or to help a resource-limited quasi-omnidirectional transmitter direct its radiated power. In this paper, we present a metallic reflectarray design achieving controlled non-specular reflection at true terahertz frequencies (i.e., 1-1.05 THz). We conduct extensive experiments to further characterize and validate its working principle using terahertz time-domain spectroscopy and demonstrate its effectiveness with information-carrying signals using a continuous-wave terahertz testbed. Our results show that the reflectarray can help facilitate robust communication links over non-specular paths and improve the reliability of terahertz communications, thereby unleashing the true potential of the terahertz band.

Submitted 18 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: Accepted and to be published in ACM mmNets 2023. Copyright © 2023 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of portions of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage

Journal ref: Proceedings of the 7th ACM Workshop on Millimeter-Wave and Terahertz Networks and Sensing Systems, ser. mmNets '23, Madrid, Spain: Association for Computing Machinery, 2024, pp. 19-24
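Non-specular (anomalous) reflection from a reflectarray is commonly designed with the generalized law of reflection, which ties the steering angle to a linear phase gradient across the surface: sin(theta_r) - sin(theta_i) = (lambda / (2 pi)) dphi/dx. The sketch below applies that textbook relation rather than the paper's element design. It computes the per-element phase step for a chosen incidence and reflection angle, and the inverse. The 1 THz frequency in the usage falls in the band quoted above, while the lambda/4 element spacing and the function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def phase_step_rad(freq_hz, spacing_m, theta_i_deg, theta_r_deg):
    """Phase increment between adjacent elements that steers theta_i to theta_r."""
    lam = C / freq_hz
    delta_sin = math.sin(math.radians(theta_r_deg)) - math.sin(math.radians(theta_i_deg))
    return 2 * math.pi * spacing_m * delta_sin / lam


def reflection_angle_deg(freq_hz, spacing_m, theta_i_deg, phase_step):
    """Inverse relation: anomalous reflection angle produced by a given per-element phase step."""
    lam = C / freq_hz
    s = math.sin(math.radians(theta_i_deg)) + lam * phase_step / (2 * math.pi * spacing_m)
    if abs(s) > 1.0:
        raise ValueError("phase gradient too steep: no propagating reflected beam")
    return math.degrees(math.asin(s))


def element_phases(n_elements, phase_step):
    """Wrapped phase profile across a one-dimensional row of elements."""
    return [(k * phase_step) % (2 * math.pi) for k in range(n_elements)]
```

At 1 THz the wavelength is about 0.3 mm, so a lambda/4 pitch is roughly 75 micrometres; steering normal incidence to 30 degrees then needs a phase step of pi/4 per element.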

arXiv:2309.04651 [pdf]

Video and Synthetic MRI Pre-training of 3D Vision Architectures for Neuroimage Analysis

Authors: Nikhil J. Dhinagar, Amit Singh, Saket Ozarkar, Ketaki Buwa, Sophia I. Thomopoulos, Conor Owens-Walton, Emily Laltoo, Yao-Liang Chen, Philip Cook, Corey McMillan, Chih-Chien Tsai, J-J Wang, Yih-Ru Wu, Paul M. Thompson

Abstract: Transfer learning represents a recent paradigm shift in the way we build artificial intelligence (AI) systems. In contrast to training task-specific models, transfer learning involves pre-training deep learning models on a large corpus of data and minimally fine-tuning them for adaptation to specific tasks. Even so, for 3D medical imaging tasks, we do not know if it is best to pre-train models on natural images, medical images, or even synthetically generated MRI scans or video data. To evaluate these alternatives, here we benchmarked vision transformers (ViTs) and convolutional neural networks (CNNs), initialized with varied upstream pre-training approaches. These methods were then adapted to three unique downstream neuroimaging tasks with a range of difficulty: Alzheimer's disease (AD) classification, Parkinson's disease (PD) classification, and "brain age" prediction. Experimental tests led to the following key observations: (1) pre-training improved performance across all tasks, including boosts of 7.4% for AD classification and 4.6% for PD classification for the ViT, and a 19.1% boost for PD classification and a 1.26-year reduction in brain age prediction error for CNNs; (2) pre-training on large-scale video or synthetic MRI data boosted the performance of ViTs; (3) CNNs were robust in limited-data settings, and in-domain pre-training enhanced their performance; (4) pre-training improved generalization to out-of-distribution datasets and sites. 
Overall, we benchmarked different vision architectures, revealing the value of pre-training them with emerging datasets for model initialization. The resulting pre-trained models can be adapted to a range of downstream neuroimaging tasks, even when training data for the target task is limited.

Submitted 8 September, 2023; originally announced September 2023.

arXiv:2308.08713 [pdf, ps, other]

Decoding Emotions: A comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition

Authors: Anant Singh, Akshat Gupta

Abstract: Recent advancements in transformer-based speech representation models have greatly transformed speech processing. However, there has been limited research conducted on evaluating these models for speech emotion recognition (SER) across multiple languages and examining their internal representations. This article addresses these gaps by presenting a comprehensive benchmark for SER with eight speech representation models and six different languages. We conducted probing experiments to gain insights into the inner workings of these models for SER. We find that using features from a single optimal layer of a speech model reduces the error rate by 32% on average across seven datasets when compared to systems where features from all layers of speech models are used. We also achieve state-of-the-art results for German and Persian languages. Our probing results indicate that the middle layers of speech models capture the most important emotional information for speech emotion recognition.

Submitted 16 August, 2023; originally announced August 2023.
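The abstract does not specify the probing classifier, so a nearest-centroid probe stands in here (an assumption) to illustrate single-layer selection: fit the probe on one split using one layer's features, measure its error on another split, and keep the layer with the lowest error. The helper names and the toy features in the usage are hypothetical.

```python
def nearest_centroid_error(train_x, train_y, test_x, test_y):
    """Fit class centroids on train features; return the error rate on test features."""
    classes = sorted(set(train_y))
    dim = len(train_x[0])
    centroids = {}
    for c in classes:
        pts = [x for x, y in zip(train_x, train_y) if y == c]
        centroids[c] = [sum(p[j] for p in pts) / len(pts) for j in range(dim)]

    def predict(x):
        # Squared Euclidean distance to each centroid; ties go to the first class in sorted order.
        return min(classes, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    wrong = sum(predict(x) != y for x, y in zip(test_x, test_y))
    return wrong / len(test_y)


def select_best_layer(train_layers, train_y, test_layers, test_y):
    """Probe each layer's features separately; return (best layer index, per-layer errors)."""
    errors = [nearest_centroid_error(tr, train_y, te, test_y)
              for tr, te in zip(train_layers, test_layers)]
    best = min(range(len(errors)), key=lambda i: errors[i])
    return best, errors
```

In practice the per-layer features would come from the hidden states of a pretrained speech model, and the probe would be evaluated on held-out utterances rather than the training split.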

arXiv:2308.08302 [pdf, ps, other]

PSA Based Power Control for Cell-Free Massive MIMO under LoS/NLoS Channels

Authors: Ashish Pratap Singh, Ribhu Chopra

Abstract: A primary design goal of the cell-free (CF) massive MIMO architecture is to provide uniformly good coverage to all the user equipments (UEs) connected to the network. However, it has been found that this requirement may not be satisfied when the channels between the access points (APs) and the UEs are mixed LoS/NLoS. In this paper, we try to address this issue via the use of appropriate power control in both the uplink and downlink of a CF massive MIMO system under mixed LoS/NLoS channels. We find that simplistic power control techniques, such as channel inversion-based power control, perform sub-optimally as compared to max-min power control. As a consequence, we propose a particle swarm algorithm (PSA)-based power control algorithm to optimize the performance of the system under study. We then use numerical simulations to evaluate the performance of the proposed PSA-based solution and show that it results in a significant improvement in the fairness of the underlying system while incurring a lower computational complexity.

Submitted 16 August, 2023; originally announced August 2023.

Comments: 10 pages, 10 figures
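The abstract does not detail the algorithm, so the sketch below is a generic particle swarm optimizer (standard inertia-weight velocity update; an assumption) maximizing the minimum SINR across users in a simple interference model, SINR_k = g[k][k] p_k / (sum over j != k of g[k][j] p_j + noise). The gain matrix, parameter values, and function names are hypothetical; this is not the paper's cell-free uplink/downlink model or its LoS/NLoS channel statistics. One particle is seeded at full power, so the result is never worse than that baseline.

```python
import random


def min_sinr(p, g, noise):
    """Smallest SINR across users; g[k][j] is the power gain from transmitter j to receiver k."""
    n = len(p)
    worst = float("inf")
    for k in range(n):
        interference = sum(g[k][j] * p[j] for j in range(n) if j != k)
        worst = min(worst, g[k][k] * p[k] / (interference + noise))
    return worst


def pso_max_min(g, noise, p_max, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Particle swarm search over power vectors in [0, p_max]^K maximizing min_sinr."""
    rng = random.Random(seed)
    k = len(g)

    def clip(v):
        return min(max(v, 0.0), p_max)

    pos = [[rng.uniform(0.0, p_max) for _ in range(k)] for _ in range(n_particles)]
    pos[0] = [p_max] * k  # seed one particle at full power as a baseline
    vel = [[0.0] * k for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [min_sinr(p, g, noise) for p in pos]
    i_best = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[i_best][:], pbest_val[i_best]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # pull toward swarm best
                pos[i][d] = clip(pos[i][d] + vel[i][d])
            val = min_sinr(pos[i], g, noise)
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Each iteration only evaluates the objective, with no gradients or matrix inversions, which is why swarm methods are attractive for a non-smooth max-min objective like this one.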

Showing 1–50 of 140 results for author: Singh, A