-
DuRep: Dual-Mode Speech Representation Learning via ASR-Aware Distillation
Authors:
Prabash Reddy Male,
Swayambhu Nath Ray,
Harish Arsikere,
Akshat Jaiswal,
Prakhar Swarup,
Prantik Sen,
Debmalya Chakrabarty,
K V Vijay Girish,
Nikhil Bhave,
Frederick Weber,
Sambuddha Bhattacharya,
Sri Garimella
Abstract:
Recent advancements in speech encoders have drawn attention due to their integration with Large Language Models for various speech tasks. While most research has focused on either causal or full-context speech encoders, there's limited exploration to effectively handle both streaming and non-streaming applications, while achieving state-of-the-art performance. We introduce DuRep, a Dual-mode Speech Representation learning setup, which enables a single speech encoder to function efficiently in both offline and online modes without additional parameters or mode-specific adjustments, across downstream tasks. DuRep-200M, our 200M parameter dual-mode encoder, achieves 12% and 11.6% improvements in streaming and non-streaming modes, over baseline encoders on Multilingual ASR. Scaling this approach to 2B parameters, DuRep-2B sets new performance benchmarks across ASR and non-ASR tasks. Our analysis reveals interesting trade-offs between acoustic and semantic information across encoder layers.
Submitted 26 May, 2025;
originally announced May 2025.
-
On Weak bounded negativity conjecture
Authors:
Snehajit Misra,
Nabanita Ray
Abstract:
In the first part of this article, we give bounds on self-intersections $C^2$ of integral curves $C$ on blow-ups $Bl_nX$ of surfaces $X$ with the anti-canonical divisor $-K_X$ effective. In the last part, we prove the weak bounded negativity for self-intersections $C^2$ of integral curves $C$ in a family of surfaces $f:Y\longrightarrow B$ where $B$ is a smooth curve.
Submitted 27 August, 2024;
originally announced August 2024.
-
Learning Instance-Specific Parameters of Black-Box Models Using Differentiable Surrogates
Authors:
Arnisha Khondaker,
Nilanjan Ray
Abstract:
Tuning parameters of a non-differentiable or black-box compute is challenging. Existing methods rely mostly on random sampling or grid sampling from the parameter space. Further, with all the current methods, it is not possible to supply any input-specific parameters to the black-box. To the best of our knowledge, for the first time, we are able to learn input-specific parameters for a black box in this work. As a test application, we choose a popular image denoising method BM3D as our black-box compute. Then, we use a differentiable surrogate model (a neural network) to approximate the black-box behaviour. Next, another neural network is used in an end-to-end fashion to learn input instance-specific parameters for the black-box. Motivated by prior advances in surrogate-based optimization, we applied our method to the Smartphone Image Denoising Dataset (SIDD) and the Color Berkeley Segmentation Dataset (CBSD68) for image denoising. The results are compelling, demonstrating a significant increase in PSNR and a notable improvement in SSIM nearing 0.93. Experimental results underscore the effectiveness of our approach in achieving substantial improvements in both model performance and optimization efficiency. For code and implementation details, please refer to our GitHub repository: https://github.com/arnisha-k/instance-specific-param
Submitted 26 November, 2024; v1 submitted 23 July, 2024;
originally announced July 2024.
-
Learning Diffeomorphism for Image Registration with Time-Continuous Networks using Semigroup Regularization
Authors:
Mohammadjavad Matinkia,
Nilanjan Ray
Abstract:
Diffeomorphic image registration (DIR) is a fundamental task in 3D medical image analysis that seeks topology-preserving deformations between image pairs. To ensure diffeomorphism, a common approach is to model the deformation field as the flow map solution of a differential equation, which is solved using efficient schemes such as scaling and squaring along with multiple smoothness regularization terms. In this paper, we propose a novel learning-based approach for diffeomorphic 3D image registration that models diffeomorphisms in a continuous-time framework using only a single regularization term, without requiring additional integration. We exploit the semigroup property, a fundamental characteristic of flow maps, as the sole form of regularization, ensuring temporally continuous diffeomorphic flows between image pairs. Leveraging this property, we prove that our formulation directly learns the flow map solution of an ODE, ensuring continuous inverse and cycle consistencies without explicit enforcement, while eliminating additional integration schemes and regularization terms. To achieve time-continuous diffeomorphisms, we employ time-embedded UNets, an architecture commonly used in diffusion models. Our results demonstrate that modeling diffeomorphism continuously in time improves registration performance. Experimental results on four public datasets demonstrate the superiority of our model over state-of-the-art diffeomorphic methods. Additionally, comparison to several recent non-diffeomorphic deformable image registration methods shows that our method achieves competitive Dice scores while significantly improving topology preservation.
Submitted 16 March, 2025; v1 submitted 28 May, 2024;
originally announced May 2024.
-
On Stability of Syzygy Bundles
Authors:
Snehajit Misra,
Nabanita Ray
Abstract:
In this article, we investigate the stability of syzygy bundles corresponding to ample and globally generated vector bundles on smooth irreducible projective surfaces.
Submitted 27 May, 2024;
originally announced May 2024.
-
Disentangling Hippocampal Shape Variations: A Study of Neurological Disorders Using Mesh Variational Autoencoder with Contrastive Learning
Authors:
Jakaria Rabbi,
Johannes Kiechle,
Christian Beaulieu,
Nilanjan Ray,
Dana Cobzas
Abstract:
This paper presents a comprehensive study focused on disentangling hippocampal shape variations from diffusion tensor imaging (DTI) datasets within the context of neurological disorders. Leveraging a Mesh Variational Autoencoder (VAE) enhanced with Supervised Contrastive Learning, our approach aims to improve interpretability by disentangling two distinct latent variables corresponding to age and the presence of diseases. In our ablation study, we investigate a range of VAE architectures and contrastive loss functions, showcasing the enhanced disentanglement capabilities of our approach. This evaluation uses synthetic 3D torus mesh data and real 3D hippocampal mesh datasets derived from the DTI hippocampal dataset. Our supervised disentanglement model outperforms several state-of-the-art (SOTA) methods like attribute and guided VAEs in terms of disentanglement scores. Our model distinguishes between age groups and disease status in patients with Multiple Sclerosis (MS) using the hippocampus data. Our Mesh VAE with Supervised Contrastive Learning shows the volume changes of the hippocampus of MS populations at different ages, and the result is consistent with the current neuroimaging literature. This research provides valuable insights into the relationship between neurological disorder and hippocampal shape changes in different age groups of MS populations using a Mesh VAE with Supervised Contrastive loss. Our code is available at https://github.com/Jakaria08/Explaining_Shape_Variability
Submitted 9 November, 2024; v1 submitted 31 March, 2024;
originally announced April 2024.
-
Well-posedness of an evaporation model for a spherical droplet exposed to an air flow
Authors:
Eberhard Bänsch,
Martin Doß,
Carsten Gräser,
Nadja Ray
Abstract:
In this paper, we address the well-posedness of an evaporation model for a spherical liquid droplet taking into account the convective impact of an air flow in the ambient gas phase. From a mathematical perspective, we are dealing with a coupled ODE-PDE system for the droplet radius, the temperature distribution, and the vapor concentration. The nonlinear coupling arises from the evaporation rate modeled by the Hertz-Knudsen equation. Under physically meaningful assumptions, we prove existence and uniqueness of a weak solution until the droplet has evaporated completely. Numerical simulations are performed to illustrate how different air flows affect the evaporation process.
Submitted 11 December, 2023;
originally announced December 2023.
-
ShadowSense: Unsupervised Domain Adaptation and Feature Fusion for Shadow-Agnostic Tree Crown Detection from RGB-Thermal Drone Imagery
Authors:
Rudraksh Kapil,
Seyed Mojtaba Marvasti-Zadeh,
Nadir Erbilgin,
Nilanjan Ray
Abstract:
Accurate detection of individual tree crowns from remote sensing data poses a significant challenge due to the dense nature of forest canopy and the presence of diverse environmental variations, e.g., overlapping canopies, occlusions, and varying lighting conditions. Additionally, the lack of data for training robust models adds another limitation in effectively studying complex forest conditions. This paper presents a novel method for detecting shadowed tree crowns and provides a challenging dataset comprising roughly 50k paired RGB-thermal images to facilitate future research for illumination-invariant detection. The proposed method (ShadowSense) is entirely self-supervised, leveraging domain adversarial training without source domain annotations for feature extraction and foreground feature alignment for feature pyramid networks to adapt domain-invariant representations by focusing on visible foreground regions, respectively. It then fuses complementary information of both modalities to effectively improve upon the predictions of an RGB-trained detector and boost the overall accuracy. Extensive experiments demonstrate the superiority of the proposed method over both the baseline RGB-trained detector and state-of-the-art techniques that rely on unsupervised domain adaptation or early image fusion. Our code and data are available: https://github.com/rudrakshkapil/ShadowSense
Submitted 24 October, 2023;
originally announced October 2023.
-
A discrete-time dynamical model of prey and stage-structured predator with juvenile hunting incorporating negative effects of prey refuge
Authors:
Debasish Bhattacharjee,
Nabajit Ray,
Dipam Das,
Hemanta Kumar Sarmah
Abstract:
This paper examines a discrete predator-prey model that incorporates prey refuge and its detrimental impact on the growth of the prey population. Age structure is taken into account for predator species. Furthermore, juvenile hunting as well as prey counter-attack are also considered. This paper provides a comprehensive analysis of the existence and stability conditions pertaining to all possible fixed points. The analytical and numerical investigation into the occurrence of different bifurcations, such as the Neimark-Sacker bifurcation and period-doubling bifurcation, in relation to various parameters is discussed. The impact of the parameters reflecting prey growth and prey refuge is thoroughly addressed. Numerous numerical simulations are presented in order to validate the theoretical findings.
Submitted 17 August, 2023;
originally announced August 2023.
-
Predicting Ki67, ER, PR, and HER2 Statuses from H&E-stained Breast Cancer Images
Authors:
Amir Akbarnejad,
Nilanjan Ray,
Penny J. Barnes,
Gilbert Bigras
Abstract:
Despite the advances in machine learning and digital pathology, it is not yet clear if machine learning methods can accurately predict molecular information merely from histomorphology. In a quest to answer this question, we built a large-scale dataset (185538 images) with reliable measurements for Ki67, ER, PR, and HER2 statuses. The dataset is composed of mirrored images of H&E and corresponding images of immunohistochemistry (IHC) assays (Ki67, ER, PR, and HER2). These images are mirrored through registration. To increase reliability, individual pairs were inspected and discarded if artifacts were present (tissue folding, bubbles, etc.). Measurements for Ki67, ER and PR were determined by calculating H-Score from image analysis. HER2 measurement is based on binary classification: 0 and 1+ (IHC scores representing a negative subset) vs 3+ (IHC score positive subset). Cases with IHC equivocal score (2+) were excluded. We show that a standard ViT-based pipeline can achieve prediction performances around 90% in terms of Area Under the Curve (AUC) when trained with a proper labeling protocol. Finally, we shed light on the ability of the trained classifiers to localize relevant regions, which encourages future work to improve the localizations. Our proposed dataset is publicly available: https://ihc4bc.github.io/
Submitted 3 August, 2023;
originally announced August 2023.
-
Training-based Model Refinement and Representation Disagreement for Semi-Supervised Object Detection
Authors:
Seyed Mojtaba Marvasti-Zadeh,
Nilanjan Ray,
Nadir Erbilgin
Abstract:
Semi-supervised object detection (SSOD) aims to improve the performance and generalization of existing object detectors by utilizing limited labeled data and extensive unlabeled data. Despite many advances, recent SSOD methods are still challenged by inadequate model refinement using the classical exponential moving average (EMA) strategy, the consensus of Teacher-Student models in the latter stages of training (i.e., losing their distinctiveness), and noisy/misleading pseudo-labels. This paper proposes a novel training-based model refinement (TMR) stage and a simple yet effective representation disagreement (RD) strategy to address the limitations of classical EMA and the consensus problem. The TMR stage of Teacher-Student models optimizes the lightweight scaling operation to refine the model's weights and prevent overfitting or forgetting learned patterns from unlabeled data. Meanwhile, the RD strategy helps keep these models diverged to encourage the student model to explore additional patterns in unlabeled data. Our approach can be integrated into established SSOD methods and is empirically validated using two baseline methods, with and without cascade regression, to generate more reliable pseudo-labels. Extensive experiments demonstrate the superior performance of our approach over state-of-the-art SSOD methods. Specifically, the proposed approach outperforms the baseline Unbiased-Teacher-v2 (& Unbiased-Teacher-v1) method by an average mAP margin of 2.23, 2.1, and 3.36 (& 2.07, 1.9, and 3.27) on COCO-standard, COCO-additional, and Pascal VOC datasets, respectively.
Submitted 26 October, 2023; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Document Image Cleaning using Budget-Aware Black-Box Approximation
Authors:
Ganesh Tata,
Katyani Singh,
Eric Van Oeveren,
Nilanjan Ray
Abstract:
Recent work has shown that by approximating the behaviour of a non-differentiable black-box function using a neural network, the black-box can be integrated into a differentiable training pipeline for end-to-end training. This methodology is termed "differentiable bypass," and a successful application of this method involves training a document preprocessor to improve the performance of a black-box OCR engine. However, a good approximation of an OCR engine requires querying it for all samples throughout the training process, which can be computationally and financially expensive. Several zeroth-order optimization (ZO) algorithms have been proposed in black-box attack literature to find adversarial examples for a black-box model by computing its gradient in a query-efficient manner. However, the query complexity and convergence rate of such algorithms make them infeasible for our problem. In this work, we propose two sample selection algorithms to train an OCR preprocessor with less than 10% of the original system's OCR engine queries, resulting in more than 60% reduction of the total training time without significant loss of accuracy. We also show an improvement of 4% in the word-level accuracy of a commercial OCR engine with only 2.5% of the total queries and a 32x reduction in monetary cost. Further, we propose a simple ranking technique to prune 30% of the document images from the training dataset without affecting the system's performance.
Submitted 22 June, 2023;
originally announced June 2023.
-
Towards Early Prediction of Human iPSC Reprogramming Success
Authors:
Abhineet Singh,
Ila Jasra,
Omar Mouhammed,
Nidheesh Dadheech,
Nilanjan Ray,
James Shapiro
Abstract:
This paper presents advancements in automated early-stage prediction of the success of reprogramming human induced pluripotent stem cells (iPSCs) as a potential source for regenerative cell therapies. The minuscule success rate of iPSC reprogramming, around 0.01% to 0.1%, makes it labor-intensive, time-consuming, and exorbitantly expensive to generate a stable iPSC line, since that requires culturing of millions of cells and intense biological scrutiny of multiple clones to identify a single optimal clone. The ability to reliably predict which cells are likely to establish as an optimal iPSC line at an early stage of pluripotency would therefore be ground-breaking in rendering this a practical and cost-effective approach to personalized medicine. Temporal information about changes in cellular appearance over time is crucial for predicting its future growth outcomes. In order to generate this data, we first performed continuous time-lapse imaging of iPSCs in culture using an ultra-high resolution microscope. We then annotated the locations and identities of cells in late-stage images where reliable manual identification is possible. Next, we propagated these labels backwards in time using a semi-automated tracking system to obtain labels for early stages of growth. Finally, we used this data to train deep neural networks to perform automatic cell segmentation and classification. Our code and data are available at https://github.com/abhineet123/ipsc_prediction.
Submitted 11 November, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Positivity and base loci for vector bundles revisited
Authors:
Mihai Fulger,
Nabanita Ray
Abstract:
We give equivalent descriptions for the augmented and diminished base loci of vector bundles in characteristic zero. We show that these base loci behave well under pullback, tensor product, and direct sum. Pathological behavior is observed on some nonsplit exact sequences.
Submitted 23 March, 2023;
originally announced March 2023.
-
Weakly Supervised Realtime Dynamic Background Subtraction
Authors:
Fateme Bahri,
Nilanjan Ray
Abstract:
Background subtraction is a fundamental task in computer vision with numerous real-world applications, ranging from object tracking to video surveillance. Dynamic backgrounds pose a significant challenge here. Supervised deep learning-based techniques are currently considered state-of-the-art for this task. However, these methods require pixel-wise ground-truth labels, which can be time-consuming and expensive. In this work, we propose a weakly supervised framework that can perform background subtraction without requiring per-pixel ground-truth labels. Our framework is trained on a moving object-free sequence of images and comprises two networks. The first network is an autoencoder that generates background images and prepares dynamic background images for training the second network. The dynamic background images are obtained by thresholding the background-subtracted images. The second network is a U-Net that uses the same object-free video for training and the dynamic background images as pixel-wise ground-truth labels. During the test phase, the input images are processed by the autoencoder and U-Net, which generate background and dynamic background images, respectively. The dynamic background image helps remove dynamic motion from the background-subtracted image, enabling us to obtain a foreground image that is free of dynamic artifacts. To demonstrate the effectiveness of our method, we conducted experiments on selected categories of the CDnet 2014 dataset and the I2R dataset. Our method outperformed all top-ranked unsupervised methods. We also achieved better results than one of the two existing weakly supervised methods, and our performance was similar to the other. Our proposed method is online, real-time, efficient, and requires minimal frame-level annotation, making it suitable for a wide range of real-world applications.
Submitted 5 March, 2023;
originally announced March 2023.
-
Reinforcement Learning for Block Decomposition of CAD Models
Authors:
Benjamin C. DiPrete,
Rao V. Garimella,
Cristina Garcia Cardona,
Navamita Ray
Abstract:
We present a novel AI-assisted method for decomposing (segmenting) planar CAD (computer-aided design) models into well-shaped rectangular blocks as a proof-of-principle of a general decomposition method applicable to complex 2D and 3D CAD models. The decomposed blocks are required for generating good quality meshes (tilings of quadrilaterals or hexahedra) suitable for numerical simulations of physical systems governed by conservation laws. The problem of hexahedral mesh generation of general CAD models has vexed researchers for over three decades, and analysts often spend more than 50% of the design-analysis cycle time decomposing complex models into simpler parts meshable by existing techniques. Our method uses reinforcement learning to train an agent to perform a series of optimal cuts on the CAD model that result in a good quality block decomposition. We show that the agent quickly learns an effective strategy for picking the location and direction of the cuts and maximizing its rewards as opposed to making random cuts. This paper is the first successful demonstration of an agent autonomously learning how to perform this block decomposition task effectively, thereby holding the promise of a viable method to automate this challenging process.
Submitted 21 February, 2023;
originally announced February 2023.
-
Parameter estimation for cellular automata
Authors:
Alexey Kazarnikov,
Nadja Ray,
Heikki Haario,
Joona Lappalainen,
Andreas Rupp
Abstract:
Self-organizing complex systems can be modeled using cellular automaton models. However, the parametrization of these models is crucial and significantly determines the resulting structural pattern. In this research, we introduce and successfully apply a sound statistical method to estimate these parameters. The decisive difference to earlier applications of such approaches is that, in our case, both the CA rules and the resulting patterns are discrete. The method is based on constructing Gaussian likelihoods using characteristics of the structures, such as the mean particle size. We show that our approach is robust for the method parameters, domain size of patterns, or CA iterations.
Submitted 11 January, 2025; v1 submitted 30 January, 2023;
originally announced January 2023.
-
Crown-CAM: Interpretable Visual Explanations for Tree Crown Detection in Aerial Images
Authors:
Seyed Mojtaba Marvasti-Zadeh,
Devin Goodsman,
Nilanjan Ray,
Nadir Erbilgin
Abstract:
Visual explanation of "black-box" models allows researchers in explainable artificial intelligence (XAI) to interpret the model's decisions in a human-understandable manner. In this paper, we propose interpretable class activation mapping for tree crown detection (Crown-CAM) that overcomes inaccurate localization & computational complexity of previous methods while generating reliable visual explanations for the challenging and dynamic problem of tree crown detection in aerial images. It consists of an unsupervised selection of activation maps, computation of local score maps, and non-contextual background suppression to efficiently provide fine-grain localization of tree crowns in scenarios with dense forest trees or scenes without tree crowns. Additionally, two Intersection over Union (IoU)-based metrics are introduced to effectively quantify both the accuracy and inaccuracy of generated explanations with respect to regions with or even without tree crowns in the image. Empirical evaluations demonstrate that the proposed Crown-CAM outperforms the Score-CAM, Augmented Score-CAM, and Eigen-CAM methods by an average IoU margin of 8.7, 5.3, and 21.7 (and 3.3, 9.8, and 16.5) respectively in improving the accuracy (and decreasing inaccuracy) of visual explanations on the challenging NEON tree crown dataset.
Submitted 26 April, 2023; v1 submitted 23 November, 2022;
originally announced November 2022.
-
Early Detection of Bark Beetle Attack Using Remote Sensing and Machine Learning: A Review
Authors:
Seyed Mojtaba Marvasti-Zadeh,
Devin Goodsman,
Nilanjan Ray,
Nadir Erbilgin
Abstract:
This paper provides a comprehensive review of past and current advances in the early detection of bark beetle-induced tree mortality from three primary perspectives: bark beetle & host interactions, RS, and ML/DL. In contrast to prior efforts, this review encompasses all RS systems and emphasizes ML/DL methods to investigate their strengths and weaknesses. We parse existing literature based on multi- or hyper-spectral analyses and distill their knowledge based on: bark beetle species & attack phases with a primary emphasis on early stages of attacks, host trees, study regions, RS platforms & sensors, spectral/spatial/temporal resolutions, spectral signatures, spectral vegetation indices (SVIs), ML approaches, learning schemes, task categories, models, algorithms, classes/clusters, features, and DL networks & architectures. Although DL-based methods and the random forest (RF) algorithm showed promising results, highlighting their potential to detect subtle changes across visible, thermal, and short-wave infrared (SWIR) spectral regions, they still have limited effectiveness and high uncertainties. To inspire novel solutions to these shortcomings, we delve into the principal challenges & opportunities from different perspectives, enabling a deeper understanding of the current state of research and guiding future research directions.
Submitted 24 November, 2023; v1 submitted 7 October, 2022;
originally announced October 2022.
-
Distance-dependent chase-escape on trees
Authors:
Sarai Hernandez-Torres,
Matthew Junge,
Naina Ray,
Nidhi Ray
Abstract:
We give a necessary and sufficient condition for species coexistence in a parasite-host growth process on infinite $d$-ary trees. The novelty of this work is that the spreading and death rates for hosts depend on the distance to the nearest parasite.
Submitted 15 November, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Unsupervised diffeomorphic cardiac image registration using parameterization of the deformation field
Authors:
Ameneh Sheikhjafari,
Deepa Krishnaswamy,
Michelle Noga,
Nilanjan Ray,
Kumaradevan Punithakumar
Abstract:
This study proposes an end-to-end unsupervised diffeomorphic deformable registration framework based on moving mesh parameterization. Using this parameterization, a deformation field can be modeled with its transformation Jacobian determinant and the curl of the end velocity field. The new model of the deformation field has three important advantages. First, it relaxes the need for an explicit regularization term and the corresponding weight in the cost function. The smoothness is implicitly embedded in the solution, which results in a physically plausible deformation field. Second, it guarantees diffeomorphism through explicit constraints applied to the transformation Jacobian determinant to keep it positive. Finally, it is suitable for cardiac data processing, since the nature of this parameterization is to define the deformation field in terms of the radial and rotational components. The effectiveness of the algorithm is investigated by evaluating the proposed method on three different data sets including 2D and 3D cardiac MRI scans. The results demonstrate that the proposed framework outperforms existing learning-based and non-learning-based methods while generating diffeomorphic transformations.
Submitted 28 August, 2022;
originally announced August 2022.
-
Tiny-HR: Towards an interpretable machine learning pipeline for heart rate estimation on edge devices
Authors:
Preetam Anbukarasu,
Shailesh Nanisetty,
Ganesh Tata,
Nilanjan Ray
Abstract:
The focus of this paper is a proof-of-concept machine learning (ML) pipeline that extracts heart rate from pressure sensor data acquired on low-power edge devices. The ML pipeline consists of an upsampler neural network, a signal quality classifier, and a 1D-convolutional neural network optimized for efficient and accurate heart rate estimation. The models were designed so the pipeline was less than 40 kB. Further, a hybrid pipeline consisting of the upsampler and classifier, followed by a peak detection algorithm, was developed. The pipelines were deployed on an ESP32 edge device and benchmarked against signal processing to determine the energy usage and inference times. The results indicate that the proposed ML and hybrid pipelines reduce energy and time per inference by 82% and 28% compared to traditional algorithms. The main trade-off for the ML pipeline was accuracy, with a mean absolute error (MAE) of 3.28, compared to 2.39 and 1.17 for the hybrid and signal processing pipelines. The ML models thus show promise for deployment in energy- and computationally constrained devices. Further, the lower sampling rate and computational requirements for the ML pipeline could enable custom hardware solutions to reduce the cost and energy needs of wearable devices.
Submitted 16 August, 2022;
originally announced August 2022.
-
Estimating relative diffusion from 3D micro-CT images using CNNs
Authors:
Stephan Gärttner,
Florian Frank,
Fabian Woller,
Andreas Meier,
Nadja Ray
Abstract:
In the past several years, convolutional neural networks (CNNs) have proven their capability to predict characteristic quantities in porous media research directly from pore-space geometries. Due to the frequently observed significant reduction in computation time in comparison to classical computational methods, bulk parameter prediction via CNNs is especially compelling, e.g. for effective diffusion. While the current literature is mainly focused on fully saturated porous media, the partially saturated case is also of high interest. Due to the qualitatively different and more complex geometries of the domain available for diffusive transport present in this case, standard CNNs tend to lose robustness and accuracy with lower saturation rates. In this paper, we demonstrate the ability of CNNs to perform predictions of relative diffusion directly from full pore-space geometries. As such, our CNN conveniently fuses diffusion prediction and a well-established morphological model which describes phase distributions in partially saturated porous media.
Submitted 4 August, 2022;
originally announced August 2022.
-
Classification of Bark Beetle-Induced Forest Tree Mortality using Deep Learning
Authors:
Rudraksh Kapil,
Seyed Mojtaba Marvasti-Zadeh,
Devin Goodsman,
Nilanjan Ray,
Nadir Erbilgin
Abstract:
Bark beetle outbreaks can dramatically impact forest ecosystems and services around the world. For the development of effective forest policies and management plans, the early detection of infested trees is essential. Despite the visual symptoms of bark beetle infestation, this task remains challenging, considering overlapping tree crowns and non-homogeneity in crown foliage discolouration. In this work, a deep learning-based method is proposed to effectively classify different stages of bark beetle attacks at the individual tree level. The proposed method uses the RetinaNet architecture (exploiting a robust feature extraction backbone pre-trained for tree crown detection) to train a shallow subnetwork for classifying the different attack stages in images captured by unmanned aerial vehicles (UAVs). Moreover, various data augmentation strategies are examined to address the class imbalance problem, and the affine transformation is selected as the most effective one for this purpose. Experimental evaluations demonstrate the effectiveness of the proposed method by achieving an average accuracy of 98.95%, considerably outperforming the baseline method by approximately 10%.
Submitted 21 August, 2022; v1 submitted 14 July, 2022;
originally announced July 2022.
-
Learning-based Monocular 3D Reconstruction of Birds: A Contemporary Survey
Authors:
Seyed Mojtaba Marvasti-Zadeh,
Mohammad N. S. Jahromi,
Javad Khaghani,
Devin Goodsman,
Nilanjan Ray,
Nadir Erbilgin
Abstract:
In nature, the collective behavior of animals, such as flying birds, is dominated by the interactions between individuals of the same species. However, the study of such behavior among bird species is a complex process that humans cannot perform using conventional visual observational techniques such as focal sampling in nature. For social animals such as birds, the mechanism of group formation can help ecologists understand the relationship between social cues and their visual characteristics over time (e.g., pose and shape). However, recovering the varying pose and shape of flying birds is a highly challenging problem. A widely adopted solution to tackle this bottleneck is to extract the pose and shape information from 2D image to 3D correspondence. Recent advances in 3D vision have led to a number of impressive works on 3D shape and pose estimation, each with different pros and cons. To the best of our knowledge, this work is the first attempt to provide an overview of recent advances in 3D bird reconstruction based on monocular vision, give both computer vision and biology researchers an overview of existing approaches, and compare their characteristics.
Submitted 28 July, 2022; v1 submitted 10 July, 2022;
originally announced July 2022.
-
Slope Semistability and Positive cones of Grassmann bundles
Authors:
Snehajit Misra,
Nabanita Ray
Abstract:
Let $E$ be a vector bundle of rank $r$ on a smooth complex projective variety $X$. In this article, we compute the nef and pseudoeffective cones of divisors in the Grassmann bundle $Gr_X(k,E)$ parametrizing $k$-dimensional subspaces of the fibers of $E$, where $1\leq k \leq rank(E)$, under assumptions on $X$ as well as on the vector bundle $E$. In particular, we show that the nef cone and the pseudoeffective cone of $Gr_X(k,E)$ coincide if and only if $E$ is a slope semistable bundle on $X$ with $c_2(End(E))=0$. We also discuss the nefness and ampleness of the universal quotient bundle $Q_k$ on $Gr_X(k,E)$.
Submitted 23 May, 2022;
originally announced May 2022.
-
Unified Modeling of Multi-Domain Multi-Device ASR Systems
Authors:
Soumyajit Mitra,
Swayambhu Nath Ray,
Bharat Padi,
Arunasish Sen,
Raghavendra Bilgi,
Harish Arsikere,
Shalini Ghosh,
Ajay Srinivasamurthy,
Sri Garimella
Abstract:
Modern Automatic Speech Recognition (ASR) systems often use a portfolio of domain-specific models in order to get high accuracy for distinct user utterance types across different devices. In this paper, we propose an innovative approach that integrates the different per-domain per-device models into a unified model, using a combination of domain embedding, domain experts, mixture of experts and adversarial training. We run careful ablation studies to show the benefit of each of these innovations in contributing to the accuracy of the overall unified model. Experiments show that our proposed unified modeling approach actually outperforms the carefully tuned per-domain models, giving relative gains of up to 10% over a baseline model with negligible increase in the number of parameters.
Submitted 13 October, 2022; v1 submitted 13 May, 2022;
originally announced May 2022.
-
Seshadri constants of parabolic vector bundles
Authors:
Indranil Biswas,
Krishna Hanumanthu,
Snehajit Misra,
Nabanita Ray
Abstract:
Let $X$ be a complex projective variety, and let $E_{\ast}$ be a parabolic vector bundle on $X$. We introduce the notion of \textit{parabolic Seshadri constants} of $E_{\ast}$. It is shown that these constants are analogous to the classical Seshadri constants of vector bundles; in particular, they have parallel definitions and properties. We prove a Seshadri criterion for parabolic ampleness of $E_{\ast}$ in terms of parabolic Seshadri constants. We also compute parabolic Seshadri constants for symmetric powers and tensor products of parabolic vector bundles.
Submitted 7 June, 2023; v1 submitted 9 March, 2022;
originally announced March 2022.
-
Dynamic Background Subtraction by Generative Neural Networks
Authors:
Fateme Bahri,
Nilanjan Ray
Abstract:
Background subtraction is a significant task in computer vision and an essential step for many real-world applications. One of the challenges for background subtraction methods is dynamic background, which consists of stochastic movements in some parts of the background. In this paper, we propose a new background subtraction method, called DBSGen, which uses two generative neural networks, one for dynamic motion removal and another for background generation. Finally, the foreground moving objects are obtained by a pixel-wise distance threshold based on a dynamic entropy map. The proposed method has a unified framework that can be optimized in an end-to-end and unsupervised fashion. The performance of the method is evaluated over dynamic background sequences, and it outperforms most state-of-the-art methods. Our code is publicly available at https://github.com/FatemeBahri/DBSGen.
Submitted 10 February, 2022;
originally announced February 2022.
-
Towards Positive Jacobian: Learn to Postprocess Diffeomorphic Image Registration with Matrix Exponential
Authors:
Soumyadeep Pal,
Matthew Tennant,
Nilanjan Ray
Abstract:
We present a postprocessing layer for deformable image registration to make a registration field more diffeomorphic by encouraging Jacobians of the transformation to be positive. Diffeomorphic image registration is important for medical imaging studies because of properties like invertibility, smoothness of the transformation, and topology preservation/non-folding of the grid. Violation of these properties can lead to destruction of the neighbourhood and the connectivity of anatomical structures during image registration. Most recent deep learning methods do not explicitly address this folding problem and try to solve it with a smoothness regularization on the registration field. In this paper, we propose a differentiable layer, which takes any registration field as its input, computes the exponential of the Jacobian matrices of the input, and reconstructs a new registration field from the exponentiated Jacobian matrices using Poisson reconstruction. Our proposed Poisson reconstruction loss enforces positive Jacobians for the final registration field. Thus, our method acts as a post-processing layer without any learnable parameters of its own and can be placed at the end of any deep learning pipeline to form an end-to-end learnable framework. We show the effectiveness of our proposed method for a popular deep learning registration method, Voxelmorph, and evaluate it with a dataset containing 3D brain MRI scans. Our results show that our post-processing can significantly decrease the number of non-positive Jacobians without any noticeable deterioration of the registration accuracy, thus making the registration field more diffeomorphic. Our code is available online at https://github.com/Soumyadeep-Pal/Diffeomorphic-Image-Registration-Postprocess.
Submitted 1 February, 2022;
originally announced February 2022.
-
A training-free recursive multiresolution framework for diffeomorphic deformable image registration
Authors:
Ameneh Sheikhjafari,
Michelle Noga,
Kumaradevan Punithakumar,
Nilanjan Ray
Abstract:
Diffeomorphic deformable image registration is one of the crucial tasks in medical image analysis, which aims to find a unique transformation while preserving the topology and invertibility of the transformation. Deep convolutional neural networks (CNNs) have yielded well-suited approaches for image registration by learning the transformation priors from a large dataset. The improvement in the performance of these methods is related to their ability to learn information from several sample medical images that are difficult to obtain and bias the framework to the specific domain of data. In this paper, we propose a novel diffeomorphic training-free approach; this is built upon the principle of an ordinary differential equation.
Our formulation yields an Euler integration type recursive scheme to estimate the changes of spatial transformations between the fixed and the moving image pyramids at different resolutions. The proposed architecture is simple in design. The moving image is warped successively at each resolution and finally aligned to the fixed image; this procedure is recursive in a way that at each resolution, a fully convolutional network (FCN) models a progressive change of deformation for the current warped image. The entire system is end-to-end and optimized for each pair of images from scratch. In comparison to learning-based methods, the proposed method neither requires a dedicated training set nor suffers from any training bias. We evaluate our method on three cardiac image datasets. The evaluation results demonstrate that the proposed method achieves state-of-the-art registration accuracy while maintaining desirable diffeomorphic properties.
Submitted 1 February, 2022;
originally announced February 2022.
-
Local existence of strong solutions to micro-macro models for reactive transport in evolving porous media
Authors:
Stephan Gärttner,
Peter Knabner,
Nadja Ray
Abstract:
Two-scale models are a promising approach for simulating reactive flow and transport in evolving porous media. Classically, homogenized flow and transport equations are solved on the macroscopic scale, while effective parameters are obtained from auxiliary cell problems on possibly evolving reference geometries (micro-scale). Despite their prospective success in rendering lab/field-scale simulations computationally feasible, analytic results regarding the arising two-scale bilaterally coupled system are often restricted to simplified models. In this paper, we first derive smooth-dependence results concerning the partial coupling from the underlying geometry to macroscopic quantities. Therefore, alterations of the representative fluid domain are described by smooth paths of diffeomorphisms. Exploiting the gained regularity of the effective space- and time-dependent macroscopic coefficients, we present local-in-time existence results for strong solutions to the partially coupled micro-macro system using fixed-point arguments. Moreover, we extend our results to the bilaterally coupled diffusive transport model including a level-set description of the evolving geometry.
Submitted 31 January, 2022;
originally announced January 2022.
-
GPEX, A Framework For Interpreting Artificial Neural Networks
Authors:
Amir Akbarnejad,
Gilbert Bigras,
Nilanjan Ray
Abstract:
The analogy between Gaussian processes (GPs) and deep artificial neural networks (ANNs) has received a lot of interest, and has shown promise to unbox the blackbox of deep ANNs. Existing theoretical works put strict assumptions on the ANN (e.g., requiring all intermediate layers to be wide, or using specific activation functions). Accommodating those theoretical assumptions is hard in recent deep architectures, and those theoretical conditions need refinement as new deep architectures emerge. In this paper we derive an evidence lower-bound that encourages the GP's posterior to match the ANN's output without any requirement on the ANN. Using our method we find that, on 5 datasets, only a subset of those theoretical assumptions is sufficient. Indeed, in our experiments we used a normal ResNet-18 or feed-forward backbone with a single wide layer at the end. One limitation of training GPs is the lack of scalability with respect to the number of inducing points. We use novel computational techniques that allow us to train GPs with hundreds of thousands of inducing points and with GPU acceleration. As shown in our experiments, doing so has been essential to get a close match between the GPs and the ANNs on 5 datasets. We implement our method as a publicly available tool called GPEX: https://github.com/amirakbarnejad/gpex. On 5 datasets (4 image datasets, and 1 biological dataset) and ANNs with 2 types of functionality (classifier or attention-mechanism) we were able to find GPs whose outputs closely match those of the corresponding ANNs. After matching the GPs to the ANNs, we used the GPs' kernel functions to explain the ANNs' decisions. We provide more than 200 explanations (around 30 explanations in the paper and the rest in the supplementary) which are highly interpretable by humans and show the ability of the obtained GPs to unbox the ANNs' decisions.
Submitted 10 January, 2024; v1 submitted 17 December, 2021;
originally announced December 2021.
-
Fire Together Wire Together: A Dynamic Pruning Approach with Self-Supervised Mask Prediction
Authors:
Sara Elkerdawy,
Mostafa Elhoushi,
Hong Zhang,
Nilanjan Ray
Abstract:
Dynamic model pruning is a recent direction that allows for the inference of a different sub-network for each input sample during deployment. However, current dynamic methods rely on learning a continuous channel gating through regularization by inducing sparsity loss. This formulation introduces complexity in balancing different losses (e.g., task loss, regularization loss). In addition, regularization-based methods lack transparent tradeoff hyperparameter selection to realize a computational budget. Our contribution is two-fold: 1) decoupled task and pruning losses, and 2) simple hyperparameter selection that enables FLOPs reduction estimation before training. Inspired by the Hebbian theory in neuroscience, "neurons that fire together wire together", we propose to predict a mask to process k filters in a layer based on the activation of its previous layer. We pose the problem as a self-supervised binary classification problem. Each mask predictor module is trained to predict if the log-likelihood for each filter in the current layer belongs to the top-k activated filters. The value k is dynamically estimated for each input based on a novel criterion using the mass of heatmaps. We show experiments on several neural architectures, such as VGG, ResNet and MobileNet, on the CIFAR and ImageNet datasets. On CIFAR, we reach similar accuracy to SOTA methods with 15% and 24% higher FLOPs reduction. Similarly, on ImageNet, we achieve a lower drop in accuracy with up to 13% improvement in FLOPs reduction.
Submitted 28 June, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
Estimating permeability of 3D micro-CT images by physics-informed CNNs based on DNS
Authors:
Stephan Gärttner,
Faruk O. Alpak,
Andreas Meier,
Nadja Ray,
Florian Frank
Abstract:
In recent years, convolutional neural networks (CNNs) have attracted increasing interest for their ability to perform a fast approximation of effective hydrodynamic parameters in porous media research and applications. This paper presents a novel methodology for permeability prediction from micro-CT scans of geological rock samples. The training data set for CNNs dedicated to permeability prediction consists of permeability labels that are typically generated by classical lattice Boltzmann methods (LBM) that simulate the flow through the pore space of the segmented image data. We instead perform direct numerical simulation (DNS) by solving the stationary Stokes equation in an efficient and distributed-parallel manner. As such, we circumvent the convergence issues of LBM that are frequently observed on complex pore geometries, and therefore improve the generality and accuracy of our training data set. Using the DNS-computed permeabilities, a physics-informed CNN (PhyCNN) is trained by additionally providing a tailored characteristic quantity of the pore space. More precisely, by exploiting the connection to flow problems on a graph representation of the pore space, additional information about confined structures is provided to the network in terms of the maximum flow value, which is the key innovative component of our workflow. The robustness of this approach is reflected by very high prediction accuracy, which is observed for a variety of sandstone samples from archetypal rock formations.
Submitted 13 April, 2022; v1 submitted 4 September, 2021;
originally announced September 2021.
-
Timestamping Documents and Beliefs
Authors:
Swayambhu Nath Ray
Abstract:
Most of the textual information available to us is temporally variable. In a world where information is dynamic, time-stamping it is a very important task. Documents are a good source of information and are used for many tasks like sentiment analysis, classification of reviews, etc. The knowledge of the creation date of documents facilitates several tasks like summarization, event extraction, temporally focused information extraction, etc. Unfortunately, for most of the documents on the web, the time-stamp meta-data is either erroneous or missing. Thus document dating is a challenging problem which requires inference over the temporal structure of the document alongside the contextual information of the document. Prior document dating systems have largely relied on handcrafted features while ignoring such document-internal structures. In this paper we propose NeuralDater, a Graph Convolutional Network (GCN) based document dating approach which jointly exploits syntactic and temporal graph structures of a document in a principled way. We also point out some limitations of NeuralDater and try to utilize both context and temporal information in documents in a more flexible and intuitive manner, proposing AD3: Attentive Deep Document Dater, an attention-based document dating system. To the best of our knowledge, these are the first applications of deep learning methods for the task. Through extensive experiments on real-world datasets, we find that our models outperform state-of-the-art baselines by a significant margin.
Submitted 8 June, 2021;
originally announced June 2021.
-
Improving RNN-T ASR Performance with Date-Time and Location Awareness
Authors:
Swayambhu Nath Ray,
Soumyajit Mitra,
Raghavendra Bilgi,
Sri Garimella
Abstract:
In this paper, we explore the benefits of incorporating context into a Recurrent Neural Network Transducer (RNN-T) based Automatic Speech Recognition (ASR) model to improve speech recognition for virtual assistants. Specifically, we use meta information extracted from the time at which the utterance is spoken and the approximate location information to make ASR context aware. We show that these contextual signals, when used individually, improve overall performance by as much as 3.48% relative to the baseline, and when the contexts are combined, the model learns complementary features and the recognition improves by 4.62%. On specific domains, these contextual signals show improvements as high as 11.5%, without any significant degradation on others. We ran experiments with models trained on data of sizes 30K hours and 10K hours. We show that the scale of improvement with the 10K-hour dataset is much higher than the one obtained with the 30K-hour dataset. Our results indicate that with limited data to train the ASR model, contextual signals can improve the performance significantly.
Submitted 16 June, 2021; v1 submitted 11 June, 2021;
originally announced June 2021.
-
Novel Deep Learning Architecture for Heart Disease Prediction using Convolutional Neural Network
Authors:
Shadab Hussain,
Santosh Kumar Nanda,
Susmith Barigidad,
Shadab Akhtar,
Md Suaib,
Niranjan K. Ray
Abstract:
Healthcare is one of the most important aspects of human life. Heart disease is known to be one of the deadliest diseases, hampering the lives of many people around the world. Heart disease must be detected early so that loss of life can be prevented. The availability of large-scale data for medical diagnosis has helped develop complex machine learning and deep learning-based models for automated early diagnosis of heart diseases. The classical approaches have been limited in that they do not generalize well to new data not seen in the training set. This is indicated by a large gap between training and test accuracies. This paper proposes a novel deep learning architecture using a 1D convolutional neural network for classification between healthy and non-healthy persons to overcome the limitations of classical approaches. Various clinical parameters are used for assessing the risk profile of patients, which helps in early diagnosis. Various techniques are used to avoid overfitting in the proposed network. The proposed network achieves over 97% training accuracy and 96% test accuracy on the dataset. The accuracy of the model is compared in detail with other classification algorithms using various performance parameters, which proves the effectiveness of the proposed architecture.
Submitted 26 December, 2021; v1 submitted 22 May, 2021;
originally announced May 2021.
-
Unknown-box Approximation to Improve Optical Character Recognition Performance
Authors:
Ayantha Randika,
Nilanjan Ray,
Xiao Xiao,
Allegra Latimer
Abstract:
Optical character recognition (OCR) is a widely used pattern recognition application in numerous domains. There are several feature-rich, general-purpose OCR solutions available for consumers, which can provide moderate to excellent accuracy levels. However, accuracy can diminish with difficult and uncommon document domains. Preprocessing of document images can be used to minimize the effect of domain shift. In this paper, a novel approach is presented for creating a customized preprocessor for a given OCR engine. Unlike previous OCR-agnostic preprocessing techniques, the proposed approach approximates the gradient of a particular OCR engine to train a preprocessor module. Experiments with two datasets and two OCR engines show that the presented preprocessor is able to improve the accuracy of the OCR by up to 46% from the baseline by applying pixel-level manipulations to the document image. The implementation of the proposed method and the enhanced public datasets are available for download.
Submitted 17 May, 2021;
originally announced May 2021.
-
Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End
Authors:
Swayambhu Nath Ray,
Minhua Wu,
Anirudh Raju,
Pegah Ghahremani,
Raghavendra Bilgi,
Milind Rao,
Harish Arsikere,
Ariya Rastrow,
Andreas Stolcke,
Jasha Droppo
Abstract:
Comprehending the overall intent of an utterance helps a listener recognize the individual words spoken. Inspired by this fact, we perform a novel study of the impact of explicitly incorporating intent representations as additional information to improve a recurrent neural network-transducer (RNN-T) based automatic speech recognition (ASR) system. An audio-to-intent (A2I) model encodes the intent of the utterance in the form of embeddings or posteriors, and these are used as auxiliary inputs for RNN-T training and inference. Experimenting with a 50k-hour far-field English speech corpus, this study shows that when running the system in non-streaming mode, where intent representation is extracted from the entire utterance and then used to bias streaming RNN-T search from the start, it provides a 5.56% relative word error rate reduction (WERR). On the other hand, a streaming system using per-frame intent posteriors as extra inputs for the RNN-T ASR system yields a 3.33% relative WERR. A further detailed analysis of the streaming system indicates that our proposed method brings especially good gain on media-playing related intents (e.g. 9.12% relative WERR on PlayMusicIntent).
Submitted 16 June, 2021; v1 submitted 14 May, 2021;
originally announced May 2021.
-
Foldover-free maps in 50 lines of code
Authors:
Vladimir Garanzha,
Igor Kaporin,
Liudmila Kudryavtseva,
François Protais,
Nicolas Ray,
Dmitry Sokolov
Abstract:
Mapping a triangulated surface to 2D space (or a tetrahedral mesh to 3D space) is the most fundamental problem in geometry processing. In computational physics, untangling plays an important role in mesh generation: it takes a mesh as an input, and moves the vertices to get rid of foldovers. In fact, mesh untangling can be considered as a special case of mapping where the geometry of the object is to be defined in the map space and the geometric domain is not explicit, supposing that each element is regular. In this paper, we propose a mapping method inspired by the untangling problem and compare its performance to the state of the art. The main advantage of our method is that the untangling aims at producing locally injective maps, which is the major challenge of mapping. In practice, our method produces locally injective maps in very difficult settings, and with less distortion than the previous work, both in 2D and 3D. We demonstrate it on a large reference database as well as on more difficult stress tests. For better reproducibility, we publish the code in Python for a basic evaluation, and in C++ for more advanced applications.
Submitted 5 February, 2021;
originally announced February 2021.
-
PMLB v1.0: An open source dataset collection for benchmarking machine learning methods
Authors:
Joseph D. Romano,
Trang T. Le,
William La Cava,
John T. Gregg,
Daniel J. Goldberg,
Natasha L. Ray,
Praneel Chakraborty,
Daniel Himmelstein,
Weixuan Fu,
Jason H. Moore
Abstract:
Motivation: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows.
Results: This release of PMLB provides the largest collection of diverse, public benchmark datasets for evaluating new machine learning and data science methods aggregated in one location. v1.0 introduces a number of critical improvements developed following discussions with the open-source community.
Availability: PMLB is available at https://github.com/EpistasisLab/pmlb. Python and R interfaces for PMLB can be installed through the Python Package Index and Comprehensive R Archive Network, respectively.
Submitted 6 April, 2021; v1 submitted 30 November, 2020;
originally announced December 2020.
-
Stability and semi-stability of (2,2)-type surfaces
Authors:
A. J. Parameswaran,
Nabanita Ray
Abstract:
We describe the GIT compactification of the moduli of (2,2)-type effective divisors of $\mathbb{P}^1\times\mathbb{P}^2$ (i.e., surfaces of the linear system $\vert π_1^*\mathcal{O}_{\mathbb{P}^1}(2)\otimes π_2^*\mathcal{O}_{\mathbb{P}^2}(2)\vert$ ) which are generically Del Pezzo surfaces of degree two. In order to get the compactification, we characterize stable and semi-stable (2,2)-type surfaces, and also determine the equivalence classes of strictly semi-stable (2,2)-type surfaces. Moreover, we describe the boundary of the moduli of (2,2)-type surfaces.
Submitted 30 March, 2023; v1 submitted 14 September, 2020;
originally announced September 2020.
-
Locating Cephalometric X-Ray Landmarks with Foveated Pyramid Attention
Authors:
Logan Gilmour,
Nilanjan Ray
Abstract:
CNNs, initially inspired by human vision, differ in a key way: they sample uniformly, rather than with highest density in a focal point. For very large images, this makes training untenable, as the memory and computation required for activation maps scale quadratically with the side length of an image. We propose an image pyramid based approach that extracts narrow glimpses of the input image and iteratively refines them to accomplish regression tasks. To assist with high-accuracy regression, we introduce a novel intermediate representation we call 'spatialized features'. Our approach scales logarithmically with the side length, so it works with very large images. We apply our method to Cephalometric X-ray Landmark Detection and get state-of-the-art results.
Submitted 10 August, 2020;
originally announced August 2020.
-
Ordinary Differential Equation and Complex Matrix Exponential for Multi-resolution Image Registration
Authors:
Abhishek Nan,
Matthew Tennant,
Uriel Rubin,
Nilanjan Ray
Abstract:
Autograd-based software packages have recently renewed interest in image registration using homography and other geometric models by gradient descent and optimization, e.g., AirLab and DRMIME. In this work, we emphasize using the complex matrix exponential (CME) over the real matrix exponential to compute transformation matrices. CME is theoretically more suitable and practically provides faster convergence, as our experiments show. Further, we demonstrate that the use of an ordinary differential equation (ODE) as an optimizable dynamical system can adapt the transformation matrix more accurately to the multi-resolution Gaussian pyramid for image registration. Our experiments include four publicly available benchmark datasets, two of them 2D and the other two 3D. Experiments demonstrate that our proposed method yields significantly better registration compared to a number of off-the-shelf, popular, state-of-the-art image registration toolboxes.
Submitted 27 July, 2020;
originally announced July 2020.
-
To Filter Prune, or to Layer Prune, That Is The Question
Authors:
Sara Elkerdawy,
Mostafa Elhoushi,
Abhineet Singh,
Hong Zhang,
Nilanjan Ray
Abstract:
Recent advances in pruning of neural networks have made it possible to remove a large number of filters or weights without any perceptible drop in accuracy. The number of parameters and that of FLOPs are usually the reported metrics to measure the quality of the pruned models. However, the gain in speed for these pruned models is often overlooked in the literature due to the complex nature of latency measurements. In this paper, we show the limitation of filter pruning methods in terms of latency reduction and propose the LayerPrune framework. LayerPrune presents a set of layer pruning methods based on different criteria that achieve higher latency reduction than filter pruning methods at similar accuracy. The advantage of layer pruning over filter pruning in terms of latency reduction is a result of the fact that the former is not constrained by the original model's depth and thus allows for a larger range of latency reduction. For each filter pruning method we examined, we use the same filter importance criterion to calculate a per-layer importance score in one shot. We then prune the least important layers and fine-tune the shallower model, which obtains comparable or better accuracy than its filter-based pruning counterpart. This one-shot process allows layers to be removed from single-path networks like VGG before fine-tuning, whereas in iterative filter pruning a minimum number of filters per layer is required to allow for data flow, which constrains the search space. To the best of our knowledge, we are the first to examine the effect of pruning methods on the latency metric instead of FLOPs for multiple networks, datasets and hardware targets. LayerPrune also outperforms handcrafted architectures such as Shufflenet, MobileNet, MNASNet and ResNet18 by 7.3%, 4.6%, 2.8% and 0.5% respectively on a similar latency budget on the ImageNet dataset.
Submitted 8 November, 2020; v1 submitted 10 July, 2020;
originally announced July 2020.
-
Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network
Authors:
Jakaria Rabbi,
Nilanjan Ray,
Matthias Schubert,
Subir Chowdhury,
Dennis Chao
Abstract:
The detection performance of small objects in remote sensing images is not satisfactory compared to large objects, especially in low-resolution and noisy images. A generative adversarial network (GAN)-based model called enhanced super-resolution GAN (ESRGAN) shows remarkable image enhancement performance, but reconstructed images miss high-frequency edge information. Therefore, object detection performance degrades for small objects on recovered noisy and low-resolution remote sensing images. Inspired by the success of edge enhanced GAN (EEGAN) and ESRGAN, we apply a new edge-enhanced super-resolution GAN (EESRGAN) to improve the image quality of remote sensing images and use different detector networks in an end-to-end manner where detector loss is backpropagated into the EESRGAN to improve the detection performance. We propose an architecture with three components: ESRGAN, Edge Enhancement Network (EEN), and Detection network. We use residual-in-residual dense blocks (RRDB) for both the ESRGAN and EEN, and for the detector network, we use the faster region-based convolutional network (FRCNN) (two-stage detector) and single-shot multi-box detector (SSD) (one stage detector). Extensive experiments on a public (car overhead with context) and a self-assembled (oil and gas storage tank) satellite dataset show superior performance of our method compared to the standalone state-of-the-art object detectors.
Submitted 28 April, 2020; v1 submitted 19 March, 2020;
originally announced March 2020.
-
DRMIME: Differentiable Mutual Information and Matrix Exponential for Multi-Resolution Image Registration
Authors:
Abhishek Nan,
Matthew Tennant,
Uriel Rubin,
Nilanjan Ray
Abstract:
In this work, we present a novel unsupervised image registration algorithm. It is differentiable end-to-end and can be used for both multi-modal and mono-modal registration. This is done using mutual information (MI) as a metric. The novelty here is that rather than using traditional ways of approximating MI, we use a neural estimator called MINE and supplement it with matrix exponential for transformation matrix computation. This leads to improved results as compared to the standard algorithms available out-of-the-box in state-of-the-art image registration toolboxes.
Submitted 27 January, 2020;
originally announced January 2020.
-
Animal Detection in Man-made Environments
Authors:
Abhineet Singh,
Marcin Pietrasik,
Gabriell Natha,
Nehla Ghouaiel,
Ken Brizel,
Nilanjan Ray
Abstract:
Automatic detection of animals that have strayed into human inhabited areas has important security and road safety applications. This paper attempts to solve this problem using deep learning techniques from a variety of computer vision fields including object detection, tracking, segmentation and edge detection. Several interesting insights into transfer learning are elicited while adapting models trained on benchmark datasets for real world deployment. Empirical evidence is presented to demonstrate the inability of detectors to generalize from training images of animals in their natural habitats to deployment scenarios of man-made environments. A solution is also proposed using semi-automated synthetic data generation for domain specific training. Code and data used in the experiments are made available to facilitate further work in this domain.
Submitted 14 January, 2020; v1 submitted 24 October, 2019;
originally announced October 2019.
-
Weyl and Zariski chambers on projective surfaces
Authors:
Krishna Hanumanthu,
Nabanita Ray
Abstract:
Let $X$ be a nonsingular complex projective surface. The Weyl and Zariski chambers give two interesting decompositions of the big cone of $X$. We study these two decompositions and determine when a Weyl chamber is contained in the interior of a Zariski chamber and vice versa. We also determine when a Weyl chamber can intersect non-trivially with a Zariski chamber.
Submitted 28 April, 2020; v1 submitted 11 October, 2019;
originally announced October 2019.