-
Unsupervised Multi-Clustering and Decision-Making Strategies for 4D-STEM Orientation Mapping
Authors:
Junhao Cao,
Nicolas Folastre,
Gozde Oney,
Edgar Rauch,
Stavros Nicolopoulos,
Partha Pratim Das,
Arnaud Demortière
Abstract:
This study presents a novel integration of unsupervised learning and decision-making strategies for the advanced analysis of 4D-STEM datasets, with a focus on non-negative matrix factorization (NMF) as the primary clustering method. Our approach introduces a systematic framework to determine the optimal number of components (k) required for robust and interpretable orientation mapping. By leveraging the K-Component Loss method and Image Quality Assessment (IQA) metrics, we effectively balance reconstruction fidelity and model complexity. Additionally, we highlight the critical role of dataset preprocessing in improving clustering stability and accuracy. Furthermore, our spatial weight matrix analysis provides insights into overlapping regions within the dataset by employing threshold-based visualization, facilitating a detailed understanding of cluster interactions. The results demonstrate the potential of combining NMF with advanced IQA metrics and preprocessing techniques for reliable orientation mapping and structural analysis in 4D-STEM datasets, paving the way for future applications in multi-dimensional material characterization.
Submitted 9 March, 2025;
originally announced March 2025.
-
Diverse Image Generation with Diffusion Models and Cross Class Label Learning for Polyp Classification
Authors:
Vanshali Sharma,
Debesh Jha,
M. K. Bhuyan,
Pradip K. Das,
Ulas Bagci
Abstract:
Pathologic diagnosis is a critical phase in deciding the optimal treatment procedure for dealing with colorectal cancer (CRC). Colonic polyps, precursors to CRC, can pathologically be classified into two major types: adenomatous and hyperplastic. For precise classification and early diagnosis of such polyps, the medical procedure of colonoscopy has been widely adopted paired with various imaging techniques, including narrow band imaging and white light imaging. However, the existing classification techniques mainly rely on a single imaging modality and show limited performance due to data scarcity. Recently, generative artificial intelligence has been gaining prominence in overcoming such issues. Additionally, various generation-controlling mechanisms using text prompts and images have been introduced to obtain visually appealing and desired outcomes. However, such mechanisms require class labels to make the model respond efficiently to the provided control input. In the colonoscopy domain, such controlling mechanisms are rarely explored; specifically, the text prompt is a completely uninvestigated area. Moreover, the unavailability of expensive class-wise labels for diverse sets of images limits such explorations. Therefore, we develop a novel model, PathoPolyp-Diff, that generates text-controlled synthetic images with diverse characteristics in terms of pathology, imaging modalities, and quality. We introduce cross-class label learning to make the model learn features from other classes, reducing the burdensome task of data annotation. The experimental results report an improvement of up to 7.91% in balanced accuracy using a publicly available dataset. Moreover, cross-class label learning achieves a statistically significant improvement of up to 18.33% in balanced accuracy during video-level analysis. The code is available at https://github.com/Vanshali/PathoPolyp-Diff.
Submitted 7 February, 2025;
originally announced February 2025.
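The abstract above reports its gains in balanced accuracy. As a reminder of what that metric measures, here is a minimal sketch using the standard definition (mean of per-class recall). The function name and the toy labels are illustrative assumptions, not taken from the paper or its repository.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall (standard balanced accuracy)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced toy set: a majority-class predictor gets
# 80% plain accuracy but only 50% balanced accuracy.
y_true = ["adenomatous"] * 8 + ["hyperplastic"] * 2
y_pred = ["adenomatous"] * 10
print(balanced_accuracy(y_true, y_pred))
```

Because each class contributes equally regardless of its size, the metric is a common choice when one polyp type is scarce in the data.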
-
Rationality of Learning Algorithms in Repeated Normal-Form Games
Authors:
Shivam Bajaj,
Pranoy Das,
Yevgeniy Vorobeychik,
Vijay Gupta
Abstract:
Many learning algorithms are known to converge to an equilibrium for specific classes of games if the same learning algorithm is adopted by all agents. However, when the agents are self-interested, a natural question is whether agents have a strong incentive to adopt an alternative learning algorithm that yields them greater individual utility. We capture such incentives as an algorithm's rationality ratio, which is the ratio of the highest payoff an agent can obtain by deviating from a learning algorithm to its payoff from following it. We define a learning algorithm to be $c$-rational if its rationality ratio is at most $c$ irrespective of the game. We first establish that popular learning algorithms such as fictitious play and regret matching are not $c$-rational for any constant $c\geq 1$. We then propose and analyze two algorithms that are provably $1$-rational under mild assumptions, and have the same properties as (a generalized version of) fictitious play and regret matching, respectively, if all agents follow them. Finally, we show that if an assumption of perfect monitoring is not satisfied, there are games for which $c$-rational algorithms do not exist, and illustrate our results with numerical case studies.
Submitted 13 February, 2024;
originally announced February 2024.
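The verbal definition of the rationality ratio in the abstract above can be written compactly. The notation below is an assumption made for illustration: $U_i$ is agent $i$'s payoff in game $G$, $\mathcal{A}$ is the prescribed learning algorithm, $\mathcal{A}'$ ranges over the alternatives available to agent $i$, and $\mathcal{A}_{-i}$ means all other agents follow $\mathcal{A}$. How payoffs are aggregated over repeated play, and which deviations are admissible, is specified in the paper, not here.

```latex
\[
  \rho_i(\mathcal{A}; G)
  = \frac{\sup_{\mathcal{A}'} U_i(\mathcal{A}', \mathcal{A}_{-i}; G)}
         {U_i(\mathcal{A}, \mathcal{A}_{-i}; G)},
  \qquad
  \mathcal{A} \text{ is } c\text{-rational}
  \iff
  \rho_i(\mathcal{A}; G) \le c
  \ \text{ for every game } G \text{ and every agent } i .
\]
```

Under this reading, and assuming positive payoffs and that following $\mathcal{A}$ is itself one of the alternatives, the ratio is at least $1$, so a $1$-rational algorithm is one from which no agent can gain by deviating.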
-
Towards a geometric understanding of Spatio Temporal Graph Convolution Networks
Authors:
Pratyusha Das,
Sarath Shekkizhar,
Antonio Ortega
Abstract:
Spatiotemporal graph convolutional networks (STGCNs) have emerged as a desirable model for skeleton-based human action recognition. Despite achieving state-of-the-art performance, there is a limited understanding of the representations learned by these models, which hinders their application in critical and real-world settings. While layerwise analysis of CNN models has been studied in the literature, to the best of our knowledge, there exists no study on the layerwise explainability of the embeddings learned on spatiotemporal data using STGCNs. In this paper, we first propose to use a local Dataset Graph (DS-Graph) obtained from the feature representation of input data at each layer to develop an understanding of the layer-wise embedding geometry of the STGCN. To do so, we develop a window-based dynamic time warping (DTW) method to compute the distance between data sequences with varying temporal lengths. To validate our findings, we have developed a layer-specific Spatiotemporal Graph Gradient-weighted Class Activation Mapping (L-STG-GradCAM) technique tailored for spatiotemporal data. This approach enables us to visually analyze and interpret each layer within the STGCN network. We characterize the functions learned by each layer of the STGCN using the label smoothness of the representation and visualize them using our L-STG-GradCAM approach. Our proposed method is generic and can yield valuable insights for STGCN architectures in different applications. However, this paper focuses on the human activity recognition task as a representative application. Our experiments show that STGCN models learn representations that capture general human motion in their initial layers while discriminating different actions only in later layers. This justifies experimental observations showing that fine-tuning deeper layers works well for transfer between related tasks.
Submitted 12 December, 2023;
originally announced December 2023.
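The abstract above relies on dynamic time warping (DTW) to compare sequences of different temporal lengths. The sketch below shows textbook DTW with an optional band constraint on the alignment; it is not the authors' window-based method, which may differ. The scalar sequences, the absolute-difference cost, and the function name are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b, window=None):
    """Classic DTW cost between two numeric sequences of possibly different
    lengths, optionally restricted to a band of half-width `window`."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # The band must be at least |n - m| wide or the end cell is unreachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The second sequence repeats one sample; DTW absorbs the repeat at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3], window=1))
```

For skeleton data, each sequence element would be a feature vector rather than a scalar, and the local cost would be a vector distance.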
-
Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges
Authors:
Debesh Jha,
Vanshali Sharma,
Debapriya Banik,
Debayan Bhattacharya,
Kaushiki Roy,
Steven A. Hicks,
Nikhil Kumar Tomar,
Vajira Thambawita,
Adrian Krenzer,
Ge-Peng Ji,
Sahadev Poudel,
George Batchkala,
Saruar Alam,
Awadelrahman M. A. Ahmed,
Quoc-Huy Trinh,
Zeshan Khan,
Tien-Phat Nguyen,
Shruti Shrestha,
Sabari Nathan,
Jeonghwan Gwak,
Ritika K. Jha,
Zheyuan Zhang,
Alexander Schlaefer,
Debotosh Bhattacharjee,
M. K. Bhuyan
, et al. (8 additional authors not shown)
Abstract:
Automatic analysis of colonoscopy images has been an active field of research motivated by the importance of early detection of precancerous polyps. However, detecting polyps during the live examination can be challenging due to various factors such as variation of skills and experience among the endoscopists, lack of attentiveness, and fatigue leading to a high polyp miss-rate. Deep learning has emerged as a promising solution to this challenge as it can assist endoscopists in detecting and classifying overlooked polyps and abnormalities in real time. In addition to the algorithm's accuracy, transparency and interpretability are crucial to explaining the whys and hows of the algorithm's prediction. Further, most algorithms are developed on private data or as closed-source or proprietary software, and the methods lack reproducibility. Therefore, to promote the development of efficient and transparent methods, we have organized the "Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image Segmentation (MedAI 2021)" competitions. We present a comprehensive summary and analyze each contribution, highlight the strengths of the best-performing methods, and discuss the possibility of translating such methods into the clinic. For the transparency task, a multi-disciplinary team, including expert gastroenterologists, assessed each submission and evaluated the team based on open-source practices, failure case analysis, ablation studies, and the usability and understandability of evaluations to gain a deeper understanding of the models' credibility for clinical deployment. Through the comprehensive analysis of the challenge, we not only highlight the advancements in polyp and surgical instrument segmentation but also encourage qualitative evaluation for building more transparent and understandable AI-based colonoscopy systems.
Submitted 6 May, 2024; v1 submitted 30 July, 2023;
originally announced July 2023.
-
GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection
Authors:
Debesh Jha,
Vanshali Sharma,
Neethi Dasu,
Nikhil Kumar Tomar,
Steven Hicks,
M. K. Bhuyan,
Pradip K. Das,
Michael A. Riegler,
Pål Halvorsen,
Ulas Bagci,
Thomas de Lange
Abstract:
Integrating real-time artificial intelligence (AI) systems in clinical practice faces challenges such as scalability and acceptance. These challenges include data availability, biased outcomes, data quality, lack of transparency, and underperformance on unseen datasets from different distributions. The scarcity of large-scale, precisely labeled, and diverse datasets is the major challenge for clinical integration. This scarcity is also due to the legal restrictions and extensive manual efforts required for accurate annotations from clinicians. To address these challenges, we present GastroVision, a multi-center open-access gastrointestinal (GI) endoscopy dataset that includes different anatomical landmarks, pathological abnormalities, polyp removal cases and normal findings (a total of 27 classes) from the GI tract. The dataset comprises 8,000 images acquired from Bærum Hospital in Norway and Karolinska University Hospital in Sweden and was annotated and verified by experienced GI endoscopists. Furthermore, we validate the significance of our dataset with extensive benchmarking based on popular deep-learning-based baseline models. We believe our dataset can facilitate the development of AI-based algorithms for GI disease detection and classification. Our dataset is available at https://osf.io/84e7f/.
Submitted 17 August, 2023; v1 submitted 16 July, 2023;
originally announced July 2023.
-
Uniqueness of Iris Pattern Based on AR Model
Authors:
Katelyn M. Hampel,
Jinyu Zuo,
Priyanka Das,
Natalia A. Schmid,
Stephanie Schuckers,
Joseph Skufca,
Matthew C. Valenti
Abstract:
The assessment of iris uniqueness plays a crucial role in analyzing the capabilities and limitations of iris recognition systems. Among the various methodologies proposed, Daugman's approach to iris uniqueness stands out as one of the most widely accepted. According to Daugman, uniqueness refers to the iris recognition system's ability to enroll an increasing number of classes while maintaining a near-zero probability of collision between new and enrolled classes. Daugman's approach involves creating distinct IrisCode templates for each iris class within the system and evaluating the sustainable population under a fixed Hamming distance between codewords. In our previous work [23], we utilized Rate-Distortion Theory (as it pertains to the limits of error-correction codes) to establish boundaries for the maximum possible population of iris classes supported by Daugman's IrisCode, given the constraint of a fixed Hamming distance between codewords. Building upon that research, we propose a novel methodology to evaluate the scalability of an iris recognition system, while also measuring iris quality. We achieve this by employing a sphere-packing bound for Gaussian codewords and adopting an approach similar to Daugman's, which utilizes relative entropy as a distance measure between iris classes. To demonstrate the efficacy of our methodology, we illustrate its application on two small datasets of iris images. We determine the sustainable maximum population for each dataset based on the quality of the images. By providing these illustrations, we aim to assist researchers in comprehending the limitations inherent in their recognition systems, depending on the quality of their iris databases.
Submitted 21 June, 2023;
originally announced June 2023.
-
Can Adversarial Networks Make Uninformative Colonoscopy Video Frames Clinically Informative?
Authors:
Vanshali Sharma,
M. K. Bhuyan,
Pradip K. Das
Abstract:
Various artifacts, such as ghost colors, interlacing, and motion blur, hinder diagnosing colorectal cancer (CRC) from videos acquired during colonoscopy. The frames containing these artifacts are called uninformative frames and are present in large proportions in colonoscopy videos. To alleviate the impact of artifacts, we propose an adversarial network based framework to convert uninformative frames to clinically relevant frames. We examine the effectiveness of the proposed approach by evaluating the translated frames for polyp detection using YOLOv5. Preliminary results present improved detection performance along with elegant qualitative outcomes. We also examine the failure cases to determine the directions for future work.
Submitted 4 April, 2023;
originally announced April 2023.
-
Empirical Assessment of End-to-End Iris Recognition System Capacity
Authors:
Priyanka Das,
Richard Plesh,
Veeru Talreja,
Natalia Schmid,
Matthew Valenti,
Joseph Skufca,
Stephanie Schuckers
Abstract:
Iris is an established modality in biometric recognition applications including consumer electronics, e-commerce, border security, forensics, and de-duplication of identity at a national scale. In light of the expanding usage of biometric recognition, identity clash (when templates from two different people match) is an imperative factor of consideration for a system's deployment. This study explores system capacity estimation by empirically estimating the constrained capacity of an end-to-end iris recognition system (NIR systems with Daugman-based feature extraction) operating at an acceptable error rate i.e. the number of subjects a system can resolve before encountering an error. We study the impact of six system parameters on an iris recognition system's constrained capacity -- number of enrolled identities, image quality, template dimension, random feature elimination, filter resolution, and system operating point. In our assessment, we analyzed 13.2 million comparisons from 5158 unique identities for each of 24 different system configurations. This work provides a framework to better understand iris recognition system capacity as a function of biometric system configurations beyond the operating point, for large-scale applications.
Submitted 20 March, 2023;
originally announced March 2023.
-
Longitudinal Performance of Iris Recognition in Children: Time Intervals up to Six years
Authors:
Priyanka Das,
Naveen G Venkataswamy,
Laura Holsopple,
Masudul H Imtiaz,
Michael Schuckers,
Stephanie Schuckers
Abstract:
The temporal stability of iris recognition performance is core to its success as a biometric modality. With the expanding horizon of applications for children, gaps in the knowledge base on the temporal stability of iris recognition performance in children have impacted decision-making during applications at the global scale. This report presents the most extensive analysis of longitudinal iris recognition performance in children with data from the same 230 children over 6.5 years between enrollment and query for ages 4 to 17 years. Assessment of match scores, statistical modelling of the variability factors impacting match scores, and in-depth assessment of the root causes of false rejections indicate no impact of aging on iris recognition performance.
Submitted 9 March, 2023;
originally announced March 2023.
-
Adaptive and Scalable Compression of Multispectral Images using VVC
Authors:
Philipp Seltsam,
Priyanka Das,
Mathias Wien
Abstract:
The VVC codec is applied to the task of multispectral image (MSI) compression using adaptive and scalable coding structures. In a 'plain' VVC approach, concepts from picture-to-picture temporal prediction are employed for decorrelation along the MSI's spectral dimension. The popular principal component analysis (PCA) for spectral decorrelation is further evaluated in combination with VVC intra-coding for spatial decorrelation. This approach is referred to as PCA-VVC. A novel adaptive MSI compression algorithm, named HPCLS, is introduced that uses PCA and inter-prediction for spectral decorrelation and VVC intra-coding for spatial decorrelation. Further, a novel adaptive scalable approach is proposed that provides a separately decodable, spectrally scaled preview of the MSI in the compressed file. Information contained in the preview is exploited in order to reduce the overall file size. All schemes are evaluated on images from the ARAD HS data set containing outdoor scenes with a high variety in brightness and color. We found that 'plain' VVC is outperformed by both PCA-VVC and HPCLS. HPCLS shows advantageous rate-distortion (RD) behavior compared to PCA-VVC for reconstruction quality above 51 dB PSNR. The performance of the scalable approach is compared to the combination of an independent RGB preview and one of HPCLS or PCA-VVC. The scalable approach shows significant benefit, especially at higher preview qualities.
Submitted 10 January, 2023;
originally announced January 2023.
-
Collective Intelligent Strategy for Improved Segmentation of COVID-19 from CT
Authors:
Surochita Pal Das,
Sushmita Mitra,
B. Uma Shankar
Abstract:
The devastation caused by the coronavirus pandemic makes it imperative to design automated techniques for a fast and accurate detection. We propose a novel non-invasive tool, using deep learning and imaging, for delineating COVID-19 infection in lungs. The Ensembling Attention-based Multi-scaled Convolution network (EAMC), employing Leave-One-Patient-Out (LOPO) training, exhibits high sensitivity and precision in outlining infected regions along with assessment of severity. The Attention module combines contextual with local information, at multiple scales, for accurate segmentation. Ensemble learning integrates heterogeneity of decision through different base classifiers. The superiority of EAMC, even with severe class imbalance, is established through comparison with existing state-of-the-art learning models over four publicly-available COVID-19 datasets. The results are suggestive of the relevance of deep learning in providing assistive intelligence to medical practitioners, when they are overburdened with patients as in pandemics. Its clinical significance lies in its unprecedented scope in providing low-cost decision-making for patients lacking specialized healthcare at remote locations.
Submitted 23 December, 2022;
originally announced December 2022.
-
Machine Learning for Smart and Energy-Efficient Buildings
Authors:
Hari Prasanna Das,
Yu-Wen Lin,
Utkarsha Agwan,
Lucas Spangher,
Alex Devonport,
Yu Yang,
Jan Drgona,
Adrian Chong,
Stefano Schiavon,
Costas J. Spanos
Abstract:
Energy consumption in buildings, both residential and commercial, accounts for approximately 40% of all energy usage in the U.S., and similar numbers are being reported from countries around the world. This significant amount of energy is used to maintain a comfortable, secure, and productive environment for the occupants. It is therefore crucial that energy consumption in buildings be optimized while maintaining satisfactory levels of occupant comfort, health, and safety. Recently, machine learning has proven to be an invaluable tool in deriving important insights from data and optimizing various systems. In this work, we review the ways in which machine learning has been leveraged to make buildings smart and energy-efficient. For the convenience of readers, we provide a brief introduction to several machine learning paradigms and to the components and functioning of each smart building system we cover. Finally, we discuss challenges faced while implementing machine learning algorithms in smart buildings and provide future avenues for research at the intersection of smart buildings and machine learning.
Submitted 27 November, 2022;
originally announced November 2022.
-
Application of Top-hat Transformation for Enhanced Blood Vessel Extraction
Authors:
Tithi Parna Das,
Sheetal Praharaj,
Sarita Swain,
Sumanshu Agarwal,
Kundan Kumar
Abstract:
In the medical domain, different computer-aided diagnosis systems have been proposed to extract blood vessels from retinal fundus images for the clinical treatment of vascular diseases. Accurate extraction of blood vessels from the fundus images using a computer-generated method can help the clinician to produce timely and accurate reports for patients suffering from these diseases. In this article, we integrate a top-hat based preprocessing approach with a fine-tuned B-COSFIRE filter to achieve more accurate segregation of blood vessel pixels from the background. The use of top-hat transformation in the preprocessing stage enhances the efficacy of the algorithm in extracting blood vessels in the presence of structures such as the fovea, exudates, and haemorrhages. Furthermore, to reduce false positives, small clusters of blood vessel pixels are removed in the postprocessing stage. We find that the proposed algorithm is more efficient than various modern algorithms reported in the literature.
Submitted 18 March, 2022;
originally announced March 2022.
-
Conditional Synthetic Data Generation for Personal Thermal Comfort Models
Authors:
Hari Prasanna Das,
Costas J. Spanos
Abstract:
Personal thermal comfort models aim to predict an individual's thermal comfort response, instead of the average response of a large group. Recently, machine learning algorithms have shown enormous potential as candidates for personal thermal comfort models. However, within the normal settings of a building, personal thermal comfort data obtained via experiments are often heavily class-imbalanced. There is a disproportionately high number of data samples for the "Prefer No Change" class, as compared with the "Prefer Warmer" and "Prefer Cooler" classes. Machine learning algorithms trained on such class-imbalanced data perform sub-optimally when deployed in the real world. To develop robust machine learning-based applications using the above class-imbalanced data, as well as for privacy-preserving data sharing, we propose to implement a state-of-the-art conditional synthetic data generator to generate synthetic data corresponding to the low-frequency classes. Via experiments, we show that the synthetic data generated has a distribution that mimics the real data distribution. The proposed method can be extended for use by other smart building datasets/use-cases.
Submitted 20 November, 2022; v1 submitted 10 March, 2022;
originally announced March 2022.
-
Melody Extraction from Polyphonic Music by Deep Learning Approaches: A Review
Authors:
Gurunath Reddy M,
K. Sreenivasa Rao,
Partha Pratim Das
Abstract:
Melody extraction is a vital music information retrieval task among music researchers for its potential applications in education pedagogy and the music industry. Melody extraction is a notoriously challenging task due to the presence of background instruments. Also, the melodic source often exhibits characteristics similar to those of the other instruments. The interfering background accompaniment with the vocals makes extracting the melody from the mixture signal much more challenging. Until recently, classical signal processing-based melody extraction methods were quite popular among melody extraction researchers. The ability of deep learning models to model large-scale data and to learn features automatically by exploiting spatial and temporal dependencies inspired many researchers to adopt deep learning models for melody extraction. In this paper, an attempt has been made to review the up-to-date data-driven deep learning approaches for melody extraction from polyphonic music. The available deep models have been categorized based on the type of neural network used and the output representation they use for predicting melody. Further, the architectures of the 25 melody extraction models are briefly presented. The loss functions used to optimize the parameters of the melody extraction models are grouped into four broad categories, and the loss functions used by various models are briefly described. Also, the various input representations adopted by the melody extraction models and the parameter settings are described in depth. A section describing the explainability of the black-box melody extraction deep neural networks is included. The performance of 25 melody extraction methods is compared. The possible future directions to explore/improve the melody extraction methods are also presented in the paper.
Submitted 2 February, 2022;
originally announced February 2022.
-
Optimal Lockdown Strategy in a Pandemic: An Exploratory Analysis for Covid-19
Authors:
Gopal K. Basak,
Chandramauli Chakraborty,
Pranab Kumar Das
Abstract:
The paper addresses the question of lives versus livelihood in an SIRD model augmented with a macroeconomic structure. The constraints on the availability of health facilities, both infrastructure and health workers, determine the probability of receiving treatment, which is found to be higher for patients with severe infection than for patients with mild infection under the specific parametric configuration of the paper. Distinguishing between two types of direct intervention policy, hard lockdown and soft lockdown, the study derives alternative policy options available to the government. The study further indicates that the soft lockdown policy is optimal from a public policy perspective under the specific parametric configuration considered in this paper.
Submitted 6 September, 2021;
originally announced September 2021.
-
Iris Recognition Performance in Children: A Longitudinal Study
Authors:
Priyanka Das,
Laura Holsopple,
Dan Rissacher,
Michael Schuckers,
Stephanie Schuckers
Abstract:
There is uncertainty around the effect of aging of children on biometric characteristics impacting applications relying on biometric recognition, particularly as the time between enrollment and query increases. Though there have been studies of such effects for iris recognition in adults, there have been few studies evaluating the impact in children. This paper presents a longitudinal analysis of 209 subjects aged 4 to 11 years at enrollment, with six additional sessions over a period of 3 years. The influence of time, dilation and enrollment age on iris recognition has been analyzed and their statistical importance has been evaluated. A minor aging effect is noted which is statistically significant but practically insignificant, and is comparatively less important than other variability factors. Practical biometric applications of iris recognition in children are feasible for a time frame of at least 3 years between samples, for ages 4 to 11 years, even in the presence of aging, though we note practical difficulties in enrolling young children with cameras not designed for the purpose. To the best of our knowledge, the database used in this study is the only dataset of longitudinal iris images from children for this age group and time period that is available for research.
Submitted 15 January, 2021;
originally announced January 2021.
-
Online Photometric Calibration of Automatic Gain Thermal Infrared Cameras
Authors:
Manash Pratim Das,
Larry Matthies,
Shreyansh Daftry
Abstract:
Thermal infrared cameras are increasingly being used in various applications such as robot vision, industrial inspection and medical imaging, thanks to their improved resolution and portability. However, the performance of traditional computer vision techniques developed for electro-optical imagery does not directly translate to the thermal domain for two major reasons: these algorithms require photometric assumptions to hold, and methods for photometric calibration of RGB cameras cannot be applied to thermal-infrared cameras due to differences in data acquisition and sensor phenomenology. In this paper, we take a step in this direction, and introduce a novel algorithm for online photometric calibration of thermal-infrared cameras. Our proposed method does not require any specific driver/hardware support and hence can be applied to any commercial off-the-shelf thermal IR camera. We present this in the context of visual odometry and SLAM algorithms, and demonstrate the efficacy of our proposed system through extensive experiments on both standard benchmark datasets and real-world field tests with a thermal-infrared camera in natural outdoor environments.
Submitted 11 January, 2021; v1 submitted 7 December, 2020;
originally announced December 2020.
-
Knowledge Distillation for Singing Voice Detection
Authors:
Soumava Paul,
Gurunath Reddy M,
K Sreenivasa Rao,
Partha Pratim Das
Abstract:
Singing Voice Detection (SVD) has been an active area of research in music information retrieval (MIR). Currently, two deep neural network-based methods, one based on CNN and the other on RNN, exist in the literature that learn optimized features for the voice detection (VD) task and achieve state-of-the-art performance on common datasets. Both these models have a huge number of parameters (1.4M for CNN and 65.7K for RNN) and hence are not suitable for deployment on devices like smartphones or embedded sensors with limited capacity in terms of memory and computation power. The most popular method to address this issue is known as knowledge distillation in the deep learning literature (in addition to model compression), where a large pre-trained network known as the teacher is used to train a smaller student network. Despite the wide applications of SVD in music information retrieval, to the best of our knowledge, model compression for practical deployment has not yet been explored. In this paper, efforts have been made to investigate this issue using both conventional and ensemble knowledge distillation techniques.
Submitted 19 August, 2021; v1 submitted 9 November, 2020;
originally announced November 2020.
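The teacher-student training described in the abstract above can be illustrated with a generic distillation loss. This is a minimal sketch, not the authors' formulation: the abstract does not state the loss, so the temperature-softened cross-entropy blended with a hard-label term (a common textbook form) is an assumption, as are the names `softmax`, `distillation_loss`, `temperature`, and `alpha`.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target term (teacher) with a hard-label term.

    Assumed form (hypothetical, not taken from the paper):
    alpha * T^2 * CE(teacher_soft, student_soft) + (1 - alpha) * CE(label, student).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # Cross-entropy between softened teacher and student distributions.
    soft = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student_soft))
    # Ordinary cross-entropy against the ground-truth class index.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


if __name__ == "__main__":
    # Two-class toy case: index 0 = "voice", index 1 = "no voice" (illustrative only).
    print(distillation_loss([2.0, 0.5], [4.0, 0.0], label=0))
```

A student whose logits agree with the teacher and the label receives a lower loss than one that disagrees, which is the signal used to train the smaller network.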
-
Analysis of Dilation in Children and its Impact on Iris Recognition
Authors:
Priyanka Das,
Laura Holsopple,
Michael Schuckers,
Stephanie Schuckers
Abstract:
The dilation of the pupil and its variation between a mated pair of irides has been found to be an important factor in the performance of iris recognition systems. Studies on adult irides indicated a significant impact of dilation on iris recognition performance at different ages. However, the results for adults may not necessarily translate to children. This study analyzes dilation as a factor of age and over time in children, using data collected from the same 209 subjects in the age group of four to 11 years at enrollment, longitudinally over three years at six-month intervals. The performance of iris recognition is also analyzed in the presence of dilation variation.
Submitted 1 September, 2020;
originally announced September 2020.
-
Beat Detection and Automatic Annotation of the Music of Bharatanatyam Dance using Speech Recognition Techniques
Authors:
Tanwi Mallick,
Partha Pratim Das,
Arun Kumar Majumdar
Abstract:
Bharatanatyam, an Indian Classical Dance form, represents the rich cultural heritage of India. Analysis and recognition of such dance forms are critical for the preservation of cultural heritage. As in most dance forms, a Bharatanatyam dancer performs in synchronization with structured rhythmic music, called Sollukattu, which comprises instrumental beats and vocalized utterances (bols) to create a rhythmic music structure. Computer analysis of Bharatanatyam, therefore, requires a structural analysis of Sollukattus. In this paper, we use speech processing techniques to recognize bols. Exploiting the predefined structures of Sollukattus and the detected bols, we recognize the Sollukattu. We estimate the tempo period by two methods. Finally, we generate a complete annotation of the audio signal by beat marking. For this, we also use the information of beats detected from the onset envelope of a Sollukattu signal. For training and testing, we create a data set of Sollukattus and annotate it. We achieve 85% accuracy in bol recognition, 95% in Sollukattu recognition, 96% in tempo period estimation, and over 90% in beat marking. This is the maiden attempt to fully structurally analyze the music of an Indian Classical Dance form and to use speech processing techniques for beat marking.
Submitted 17 April, 2020;
originally announced April 2020.
-
Early Response Assessment in Lung Cancer Patients using Spatio-temporal CBCT Images
Authors:
Bijju Kranthi Veduruparthi,
Jayanta Mukherjee,
Partha Pratim Das,
Mandira Saha,
Sanjoy Chatterjee,
Raj Kumar Shrimali,
Soumendranath Ray,
Sriram Prasath
Abstract:
We report a model to predict a patient's radiological response to curative radiation therapy (RT) for non-small-cell lung cancer (NSCLC).
Cone-Beam Computed Tomography images acquired weekly during the six-week course of RT were contoured with the Gross Tumor Volume (GTV) by senior radiation oncologists for 53 patients (7 images per patient).
Deformable registration of the images yielded six deformation fields for each pair of consecutive images per patient.
The Jacobian of a field provides a measure of local expansion/contraction and is used in our model.
Delineations were compared post-registration to compute unchanged ($U$), newly grown ($G$), and reduced ($R$) regions within GTV.
The mean Jacobians of these regions, $μ_U$, $μ_G$ and $μ_R$, are statistically compared and a response assessment model is proposed.
A good response is hypothesized if $μ_R < 1.0$, $μ_R < μ_U$, and $μ_G < μ_U$.
For early prediction of post-treatment response, the first three weeks' images are used.
Our model predicted clinical response with a precision of $74\%$.
Using reduction in CT numbers (CTN) and percentage GTV reduction as features in logistic regression yielded an area-under-curve of 0.65 with p=0.005.
Combining the logistic regression model with the proposed hypothesis yielded an odds ratio of 20.0 (p=0.0).
Submitted 7 March, 2020;
originally announced March 2020.
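The three-part response hypothesis stated in the abstract above can be written as a simple predicate. This is a minimal sketch: the function name `is_good_response` and the sample values are hypothetical, and how the mean Jacobians are computed from the deformation fields is not covered here.

```python
def is_good_response(mu_r: float, mu_u: float, mu_g: float) -> bool:
    """Check the hypothesis quoted in the abstract.

    mu_r, mu_u, mu_g are the mean Jacobians over the reduced (R),
    unchanged (U), and newly grown (G) regions of the GTV.
    A good response is hypothesized when all three inequalities hold:
    mu_r < 1.0, mu_r < mu_u, and mu_g < mu_u.
    """
    return mu_r < 1.0 and mu_r < mu_u and mu_g < mu_u


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(is_good_response(mu_r=0.90, mu_u=1.00, mu_g=0.95))
    print(is_good_response(mu_r=1.10, mu_u=1.20, mu_g=1.00))
```

A mean Jacobian below 1.0 indicates local contraction, so the first condition asks that the reduced region is shrinking on average.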
-
Speaker Verification Using Simple Temporal Features and Pitch Synchronous Cepstral Coefficients
Authors:
Bhavana V. S,
Pradip K. Das
Abstract:
Speaker verification is the process by which a speaker's claim of identity is tested against the claimed speaker using his or her voice. Speaker verification is done by the use of some parameters (features) from the speaker's voice which can be used to differentiate among many speakers. The efficiency of a speaker verification system mainly depends on the feature set providing high inter-speaker variability and low intra-speaker variability. There are many methods used for speaker verification. Some systems use Mel Frequency Cepstral Coefficients (MFCCs) as features, while others use Hidden Markov Model (HMM) based speaker recognition, Support Vector Machines (SVMs), or GMMs. In this paper, simple intra-pitch temporal information in conjunction with pitch synchronous cepstral coefficients forms the feature set. The distinct feature of a speaker is determined from the steady-state part of five cardinal spoken English vowels. The performance was found to be average when these features were used independently, but very encouraging results were observed when both features were combined to form a decision for speaker verification. For a database of twenty speakers with 100 utterances per speaker, an accuracy of 91.04% has been observed. An analysis of the speakers whose recognition was incorrect is conducted and discussed.
Submitted 15 August, 2019;
originally announced August 2019.
-
Multitaper Spectral Analysis of Neuronal Spiking Activity Driven by Latent Stationary Processes
Authors:
Proloy Das,
Behtash Babadi
Abstract:
Investigating the spectral properties of the neural covariates that underlie spiking activity is an important problem in systems neuroscience, as it allows one to study the role of brain rhythms in cognitive functions. While the spectral estimation of continuous time-series is a well-established domain, computing the spectral representation of these neural covariates from spiking data poses various challenges due to the intrinsic non-linearities involved. In this paper, we address this problem by proposing a variant of the multitaper method specifically tailored for point process data. To this end, we construct auxiliary spiking statistics from which the eigen-spectra of the underlying latent process can be directly inferred using maximum likelihood estimation, and thereby the multitaper estimate can be efficiently computed. Comparison of our proposed technique to existing methods using simulated data reveals significant gains in terms of the bias-variance trade-off.
Submitted 20 June, 2019;
originally announced June 2019.
-
Sizing and Placement of Battery Energy Storage Systems and Wind Turbines by Minimizing Costs and System Losses
Authors:
Bahman Khaki,
Pritam Das
Abstract:
The probabilistic and intermittent output power of wind turbines (WTs) is one of their major shortcomings. Battery Energy Storage Systems (BESSs) are a suitable solution to mitigate this intermittency and are used to smooth the output power injected into the grid by such intermittent sources. This paper proposes a new optimization formulation using a genetic algorithm for the simultaneous sizing and placement of BESSs and WTs. The formulation finds the best location and size (capacity) of WTs and BESSs in the power system by minimizing the total system loss (active and reactive) and the costs of WTs and BESSs, which also improves demand bus voltage profiles. The result of the optimization problem is the best buses at which to locate WTs and BESSs and their size (installable active and reactive power). Case studies performed on the IEEE 33-bus system validate the suitability of the formulation for loss minimization and bus voltage profile improvement in the test system in the presence of WTs and BESSs.
Submitted 28 March, 2019;
originally announced March 2019.
-
Parameter estimation for optimal path planning in internal transportation
Authors:
Pragna Das,
Lluís Ribas-Xirgo
Abstract:
The costs incurred in a mobile robot (MR) change due to changes in physical and environmental factors. Usually, there are two approaches to considering these costs: either explicitly modelling these different factors to calculate the cost, or using heuristic costs. The first approach is lengthy and cumbersome and requires a new model for every new factor. Heuristic costs cannot account for the change in cost due to a change in state. This work proposes a new method to compute these costs without the need to explicitly model the factors. The identified cost is modelled in a bi-linear state-space form, where the change in costs arises from the change in these states. This eliminates the need to model all factors to derive the cost for every robot. In the context of transportation, the travel time is identified as the key parameter for understanding the costs of traversing paths to carry material. The necessity of identifying and estimating these travel times is demonstrated by using them in route planning. The paths are computed constantly, and the average total path costs of these paths are compared with those of paths obtained using heuristic costs. The results show that the average total path costs of paths obtained through on-line estimated travel times are 15% less than those of paths obtained using heuristic costs.
Submitted 31 July, 2018;
originally announced August 2018.
-
Optimal Control of Networks in the presence of Attackers and Defenders
Authors:
Ishan Kafle,
Sudarshan Bartaula,
Afroza Shirin,
Isaac Klickstein,
Pankaz Das,
Francesco Sorrentino
Abstract:
We consider the problem of a dynamical network whose dynamics is subject to external perturbations ('attacks') locally applied at a subset of the network nodes. We assume that the network has an ability to defend itself against attacks with appropriate countermeasures, which we model as actuators located at (another) subset of the network nodes. We derive the optimal defense strategy as an optimal control problem. We see that the network topology, as well as the distribution of attackers and defenders over the network, affects the optimal control solution and the minimum control energy. We study the optimal control defense strategy for several network topologies, including chain networks, star networks, ring networks, and scale-free networks.
Submitted 5 April, 2018;
originally announced April 2018.
-
Adaptive Cost Coefficient Identification for Planning Optimal Operation in Mobile Robot based Internal Transportation
Authors:
Pragna Das,
Lluis Ribas-Xirgo
Abstract:
Decisions in automated logistic systems can be improved based on knowledge of the real-time state of individual parts and of environmental factors. This knowledge can be obtained through the travel times of edges by individual robots, which represent the utility-based costs in the system. Our work focuses on identifying cost coefficients in an autonomous multi-robot system used for internal transportation. With suitable predictions of these travel times, the current status of the cost involved in traversing from one node to another can be known. Thus, a suitable state-space model is formulated and Kalman filtering is used to estimate these travel times for use as weights in cost-efficient route planning. Experiments show that paths obtained using online travel times as weights have total traversing costs reduced by 15% on average.
Submitted 12 May, 2018; v1 submitted 14 November, 2017;
originally announced November 2017.
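The idea of estimating an edge's travel time with Kalman filtering, as described in the abstract above, can be illustrated with a scalar filter. This is a minimal sketch under strong assumptions: the abstract does not give the state-space model, so a random-walk state (travel time drifts slowly) with a direct noisy measurement is assumed here, and the names `kalman_travel_time`, `q`, `r`, `x0`, and `p0` are hypothetical.

```python
def kalman_travel_time(measurements, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter over observed travel times of one edge.

    Assumed model (not taken from the paper):
        x_k = x_{k-1} + w_k,  w_k ~ N(0, q)   (travel time drifts slowly)
        z_k = x_k + v_k,      v_k ~ N(0, r)   (noisy observed traversal time)
    Returns the filtered estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is carried over, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + r)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    # Illustrative traversal times in seconds, not data from the study.
    observed = [10.4, 9.8, 10.1, 10.6, 9.9, 10.2]
    for step, est in enumerate(kalman_travel_time(observed, x0=observed[0]), 1):
        print(step, round(est, 3))
```

The filtered value for each edge could then serve as that edge's weight in a shortest-path planner, which is the role the abstract assigns to the estimated travel times.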