Search | arXiv e-print repository

arXiv:2505.19385 [pdf, ps, other]

Advancing Limited-Angle CT Reconstruction Through Diffusion-Based Sinogram Completion

Authors: Jiaqi Guo, Santiago Lopez-Tapia, Aggelos K. Katsaggelos

Abstract: Limited Angle Computed Tomography (LACT) often faces significant challenges due to missing angular information. Unlike previous methods that operate in the image domain, we propose a new method that focuses on sinogram inpainting. We leverage MR-SDEs, a variant of diffusion models that characterize the diffusion process with mean-reverting stochastic differential equations, to fill in missing angular data at the projection level. Furthermore, by combining distillation with constraining the output of the model using the pseudo-inverse of the inpainting matrix, the diffusion process is accelerated and reduced to a single step, enabling efficient and accurate sinogram completion. A subsequent post-processing module back-projects the inpainted sinogram into the image domain and further refines the reconstruction, effectively suppressing artifacts while preserving critical structural details. Quantitative experimental results demonstrate that the proposed method achieves state-of-the-art performance in both perceptual and fidelity quality, offering a promising solution for LACT reconstruction in scientific and clinical applications.
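The abstract constrains the model output with the pseudo-inverse of the inpainting matrix. A minimal sketch of the generic range/null-space idea behind such a constraint, assuming the inpainting matrix A simply selects the measured sinogram entries (so its pseudo-inverse is zero-filling), is shown below. The function names and toy data are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def selection_operator(mask):
    """Rows of the identity that pick out the measured (True) sinogram entries."""
    flat = mask.ravel()
    return np.eye(flat.size)[flat]

def range_null_complete(y, mask, x_pred):
    """x = A^+ y + (I - A^+ A) x_pred for a masking (inpainting) operator A.

    The first term reproduces the measured angles exactly; the second keeps
    the model prediction only where data are missing.
    """
    A = selection_operator(mask)
    A_pinv = np.linalg.pinv(A)              # equals A.T for a selection matrix
    n = A.shape[1]
    x = A_pinv @ y + (np.eye(n) - A_pinv @ A) @ x_pred.ravel()
    return x.reshape(mask.shape)

# Toy 4-angle x 3-detector sinogram with the last two angles missing.
rng = np.random.default_rng(0)
sino_true = rng.random((4, 3))
mask = np.zeros((4, 3), dtype=bool)
mask[:2] = True                             # only the first two angles are measured
y = sino_true[mask]                         # measured data, i.e. A @ vec(sino_true)
x_pred = rng.random((4, 3))                 # stand-in for a generative model's output
x = range_null_complete(y, mask, x_pred)
```

For a pure masking operator this reduces to `np.where(mask, measured, predicted)`; the explicit matrix form is kept only to show where the pseudo-inverse enters.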

Submitted 25 May, 2025; originally announced May 2025.

Comments: Accepted at the 2025 IEEE International Conference on Image Processing (Oral)

arXiv:2501.12524 [pdf, other]

doi 10.1109/ISBI60581.2025.10980776

Efficient Lung Ultrasound Severity Scoring Using Dedicated Feature Extractor

Authors: Jiaqi Guo, Yunan Wu, Evangelos Kaimakamis, Georgios Petmezas, Vasileios E. Papageorgiou, Nicos Maglaveras, Aggelos K. Katsaggelos

Abstract: With the advent of the COVID-19 pandemic, ultrasound imaging has emerged as a promising technique for COVID-19 detection, due to its non-invasive nature, affordability, and portability. In response, researchers have focused on developing AI-based scoring systems to provide real-time diagnostic support. However, the limited size and lack of proper annotation in publicly available ultrasound datasets pose significant challenges for training a robust AI model. This paper proposes MeDiVLAD, a novel pipeline to address the above issue for multi-level lung-ultrasound (LUS) severity scoring. In particular, we leverage self-knowledge distillation to pretrain a vision transformer (ViT) without labels and aggregate frame-level features via dual-level VLAD aggregation. We show that with minimal finetuning, MeDiVLAD outperforms conventional fully-supervised methods in both frame- and video-level scoring, while offering classification reasoning with exceptional quality. This superior performance enables key applications such as the automatic identification of critical lung pathology areas and provides a robust solution for broader medical video classification tasks.
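The abstract aggregates frame-level features with VLAD. As a rough sketch, assuming standard single-level VLAD with hard assignment, intra-normalization and a final L2 normalization (one common variant; the paper's dual-level scheme is not reproduced here), the encoding of a set of frame embeddings can be computed as follows. The function name is hypothetical.

```python
import numpy as np

def vlad(features, centroids):
    """Standard VLAD encoding of a set of local descriptors.

    features:  (N, D) array, e.g. per-frame embeddings from a ViT
    centroids: (K, D) array, e.g. from k-means on training features
    Returns a (K * D,) vector.
    """
    # Hard-assign each feature to its nearest centroid.
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    K, D = centroids.shape
    v = np.zeros((K, D))
    for k in range(K):
        members = features[assign == k]
        if len(members):
            v[k] = (members - centroids[k]).sum(axis=0)   # accumulate residuals
    # Intra-normalization (per cluster), then global L2 normalization.
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    v = np.divide(v, norms, out=np.zeros_like(v), where=norms > 0)
    flat = v.ravel()
    total = np.linalg.norm(flat)
    return flat / total if total > 0 else flat
```

Because the output length depends only on K and D, videos with different numbers of frames map to vectors of the same size.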

Submitted 25 May, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

Comments: Accepted by IEEE ISBI 2025 (Selected for oral presentation); 2025/4/15 (v2): Corrected a notation error in Figure 2

Journal ref: 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), Houston, TX, USA, 2025, pp. 1-5

arXiv:2501.01372 [pdf]

ScarNet: A Novel Foundation Model for Automated Myocardial Scar Quantification from LGE in Cardiac MRI

Authors: Neda Tavakoli, Amir Ali Rahsepar, Brandon C. Benefield, Daming Shen, Santiago López-Tapia, Florian Schiffers, Jeffrey J. Goldberger, Christine M. Albert, Edwin Wu, Aggelos K. Katsaggelos, Daniel C. Lee, Daniel Kim

Abstract: Background: Late Gadolinium Enhancement (LGE) imaging is the gold standard for assessing myocardial fibrosis and scarring, with left ventricular (LV) LGE extent predicting major adverse cardiac events (MACE). Despite its importance, routine LGE-based LV scar quantification is hindered by labor-intensive manual segmentation and inter-observer variability. Methods: We propose ScarNet, a hybrid model combining a transformer-based encoder from the Medical Segment Anything Model (MedSAM) with a convolution-based U-Net decoder, enhanced by tailored attention blocks. ScarNet was trained on 552 ischemic cardiomyopathy patients with expert segmentations of myocardial and scar boundaries and tested on 184 separate patients. Results: ScarNet achieved robust scar segmentation in 184 test patients, yielding a median Dice score of 0.912 (IQR: 0.863–0.944), significantly outperforming MedSAM (median Dice = 0.046, IQR: 0.043–0.047) and nnU-Net (median Dice = 0.638, IQR: 0.604–0.661). ScarNet demonstrated lower bias (-0.63%) and coefficient of variation (4.3%) compared to MedSAM (bias: -13.31%, CoV: 130.3%) and nnU-Net (bias: -2.46%, CoV: 20.3%). In Monte Carlo simulations with noise perturbations, ScarNet achieved significantly higher scar Dice (0.892 ± 0.053, CoV = 5.9%) than MedSAM (0.048 ± 0.112, CoV = 233.3%) and nnU-Net (0.615 ± 0.537, CoV = 28.7%). Conclusion: ScarNet outperformed MedSAM and nnU-Net in accurately segmenting myocardial and scar boundaries in LGE images. The model exhibited robust performance across diverse image qualities and scar patterns.
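The results above are reported as Dice scores and coefficients of variation. A short sketch of both metrics, assuming the common definitions (Dice = 2|P∩T| / (|P| + |T|) on binary masks, CoV = sample standard deviation / mean), is given below; the paper's exact conventions, for example for empty masks, may differ.

```python
import numpy as np

def dice(pred, target):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0                      # both empty: treated as perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()
```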

Submitted 2 January, 2025; originally announced January 2025.

Comments: 31 pages, 8 figures

arXiv:2411.11863 [pdf, ps, other]

Longitudinal Wrist PPG Analysis for Reliable Hypertension Risk Screening Using Deep Learning

Authors: Hui Lin, Jiyang Li, Ramy Hussein, Xin Sui, Xiaoyu Li, Guangpu Zhu, Aggelos K. Katsaggelos, Zijing Zeng, Yelei Li

Abstract: Hypertension is a leading risk factor for cardiovascular diseases. Traditional blood pressure monitoring methods are cumbersome and inadequate for continuous tracking, prompting the development of PPG-based cuffless blood pressure monitoring wearables. This study leverages deep learning models, including ResNet and Transformer, to analyze wrist PPG data collected with a smartwatch for efficient hypertension risk screening, eliminating the need for handcrafted PPG features. Using the Home Blood Pressure Monitoring (HBPM) longitudinal dataset of 448 subjects and five-fold cross-validation, our model was trained on over 68k spot-check instances from 358 subjects and tested on real-world continuous recordings of 90 subjects. The compact ResNet model with 0.124M parameters performed significantly better than traditional machine learning methods, demonstrating its effectiveness in distinguishing between healthy and abnormal cases in real-world scenarios.

Submitted 2 November, 2024; originally announced November 2024.

Comments: blood pressure, hypertension, cuffless, photoplethysmography, deep learning

arXiv:2410.03276 [pdf, other]

Sm: enhanced localization in Multiple Instance Learning for medical imaging classification

Authors: Francisco M. Castro-Macías, Pablo Morales-Álvarez, Yunan Wu, Rafael Molina, Aggelos K. Katsaggelos

Abstract: Multiple Instance Learning (MIL) is widely used in medical imaging classification to reduce the labeling effort. While only bag labels are available for training, one typically seeks predictions at both bag and instance levels (classification and localization tasks, respectively). Early MIL methods treated the instances in a bag independently. Recent methods account for global and local dependencies among instances. Although they have yielded excellent results in classification, their performance in terms of localization is comparatively limited. We argue that these models have been designed to target the classification task, while implications at the instance level have not been deeply investigated. Motivated by a simple observation -- that neighboring instances are likely to have the same label -- we propose a novel, principled, and flexible mechanism to model local dependencies. It can be used alone or combined with any mechanism to model global dependencies (e.g., transformers). A thorough empirical validation shows that our module leads to state-of-the-art performance in localization while being competitive or superior in classification. Our code is at https://github.com/Franblueee/SmMIL.
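The key observation above is that neighboring instances tend to share a label. A hypothetical operator in that spirit, not necessarily the Sm operator from the paper, iteratively pulls each instance score toward the average of its spatial neighbors using a row-normalized adjacency matrix:

```python
import numpy as np

def smooth_instance_scores(scores, adjacency, alpha=0.5, n_iter=10):
    """Pull each instance score toward the mean of its neighbours.

    scores:    (N,) instance-level scores from any MIL backbone
    adjacency: (N, N) symmetric 0/1 matrix, 1 where instances are spatial neighbours
    Iterates f <- alpha * A_norm @ f + (1 - alpha) * scores, where A_norm is the
    row-normalised adjacency; for 0 <= alpha < 1 the iteration converges to
    (1 - alpha) * (I - alpha * A_norm)^{-1} @ scores.
    """
    deg = adjacency.sum(axis=1, keepdims=True)
    a_norm = np.divide(adjacency, deg,
                       out=np.zeros_like(adjacency, dtype=float), where=deg > 0)
    f = scores.astype(float).copy()
    for _ in range(n_iter):
        f = alpha * a_norm @ f + (1 - alpha) * scores
    return f
```

Here `alpha` trades off fidelity to the original scores against agreement with neighbors; a spike on one isolated instance is damped, while a region of consistently high scores is preserved.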

Submitted 15 November, 2024; v1 submitted 4 October, 2024; originally announced October 2024.

Comments: 24 pages, 14 figures, 2024 Conference on Neural Information Processing Systems (NeurIPS 2024)

arXiv:2409.18340 [pdf, ps, other]

DRL-STNet: Unsupervised Domain Adaptation for Cross-modality Medical Image Segmentation via Disentangled Representation Learning

Authors: Hui Lin, Florian Schiffers, Santiago López-Tapia, Neda Tavakoli, Daniel Kim, Aggelos K. Katsaggelos

Abstract: Unsupervised domain adaptation (UDA) is essential for medical image segmentation, especially in cross-modality data scenarios. UDA aims to transfer knowledge from a labeled source domain to an unlabeled target domain, thereby reducing the dependency on extensive manual annotations. This paper presents DRL-STNet, a novel framework for cross-modality medical image segmentation that leverages generative adversarial networks (GANs), disentangled representation learning (DRL), and self-training (ST). Our method leverages DRL within a GAN to translate images from the source to the target modality. The segmentation model is first trained on these translated images and the corresponding source labels, and is then fine-tuned iteratively using a combination of synthetic and real images with pseudo-labels and real labels. The proposed framework exhibits superior performance in abdominal organ segmentation on the FLARE challenge dataset, surpassing state-of-the-art methods by 11.4% in the Dice similarity coefficient and by 13.1% in the Normalized Surface Dice metric, achieving scores of 74.21% and 80.69%, respectively. The average running time is 41 seconds, and the area under the GPU memory-time curve is 11,292 MB. These results indicate the potential of DRL-STNet for enhancing cross-modality medical image segmentation tasks.

Submitted 26 September, 2024; originally announced September 2024.

Comments: MICCAI 2024 Challenge, FLARE Challenge, Unsupervised domain adaptation, Organ segmentation, Feature disentanglement, Self-training

arXiv:2409.13930 [pdf, other]

RN-SDEs: Limited-Angle CT Reconstruction with Residual Null-Space Diffusion Stochastic Differential Equations

Authors: Jiaqi Guo, Santiago Lopez-Tapia, Wing Shun Li, Yunnan Wu, Marcelo Carignano, Vadim Backman, Vinayak P. Dravid, Aggelos K. Katsaggelos

Abstract: Computed tomography is a widely used imaging modality with applications ranging from medical imaging to material analysis. One major challenge arises from the lack of scanning information at certain angles, leading to distorted CT images with artifacts. This results in an ill-posed problem known as the Limited Angle Computed Tomography (LACT) reconstruction problem. To address this problem, we propose Residual Null-Space Diffusion Stochastic Differential Equations (RN-SDEs), which are a variant of diffusion models that characterize the diffusion process with mean-reverting (MR) stochastic differential equations. To demonstrate the generalizability of RN-SDEs, our experiments are conducted on two different LACT datasets, i.e., ChromSTEM and C4KC-KiTS. Through extensive experiments, we show that by leveraging learned Mean-Reverting SDEs as a prior and emphasizing data consistency using Range-Null Space Decomposition (RNSD) based rectification, RN-SDEs can restore high-quality images from severe degradation and achieve state-of-the-art performance in most LACT tasks. Additionally, we present a quantitative comparison of computational complexity and runtime efficiency, highlighting the superior effectiveness of our proposed approach.
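A mean-reverting SDE of the form dx = θ(μ − x) dt + σ dW drifts a state x toward a mean μ while injecting noise. In the usual image-restoration use of such SDEs (an assumption here), x starts at a clean image and μ is its degraded counterpart. The sketch below simulates only this forward process with constant θ and σ via Euler-Maruyama; real MR-SDE models use time-varying schedules and a learned reverse process, neither of which is shown.

```python
import numpy as np

def simulate_mean_reverting_sde(x0, mu, theta, sigma, t_end=1.0, n_steps=1000, rng=None):
    """Euler-Maruyama simulation of dx = theta * (mu - x) dt + sigma dW.

    x0, mu: arrays of the same shape (e.g. a clean image and its degraded version)
    theta:  mean-reversion speed; sigma: noise scale (both constant here)
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = t_end / n_steps
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape) * np.sqrt(dt)
        x = x + theta * (mu - x) * dt + sigma * noise
    return x
```

For large t the marginal approaches a Gaussian centred on μ with variance σ²/(2θ), which is why such a process can connect a clean image to a noisy version of its degraded observation.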

Submitted 20 September, 2024; originally announced September 2024.

arXiv:2409.06738 [pdf, other]

Characterization of Crystal Properties and Defects in CdZnTe Radiation Detectors

Authors: Manuel Ballester, Jaromir Kaspar, Francesc Massanes, Srutarshi Banerjee, Alexander Hans Vija, Aggelos K. Katsaggelos

Abstract: CdZnTe-based detectors are highly valued because of their high spectral resolution, which is an essential feature for nuclear medical imaging. However, this resolution is compromised when there are substantial defects in the CdZnTe crystals. In this study, we present a learning-based approach to determine the spatially dependent bulk properties and defects in semiconductor detectors. This characterization allows us to mitigate and compensate for the undesired effects caused by crystal impurities. We tested our model with computer-generated noise-free input data, where it showed excellent accuracy, achieving an average RMSE of 0.43% between the predicted and the ground truth crystal properties. In addition, a sensitivity analysis was performed to determine the effect of noisy data on the accuracy of the model.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.02323 [pdf, other]

Review and Novel Formulae for Transmittance and Reflectance of Wedged Thin Films on Absorbing Substrates

Authors: Manuel Ballester, Emilio Marquez, John Bass, Christoph Wuersch, Florian Willomitzer, Aggelos K. Katsaggelos

Abstract: Historically, spectroscopic techniques have been essential for studying the optical properties of thin solid films. However, existing formulae for both normal transmission and reflection spectroscopy often rely on simplified theoretical assumptions, which may not accurately align with real-world conditions. For instance, it is common to assume (1) that the thin solid layers are deposited on completely transparent thick substrates and (2) that the film surface forms a specular plane with a relatively small wedge angle. While recent studies have addressed these assumptions separately, this work presents an integrated framework that eliminates both assumptions simultaneously. In addition, the current work presents an in-depth review of various formulae from the literature, each with its own level of complexity. Our review analysis highlights a critical trade-off between computational complexity and expression accuracy, where the newly developed formulae offer enhanced accuracy at the expense of increased computational time. Our user-friendly code, which includes several classical transmittance and reflectance formulae from the literature and our newly proposed expressions, is publicly available in both Python and Matlab at this link.
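As a baseline for the formulae discussed above, the textbook characteristic-matrix method gives normal-incidence reflectance and transmittance for the simplest case. The sketch assumes a single uniform (non-wedged) film, coherent interference, a semi-infinite non-absorbing substrate with no back-surface reflection, and the sign convention N = n − ik. It is not one of the paper's new expressions, which remove exactly these simplifications; the function name is hypothetical.

```python
import numpy as np

def film_on_substrate_RT(n_film, d, wavelength, n_sub, n0=1.0):
    """Normal-incidence R and T of one uniform film on a semi-infinite substrate.

    Characteristic-matrix method with the convention N = n - i k.
    n_film: (complex) film index; d and wavelength in the same units;
    n_sub:  substrate index (taken as real here, i.e. no substrate absorption).
    """
    delta = 2 * np.pi * n_film * d / wavelength          # complex phase thickness
    m = np.array([[np.cos(delta), 1j * np.sin(delta) / n_film],
                  [1j * n_film * np.sin(delta), np.cos(delta)]])
    B, C = m @ np.array([1.0, n_sub])
    r = (n0 * B - C) / (n0 * B + C)                      # amplitude reflection coefficient
    R = abs(r) ** 2
    T = 4 * n0 * np.real(n_sub) / abs(n0 * B + C) ** 2
    return R, T
```

For a lossless film R + T = 1, and a quarter-wave layer (d = λ/4n) reproduces the classical result R = ((n0·ns − n²)/(n0·ns + n²))².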

Submitted 3 September, 2024; originally announced September 2024.

arXiv:2409.00777 [pdf, other]

VDPI: Video Deblurring with Pseudo-inverse Modeling

Authors: Zhihao Huang, Santiago Lopez-Tapia, Aggelos K. Katsaggelos

Abstract: Video deblurring is a challenging task that aims to recover sharp sequences from blurry and noisy observations. The image-formation model plays a crucial role in traditional model-based methods, constraining the possible solutions. However, this is only the case for some deep learning-based methods. Despite deep-learning models achieving better results, traditional model-based methods remain widely popular due to their flexibility. An increasing number of scholars combine the two to achieve better deblurring performance. This paper proposes introducing knowledge of the image-formation model into a deep learning network by using the pseudo-inverse of the blur. We use a deep network to fit the blurring and estimate its pseudo-inverse. Then, we use this estimation, combined with a variational deep-learning network, to deblur the video sequence. Notably, our experimental results demonstrate that such modifications can significantly improve the performance of deep learning models for video deblurring. Furthermore, our experiments on different datasets achieved notable performance improvements, proving that our proposed method can generalize to different scenarios and cameras.
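VDPI estimates the pseudo-inverse of the blur with a deep network. For intuition, the classical counterpart with a known kernel is a Tikhonov-regularized pseudo-inverse applied in the Fourier domain. The sketch below assumes 1-D circular convolution for brevity; the function name and `eps` parameter are hypothetical, not part of the paper.

```python
import numpy as np

def regularized_pseudo_inverse_deblur(y, kernel, eps=1e-3):
    """Deblur a 1-D signal with a Tikhonov-regularised pseudo-inverse.

    Assumes circular convolution y = k * x (+ noise). In the Fourier domain
    the estimate is X = conj(K) Y / (|K|^2 + eps); eps -> 0 gives the plain
    pseudo-inverse on frequencies where K != 0.
    """
    K = np.fft.fft(kernel, n=len(y))    # zero-padded kernel spectrum
    Y = np.fft.fft(y)
    X = np.conj(K) * Y / (np.abs(K) ** 2 + eps)
    return np.real(np.fft.ifft(X))
```

Larger `eps` suppresses noise amplification at frequencies where the blur is weak, at the cost of leaving residual blur.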

Submitted 1 September, 2024; originally announced September 2024.

arXiv:2407.06535 [pdf, other]

An Angular Spectrum Approach to Inverse Synthesis for the Characterization of Optical and Geometrical Properties of Semiconductor Thin Films

Authors: John M. Bass, Manuel Ballester, Susana M. Fernández, Aggelos K. Katsaggelos, Emilio Márquez, Florian Willomitzer

Abstract: To design semiconductor-based optical devices, the optical properties of the used semiconductor materials must be precisely measured over a large band. Transmission spectroscopy stands out as an inexpensive and widely available method for this measurement but requires model assumptions and reconstruction algorithms to convert the measured transmittance spectra into optical properties of the thin films. Amongst the different reconstruction techniques, inverse synthesis methods generally provide high precision but rely on rigid analytical models of a thin film system. In this paper, we demonstrate a novel flexible inverse synthesis method that uses angular spectrum wave propagation and does not rely on rigid model assumptions. Amongst other evaluated parameters, our algorithm is capable of evaluating the geometrical properties of thin film surfaces, which reduces the variance caused by inverse synthesis optimization routines and significantly improves measurement precision. The proposed method could potentially allow for the characterization of "uncommon" thin film samples that do not fit the current model assumptions, as well as the characterization of samples with higher complexity, e.g., multi-layer systems.
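The method above is built on angular spectrum wave propagation. A minimal sketch of the standard scalar angular spectrum propagator in a homogeneous medium, in 1-D for brevity, is shown below; how the paper couples it to a thin-film model and an inverse-synthesis loop is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def angular_spectrum_propagate(u0, dx, wavelength, z, n_medium=1.0):
    """Propagate a sampled 1-D complex field a distance z (angular spectrum method).

    u0: complex field samples with spacing dx; dx, wavelength and z share units.
    Evanescent components (|kx| > k) are attenuated rather than propagated.
    """
    k = 2 * np.pi * n_medium / wavelength
    kx = 2 * np.pi * np.fft.fftfreq(len(u0), d=dx)
    kz = np.sqrt((k ** 2 - kx ** 2).astype(complex))   # imaginary for evanescent waves
    return np.fft.ifft(np.fft.fft(u0) * np.exp(1j * kz * z))
```

Each plane-wave component only acquires a phase exp(i·kz·z), so when all components are propagating the field energy is conserved.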

Submitted 9 July, 2024; originally announced July 2024.

Comments: 10 pages, 7 figures, 5 tables

arXiv:2405.13168 [pdf, other]

Modeling and Simulation of Charge-Induced Signals in Photon-Counting CZT Detectors for Medical Imaging Applications

Authors: Manuel Ballester, Jaromir Kaspar, Francesc Massanes, Srutarshi Banerjee, Alexander Hans Vija, Aggelos K. Katsaggelos

Abstract: Photon-counting detectors based on CZT are essential in nuclear medical imaging, particularly for SPECT applications. Although CZT detectors are known for their precise energy resolution, defects within the CZT crystals significantly impact their performance. These defects result in inhomogeneous material properties throughout the bulk of the detector. The present work introduces an efficient computational model that simulates the operation of semiconductor detectors, accounting for the spatial variability of the crystal properties. Our simulator reproduces the charge-induced pulse signals generated after the X/gamma-rays interact with the detector. The performance evaluation of the model shows an RMSE in the signal below 0.70%. Our simulator can function as a digital twin to accurately replicate the operation of actual detectors. Thus, it can be used to mitigate and compensate for adverse effects arising from crystal impurities.
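A classic, much simpler reference point for how carrier trapping degrades the collected charge in CZT is the Hecht relation for a planar detector with a uniform field. The sketch below assumes uniform material properties (a single μτE drift length per carrier type), which is precisely the simplification the simulator above moves beyond. Function and parameter names are illustrative.

```python
import math

def hecht_cce(x0, thickness, lambda_e, lambda_h):
    """Charge collection efficiency of a planar detector (Hecht relation).

    x0:        interaction depth measured from the cathode
    thickness: detector thickness L
    lambda_e, lambda_h: electron / hole drift lengths mu * tau * E (same units as L)
    Electrons drift L - x0 to the anode, holes drift x0 to the cathode.
    """
    L = thickness
    # -expm1(-a) = 1 - exp(-a), computed accurately for small a
    electrons = (lambda_e / L) * -math.expm1(-(L - x0) / lambda_e)
    holes = (lambda_h / L) * -math.expm1(-x0 / lambda_h)
    return electrons + holes
```

Because holes in CZT typically have a much shorter drift length than electrons, the efficiency depends strongly on interaction depth, which is one source of the spectral tailing that defect-aware modeling tries to compensate.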

Submitted 24 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.00857 [pdf, other]

doi 10.1109/ISBI56570.2024.10635883

Brighteye: Glaucoma Screening with Color Fundus Photographs based on Vision Transformer

Authors: Hui Lin, Charilaos Apostolidis, Aggelos K. Katsaggelos

Abstract: Differences in image quality, lighting conditions, and patient demographics pose challenges to automated glaucoma detection from color fundus photography. Brighteye, a method based on Vision Transformer, is proposed for glaucoma detection and glaucomatous feature classification. Brighteye learns long-range relationships among pixels within large fundus images using a self-attention mechanism. Prior to being input into Brighteye, the optic disc is localized using YOLOv8, and the region of interest (ROI) around the disc center is cropped to ensure alignment with clinical practice. Optic disc detection improves the sensitivity at 95% specificity from 79.20% to 85.70% for glaucoma detection and the Hamming distance from 0.2470 to 0.1250 for glaucomatous feature classification. In the developmental stage of the Justified Referral in AI Glaucoma Screening (JustRAIGS) challenge, the overall outcome secured the fifth position out of 226 entries.
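The two reported metrics can be sketched under common definitions: sensitivity at the threshold where specificity reaches 95%, and the fraction of mismatched labels for the multi-label feature task. The JustRAIGS challenge defines its own evaluation protocol (for example, threshold interpolation or handling of ungradable features), which may differ from this sketch; the function names are hypothetical.

```python
import numpy as np

def sensitivity_at_specificity(scores_neg, scores_pos, specificity=0.95):
    """Sensitivity at the threshold that attains the requested specificity.

    The threshold is the `specificity` quantile of the negative-class scores;
    positives scoring strictly above it count as detected.
    """
    threshold = np.quantile(scores_neg, specificity)
    return np.mean(np.asarray(scores_pos) > threshold)

def hamming_distance(y_true, y_pred):
    """Fraction of mismatched labels in a multi-label prediction."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    return np.mean(y_true != y_pred)
```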

Submitted 1 May, 2024; originally announced May 2024.

Comments: ISBI 2024, JustRAIGS challenge, glaucoma detection

arXiv:2404.15552 [pdf, other]

Cross-Temporal Spectrogram Autoencoder (CTSAE): Unsupervised Dimensionality Reduction for Clustering Gravitational Wave Glitches

Authors: Yi Li, Yunan Wu, Aggelos K. Katsaggelos

Abstract: The advancement of the Laser Interferometer Gravitational-Wave Observatory (LIGO) has significantly enhanced the feasibility and reliability of gravitational wave detection. However, LIGO's high sensitivity makes it susceptible to transient noises known as glitches, which necessitate effective differentiation from real gravitational wave signals. Traditional approaches predominantly employ fully supervised or semi-supervised algorithms for the task of glitch classification and clustering. In the future task of identifying and classifying glitches across main and auxiliary channels, it is impractical to build a dataset with manually labeled ground-truth. In addition, the patterns of glitches can vary with time, generating new glitches without manual labels. In response to this challenge, we introduce the Cross-Temporal Spectrogram Autoencoder (CTSAE), a pioneering unsupervised method for the dimensionality reduction and clustering of gravitational wave glitches. CTSAE integrates a novel four-branch autoencoder with a hybrid of Convolutional Neural Networks (CNN) and Vision Transformers (ViT). To further extract features across multiple branches, we introduce a novel multi-branch fusion method using the CLS (Class) token. Our model, trained and evaluated on the GravitySpy O3 dataset on the main channel, demonstrates superior performance in clustering tasks when compared to state-of-the-art semi-supervised learning methods. To the best of our knowledge, CTSAE represents the first unsupervised approach tailored specifically for clustering LIGO data, marking a significant step forward in the field of gravitational wave research. The code of this paper is available at https://github.com/Zod-L/CTSAE

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.04663 [pdf, other]

Focused Active Learning for Histopathological Image Classification

Authors: Arne Schmidt, Pablo Morales-Álvarez, Lee A. D. Cooper, Lee A. Newberg, Andinet Enquobahrie, Aggelos K. Katsaggelos, Rafael Molina

Abstract: Active Learning (AL) has the potential to solve a major problem of digital pathology: the efficient acquisition of labeled data for machine learning algorithms. However, existing AL methods often struggle in realistic settings with artifacts, ambiguities, and class imbalances, as commonly seen in the medical field. The lack of precise uncertainty estimations leads to the acquisition of images with a low informative value. To address these challenges, we propose Focused Active Learning (FocAL), which combines a Bayesian Neural Network with Out-of-Distribution detection to estimate different uncertainties for the acquisition function. Specifically, the weighted epistemic uncertainty accounts for the class imbalance, aleatoric uncertainty for ambiguous images, and an OoD score for artifacts. We perform extensive experiments to validate our method on MNIST and the real-world Panda dataset for the classification of prostate cancer. The results confirm that other AL methods are 'distracted' by ambiguities and artifacts which harm the performance. FocAL effectively focuses on the most informative images, avoiding ambiguities and artifacts during acquisition. For both experiments, FocAL outperforms existing AL approaches, reaching a Cohen's kappa of 0.764 with only 0.69% of the labeled Panda data.
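FocAL separates epistemic from aleatoric uncertainty. A standard way to do this from Monte Carlo samples of a Bayesian network is the entropy-based decomposition sketched below (epistemic = mutual information). The class-imbalance weighting and the OoD score used by FocAL are not shown, and the paper's estimator may differ; the function names are hypothetical.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy (nats) of probability vectors along `axis`."""
    p = np.clip(p, 1e-12, 1.0)          # avoid log(0)
    return -(p * np.log(p)).sum(axis=axis)

def decompose_uncertainty(mc_probs):
    """Split predictive uncertainty from Monte Carlo samples of a BNN.

    mc_probs: (S, N, C) class probabilities from S stochastic forward passes
    Returns (total, aleatoric, epistemic), each of shape (N,):
      total     = H[mean_s p_s]        (predictive entropy)
      aleatoric = mean_s H[p_s]        (expected entropy)
      epistemic = total - aleatoric    (mutual information)
    """
    total = entropy(mc_probs.mean(axis=0))
    aleatoric = entropy(mc_probs).mean(axis=0)
    return total, aleatoric, total - aleatoric
```

An ambiguous image yields high aleatoric uncertainty (every pass is unsure), whereas disagreement between passes signals epistemic uncertainty, the quantity most useful for choosing which images to label.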

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2403.14829 [pdf, other]

doi 10.1016/j.artint.2024.104115

Hyperbolic Secant representation of the logistic function: Application to probabilistic Multiple Instance Learning for CT intracranial hemorrhage detection

Authors: F. M. Castro-Macías, P. Morales-Álvarez, Y. Wu, R. Molina, A. K. Katsaggelos

Abstract: Multiple Instance Learning (MIL) is a weakly supervised paradigm that has been successfully applied to many different scientific areas and is particularly well suited to medical imaging. Probabilistic MIL methods, and more specifically Gaussian Processes (GPs), have achieved excellent results due to their high expressiveness and uncertainty quantification capabilities. One of the most successful GP-based MIL methods, VGPMIL, resorts to a variational bound to handle the intractability of the logistic function. Here, we formulate VGPMIL using Pólya-Gamma random variables. This approach yields the same variational posterior approximations as the original VGPMIL, which is a consequence of the two representations that the Hyperbolic Secant distribution admits. This leads us to propose a general GP-based MIL method that takes different forms by simply leveraging distributions other than the Hyperbolic Secant one. Using the Gamma distribution we arrive at a new approach that obtains competitive or superior predictive performance and efficiency. This is validated in a comprehensive experimental study including one synthetic MIL dataset, two well-known MIL benchmarks, and a real-world medical problem. We expect that this work provides useful ideas beyond MIL that can foster further research in the field.
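The link between the logistic function and the hyperbolic secant that this abstract relies on can be seen from an elementary identity, combined with the known Laplace transform of a Pólya-Gamma variable ω ~ PG(1, 0) used in Pólya-Gamma augmentation. This is a standard derivation given for orientation, not the paper's full argument:

```latex
\sigma(\psi) = \frac{e^{\psi}}{1 + e^{\psi}}
             = \frac{e^{\psi/2}}{e^{\psi/2} + e^{-\psi/2}}
             = \frac{e^{\psi/2}}{2\cosh(\psi/2)}
             = \tfrac{1}{2}\, e^{\psi/2}\, \operatorname{sech}(\psi/2),
\qquad
\operatorname{sech}(\psi/2) = \mathbb{E}_{\omega \sim \mathrm{PG}(1,0)}\!\left[ e^{-\omega \psi^{2}/2} \right].
```

Conditioned on ω, the integrand e^{ψ/2 − ωψ²/2} has a Gaussian form in ψ, which is what allows conjugate, closed-form updates when ψ carries a GP prior.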

Submitted 21 March, 2024; originally announced March 2024.

Comments: 48 pages, 12 figures, published in Artificial Intelligence Journal

Journal ref: Journal: Artificial Intelligence, Pages: 104115, Publisher: Elsevier, Year: 2024
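The abstract rests on two standard facts about the logistic function: it can be rewritten with a hyperbolic cosine (hence a hyperbolic secant), and it admits a quadratic-exponential variational lower bound. The sketch below checks both numerically. It assumes the bound in question is the Jaakkola-Jordan bound (a common choice for logistic likelihoods; the abstract does not name it), and all function names are hypothetical. It also shows that the bound's curvature coefficient equals half the mean of a Pólya-Gamma PG(1, c) variable, which is consistent with the abstract's statement that both routes lead to the same variational posterior.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_via_cosh(x: float) -> float:
    """Same function written as e^{x/2} / (2 cosh(x/2)); 1/cosh is the hyperbolic secant."""
    return math.exp(x / 2.0) / (2.0 * math.cosh(x / 2.0))


def jj_lambda(xi: float) -> float:
    """lambda(xi) = tanh(xi/2) / (4 xi), using its limit 1/8 at xi = 0."""
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)


def jj_lower_bound(x: float, xi: float) -> float:
    """Jaakkola-Jordan lower bound on sigmoid(x); tight at x = +xi and x = -xi."""
    return sigmoid(xi) * math.exp((x - xi) / 2.0 - jj_lambda(xi) * (x * x - xi * xi))


def polya_gamma_mean(c: float) -> float:
    """E[omega] for omega ~ PG(1, c): tanh(c/2) / (2c), with limit 1/4 at c = 0."""
    if abs(c) < 1e-8:
        return 0.25
    return math.tanh(c / 2.0) / (2.0 * c)
```

For any `x` and variational parameter `xi`, `jj_lower_bound(x, xi) <= sigmoid(x)`, and `jj_lambda(xi)` equals `polya_gamma_mean(xi) / 2`.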

arXiv:2403.10589 [pdf]

A General Method to Incorporate Spatial Information into Loss Functions for GAN-based Super-resolution Models

Authors: Xijun Wang, Santiago López-Tapia, Alice Lucas, Xinyi Wu, Rafael Molina, Aggelos K. Katsaggelos

Abstract: Generative Adversarial Networks (GANs) have shown great performance on super-resolution problems since they can generate more visually realistic images and video frames. However, these models often introduce side effects into the outputs, such as unexpected artifacts and noise. To reduce these artifacts and enhance the perceptual quality of the results, in this paper, we propose a general method that can be effectively used in most GAN-based super-resolution (SR) models by introducing essential spatial information into the training process. We extract spatial information from the input data and incorporate it into the training loss, making the corresponding loss a spatially adaptive (SA) one. We then use this loss to guide the training process. We show that the proposed approach is independent of the methods used to extract the spatial information and independent of the SR tasks and models. This method consistently guides the training process towards generating visually pleasing SR images and video frames, substantially mitigating artifacts and noise, ultimately leading to enhanced perceptual quality.

Submitted 15 March, 2024; originally announced March 2024.
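A spatially adaptive loss of the kind described above can be sketched as a per-pixel weighted reconstruction term, where the weights are computed from the input. This is a minimal illustration, not the paper's loss: it assumes a gradient-magnitude extractor that up-weights edges and an L1 base loss, and the names `spatial_weight_map` and `spatially_adaptive_l1` are hypothetical. Since the abstract states the approach is independent of the extractor, any other spatial map could be substituted.

```python
import numpy as np


def gradient_magnitude(img: np.ndarray) -> np.ndarray:
    """Finite-difference gradient magnitude of a 2-D grayscale image."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.sqrt(gx ** 2 + gy ** 2)


def spatial_weight_map(lr_upsampled: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Hypothetical extractor: weight 1 in flat areas, up to 1 + alpha on the strongest edges."""
    g = gradient_magnitude(lr_upsampled)
    g_max = g.max()
    if g_max > 0:
        g = g / g_max  # normalise to [0, 1]
    return 1.0 + alpha * g


def spatially_adaptive_l1(sr: np.ndarray, hr: np.ndarray, weights: np.ndarray) -> float:
    """Per-pixel L1 error, weighted by the spatial map and normalised by the total weight."""
    err = np.abs(sr.astype(np.float64) - hr.astype(np.float64))
    return float((weights * err).sum() / weights.sum())


# Usage: in training, `lr_up` would be the upsampled low-resolution input of the generator.
rng = np.random.default_rng(0)
hr = rng.random((8, 8))
lr_up = hr  # stand-in for an upsampled input
sr = hr + 0.1  # a constant error of 0.1 gives a loss of (approximately) 0.1
loss = spatially_adaptive_l1(sr, hr, spatial_weight_map(lr_up))
```

Normalising by the total weight keeps the loss on the same scale as a plain L1 term, so it can replace that term in an existing GAN objective without retuning its coefficient.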

arXiv:2403.06961 [pdf, other]

Explainable Transformer Prototypes for Medical Diagnoses

Authors: Ugur Demir, Debesh Jha, Zheyuan Zhang, Elif Keles, Bradley Allen, Aggelos K. Katsaggelos, Ulas Bagci

Abstract: Deployments of artificial intelligence in medical diagnostics mandate not just accuracy and efficacy but also trust, emphasizing the need for explainability in machine decisions. The recent trend in automated medical image diagnostics leans towards the deployment of Transformer-based architectures, credited to their impressive capabilities. Since the self-attention feature of transformers contributes towards identifying crucial regions during the classification process, they enhance the trustworthiness of the methods. However, the intricacies of these attention mechanisms may fall short of effectively pinpointing the regions of interest directly influencing AI decisions. Our research aims to design a unique attention block that underscores the correlation between 'regions' rather than 'pixels'. To address this challenge, we introduce an innovative system grounded in prototype learning, featuring an advanced self-attention mechanism that goes beyond conventional ad-hoc visual explanation techniques by offering comprehensible visual insights. A combined quantitative and qualitative methodological approach was used to demonstrate the effectiveness of the proposed method on the large-scale NIH chest X-ray dataset. Experimental results showed that our proposed method offers a promising direction for explainability, which can lead to the development of more trustworthy systems that can facilitate easier and more rapid adoption of such technology into routine clinical practice. The code is available at www.github.com/NUBagcilab/r2r_proto.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2402.07371 [pdf, other]

Real-World Atmospheric Turbulence Correction via Domain Adaptation

Authors: Xijun Wang, Santiago López-Tapia, Aggelos K. Katsaggelos

Abstract: Atmospheric turbulence, a common phenomenon in daily life, is primarily caused by the uneven heating of the Earth's surface. This phenomenon results in distorted and blurred images or videos and can significantly impact downstream vision tasks, particularly those that rely on capturing clear, stable images or videos from outdoor environments, such as accurately detecting or recognizing objects. Researchers have therefore proposed ways to simulate atmospheric turbulence and designed effective deep learning-based methods to remove the atmospheric turbulence effect. However, these synthesized turbulent images cannot cover the full range of real-world turbulence effects. Although the models have achieved great performance in synthetic scenarios, their performance consistently drops when they are applied to real-world cases. Moreover, reducing real-world turbulence is a more challenging task, as no clean ground-truth counterparts are available to the models during training. In this paper, we propose a real-world atmospheric turbulence mitigation model under a domain adaptation framework, which links supervised simulated atmospheric turbulence correction with unsupervised real-world atmospheric turbulence correction. We show that our proposed method enhances performance in real-world atmospheric turbulence scenarios, improving both image quality and downstream vision tasks.

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2401.12913 [pdf, other]

Advancing Glitch Classification in Gravity Spy: Multi-view Fusion with Attention-based Machine Learning for Advanced LIGO's Fourth Observing Run

Authors: Yunan Wu, Michael Zevin, Christopher P. L. Berry, Kevin Crowston, Carsten Østerlund, Zoheyr Doctor, Sharan Banagiri, Corey B. Jackson, Vicky Kalogera, Aggelos K. Katsaggelos

Abstract: The first successful detection of gravitational waves by ground-based observatories, such as the Laser Interferometer Gravitational-Wave Observatory (LIGO), marked a revolutionary breakthrough in our comprehension of the Universe. However, due to the unprecedented sensitivity required to make such observations, gravitational-wave detectors also capture disruptive noise sources called glitches, potentially masking or appearing as gravitational-wave signals themselves. To address this problem, a community-science project, Gravity Spy, incorporates human insight and machine learning to classify glitches in LIGO data. The machine learning classifier, integrated into the project since 2017, has evolved over time to accommodate increasing numbers of glitch classes. Despite its success, limitations have arisen in the ongoing LIGO fourth observing run (O4) due to its architecture's simplicity, which led to poor generalization and inability to handle multi-time window inputs effectively. We propose an advanced classifier for O4 glitches. Our contributions include evaluating fusion strategies for multi-time window inputs, using label smoothing to counter noisy labels, and enhancing interpretability through attention module-generated weights. This development seeks to enhance glitch classification, aiding in the ongoing exploration of gravitational-wave phenomena.

Submitted 23 January, 2024; originally announced January 2024.
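Two of the ingredients listed above, label smoothing and attention-weighted fusion of multi-time-window views, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the Gravity Spy classifier: the fusion scores each view with a single dot product against a vector `score_vector` that stands in for a learned attention module, and all function names are hypothetical. The returned per-view weights are the kind of quantity that can be inspected for interpretability.

```python
import numpy as np


def smooth_labels(labels: np.ndarray, num_classes: int, eps: float = 0.1) -> np.ndarray:
    """Label smoothing: (1 - eps) * one_hot + eps / K for each integer label."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - eps) * one_hot + eps / num_classes


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def smoothed_cross_entropy(logits: np.ndarray, labels: np.ndarray, eps: float = 0.1) -> float:
    """Mean cross-entropy against smoothed targets; softens the penalty for noisy labels."""
    targets = smooth_labels(labels, logits.shape[-1], eps)
    return float(-(targets * log_softmax(logits)).sum(axis=-1).mean())


def attention_fuse(view_features: np.ndarray, score_vector: np.ndarray):
    """Fuse V views of shape (V, D) into one (D,) feature with softmax attention weights.

    `score_vector` (D,) is a hypothetical stand-in for a learned attention module.
    Returns the fused feature and the per-view weights.
    """
    scores = view_features @ score_vector  # (V,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    fused = weights @ view_features  # (D,)
    return fused, weights
```

With `eps = 0` the loss reduces to ordinary cross-entropy; a positive `eps` stops the model from being pushed towards full confidence on labels that may be wrong.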

arXiv:2311.02290 [pdf, other]

A Physics based Machine Learning Model to characterize Room Temperature Semiconductor Detectors in 3D

Authors: Srutarshi Banerjee, Miesher Rodrigues, Manuel Ballester, Alexander H. Vija, Aggelos K. Katsaggelos

Abstract: Room temperature semiconductor radiation detectors (RTSD) for X-ray and gamma-ray detection are vital tools for medical imaging, astrophysics and other applications. CdZnTe (CZT), with its desirable detection properties, has been the main RTSD for more than three decades. In a typical pixelated configuration, CZT detectors have electrodes on opposite ends. Advanced event reconstruction algorithms at the sub-pixel level require detailed characterization of the RTSD in three-dimensional (3D) space. However, 3D characterization of the material defects and charge transport properties in the sub-pixel regime is a labor-intensive process that requires skilled personnel and novel experimental setups. Presently, state-of-the-art characterization is done over the bulk of the RTSD, assuming homogeneous properties. In this paper, we propose a novel physics-based machine learning (PBML) model to characterize the RTSD over an assumed discretized, sub-pixelated 3D volume. Our approach is the first to characterize a full 3D charge transport model of the RTSD. In this work, we first discretize the RTSD between the pixelated electrodes spatially in 3D (x, y, and z). The resulting discretizations are termed voxels. In each voxel, the different physics-based charge transport properties, such as drift, trapping, detrapping and recombination of charges, are modeled as trainable model weights. The drift of the charges considers second-order non-linear motion, which is observed in practice with RTSDs. With electron-hole pair injections as the input to the PBML model, and signals at the electrodes and free and trapped charges (electrons and holes) as its outputs, the PBML model determines the trainable weights by backpropagating the loss function. The trained weights of the model correspond one-to-one to the actual physical charge transport properties in a voxelized detector.

Submitted 3 November, 2023; originally announced November 2023.
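To make the idea of per-voxel charge transport parameters concrete, here is a deliberately simplified 1-D forward model: one carrier type, fixed one-voxel-per-step drift, and per-voxel trapping, detrapping and recombination rates (the quantities a PBML-style model would treat as trainable weights). This is an illustrative sketch only; the paper's model is 3D, tracks electrons and holes, and uses second-order non-linear drift, none of which is reproduced here. The "signal" is simply the cumulative collected charge rather than a physically induced electrode signal, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def simulate_drift(n_voxels: int, inject_at: int, charge: float,
                   trap: Sequence[float], detrap: Sequence[float], recomb: Sequence[float],
                   steps: int) -> Tuple[List[float], List[float], List[float], float]:
    """Toy 1-D forward model of charge drifting toward an electrode at the last voxel.

    trap[z], detrap[z] and recomb[z] are per-voxel rates per time step.
    Assumes trap[z] + recomb[z] <= 1 so that free charge stays non-negative.
    Returns (cumulative collected charge per step, final free, final trapped, recombined).
    """
    free = [0.0] * n_voxels
    trapped = [0.0] * n_voxels
    free[inject_at] = charge
    collected, recombined = 0.0, 0.0
    signal: List[float] = []
    for _ in range(steps):
        # Local processes in every voxel: trapping, release from traps, recombination.
        for z in range(n_voxels):
            to_trap = trap[z] * free[z]
            from_trap = detrap[z] * trapped[z]
            lost = recomb[z] * free[z]
            free[z] += from_trap - to_trap - lost
            trapped[z] += to_trap - from_trap
            recombined += lost
        # Drift: free charge in the last voxel is collected; the rest moves one voxel on.
        collected += free[-1]
        free = [0.0] + free[:-1]
        signal.append(collected)
    return signal, free, trapped, recombined
```

Because every step only moves charge between the free, trapped, collected and recombined pools, total charge is conserved, which is a useful sanity check on any such forward model before fitting its rates by backpropagation.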

arXiv:2310.15898 [pdf, other]

YOLO-Angio: An Algorithm for Coronary Anatomy Segmentation

Authors: Tom Liu, Hui Lin, Aggelos K. Katsaggelos, Adrienne Kline

Abstract: Coronary angiography remains the gold standard for diagnosis of coronary artery disease, the most common cause of death worldwide. While this procedure is performed more than 2 million times annually, there remain few methods for fast and accurate automated measurement of disease and localization of coronary anatomy. Here, we present our solution to the Automatic Region-based Coronary Artery Disease diagnostics using X-ray angiography images (ARCADE) challenge held at MICCAI 2023. For the artery segmentation task, our three-stage approach combines preprocessing and feature selection by classical computer vision to enhance vessel contrast, followed by an ensemble model based on YOLOv8 to propose possible vessel candidates by generating a vessel map. A final segmentation is based on a logic-based approach to reconstruct the coronary tree in a graph-based sorting method. Our entry to the ARCADE challenge placed 3rd overall. Using the official metric for evaluation, we achieved an F1 score of 0.422 and 0.4289 on the validation and hold-out sets respectively.

Submitted 24 October, 2023; originally announced October 2023.

Comments: MICCAI Conference ARCADE Grand Challenge, YOLO, Computer Vision
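The F1 scores reported above are computed from true positives, false positives and false negatives. For binary segmentation masks, F1 = 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall (and coincides with the Dice coefficient). The sketch below computes it for a single pair of masks; it is an assumption that this pixel-wise form matches the ARCADE official metric, which may aggregate per class or per image differently. The function name is hypothetical.

```python
import numpy as np


def f1_score_masks(pred, target) -> float:
    """Pixel-wise F1 (Dice) between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2 * tp / denom)


# Example: 2 true positives, 1 false positive and 1 false negative give F1 = 4/6.
pred_mask = np.array([[1, 1, 0], [0, 1, 0]])
true_mask = np.array([[1, 0, 0], [0, 1, 1]])
score = f1_score_masks(pred_mask, true_mask)
```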

arXiv:2308.15530 [pdf, other]

doi 10.1140/epjp/s13360-023-04795-4

Gravity Spy: Lessons Learned and a Path Forward

Authors: Michael Zevin, Corey B. Jackson, Zoheyr Doctor, Yunan Wu, Carsten Østerlund, L. Clifton Johnson, Christopher P. L. Berry, Kevin Crowston, Scott B. Coughlin, Vicky Kalogera, Sharan Banagiri, Derek Davis, Jane Glanzer, Renzhi Hao, Aggelos K. Katsaggelos, Oli Patane, Jennifer Sanchez, Joshua Smith, Siddharth Soni, Laura Trouille, Marissa Walker, Irina Aerith, Wilfried Domainko, Victor-Georges Baranowski, Gerhard Niklasch, et al. (1 additional author not shown)

Abstract: The Gravity Spy project aims to uncover the origins of glitches, transient bursts of noise that hamper analysis of gravitational-wave data. By using both the work of citizen-science volunteers and machine-learning algorithms, the Gravity Spy project enables reliable classification of glitches. Citizen science and machine learning are intrinsically coupled within the Gravity Spy framework, with machine-learning classifications providing a rapid first-pass classification of the dataset and enabling tiered volunteer training, and volunteer-based classifications verifying the machine classifications, bolstering the machine-learning training set and identifying new morphological classes of glitches. These classifications are now routinely used in studies characterizing the performance of the LIGO gravitational-wave detectors. Providing the volunteers with a training framework that teaches them to classify a wide range of glitches, as well as additional tools to aid their investigations of interesting glitches, empowers them to make discoveries of new classes of glitches. This demonstrates that, when given suitable support, volunteers can go beyond simple classification tasks to identify new features in data at a level comparable to domain experts. The Gravity Spy project is now providing volunteers with more complicated data that includes auxiliary monitors of the detector to identify the root cause of glitches.

Submitted 31 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

Comments: 33 pages, 5 figures, published in European Physical Journal Plus for focus issue on "Citizen science for physics: From Education and Outreach to Crowdsourcing fundamental research"

Journal ref: The European Physical Journal Plus, 139, 100 (2024)

arXiv:2307.09457 [pdf, other]

Smooth Attention for Deep Multiple Instance Learning: Application to CT Intracranial Hemorrhage Detection

Authors: Yunan Wu, Francisco M. Castro-Macías, Pablo Morales-Álvarez, Rafael Molina, Aggelos K. Katsaggelos

Abstract: Multiple Instance Learning (MIL) has been widely applied to medical imaging diagnosis, where bag labels are known and instance labels inside bags are unknown. Traditional MIL assumes that instances in each bag are independent samples from a given distribution. However, instances are often spatially or sequentially ordered, and one would expect similar diagnostic importance for neighboring instances. To address this, in this study, we propose a smooth attention deep MIL (SA-DMIL) model. Smoothness is achieved by the introduction of first and second order constraints on the latent function encoding the attention paid to each instance in a bag. The method is applied to the detection of intracranial hemorrhage (ICH) on head CT scans. The results show that this novel SA-DMIL: (a) achieves better performance than the non-smooth attention MIL at both scan (bag) and slice (instance) levels; (b) learns spatial dependencies between slices; and (c) outperforms current state-of-the-art MIL methods on the same ICH test set.

Submitted 18 July, 2023; originally announced July 2023.
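First- and second-order smoothness constraints on an attention function over ordered instances can be illustrated with squared finite differences along the slice axis. This is a minimal sketch assuming that form of penalty; the paper's exact formulation may differ (for example, it may be expressed on a graph of neighbouring slices), and the function names and weights `lam1`, `lam2` are hypothetical. The penalty would be added to the usual bag-level classification loss.

```python
import numpy as np


def attention_weights(logits: np.ndarray) -> np.ndarray:
    """Softmax over the instances (slices) of one bag (scan)."""
    e = np.exp(logits - logits.max())
    return e / e.sum()


def first_order_penalty(f: np.ndarray) -> float:
    """Sum of squared first differences f[i+1] - f[i] along the slice order."""
    return float(np.sum(np.diff(f) ** 2))


def second_order_penalty(f: np.ndarray) -> float:
    """Sum of squared second differences f[i+2] - 2 f[i+1] + f[i]."""
    return float(np.sum(np.diff(f, n=2) ** 2))


def smooth_attention_penalty(f: np.ndarray, lam1: float = 1.0, lam2: float = 1.0) -> float:
    """Hypothetical regulariser combining both orders on the latent attention values f."""
    return lam1 * first_order_penalty(f) + lam2 * second_order_penalty(f)


def attention_pool(instance_features: np.ndarray, logits: np.ndarray):
    """Bag embedding as the attention-weighted sum of ordered instance features (N, D)."""
    a = attention_weights(logits)
    return a @ instance_features, a
```

A constant attention profile incurs no penalty, a linear ramp is penalised only by the first-order term, and an alternating (zig-zag) profile across adjacent slices is penalised heavily, which is the behaviour the abstract motivates for neighbouring slices.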

arXiv:2305.05077 [pdf, other]

Atmospheric Turbulence Correction via Variational Deep Diffusion

Authors: Xijun Wang, Santiago López-Tapia, Aggelos K. Katsaggelos

Abstract: Atmospheric Turbulence (AT) correction is a challenging restoration task as it involves two distortions: geometric distortion and spatially variant blur. Diffusion models have shown impressive accomplishments in photo-realistic image synthesis and beyond. In this paper, we propose a novel deep conditional diffusion model under a variational inference framework to solve the AT correction problem. We use this framework to improve performance by learning latent prior information from the input and degradation processes. We use the learned information to further condition the diffusion model. Experiments are conducted on a comprehensive synthetic AT dataset. We show that the proposed framework achieves good quantitative and qualitative results.

Submitted 26 July, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: This work has been accepted to the 2023 IEEE 6th International Conference on Multimedia Information Processing and Retrieval (MIPR)
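For readers unfamiliar with conditional diffusion models, the sketch below shows the standard DDPM-style forward noising step and the noise-prediction training loss with a conditioning input. It is the generic textbook construction, not this paper's variational framework: the linear beta schedule, the `denoiser(x_t, t, cond)` signature and all names are assumptions, and the learned latent prior the abstract describes is represented only by the opaque `cond` argument.

```python
import numpy as np


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Linearly increasing noise variances beta_1..beta_T (a common DDPM default)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0: np.ndarray, t: int, alphas_cumprod: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Draw x_t from q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) with the given noise."""
    abar = alphas_cumprod[t]
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise


def denoising_loss(x0, cond, t, alphas_cumprod, denoiser, rng) -> float:
    """Noise-prediction MSE at one timestep.

    `denoiser(x_t, t, cond)` is a hypothetical conditional network; `cond` would carry
    the conditioning information (e.g. the degraded frame and any learned prior).
    """
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, alphas_cumprod, noise)
    predicted = denoiser(x_t, t, cond)
    return float(np.mean((noise - predicted) ** 2))


betas = linear_beta_schedule(1000)
alphas_cumprod = np.cumprod(1.0 - betas)
```

A network that always predicts zero noise scores a loss of about 1 (the variance of the injected noise), which gives a quick baseline when checking a training loop.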

arXiv:2305.04186 [pdf, other]

Video-Specific Query-Key Attention Modeling for Weakly-Supervised Temporal Action Localization

Authors: Xijun Wang, Aggelos K. Katsaggelos

Abstract: Weakly-supervised temporal action localization aims to identify and localize the action instances in untrimmed videos with only video-level action labels. When humans watch videos, we can adapt our abstract-level knowledge about actions to different video scenarios and detect whether some actions are occurring. In this paper, we mimic this human ability and bring a new perspective to locating and identifying multiple actions in a video. We propose a network named VQK-Net with video-specific query-key attention modeling that learns a unique query for each action category of each input video. The learned queries not only contain the actions' knowledge features at the abstract level but also have the ability to fit this knowledge into the target video scenario, and they are used to detect the presence of the corresponding action along the temporal dimension. To better learn these action category queries, we exploit not only the features of the current input video but also the correlation between different videos through a novel video-specific action category query learner combined with a query similarity loss. Finally, we conduct extensive experiments on three commonly used datasets (THUMOS14, ActivityNet1.2, and ActivityNet1.3) and achieve state-of-the-art performance.

Submitted 25 December, 2023; v1 submitted 7 May, 2023; originally announced May 2023.
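The query-key idea above can be sketched as scaled dot products between per-category query vectors and per-snippet key features, giving a temporal class activation sequence (T x C). The sketch then applies two common weakly-supervised localization steps: top-k mean pooling over time for video-level scores, and thresholding for segments. These pooling and thresholding choices are assumptions, not necessarily VQK-Net's; the queries here are plain arrays, whereas in VQK-Net they are produced per video by the query learner. All names are hypothetical.

```python
import numpy as np


def query_key_scores(snippet_feats: np.ndarray, class_queries: np.ndarray) -> np.ndarray:
    """Keys (T, D) against queries (C, D): scaled dot products, shape (T, C)."""
    d = snippet_feats.shape[1]
    return snippet_feats @ class_queries.T / np.sqrt(d)


def video_level_scores(tcas: np.ndarray, k: int) -> np.ndarray:
    """Top-k mean pooling over time of the (T, C) activation sequence, shape (C,)."""
    topk = np.sort(tcas, axis=0)[-k:]
    return topk.mean(axis=0)


def localize(tcas: np.ndarray, class_idx: int, threshold: float):
    """Return (start, end) snippet index pairs where the class score exceeds threshold."""
    above = tcas[:, class_idx] > threshold
    segments, start = [], None
    for t, flag in enumerate(above):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            segments.append((start, t - 1))
            start = None
    if start is not None:
        segments.append((start, len(above) - 1))
    return segments
```

Only video-level labels supervise `video_level_scores` during training; the segment boundaries from `localize` are read off the activation sequence at test time, which is what makes the setting weakly supervised.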

arXiv:2303.17041 [pdf, other]

Automatic Camera Trajectory Control with Enhanced Immersion for Virtual Cinematography

Authors: Xinyi Wu, Haohong Wang, Aggelos K. Katsaggelos

Abstract: User-generated cinematic creations are gaining popularity as daily entertainment, yet mastering cinematography to produce immersive content is challenging. Many existing automatic methods focus on roughly controlling predefined shot types or movement patterns, which struggle to engage viewers with the circumstances of the actor. Real-world cinematographic rules show that directors can create immersion by comprehensively synchronizing the camera with the actor. Inspired by this strategy, we propose a deep camera control framework that enables actor-camera synchronization in three aspects, considering frame aesthetics, spatial action, and emotional status in the 3D virtual stage. Following the rule of thirds, our framework first modifies the initial camera placement to position the actor aesthetically. This adjustment is facilitated by a self-supervised adjustor that analyzes frame composition via camera projection. We then design a GAN model that can adversarially synthesize fine-grained camera movement based on the physical action and psychological state of the actor, using an encoder-decoder generator to map kinematics and emotional variables into camera trajectories. Moreover, we incorporate a regularizer to align the generated stylistic variances with specific emotional categories and intensities. The experimental results show that our proposed method yields immersive cinematic videos of high quality, both quantitatively and qualitatively. Live examples can be found in the supplementary video.

Submitted 21 May, 2024; v1 submitted 29 March, 2023; originally announced March 2023.
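The composition step described above (project the actor through the camera, then compare against rule-of-thirds targets) can be illustrated geometrically. This sketch assumes an ideal pinhole camera with no distortion and simply measures the pixel offset to the nearest thirds intersection; the paper's adjustor is a learned, self-supervised module, and the function names here are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def thirds_points(width: float, height: float) -> List[Point]:
    """The four intersections of the rule-of-thirds grid, in pixel coordinates."""
    return [(width * i / 3.0, height * j / 3.0) for i in (1, 2) for j in (1, 2)]


def project_pinhole(point_cam: Tuple[float, float, float], focal_px: float,
                    cx: float, cy: float) -> Point:
    """Project a 3-D point in camera coordinates (Z pointing forward) to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return focal_px * x / z + cx, focal_px * y / z + cy


def nearest_thirds_offset(actor_px: Point, width: float, height: float) -> Point:
    """Pixel offset (dx, dy) that would move the actor onto the nearest thirds point."""
    ax, ay = actor_px
    tx, ty = min(thirds_points(width, height),
                 key=lambda p: (p[0] - ax) ** 2 + (p[1] - ay) ** 2)
    return tx - ax, ty - ay


# Actor 1 m right, 0.5 m down and 2 m in front of a 1920x1080 camera with f = 1000 px.
actor_px = project_pinhole((1.0, 0.5, 2.0), 1000.0, 960.0, 540.0)  # (1460.0, 790.0)
offset = nearest_thirds_offset(actor_px, 1920.0, 1080.0)  # (-180.0, -70.0)
```

The offset is a composition error in image space; a camera controller would then translate or rotate the camera to drive it towards zero.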

arXiv:2301.08798 [pdf]

DeepCOVID-Fuse: A Multi-modality Deep Learning Model Fusing Chest X-Radiographs and Clinical Variables to Predict COVID-19 Risk Levels

Authors: Yunan Wu, Amil Dravid, Ramsey Michael Wehbe, Aggelos K. Katsaggelos

Abstract: Purpose: To present DeepCOVID-Fuse, a deep learning fusion model to predict risk levels in patients with confirmed coronavirus disease 2019 (COVID-19), and to evaluate the performance of pre-trained fusion models on full or partial combinations of chest radiographs (CXRs) and clinical variables. Materials and Methods: The initial CXRs, clinical variables and outcomes (i.e., mortality, intubation, hospital length of stay, ICU admission) were collected from February 2020 to April 2020 with reverse-transcription polymerase chain reaction (RT-PCR) test results as the reference standard. The risk level was determined by the outcome. The fusion model was trained on 1657 patients (Age: 58.30 +/- 17.74; Female: 807) and validated on 428 patients (56.41 +/- 17.03; 190) from the Northwestern Memorial HealthCare system, and was tested on 439 patients (56.51 +/- 17.78; 205) from a single holdout hospital. The performance of pre-trained fusion models on full or partial modalities was compared on the test set using the DeLong test for the area under the receiver operating characteristic curve (AUC) and the McNemar test for accuracy, precision, recall and F1. Results: The accuracy of DeepCOVID-Fuse trained on CXRs and clinical variables is 0.658, with an AUC of 0.842, which significantly outperformed (p < 0.05) models trained only on CXRs, with an accuracy of 0.621 and AUC of 0.807, and only on clinical variables, with an accuracy of 0.440 and AUC of 0.502. The pre-trained fusion model with only CXRs as input increases accuracy to 0.632 and AUC to 0.813, and with only clinical variables as input increases accuracy to 0.539 and AUC to 0.733. Conclusion: The fusion model learns better feature representations across different modalities during training and achieves good outcome predictions even when only some of the modalities are used in testing.

Submitted 20 January, 2023; originally announced January 2023.
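The McNemar test mentioned above compares two classifiers evaluated on the same test cases using only the discordant pairs: b cases where model A is right and model B is wrong, and c cases where the reverse holds. The statistic (|b - c| - 1)^2 / (b + c) (with Edwards' continuity correction, or (b - c)^2 / (b + c) without it) is compared to a chi-square distribution with one degree of freedom. The sketch below is a standard, dependency-free implementation; the example counts are made up for illustration and are not from the study, and the function names are hypothetical. The DeLong test for AUC is not shown.

```python
import math
from typing import Sequence, Tuple


def discordant_counts(y_true: Sequence[int], pred_a: Sequence[int],
                      pred_b: Sequence[int]) -> Tuple[int, int]:
    """b = A correct and B wrong; c = A wrong and B correct, on the same paired cases."""
    b = sum(1 for y, pa, pb in zip(y_true, pred_a, pred_b) if pa == y and pb != y)
    c = sum(1 for y, pa, pb in zip(y_true, pred_a, pred_b) if pa != y and pb == y)
    return b, c


def mcnemar(b: int, c: int, continuity: bool = True) -> Tuple[float, float]:
    """McNemar chi-square statistic (1 dof) and its p-value."""
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c)
    if continuity:
        diff = max(0, diff - 1)  # Edwards continuity correction
    stat = diff * diff / (b + c)
    # Survival function of a chi-square with 1 dof: P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Illustrative (made-up) counts: A fixes 25 of B's errors, B fixes 10 of A's.
stat, p_value = mcnemar(25, 10)
```

Cases where both models agree (both right or both wrong) carry no information about which model is better, which is why only b and c enter the statistic.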

arXiv:2207.14392 [pdf, other]

A Deep Generative Approach to Oversampling in Ptychography

Authors: Semih Barutcu, Aggelos K. Katsaggelos, Doğa Gürsoy

Abstract: Ptychography is a well-studied phase imaging method that makes non-invasive imaging possible at a nanometer scale. It has developed into a mainstream technique with various applications across a range of areas such as material science or the defense industry. One major drawback of ptychography is the long data acquisition time due to the high overlap requirement between adjacent illumination areas to achieve a reasonable reconstruction. Traditional approaches with reduced overlap between scanning areas result in reconstructions with artifacts. In this paper, we propose complementing sparsely acquired or undersampled data with data sampled from a deep generative network to satisfy the oversampling requirement in ptychography. Because the deep generative network is pre-trained and its output can be computed as we collect data, the experimental data and the time to acquire the data can be reduced. We validate the method by presenting the reconstruction quality compared to the previously proposed and traditional approaches and comment on the strengths and drawbacks of the proposed approach.

Submitted 28 July, 2022; originally announced July 2022.
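Why overlap drives acquisition time can be seen with a little geometry. For circular probes of radius r whose centres are a step d apart, a commonly quoted linear overlap is 1 - d / (2r), while the shared area of the two discs is 2r^2 acos(d / 2r) - (d/2) sqrt(4r^2 - d^2). Halving the step to raise overlap roughly quadruples the number of positions in a 2-D raster scan. Which overlap definition the paper uses is not stated in the abstract, so both are shown; the function names are hypothetical.

```python
import math


def linear_overlap(step: float, radius: float) -> float:
    """Linear overlap 1 - d / (2r) between neighbouring circular probes of radius r."""
    return max(0.0, 1.0 - step / (2.0 * radius))


def area_overlap(step: float, radius: float) -> float:
    """Fraction of one probe's area shared with a neighbour whose centre is `step` away."""
    d, r = step, radius
    if d >= 2.0 * r:
        return 0.0
    lens = 2.0 * r * r * math.acos(d / (2.0 * r)) - 0.5 * d * math.sqrt(4.0 * r * r - d * d)
    return lens / (math.pi * r * r)


def raster_scan_points(field: float, step: float) -> int:
    """Number of positions in a square raster scan covering a field of side `field`."""
    per_axis = math.floor(field / step) + 1
    return per_axis * per_axis


# Probe radius 1, field of side 10: 75% vs 50% linear overlap.
dense = raster_scan_points(10.0, 0.5)   # step 0.5 -> 75% linear overlap, 441 positions
sparse = raster_scan_points(10.0, 1.0)  # step 1.0 -> 50% linear overlap, 121 positions
```

Filling in the missing positions with generated data, as the abstract proposes, aims to keep the reconstruction quality of the dense scan while only measuring the sparse one.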

arXiv:2205.13672 [pdf, other]

Discriminative Dimensionality Reduction using Deep Neural Networks for Clustering of LIGO Data

Authors: Sara Bahaadini, Yunan Wu, Scott Coughlin, Michael Zevin, Aggelos K. Katsaggelos

Abstract: In this paper, leveraging the capabilities of neural networks for modeling the non-linearities that exist in the data, we propose several models that can project data into a low-dimensional, discriminative, and smooth manifold. The proposed models can transfer knowledge from the domain of known classes to a new domain where the classes are unknown. A clustering algorithm is further applied in the new domain to find potentially new classes from the pool of unlabeled data. The research problem and data for this paper originated from the Gravity Spy project, which is a side project of the Advanced Laser Interferometer Gravitational-wave Observatory (LIGO). The LIGO project aims at detecting cosmic gravitational waves using huge detectors. However, non-cosmic, non-Gaussian disturbances known as "glitches" show up in the gravitational-wave data of LIGO. This is undesirable, as it creates problems for the gravitational-wave detection process. Gravity Spy aids in glitch identification with the purpose of understanding their origin. Since new types of glitches appear over time, one of the objectives of Gravity Spy is to create new glitch classes. In this paper, we offer a methodology to accomplish this task.

Submitted 26 May, 2022; originally announced May 2022.
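The final stage described above, clustering unlabeled samples in the learned low-dimensional space to surface candidate new classes, can be sketched with k-means. The abstract does not name the clustering algorithm, so k-means with farthest-point initialisation is an assumption here, and the input is assumed to be embeddings already produced by the learned projection; the function name is hypothetical.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means with farthest-point initialisation.

    x: (N, D) embedded samples. Returns (labels, centers).
    """
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d2.argmax()])
    centers = np.array(centers, dtype=float)

    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (N, k)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Clustering works much better in a discriminative, smooth embedding than on raw inputs, which is the motivation the abstract gives for learning the projection first; in practice the number of clusters `k` would itself have to be chosen, for example by inspecting candidate clusters.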

arXiv:2205.02397 [pdf, other]

Compressive Ptychography using Deep Image and Generative Priors

Authors: Semih Barutcu, Doğa Gürsoy, Aggelos K. Katsaggelos

Abstract: Ptychography is a well-established coherent diffraction imaging technique that enables non-invasive imaging of samples at a nanometer scale. It has been extensively used in various areas such as the defense industry or materials science. One major limitation of ptychography is the long data acquisition time due to mechanical scanning of the sample; therefore, approaches to reduce the number of scan points are highly desired. However, reconstructions with fewer scan points lead to imaging artifacts and significant distortions, hindering a quantitative evaluation of the results. To address this bottleneck, we propose a generative model combining deep image priors with deep generative priors. The self-training approach optimizes the deep generative neural network to create a solution for a given dataset. We complement our approach with a prior acquired from a previously trained discriminator network to avoid a possible divergence from the desired output caused by the noise in the measurements. We also suggest using total variation as a complementary prior to combat artifacts due to measurement noise. We analyze our approach with numerical experiments using different probe overlap percentages and varying noise levels. We also demonstrate improved reconstruction accuracy compared to the state-of-the-art method and discuss the advantages and disadvantages of our approach.

Submitted 23 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.
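The total variation (TV) prior mentioned above penalises the summed magnitude of differences between neighbouring pixels, so piecewise-smooth images score low and noisy ones score high. The sketch below uses the anisotropic form; whether the paper uses the anisotropic or isotropic variant is not stated, and `data_misfit` in the objective is a placeholder for the ptychographic data-fidelity term, not an implementation of it. The function names are hypothetical.

```python
import numpy as np


def total_variation(img: np.ndarray) -> float:
    """Anisotropic TV: sum of absolute horizontal and vertical neighbour differences.

    np.abs returns the modulus for complex input, so this also applies directly
    to a complex-valued object estimate.
    """
    dx = np.abs(np.diff(img, axis=1)).sum()
    dy = np.abs(np.diff(img, axis=0)).sum()
    return float(dx + dy)


def regularized_objective(data_misfit: float, img: np.ndarray, lam: float) -> float:
    """Data term plus lam * TV; `data_misfit` stands in for the ptychographic loss."""
    return data_misfit + lam * total_variation(img)


# A clean step edge versus a noisy copy of it.
rng = np.random.default_rng(0)
edge = np.zeros((16, 16))
edge[:, 8:] = 1.0
noisy = edge + 0.1 * rng.standard_normal(edge.shape)
tv_clean, tv_noisy = total_variation(edge), total_variation(noisy)  # the noisy copy has a much larger TV
```

Because a sharp edge costs only its height per row, TV suppresses noise without blurring edges as strongly as a squared-gradient penalty would, which is why it is a popular complement to learned priors.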

arXiv:2204.05376 [pdf, other]

medXGAN: Visual Explanations for Medical Classifiers through a Generative Latent Space

Authors: Amil Dravid, Florian Schiffers, Boqing Gong, Aggelos K. Katsaggelos

Abstract: Despite the surge of deep learning in the past decade, some users are skeptical about deploying these models in practice due to their black-box nature. Specifically, in the medical space, where there are severe potential repercussions, we need to develop methods to gain confidence in the models' decisions. To this end, we propose a novel medical imaging generative adversarial framework, medXGAN (medical eXplanation GAN), to visually explain what a medical classifier focuses on in its binary predictions. By encoding domain knowledge of medical images, we are able to disentangle anatomical structure and pathology, leading to fine-grained visualization through latent interpolation. Furthermore, we optimize the latent space such that interpolation explains how the features contribute to the classifier's output. Our method outperforms baselines such as Gradient-Weighted Class Activation Mapping (Grad-CAM) and Integrated Gradients in localization and explanatory ability. Additionally, a combination of the medXGAN with Integrated Gradients can yield explanations more robust to noise. The code is available at: https://avdravid.github.io/medXGAN_page/.

Submitted 17 April, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: 10 pages, 11 figures, accepted to CVPR TCV workshop

ACM Class: I.5.4; I.5.1; I.4.9; I.4.5; I.2.10

arXiv:2203.16683 [pdf, other]

doi 10.3847/1538-4357/ac8b05

Active Learning for Computationally Efficient Distribution of Binary Evolution Simulations

Authors: Kyle Akira Rocha, Jeff J. Andrews, Christopher P. L. Berry, Zoheyr Doctor, Aggelos K. Katsaggelos, Juan Gabriel Serra Pérez, Pablo Marchant, Vicky Kalogera, Scott Coughlin, Simone S. Bavera, Aaron Dotter, Tassos Fragos, Konstantinos Kovlakas, Devina Misra, Zepei Xing, Emmanouil Zapartas

Abstract: Binary stars undergo a variety of interactions and evolutionary phases, critical for predicting and explaining observed properties. Binary population synthesis with full stellar-structure and evolution simulations is computationally expensive, requiring a large number of mass-transfer sequences. The recently developed binary population synthesis code POSYDON incorporates grids of MESA binary star simulations, which are then interpolated to model large-scale populations of massive binaries. The traditional method of computing a high-density rectilinear grid of simulations does not scale to higher-dimensional grids that account for a range of metallicities, rotation, and eccentricity. We present a new active learning algorithm, psy-cris, which uses machine learning in the data-gathering process to adaptively and iteratively select targeted simulations to run, resulting in a custom, high-performance training set. We test psy-cris on a toy problem and find that the resulting training sets require fewer simulations for accurate classification and regression than either regular or randomly sampled grids. We further apply psy-cris to the target problem of building a dynamic grid of MESA simulations, and we demonstrate that, even without fine-tuning, a simulation set only $\sim 1/4$ the size of a rectilinear grid is sufficient to achieve the same classification accuracy. We anticipate further gains when algorithmic parameters are optimized for the targeted application.
We find that optimizing for classification only may lead to performance losses in regression, and vice versa. Lowering the computational cost of producing grids will enable future versions of POSYDON to cover more input parameters while preserving interpolation accuracies.
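The abstract does not spell out the selection rule psy-cris uses. As a generic illustration of the active-learning idea only (not psy-cris itself), the sketch below performs uncertainty sampling: among candidate input points, it queries those where a binary classifier's predicted probability is closest to 0.5. `toy_proba` is a hypothetical stand-in for a trained classifier over, for example, initial binary parameters.

```python
def select_queries(candidates, predict_proba, n):
    """Return the n candidates whose predicted class probability is closest
    to 0.5, i.e. where the current classifier is least certain."""
    ranked = sorted(candidates, key=lambda x: abs(predict_proba(x) - 0.5))
    return ranked[:n]


def toy_proba(x):
    # Hypothetical classifier: a soft step centred at x = 0.32, clipped to [0, 1].
    return min(1.0, max(0.0, 0.5 + 2.0 * (x - 0.32)))


grid = [i / 10 for i in range(11)]
print(select_queries(grid, toy_proba, 2))  # [0.3, 0.4]
```

In a full loop, the selected points would be simulated (here, new MESA runs), added to the training set, and the classifier retrained before the next round of queries.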

Submitted 16 September, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: 21 pages, 10 figures, ApJ in press

Journal ref: Astrophysical Journal; 938(1):64(15); 2022

arXiv:2203.06448 [pdf]

doi 10.1007/s10339-024-01250-9

Discrete, recurrent, and scalable patterns in human judgement underlie affective picture ratings

Authors: Emanuel A. Azcona, Byoung-Woo Kim, Nicole L. Vike, Sumra Bari, Shamal Lalvani, Leandros Stefanopoulos, Sean Woodward, Martin Block, Aggelos K. Katsaggelos, Hans C. Breiter

Abstract: Operant keypress tasks, where each action has a consequence, have been analogized to the construct of "wanting" and produce lawful relationships in humans that quantify preferences for approach and avoidance behavior. It is unknown whether rating tasks without an operant framework, which can be analogized to "liking", show similar lawful relationships. We studied three independent cohorts of participants (N = 501, 506, and 4,019 participants) collected by two distinct organizations, using the same 7-point Likert scale to rate negative to positive preferences for pictures from the International Affective Picture System. Picture ratings without an operant framework produced value functions, limit functions, and trade-off functions similar to those reported in the literature for operant keypress tasks, all with goodness-of-fit values above 0.75. These value, limit, and trade-off functions were discrete in their mathematical formulation, recurrent across all three independent cohorts, and demonstrated scaling between individual and group curves. In all three experiments, the 95% confidence intervals for the computed loss aversion fell below 2, arguing against a strong overweighting of losses relative to gains, as has previously been reported for keypress tasks or games of chance with calibrated uncertainty.
Graphed features from the three cohorts were similar and argue that preference assessments meet three of four criteria for lawfulness, providing a simple, short, and low-cost method for the quantitative assessment of preference without forced-choice decisions, games of chance, or operant keypressing. This approach can easily be implemented on any digital device with a screen (e.g., cellphones).

Submitted 12 March, 2022; originally announced March 2022.

arXiv:2201.09120 [pdf, other]

Investigating the Potential of Auxiliary-Classifier GANs for Image Classification in Low Data Regimes

Authors: Amil Dravid, Florian Schiffers, Yunan Wu, Oliver Cossairt, Aggelos K. Katsaggelos

Abstract: Generative Adversarial Networks (GANs) have shown promise in augmenting datasets and boosting convolutional neural networks' (CNN) performance on image classification tasks. But they introduce more hyperparameters to tune, as well as the need for additional time and computational power to train alongside the CNN. In this work, we examine the potential of Auxiliary-Classifier GANs (AC-GANs) as a 'one-stop-shop' architecture for image classification, particularly in low data regimes. Additionally, we explore modifications to the typical AC-GAN framework, changing the generator's latent space sampling scheme and employing a Wasserstein loss with gradient penalty to stabilize the simultaneous training of image synthesis and classification. Through experiments on images of varying resolutions and complexity, we demonstrate that AC-GANs show promise in image classification, achieving competitive performance with standard CNNs. These methods can be employed as an 'all-in-one' framework with particular utility in the absence of large amounts of training data.
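For reference, the Wasserstein loss with gradient penalty mentioned above is commonly written (following the WGAN-GP formulation) as the critic objective below, where $\mathbb{P}_r$ and $\mathbb{P}_g$ are the real and generated distributions and $\hat{x}$ is sampled along straight lines between real and generated samples; the generator minimizes $-\mathbb{E}_{\tilde{x}\sim\mathbb{P}_g}[D(\tilde{x})]$. How the AC-GAN's auxiliary classification loss is added to this objective, and the value of $\lambda$, are not given in the abstract; $\lambda = 10$ is the commonly used default and is only an assumption here.

```latex
\mathcal{L}_D = \mathbb{E}_{\tilde{x}\sim\mathbb{P}_g}\big[D(\tilde{x})\big]
              - \mathbb{E}_{x\sim\mathbb{P}_r}\big[D(x)\big]
              + \lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1-\epsilon)\tilde{x},\quad \epsilon \sim \mathcal{U}[0,1]
```

The penalty softly enforces the 1-Lipschitz constraint on the critic, which is what stabilizes training compared with weight clipping.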

Submitted 22 January, 2022; originally announced January 2022.

Comments: 4 pages content, 1 page references, 3 figures, 2 tables, to appear in ICASSP 2022

ACM Class: I.5.4; I.5.1; I.4.9; I.2.10

arXiv:2111.00116 [pdf, other]

Visual Explanations for Convolutional Neural Networks via Latent Traversal of Generative Adversarial Networks

Authors: Amil Dravid, Aggelos K. Katsaggelos

Abstract: Lack of explainability in artificial intelligence, specifically in deep neural networks, remains a bottleneck for implementing models in practice. Popular techniques such as Gradient-weighted Class Activation Mapping (Grad-CAM) provide a coarse map of salient features in an image, which rarely tells the whole story of what a convolutional neural network (CNN) has learned. Using COVID-19 chest X-rays, we present a method for interpreting what a CNN has learned by utilizing Generative Adversarial Networks (GANs). Our GAN framework disentangles lung structure from COVID-19 features. Using this GAN, we can visualize the transition of a pair of COVID-negative lungs in a chest radiograph to a COVID-positive pair by interpolating in the latent space of the GAN, which provides fine-grained visualization of how the CNN responds to varying features within the lungs.
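The latent-space interpolation described above can be illustrated with a sketch that assumes straight-line (linear) interpolation between a COVID-negative and a COVID-positive latent code, with the lung-structure code held fixed. The abstract does not state the interpolation scheme (spherical interpolation is a common alternative for Gaussian latents), and the generator and classifier that would decode and score each point are hypothetical and not shown.

```python
def interpolate_latents(z_start, z_end, steps):
    """Evenly spaced points on the straight line from z_start to z_end (inclusive)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same length")
    path = []
    for k in range(steps):
        alpha = k / (steps - 1)  # 0.0 at z_start, 1.0 at z_end
        path.append([(1 - alpha) * a + alpha * b for a, b in zip(z_start, z_end)])
    return path


# Hypothetical 2-D pathology codes; each point would be decoded by the generator
# (together with a fixed structure code) and passed to the classifier.
for z in interpolate_latents([0.0, 0.0], [1.0, 2.0], 3):
    print(z)
```

Plotting the classifier's score along such a path is what turns the interpolation into an explanation of which features drive the prediction.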

Submitted 1 November, 2021; v1 submitted 29 October, 2021; originally announced November 2021.

Comments: 2 pages, 2 figures, to appear as extended abstract at AAAI-22

ACM Class: I.5.4; I.5.1; I.4.9; I.2.10

arXiv:2105.09892 [pdf, other]

Improving Acquisition Speed of X-Ray Ptychography through Spatial Undersampling and Regularization

Authors: Prasan Shedligeri, Florian Schiffers, Semih Barutcu, Pablo Ruiz, Aggelos K Katsaggelos, Oliver Cossairt

Abstract: X-ray ptychography is a versatile technique for nanometer-resolution imaging. The magnitude of the diffraction patterns is recorded on a detector, and their phase is estimated using phase retrieval techniques. Most phase retrieval algorithms make the problem well-posed by relying on the constraints imposed by the overlapping region between neighboring diffraction pattern samples. As the overlap between neighboring diffraction patterns decreases, the problem becomes ill-posed and the object cannot be recovered. To avoid this ill-posedness, we investigate the effect of regularizing the phase retrieval algorithm with image priors for various overlap ratios between neighboring diffraction patterns. We show that the object can be faithfully reconstructed at low overlap ratios by regularizing the phase retrieval algorithm with image priors such as Total-Variation and Structure Tensor Prior. We also show the effectiveness of our proposed algorithm on real data acquired from an IC chip with a coherent X-ray beam.
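To make "overlap ratio" concrete, the sketch below assumes the linear overlap definition often used in ptychography for a circular probe of diameter d scanned with step a along a line: overlap = 1 - a/d. The abstract does not state which definition the authors use (area-based overlap is another common choice), so both helper functions are illustrative only; phase retrieval itself is not shown.

```python
def linear_overlap(step, probe_diameter):
    """Linear overlap between neighbouring scan positions: 1 - step / diameter.
    1.0 means identical positions; 0.0 means the probes just touch."""
    if probe_diameter <= 0:
        raise ValueError("probe_diameter must be positive")
    return 1.0 - step / probe_diameter


def step_for_overlap(overlap, probe_diameter):
    """Scan step needed to reach a target linear overlap."""
    return probe_diameter * (1.0 - overlap)


print(linear_overlap(2.0, 5.0))    # 0.6
print(step_for_overlap(0.5, 5.0))  # 2.5
```

Lower overlap means a larger step and therefore fewer scan positions for the same field of view, which is the acquisition-speed gain the regularized reconstruction is meant to make usable.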

Submitted 20 May, 2021; originally announced May 2021.

Comments: Accepted at ICIP 2021; 5 pages, 6 figures

arXiv:2105.08205 [pdf, other]

Reinforcement Learning for Adaptive Video Compressive Sensing

Authors: Sidi Lu, Xin Yuan, Aggelos K Katsaggelos, Weisong Shi

Abstract: We apply reinforcement learning to video compressive sensing to adapt the compression ratio. Specifically, we consider video snapshot compressive imaging (SCI), which captures high-speed video using a low-speed camera and in which multiple (B) video frames can be reconstructed from a single snapshot measurement. One research gap in previous studies is how to adapt B in the video SCI system to different scenes. In this paper, we fill this gap using reinforcement learning (RL). An RL model, as well as various convolutional neural networks for reconstruction, is learned to achieve adaptive sensing in video SCI systems. Furthermore, the performance of an object detection network that uses the video SCI measurements directly, without reconstruction, is also used to perform RL-based adaptive video compressive sensing. Our proposed adaptive SCI method can thus be implemented at low cost and in real time. Our work takes the technology one step further toward real applications of video SCI.
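To make the role of B concrete, here is a minimal sketch of the standard video SCI forward model: each of the B high-speed frames is modulated by its own coding mask and the results are summed on the detector into one 2D snapshot, a B:1 compression. The binary masks and the noise-free measurement are assumptions for illustration; the reconstruction networks and the RL policy that chooses B are not shown.

```python
def sci_snapshot(frames, masks):
    """Video SCI forward model: y[i][j] = sum_b masks[b][i][j] * frames[b][i][j].
    `frames` and `masks` are lists of B equally sized 2D lists."""
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    rows, cols = len(frames[0]), len(frames[0][0])
    y = [[0.0] * cols for _ in range(rows)]
    for frame, mask in zip(frames, masks):
        for i in range(rows):
            for j in range(cols):
                y[i][j] += mask[i][j] * frame[i][j]
    return y


# B = 2 frames of size 1x2 with complementary binary masks (illustrative values).
frames = [[[1, 2]], [[3, 4]]]
masks = [[[1, 0]], [[0, 1]]]
print(sci_snapshot(frames, masks))  # [[1.0, 4.0]]
```

A larger B packs more frames into each snapshot but makes the inverse problem harder, which is the trade-off an adaptive scheme tunes per scene.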

Submitted 17 May, 2021; originally announced May 2021.

Comments: 12 pages, 11 figures, 2 tables

ACM Class: I.2.10

arXiv:2105.05973 [pdf, other]

Removing Blocking Artifacts in Video Streams Using Event Cameras

Authors: Henry H. Chopp, Srutarshi Banerjee, Oliver Cossairt, Aggelos K. Katsaggelos

Abstract: In this paper, we propose EveRestNet, a convolutional neural network designed to remove blocking artifacts in video streams using events from neuromorphic sensors. We first degrade the video frame using a quadtree structure to produce blocking artifacts, simulating the transmission of a video under a heavily constrained bandwidth. Events from the neuromorphic sensor are also simulated, but are transmitted in full. Using the distorted frames and the event stream, EveRestNet is able to improve the image quality.
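The abstract does not give the quadtree splitting rule, so the sketch below assumes a simple one: a square block is split into four quadrants while its intensity range exceeds a threshold, and every leaf block is replaced by its mean. A larger threshold leaves larger uniform blocks, loosely mimicking a tighter bandwidth budget. The function name and criterion are hypothetical; event simulation and EveRestNet itself are not shown.

```python
def quadtree_degrade(img, threshold, min_size=1):
    """Blocky degradation of a square 2^k x 2^k image (list of rows).
    A block is split into four quadrants while its intensity range exceeds
    `threshold` and it is larger than `min_size`; each leaf block is then
    replaced by its mean value."""
    n = len(img)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in img):
        raise ValueError("image must be square with a power-of-two side")
    out = [[float(v) for v in row] for row in img]

    def process(r, c, size):
        block = [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]
        if size > min_size and max(block) - min(block) > threshold:
            half = size // 2
            for dr in (0, half):
                for dc in (0, half):
                    process(r + dr, c + dc, half)
        else:
            mean = sum(block) / len(block)
            for i in range(r, r + size):
                for j in range(c, c + size):
                    out[i][j] = mean

    process(0, 0, n)
    return out


frame = [[0, 10], [0, 10]]
print(quadtree_degrade(frame, threshold=20))  # [[5.0, 5.0], [5.0, 5.0]]
print(quadtree_degrade(frame, threshold=5))   # [[0.0, 10.0], [0.0, 10.0]]
```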

Submitted 12 May, 2021; originally announced May 2021.

arXiv:2103.12297 [pdf, other]

Adaptive Illumination based Depth Sensing using Deep Superpixel and Soft Sampling Approximation

Authors: Qiqin Dai, Fengqiang Li, Oliver Cossairt, Aggelos K Katsaggelos

Abstract: Dense depth map capture is challenging in existing active sparse-illumination-based depth acquisition techniques, such as LiDAR. Various techniques have been proposed to estimate a dense depth map by fusing the sparse depth map measurement with the RGB image. Recent advances in hardware enable adaptive depth measurements, further improving dense depth map estimation. In this paper, we study the problem of estimating dense depth from depth sampling. The adaptive sparse depth sampling network is jointly trained with a fusion network of an RGB image and sparse depth to generate optimal adaptive sampling masks. We show that such adaptive sampling masks can generalize well to many RGB and sparse depth fusion algorithms under a variety of sampling rates (as low as $0.0625\%$). The proposed adaptive sampling method is fully differentiable and flexible enough to be trained end-to-end with upstream perception algorithms.

Submitted 22 February, 2022; v1 submitted 23 March, 2021; originally announced March 2021.

arXiv:2103.12104 [pdf, other]

doi 10.1088/1361-6382/ac1ccb

Discovering features in gravitational-wave data through detector characterization, citizen science and machine learning

Authors: S Soni, C P L Berry, S B Coughlin, M Harandi, C B Jackson, K Crowston, C Østerlund, O Patane, A K Katsaggelos, L Trouille, V-G Baranowski, W F Domainko, K Kaminski, M A Lobato Rodriguez, U Marciniak, P Nauta, G Niklasch, R R Rote, B Téglás, C Unsworth, C Zhang

Abstract: The observation of gravitational waves is hindered by the presence of transient noise (glitches). We study data from the third observing run of the Advanced LIGO detectors and identify new glitch classes. Using training sets assembled by monitoring the state of the detector and by citizen-science volunteers, we update the Gravity Spy machine-learning algorithm for glitch classification. We find that a new glitch class linked to ground motion at the detector sites is especially prevalent, and we identify two subclasses of it linked to different types of ground motion. Reclassification of data based on the updated model finds that 27% of all transient noise at LIGO Livingston belongs to the new glitch class, making it the most frequent source of transient noise at that site. Our results demonstrate both how glitch classification can reveal potential improvements to gravitational-wave detectors, and how, given an appropriate framework, citizen-science volunteers may make discoveries in large data sets.

Submitted 6 September, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

Comments: 26 pages, 10 figures

Journal ref: Classical and Quantum Gravity, 2021, Volume 38, Number 19

arXiv:2103.04421 [pdf, other]

doi 10.1109/MSP.2020.3023869

Snapshot Compressive Imaging: Principle, Implementation, Theory, Algorithms and Applications

Authors: Xin Yuan, David J. Brady, Aggelos K. Katsaggelos

Abstract: Capturing high-dimensional (HD) data is a long-standing challenge in signal processing and related fields. Snapshot compressive imaging (SCI) uses a two-dimensional (2D) detector to capture HD ($\ge3$D) data in a snapshot measurement. Via novel optical designs, the 2D detector samples the HD data in a compressive manner; following this, algorithms are employed to reconstruct the desired HD data cube. SCI has been used in hyperspectral imaging, video, holography, tomography, focal depth imaging, polarization imaging, microscopy, etc. Though the hardware has been investigated for more than a decade, theoretical guarantees have only recently been derived. Inspired by deep learning, various deep neural networks have also been developed to reconstruct the HD data cube in spectral SCI and video SCI. This article reviews recent advances in SCI hardware, theory, and algorithms, including both optimization-based and deep-learning-based algorithms. Diverse applications and the outlook of SCI are also discussed.

Submitted 7 March, 2021; originally announced March 2021.

Comments: Extension of X. Yuan, D. J. Brady and A. K. Katsaggelos, "Snapshot Compressive Imaging: Theory, Algorithms, and Applications," in IEEE Signal Processing Magazine, vol. 38, no. 2, pp. 65-88, March 2021, doi: 10.1109/MSP.2020.3023869

Journal ref: in IEEE Signal Processing Magazine, vol. 38, no. 2, pp. 65-88, March 2021

arXiv:2102.12046 [pdf, other]

An Adaptive Video Acquisition Scheme for Object Tracking and its Performance Optimization

Authors: Srutarshi Banerjee, Henry H. Chopp, Juan G. Serra, Hao Tian Yang, Oliver Cossairt, A. K. Katsaggelos

Abstract: We present a novel adaptive host-chip modular architecture for video acquisition that optimizes an overall objective task under a given bit-rate constraint. The chip is a high-resolution imaging sensor, such as a gigapixel focal plane array (FPA), with low computational power, deployed remotely in the field, while the host is a server with high computational power. The data bandwidth of the communication channel between the chip and the host is too constrained to accommodate transfer of all captured data from the chip. The host performs the task-specific computations and also intelligently guides the chip to optimize (compress) the data sent to the host. The proposed system is modular and highly versatile, allowing the objective task to be re-oriented. In this work, object tracking is the objective task. While our architecture supports any form of compression/distortion, in this paper we use quadtree (QT)-segmented video frames. We use the Viterbi (dynamic programming) algorithm to minimize the area-normalized weighted rate-distortion allocation of resources. The host receives only these degraded frames for analysis. An object detector is used to detect objects, and a Kalman-filter-based tracker is used to track them. System performance is evaluated with the Multiple Object Tracking Accuracy (MOTA) metric. In this architecture, gains in MOTA are obtained through a novel two-step process in which the object detector is trained twice with different system-generated distortions.
Additionally, the tracker assists the object detector by upscoring its region proposals to further improve performance.
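The MOTA metric used for evaluation is usually defined (CLEAR MOT) as one minus the total number of misses, false positives, and identity switches divided by the total number of ground-truth objects over all frames. The sketch below assumes the per-frame matching between tracks and ground truth (typically via an overlap threshold) has already been done; how the authors performed that matching is not stated in the abstract.

```python
def mota(per_frame_counts):
    """CLEAR-MOT tracking accuracy.
    per_frame_counts: iterable of (misses, false_positives, id_switches, num_ground_truth)
    tuples, one per frame. MOTA = 1 - (misses + FP + ID switches) / total GT objects."""
    misses = false_pos = id_switches = gt = 0
    for m, fp, ids, g in per_frame_counts:
        misses += m
        false_pos += fp
        id_switches += ids
        gt += g
    if gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - (misses + false_pos + id_switches) / gt


frames = [(2, 1, 0, 10), (1, 0, 1, 10)]
print(mota(frames))  # 0.75
```

Because errors are counted against ground-truth objects, MOTA can become negative when a tracker produces many false positives.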

Submitted 23 February, 2021; originally announced February 2021.

arXiv:2102.00508 [pdf, other]

doi 10.1109/ICIP42928.2021.9506335

SkinScan: Low-Cost 3D-Scanning for Dermatologic Diagnosis and Documentation

Authors: Merlin A. Nau, Florian Schiffers, Yunhao Li, Bingjie Xu, Andreas Maier, Jack Tumblin, Marc Walton, Aggelos K. Katsaggelos, Florian Willomitzer, Oliver Cossairt

Abstract: Computational photography is becoming increasingly essential in the medical field. Today, imaging techniques for dermatology range from two-dimensional (2D) color imagery with a mobile device to professional clinical imaging systems that measure additional, detailed three-dimensional (3D) data. The latter are commonly expensive and not accessible to a broad audience. In this work, we propose a novel system and software framework that relies only on low-cost (and even mobile) commodity devices present in every household to measure detailed 3D information of human skin with a 3D-gradient-illumination-based method. We believe that our system has great potential for early-stage diagnosis and monitoring of skin diseases, especially in densely populated or underdeveloped areas.

Submitted 31 January, 2021; originally announced February 2021.

Comments: 5 pages, 4 figures, submitted to ICIP 2021

arXiv:2012.05214 [pdf, other]

E3D: Event-Based 3D Shape Reconstruction

Authors: Alexis Baudron, Zihao W. Wang, Oliver Cossairt, Aggelos K. Katsaggelos

Abstract: 3D shape reconstruction is a primary component of augmented/virtual reality. Despite being highly advanced, existing solutions based on RGB, RGB-D, and LiDAR sensors are power- and data-intensive, which introduces challenges for deployment on edge devices. We approach 3D reconstruction with an event camera, a sensor with significantly lower power, latency, and data expense that also offers high dynamic range. While previous event-based 3D reconstruction methods are primarily based on stereo vision, we cast the problem as multi-view shape from silhouette using a monocular event camera. The output from a moving event camera is a sparse point set of space-time gradients, largely sketching scene/object edges and contours. We first introduce an event-to-silhouette (E2S) neural network module to transform a stack of event frames into the corresponding silhouettes, with additional neural branches for camera pose regression. Second, we introduce E3D, which employs a 3D differentiable renderer (PyTorch3D) to enforce cross-view 3D mesh consistency and fine-tune the E2S and pose networks. Lastly, we introduce a 3D-to-events simulation pipeline and apply it to publicly available object datasets to generate synthetic event/silhouette training pairs for supervised learning.

Submitted 10 December, 2020; v1 submitted 9 December, 2020; originally announced December 2020.

Comments: Correct author names and only include primary author email

arXiv:2012.04743 [pdf, other]

2-Step Sparse-View CT Reconstruction with a Domain-Specific Perceptual Network

Authors: Haoyu Wei, Florian Schiffers, Tobias Würfl, Daming Shen, Daniel Kim, Aggelos K. Katsaggelos, Oliver Cossairt

Abstract: Computed tomography is widely used to examine internal structures in a non-destructive manner. To obtain high-quality reconstructions, one typically has to acquire a densely sampled trajectory to avoid angular undersampling. However, many scenarios require a sparse-view measurement, which leads to streak artifacts if unaccounted for. Current methods do not make full use of domain-specific information and hence fail to provide reliable reconstructions for highly undersampled data. We present a novel framework for sparse-view tomography that decouples the reconstruction into two steps: First, we overcome its ill-posedness using a super-resolution network, SIN, trained on the sparse projections. The intermediate result allows for a closed-form tomographic reconstruction with preserved details and greatly reduced streak artifacts. Second, a refinement network, PRN, trained on the reconstructions reduces any remaining artifacts. We further propose a lightweight variant of the perceptual loss that enhances domain-specific information, boosting restoration accuracy. Our experiments demonstrate an improvement of 4 dB over current solutions.

Submitted 8 December, 2020; originally announced December 2020.

arXiv:2008.06151 [pdf, other]

doi 10.1007/978-3-030-61056-2_8

Interpretation of Brain Morphology in Association to Alzheimer's Disease Dementia Classification Using Graph Convolutional Networks on Triangulated Meshes

Authors: Emanuel A. Azcona, Pierre Besson, Yunan Wu, Arjun Punjabi, Adam Martersteck, Amil Dravid, Todd B. Parrish, S. Kathleen Bandt, Aggelos K. Katsaggelos

Abstract: We propose a mesh-based technique to aid in the classification of Alzheimer's disease dementia (ADD) using mesh representations of the cortex and subcortical structures. Deep learning methods for classification tasks that utilize structural neuroimaging often require a large number of learnable parameters to optimize. Frequently, these approaches for automated medical diagnosis also lack visual interpretability of the brain areas involved in making a diagnosis. This work: (a) analyzes brain shape using surface information of the cortex and subcortical structures, (b) proposes a residual learning framework for state-of-the-art graph convolutional networks that offer a significant reduction in learnable parameters, and (c) offers visual interpretability of the network via class-specific gradient information that localizes important regions of interest in our inputs. With our proposed method leveraging cortical and subcortical surface information, we outperform other machine learning methods with a 96.35% testing accuracy for the ADD vs. healthy control problem. We confirm the validity of our model by observing its performance in a 25-trial Monte Carlo cross-validation. The generated visualization maps in our study show correspondences with current knowledge regarding the structural localization of pathological changes in the brain associated with dementia of the Alzheimer's type.

Submitted 20 August, 2020; v1 submitted 13 August, 2020; originally announced August 2020.

Comments: Accepted for the Shape in Medical Imaging (ShapeMI) workshop at MICCAI International Conference 2020

arXiv:2001.10964 [pdf, other]

Examining the Benefits of Capsule Neural Networks

Authors: Arjun Punjabi, Jonas Schmid, Aggelos K. Katsaggelos

Abstract: Capsule networks are a recently developed class of neural networks that potentially address some of the deficiencies of traditional convolutional neural networks. By replacing standard scalar activations with vectors, and by connecting artificial neurons in a new way, capsule networks aim to be the next great development for computer vision applications. However, to determine whether these networks truly operate differently from traditional networks, one must look at the differences in the capsule features. To this end, we perform several analyses with the purpose of elucidating capsule features and determining whether they perform as described in the initial publication. First, we perform a deep visualization analysis to visually compare capsule features and convolutional neural network features. Then, we look at the ability of capsule features to encode information across the vector components and address which changes in the capsule architecture provide the most benefit. Finally, we look at how well the capsule features are able to encode instantiation parameters of class objects via visual transformations.
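In the original capsule-network formulation (dynamic routing between capsules), the vector activations mentioned above pass through a "squash" nonlinearity, so that a capsule's length can be read as the probability that an entity is present while its orientation encodes instantiation parameters. A minimal pure-Python sketch follows; routing and the rest of the architecture are omitted, and `eps` is an added assumption to avoid division by zero for a zero vector.

```python
import math


def squash(s, eps=1e-9):
    """Capsule 'squash' nonlinearity: v = (|s|^2 / (1 + |s|^2)) * s / |s|.
    Keeps the direction of s and maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / (math.sqrt(norm_sq) + eps)
    return [scale * x for x in s]


v = squash([3.0, 4.0])
print(math.hypot(*v))  # close to 25/26, about 0.9615
```

Long input vectors are squashed to length just under 1 and short ones to nearly 0, which is what lets the vector length act like a presence probability.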

Submitted 29 January, 2020; originally announced January 2020.

arXiv:1912.12879 [pdf]

Self-supervised Fine-tuning for Correcting Super-Resolution Convolutional Neural Networks

Authors: Alice Lucas, Santiago Lopez-Tapia, Rafael Molina, Aggelos K. Katsaggelos

Abstract: While Convolutional Neural Networks (CNNs) trained for image and video super-resolution (SR) regularly achieve new state-of-the-art performance, they also suffer from significant drawbacks. One of their limitations is their lack of robustness to image formation models unseen during training. Other limitations include the generation of artifacts and hallucinated content when training Generative Adversarial Networks (GANs) for SR. While the Deep Learning literature focuses on presenting new training schemes and settings to resolve these various issues, we show that one can avoid training and correct SR results with a fully self-supervised fine-tuning approach. More specifically, at test time, given an image and its known image formation model, we fine-tune the parameters of the trained network and iteratively update them using a data fidelity loss. We apply our fine-tuning algorithm to multiple image and video SR CNNs and show that it can successfully correct a sub-optimal SR solution by relying entirely on internal learning at test time. We apply our method to the problem of fine-tuning for unseen image formation models and to the removal of artifacts introduced by GANs.

Submitted 15 June, 2020; v1 submitted 30 December, 2019; originally announced December 2019.

Comments: 15 pages, 11 figures
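The test-time correction described in this abstract can be illustrated with a deliberately tiny, hypothetical setup: a two-parameter 2x "SR network" (`upsample`), a known image formation model assumed to be pair averaging (`degrade`), and plain gradient descent on the data fidelity loss ||D(f_theta(y)) - y||^2. This is a sketch of the general principle only, not the authors' implementation, which fine-tunes full image and video SR CNNs.

```python
def upsample(y, theta):
    """Hypothetical toy 'SR network': 2x upsampling of a 1-D signal with two
    learnable weights, one for even and one for odd output samples."""
    a, b = theta
    x = []
    for value in y:
        x.extend([a * value, b * value])
    return x


def degrade(x):
    """Assumed known image formation model: 2x downsampling by averaging pairs."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]


def fidelity_loss(y, theta):
    """Data fidelity loss ||D(f_theta(y)) - y||^2 on the low-resolution input."""
    return sum((d - v) ** 2 for d, v in zip(degrade(upsample(y, theta)), y))


def fine_tune(y, theta, lr=0.1, steps=200):
    """Test-time fine-tuning: gradient descent on the fidelity loss, using
    only the observed low-resolution signal y (no ground truth)."""
    a, b = theta
    for _ in range(steps):
        residual = [d - v for d, v in zip(degrade(upsample(y, [a, b])), y)]
        # D(f(y))_i = (a + b) / 2 * y_i, so each term's derivative w.r.t.
        # a (and b) is y_i / 2 and the loss gradient is sum_i residual_i * y_i.
        grad = sum(r * v for r, v in zip(residual, y))
        a -= lr * grad
        b -= lr * grad
    return [a, b]


# A "pretrained" model whose output is too bright: D(f(y)) = 1.2 * y.
y = [1.0, 2.0, 0.5]
theta = fine_tune(y, [1.3, 1.1])
```

Because the loss only constrains the degraded output, fine-tuning restores consistency with the observation ((a + b) / 2 goes to 1) but cannot pin down how the detail is split between `a` and `b`; in this toy the step size must satisfy lr * sum(y_i^2) < 2 for the iteration to converge.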

arXiv:1911.01915 [pdf, other]

Scalable Variational Gaussian Processes for Crowdsourcing: Glitch Detection in LIGO

Authors: Pablo Morales-Álvarez, Pablo Ruiz, Scott Coughlin, Rafael Molina, Aggelos K. Katsaggelos

Abstract: In recent years, crowdsourcing has been transforming the way classification training sets are obtained. Instead of relying on a single expert annotator, crowdsourcing shares the labelling effort among a large number of collaborators. For instance, this is being applied to the data acquired by the Nobel Prize-winning Laser Interferometer Gravitational-Wave Observatory (LIGO), in order to detect glitches that might hinder the identification of true gravitational waves. The crowdsourcing scenario poses new challenges, as it deals with differing opinions from a heterogeneous group of annotators with unknown degrees of expertise. Probabilistic methods, such as Gaussian Processes (GPs), have proven successful in modeling this setting. However, GPs do not scale well to large data sets, which hampers their broad adoption in practice (in particular at LIGO). This has led to the recent introduction of deep learning based crowdsourcing methods, which have become the state of the art. However, these methods partially sacrifice the accurate uncertainty quantification of GPs. This is an important aspect for astrophysicists at LIGO, since a glitch detection system should provide very accurate probability distributions for its predictions. In this work, we leverage the most popular sparse GP approximation to develop a novel GP-based crowdsourcing method that factorizes into mini-batches. This allows it to cope with previously prohibitive data sets.
The approach, which we refer to as Scalable Variational Gaussian Processes for Crowdsourcing (SVGPCR), brings GP-based methods back to the state of the art and excels at uncertainty quantification. SVGPCR is shown to outperform deep learning based methods and previous probabilistic approaches when applied to the LIGO data. Moreover, its behavior and main properties are carefully analyzed in a controlled experiment based on the MNIST data set.

Submitted 5 November, 2019; originally announced November 2019.

Comments: 16 pages, under review
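The mini-batch factorization mentioned in this abstract rests on a general property of sparse variational GP bounds: the data term is a sum over instances, so a random mini-batch B, rescaled by N/|B|, gives an unbiased estimate of it. The sketch below illustrates only that property, under stated assumptions: each annotator `a` is modelled with a hypothetical confusion matrix `confusion[a][k][c]`, the unknown true class is marginalized out, `F[n]` holds point values of the latent functions rather than an expectation under q(f_n), and the KL term of the bound is omitted. It is not the SVGPCR implementation.

```python
import math


def softmax(f):
    """Class probabilities from latent function values (numerically stable)."""
    m = max(f)
    e = [math.exp(x - m) for x in f]
    total = sum(e)
    return [x / total for x in e]


def crowd_log_lik(f_n, labels_n, confusion):
    """log p(labels of instance n | f_n), marginalizing the unknown true class.

    labels_n maps annotator id -> reported class; confusion[a][k][c] is the
    (assumed) probability that annotator a reports c when the true class is k.
    """
    p_true = softmax(f_n)
    lik = 0.0
    for k, p_k in enumerate(p_true):
        term = p_k
        for a, reported in labels_n.items():
            term *= confusion[a][k][reported]
        lik += term
    return math.log(lik)


def minibatch_data_term(F, labels, confusion, batch):
    """Unbiased mini-batch estimate of sum_n log p(labels_n | f_n).

    Rescaling by N / |B| makes the expectation over uniformly drawn batches
    equal the full-data sum, so the bound can be optimized batch by batch.
    """
    N = len(F)
    partial = sum(crowd_log_lik(F[n], labels[n], confusion) for n in batch)
    return N / len(batch) * partial


# Toy data: 2 classes, 2 annotators with different reliability.
confusion = {
    "ann0": [[0.9, 0.1], [0.2, 0.8]],
    "ann1": [[0.6, 0.4], [0.4, 0.6]],
}
F = [[2.0, -1.0], [0.0, 0.5], [-1.0, 1.5]]  # latent values per instance
labels = [{"ann0": 0, "ann1": 0}, {"ann0": 1}, {"ann0": 1, "ann1": 1}]
estimate = minibatch_data_term(F, labels, confusion, batch=[0, 2])
```

Since the cost of each estimate depends only on |B| and the number of inducing points (not shown here), rather than on N, this kind of factorization is what makes previously prohibitive data set sizes tractable.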

Showing 1–50 of 68 results for author: Katsaggelos, A K