MR2US-Pro: Prostate MR to Ultrasound Image Translation and Registration Based on Diffusion Models

Authors: Xudong Ma, Nantheera Anantrasirichai, Stefanos Bolomytis, Alin Achim

Abstract: The diagnosis of prostate cancer increasingly depends on multimodal imaging, particularly magnetic resonance imaging (MRI) and transrectal ultrasound (TRUS). However, accurate registration between these modalities remains a fundamental challenge due to the differences in dimensionality and anatomical representations. In this work, we present a novel framework that addresses these challenges through a two-stage process: TRUS 3D reconstruction followed by cross-modal registration. Unlike existing TRUS 3D reconstruction methods that rely heavily on external probe tracking information, we propose a totally probe-location-independent approach that leverages the natural correlation between sagittal and transverse TRUS views. With the help of our clustering-based feature matching method, we enable the spatial localization of 2D frames without any additional probe tracking information. For the registration stage, we introduce an unsupervised diffusion-based framework guided by modality translation. Unlike existing methods that translate one modality into another, we map both MR and US into a pseudo intermediate modality. This design enables us to customize it to retain only registration-critical features, greatly easing registration. To further enhance anatomical alignment, we incorporate an anatomy-aware registration strategy that prioritizes internal structural coherence while adaptively reducing the influence of boundary inconsistencies. Extensive validation demonstrates that our approach outperforms state-of-the-art methods by achieving superior registration accuracy with physically realistic deformations in a completely unsupervised fashion.

Submitted 31 May, 2025; originally announced June 2025.

arXiv:2504.12169

Towards a General-Purpose Zero-Shot Synthetic Low-Light Image and Video Pipeline

Authors: Joanne Lin, Crispian Morris, Ruirui Lin, Fan Zhang, David Bull, Nantheera Anantrasirichai

Abstract: Low-light conditions pose significant challenges for both human and machine annotation. This in turn has led to a lack of research into machine understanding for low-light images and (in particular) videos. A common approach is to apply annotations obtained from high-quality datasets to synthetically created low-light versions. In addition, these approaches are often limited through the use of unrealistic noise models. In this paper, we propose a new Degradation Estimation Network (DEN), which synthetically generates realistic standard RGB (sRGB) noise without the requirement for camera metadata. This is achieved by estimating the parameters of physics-informed noise distributions, trained in a self-supervised manner. This zero-shot approach allows our method to generate synthetic noisy content with a diverse range of realistic noise characteristics, unlike other methods which focus on recreating the noise characteristics of the training data. We evaluate our proposed synthetic pipeline using various methods trained on its synthetic data for typical low-light tasks including synthetic noise replication, video enhancement, and object detection, showing improvements of up to 24% KLD, 21% LPIPS, and 62% AP$_{50-95}$, respectively.

Submitted 16 April, 2025; originally announced April 2025.

arXiv:2411.11199

doi 10.1109/ISCAS56072.2025.11044241

BVI-CR: A Multi-View Human Dataset for Volumetric Video Compression

Authors: Ge Gao, Adrian Azzarelli, Ho Man Kwan, Nantheera Anantrasirichai, Fan Zhang, Oliver Moolan-Feroze, David Bull

Abstract: The advances in immersive technologies and 3D reconstruction have enabled the creation of digital replicas of real-world objects and environments with fine details. These processes generate vast amounts of 3D data, requiring more efficient compression methods to satisfy the memory and bandwidth constraints associated with data storage and transmission. However, the development and validation of efficient 3D data compression methods are constrained by the lack of comprehensive and high-quality volumetric video datasets, which typically require much more effort to acquire and consume increased resources compared to 2D image and video databases. To bridge this gap, we present an open multi-view volumetric human dataset, denoted BVI-CR, which contains 18 multi-view RGB-D captures and their corresponding textured polygonal meshes, depicting a range of diverse human actions. Each video sequence contains 10 views in 1080p resolution with durations between 10-15 seconds at 30 FPS. Using BVI-CR, we benchmarked three conventional and neural coordinate-based multi-view video compression methods, following the MPEG MIV Common Test Conditions, and reported their rate quality performance based on various quality metrics. The results show the great potential of neural representation based methods in volumetric video compression compared to conventional video coding methods (with an up to 38% average coding gain in PSNR). This dataset provides a development and validation platform for a variety of tasks including volumetric reconstruction, compression, and quality assessment. The database will be shared publicly at https://github.com/fan-aaron-zhang/bvi-cr.

Submitted 17 November, 2024; originally announced November 2024.

arXiv:2409.08790

A Multimodal Approach for Fluid Overload Prediction: Integrating Lung Ultrasound and Clinical Data

Authors: Tianqi Yang, Nantheera Anantrasirichai, Oktay Karakuş, Marco Allinovi, Alin Achim

Abstract: Managing fluid balance in dialysis patients is crucial, as improper management can lead to severe complications. In this paper, we propose a multimodal approach that integrates visual features from lung ultrasound images with clinical data to enhance the prediction of excess body fluid. Our framework employs independent encoders to extract features for each modality and combines them through a cross-domain attention mechanism to capture complementary information. By framing the prediction as a classification task, the model achieves significantly better performance than regression. The results demonstrate that multimodal models consistently outperform single-modality models, particularly when attention mechanisms prioritize tabular data. Pseudo-sample generation further contributes to mitigating the imbalanced classification problem, achieving the highest accuracy of 88.31%. This study underscores the effectiveness of multimodal learning for fluid overload management in dialysis patients, offering valuable insights for improved clinical outcomes.

Submitted 3 October, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

Comments: In the experiment, for the classification tasks, the network was informed with ground truth during training, significantly improving the performance. This makes the results invalid. Therefore, corrections and further validation are needed to evaluate the performance of the method.

arXiv:2408.04091

doi 10.1017/S2633903X24000163

The Quest for Early Detection of Retinal Disease: 3D CycleGAN-based Translation of Optical Coherence Tomography into Confocal Microscopy

Authors: Xin Tian, Nantheera Anantrasirichai, Lindsay Nicholson, Alin Achim

Abstract: Optical coherence tomography (OCT) and confocal microscopy are pivotal in retinal imaging, offering distinct advantages and limitations. In vivo OCT offers rapid, non-invasive imaging but can suffer from clarity issues and motion artifacts, while ex vivo confocal microscopy, providing high-resolution, cellular-detailed color images, is invasive and raises ethical concerns. To bridge the benefits of both modalities, we propose a novel framework based on unsupervised 3D CycleGAN for translating unpaired in vivo OCT to ex vivo confocal microscopy images. This marks the first attempt to exploit the inherent 3D information of OCT and translate it into the rich, detailed color domain of confocal microscopy. We also introduce a unique dataset, OCT2Confocal, comprising mouse OCT and confocal retinal images, facilitating the development of and establishing a benchmark for cross-modal image translation research. Our model has been evaluated both quantitatively and qualitatively, achieving Fréchet Inception Distance (FID) scores of 0.766 and Kernel Inception Distance (KID) scores as low as 0.153, and leading subjective Mean Opinion Scores (MOS). Our model demonstrated superior image fidelity and quality with limited data over existing methods. Our approach effectively synthesizes color information from 3D confocal images, closely approximating target outcomes and suggesting enhanced potential for diagnostic and monitoring applications in ophthalmology.

Submitted 7 August, 2024; originally announced August 2024.

Comments: 30 pages, 11 figures, 5 tables

Journal ref: Biol. Imaging 4 (2024) e15

arXiv:2407.14188

TaGAT: Topology-Aware Graph Attention Network For Multi-modal Retinal Image Fusion

Authors: Xin Tian, Nantheera Anantrasirichai, Lindsay Nicholson, Alin Achim

Abstract: In the realm of medical image fusion, integrating information from various modalities is crucial for improving diagnostics and treatment planning, especially in retinal health, where the important features appear differently in different imaging modalities. Existing deep learning-based approaches insufficiently focus on retinal image fusion, and thus fail to preserve enough anatomical structure and fine vessel details in retinal image fusion. To address this, we propose the Topology-Aware Graph Attention Network (TaGAT) for multi-modal retinal image fusion, leveraging a novel Topology-Aware Encoder (TAE) with Graph Attention Networks (GAT) to effectively enhance spatial features with retinal vasculature's graph topology across modalities. The TAE encodes the base and detail features, extracted via a Long-short Range (LSR) encoder from retinal images, into the graph extracted from the retinal vessel. Within the TAE, the GAT-based Graph Information Update (GIU) block dynamically refines and aggregates the node features to generate topology-aware graph features. The updated graph features with base and detail features are combined and decoded as a fused image. Our model outperforms state-of-the-art methods in Fluorescein Fundus Angiography (FFA) with Color Fundus (CF) and Optical Coherence Tomography (OCT) with confocal microscopy retinal image fusion. The source code can be accessed via https://github.com/xintian-99/TaGAT.

Submitted 19 July, 2024; originally announced July 2024.

Comments: 11 pages, 2 figures, accepted by MICCAI 2024

arXiv:2407.10667

DUCPS: Deep Unfolding the Cauchy Proximal Splitting Algorithm for B-Lines Quantification in Lung Ultrasound Images

Authors: Tianqi Yang, Oktay Karakuş, Nantheera Anantrasirichai, Marco Allinovi, Alin Achim

Abstract: The identification of artefacts, particularly B-lines, in lung ultrasound (LUS), is crucial for assisting clinical diagnosis, prompting the development of innovative methodologies. While the Cauchy proximal splitting (CPS) algorithm has demonstrated effective performance in B-line detection, the process is slow and has limited generalization. This paper addresses these issues with a novel unsupervised deep unfolding network structure (DUCPS). The framework utilizes deep unfolding procedures to merge traditional model-based techniques with deep learning approaches. By unfolding the CPS algorithm into a deep network, DUCPS enables the parameters in the optimization algorithm to be learnable, thus enhancing generalization performance and facilitating rapid convergence. We conducted entirely unsupervised training using the Neighbor2Neighbor (N2N) and the Structural Similarity Index Measure (SSIM) losses. When combined with an improved line identification method proposed in this paper, state-of-the-art performance is achieved, with the recall and F2 score reaching 0.70 and 0.64, respectively. Notably, DUCPS significantly improves computational efficiency eliminating the need for extensive data labeling, representing a notable advancement over both traditional algorithms and existing deep learning approaches.

Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

Comments: 16 pages, 6 figures, IEEE TMI

arXiv:2405.18487

Anomaly detection for the identification of volcanic unrest in satellite imagery

Authors: Robert Gabriel Popescu, Nantheera Anantrasirichai, Juliet Biggs

Abstract: Satellite images have the potential to detect volcanic deformation prior to eruptions, but while a vast number of images are routinely acquired, only a small percentage contain volcanic deformation events. Manual inspection could miss these anomalies, and an automatic system modelled with supervised learning requires suitably labelled datasets. To tackle these issues, this paper explores the use of unsupervised deep learning on satellite data for the purpose of identifying volcanic deformation as anomalies. Our detector is based on Patch Distribution Modeling (PaDiM), and the detection performance is enhanced with a weighted distance, assigning greater importance to features from deeper layers. Additionally, we propose a preprocessing approach to handle noisy and incomplete data points. The final framework was tested with five volcanoes, which have different deformation characteristics, and its performance was compared against the supervised learning method for volcanic deformation detection.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2403.02408

A Spatio-temporal Aligned SUNet Model for Low-light Video Enhancement

Authors: Ruirui Lin, Nantheera Anantrasirichai, Alexandra Malyugina, David Bull

Abstract: Distortions caused by low-light conditions are not only visually unpleasant but also degrade the performance of computer vision tasks. The restoration and enhancement have proven to be highly beneficial. However, there are only a limited number of enhancement methods explicitly designed for videos acquired in low-light conditions. We propose a Spatio-Temporal Aligned SUNet (STA-SUNet) model using a Swin Transformer as a backbone to capture low-light video features and exploit their spatio-temporal correlations. The STA-SUNet model is trained on a novel, fully registered dataset (BVI), which comprises dynamic scenes captured under varying light conditions. It is further analysed comparatively against various other models over three test datasets. The model demonstrates superior adaptivity across all datasets, obtaining the highest PSNR and SSIM values. It is particularly effective in extreme low-light conditions, yielding fairly good visualisation results.

Submitted 12 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.19041

Atmospheric Turbulence Removal with Video Sequence Deep Visual Priors

Authors: P. Hill, N. Anantrasirichai, A. Achim, D. R. Bull

Abstract: Atmospheric turbulence poses a challenge for the interpretation and visual perception of visual imagery due to its distortion effects. Model-based approaches have been used to address this, but such methods often suffer from artefacts associated with moving content. Conversely, deep learning based methods are dependent on large and diverse datasets that may not effectively represent any specific content. In this paper, we address these problems with a self-supervised learning method that does not require ground truth. The proposed method is not dependent on any dataset outside of the single data sequence being processed but is also able to improve the quality of any input raw sequences or pre-processed sequences. Specifically, our method is based on an accelerated Deep Image Prior (DIP), but integrates temporal information using pixel shuffling and a temporal sliding window. This efficiently learns spatio-temporal priors leading to a system that effectively mitigates atmospheric turbulence distortions. The experiments show that our method improves visual quality results qualitatively and quantitatively.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2311.10902

doi 10.1109/ISBI56570.2024.10635123

OCT2Confocal: 3D CycleGAN based Translation of Retinal OCT Images to Confocal Microscopy

Authors: Xin Tian, Nantheera Anantrasirichai, Lindsay Nicholson, Alin Achim

Abstract: Optical coherence tomography (OCT) and confocal microscopy are pivotal in retinal imaging, each presenting unique benefits and limitations. In-vivo OCT offers rapid, non-invasive imaging but can be hampered by clarity issues and motion artifacts. Ex-vivo confocal microscopy provides high-resolution, cellular detailed color images but is invasive and poses ethical concerns and potential tissue damage. To bridge these modalities, we developed a 3D CycleGAN framework for unsupervised translation of in-vivo OCT to ex-vivo confocal microscopy images. Applied to our OCT2Confocal dataset, this framework effectively translates between 3D medical data domains, capturing vascular, textural, and cellular details with precision. This marks the first attempt to exploit the inherent 3D information of OCT and translate it into the rich, detailed color domain of confocal microscopy. Assessed through quantitative and qualitative evaluations, the 3D CycleGAN framework demonstrates commendable image fidelity and quality, outperforming existing methods despite the constraints of limited data. This non-invasive generation of retinal confocal images has the potential to further enhance diagnostic and monitoring capabilities in ophthalmology. Our source code and OCT2Confocal dataset are available at https://github.com/xintian-99/OCT2Confocal.

Submitted 16 February, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: 4 pages, 5 figures

arXiv:2311.06672

DUBLINE: A Deep Unfolding Network for B-line Detection in Lung Ultrasound Images

Authors: Tianqi Yang, Nantheera Anantrasirichai, Oktay Karakuş, Marco Allinovi, Hatice Ceylan Koydemir, Alin Achim

Abstract: In the context of lung ultrasound, the detection of B-lines, which are indicative of interstitial lung disease and pulmonary edema, plays a pivotal role in clinical diagnosis. Current methods still rely on visual inspection by experts. Vision-based automatic B-line detection methods have been developed, but their performance has yet to improve in terms of both accuracy and computational speed. This paper presents a novel approach to posing B-line detection as an inverse problem via deep unfolding of the Alternating Direction Method of Multipliers (ADMM). It tackles the challenges of data labelling and model training in lung ultrasound image analysis by harnessing the capabilities of deep neural networks and model-based methods. Our objective is to substantially enhance diagnostic accuracy while ensuring efficient real-time capabilities. The results show that the proposed method runs more than 90 times faster than the traditional model-based method and achieves an F1 score that is 10.6% higher.

Submitted 11 November, 2023; originally announced November 2023.

Comments: 4 pages, 3 figures, conference

arXiv:2309.08975

Wavelet-based Topological Loss for Low-Light Image Denoising

Authors: Alexandra Malyugina, Nantheera Anantrasirichai, David Bull

Abstract: Despite extensive research conducted in the field of image denoising, many algorithms still heavily depend on supervised learning and their effectiveness primarily relies on the quality and diversity of training data. It is widely assumed that digital image distortions are caused by spatially invariant Additive White Gaussian Noise (AWGN). However, the analysis of real-world data suggests that this assumption is invalid. Therefore, this paper tackles image corruption by real noise, providing a framework to capture and utilise the underlying structural information of an image along with the spatial information conventionally used for deep learning tasks. We propose a novel denoising loss function that incorporates topological invariants and is informed by textural information extracted from the image wavelet domain. The effectiveness of this proposed method was evaluated by training state-of-the-art denoising models on the BVI-Lowlight dataset, which features a wide range of real noise distortions. Adding a topological term to common loss functions leads to a significant increase in the LPIPS (Learned Perceptual Image Patch Similarity) metric, with the improvement reaching up to 25%. The results indicate that the proposed loss function enables neural networks to learn noise characteristics better. We demonstrate that they can consequently extract the topological features of noise-free images, resulting in enhanced contrast and preserved textural information.

Submitted 20 September, 2023; v1 submitted 16 September, 2023; originally announced September 2023.

arXiv:2302.08455

ST-MFNet Mini: Knowledge Distillation-Driven Frame Interpolation

Authors: Crispian Morris, Duolikun Danier, Fan Zhang, Nantheera Anantrasirichai, David R. Bull

Abstract: Currently, one of the major challenges in deep learning-based video frame interpolation (VFI) is the large model sizes and high computational complexity associated with many high performance VFI approaches. In this paper, we present a distillation-based two-stage workflow for obtaining compressed VFI models which perform competitively with the state of the art, at a greatly reduced model size and complexity. Specifically, an optimisation-based network pruning method is first applied to a recently proposed frame interpolation model, ST-MFNet, which outperforms many other VFI methods but suffers from large model size. The resulting new network architecture achieves a 91% reduction in parameters and 35% increase in speed. Secondly, the performance of the new network is further enhanced through a teacher-student knowledge distillation training process using a Laplacian distillation loss. The final low complexity model, ST-MFNet Mini, achieves a comparable performance to most existing high-complexity VFI methods, only outperformed by the original ST-MFNet. Our source code is available at https://github.com/crispianm/ST-MFNet-Mini

Submitted 23 February, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

arXiv:2211.14050

A Semi-supervised Learning Approach for B-line Detection in Lung Ultrasound Images

Authors: Tianqi Yang, Nantheera Anantrasirichai, Oktay Karakuş, Marco Allinovi, Alin Achim

Abstract: Studies have proved that the number of B-lines in lung ultrasound images has a strong statistical link to the amount of extravascular lung water, which is significant for hemodialysis treatment. Manual inspection of B-lines requires experts and is time-consuming, whilst modelling automation methods is currently problematic because of a lack of ground truth. Therefore, in this paper, we propose a novel semi-supervised learning method for the B-line detection task based on contrastive learning. Through multi-level unsupervised learning on unlabelled lung ultrasound images, the features of the artefacts are learnt. In the downstream task, we introduce a fine-tuning process on a small number of labelled images using the EIoU-based loss function. Apart from reducing the data labelling workload, the proposed method shows superior performance to a model-based algorithm, with a recall of 91.43%, an accuracy of 84.21% and an F1 score of 91.43%.

Submitted 23 March, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

Comments: 5 pages, 3 figures, conference

arXiv:2208.04573

A Topological Loss Function: Image Denoising on a Low-Light Dataset

Authors: Alexandra Malyugina, Nantheera Anantrasirichai, David Bull

Abstract: Although image denoising algorithms have attracted significant research attention, surprisingly few have been proposed for, or evaluated on, noise from imagery acquired under real low-light conditions. Moreover, noise characteristics are often assumed to be spatially invariant, leading to edges and textures being distorted after denoising. Here, we introduce a novel topological loss function which is based on persistent homology. The method performs in the space of image patches, where topological invariants are calculated and represented in persistent diagrams. The loss function is a combination of $\ell_1$ or $\ell_2$ losses with the new persistence-based topological loss. We compare its performance across popular denoising architectures and loss functions, training the networks on our new comprehensive dataset of natural images captured in low-light conditions -- BVI-LOWLIGHT. Analysis reveals that this approach outperforms existing methods, adapting well to complex structures and suppressing common artifacts.

Submitted 26 June, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

arXiv:2204.06989

Atmospheric Turbulence Removal with Complex-Valued Convolutional Neural Network

Authors: Nantheera Anantrasirichai

Abstract: Atmospheric turbulence distorts visual imagery and is always problematic for information interpretation by both human and machine. Most well-developed approaches to remove atmospheric turbulence distortion are model-based. However, these methods require high computation and large memory making real-time operation infeasible. Deep learning-based approaches have hence gained more attention but currently work efficiently only on static scenes. This paper presents a novel learning-based framework offering short temporal spanning to support dynamic scenes. We exploit complex-valued convolutions as phase information, altered by atmospheric turbulence, is captured better than using ordinary real-valued convolutions. Two concatenated modules are proposed. The first module aims to remove geometric distortions and, if enough memory, the second module is applied to refine micro details of the videos. Experimental results show that our proposed framework efficiently mitigates the atmospheric turbulence distortion and significantly outperforms existing methods.

Submitted 8 May, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

arXiv:2203.02940

Detection of Parasitic Eggs from Microscopy Images and the emergence of a new dataset

Authors: Perla Mayo, Nantheera Anantrasirichai, Thanarat H. Chalidabhongse, Duangdao Palasuwan, Alin Achim

Abstract: Automatic detection of parasitic eggs in microscopy images has the potential to increase the efficiency of human experts whilst also providing an objective assessment. The time saved by such a process would both help ensure a prompt treatment to patients, and off-load excessive work from experts' shoulders. Advances in deep learning inspired us to exploit successful architectures for detection, adapting them to tackle a different domain. We propose a framework that exploits two such state-of-the-art models. Specifically, we demonstrate results produced by both a Generative Adversarial Network (GAN) and Faster-RCNN, for image enhancement and object detection respectively, on microscopy images of varying quality. The use of these techniques yields encouraging results, though further improvements are still needed for certain egg types whose detection still proves challenging. As a result, a new dataset has been created and made publicly available, providing an even wider range of classes and variability.

Submitted 6 March, 2022; originally announced March 2022.

Comments: 7 pages, 3 figures, 1 table

arXiv:2203.02708

High-resolution Coastline Extraction in SAR Images via MISP-GGD Superpixel Segmentation

Authors: Odysseas Pappas, Nantheera Anantrasirichai, Byron Adams, Alin Achim

Abstract: High accuracy coastline/shoreline extraction from SAR imagery is a crucial step in a number of maritime and coastal monitoring applications. We present a method based on image segmentation using the Generalised Gamma Mixture Model superpixel algorithm (MISP-GGD). MISP-GGD produces superpixels adhering with great accuracy to object edges in the image, such as the coastline. Unsupervised clustering of the generated superpixels according to textural and radiometric features allows for generation of a land/water mask from which a highly accurate coastline can be extracted. We present results of our proposed method on a number of SAR images of varying characteristics.

Submitted 5 March, 2022; originally announced March 2022.

Comments: To appear in proceedings CIE RADAR 2021

arXiv:2203.02407

Sparse InSAR Data 3D Inpainting for Ground Deformation Detection Along the Rail Corridor

Authors: Odysseas Pappas, Juliet Biggs, David Bull, Alin Achim, Nantheera Anantrasirichai

Abstract: Monitoring of ground movement close to the rail corridor, such as that associated with landslips caused by ground subsidence and/or uplift, is of great interest for the detection and prevention of possible railway faults. Interferometric synthetic-aperture radar (InSAR) data can be used to measure ground deformation, but its use poses distinct challenges, as the data is highly sparse and can be particularly noisy. Here we present a scheme for processing and interpolating noisy, sparse InSAR data into a dense spatio-temporal stack, helping suppress noise and opening up the possibility for treatment with deep learning and other image processing methods.

Submitted 4 March, 2022; originally announced March 2022.

Comments: in submission to ICIP 2022

arXiv:2203.00069

doi 10.1109/ICIP46576.2022.9897650

Optimal Transport-based Graph Matching for 3D retinal OCT image registration

Authors: Xin Tian, Nantheera Anantrasirichai, Lindsay Nicholson, Alin Achim

Abstract: Registration of longitudinal optical coherence tomography (OCT) images assists disease monitoring and is essential in image fusion applications. Mouse retinal OCT images are often collected for longitudinal study of eye disease models such as uveitis, but their quality is often poor compared with human imaging. This paper presents a novel but efficient framework involving an optimal transport based graph matching (OT-GM) method for 3D mouse OCT image registration. We first perform registration of fundus-like images obtained by projecting all b-scans of a volume on a plane orthogonal to them, hereafter referred to as the x-y plane. We introduce Adaptive Weighted Vessel Graph Descriptors (AWVGD) and 3D Cube Descriptors (CD) to identify the correspondence between nodes of graphs extracted from segmented vessels within the OCT projection images. The AWVGD comprises scaling, translation and rotation, which are computationally efficient, whereas CD exploits 3D spatial and frequency domain information. The OT-GM method subsequently performs the correct alignment in the x-y plane. Finally, registration along the direction orthogonal to the x-y plane (the z-direction) is guided by the segmentation of two important anatomical features peculiar to mouse b-scans, the Internal Limiting Membrane (ILM) and the hyaloid remnant (HR). Both subjective and objective evaluation results demonstrate that our framework outperforms other well-established methods on mouse OCT images within a reasonable execution time.

Submitted 28 February, 2022; originally announced March 2022.

arXiv:2202.04030

doi 10.1109/LGRS.2021.3104506

Self-supervised Contrastive Learning for Volcanic Unrest Detection

Authors: Nikolaos Ioannis Bountos, Ioannis Papoutsis, Dimitrios Michail, Nantheera Anantrasirichai

Abstract: Ground deformation measured from Interferometric Synthetic Aperture Radar (InSAR) data is considered a sign of volcanic unrest, statistically linked to a volcanic eruption. Recent studies have shown the potential of using Sentinel-1 InSAR data and supervised deep learning (DL) methods for the detection of volcanic deformation signals, towards global volcanic hazard mitigation. However, detection accuracy is compromised from the lack of labelled data and class imbalance. To overcome this, synthetic data are typically used for finetuning DL models pre-trained on the ImageNet dataset. This approach suffers from poor generalisation on real InSAR data. This letter proposes the use of self-supervised contrastive learning to learn quality visual representations hidden in unlabeled InSAR data. Our approach, based on the SimCLR framework, provides a solution that does not require a specialized architecture nor a large labelled or synthetic dataset. We show that our self-supervised pipeline achieves higher accuracy with respect to the state-of-the-art methods, and shows excellent generalisation even for out-of-distribution test data. Finally, we showcase the effectiveness of our approach for detecting the unrest episodes preceding the recent Icelandic Fagradalsfjall volcanic eruption.

Submitted 8 February, 2022; originally announced February 2022.

Comments: 5 pages, 3 figures

ACM Class: I.2.10; I.4.10

Journal ref: IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1-5, 2022

arXiv:2103.11366

Current Advances in Computational Lung Ultrasound Imaging: A Review

Authors: Tianqi Yang, Oktay Karakuş, Nantheera Anantrasirichai, Alin Achim

Abstract: In the field of biomedical imaging, ultrasonography has become increasingly widespread, and an important auxiliary diagnostic tool with unique advantages, such as being non-ionising and often portable. This article reviews the state-of-the-art in medical ultrasound image computing and in particular its application in the examination of the lungs. First, we review the current developments in medical ultrasound technology. We then focus on the characteristics of lung ultrasonography and on its ability to diagnose a variety of diseases through the identification of various artefacts. We review medical ultrasound image processing methods by splitting them into two categories: (1) traditional model-based methods, and (2) data driven methods. For the former, we consider inverse problem based methods by focusing in particular on ultrasound image despeckling, deconvolution, and line artefacts detection. Among the data-driven approaches, we discuss various works based on deep/machine learning, which include various effective network architectures implementing supervised, weakly supervised and unsupervised learning.

Submitted 7 November, 2022; v1 submitted 21 March, 2021; originally announced March 2021.

Comments: 14 pages, 12 figures

arXiv:2101.01597

doi 10.1109/ICIP42928.2021.9506694

Contextual colorization and denoising for low-light ultra high resolution sequences

Authors: N. Anantrasirichai, David Bull

Abstract: Low-light image sequences generally suffer from spatio-temporal incoherent noise, flicker and blurring of moving objects. These artefacts significantly reduce visual quality and, in most cases, post-processing is needed in order to generate acceptable quality. Most state-of-the-art enhancement methods based on machine learning require ground truth data but this is not usually available for naturally captured low light sequences. We tackle these problems with an unpaired-learning method that offers simultaneous colorization and denoising. Our approach is an adaptation of the CycleGAN structure. To overcome the excessive memory limitations associated with ultra high resolution content, we propose a multiscale patch-based framework, capturing both local and contextual features. Additionally, an adaptive temporal smoothing technique is employed to remove flickering artefacts. Experimental results show that our method outperforms existing approaches in terms of subjective quality and that it is robust to variations in brightness levels and noise.

Submitted 5 January, 2021; originally announced January 2021.

Comments: 5 pages

Journal ref: 2021 IEEE International Conference on Image Processing (ICIP)

arXiv:2012.01321

Red Blood Cell Segmentation with Overlapping Cell Separation and Classification on Imbalanced Dataset

Authors: Korranat Naruenatthanaset, Thanarat H. Chalidabhongse, Duangdao Palasuwan, Nantheera Anantrasirichai, Attakorn Palasuwan

Abstract: Automated red blood cell (RBC) classification on blood smear images helps hematologists to analyze RBC lab results in a reduced time and cost. However, overlapping cells can cause incorrect predicted results, and so they have to be separated into multiple single RBCs before classifying. To classify multiple classes with deep learning, imbalance problems are common in medical imaging because normal samples are always higher than rare disease samples. This paper presents a new method to segment and classify RBCs from blood smear images, specifically to tackle cell overlapping and data imbalance problems. Focusing on overlapping cell separation, our segmentation process first estimates ellipses to represent RBCs. The method detects the concave points and then finds the ellipses using directed ellipse fitting. The accuracy from 20 blood smear images was 0.889. Classification requires balanced training datasets. However, some RBC types are rare. The imbalance ratio of this dataset was 34.538 for 12 RBC classes from 20,875 individual RBC samples. The use of machine learning for RBC classification with an imbalanced dataset is hence more challenging than many other applications. We analyzed techniques to deal with this problem. The best accuracy and F1-score were 0.921 and 0.8679, respectively, using EfficientNet-B1 with augmentation. Experimental results showed that the weight balancing technique with augmentation had the potential to deal with imbalance problems by improving the F1-score on minority classes, while data augmentation significantly improved the overall classification performance.

Submitted 6 March, 2023; v1 submitted 2 December, 2020; originally announced December 2020.

Comments: This work has been submitted to Intelligent Systems with Applications (ISWA) for possible publication

arXiv:2005.03315

doi 10.1109/ICMEW46912.2020.9106011

Encoding in the Dark Grand Challenge: An Overview

Authors: Nantheera Anantrasirichai, Fan Zhang, Alexandra Malyugina, Paul Hill, Angeliki Katsenou

Abstract: A big part of the video content we consume from video providers consists of genres featuring low-light aesthetics. Low light sequences have special characteristics, such as spatio-temporal varying acquisition noise and light flickering, that make the encoding process challenging. To deal with the spatio-temporal incoherent noise, higher bitrates are used to achieve high objective quality. Additionally, the quality assessment metrics and methods have not been designed, trained or tested for this type of content. This has inspired us to trigger research in that area and propose a Grand Challenge on encoding low-light video sequences. In this paper, we present an overview of the proposed challenge, and test state-of-the-art methods that will be part of the benchmark methods at the stage of the participants' deliverable assessment. From this exploration, our results show that VVC already achieves a high performance compared to simply denoising the video source prior to encoding. Moreover, the quality of the video streams can be further improved by employing a post-processing image enhancement method.

Submitted 7 May, 2020; originally announced May 2020.

arXiv:2005.03080

doi 10.1109/TUFFC.2020.3016092

Detection of Line Artefacts in Lung Ultrasound Images of COVID-19 Patients via Non-Convex Regularization

Authors: Oktay Karakuş, Nantheera Anantrasirichai, Amazigh Aguersif, Stein Silva, Adrian Basarab, Alin Achim

Abstract: In this paper, we present a novel method for line artefacts quantification in lung ultrasound (LUS) images of COVID-19 patients. We formulate this as a non-convex regularisation problem involving a sparsity-enforcing, Cauchy-based penalty function, and the inverse Radon transform. We employ a simple local maxima detection technique in the Radon transform domain, associated with known clinical definitions of line artefacts. Despite being non-convex, the proposed technique is guaranteed to converge through our proposed Cauchy proximal splitting (CPS) method and accurately identifies both horizontal and vertical line artefacts in LUS images. In order to reduce the number of false and missed detections, our method includes a two-stage validation mechanism, which is performed in both Radon and image domains. We evaluate the performance of the proposed method in comparison to the current state-of-the-art B-line identification method and show a considerable performance gain with 87% correctly detected B-lines in LUS images of nine COVID-19 patients. In addition, owing to its fast convergence, our proposed method is readily applicable for processing LUS image sequences.

Submitted 9 September, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

Comments: 16 pages, 9 figures

arXiv:2003.06637

Fast Depth Estimation for View Synthesis

Authors: Nantheera Anantrasirichai, Majid Geravand, David Braendler, David R. Bull

Abstract: Disparity/depth estimation from sequences of stereo images is an important element in 3D vision. Owing to occlusions, imperfect settings and homogeneous luminance, accurate estimation of depth remains a challenging problem. Targeting view synthesis, we propose a novel learning-based framework making use of dilated convolution, densely connected convolutional modules, compact decoder and skip connections. The network is shallow but dense, so it is fast and accurate. Two additional contributions, a non-linear adjustment of the depth resolution and the introduction of a projection loss, lead to reduction of estimation error by up to 20% and 25% respectively. The results show that our network outperforms state-of-the-art methods with an average improvement in accuracy of depth estimation and view synthesis by approximately 45% and 34% respectively. Where our method generates comparable quality of estimated depth, it performs 10 times faster than those methods.

Submitted 14 March, 2020; originally announced March 2020.

Comments: 5 pages

arXiv:1912.11350

Atmospheric turbulence removal using convolutional neural network

Authors: Jing Gao, N. Anantrasirichai, David Bull

Abstract: This paper describes a novel deep learning-based method for mitigating the effects of atmospheric distortion. We have built an end-to-end supervised convolutional neural network (CNN) to reconstruct turbulence-corrupted video sequences. Our framework has been developed on the residual learning concept, where the spatio-temporal distortions are learnt and predicted. Our experiments demonstrate that the proposed method can deblur, remove ripple effect and enhance contrast of the video sequences simultaneously. Our model was trained and tested with both simulated and real distortions. Experimental results of the real distortions show that our method outperforms the existing ones by up to 3.8% in terms of the quality of restored images, and it achieves faster speed than the state-of-the-art methods by up to 23 times with GPU implementation.

Submitted 22 December, 2019; originally announced December 2019.

arXiv:1909.02321

doi 10.1029/2019GL084993

The application of Convolutional Neural Networks to Detect Slow, Sustained Deformation in InSAR Timeseries

Authors: N. Anantrasirichai, J. Biggs, F. Albino, D. Bull

Abstract: Automated systems for detecting deformation in satellite InSAR imagery could be used to develop a global monitoring system for volcanic and urban environments. Here we explore the limits of a CNN for detecting slow, sustained deformations in wrapped interferograms. Using synthetic data, we estimate a detection threshold of 3.9 cm for deformation signals alone, and 6.3 cm when atmospheric artefacts are considered. Over-wrapping reduces this to 1.8 cm and 5.0 cm respectively as more fringes are generated without altering SNR. We test the approach on timeseries of cumulative deformation from Campi Flegrei and Dallol, where over-wrapping improves classification performance by up to 15%. We propose a mean-filtering method for combining results of different wrap parameters to flag deformation. At Campi Flegrei, deformation of 8.5 cm/yr was detected after 60 days and at Dallol, deformation of 3.5 cm/yr was detected after 310 days. This corresponds to cumulative displacements of 3 cm and 4 cm consistent with estimates based on synthetic data.

Submitted 5 September, 2019; originally announced September 2019.

arXiv:1905.07286

A deep learning approach to detecting volcano deformation from satellite imagery using synthetic datasets

Authors: Nantheera Anantrasirichai, Juliet Biggs, Fabien Albino, David Bull

Abstract: Satellites enable widespread, regional or global surveillance of volcanoes and can provide the first indication of volcanic unrest or eruption. Here we consider Interferometric Synthetic Aperture Radar (InSAR), which can be employed to detect surface deformation with a strong statistical link to eruption. The ability of machine learning to automatically identify signals of interest in these large InSAR datasets has already been demonstrated, but data-driven techniques, such as convolutional neural networks (CNN), require balanced training datasets of positive and negative signals to effectively differentiate between real deformation and noise. As only a small proportion of volcanoes are deforming and atmospheric noise is ubiquitous, the use of machine learning for detecting volcanic unrest is more challenging. In this paper, we address this problem using synthetic interferograms to train the AlexNet. The synthetic interferograms are composed of 3 parts: 1) deformation patterns based on a Monte Carlo selection of parameters for analytic forward models, 2) stratified atmospheric effects derived from weather models and 3) turbulent atmospheric effects based on statistical simulations of correlated noise. The AlexNet architecture trained with synthetic data outperforms that trained using real interferograms alone, based on classification accuracy and positive predictive value (PPV). However, the models used to generate the synthetic signals are a simplification of the natural processes, so we retrain the CNN with a combined dataset consisting of synthetic models and selected real examples, achieving a final PPV of 82%. Although applying atmospheric corrections to the entire dataset is computationally expensive, it is relatively simple to apply them to the small subset of positive results. This further improves the detection performance without a significant increase in computational burden.

Submitted 17 May, 2019; originally announced May 2019.

Showing 1–31 of 31 results for author: Anantrasirichai, N