Search | arXiv e-print repository

doi 10.1109/TGRS.2025.3583877

Zero-Shot Hyperspectral Pansharpening Using Hysteresis-Based Tuning for Spectral Quality Control

Authors: Giuseppe Guarino, Matteo Ciotola, Gemine Vivone, Giovanni Poggi, Giuseppe Scarpa

Abstract: Hyperspectral pansharpening has received much attention in recent years due to technological and methodological advances that open the door to new application scenarios. However, research on this topic is only now gaining momentum. The most popular methods are still borrowed from the more mature field of multispectral pansharpening and often overlook the unique challenges posed by hyperspectral data fusion, such as i) the very large number of bands, ii) the overwhelming noise in selected spectral ranges, iii) the significant spectral mismatch between panchromatic and hyperspectral components, iv) a typically high resolution ratio. Imprecise data modeling especially affects spectral fidelity. Even state-of-the-art methods perform well in certain spectral ranges and much worse in others, failing to ensure consistent quality across all bands, with the risk of generating unreliable results. Here, we propose a hyperspectral pansharpening method that explicitly addresses this problem and ensures uniform spectral quality. To this end, a single lightweight neural network is used, with weights that adapt on the fly to each band. During fine-tuning, the spatial loss is turned on and off to ensure a fast convergence of the spectral loss to the desired level, according to a hysteresis-like dynamic. Furthermore, the spatial loss itself is appropriately redefined to account for nonlinear dependencies between panchromatic and spectral bands. Overall, the proposed method is fully unsupervised, with no prior training on external data, flexible, and low-complexity.
Experiments on a recently published benchmarking toolbox show that it ensures excellent sharpening quality, competitive with the state-of-the-art, consistently across all bands. The software code and the full set of results are shared online on https://github.com/giu-guarino/rho-PNN.

Submitted 15 July, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-19, 2025, Art no. 5541219
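
The abstract above says only that the spatial loss is switched on and off with a hysteresis-like dynamic so that the spectral loss reaches a desired level. As a rough illustration of that idea, here is a minimal sketch of a two-threshold switch. The class name `HysteresisSwitch`, the thresholds `low` and `high`, the `weight` parameter, and the additive combination in `total_loss` are all assumptions made for this example; they are not taken from the paper or from the rho-PNN code.

```python
from dataclasses import dataclass


@dataclass
class HysteresisSwitch:
    """Two-threshold on/off switch (hypothetical helper, not from the paper)."""

    low: float   # spectral loss below this value turns the spatial loss ON
    high: float  # spectral loss above this value turns the spatial loss OFF
    spatial_on: bool = False

    def update(self, spectral_loss: float) -> bool:
        # Between the two thresholds the previous state is kept: that is the hysteresis.
        if self.spatial_on and spectral_loss > self.high:
            self.spatial_on = False
        elif not self.spatial_on and spectral_loss < self.low:
            self.spatial_on = True
        return self.spatial_on


def total_loss(spectral_loss: float, spatial_loss: float,
               switch: HysteresisSwitch, weight: float = 1.0) -> float:
    """Spectral loss plus the weighted spatial loss, the latter only while switched on."""
    on = switch.update(spectral_loss)
    return spectral_loss + (weight * spatial_loss if on else 0.0)
```

With `low=0.1` and `high=0.2`, a spectral loss of 0.15 leaves the switch in whatever state it was in before, which avoids rapid toggling around a single threshold.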

arXiv:2407.01355

doi 10.1109/MGRS.2024.3509139

Hyperspectral Pansharpening: Critical Review, Tools and Future Perspectives

Authors: Matteo Ciotola, Giuseppe Guarino, Gemine Vivone, Giovanni Poggi, Jocelyn Chanussot, Antonio Plaza, Giuseppe Scarpa

Abstract: Hyperspectral pansharpening consists of fusing a high-resolution panchromatic band and a low-resolution hyperspectral image to obtain a new image with high resolution in both the spatial and spectral domains. These remote sensing products are valuable for a wide range of applications, driving ever growing research efforts. Nonetheless, results still do not meet application demands. In part, this comes from the technical complexity of the task: compared to multispectral pansharpening, many more bands are involved, in a spectral range only partially covered by the panchromatic component and with overwhelming noise. However, another major limiting factor is the absence of a comprehensive framework for the rapid development and accurate evaluation of new methods. This paper attempts to address this issue. We started by designing a dataset large and diverse enough to allow reliable training (for data-driven methods) and testing of new methods. Then, we selected a set of state-of-the-art methods, following different approaches, characterized by promising performance, and reimplemented them in a single PyTorch framework. Finally, we carried out a critical comparative analysis of all methods, using the most accredited quality indicators. The analysis highlights the main limitations of current solutions in terms of spectral/spatial quality and computational efficiency, and suggests promising research directions.
To ensure full reproducibility of the results and support future research, the framework (including codes, evaluation procedures and links to the dataset) is shared on https://github.com/matciotola/hyperspectral_pansharpening_toolbox, as a single Python-based reference benchmark toolbox.

Submitted 27 December, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

arXiv:2405.02179

doi 10.1145/3658664.3659662

Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models

Authors: Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva

Abstract: Generalization is a main issue for current audio deepfake detectors, which struggle to provide reliable results on out-of-distribution data. Given the speed at which more and more accurate synthesis methods are developed, it is very important to design techniques that also work well on data they were not trained for. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection, with special focus on generalization ability. To this end, the detection problem is reformulated in a speaker verification framework and fake audio samples are exposed by the mismatch between the voice sample under test and the voice of the claimed identity. With this paradigm, no fake speech sample is necessary in training, cutting off any link with the generation method at the root, and ensuring full generalization ability. Features are extracted by general-purpose large pre-trained models, with no need for training or fine-tuning on specific fake detection or speaker verification datasets. At detection time only a limited set of voice fragments of the identity under test is required. Experiments on several datasets widely used in the community show that detectors based on pre-trained models achieve excellent performance and show strong generalization ability, rivaling supervised methods on in-distribution data and largely outperforming them on out-of-distribution data.

Submitted 1 July, 2024; v1 submitted 3 May, 2024; originally announced May 2024.
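
The abstract above frames detection as speaker verification: a sample is flagged when it does not match the voice of the claimed identity. A minimal sketch of that decision rule follows, assuming the embeddings have already been extracted by some pre-trained model. The function names, the use of cosine similarity, the max-over-references aggregation, and the `threshold` value are illustrative assumptions, not details given in the abstract.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def is_fake(test_embedding: np.ndarray,
            reference_embeddings: list,
            threshold: float = 0.5) -> bool:
    """Flag the test sample as fake if it matches none of the reference voice
    fragments of the claimed identity (hypothetical rule and threshold)."""
    scores = [cosine_similarity(test_embedding, ref) for ref in reference_embeddings]
    return max(scores) < threshold
```

No fake sample is needed to set this up: only real reference fragments of the claimed speaker enter the comparison, which is the property the abstract emphasizes.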

arXiv:2309.07973

M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations

Authors: Giada Zingarini, Davide Cozzolino, Riccardo Corvi, Giovanni Poggi, Luisa Verdoliva

Abstract: The ability to detect manipulated visual content is becoming increasingly important in many application fields, given the rapid advances in image synthesis methods. Of particular concern is the possibility of modifying the content of medical images, altering the resulting diagnoses. Despite its relevance, this issue has received limited attention from the research community. One reason is the lack of large and curated datasets to use for development and benchmarking purposes. Here, we investigate this issue and propose M3Dsynth, a large dataset of manipulated Computed Tomography (CT) lung images. We create manipulated images by injecting or removing lung cancer nodules in real CT scans, using three different methods based on Generative Adversarial Networks (GAN) or Diffusion Models (DM), for a total of 8,577 manipulated samples. Experiments show that these images easily fool automated diagnostic tools. We also tested several state-of-the-art forensic detectors and demonstrated that, once trained on the proposed dataset, they are able to accurately detect and localize manipulated synthetic content, even when training and test sets are not aligned, showing good generalization ability. Dataset and code are publicly available at https://grip-unina.github.io/M3Dsynth/.

Submitted 1 February, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

arXiv:2307.14864

doi 10.1109/IGARSS47720.2021.9553199

A full-resolution training framework for Sentinel-2 image fusion

Authors: Matteo Ciotola, Mario Ragosta, Giovanni Poggi, Giuseppe Scarpa

Abstract: This work presents a new unsupervised framework for training deep learning models for super-resolution of Sentinel-2 images by fusion of its 10-m and 20-m bands. The proposed scheme avoids the resolution downgrade process needed to generate training data in the supervised case. On the other hand, a proper loss that accounts for cycle-consistency between the network prediction and the input components to be fused is proposed. Despite its unsupervised nature, in our preliminary experiments the proposed scheme has shown promising results in comparison to the supervised approach. Besides, by construction of the proposed loss, the resulting trained network can be ascribed to the class of multi-resolution analysis methods.

Submitted 27 July, 2023; originally announced July 2023.

Journal ref: 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 2021, pp. 1260-1263

arXiv:2307.14403

Unsupervised Deep Learning-based Pansharpening with Jointly-Enhanced Spectral and Spatial Fidelity

Authors: Matteo Ciotola, Giovanni Poggi, Giuseppe Scarpa

Abstract: In recent years, deep learning has gained a leading role in the pansharpening of multiresolution images. Given the lack of ground truth data, most deep learning-based methods carry out supervised training in a reduced-resolution domain. However, models trained on downsized images tend to perform poorly on high-resolution target images. For this reason, several research groups are now turning to unsupervised training in the full-resolution domain, through the definition of appropriate loss functions and training paradigms. In this context, we have recently proposed a full-resolution training framework which can be applied to many existing architectures. Here, we propose a new deep learning-based pansharpening model that fully exploits the potential of this approach and provides cutting-edge performance. Besides architectural improvements with respect to previous work, such as the use of residual attention modules, the proposed model features a novel loss function that jointly promotes the spectral and spatial quality of the pansharpened data. In addition, thanks to a new fine-tuning strategy, it improves inference-time adaptation to target images. Experiments on a large variety of test images, performed in challenging scenarios, demonstrate that the proposed method compares favorably with the state of the art both in terms of numerical results and visual output. Code is available online at https://github.com/matciotola/Lambda-PNN.

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2209.14098

Deepfake audio detection by speaker verification

Authors: Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva

Abstract: Thanks to recent advances in deep learning, sophisticated generation tools now exist that produce extremely realistic synthetic speech. However, malicious uses of such tools are possible and likely, posing a serious threat to our society. Hence, synthetic voice detection has become a pressing research topic, and a large variety of detection methods have been recently proposed. Unfortunately, they hardly generalize to synthetic audio generated by tools never seen in the training phase, which makes them unfit to face real-world scenarios. In this work, we aim at overcoming this issue by proposing a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations. Since the detector is trained only on real data, generalization is automatically ensured. The proposed approach can be implemented based on off-the-shelf speaker verification tools. We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.

Submitted 28 September, 2022; originally announced September 2022.

arXiv:2111.08334

doi 10.1109/TGRS.2022.3163887

Pansharpening by convolutional neural networks in the full resolution framework

Authors: Matteo Ciotola, Sergio Vitale, Antonio Mazza, Giovanni Poggi, Giuseppe Scarpa

Abstract: In recent years, there has been a growing interest in deep learning-based pansharpening. Thus far, research has mainly focused on architectures. Nonetheless, model training is an equally important issue. A first problem is the absence of ground truths, unavoidable in pansharpening. This is often addressed by training networks in a reduced resolution domain and using the original data as ground truth, relying on an implicit scale invariance assumption. However, on full resolution images results are often disappointing, suggesting that such invariance does not hold. A further problem is the scarcity of training data, which causes a limited generalization ability and a poor performance on off-training test images. In this paper, we propose a full-resolution training framework for deep learning-based pansharpening. The framework is fully general and can be used for any deep learning-based pansharpening model. Training takes place in the high-resolution domain, relying only on the original data, thus avoiding any loss of information. To ensure spectral and spatial fidelity, a suitable two-component loss is defined. The spectral component enforces consistency between the pansharpened output and the low-resolution multispectral input. The spatial component, computed at high-resolution, maximizes the local correlation between each pansharpened band and the panchromatic input. At testing time, the target-adaptive operating modality is adopted, achieving good generalization with a limited computational overhead.
Experiments carried out on WorldView-3, WorldView-2, and GeoEye-1 images show that methods trained with the proposed framework guarantee good performance in terms of both full-resolution numerical indexes and visual quality.

Submitted 4 April, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing
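
The abstract above describes a spatial loss component that maximizes the local correlation between each pansharpened band and the panchromatic input. The sketch below illustrates one way such a term could look in NumPy. It is a simplification under stated assumptions: correlation is computed on non-overlapping tiles rather than sliding windows, and the function names, the `block` size, the `eps` stabilizer, and the `1 - mean correlation` form are all choices made for this example, not the paper's definition.

```python
import numpy as np


def local_correlation(band, pan, block=8, eps=1e-8):
    """Pearson correlation between `band` and `pan` on non-overlapping
    block x block tiles. Returns one coefficient per tile."""
    h, w = band.shape
    h, w = h - h % block, w - w % block  # crop to a multiple of the tile size

    def tiles(x):
        # (H, W) -> (tile rows, tile cols, pixels per tile)
        x = x[:h, :w].reshape(h // block, block, w // block, block)
        return x.transpose(0, 2, 1, 3).reshape(h // block, w // block, -1)

    a = tiles(np.asarray(band, dtype=np.float64))
    b = tiles(np.asarray(pan, dtype=np.float64))
    a = a - a.mean(axis=-1, keepdims=True)
    b = b - b.mean(axis=-1, keepdims=True)
    cov = (a * b).mean(axis=-1)
    return cov / (a.std(axis=-1) * b.std(axis=-1) + eps)


def spatial_loss(pansharpened, pan, block=8):
    """pansharpened: (bands, H, W); pan: (H, W).
    Lower values mean higher average local correlation with the PAN image."""
    corr = [local_correlation(band, pan, block).mean() for band in pansharpened]
    return 1.0 - float(np.mean(corr))
```

Because Pearson correlation is invariant to gain and offset, a band that is a locally linear function of the panchromatic image with positive slope yields a loss close to zero, regardless of its radiometric scale.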

arXiv:2012.05508

doi 10.1109/MGRS.2021.3070956

Deep Learning Methods For Synthetic Aperture Radar Image Despeckling: An Overview Of Trends And Perspectives

Authors: Giulia Fracastoro, Enrico Magli, Giovanni Poggi, Giuseppe Scarpa, Diego Valsesia, Luisa Verdoliva

Abstract: Synthetic aperture radar (SAR) images are affected by a spatially-correlated and signal-dependent noise called speckle, which is very severe and may hinder image exploitation. Despeckling is an important task that aims at removing such noise, so as to improve the accuracy of all downstream image processing tasks. The first despeckling methods date back to the 1970s, and several model-based algorithms have been developed in the subsequent years. The field has received growing attention, sparked by the availability of powerful deep learning models that have yielded excellent performance for inverse problems in image processing. This paper surveys the literature on deep learning methods applied to SAR despeckling, covering both the supervised and the more recent self-supervised approaches. We provide a critical analysis of existing methods with the objective to recognize the most promising research lines, to identify the factors that have limited the success of deep models, and to propose ways forward in an attempt to fully exploit the potential of deep learning for SAR despeckling.

Submitted 2 May, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

arXiv:2011.00337

Deep learning in the ultrasound evaluation of neonatal respiratory status

Authors: Michela Gravina, Diego Gragnaniello, Luisa Verdoliva, Giovanni Poggi, Iuri Corsini, Carlo Dani, Fabio Meneghin, Gianluca Lista, Salvatore Aversa, Francesco Raimondi, Fiorella Migliaro, Carlo Sansone

Abstract: Lung ultrasound imaging is attracting growing interest from the scientific community. On one side, thanks to its harmlessness and high descriptive power, this kind of diagnostic imaging has been largely adopted in sensitive applications, like the diagnosis and follow-up of preterm newborns in neonatal intensive care units. On the other side, state-of-the-art image analysis and pattern recognition approaches have recently proven their ability to fully exploit the rich information contained in these data, making them attractive for the research community. In this work, we present a thorough analysis of recent deep learning networks and training strategies carried out on a vast and challenging multicenter dataset comprising 87 patients with different diseases and gestational ages. These approaches are employed to assess the lung respiratory status from ultrasound images and are evaluated against a reference marker. The conducted analysis sheds some light on this problem by showing the critical points that can mislead the training procedure and proposes some adaptations to the specific data and task. The achieved results appreciably outperform those obtained by a previous work, which is based on textural features, and narrow the gap with the visual score predicted by the human experts.

Submitted 31 October, 2020; originally announced November 2020.

Comments: 7 pages

arXiv:2001.06440

Combining PRNU and noiseprint for robust and efficient device source identification

Authors: Davide Cozzolino, Francesco Marra, Diego Gragnaniello, Giovanni Poggi, Luisa Verdoliva

Abstract: PRNU-based image processing is a key asset in digital multimedia forensics. It allows for reliable device identification and effective detection and localization of image forgeries, in very general conditions. However, performance degrades significantly in challenging conditions involving low quality and quantity of data. These include working on compressed and cropped images, or estimating the camera PRNU pattern based on only a few images. To boost the performance of PRNU-based analyses in such conditions we propose to leverage the image noiseprint, a recently proposed camera-model fingerprint that has proved effective for several forensic tasks. Numerical experiments on datasets widely used for source identification prove that the proposed method ensures a significant performance improvement in a wide range of challenging situations.

Submitted 17 January, 2020; originally announced January 2020.

Showing 1–11 of 11 results for author: Poggi, G