-
Lossy Neural Compression for Geospatial Analytics: A Review
Authors:
Carlos Gomes,
Isabelle Wittmann,
Damien Robert,
Johannes Jakubik,
Tim Reichelt,
Michele Martone,
Stefano Maurogiovanni,
Rikard Vinge,
Jonas Hurst,
Erik Scheurer,
Rocco Sedona,
Thomas Brunschwiler,
Stefan Kesselheim,
Matej Batic,
Philip Stier,
Jan Dirk Wegner,
Gabriele Cavallaro,
Edzer Pebesma,
Michael Marszalek,
Miguel A Belenguer-Plomer,
Kennedy Adriko,
Paolo Fraccaro,
Romeo Kienzler,
Rania Briq,
Sabrina Benassou, et al. (2 additional authors not shown)
Abstract:
Over the past decades, there has been an explosion in the amount of available Earth Observation (EO) data. The unprecedented coverage of the Earth's surface and atmosphere by satellite imagery has resulted in large volumes of data that must be transmitted to ground stations, stored in data centers, and distributed to end users. Modern Earth System Models (ESMs) face similar challenges, operating at high spatial and temporal resolutions, producing petabytes of data per simulated day. Data compression has gained relevance over the past decade, with neural compression (NC) emerging from deep learning and information theory, making EO data and ESM outputs ideal candidates due to their abundance of unlabeled data. In this review, we outline recent developments in NC applied to geospatial data. We introduce the fundamental concepts of NC, including seminal works in its traditional applications to image and video compression, with a focus on lossy compression. We discuss the unique characteristics of EO and ESM data, contrasting them with "natural images", and explain the additional challenges and opportunities they present. Moreover, we review current applications of NC across various EO modalities and explore the limited efforts in ESM compression to date. The advent of self-supervised learning (SSL) and foundation models (FMs) has advanced methods to efficiently distill representations from vast unlabeled data. We connect these developments to NC for EO, highlight the similarities between the two fields, and elaborate on the potential of transferring compressed feature representations for machine-to-machine communication. Based on insights drawn from this review, we devise future directions relevant to applications in EO and ESM.
Submitted 3 March, 2025;
originally announced March 2025.
-
Multispectral to Hyperspectral using Pretrained Foundational model
Authors:
Ruben Gonzalez,
Conrad M Albrecht,
Nassim Ait Ali Braham,
Devyani Lambhate,
Joao Lucas de Sousa Almeida,
Paolo Fraccaro,
Benedikt Blumenstiel,
Thomas Brunschwiler,
Ranjini Bangalore
Abstract:
Hyperspectral imaging provides detailed spectral information, offering significant potential for monitoring greenhouse gases like CH4 and NO2. However, its application is constrained by limited spatial coverage and infrequent revisit times. In contrast, multispectral imaging delivers broader spatial and temporal coverage but lacks the spectral granularity required for precise GHG detection. To address these challenges, this study proposes Spectral and Spatial-Spectral transformer models that reconstruct hyperspectral data from multispectral inputs. The models in this paper are pretrained on the EnMAP and EMIT datasets and fine-tuned on spatio-temporally aligned (Sentinel-2, EnMAP) and (HLS-S30, EMIT) image pairs, respectively. Our models have the potential to enhance atmospheric monitoring by combining the strengths of hyperspectral and multispectral imaging systems.
Submitted 26 February, 2025;
originally announced February 2025.
-
Aboveground carbon biomass estimate with Physics-informed deep network
Authors:
Juan Nathaniel,
Levente J. Klein,
Campbell D. Watson,
Gabrielle Nyirjesy,
Conrad M. Albrecht
Abstract:
The global carbon cycle is a key process for understanding how our climate is changing. However, monitoring its dynamics is difficult because it requires high-resolution, robust measurements of key state parameters, including the aboveground carbon biomass (AGB). Here, we use a deep neural network to generate a wall-to-wall map of AGB within the Continental USA (CONUS) with 30-meter spatial resolution for the year 2021. We combine radar and optical hyperspectral imagery with a physical climate parameter of SIF-based GPP. Validation results show that a masked variation of UNet has the lowest validation RMSE of 37.93 $\pm$ 1.36 Mg C/ha, as compared to 52.30 $\pm$ 0.03 Mg C/ha for a random forest algorithm. Furthermore, models that learn from SIF-based GPP in addition to radar and optical imagery reduce validation RMSE by almost 10% and the standard deviation by 40%. Finally, we apply our model to measure losses in AGB from the recent 2021 Caldor wildfire in California, and validate our analysis with a Sentinel-based burn index.
Submitted 24 October, 2022;
originally announced October 2022.
-
AutoGeoLabel: Automated Label Generation for Geospatial Machine Learning
Authors:
Conrad M Albrecht,
Fernando Marianno,
Levente J Klein
Abstract:
A key challenge of supervised learning is the availability of human-labeled data. We evaluate a big data processing pipeline to auto-generate labels for remote sensing data. It is based on rasterized statistical features extracted from surveys such as LiDAR measurements. Using simple combinations of the rasterized statistical layers, we demonstrate that multiple classes can be generated at accuracies of ~0.9. As a proof of concept, we utilize the big geo-data platform IBM PAIRS to dynamically generate such labels in dense urban areas with multiple land cover classes. The general method proposed here is platform independent, and it can be adapted to generate labels for other satellite modalities in order to enable machine learning on overhead imagery for land use classification and object detection.
Submitted 31 January, 2022;
originally announced February 2022.
-
Quantification of Carbon Sequestration in Urban Forests
Authors:
Levente J. Klein,
Wang Zhou,
Conrad M. Albrecht
Abstract:
Vegetation, trees in particular, sequesters carbon by absorbing carbon dioxide from the atmosphere. However, the lack of efficient methods for quantifying the carbon stored in trees makes it difficult to track the process. We present an approach to estimate the carbon storage in trees based on fusing multi-spectral aerial imagery and LiDAR data to identify tree coverage, geometric shape, and tree species, which are key attributes for carbon storage quantification. We demonstrate that tree species information and three-dimensional geometric shapes can be estimated from aerial imagery in order to determine a tree's biomass. Specifically, we estimate a total of 52,000 tons of carbon sequestered in trees for the New York City borough of Manhattan.
Submitted 20 July, 2021; v1 submitted 31 May, 2021;
originally announced June 2021.