-
TerraMesh: A Planetary Mosaic of Multimodal Earth Observation Data
Authors:
Benedikt Blumenstiel,
Paolo Fraccaro,
Valerio Marsocci,
Johannes Jakubik,
Stefano Maurogiovanni,
Mikolaj Czerkawski,
Rocco Sedona,
Gabriele Cavallaro,
Thomas Brunschwiler,
Juan Bernabe-Moreno,
Nicolas Longépé
Abstract:
Large-scale foundation models in Earth Observation can learn versatile, label-efficient representations by leveraging massive amounts of unlabeled data. However, existing public datasets are often limited in scale, geographic coverage, or sensor variety. We introduce TerraMesh, a new globally diverse, multimodal dataset combining optical, synthetic aperture radar, elevation, and land-cover modalities in an Analysis-Ready Data format. TerraMesh includes over 9 million samples with eight spatiotemporally aligned modalities, enabling large-scale pre-training and fostering robust cross-modal correlation learning. We provide detailed data processing steps, comprehensive statistics, and empirical evidence demonstrating improved model performance when pre-trained on TerraMesh. The dataset will be made publicly available with a permissive license.
Submitted 15 April, 2025;
originally announced April 2025.
-
TerraMind: Large-Scale Generative Multimodality for Earth Observation
Authors:
Johannes Jakubik,
Felix Yang,
Benedikt Blumenstiel,
Erik Scheurer,
Rocco Sedona,
Stefano Maurogiovanni,
Jente Bosmans,
Nikolaos Dionelis,
Valerio Marsocci,
Niklas Kopp,
Rahul Ramachandran,
Paolo Fraccaro,
Thomas Brunschwiler,
Gabriele Cavallaro,
Juan Bernabe-Moreno,
Nicolas Longépé
Abstract:
We present TerraMind, the first any-to-any generative, multimodal foundation model for Earth observation (EO). Unlike other multimodal models, TerraMind is pretrained on dual-scale representations combining both token-level and pixel-level data across modalities. On a token level, TerraMind encodes high-level contextual information to learn cross-modal relationships, while on a pixel level, TerraMind leverages fine-grained representations to capture critical spatial nuances. We pretrained TerraMind on nine geospatial modalities of a global, large-scale dataset. In this paper, we demonstrate that (i) TerraMind's dual-scale early fusion approach unlocks a range of zero-shot and few-shot applications for Earth observation, (ii) TerraMind introduces "Thinking-in-Modalities" (TiM), the capability of generating additional artificial data during finetuning and inference to improve the model output, and (iii) TerraMind achieves beyond state-of-the-art performance in community-standard benchmarks for EO like PANGAEA. The pretraining dataset, the model weights, and our code are open-sourced under a permissive license.
Submitted 11 June, 2025; v1 submitted 15 April, 2025;
originally announced April 2025.
-
Lossy Neural Compression for Geospatial Analytics: A Review
Authors:
Carlos Gomes,
Isabelle Wittmann,
Damien Robert,
Johannes Jakubik,
Tim Reichelt,
Michele Martone,
Stefano Maurogiovanni,
Rikard Vinge,
Jonas Hurst,
Erik Scheurer,
Rocco Sedona,
Thomas Brunschwiler,
Stefan Kesselheim,
Matej Batic,
Philip Stier,
Jan Dirk Wegner,
Gabriele Cavallaro,
Edzer Pebesma,
Michael Marszalek,
Miguel A Belenguer-Plomer,
Kennedy Adriko,
Paolo Fraccaro,
Romeo Kienzler,
Rania Briq,
Sabrina Benassou
, et al. (2 additional authors not shown)
Abstract:
Over the past decades, there has been an explosion in the amount of available Earth Observation (EO) data. The unprecedented coverage of the Earth's surface and atmosphere by satellite imagery has resulted in large volumes of data that must be transmitted to ground stations, stored in data centers, and distributed to end users. Modern Earth System Models (ESMs) face similar challenges, operating at high spatial and temporal resolutions, producing petabytes of data per simulated day. Data compression has gained relevance over the past decade, with neural compression (NC) emerging from deep learning and information theory, making EO data and ESM outputs ideal candidates due to their abundance of unlabeled data. In this review, we outline recent developments in NC applied to geospatial data. We introduce the fundamental concepts of NC, including seminal works in its traditional applications to image and video compression domains, with a focus on lossy compression. We discuss the unique characteristics of EO and ESM data, contrasting them with "natural images", and explain the additional challenges and opportunities they present. Moreover, we review current applications of NC across various EO modalities and explore the limited efforts in ESM compression to date. The advent of self-supervised learning (SSL) and foundation models (FM) has advanced methods to efficiently distill representations from vast unlabeled data. We connect these developments to NC for EO, highlighting the similarities between the two fields and elaborating on the potential of transferring compressed feature representations for machine-to-machine communication. Based on insights drawn from this review, we devise future directions relevant to applications in EO and ESM.
Submitted 3 March, 2025;
originally announced March 2025.
-
SSL4EO-S12 v1.1: A Multimodal, Multiseasonal Dataset for Pretraining, Updated
Authors:
Benedikt Blumenstiel,
Nassim Ait Ali Braham,
Conrad M Albrecht,
Stefano Maurogiovanni,
Paolo Fraccaro
Abstract:
This technical report presents SSL4EO-S12 v1.1, a multimodal, multitemporal Earth Observation dataset designed for pretraining large-scale foundation models. Building on the success of SSL4EO-S12 v1.0, the new version addresses the previous challenges of data misalignment and a limited data structure for low-barrier, analysis-ready EO processing. SSL4EO-S12 v1.1 covers the world's 10,000 largest cities and their surroundings within a 50 km radius across four seasons, resulting in a diverse collection of nearly one million patches. SSL4EO-S12 v1.1 packages the data in Zarr file format for cloud-efficient loading and representation of meta-information such as cloud masks and geolocation. Released under the CC-BY-4.0 license, SSL4EO-S12 v1.1 facilitates open research and provides a robust foundation for future advancements in self-supervised learning and geospatial analysis. The dataset is available online through https://datapub.fz-juelich.de/ssl4eo-s12, and we provide additional resources at https://github.com/DLR-MF-DAS/SSL4EO-S12-v1.1.
Submitted 6 March, 2025; v1 submitted 28 February, 2025;
originally announced March 2025.