MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

Nedungadi, Vishal; Kariryaa, Ankit; Oehmcke, Stefan; Belongie, Serge; Igel, Christian; Lang, Nico

Computer Science > Computer Vision and Pattern Recognition

arXiv:2405.02771 (cs)

[Submitted on 4 May 2024 (v1), last revised 29 Jul 2024 (this version, v2)]

Abstract: The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create MMEarth, a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on the ConvNeXt V2 architecture, a fully convolutional masked autoencoder (MAE). Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images. This is shown on several downstream tasks, including image classification and semantic segmentation. We find that pretraining with multi-modal pretext tasks notably improves linear probing performance compared to pretraining on optical satellite images only. This also leads to better label efficiency and parameter efficiency, which are crucial aspects in global-scale applications.
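To make the idea of several pretext tasks sharing one masked input more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that every pretext target is aligned to the same patch grid, that the loss is restricted to masked patches, and that the per-task losses are combined with fixed weights; the function names, the modality names (`optical`, `elevation`), and the weighting scheme are hypothetical and are not taken from the paper.

```python
import numpy as np

def masked_mse(pred, target, mask):
    """Mean squared error averaged over masked patches only.

    pred, target: arrays of shape (batch, patches, dim)
    mask: array of shape (batch, patches), 1 = patch was hidden from the encoder
    """
    per_patch = ((pred - target) ** 2).mean(axis=-1)  # (batch, patches)
    return float((per_patch * mask).sum() / max(mask.sum(), 1))

def multi_pretext_loss(preds, targets, mask, weights=None):
    """Weighted sum of per-task masked reconstruction losses.

    preds, targets: dicts mapping a (hypothetical) modality name to an array.
    weights: optional dict of fixed task weights; equal weights are an assumption.
    """
    if weights is None:
        weights = {name: 1.0 for name in preds}
    per_task = {name: masked_mse(preds[name], targets[name], mask) for name in preds}
    total = sum(weights[name] * per_task[name] for name in per_task)
    return total, per_task

# Toy usage: one image, four patches, two of them masked.
mask = np.array([[1.0, 1.0, 0.0, 0.0]])
targets = {"optical": np.zeros((1, 4, 2)), "elevation": np.zeros((1, 4, 2))}
preds = {"optical": np.ones((1, 4, 2)), "elevation": 2.0 * np.ones((1, 4, 2))}
total, per_task = multi_pretext_loss(preds, targets, mask)
```

In this toy setup the optical loss is 1.0 and the elevation loss is 4.0, so the equally weighted total is 5.0. How the individual tasks are actually weighted and which targets are pixel-level or image-level is specified in the paper itself, not here.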

Comments:	Accepted for ECCV 2024. Data and code: this https URL. Update arXiv v2 (ECCV): 1. Dataset fix: Removed duplicates and corrected ERA5 yearly statistics. 2. Data augmentation fix: Random crops are now aligned. 3. Test metrics fix: Metrics are now overall instead of mini-batch averages, matching GEO-Bench metrics. 4. Pretrained on MMEarth v001 and evaluated on GEO-Bench v1.0.
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2405.02771 [cs.CV]
	(or arXiv:2405.02771v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.02771
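
The test metrics fix noted in the comments distinguishes an overall metric from an average of per-mini-batch metrics. The two differ whenever mini-batches have unequal sizes, typically because the last batch is smaller. The sketch below illustrates this with accuracy and made-up counts; it is not the evaluation code of the paper or of GEO-Bench, and the function names are hypothetical.

```python
def overall_accuracy(batches):
    """Accuracy over all samples: total correct divided by total count."""
    correct = sum(c for c, _ in batches)
    total = sum(n for _, n in batches)
    return correct / total

def mean_of_batch_accuracies(batches):
    """Unweighted mean of per-batch accuracies (small batches count as much as large ones)."""
    return sum(c / n for c, n in batches) / len(batches)

# Hypothetical (correct, batch_size) pairs: a full batch and a small final batch.
batches = [(90, 100), (1, 10)]
overall = overall_accuracy(batches)            # 91 / 110
batch_mean = mean_of_batch_accuracies(batches)  # (0.9 + 0.1) / 2
```

Here the overall accuracy is 91/110, about 0.83, while the mini-batch average is 0.5, because the small final batch is given the same weight as the full one.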

Submission history

From: Nico Lang [view email]
[v1] Sat, 4 May 2024 23:16:48 UTC (1,214 KB)
[v2] Mon, 29 Jul 2024 10:35:50 UTC (1,186 KB)
