OmniSat: Self-Supervised Modality Fusion for Earth Observation

ECCV 2024

Guillaume Astruc, Nicolas Gonthier, Clément Mallet, Loïc Landrieu

LIGM, École des Ponts, Univ Gustave Eiffel, CNRS, Marne-la-Vallée, France
Univ Gustave Eiffel, IGN, ENSG, LASTIG, France
IGN, France
CNES, France


Abstract


The diversity and complementarity of sensors available for Earth Observation (EO) call for developing bespoke self-supervised multimodal learning approaches. However, current multimodal EO datasets and models typically focus on a single data type, either mono-date images or time series, which limits their impact. To address this issue, we introduce OmniSat, a novel architecture that merges diverse EO modalities into expressive features without labels by exploiting their alignment. To demonstrate the advantages of our approach, we create two new multimodal datasets by augmenting existing ones with new modalities. As demonstrated on three downstream tasks---forestry, land cover classification, and crop mapping---OmniSat can learn rich representations without supervision, leading to state-of-the-art performance in semi- and fully supervised settings. Furthermore, our multimodal pretraining scheme improves performance even when only one modality is available for inference.
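OmniSat exploits the spatial alignment of modalities: patches covering the same ground footprint can be encoded by modality-specific encoders and their representations matched without any labels. As a rough illustration of how such alignment can serve as a training signal, the sketch below implements a patch-level cross-modal contrastive loss in PyTorch; it is a toy under stated assumptions (the function name, temperature, and token shapes are all illustrative), not the released OmniSat objective.

import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(tokens_a, tokens_b, temperature=0.07):
    """InfoNCE-style loss pulling together the tokens of two modalities that
    cover the same ground patch (illustrative toy, not the official code).

    tokens_a, tokens_b: (num_patches, dim) embeddings of the *same* patches,
    one per modality, assumed spatially aligned by construction.
    """
    a = F.normalize(tokens_a, dim=-1)
    b = F.normalize(tokens_b, dim=-1)
    logits = a @ b.t() / temperature                      # (P, P) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # patch i matches patch i
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: 16 aligned patches, 256-dim tokens from two modality encoders.
loss = cross_modal_contrastive_loss(torch.randn(16, 256), torch.randn(16, 256))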

Datasets

Dataset name | Modalities | Labels | Link
PASTIS-HD | SPOT 6-7 (1m) + S1/S2 (30-140 acquisitions/year) | Crop mapping (0.2m) | huggingface or zenodo
TreeSatAI-TS | Aerial (0.2m) + S1/S2 (10-70 acquisitions/year) | Forestry (60m) | huggingface
FLAIR | Aerial (0.2m) + S2 (20-114 acquisitions/year) | Land cover (0.2m) | huggingface

We show three tiles from the considered multi-label classification datasets: FLAIR (a), TreeSatAI-TS (b), and PASTIS-HD (c). TreeSatAI-TS is a new dataset built by replacing the single-date Sentinel-1 and Sentinel-2 images of TreeSatAI with year-long time series. PASTIS-HD adds VHR satellite images to PASTIS-R.
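All three datasets are hosted on the Hugging Face Hub (links in the table above). Below is a minimal download sketch using the huggingface_hub library; the repository id is an assumption used for illustration, so check the dataset cards for the exact identifiers and file layout.

# Minimal sketch: fetch one of the datasets from the Hugging Face Hub.
# The repo_id is an assumed identifier for illustration only -- verify it on
# the dataset card linked above; files may be packaged as archives to unpack.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="IGNF/PASTIS-HD",     # assumed id, check the Hub
    repo_type="dataset",
    local_dir="./PASTIS-HD",
)
print(f"Dataset files downloaded to {local_path}")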

Results

We perform experiments with 100% and 10-20% of the labels. When using all modalities, OmniSat outperforms all competing methods, and our pre-training leads to more expressive multimodal features. Below are the F1 scores with 100% of the training data and all modalities available (a short sketch of the metric computation follows the table):

F1 Score (all modalities) | UT&T | Scale-MAE | DOFA | OmniSat (no pretraining) | OmniSat (with pretraining)
PASTIS-HD | 53.5 | 42.2 | 55.7 | 59.1 | 69.9
TreeSatAI-TS | 56.7 | 60.4 | 71.3 | 73.3 | 74.2
FLAIR | 48.8 | 70.0 | 74.9 | 70.0 | 73.4
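As a reminder of what is being reported, the downstream tasks are multi-label classification and the tables show F1 scores. A small sketch of how such a score can be computed with scikit-learn; the micro averaging mode here is an assumption, as each benchmark defines its own protocol.

# Illustrative multi-label F1 computation with scikit-learn.
# The averaging mode ("micro") is an assumption; each benchmark defines its own.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])  # ground-truth label sets
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 1]])  # thresholded predictions
print(f1_score(y_true, y_pred, average="micro"))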

OmniSat also improves performance when only one modality is available for inference: our self-supervised pre-training scheme improves the features learned by each encoder despite not relying on annotated data. Below are the F1 scores with 100% of the training data and only Sentinel-2 data available (a toy sketch of single-modality inference follows the table):

F1 Score (S2 only) | UT&T | Scale-MAE | DOFA | OmniSat (no pretraining) | OmniSat (with pretraining)
PASTIS-HD | 61.3 | 46.1 | 53.4 | 60.1 | 70.8
TreeSatAI-TS | 57.0 | 31.5 | 39.4 | 49.7 | 62.9
FLAIR | 62.0 | 61.0 | 61.0 | 65.4 | 65.4
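This robustness to missing modalities comes naturally when each modality has its own encoder and the fusion step consumes whatever tokens are available. The sketch below shows that pattern in PyTorch with a shared transformer over per-modality tokens; the class, layer sizes, and names are illustrative assumptions, not the released OmniSat API.

import torch
import torch.nn as nn

class MultiModalClassifier(nn.Module):
    """Toy multi-encoder model: fuse tokens from whichever modalities are given.
    Illustrative only -- not the released OmniSat architecture."""

    def __init__(self, dim=256, num_classes=20):
        super().__init__()
        self.encoders = nn.ModuleDict({
            "aerial": nn.LazyLinear(dim),   # stand-ins for real patch encoders
            "s1": nn.LazyLinear(dim),
            "s2": nn.LazyLinear(dim),
        })
        self.fusion = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(dim, num_classes)

    def forward(self, inputs: dict):
        # Encode only the modalities that are actually provided.
        tokens = [self.encoders[name](x) for name, x in inputs.items()]
        fused = self.fusion(torch.cat(tokens, dim=1))
        return self.head(fused.mean(dim=1))

model = MultiModalClassifier()
# Inference with only Sentinel-2 tokens (batch of 2, 16 patches, 128 features).
logits = model({"s2": torch.randn(2, 16, 128)})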

Efficiency

We report the best performance of each model across TreeSatAI and TreeSatAI-TS, with pre-training and fine-tuning using 100% of the labels. The area of the markers is proportional to the training time, broken down into pre-training and fine-tuning when applicable. OmniSat is more compact, faster to train, and performs better than all evaluated models, including the DOFA foundation model.

Resources


Paper


Code


BibTeX

If you find this work useful for your research, please cite:
          @inproceedings{astruc2024omnisat,
            title={OmniSat: Self-Supervised Modality Fusion for Earth Observation},
            author={Astruc, Guillaume and Gonthier, Nicolas and Mallet, Clement and Landrieu, Loic},
            booktitle={ECCV},
            year={2024}
          }

Acknowledgements


This work was supported by ANR project READY3D ANR-19-CE23-0007 and was granted access to the HPC resources of IDRIS under the allocation AD011014719 made by GENCI. We thank Anatol Garioud and Sébastien Giordano for their help with the creation of the TreeSatAI-TS and PASTIS-HD datasets. The SPOT images are open data thanks to the Dataterra DINAMIS initiative, under the "Couverture France DINAMIS" program. We thank Jordi Inglada for inspiring discussions and valuable feedback.

© This webpage was in part inspired by this template.