geoai-data

You are provided with four types of pre-computed embeddings from the following geospatial foundation models (GFMs):

  • Google Alpha Earth embeddings: These annual representations are derived from Sentinel-1/Sentinel-2 (S1/S2) data within Google Earth Engine (GEE) and have 64 bands (size: 256x256x64).
  • TESSERA embeddings: These annual representations are based on S1 and S2 satellite image time series, employing a pixel-wise approach that results in 128 bands.
  • TerraMind embeddings: These have been generated for a single epoch using both S1 and S2 images.
  • THOR embeddings: These have been generated for a single epoch using both S1 and S2 images.
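
Because the embeddings come as per-patch arrays (256x256x64 for Alpha Earth), a common first step is to flatten a patch into a per-pixel feature matrix. The following is a minimal sketch, assuming the embedding is already loaded as a NumPy array in height x width x bands order; the function name `to_pixel_features` is hypothetical and the random array only stands in for real data.

```python
import numpy as np


def to_pixel_features(embedding: np.ndarray) -> np.ndarray:
    """Flatten an (H, W, C) embedding patch into an (H*W, C) feature matrix."""
    if embedding.ndim != 3:
        raise ValueError(f"expected a 3-D array (H, W, C), got shape {embedding.shape}")
    height, width, channels = embedding.shape
    # Row i * width + j of the result holds the feature vector of pixel (i, j).
    return embedding.reshape(height * width, channels)


# Placeholder standing in for one Alpha Earth patch (assumed layout: H, W, bands).
rng = np.random.default_rng(0)
patch = rng.normal(size=(256, 256, 64)).astype(np.float32)

features = to_pixel_features(patch)
print(features.shape)
```

The same function would apply to the 128-band TESSERA embeddings, provided they are stored in the same axis order.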

Study Area and Reference Data

We have generated a training dataset of 2024 patches of 256x256 pixels at 10 m resolution in France, sampled over major cities and some rural areas. The labels are derived from IGN data based on airborne LiDAR. An example of the reference labels at high resolution (1 m) is shown below:

[Figure panels: VHR Image Basemap; Segmentation Classes (RGB Visualization); nDSM Heights]

The labels are not discrete categories. Instead, each pixel gives the percentage contribution of each class within its 10 m x 10 m cell. The labels are stored as TIFF files with four bands:

  • Band 1: Percentage of building
  • Band 2: Percentage of vegetation
  • Band 3: Percentage of water
  • Band 4: Relative height above ground (nDSM)
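
The band layout above can be unpacked as follows. This is only a sketch under stated assumptions: the label is already loaded as a NumPy array in bands x height x width order, and the percentages are stored on a 0 to 100 scale (check the actual files; if they are stored as 0 to 1, pass `percent_scale=1.0`). The function name `split_label` and the toy values are hypothetical.

```python
import numpy as np


def split_label(label: np.ndarray, percent_scale: float = 100.0):
    """Split a (4, H, W) label array into class fractions and nDSM height.

    Bands 1-3 (building, vegetation, water) are rescaled to [0, 1] fractions;
    band 4 (relative height above ground) is returned unchanged.
    """
    if label.ndim != 3 or label.shape[0] != 4:
        raise ValueError(f"expected shape (4, H, W), got {label.shape}")
    fractions = label[:3].astype(np.float32) / percent_scale
    height = label[3].astype(np.float32)
    return fractions, height


# Toy label standing in for a real TIFF: one pixel with 25% building,
# 50% vegetation, 0% water and a relative height of 7.5.
label = np.zeros((4, 256, 256), dtype=np.float32)
label[0, 0, 0] = 25.0
label[1, 0, 0] = 50.0
label[3, 0, 0] = 7.5

fractions, height = split_label(label)
print(fractions.shape, height.shape)
```

Keeping the three fraction bands separate from the height band makes it easy to use different losses for the two kinds of target, since one is bounded and the other is not.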

The underlying reference data, generated at 1 m spatial resolution, include four classes: Background, Buildings, Trees/HighVegetation, and Unclassified. The "Unclassified" category covers cases where vegetation and buildings overlap (e.g., a tree attached to a house).
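
To illustrate how a 1 m class map relates to per-cell percentages at 10 m, the sketch below block-averages a binary class mask over 10x10 pixel cells. It is an illustration of the idea only, not necessarily the procedure used to produce the challenge labels; the integer class codes and the function name `class_percentage` are assumptions.

```python
import numpy as np


def class_percentage(class_map: np.ndarray, class_id: int, cell: int = 10) -> np.ndarray:
    """Percentage of pixels equal to class_id within each cell x cell block."""
    height, width = class_map.shape
    if height % cell or width % cell:
        raise ValueError("class_map dimensions must be multiples of the cell size")
    mask = (class_map == class_id).astype(np.float32)
    # Group pixels into (rows of cells, cell, columns of cells, cell) blocks,
    # then average over the two within-cell axes.
    blocks = mask.reshape(height // cell, cell, width // cell, cell)
    return blocks.mean(axis=(1, 3)) * 100.0


# Toy 20 m x 20 m map at 1 m resolution; code 1 is assumed to mean "Buildings".
class_map = np.zeros((20, 20), dtype=np.uint8)
class_map[:10, :5] = 1  # half of the top-left 10x10 cell is building

building_pct = class_percentage(class_map, class_id=1)
print(building_pct)
```

Applied to a 2560x2560 map at 1 m, the same function would return a 256x256 array, matching the patch size of the embeddings.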

The test set (around 1000 patches) has been generated using similar data from different regions and years.

Dataset and starter notebook access

For the challenge, you have access to the training and test datasets and a range of tools in the form of a Jupyter Notebook (Starter Pack). Both are hosted on the Earth Observation Training Data Lab (EOTDL). To get direct access to the dataset, please follow this link:

EOTDL

Please note that the Jupyter Notebook can be found in the dataset files.

Limited cloud GPU access

Do you have only limited storage space or GPU capabilities available? The EOTDL has built-in cloud GPU functions that let you stage the dataset directly in the cloud. The ESA Φ-lab, in cooperation with the EOTDL, offers a limited number of cloud GPU access spots. If you are interested in a spot, please get in touch at hello@ai4eo.eu (please do not use the forum for requests, as they are handled individually).

This is an optional, limited offering and does not guarantee access to a cloud GPU. Please note that, for reasons of internal organisation, we will not be able to give reasons for selection or non-selection for cloud GPU access. The ESA Φ-lab and EOTDL reserve the right to discontinue this opportunity at any time. Once a spot has been granted, participation will be maintained for the full duration of the Challenge, even if the opportunity is discontinued. As only a limited number of spots are available, we encourage you to express interest thoughtfully. Please note that only one access can be granted per team; to streamline the process, we ask that only the team leader contact us.

We'll try to contact you as quickly as possible, but please allow up to 3 business days for a response from our team.