geoai-data

You are provided with four types of pre-computed embeddings from the following geospatial foundation models (GFMs):

Google Alpha Earth embeddings: annual representations derived from Sentinel-1 (S1) and Sentinel-2 (S2) data within Google Earth Engine (GEE), with 64 bands (size: 256x256x64).
TESSERA embeddings: annual representations based on S1 and S2 satellite image time series, computed with a pixel-wise approach that yields 128 bands.
TerraMind embeddings: generated for a single epoch using both S1 and S2 images.
THOR embeddings: generated for a single epoch using both S1 and S2 images.
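Pixel-wise models typically treat each 10 m pixel's embedding as one feature vector. The sketch below assumes that the embeddings for a patch have already been loaded as channel-last NumPy arrays of shape (height, width, bands) on the same 256x256 grid. Only the Alpha Earth size is stated above, so the grid alignment and layout of the other models are assumptions to verify against the files. The hypothetical helper `stack_embeddings` flattens each array to (pixels, bands) and concatenates several models along the feature axis.

```python
import numpy as np


def stack_embeddings(*embeddings: np.ndarray) -> np.ndarray:
    """Flatten (H, W, C_i) embedding arrays into one (H*W, sum(C_i)) feature matrix.

    Assumes all inputs are channel-last and share the same H x W pixel grid.
    """
    if not embeddings:
        raise ValueError("at least one embedding array is required")
    height, width = embeddings[0].shape[:2]
    flattened = []
    for emb in embeddings:
        if emb.ndim != 3 or emb.shape[:2] != (height, width):
            raise ValueError(f"expected shape ({height}, {width}, C), got {emb.shape}")
        # One row per pixel (row-major order), one column per embedding band.
        flattened.append(emb.reshape(height * width, emb.shape[2]))
    return np.concatenate(flattened, axis=1).astype(np.float32)


# Random stand-ins for one patch; real values come from the dataset files.
rng = np.random.default_rng(0)
alpha_earth = rng.normal(size=(256, 256, 64))  # 64 bands, as described above
tessera = rng.normal(size=(256, 256, 128))     # 128 bands, as described above
features = stack_embeddings(alpha_earth, tessera)
print(features.shape)  # (65536, 192)
```

Row `i * 256 + j` of the result corresponds to pixel (i, j), so per-pixel predictions can be reshaped back to 256x256 with `reshape(256, 256, -1)`.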

Study Area and Reference Data

We have generated a training dataset of 2024 patches of 256x256 pixels at 10 m resolution in France, sampled over major cities and some rural areas. The labels are derived from IGN data based on airborne LiDAR. An example of the reference labels at high resolution (1 m) is shown below:

[Figure: reference labels at 1 m resolution. Panels: VHR image basemap | Segmentation classes (RGB visualization) | nDSM heights]
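For scale, the patch geometry works out as follows, using only the sizes stated above (256 pixels at 10 m, labels at 1 m). The helper name `patch_footprint` is hypothetical.

```python
def patch_footprint(pixels: int = 256, gsd_m: float = 10.0, label_gsd_m: float = 1.0):
    """Return (side length in metres, label pixels per side, label pixels per 10 m cell)."""
    side_m = pixels * gsd_m                          # 256 * 10 m = 2560 m
    label_side = round(side_m / label_gsd_m)         # 2560 label pixels per side at 1 m
    per_cell = round((gsd_m / label_gsd_m) ** 2)     # 10 x 10 = 100 label pixels per cell
    return side_m, label_side, per_cell


print(patch_footprint())  # (2560.0, 2560, 100)
```

Each patch therefore covers about 2.56 km x 2.56 km, and every 10 m pixel summarizes 100 pixels of the 1 m reference data.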

The labels are not discrete categories. Instead, for each pixel, they give the percentage contribution of each class within the corresponding 10 m x 10 m cell. The data is stored in TIFF files with four bands:

Band 1: Percentage of building
Band 2: Percentage of vegetation
Band 3: Percentage of water
Band 4: Relative height above ground (nDSM)
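A minimal sketch for reading one label file into named layers, assuming the band order listed above and that `rasterio` is installed (it is not necessarily what the starter notebook uses). `rasterio`'s `read()` returns a band-first array of shape (bands, rows, cols). The names `BAND_NAMES`, `split_label_bands` and `load_label_tiff` are hypothetical. Whether the percentage bands are stored as 0 to 100 or 0 to 1 should be checked on the actual files.

```python
import numpy as np

# Band order as documented: building %, vegetation %, water %, nDSM height.
BAND_NAMES = ("building", "vegetation", "water", "ndsm")


def split_label_bands(label: np.ndarray) -> dict[str, np.ndarray]:
    """Split a (4, H, W) band-first label array into named float32 layers."""
    if label.ndim != 3 or label.shape[0] != len(BAND_NAMES):
        raise ValueError(f"expected shape (4, H, W), got {label.shape}")
    return {name: label[i].astype(np.float32) for i, name in enumerate(BAND_NAMES)}


def load_label_tiff(path: str) -> dict[str, np.ndarray]:
    """Read one label TIFF with rasterio (assumed installed) and split its bands."""
    import rasterio  # imported lazily so split_label_bands works without rasterio

    with rasterio.open(path) as src:
        return split_label_bands(src.read())  # src.read() -> (bands, rows, cols)
```

Usage, with a placeholder file name: `layers = load_label_tiff("patch.tif")`, then `layers["building"]` holds the 256x256 building percentages and `layers["ndsm"]` the heights.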

The underlying reference data, generated at 1 m spatial resolution, includes four classes: Background, Buildings, Trees/HighVegetation, and Unclassified. The "Unclassified" class covers cases where vegetation and buildings overlap (e.g., a tree overlapping a house).
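The step from a 1 m class map to per-cell percentages can be illustrated by block-averaging: each 10 m cell reports the share of its 10x10 label pixels belonging to each class. This is an illustrative sketch, not the organisers' pipeline. The integer class codes are hypothetical, and the text does not say how "Unclassified" pixels are split between the building and vegetation bands or where the water band comes from, so the sketch simply reports "Unclassified" as its own fraction.

```python
import numpy as np

# Hypothetical integer codes for the four 1 m classes listed above.
CLASSES = {0: "background", 1: "buildings", 2: "trees_high_vegetation", 3: "unclassified"}


def class_fractions(class_map: np.ndarray, factor: int = 10) -> dict[str, np.ndarray]:
    """Percentage of each class inside every factor x factor block of a 1 m class map."""
    h, w = class_map.shape
    if h % factor or w % factor:
        raise ValueError("map size must be a multiple of the aggregation factor")
    # Reshape to (rows, factor, cols, factor): axes 1 and 3 index pixels within a block.
    blocks = class_map.reshape(h // factor, factor, w // factor, factor)
    return {
        name: (blocks == code).mean(axis=(1, 3)) * 100.0
        for code, name in CLASSES.items()
    }


# Toy 20x20 map (two 10 m cells per side): left half buildings, right half trees.
toy = np.zeros((20, 20), dtype=np.uint8)
toy[:, :10] = 1
toy[:, 10:] = 2
fractions = class_fractions(toy)
print(fractions["buildings"].tolist())  # [[100.0, 0.0], [100.0, 0.0]]
```

Because every 1 m pixel falls into exactly one class, the four fractions of each cell sum to 100.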

The test set (around 1000 patches) was generated from similar data covering different regions and years.

Dataset and starter notebook access

For the challenge, you have access to the training and test datasets and a set of tools provided as a Jupyter Notebook (Starter Pack). Both are hosted on the Earth Observation Training Data Lab (EOTDL). To access the dataset directly, follow this link:

EOTDL

Note that the starter Jupyter Notebook is included in the dataset files.