AGBD: A Global-scale Biomass Dataset

Overview of the method

AGBD is the first globally representative, high-resolution, machine learning-ready dataset for biomass estimation.

Abstract

Accurate estimates of Above Ground Biomass (AGB) are essential in addressing two of humanity’s biggest challenges: climate change and biodiversity loss. Existing datasets for AGB estimation from satellite imagery are limited: they either focus on specific, local regions at high resolution or offer global coverage at low resolution. There is a need for a machine learning-ready, globally representative, high-resolution benchmark dataset. Our findings indicate significant variability in biomass estimates across different vegetation types, emphasizing the need for a dataset that accurately captures global diversity. To address these gaps, we introduce a comprehensive new dataset that is globally distributed, covers a range of vegetation types, and spans several years. This dataset combines AGB reference data from the GEDI mission with Sentinel-2 and PALSAR-2 imagery. It also includes pre-processed high-level features such as a dense canopy height map, an elevation map, and a land-cover classification map. In addition, we produce a dense, high-resolution (10 m) map of AGB predictions for the entire area covered by the dataset. The dataset is rigorously tested, accompanied by several benchmark models, and publicly available. It can be accessed with a single line of code, offering a solid basis for efforts towards global AGB estimation.


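    # Install the dependency first if needed: pip install datasets
    from datasets import load_dataset

    # Stream the training split without downloading the full dataset (or test, val)
    dataset = load_dataset("prs-eth/AGBD", streaming=True)["train"]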

Quantitative results

Our results indicate that input features beyond Sentinel-2 improve AGB estimation, and that more complex models tend to capture the relationship between input features and AGB more accurately.

Table 2. Mean test RMSE (↓) and associated standard deviation per model, with various inputs. Crosses denote the presence of a feature, values in brackets denote patch size. Values are colored from dark red (higher) to dark blue (lower).
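
The summary statistic in the table caption can be illustrated with a minimal sketch. It assumes the mean and standard deviation are taken over the test RMSE of several repeated runs of the same model; the use of the sample standard deviation, the function names `rmse` and `summarize_runs`, and all numbers below are illustrative assumptions, not values or code from the paper.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted AGB values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def summarize_runs(y_true, runs):
    """Mean and sample standard deviation of the test RMSE over repeated runs."""
    scores = [rmse(y_true, y_pred) for y_pred in runs]
    # The sample standard deviation needs at least two runs.
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return statistics.mean(scores), spread


# Hypothetical reference AGB values and predictions from three runs.
reference = [10.0, 50.0, 120.0, 300.0]
predictions_per_run = [
    [12.0, 48.0, 110.0, 280.0],
    [9.0, 55.0, 125.0, 270.0],
    [11.0, 52.0, 118.0, 290.0],
]

mean_rmse, std_rmse = summarize_runs(reference, predictions_per_run)
print(f"test RMSE: {mean_rmse:.2f} +/- {std_rmse:.2f}")
```

Reporting the spread alongside the mean makes it easier to judge whether a difference between two input configurations exceeds run-to-run variation.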

Qualitative results

The high spatial resolution of our estimates preserves spatial details that are not captured in the CCI product, so these details could not be recovered by conventional upscaling of existing maps.

Figure 8. Sentinel-2 tile 30NXM (Ghana), our best model’s prediction, the corresponding ESA CCI map, and GEDI L4B product. Global view (top) and zoomed in views (bottom).

More results

A common issue in biomass estimation is the over-estimation of low biomass values and the under-estimation of high values. Compared with the ESA CCI map, our method partially mitigates this effect.

Figure 5. Binned test residuals for the best-performing model of each architecture, and for the ESA CCI predictions.
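
A binned residual analysis of this kind can be sketched as follows. The sketch assumes residuals are defined as prediction minus reference and that samples are grouped into fixed-width bins of the reference AGB; the bin width, the function name `binned_mean_residuals`, and the sample values are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def binned_mean_residuals(y_true, y_pred, bin_width=50.0):
    """Mean residual (prediction minus reference) per reference-AGB bin.

    Keys are the lower edges of the bins. A positive value means the model
    over-estimates on average in that bin, a negative value means it
    under-estimates.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        lower = (t // bin_width) * bin_width  # lower edge of the bin holding t
        sums[lower] += p - t
        counts[lower] += 1
    return {lower: sums[lower] / counts[lower] for lower in sorted(sums)}


# Hypothetical values showing the typical pattern: low biomass is
# over-estimated, high biomass is under-estimated.
reference = [10.0, 40.0, 120.0, 130.0, 310.0]
predicted = [30.0, 50.0, 115.0, 125.0, 250.0]

for lower, residual in binned_mean_residuals(reference, predicted).items():
    print(f"bin starting at {lower:.0f}: mean residual {residual:+.1f}")
```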

Another issue in AGB estimation is saturation: beyond a certain AGB threshold, the remote sensing signal no longer reflects changes in biomass, which makes high values difficult to estimate. Our results indicate that access to additional features improves the AGB estimates, particularly in the higher bins.

Figure 7. Binned mean test RMSE (↓) and associated standard deviation per model, with various inputs.
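
To make the per-bin error concrete, here is a minimal sketch of an RMSE computed separately for each reference-AGB bin, which is one way to see where saturation degrades the estimates. The explicit bin edges, the function name `binned_rmse`, and the sample values are assumptions for illustration; the paper's actual binning is not specified on this page.

```python
import math
from bisect import bisect_right


def binned_rmse(y_true, y_pred, edges):
    """RMSE per reference-AGB bin; bin i covers [edges[i], edges[i + 1])."""
    squared = [[] for _ in range(len(edges) - 1)]
    for t, p in zip(y_true, y_pred):
        i = bisect_right(edges, t) - 1
        if 0 <= i < len(squared):  # skip references outside the bin range
            squared[i].append((p - t) ** 2)
    # Empty bins are reported as NaN rather than zero error.
    return [math.sqrt(sum(s) / len(s)) if s else float("nan") for s in squared]


# Hypothetical values: the error grows in the highest bin, as expected
# when the input signal saturates.
edges = [0.0, 100.0, 200.0, 400.0]
reference = [20.0, 60.0, 150.0, 250.0, 350.0]
predicted = [25.0, 55.0, 140.0, 200.0, 260.0]

for lower, upper, value in zip(edges, edges[1:], binned_rmse(reference, predicted, edges)):
    print(f"[{lower:.0f}, {upper:.0f}): RMSE {value:.1f}")
```

Comparing these per-bin values between two input configurations shows whether additional features help mainly in the high-biomass bins.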

Citation

If you find our work helpful, consider citing our paper 🙂

@article{sialelli2025agbd,
      title={AGBD: A Global-scale Biomass Dataset}, 
      author={Ghjulia Sialelli and Torben Peters and Jan D. Wegner and Konrad Schindler},
      year={2025},
      journal={arXiv preprint arXiv:2406.04928}
}