
Using Machine Learning to identify climate models

12 July 2024

Machine Learning (ML) can now help identify climate models from their daily output. How? First, it helps to understand ML as the process of developing algorithms that enable computers to learn from data and make predictions or decisions without being explicitly programmed for each scenario. In that context, ML techniques such as Convolutional Neural Networks (CNNs) are increasingly used in climate science to evaluate climate models, identify model characteristics, and assess model performance against observational data.

The study “Identifying climate models based on their daily output using machine learning”, by researchers Lukas Brunner and Sebastian Sippel, sheds light on how ML classifiers, such as the CNNs mentioned above, can be trained to robustly identify climate models using daily temperature output.

By analyzing individual daily temperature maps, ML methods can separate models from observations and from each other, even in the presence of considerable noise from internal variability on weather timescales (Brunner & Sippel, 2023). Internal variability refers to the fluctuations in the climate system that arise from processes within the Earth’s atmosphere, oceans, and land surfaces; on these short timescales, we might simply call it weather. The ML approach therefore allows models and observations to be identified on short timescales, providing new ways to evaluate and interpret model differences.

Separating models from observations, and from each other

The study used daily temperature maps from 43 models of the Coupled Model Intercomparison Project Phase 6 (CMIP6) and four different observational datasets. Additionally, ICON-Sapphire, one of the Earth system models developed by nextGEMS, was used as an experimental km-scale model. On that basis, two different statistical and ML methods were used to separate models from observations, and from each other.

First, the researchers used logistic regression to distinguish between models and observations, because its learned coefficients can be inspected directly (Brunner & Sippel, 2023). These coefficients reveal that many well-known climatological model biases already emerge as important for identifying daily maps. By contrast, other regions, such as the Arctic, are not relevant for daily classification at all.
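To make the idea of inspecting learned coefficients more concrete, here is a minimal sketch on synthetic data. It is not the authors' code: the grid size, the 1.5 K regional bias, and the helper names `make_maps` and `fit_logistic` are all assumptions made for illustration. Each daily map is flattened into a vector, a logistic regression is fitted by plain gradient descent, and the coefficients are reshaped back onto the grid so they can be read as a map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 8 x 16 lat-lon grid, 400 daily maps per class.
n_lat, n_lon, n_days = 8, 16, 400

def make_maps(bias):
    # Daily "weather" noise plus a fixed climatological bias pattern.
    noise = rng.normal(0.0, 1.0, size=(n_days, n_lat, n_lon))
    return noise + bias

# Assumed bias: the synthetic "model" is 1.5 K too warm in one region.
model_bias = np.zeros((n_lat, n_lon))
model_bias[2:4, 5:9] = 1.5

obs = make_maps(np.zeros((n_lat, n_lon)))
mod = make_maps(model_bias)

# Flatten each map into a feature vector; label 0 = observation, 1 = model.
X = np.concatenate([obs, mod]).reshape(2 * n_days, -1)
y = np.concatenate([np.zeros(n_days), np.ones(n_days)])

def fit_logistic(X, y, lr=0.1, n_steps=500, l2=1e-2):
    # Plain gradient descent on the L2-regularised logistic loss.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad_w = X.T @ (p - y) / len(y) + l2 * w
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = fit_logistic(X, y)

# One coefficient per grid cell: reshape to read them as a map.
coef_map = w.reshape(n_lat, n_lon)

# In-sample accuracy only; a real evaluation would use held-out days.
pred = (X @ w + b) > 0
accuracy = np.mean(pred == y)

inside = coef_map[2:4, 5:9].mean()
outside_mask = np.ones((n_lat, n_lon), dtype=bool)
outside_mask[2:4, 5:9] = False
outside = np.abs(coef_map[outside_mask]).mean()
print(f"accuracy={accuracy:.2f}, inside={inside:.2f}, outside={outside:.2f}")
```

In this toy case the largest coefficients should sit in the region where the bias was planted, which is the sense in which a linear classifier's coefficients can point to climatological biases.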

It is important to note that logistic regression is a linear method and that, after bias correction with the mean seasonal cycle, it is no longer skilful. To complement it, a CNN was used as a second method, mainly because its larger number of trainable parameters allows it to learn more complex, non-linear relationships within the data (Brunner & Sippel, 2023).
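The bias correction mentioned here can be pictured as subtracting a mean seasonal cycle from every daily map. The sketch below assumes this means removing each dataset's own day-of-year average at every grid cell; the paper's exact procedure may differ, and the synthetic datasets, offsets, and the helper names `make_dataset` and `remove_seasonal_cycle` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy setup: 10 years of 365 daily maps on a 4 x 8 grid.
n_years, n_doy, n_lat, n_lon = 10, 365, 4, 8
doy = np.tile(np.arange(n_doy), n_years)  # day-of-year index per sample

def make_dataset(offset, amplitude):
    # Seasonal cycle + constant offset + daily noise at each grid cell.
    cycle = amplitude * np.sin(2 * np.pi * doy / n_doy)
    noise = rng.normal(0.0, 1.0, size=(doy.size, n_lat, n_lon))
    return offset + cycle[:, None, None] + noise

def remove_seasonal_cycle(maps, doy):
    # Mean over all years for each calendar day and grid cell,
    # subtracted from every map that falls on that calendar day.
    clim = np.stack([maps[doy == d].mean(axis=0) for d in range(n_doy)])
    return maps - clim[doy]

obs = make_dataset(offset=0.0, amplitude=10.0)
mod = make_dataset(offset=2.0, amplitude=12.0)  # assumed warm bias

# Mean difference between the two datasets before and after correction.
raw_gap = np.abs(mod.mean(axis=0) - obs.mean(axis=0)).mean()
obs_anom = remove_seasonal_cycle(obs, doy)
mod_anom = remove_seasonal_cycle(mod, doy)
anom_gap = np.abs(mod_anom.mean(axis=0) - obs_anom.mean(axis=0)).mean()
```

After the subtraction, the two datasets no longer differ in their mean state, so a linear classifier has no mean offset left to exploit. Any remaining skill has to come from other properties of the daily maps, which is where a non-linear method such as a CNN can help.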

Main findings and future directions

Among the main results of this work is the high accuracy achieved by CNN classifiers in identifying models and observational datasets, even in complex classification tasks. An overall accuracy of 83% was achieved in identifying 43 models and four observational datasets (Brunner & Sippel, 2023). Moreover, the CNNs could pick up patterns unique to each dataset, enabling successful separation from the other datasets. A key takeaway is that dependencies between models, and observations, emerge even on daily time scales.

On another note, Brunner and Sippel (2023) clarified that misclassifications often occurred within model families or were related to common “ancestors”, indicating shared features among related datasets. However, the study revealed the ability of the CNN to correctly identify a significant portion of test samples, even those from distant time periods and under different climate scenarios.
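One way to check whether misclassifications cluster within model families is to build a confusion matrix and count how many errors stay inside a family. The sketch below uses invented labels and predictions; the dataset names, the family assignments, and the twelve test samples are all hypothetical and are not results from the paper.

```python
import numpy as np

labels = ["A1", "A2", "B1", "B2", "OBS"]  # hypothetical dataset names
family = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "OBS": "obs"}

# Hypothetical true and predicted labels for 12 test samples.
y_true = ["A1", "A1", "A1", "A2", "A2", "B1",
          "B1", "B2", "B2", "OBS", "OBS", "OBS"]
y_pred = ["A1", "A2", "A1", "A2", "A1", "B1",
          "B2", "B2", "A1", "OBS", "OBS", "OBS"]

# Rows are true labels, columns are predicted labels.
index = {name: i for i, name in enumerate(labels)}
confusion = np.zeros((len(labels), len(labels)), dtype=int)
for t, p in zip(y_true, y_pred):
    confusion[index[t], index[p]] += 1

# Overall accuracy is the share of samples on the diagonal.
accuracy = np.trace(confusion) / confusion.sum()

# Count how many errors confuse two members of the same family.
errors = [(t, p) for t, p in zip(y_true, y_pred) if t != p]
within_family = sum(family[t] == family[p] for t, p in errors)
```

In this made-up example, three of the four errors confuse two members of the same family, which is the kind of pattern the study describes for related models.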

The authors plan a follow-up study to analyze the origin of the classification skill in more detail, using explainable ML techniques and domain-specific approaches from climate science. In other words, the follow-up study will investigate the coupling of atmosphere and ocean, the surface energy balance in models, and targeted masking of regions to understand model performance dependencies.

If you are interested in working with this method, feel free to do so! The researchers have made the code used in the paper freely available on GitHub.

References:

Brunner, L., & Sippel, S. (2023). Identifying climate models based on their daily output using machine learning. Environmental Data Science, 2. https://doi.org/10.1017/eds.2023.23
