How are scientists using Machine Learning to evaluate climate models?

Machine Learning (ML) can now help identify climate models based on their daily output. How? First, it helps to understand ML as the process of developing algorithms that enable computers to learn and make predictions or decisions based on data, without being explicitly programmed for each scenario. In that context, ML techniques such as Convolutional Neural Networks (CNNs) are increasingly used in Climate Science to evaluate climate models, identify model characteristics, and assess model performance in comparison to observational data.

The study “Identifying climate models based on their daily output using machine learning”, by researchers Lukas Brunner and Sebastian Sippel, sheds light on the use of ML classifiers such as the CNNs mentioned above. Specifically, it shows how they can be trained to robustly identify climate models using daily temperature output.

By analyzing individual daily temperature maps, ML methods can separate models from observations and from each other, even in the presence of considerable noise from internal variability on specific weather timescales (Brunner & Sippel, 2023). Internal variability refers to the fluctuations in the climate system that arise from various processes within the Earth’s atmosphere, oceans, and land surfaces that we might refer to as weather. Hence, the ML approach allows for the identification of models and observations based on short timescales, providing new ways to evaluate and interpret model differences.

Separating models from observations, and from each other

The study used daily temperature maps from 43 Coupled Model Intercomparison Project Phase 6 (CMIP6) models and four different observational datasets. Additionally, ICON-Sapphire, one of the Earth system models developed by nextGEMS, was used as an experimental km-scale model. On that basis, two different statistical and ML methods were used to separate models from observations, and from each other.

First, the researchers used logistic regression to distinguish between models and observations, a method whose learned coefficients can be inspected directly (Brunner & Sippel, 2023). These coefficients reveal that many well-known climatological model biases already emerge as important for identifying daily maps. Other regions, such as the Arctic, turn out not to be relevant for daily classification at all.
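To make the idea of inspecting learned coefficients concrete, here is a minimal sketch on synthetic data. It is not the authors' code: the six-cell "maps", the warm bias placed in a single cell, and the function names `make_map` and `fit_logistic` are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def make_map(is_model, rng, n_cells=6, biased_cell=2, bias=2.0):
    # Hypothetical "daily map": n_cells anomalies with unit weather noise.
    # Maps labelled as model output carry a warm bias in one cell only.
    cells = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]
    if is_model:
        cells[biased_cell] += bias
    return cells

def fit_logistic(samples, labels, lr=0.1, epochs=300):
    # Plain batch gradient descent on the mean log-loss.
    n_cells = len(samples[0])
    n = len(samples)
    weights = [0.0] * n_cells
    intercept = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_cells
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(intercept + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        intercept -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, intercept

rng = random.Random(0)
labels = [i % 2 for i in range(400)]  # 1 = model, 0 = observation
samples = [make_map(y == 1, rng) for y in labels]
weights, intercept = fit_logistic(samples, labels)

# The cell with the largest absolute coefficient is the one the classifier relies on most.
most_informative = max(range(len(weights)), key=lambda j: abs(weights[j]))
print([round(w, 2) for w in weights], most_informative)
```

In this toy setup the largest coefficient lands on the cell that carries the bias, while cells containing only noise receive coefficients near zero. That loosely mirrors the contrast the study describes between regions with known biases and regions that do not matter for classification.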

Logistic regression is a linear method, however, and after bias correction with the mean seasonal cycle it is no longer skilful. To complement it, a second method was used, namely a CNN, chiefly because it has more trainable parameters and can also learn more complex, non-linear relationships within the data (Brunner & Sippel, 2023).
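One way to picture why removing the mean seasonal cycle takes away what a linear classifier relies on is sketched below. It assumes the correction means subtracting, at each grid cell, the multi-year mean for every calendar day; the paper's exact procedure may differ, and the function name and toy values are hypothetical.

```python
from collections import defaultdict

def remove_mean_seasonal_cycle(days_of_year, values):
    # days_of_year[i] is the calendar day of sample i at one grid cell,
    # values[i] is the temperature on that day.
    # Returns anomalies relative to the mean of each calendar day.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, v in zip(days_of_year, values):
        sums[d] += v
        counts[d] += 1
    climatology = {d: sums[d] / counts[d] for d in sums}
    return [v - climatology[d] for d, v in zip(days_of_year, values)]

# Two hypothetical years of three calendar days each (values in kelvin).
# The toy "model" is uniformly 2 K warmer than the toy "observations".
days = [1, 2, 3, 1, 2, 3]
obs = [270.0, 271.0, 272.0, 272.0, 273.0, 274.0]
model = [t + 2.0 for t in obs]

obs_anom = remove_mean_seasonal_cycle(days, obs)
model_anom = remove_mean_seasonal_cycle(days, model)
print(obs_anom)
print(model_anom)
```

Here the constant 2 K offset vanishes once each dataset is expressed relative to its own seasonal cycle, so the two anomaly series are identical. Real datasets would still differ in their day-to-day patterns, which is the kind of information a non-linear method might be able to use.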

Main findings and future directions

One of the main results is the high accuracy achieved by CNN classifiers in identifying models and observational datasets, even when faced with complex classification tasks. An overall accuracy of 83% was achieved in identifying 43 models and four observational datasets (Brunner & Sippel, 2023). Moreover, CNNs could pick up unique patterns specific to each dataset, enabling successful separation from other datasets. The key takeaway is that dependencies between models, and between models and observations, emerge even on daily time scales.

On another note, Brunner and Sippel (2023) clarified that misclassifications often occurred within model families or were related to common “ancestors”, indicating shared features among related datasets. However, the study revealed the ability of the CNN to correctly identify a significant portion of test samples, even those from distant time periods and under different climate scenarios.
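The sketch below shows one way to check whether misclassifications stay within model families. The dataset names, the family mapping, and the predictions are placeholders invented for illustration, not results from the paper.

```python
# Hypothetical dataset -> family mapping; the names are placeholders.
FAMILY = {
    "A1": "family_A",
    "A2": "family_A",
    "B1": "family_B",
    "OBS1": "observations",
}

def summarise(true_labels, predicted_labels, family):
    # Returns (exact accuracy, share of errors that stay within the true dataset's family).
    correct = 0
    errors = 0
    within_family_errors = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == p:
            correct += 1
        else:
            errors += 1
            if family[t] == family[p]:
                within_family_errors += 1
    share = within_family_errors / errors if errors else 0.0
    return correct / len(true_labels), share

true = ["A1", "A1", "A2", "A2", "B1", "B1", "OBS1", "OBS1"]
pred = ["A1", "A2", "A2", "A1", "B1", "A1", "OBS1", "OBS1"]
accuracy, within_family_share = summarise(true, pred, FAMILY)
print(accuracy, within_family_share)
```

In this made-up example five of eight samples are identified exactly, and two of the three errors confuse members of the same family. A high within-family share would point to shared features among related datasets.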

The authors plan a follow-up study that aims to analyze the origin of the classification skill in more detail, using explainable ML techniques and domain-specific approaches from Climate Science. In other words, the follow-up study will investigate the coupling of atmosphere and ocean, the surface energy balance in models, and targeted masking of regions to understand model performance dependencies.

If you are interested in working with this method, feel free to do so! The researchers have made the code used in the paper freely available on GitHub.

References:

Brunner, L., & Sippel, S. (2023). Identifying climate models based on their daily output using machine learning. Environmental Data Science, 2. https://doi.org/10.1017/eds.2023.23

by Jonathan Wille from the Swiss Federal Institute of Technology in Zurich (ETH Zurich)

Temperature changes within a day (diurnal temperature range) and the day-to-day temperature changes (inter-diurnal temperature variability) have major impacts on agricultural practices and energy providers. Large swings in temperature can introduce heat and water stress that affects crop yields while creating sudden spikes in energy demand (Lobell, 2007). Thus, properly simulating temperature variability in both the present and future climate is essential for projecting potential global warming impacts on daily heat stress. The enhanced resolution of the nextGEMS models offers a more detailed picture of local temperature variability, potentially enabling communities to better prepare for future temperature variations. Before this can be realized, we must test the realism of these processes in the nextGEMS IFS and ICON models (cycle 3) in the present climate to ensure their future projections can be considered reliable.
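As a rough illustration of the two quantities, the sketch below computes them for a single location. It assumes the diurnal temperature range is the daily maximum minus the daily minimum, and that inter-diurnal temperature variability is the mean absolute change in daily mean temperature from one day to the next. The function names and the four days of values are hypothetical.

```python
def diurnal_temperature_range(tmax, tmin):
    # DTR for each day: daily maximum minus daily minimum temperature.
    return [hi - lo for hi, lo in zip(tmax, tmin)]

def interdiurnal_temperature_variability(tmean):
    # IDTV, assumed here to be the mean absolute day-to-day change
    # in daily mean temperature.
    if len(tmean) < 2:
        raise ValueError("need at least two days of data")
    diffs = [abs(b - a) for a, b in zip(tmean, tmean[1:])]
    return sum(diffs) / len(diffs)

# Four hypothetical winter days at one grid point (degrees Celsius).
tmax = [5.0, 8.0, 3.0, 6.0]
tmin = [-3.0, 0.0, -1.0, -4.0]
tmean = [1.0, 4.0, 1.0, 1.0]

dtr = diurnal_temperature_range(tmax, tmin)
mean_dtr = sum(dtr) / len(dtr)
idtv = interdiurnal_temperature_variability(tmean)
print(dtr, mean_dtr, idtv)
```

The two measures capture different things: a location can have a large range within each day but little change from one day to the next, or the reverse.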

Temperature variability in mountainous regions

Focusing first on the high-resolution capabilities of the IFS and ICON models, we see that a horizontal resolution of ~5 km allows both models to capture the variations in the DTR (diurnal temperature range) across the complex topography of the European Alps during the winter months of December, January, and February (Figure 1a and 1b). To verify whether the values shown here can be considered realistic, we use the Copernicus European Regional ReAnalysis (CERRA), which uses past measurements and data assimilation to create a high-resolution depiction of temperature behavior. Compared with CERRA, ICON has a DTR that is much larger than the reanalysis, while IFS is closer to the reanalysis temperature behavior (Figure 1c and 1d). For the IDTV (inter-diurnal temperature variability), similar patterns in the ICON and IFS biases are observed. These differences, which are greatest in the winter months, may be the result of ICON having too few clouds and thus greater daily temperature variability, but further testing is needed to verify this.

Figure 1: The average diurnal temperature range for DJF (December, January, and February) for a) ICON and b) IFS models along with their respective differences (depicted as difference in standard deviations) from the CERRA reanalysis (c, d).
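One possible reading of "difference in standard deviations" is the model-minus-reanalysis difference in mean DTR, divided by the spread of the reanalysis values. The sketch below follows that reading, which is an assumption and may not match the exact normalization behind the figure. The function name and all values are hypothetical.

```python
import statistics

def standardised_difference(model_values, reference_values):
    # Difference of means, expressed in units of the reference sample
    # standard deviation (one possible normalization, assumed here).
    spread = statistics.stdev(reference_values)
    if spread == 0:
        raise ValueError("reference values have zero spread")
    diff = statistics.mean(model_values) - statistics.mean(reference_values)
    return diff / spread

# Hypothetical seasonal-mean DTR values (kelvin) at one grid point over five winters.
reference_dtr = [4.0, 6.0, 5.0, 7.0, 3.0]
model_dtr = [7.0, 9.0, 8.0, 10.0, 6.0]

bias_in_std = standardised_difference(model_dtr, reference_dtr)
print(round(bias_in_std, 2))
```

Expressing the bias this way makes grid points with very different typical DTR values comparable on a single color scale.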

A global look at temperature variability

To see whether these patterns in temperature variability and the associated biases in the European Alps are isolated examples or representative of the broader globe, we repeated a similar analysis globally using the ERA5 reanalysis, with the nextGEMS ICON and IFS interpolated to a lower resolution. Within the ICON model, there is a great deal of spatial variability, but some patterns emerge. For instance, the high DTR bias in the European Alps is observed again in the South American Andes and parts of the Himalayas (Figure 2a and 2c). There is also a strong gradient between overestimated DTR at higher latitudes and underestimated DTR at lower latitudes in the Northern Hemisphere during winter. The DTR simulated in the IFS model is generally closer to the ERA5 reanalysis, aside from notable areas of overestimation in equatorial regions of South America and Africa (Figure 2b and 2d).

Figure 2: The global differences in average diurnal temperature range for (a, c) ICON and (b, d) IFS for (a, b) DJF (December, January, and February) and (c, d) JJA (June, July, and August) when compared with the ERA5 reanalysis.

Final thoughts

These preliminary results demonstrate the added benefits in resolving temperature variability in complex terrain using the nextGEMS storm-resolving Earth system models. The ability to resolve temperature variability within individual mountain valleys and peaks will prove beneficial to the communities residing in these areas when planning for future changes in temperature. This analysis also reveals areas where the nextGEMS models’ temperature behavior differs from observations, especially in the ICON model. While these biases are sometimes significant, similar biases also appear in coarser resolution CMIP5 simulations (Cattiaux et al., 2015), highlighting the challenge of accurately simulating daily temperature variability.

References

Cattiaux, J., Douville, H., Schoetter, R., Parey, S., & Yiou, P. (2015). Projected increase in diurnal and interdiurnal variations of European summer temperatures. Geophysical Research Letters, 42(3), 899–907. https://doi.org/10.1002/2014GL062531
Lobell, D. B. (2007). Changes in diurnal temperature range and national cereal yields. Agricultural and Forest Meteorology, 145(3), 229–238. https://doi.org/10.1016/j.agrformet.2007.05.002