Easy Gems is an online platform that consolidates information on high-resolution climate simulations produced by the nextGEMS project and other initiatives, such as European Eddy-Rich ESMs (EERIE), WarmWorld, and the DYAMOND initiative. Developed by the German Climate Computing Center (DKRZ), the platform serves as a repository of best practices and a comprehensive guide to working with climate simulations. According to DKRZ Senior Scientist Florian Ziemen, the main motivation for creating Easy Gems was simple: “The idea was to have one place where you can find everything you need when you want to analyze high-resolution simulations.”
At the same time, the goal was to build something that was not tied to a single project, so that the platform and its data remain available even after individual projects conclude. This approach keeps the platform from ending up in the “website dumpster,” Ziemen explained. As a result, Easy Gems extends beyond individual projects: it offers access to all simulation data (or “simulation gems”) hosted at DKRZ, while also functioning as a how-to resource or e-book.
Because nextGEMS drives the development of two European storm-resolving Earth system models, the ECMWF Integrated Forecasting System (IFS) and the Icosahedral Nonhydrostatic Weather and Climate Model (ICON), Easy Gems includes details on three development cycles of these models, as well as some pre-final simulations. These cover a range of horizontal resolutions and evolving model configurations.
Additionally, the platform offers guidance on loading data, plotting data, and applying a variety of best practices in data processing. It is entirely user-driven: the Easy Gems community actively contributes content and keeps it up to date. Beyond serving as a guide, the platform acts as comprehensive documentation for the project, providing access to project outputs together with illustrations and explanations.
The platform is organized into three main sections: Simulations, Processing, and Contribute. The Simulations section provides detailed information on the simulations currently available, while the Processing section offers tips and example scripts for data handling. Finally, the Contribute section explains how users can collaborate and become part of this community-driven effort.
Easy Gems welcomes contributions from anyone, regardless of background. Contributions can range from reporting errors and requesting additional articles to suggesting clearer descriptions or improved illustrations. To contribute and interact with Easy Gems, users need a DKRZ account. New additions also require approval from other community members, to ensure that the input is correct and to avoid redundancy. Currently, there are 3,260 registered accounts, with approximately 490 users granted access to all the data hosted on the platform.
If you’re interested, feel free to check out Easy Gems here!
Nowadays, Machine Learning (ML) can help identify climate models based on their daily output. How? First, it helps to understand ML as the process of developing algorithms that enable computers to learn and make predictions or decisions based on data, without being explicitly programmed for each scenario. In that context, ML techniques such as Convolutional Neural Networks (CNNs) are increasingly used in Climate Science to evaluate climate models, identify model characteristics, and assess model performance against observational data.
The study “Identifying climate models based on their daily output using machine learning” by researchers Lukas Brunner and Sebastian Sippel sheds light on the use of ML classifiers, such as the CNNs mentioned above, and specifically on how they can be trained to robustly identify climate models from their daily temperature output.
By analyzing individual daily temperature maps, ML methods can separate models from observations and from each other, even in the presence of considerable noise from internal variability on weather timescales (Brunner & Sippel, 2023). Internal variability refers to fluctuations that arise from processes within the climate system itself, in the Earth’s atmosphere, oceans, and land surfaces; on daily timescales, this is what we would call weather. The ML approach therefore allows models and observations to be identified on short timescales, providing new ways to evaluate and interpret model differences.
The study used daily temperature maps from 43 models of the Coupled Model Intercomparison Project Phase 6 (CMIP6) and four different observational datasets. Additionally, ICON-Sapphire, one of the Earth system models developed within nextGEMS, was included as an experimental km-scale model. On that basis, two methods, logistic regression and CNNs, were used to separate models from observations and from each other.
First, the researchers used logistic regression to distinguish between models and observations, a method chosen because its learned coefficients can be inspected directly (Brunner & Sippel, 2023). These coefficients reveal that many well-known climatological model biases already emerge as important for identifying individual daily maps. Other regions, such as the Arctic, are not relevant for daily classification at all.
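To make the idea of inspecting the learned coefficients concrete, here is a toy sketch in Python with NumPy. It does not reproduce the study's data, preprocessing, or training setup: the grid size, the synthetic warm-bias box, and the helper names (`daily_maps`, `fit_logistic`) are all hypothetical. Each daily map is flattened into one feature vector, a logistic regression is fitted by plain gradient descent, and the coefficients are reshaped back onto the grid, where the largest weights should sit over the region that carries the model bias.

```python
import numpy as np

# Hypothetical synthetic setup (not the study's data): a small lat x lon grid.
rng = np.random.default_rng(0)
n_lat, n_lon, n_days = 8, 16, 400

# A fixed, invented "model bias": 1.5 K too warm in one box (rows 2-3, cols 4-7).
bias = np.zeros((n_lat, n_lon))
bias[2:4, 4:8] = 1.5

def daily_maps(n, offset):
    """Toy daily anomaly maps: internal variability (noise) plus a fixed offset pattern."""
    return rng.normal(0.0, 1.0, size=(n, n_lat, n_lon)) + offset

obs = daily_maps(n_days, 0.0)      # label 0: "observations"
model = daily_maps(n_days, bias)   # label 1: "model"

# One flattened map per sample, as a linear classifier sees it.
X = np.concatenate([obs, model]).reshape(2 * n_days, -1)
y = np.concatenate([np.zeros(n_days), np.ones(n_days)])

def fit_logistic(X, y, lr=0.1, steps=500, l2=1e-3):
    """Batch gradient descent on the (lightly L2-regularised) logistic loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(sample is "model")
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

w, b = fit_logistic(X, y)
accuracy = np.mean(((X @ w + b) > 0) == (y == 1))

# Reshape the coefficients back onto the grid to see where the classifier "looks".
coef_map = w.reshape(n_lat, n_lon)
print(f"training accuracy: {accuracy:.2f}")
print(f"mean |coef| inside bias box: {np.abs(coef_map[2:4, 4:8]).mean():.3f}")
print(f"mean |coef| elsewhere:       {np.abs(coef_map[bias == 0]).mean():.3f}")
```

In practice a library implementation such as scikit-learn's `LogisticRegression` would replace the hand-written fit; it is spelled out here only to keep the example free of extra dependencies.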
It is important to note, however, that logistic regression is a linear method and that, after bias correction with the mean seasonal cycle, it is no longer skilful. To complement it, the researchers used a second method, namely CNNs, especially because their larger number of trainable parameters allows them to learn more complex, non-linear relationships within the data (Brunner & Sippel, 2023).
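The bias correction mentioned above can be illustrated with a minimal sketch, assuming it means subtracting each dataset's own mean seasonal cycle, estimated here as a plain day-of-year average at a single hypothetical grid cell (the study's exact procedure may differ, for example in smoothing or reference period). The toy "model" has an invented warm offset and an exaggerated seasonal cycle; both vanish after correction, which is why a linear classifier that relies on such mean-state differences loses its skill. In real data, what survives the correction are subtler differences in variability, the kind of structure a non-linear method such as a CNN can exploit; this toy deliberately builds in none, and no CNN is shown.

```python
import numpy as np

rng = np.random.default_rng(1)
n_years, n_doy = 10, 365
doy = np.tile(np.arange(n_doy), n_years)   # day-of-year index of each daily sample

def series(offset, amplitude):
    """Toy daily temperature at one grid cell: seasonal cycle + offset + weather noise."""
    seasonal = amplitude * np.sin(2 * np.pi * doy / n_doy)
    return 15.0 + offset + seasonal + rng.normal(0.0, 2.0, size=doy.size)

obs = series(offset=0.0, amplitude=8.0)
model = series(offset=1.5, amplitude=9.0)   # hypothetical warm bias, too-strong seasonal cycle

def remove_mean_seasonal_cycle(x, doy):
    """Subtract the dataset's own day-of-year mean (its mean seasonal cycle)."""
    clim = np.array([x[doy == d].mean() for d in range(n_doy)])
    return x - clim[doy]

obs_anom = remove_mean_seasonal_cycle(obs, doy)
model_anom = remove_mean_seasonal_cycle(model, doy)

print(f"raw mean difference:       {model.mean() - obs.mean():.2f} K")
print(f"corrected mean difference: {model_anom.mean() - obs_anom.mean():.2f} K")
```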
Among the main results of this work is the high accuracy achieved by CNN classifiers in identifying models and observational datasets, even for complex classification tasks: an overall accuracy of 83% was achieved in identifying 43 models and four observational datasets (Brunner & Sippel, 2023). Moreover, CNNs picked up patterns unique to each dataset, enabling successful separation from the other datasets. The key takeaway is that dependencies between models, and between models and observations, emerge even on daily timescales.
Brunner and Sippel (2023) also clarified that misclassifications often occurred within model families or were related to common “ancestors”, indicating shared features among related datasets. At the same time, the CNN correctly identified a significant portion of test samples, even those from distant time periods and under different climate scenarios.
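Both the overall accuracy and the family pattern of errors can be read off a confusion matrix. The sketch below uses five invented class names, a hypothetical family mapping, and ten made-up predictions (none of these come from the study) to compute the overall accuracy and the share of misclassifications that stay within the true model's family.

```python
import numpy as np

# Hypothetical classes and family assignments (illustrative only, not the study's 47 classes).
classes = ["ModelA-LR", "ModelA-HR", "ModelB", "ModelC", "Obs1"]
family = {"ModelA-LR": "A", "ModelA-HR": "A", "ModelB": "B", "ModelC": "C", "Obs1": "obs"}

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: true class, columns: predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def within_family_share(cm, classes, family):
    """Fraction of misclassified samples whose predicted class shares the true class's family."""
    wrong = same = 0
    for i, ci in enumerate(classes):
        for j, cj in enumerate(classes):
            if i != j:
                wrong += cm[i, j]
                if family[ci] == family[cj]:
                    same += cm[i, j]
    return same / wrong if wrong else 0.0

# Made-up predictions for ten test maps (integer indices into `classes`).
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
y_pred = np.array([0, 1, 1, 0, 2, 2, 3, 2, 4, 4])

cm = confusion_matrix(y_true, y_pred, len(classes))
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(f"overall accuracy: {accuracy:.0%}")                                                  # → 70%
print(f"within-family share of errors: {within_family_share(cm, classes, family):.0%}")    # → 67%
```

Two of the three toy errors swap the two "ModelA" variants, mirroring the kind of within-family confusion described above.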
The authors plan a follow-up study that aims to analyze the origin of classification skill in more detail, using explainable ML techniques and domain-specific approaches from Climate Sciences. Specifically, the follow-up study will investigate the coupling of atmosphere and ocean, the surface energy balance in models, and targeted masking of regions to understand dependencies in model performance.
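The masking approach of the planned follow-up study is not described in detail here, so the following is only a generic occlusion-style sketch of the idea: a region of every map is set to zero anomaly and the resulting change in accuracy is measured. A simple nearest-centroid classifier stands in for a trained CNN, and the data, the informative band of grid rows, and the helper names (`predict`, `accuracy_with_mask`) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_lat, n_lon, n = 8, 16, 300

# Toy data: two datasets that differ only in grid rows 0-1 (a hypothetical informative band).
signal = np.zeros((n_lat, n_lon))
signal[0:2, :] = 1.0
X = np.concatenate([rng.normal(0.0, 1.0, (n, n_lat, n_lon)),
                    rng.normal(0.0, 1.0, (n, n_lat, n_lon)) + signal])
y = np.repeat([0, 1], n)

# Stand-in classifier (nearest class mean), used here instead of a trained CNN.
centroids = np.stack([X[y == k].mean(axis=0) for k in (0, 1)])

def predict(maps):
    """Assign each map to the class whose mean map is closest in squared distance."""
    dist = ((maps[:, None] - centroids[None]) ** 2).sum(axis=(2, 3))
    return dist.argmin(axis=1)

def accuracy_with_mask(row_slice):
    """Set a band of grid rows to zero anomaly in every map and re-evaluate."""
    masked = X.copy()
    masked[:, row_slice, :] = 0.0
    return np.mean(predict(masked) == y)

acc_full = np.mean(predict(X) == y)
print(f"no mask:       {acc_full:.2f}")
print(f"mask rows 0-1: {accuracy_with_mask(slice(0, 2)):.2f}")
print(f"mask rows 4-5: {accuracy_with_mask(slice(4, 6)):.2f}")
```

The toy also exposes a caveat of this kind of masking: filling a region with zeros is not neutral. Masking the informative band pushes every map toward the class whose mean is zero there, so accuracy drops to roughly chance, while masking an uninformative band leaves accuracy essentially unchanged.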
If you are interested in working with this method, feel free to do so! The researchers have made the code used in the paper freely available on GitHub.
References:
Brunner, L., & Sippel, S. (2023). Identifying climate models based on their daily output using machine learning. Environmental Data Science, 2. https://doi.org/10.1017/eds.2023.23