How important is causality in data science?

On the trail of the causal connections of climate change

Measurement methods are becoming ever more sophisticated, research questions more complex, and data sources and types more diverse. The amount of data has grown by leaps and bounds over the past few years, especially in the natural sciences. Gaining new scientific knowledge about how the climate is developing is hardly possible without computer science and data science. For this reason, a team from atmospheric research and data science at the German Aerospace Center (DLR) has joined forces with climate researchers from Imperial College London to develop a new approach. Their goal: to use machine learning methods to better check whether climate models are suitable as tools for climate research and forecasting.

Computer models - the basis for a better understanding of weather and climate

Climate models are currently being developed in more than 40 research centers around the world. They are part of the Coupled Model Intercomparison Project Phase 6 (CMIP6) of the World Climate Research Programme, currently headed by the DLR Institute for Atmospheric Physics. Climate models are used to calculate how climate change will affect the rise in global temperature and regional trends such as the amount of precipitation. They are therefore an important instrument for supporting decision-makers in government, public planning and business.

Using machine learning for causal climate model evaluation

In every science it is very difficult to determine whether a change in one parameter is actually the cause of a change in another. Reliable analyses and forecasts of climate development, however, depend on exactly such findings: statements about cause-effect relationships, that is, causality. A correlation, the joint occurrence of phenomena, does not by itself allow causality to be inferred. Causality is highly complex, and so-called causal inference is a sub-discipline of machine learning within the broad research field of artificial intelligence. Under the direction of Dr. Jakob Runge, the DLR Institute for Data Science is developing modern methods of causal inference for Earth system research.
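Why a correlation alone does not establish causality can be illustrated with a small sketch. It assumes synthetic data and hypothetical variable names (x, y, z) that do not come from the study: a common driver z influences both x and y, which have no direct link to each other. The helper partial_correlation is likewise an illustrative assumption and not the method used by the researchers.

```python
import numpy as np

def partial_correlation(x, y, z):
    """Correlation of x and y after linearly removing the influence of z."""
    design = np.column_stack([np.ones_like(z), z])
    # Residuals of x and y after a least-squares fit on z (plus intercept).
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

rng = np.random.default_rng(seed=0)
n = 5000
z = rng.normal(size=n)            # hypothetical common driver
x = 0.8 * z + rng.normal(size=n)  # x depends on z only
y = 0.8 * z + rng.normal(size=n)  # y depends on z only, not on x

plain = np.corrcoef(x, y)[0, 1]
partial = partial_correlation(x, y, z)
print(f"correlation: {plain:.2f}, partial correlation given z: {partial:.2f}")
```

In this toy setting x and y are clearly correlated, yet the association largely disappears once the common driver is accounted for. Causal inference methods build on conditional tests of this kind, although real climate data call for far more careful treatment than this linear example.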

Cause and effect in climate dynamics - the digital fingerprint

"Based on a causal algorithm developed at our institute, we have worked with the Grantham Institute at Imperial College London and the DLR Institute for Atmospheric Physics to develop a method for evaluating climate models that creates fingerprints of different climate models. Each fingerprint characterizes a causal network, i.e. cause-effect relationships," explains Dr. Jakob Runge. "We can create such fingerprints on the basis of real measurement data and compare them with modeled fingerprints. With this method, climate models can be evaluated and uncertainties in precipitation and other climate projections can be reduced in the future."

The authors of the study "Causal networks for climate model evaluation and constrained projections", which recently appeared in Nature Communications, were able to deduce from such comparisons how well climate models depict reality. They found that models that more accurately reproduce the causal networks - the "fingerprints" - of real observational data also better model the precipitation patterns for different regions of the world, such as densely populated areas in Africa, Europe, North America or East Asia. In contrast, the researchers found that simpler methods for evaluating climate models do not allow this conclusion in the same way. The results show how new, data-driven machine learning methods can be used to evaluate climate models - and, in the future, to improve understanding of climate change.
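The idea of comparing an observed fingerprint with modeled ones can be sketched in a few lines. The sketch assumes that each fingerprint is reduced to a binary matrix of causal links and that similarity is measured with an F1 score. The matrices, the names model_a and model_b, and the function f1_score are made up for illustration and are not data or code from the study.

```python
import numpy as np

def f1_score(reference, candidate):
    """F1 score of the candidate's links measured against the reference links."""
    reference = np.asarray(reference, dtype=bool)
    candidate = np.asarray(candidate, dtype=bool)
    true_pos = np.sum(reference & candidate)    # links present in both
    false_pos = np.sum(~reference & candidate)  # links only in the candidate
    false_neg = np.sum(reference & ~candidate)  # links the candidate misses
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 1.0

# Entry [i, j] == 1 means "component i causally influences component j".
observed = np.array([[0, 1, 0],
                     [0, 0, 1],
                     [1, 0, 0]])
model_a = np.array([[0, 1, 0],
                    [0, 0, 1],
                    [0, 0, 0]])  # misses one observed link
model_b = np.array([[0, 0, 1],
                    [1, 0, 0],
                    [0, 1, 0]])  # shares no link with the observations

scores = {
    "model_a": f1_score(observed, model_a),
    "model_b": f1_score(observed, model_b),
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Here the hypothetical model_a ranks above model_b because its causal links overlap more with the observed network. A ranking of this kind is what would then be set against how well each model reproduces precipitation patterns.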