Why is predicting protein function important?

Press release: What artificial intelligence reveals about proteins

Alexa, Siri and Google Assistant have long since moved into everyday life as intelligent machine companions. Intelligent computer programs, known as algorithms, have also become indispensable in science. With their help, the large amounts of data generated in life science research can be efficiently searched for recurring patterns. For example, certain programs can recognize which recurring structures occur in large protein molecules and infer from them which tasks the proteins perform in cells: whether they act as a gene switch, molecular motor or signal molecule. The predictions that such algorithms make on the basis of protein sequences, i.e. the order of protein building blocks strung together like pearls on a chain, are now astonishingly accurate.

A decisive disadvantage of previous methods, however, is that users have no way of understanding why the algorithm assigns a certain function to a given protein sequence. The computer's precise knowledge about proteins cannot be accessed directly, although this knowledge would be of great value for research and for drug development.

A student team led by Roland Eils and Irina Lehmann from the Berlin Institute of Health (BIH) and Charité – Universitätsmedizin Berlin, together with their colleague Dominik Niopek from the Institute for Pharmacy and Molecular Biotechnology (IPMB) at the University of Heidelberg, has set itself the goal of eliciting this knowledge from computers. The team has been working on this topic since 2017 and developed the "DeeProtein" algorithm, a comprehensive, intelligent neural network that can predict the function of proteins from the sequence of their individual building blocks, the amino acids. Like most learning algorithms, DeeProtein is a "black box": the way it works is hidden from both developers and users. But with a trick, the students have now managed to coax this secret out of the network.

First, the young scientists developed a way of looking over the program's shoulder while it works: "In the sensitivity analysis, we cover each individual position in the protein sequence one after the other and let DeeProtein calculate, or rather predict, the function of the protein from this incomplete information," explains Julius Upmeier zu Belzen. He is a student in the master's program in Molecular Biotechnology at the IPMB and first author of the publication that has just appeared in the journal "Nature Machine Intelligence"*. "Then we give DeeProtein the complete sequence information and compare the two predictions. In this way we calculate for each individual position in the protein sequence how important it is for the correct prediction of the function. This means that we give each position, or amino acid, within the protein chain a sensitivity value for protein function."
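The procedure described in the quote can be illustrated with a minimal sketch. It is not the DeeProtein code: the predictor `toy_predict`, its motif, the mask symbol `X` and the use of a simple score difference as the sensitivity value are all assumptions chosen for illustration. The sketch only shows the general idea of covering one position at a time and comparing the prediction with the one made from the complete sequence.

```python
from typing import Callable, List

MASK = "X"  # assumed placeholder symbol for a covered position

# Hypothetical motif: positions (0-based) and residues the toy model "looks for".
TOY_MOTIF = {2: "H", 5: "C", 6: "C"}


def toy_predict(sequence: str) -> float:
    """Hypothetical stand-in for a trained network.

    Returns a score between 0 and 1 for a single protein function,
    here simply the fraction of motif positions that match.
    """
    hits = sum(
        1
        for pos, residue in TOY_MOTIF.items()
        if pos < len(sequence) and sequence[pos] == residue
    )
    return hits / len(TOY_MOTIF)


def sensitivity_profile(
    sequence: str,
    predict: Callable[[str], float],
    mask: str = MASK,
) -> List[float]:
    """Cover each position in turn and record how much the score drops."""
    full_score = predict(sequence)  # prediction from the complete sequence
    profile = []
    for i in range(len(sequence)):
        masked = sequence[:i] + mask + sequence[i + 1:]  # cover position i
        profile.append(full_score - predict(masked))  # compare the two predictions
    return profile


sequence = "MKHLACCGT"
profile = sensitivity_profile(sequence, toy_predict)
for position, (residue, value) in enumerate(zip(sequence, profile)):
    print(f"{position}\t{residue}\t{value:.3f}")
```

In this toy setting only the three motif positions receive a non-zero value, because covering any other position leaves the prediction unchanged. With a real network, the values would vary along the whole chain.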

The scientists then used this new analytical method to identify those regions in proteins that are crucial for their function. This worked for signal proteins that play a role in cancer development, as well as for the CRISPR-Cas9 gene scissors, which are already being tested in numerous preclinical and clinical studies. "With the sensitivity analysis, we can identify regions in proteins that tolerate changes better or less well. This is an important first step if we want to change proteins in a targeted manner in order to give them new functions or to eliminate undesirable properties," says Dominik Niopek.

"With this work we show that not only the predictions of neural networks can be helpful, but that we can now also use their implicit knowledge in practice for the first time," explains Roland Eils. This approach is relevant for many questions in molecular biology and medicine. "If we want to develop targeted drugs or gene therapies, for example, we need to know exactly where to start. DeeProtein can now support us in this."

* Upmeier zu Belzen et al. (2019): Leveraging implicit knowledge in neural networks for functional dissection and engineering of proteins. Nature Machine Intelligence.

DOI: 10.1038/s42256-019-0049-9