AlphaFold creates a 3D view of the protein universe



AlphaFold predicts the structure of almost every cataloged protein known to science. Credit: Karen Arnott/EMBL-EBI

DeepMind and EMBL's European Bioinformatics Institute (EMBL-EBI) have used AI to predict the three-dimensional structures of almost every cataloged protein known to science. The catalog is freely and openly available to the scientific community via the AlphaFold Protein Structure Database.

The two organizations hope that the expanded database will further improve our understanding of biology and help countless more scientists in their work to tackle global challenges.

With this milestone, the database has expanded roughly 200-fold, growing from nearly 1 million protein structures to over 200 million, and now covers almost every organism on Earth whose genome has been sequenced. The expanded database includes predicted structures for a wide range of species, including plants, bacteria, animals and other organisms. This opens new avenues of research in the life sciences that could help address global challenges such as sustainability, food insecurity and neglected diseases.

A predicted structure will now be available for virtually every protein sequence in the UniProt protein database. The release will also support bioinformatics and computational work by allowing scientists to look for patterns and trends across the entire database.

“AlphaFold now offers a 3D view of the protein universe,” said Edith Heard, Director General of EMBL. “The popularity and growth of the AlphaFold database testifies to the success of the collaboration between DeepMind and EMBL. It shows us a glimpse of the power of multidisciplinary science.”

“We have been amazed at the speed at which AlphaFold has already become an indispensable tool for hundreds of thousands of scientists in laboratories and universities around the world,” said Demis Hassabis, founder and CEO of DeepMind. “From fighting disease to tackling plastic pollution, AlphaFold has already made an incredible impact on some of our biggest global challenges. We hope that this expanded database will support countless more scientists in their important work and open entirely new avenues of scientific discovery.”


Q8W3K0: a potential plant disease resistance protein. Photo credit: AlphaFold

An indispensable tool for scientists

DeepMind and EMBL-EBI launched the AlphaFold database in July 2021. At the time, it contained more than 350,000 protein structure predictions, including the entire human proteome. Subsequent updates added UniProtKB/Swiss-Prot and 27 new proteomes, 17 of which are from organisms associated with neglected tropical diseases that continue to devastate the lives of more than 1 billion people worldwide.

In just over a year, more than 1,000 scholarly papers have cited the database, and more than 500,000 researchers from over 190 countries have used it to view over two million structures.

The team has also seen researchers build on top of AlphaFold to develop and adapt tools such as Foldseek and Dali, which let users search for entries that resemble a specific protein. Others have taken the core machine learning ideas behind AlphaFold and used them as the backbone of new algorithms in the field, or applied them to areas such as predicting RNA structures and developing new models for protein design.

Implications and future of AlphaFold and the database

AlphaFold has also contributed to work on fighting plastic pollution, gaining insight into Parkinson's disease, improving honey bee health, understanding ice formation, tackling neglected diseases such as Chagas' disease and leishmaniasis, and researching human evolution.

“We released AlphaFold in the hope that other teams could learn from and build on our progress, and it was exciting to see that happening so quickly. Many other AI research organizations have now entered the field and are building on AlphaFold’s advances to create further breakthroughs. This is truly a new era in structural biology, and AI-based methods will bring incredible advances,” said John Jumper, research scientist and leader of AlphaFold at DeepMind.

"AlphaFold has been sending waves through the molecular biology community. In the past year alone there have been over a thousand scholarly articles on a wide range of research topics using AlphaFold structures; I've never seen anything like it," said Sameer Velankar, team leader of the Protein Data Bank in Europe (PDBe) at EMBL-EBI. "And that's just the impact of a million predictions. Imagine the impact of having over 200 million protein structure predictions openly available in the AlphaFold database."

DeepMind and EMBL-EBI will continue to update the database regularly with the aim of improving features and functionality in response to user feedback. Access to structures will continue to be fully open under a CC-BY 4.0 license, and bulk downloads will be made available through Google Cloud Public Datasets.
