Here is the announcement from the European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL-EBI), “DeepMind and EMBL release the most complete database of predicted 3D structures of human proteins”.
22 July 2021, London – DeepMind today announced its partnership with the European Molecular Biology Laboratory (EMBL), Europe’s flagship laboratory for the life sciences, to make the most complete and accurate database yet of predicted protein structure models for the human proteome. This will cover all ~20,000 proteins expressed by the human genome, and the data will be freely and openly available to the scientific community. The database and artificial intelligence system provide structural biologists with powerful new tools for examining a protein’s three-dimensional structure, and offer a treasure trove of data that could unlock future advances and herald a new era for AI-enabled biology.
Last week, the methodology behind the latest highly innovative version of AlphaFold, the sophisticated AI system announced last December that powers these structure predictions, and its open-source code were published in Nature. Today’s announcement coincides with a second Nature paper that provides the fullest picture of proteins that make up the human proteome, and the release of 20 additional organisms that are important for biological research.
In addition to the human proteome, the database launches with ~350,000 structures, including those for 20 biologically significant organisms such as E. coli, fruit fly, mouse, zebrafish, malaria parasite and tuberculosis bacteria. These organisms have been the subject of countless research papers and numerous major breakthroughs. These structures will enable researchers across a huge variety of fields – from neuroscience to medicine – to accelerate their work.
The database and system will be periodically updated as we continue to invest in future improvements to AlphaFold, and over the coming months we plan to vastly expand the coverage to almost every sequenced protein known to science – over 100 million structures covering most of the UniProt reference database.