The Pfam protein families database: embracing AI/ML

Detailed description

Bibliographic details
Published in: Nucleic Acids Research, 2024-11, Vol. 53 (D1), p. D523-D534
Main authors: Paysan-Lafosse, Typhaine; Andreeva, Antonina; Blum, Matthias; Chuguransky, Sara Rocio; Grego, Tiago; Pinto, Beatriz Lazaro; Salazar, Gustavo A; Bileschi, Maxwell L; Llinares-López, Felipe; Meng-Papaxanthos, Laetitia; Colwell, Lucy J; Grishin, Nick V; Schaeffer, R Dustin; Clementel, Damiano; Tosatto, Silvio C E; Sonnhammer, Erik; Wood, Valerie; Bateman, Alex
Format: Article
Language: English
Online access: Full text
Description
Summary: The Pfam protein families database is a comprehensive collection of protein domains and families used for genome annotation and protein structure and function analysis (https://www.ebi.ac.uk/interpro/). This update describes major developments in Pfam since 2020, including decommissioning the Pfam website and integration with InterPro, harmonization with the ECOD structural classification, and expanded curation of metagenomic, microprotein and repeat-containing families. We highlight how AlphaFold structure predictions are being leveraged to refine domain boundaries and identify new domains. New families discovered through large-scale sequence similarity analysis of AlphaFold models are described. We also detail the development of Pfam-N, which uses deep learning to expand family coverage, achieving an 8.8% increase in UniProtKB coverage compared to standard Pfam. We discuss plans for more frequent Pfam releases integrated with InterPro and the potential for artificial intelligence to further assist curation. Despite recent advances, many protein families remain to be classified, and Pfam continues working toward comprehensive coverage of the protein universe.
ISSN: 0305-1048, 1362-4962
DOI: 10.1093/nar/gkae997