Examining evolutionary scale modeling‐derived different‐dimensional embeddings in the antimicrobial peptide classification through a KNIME workflow

Bibliographic details
Published in: Protein Science 2024-04, Vol. 33 (4), p. e4928-n/a
Main authors: Martínez‐Mauricio, Karla L., García‐Jacas, César R., Cordoves‐Delgado, Greneter
Format: Article
Language: English
Description
Abstract: Molecular features play an important role in various bio‐chem‐informatics tasks, such as Quantitative Structure–Activity Relationship (QSAR) modeling. Several pre‐trained models have recently been created for use in downstream tasks, either by fine‐tuning a specific model or by extracting features to feed traditional classifiers. In this regard, a new family of Evolutionary Scale Modeling models (termed ESM‐2 models) was recently introduced, demonstrating outstanding results in protein structure prediction benchmarks. Herein, we studied the usefulness of the different‐dimensional embeddings derived from the ESM‐2 models for classifying antimicrobial peptides (AMPs). To this end, we built a KNIME workflow that applies the same modeling methodology across all experiments to guarantee fair analyses. As a result, the 640‐ and 1280‐dimensional embeddings derived from the 30‐ and 33‐layer ESM‐2 models, respectively, are the most valuable, since the QSAR models built from them achieved statistically better performances. We also fused features of the different ESM‐2 models and concluded that the fusion yields better QSAR models than using features of a single ESM‐2 model. Frequency studies revealed that only a portion of the ESM‐2 embeddings is valuable for modeling tasks, since between 43% and 66% of the features were never used. Comparisons with state‐of‐the‐art deep learning (DL) models confirm that, when methodologically principled studies are performed in the prediction of AMPs, non‐DL based QSAR models yield performances comparable or superior to those of DL‐based QSAR models. The developed KNIME workflow is freely available at https://github.com/cicese-biocom/classification-QSAR-bioKom. This workflow can help avoid unfair comparisons of new computational methods, as well as support the proposal of new non‐DL based QSAR models.
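As a rough illustration of two feature-level ideas in the abstract (fusing embeddings from several ESM‐2 models, and measuring how many embedding features are never used), the following minimal Python sketch works on plain lists. It is not the authors' KNIME workflow. Reducing per-residue vectors by mean pooling, fusing by simple concatenation, the helper names `mean_pool`, `fuse`, and `never_used_fraction`, and the model keys `esm2_t30`/`esm2_t33` are all hypothetical assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence, Set


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length peptide vector.

    Mean pooling is an assumption here; the abstract does not state how
    per-residue ESM-2 outputs were reduced to one vector per peptide.
    """
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[j] for row in residue_embeddings) / n for j in range(dim)]


def fuse(per_model: Dict[str, List[float]]) -> List[float]:
    """Hypothetical fusion: concatenate pooled vectors from several models.

    Keys are sorted so every peptide gets the same feature order.
    """
    fused: List[float] = []
    for name in sorted(per_model):
        fused.extend(per_model[name])
    return fused


def never_used_fraction(dim: int, selected_sets: List[Set[int]]) -> float:
    """Fraction of the `dim` feature indices that no trained model ever selected."""
    used: Set[int] = set().union(*selected_sets)
    return (dim - len(used)) / dim


if __name__ == "__main__":
    print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
    print(fuse({"esm2_t30": [0.1, 0.2], "esm2_t33": [0.3]}))  # [0.1, 0.2, 0.3]
    print(never_used_fraction(10, [{0, 1, 2}, {2, 3}]))  # 0.6
```

With real data, the pooled vectors would be 640‐ or 1280‐dimensional (for the 30‐ and 33‐layer models), and `selected_sets` would hold the feature indices retained by each trained QSAR model. The 43% to 66% figure reported in the abstract comes from the authors' experiments and is not reproduced by this toy example.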
ISSN: 0961-8368, 1469-896X
DOI: 10.1002/pro.4928