Examining evolutionary scale modeling‐derived different‐dimensional embeddings in the antimicrobial peptide classification through a KNIME workflow
Published in: | Protein science 2024-04, Vol.33 (4), p.e4928-n/a |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Molecular features play an important role in bio‐chem‐informatics tasks such as Quantitative Structure–Activity Relationship (QSAR) modeling. Several pre‐trained models have recently been created for use in downstream tasks, either by fine‐tuning a specific model or by extracting features to feed traditional classifiers. In this regard, a new family of Evolutionary Scale Modeling models (termed ESM‐2 models) was recently introduced, demonstrating outstanding results in protein structure prediction benchmarks. Herein, we studied the usefulness of the different‐dimensional embeddings derived from the ESM‐2 models for classifying antimicrobial peptides (AMPs). To this end, we built a KNIME workflow that applies the same modeling methodology across experiments in order to guarantee fair analyses. As a result, the 640‐ and 1280‐dimensional embeddings derived from the 30‐ and 33‐layer ESM‐2 models, respectively, are the most valuable, since the QSAR models built from them achieved statistically better performance. We also fused features of the different ESM‐2 models and concluded that fusion yields better QSAR models than using features of a single ESM‐2 model. Frequency studies revealed that only a portion of the ESM‐2 embeddings is valuable for modeling tasks, since between 43% and 66% of the features were never used. Comparisons with state‐of‐the‐art deep learning (DL) models confirm that, in methodologically principled studies of AMP prediction, non‐DL‐based QSAR models perform comparably to or better than DL‐based QSAR models. The developed KNIME workflow is freely available at https://github.com/cicese-biocom/classification-QSAR-bioKom. This workflow can help avoid unfair comparisons of new computational methods and support the proposal of new non‐DL‐based QSAR models. |
---|---|
ISSN: | 0961-8368 1469-896X |
DOI: | 10.1002/pro.4928 |
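
The summary mentions two operations that a small sketch may make more concrete: fusing embeddings from different ESM‐2 models, and measuring what share of features is never used. The Python below is only an illustration under stated assumptions, not the authors' KNIME workflow. It assumes that "fusion" means plain concatenation of per‐model vectors, and that a feature counts as "used" when at least one model's selected subset contains it; the summary specifies neither. The names `DIMS`, `fuse`, and `never_used_fraction` are hypothetical, and the vectors and subsets are placeholders rather than real data.

```python
from collections import Counter

# Embedding widths stated in the summary:
# 30-layer ESM-2 -> 640 dimensions, 33-layer ESM-2 -> 1280 dimensions.
# The dictionary keys are hypothetical labels, not official checkpoint names.
DIMS = {"esm2_30_layer": 640, "esm2_33_layer": 1280}


def fuse(embeddings_by_model):
    """Concatenate one peptide's per-model vectors into one fused vector.

    Assumption: fusion is simple concatenation, in sorted key order.
    """
    fused = []
    for name in sorted(embeddings_by_model):
        fused.extend(embeddings_by_model[name])
    return fused


def never_used_fraction(n_features, selected_per_model):
    """Share of feature indices that appear in no model's selected subset."""
    counts = Counter(i for selected in selected_per_model for i in set(selected))
    return (n_features - len(counts)) / n_features


# Placeholder vectors (all zeros) standing in for real ESM-2 embeddings.
peptide = {name: [0.0] * dim for name, dim in DIMS.items()}
fused = fuse(peptide)
print(len(fused))  # 1920 = 640 + 1280

# Hypothetical feature subsets chosen by three models over 10 features.
subsets = [[0, 1, 2], [2, 3], [0, 7]]
print(never_used_fraction(10, subsets))  # 0.5 (indices 4, 5, 6, 8, 9 unused)
```

In a real analysis the placeholder vectors would be replaced by embeddings extracted from the ESM‐2 models, and the subsets by the features actually retained by each trained QSAR model.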