Biologically weighted LASSO: enhancing functional interpretability in gene expression data analysis

Abstract Motivation Feature selection approaches are widely used in gene expression data analysis to identify the most relevant features and boost performance in regression and classification tasks. However, such algorithms solely consider each feature’s quantitative contribution to the task, possib...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Bioinformatics (Oxford, England) England), 2024-10, Vol.40 (10)
Hauptverfasser: Mongardi, Sofia, Cascianelli, Silvia, Masseroli, Marco
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Abstract Motivation Feature selection approaches are widely used in gene expression data analysis to identify the most relevant features and boost performance in regression and classification tasks. However, such algorithms solely consider each feature’s quantitative contribution to the task, possibly limiting the biological interpretability of the results. Feature-related prior knowledge, such as functional annotations and pathways information, can be incorporated into feature selection algorithms to potentially improve model performance and interpretability. Results We propose an embedded integrative approach to feature selection that combines weighted LASSO feature selection and prior biological knowledge in a single step, by means of a novel score of biological relevance that summarizes information extracted from popular biological knowledge bases. Findings from the performed experiments indicate that our proposed approach is able to identify the most predictive genes while simultaneously enhancing the biological interpretability of the results compared to the standard LASSO regularized model. Availability and implementation Code is available at https://github.com/DEIB-GECO/GIS-weigthed_LASSO.
ISSN:1367-4811
1367-4803
1367-4811
DOI:10.1093/bioinformatics/btae605