Elliptic geometry-based kernel matrix for improved biological sequence classification


Bibliographic details
Published in: Knowledge-based systems 2024-11, Vol.304, p.112479, Article 112479
Main authors: Ali, Sarwan; Shabbir, Madiha; Mansoor, Haris; Chourasia, Prakash; Patterson, Murray
Format: Article
Language: eng
Online access: Full text
Description
Summary: Protein sequence classification plays a pivotal role in bioinformatics as it enables the comprehension of protein functions and their involvement in diverse biological processes. While numerous machine learning models have been proposed to tackle this challenge, traditional approaches face limitations in capturing the intricate relationships and hierarchical structures inherent in genomic sequences. These limitations stem from operating within high-dimensional non-Euclidean spaces. To address this issue, we introduce the application of the elliptic geometry-based approach for protein sequence classification. First, we transform the problem into elliptic geometry and integrate it with the Gaussian kernel to map the problem into the Mercer kernel. The Gaussian-Elliptic approach allows for the implicit mapping of data into a higher-dimensional feature space, enabling the capture of complex nonlinear relationships. This feature becomes particularly advantageous when dealing with hierarchical or tree-like structures commonly encountered in biological sequences. Experimental results highlight the effectiveness of the proposed model in protein sequence classification, showcasing the advantages of utilizing elliptic geometry in bioinformatics analyses. The model outperforms state-of-the-art methods by achieving 76% and 84% accuracies for DNA and protein datasets, respectively. Furthermore, we provide theoretical justifications for the proposed model. This study contributes to the burgeoning field of geometric deep learning, offering insights into the potential applications of elliptic representations in the analysis of biological data.
ISSN:0950-7051
DOI:10.1016/j.knosys.2024.112479
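
The summary describes combining an elliptic-geometry distance with a Gaussian kernel but does not give the formula. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each sequence has already been turned into a numeric vector, that vectors are projected onto the unit sphere with antipodal points identified (so the distance is arccos(|<x, y>|)), and that the kernel is exp(-d^2 / (2 sigma^2)). The function names and the `sigma` parameter are hypothetical. Note that a Gaussian of a geodesic distance on a curved space is not guaranteed to be positive semidefinite in general, so the Mercer property claimed in the summary depends on the paper's actual construction, which is not reproduced here.

```python
import numpy as np


def elliptic_distance(X, Y):
    """Pairwise elliptic distance between the rows of X and the rows of Y.

    Assumption: rows are non-zero vectors. Each row is scaled to unit length,
    and antipodal points are treated as identical, so the distance is
    arccos(|<x, y>|), which lies in [0, pi/2].
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Clip guards against rounding errors pushing the cosine slightly above 1.
    cos = np.clip(np.abs(Xn @ Yn.T), 0.0, 1.0)
    return np.arccos(cos)


def gaussian_elliptic_kernel(X, Y, sigma=1.0):
    """Gaussian function of the elliptic distance (hypothetical form)."""
    D = elliptic_distance(X, Y)
    return np.exp(-(D ** 2) / (2.0 * sigma ** 2))


# Random vectors stand in for numeric sequence embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
K = gaussian_elliptic_kernel(X, X, sigma=0.5)
```

A matrix such as `K` could then be handed to any classifier that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`, provided the matrix is checked for positive semidefiniteness first.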