Optimal modeling of anti-breast cancer candidate drugs screening based on multi-model ensemble learning with imbalanced data

Bibliographic details
Published in:	Mathematical biosciences and engineering : MBE 2023-01, Vol.20 (3), p.5117-5134
Main authors:	Zhou, Juan; Li, Xiong; Ma, Yuanting; Wu, Zejiu; Xie, Ziruo; Zhang, Yuqi; Wei, Yiming
Format:	Article
Language:	English
Subjects:	ADMET; Breast Neoplasms - drug therapy; Early Detection of Cancer; ensemble algorithm; estrogen receptor; feature selection; Female; Humans; imbalanced data; Linear Models; Machine Learning
Online access:	Full text

Description
Summary:	Imbalanced data can seriously bias machine learning models, which leads to false positives when screening therapeutic drugs for breast cancer. To address this problem, a multi-model ensemble framework combining tree-based, linear and deep-learning models is proposed. Using the methodology constructed in this study, we screened the 20 most critical molecular descriptors from the 729 molecular descriptors of 1974 anti-breast cancer drug candidates. To assess the pharmacokinetic properties and safety of the drug candidates, the screened molecular descriptors were then used for subsequent prediction tasks, including bioactivity, absorption, distribution, metabolism, excretion and toxicity. The results show that the constructed method is superior to, and more stable than, the individual models used in the ensemble approach.
ISSN:	1551-0018
DOI:	10.3934/mbe.2023237
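
The summary above describes selecting the most critical molecular descriptors with a multi-model ensemble, but this record does not say how the individual models are combined. The following is a minimal sketch of one common option, averaging per-model importance ranks, and is not presented as the authors' method. The descriptor names (`d1` to `d4`), the scores, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def rank_features(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each feature to its rank under one model (1 = most important).

    Ties are broken alphabetically so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def ensemble_top_k(model_scores: List[Dict[str, float]], k: int) -> List[str]:
    """Average the per-model ranks and return the k best-ranked features.

    Assumes every model scored the same set of features.
    """
    features = sorted(model_scores[0])
    per_model_ranks = [rank_features(scores) for scores in model_scores]
    mean_rank = {
        feature: sum(ranks[feature] for ranks in per_model_ranks) / len(per_model_ranks)
        for feature in features
    }
    return sorted(features, key=lambda feature: (mean_rank[feature], feature))[:k]


# Hypothetical importance scores from three model families
# (tree-based, linear, deep learning); the values are made up.
tree_scores = {"d1": 0.50, "d2": 0.30, "d3": 0.15, "d4": 0.05}
linear_scores = {"d1": 0.20, "d2": 0.60, "d3": 0.05, "d4": 0.15}
deep_scores = {"d1": 0.40, "d2": 0.35, "d3": 0.05, "d4": 0.20}

selected = ensemble_top_k([tree_scores, linear_scores, deep_scores], k=2)
print(selected)
```

Rank averaging is used here because importance scores from different model families are not on a common scale, whereas ranks are directly comparable. In the setting of the summary, `k` would be 20 and each dictionary would hold 729 descriptors.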