TCM2Vec: a detached feature extraction deep learning approach of traditional Chinese medicine for formula efficacy prediction

Bibliographic Details
Published in: Multimedia tools and applications 2023-07, Vol.82 (17), p.26987-27004
Main Authors: Gao, Wanqing; Cheng, Ning; Xin, Guojiang; Khantong, Sommai; Ding, Changsong
Format: Article
Language: English
Online Access: Full text
Description
Summary: In the current era, the intelligent development of traditional Chinese medicine (TCM) has attracted increasing attention. As the main carrier of clinical medication, formulas use synergies of active substances to enhance efficacy and reduce side effects. Related studies show that there is a nonlinear relationship between the efficacy of formulas and herbs. Deep learning is an effective technique for fitting nonlinear relationships. However, applying a deep learning model directly works poorly because it ignores the characteristics of formulas. In this paper, we propose a detached feature extraction approach (TCM2Vec) based on deep learning for better feature extraction and efficacy prediction. We build two detached encoders: one uses a cross-feature-based unsupervised pre-training model (FMh2v) to extract the relationship features of herbal medicines for initialization, while the other simulates multi-dimensional characteristics of medicines with a normal distribution. We then integrate relationships and medicinal characteristics for deep feature extraction. We processed 31,114 unlabeled formulas for pre-training and two in-domain classification tasks for prediction and fine-tuning. One task is multi-class with 1,036 formulas; the other is multi-label with 1,723 formulas. For labelled formulas, different feature extraction models based on the detached encoder are trained to predict efficacy. Compared with the no-pre-training, CBOW, and BERT baseline models, FMh2v leads to performance gains. Moreover, the detached encoder has a large positive effect across the different efficacy prediction models: ACC increased by 5.80% on average and F1 increased by 12.06% on average. Overall, the proposed feature extraction is an effective method for obtaining characteristic representations of TCM formulas, and it provides a reference for the adaptability of artificial intelligence technology in the domain of TCM.
ISSN:1380-7501
1573-7721
DOI:10.1007/s11042-023-14701-w
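
The two detached encoders described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the herb names, the toy "pretrained" vectors standing in for FMh2v output, the vector sizes, and the use of mean pooling plus concatenation to integrate the two views are all assumptions made for illustration. The summary only states that one encoder is initialized from pre-trained relationship features, that the other is simulated with a normal distribution, and that the two are then integrated.

```python
import random


def init_relation_table(pretrained, herbs):
    """Copy pretrained vectors (a stand-in for FMh2v output) into a lookup table."""
    return {h: list(pretrained[h]) for h in herbs}


def init_property_table(herbs, dim, seed=0, mean=0.0, std=1.0):
    """Sample one vector per herb from a normal distribution (assumed N(0, 1))."""
    rng = random.Random(seed)
    return {h: [rng.gauss(mean, std) for _ in range(dim)] for h in herbs}


def mean_pool(table, formula):
    """Average the vectors of all herbs in a formula (an assumed pooling choice)."""
    vectors = [table[h] for h in formula]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def encode_formula(formula, relation_table, property_table):
    """Encode with each table separately, then concatenate the two views."""
    return mean_pool(relation_table, formula) + mean_pool(property_table, formula)


# Hypothetical herbs and made-up vectors; real values would come from pre-training.
herbs = ["herb_a", "herb_b", "herb_c"]
pretrained = {"herb_a": [1.0, 0.0], "herb_b": [0.0, 1.0], "herb_c": [1.0, 1.0]}

relation_table = init_relation_table(pretrained, herbs)
property_table = init_property_table(herbs, dim=3)

features = encode_formula(["herb_a", "herb_b"], relation_table, property_table)
print(len(features))  # prints 5 (2 relationship dims + 3 property dims)
```

Keeping the two tables separate means the pretrained relationship vectors and the randomly initialized property vectors can be updated independently during fine-tuning. A downstream classifier would take `features` as input.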