Cross-document attention-based gated fusion network for automated medical licensing exam

Bibliographic details
Published in:	Expert systems with applications 2022-11, Vol.205, p.117588, Article 117588
Main authors:	Liu, Jiandong; Ren, Jianfeng; Lu, Zheng; He, Wentao; Cui, Menglin; Zhang, Zibo; Bai, Ruibin
Format:	Article
Language:	English
Subjects:	Clinical diagnosis; Machine reading comprehension; Multiple document reasoning
Online access:	Full text

Description
Abstract:	One application of machine learning in the medical industry is to automatically learn knowledge from medical textbooks and transfer that knowledge into diagnostic ability. Because of the complex nature of medical issues, the learning process usually requires multiple knowledge documents to form a comprehensive reasoning chain for diagnosis, which makes automatic learning more difficult. Existing models for multiple-document comprehension either concatenate the documents for inference or reason over each document independently. In this paper, we propose a Co-Attention-based Multi-document Inference (CAMI) framework for better reasoning over multiple documents. The proposed framework uses not only the attentional information among questions, answers and support documents but also the complementary attentional information across different documents. In addition, a gated fusion network is designed to fuse the cross-document information. The proposed model outperforms the state-of-the-art methods on ClinicQA, a Chinese National Medical Licensing Examination (CNMLE) dataset that contains 27,432 plain text documents and 13,827 CNMLE questions. We intend to make it publicly available as the first clinical OpenQA dataset.
• A CAMI framework is proposed to tackle OpenQA medical MRC tasks.
• The proposed CDCA can extract attentional information across documents.
• The proposed HGFN can dynamically fuse information from multiple documents.
• The proposed ClinicQA is the first public dataset to evaluate clinical diagnosis ability.
• The proposed method greatly outperforms SOTA OpenQA medical MRC models.
ISSN:	0957-4174 1873-6793
DOI:	10.1016/j.eswa.2022.117588