Chemical-protein Interaction Extraction via Gaussian Probability Distribution and External Biomedical Knowledge
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | Motivation: The biomedical literature contains a wealth of chemical-protein
interactions (CPIs). Automatically extracting CPIs described in biomedical
literature is essential for drug discovery, precision medicine, and basic
biomedical research. Most existing methods focus only on the sentence
sequence to identify these CPIs. However, the local structure of sentences and
external biomedical knowledge also contain valuable information. Effective use
of such information may improve the performance of CPI extraction. Results: In
this paper, we propose a novel neural network-based approach to improve CPI
extraction. Specifically, the approach first employs BERT to generate
high-quality contextual representations of the title sequence, instance
sequence, and knowledge sequence. Then, the Gaussian probability distribution
is introduced to capture the local structure of the instance. Meanwhile,
attention mechanisms are applied separately to fuse the title information and
the biomedical knowledge. Finally, the resulting representations are concatenated
and fed into the softmax function to extract CPIs. We evaluate our proposed
model on the CHEMPROT corpus. Our proposed model outperforms other
state-of-the-art models. The experimental results show that
the Gaussian probability distribution and external knowledge are complementary
to each other. Integrating them can effectively improve the CPI extraction
performance. Furthermore, the Gaussian probability distribution can effectively
improve the extraction performance on sentences with overlapping relations in
biomedical relation extraction tasks. Availability: Data and code are available
at https://github.com/CongSun-dlut/CPI_extraction. Contact: yangzh@dlut.edu.cn,
wangleibihami@gmail.com Supplementary information: Supplementary data are
available at Bioinformatics online. |
---|---|
DOI: | 10.48550/arxiv.1911.09487 |
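The abstract describes a data flow with four steps:

1. Contextual token vectors are produced for the instance, title and knowledge sequences.
2. A Gaussian probability distribution captures the local structure of the instance.
3. Attention separately fuses the title and the knowledge.
4. The pieces are concatenated and passed to a softmax.

Below is a minimal pure-Python sketch of that flow. It is not the authors' implementation; their code is in the linked repository. It rests on the following assumptions:

- Toy vectors stand in for BERT outputs.
- One Gaussian is centered on each entity mention position, and the sum is used to pool the instance tokens.
- The pooled instance vector is the attention query over title and knowledge tokens, with keys doubling as values.
- The classifier weights and all function names (`gaussian_weights`, `attend`, `cpi_probabilities`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def gaussian_weights(seq_len: int, centers: Sequence[int], sigma: float = 1.0) -> Vector:
    """Weight each token by its Gaussian proximity to the entity positions.

    Assumption: one Gaussian per entity mention, summed and then normalized
    so the weights form a probability distribution over token positions.
    """
    raw = [
        sum(math.exp(-((i - c) ** 2) / (2 * sigma ** 2)) for c in centers)
        for i in range(seq_len)
    ]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_sum(weights: Sequence[float], vectors: Sequence[Vector]) -> Vector:
    """Convex combination of vectors, one weight per vector."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(query: Vector, keys: Sequence[Vector]) -> Vector:
    """Scaled dot-product attention; assumption: keys also serve as values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return weighted_sum(softmax(scores), keys)


def classify(features: Vector, weight_rows: Sequence[Vector], bias: Sequence[float]) -> Vector:
    """Hypothetical linear layer followed by softmax over relation classes."""
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weight_rows, bias)
    ]
    return softmax(logits)


def cpi_probabilities(instance_tokens, entity_positions, title_tokens,
                      knowledge_tokens, weight_rows, bias, sigma=1.0):
    # Gaussian-weighted pooling of the instance around the entity mentions.
    weights = gaussian_weights(len(instance_tokens), entity_positions, sigma)
    local = weighted_sum(weights, instance_tokens)
    # Separate attention over the title and the external knowledge.
    title_ctx = attend(local, title_tokens)
    knowledge_ctx = attend(local, knowledge_tokens)
    features = local + title_ctx + knowledge_ctx  # list concatenation
    return classify(features, weight_rows, bias)


if __name__ == "__main__":
    # Toy 2-d "token vectors"; real BERT outputs would be much wider.
    instance = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
    title = [[1.0, 0.0], [0.0, 1.0]]
    knowledge = [[0.2, 0.8], [0.9, 0.1]]
    rows = [[0.1] * 6, [-0.1] * 6, [0.0] * 6]  # 3 hypothetical classes
    print(cpi_probabilities(instance, [1, 3], title, knowledge, rows, [0.0] * 3))
```

Summing one Gaussian per mention keeps tokens near either the chemical or the protein highly weighted. `sigma` controls how local that window is: a small value concentrates the pooling on the mentions themselves.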