Tree kernel-based protein–protein interaction extraction from biomedical literature


Bibliographic details
Published in: Journal of biomedical informatics, 2012-06, Vol. 45 (3), p. 535-543
Main authors: Qian, Longhua; Zhou, Guodong
Format: Article
Language: English
Description
Abstract:
► We investigate the synergy between constituent and dependency-based information.
► We reshape the constituent parse tree using the shortest dependency path.
► We find that this tree can significantly outperform other trees for PPI extraction.
► We conclude that this tree can represent PPI instances concisely and precisely.

There is a surge of research interest in protein–protein interaction (PPI) extraction from biomedical literature. While most state-of-the-art PPI extraction systems focus on dependency-based structured information, the rich structured information inherent in constituent parse trees has not been extensively explored for PPI extraction. In this paper, we propose a novel approach to tree kernel-based PPI extraction, in which the tree representation generated by a constituent syntactic parser is further refined using the shortest dependency path between two proteins, derived from a dependency parser. Specifically, all constituent tree nodes associated with the nodes on the shortest dependency path are kept intact, while the other nodes are safely removed to make the constituent tree concise and precise for PPI extraction. Compared with previously used constituent tree setups, our dependency-motivated constituent tree setup achieves the best results across five commonly used PPI corpora. Moreover, our tree kernel-based method outperforms other single-kernel methods and performs comparably with some multiple-kernel methods on the most commonly tested AIMed corpus.
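The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tuple-based tree encoding, the function names (`shortest_dependency_path`, `prune`, `leaves`), the toy sentence, and its dependency edges are all assumptions made for illustration. The sketch also simplifies "nodes associated with the path" to mean only the path tokens and their ancestors in the constituent tree.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Breadth-first search over dependency edges, treated as undirected.

    edges: iterable of (head_index, dependent_index) pairs (assumed format).
    Returns the token indices on a shortest path from source to target.
    """
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, set()).add(dependent)
        graph.setdefault(dependent, set()).add(head)
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return []


def prune(tree, keep):
    """Keep only leaves whose token index is in `keep`, plus their ancestors.

    Hypothetical encoding: a leaf is (pos_tag, word, index);
    an internal node is (label, [children]).
    """
    if len(tree) == 3:  # leaf
        return tree if tree[2] in keep else None
    label, children = tree
    kept = [c for c in (prune(child, keep) for child in children) if c is not None]
    return (label, kept) if kept else None


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if len(tree) == 3:
        return [tree[1]]
    return [word for child in tree[1] for word in leaves(child)]


# Toy sentence (invented): "PROT1 strongly interacts with PROT2 in yeast"
tree = ("S", [
    ("NP", [("NN", "PROT1", 0)]),
    ("VP", [
        ("ADVP", [("RB", "strongly", 1)]),
        ("VBZ", "interacts", 2),
        ("PP", [("IN", "with", 3), ("NP", [("NN", "PROT2", 4)])]),
        ("PP", [("IN", "in", 5), ("NP", [("NN", "yeast", 6)])]),
    ]),
])
# Assumed (head, dependent) edges for the toy sentence.
edges = [(2, 0), (2, 1), (2, 3), (3, 4), (2, 5), (5, 6)]

path = shortest_dependency_path(edges, 0, 4)
pruned = prune(tree, set(path))
print(leaves(pruned))
```

In this toy case the adverb and the trailing prepositional phrase lie off the path between the two proteins, so their subtrees are dropped, while the S, NP, VP and PP nodes that dominate the path tokens survive. A tree kernel would then be computed over such reduced trees rather than over the full parse.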
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.jbi.2012.02.004