CPTAM: Constituency Parse Tree Aggregation Method
Diverse Natural Language Processing tasks employ constituency parsing to understand the syntactic structure of a sentence according to a phrase structure grammar. Many state-of-the-art constituency parsers are proposed, but they may provide different results for the same sentences, especially for co...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Diverse Natural Language Processing tasks employ constituency parsing to
understand the syntactic structure of a sentence according to a phrase
structure grammar. Many state-of-the-art constituency parsers are proposed, but
they may provide different results for the same sentences, especially for
corpora outside their training domains. This paper adopts the truth discovery
idea to aggregate constituency parse trees from different parsers by estimating
their reliability in the absence of ground truth. Our goal is to consistently
obtain high-quality aggregated constituency parse trees. We formulate the
constituency parse tree aggregation problem in two steps, structure aggregation
and constituent label aggregation. Specifically, we propose the first truth
discovery solution for tree structures by minimizing the weighted sum of
Robinson-Foulds (RF) distances, a classic symmetric distance metric between two
trees. Extensive experiments are conducted on benchmark datasets in different
languages and domains. The experimental results show that our method, CPTAM,
outperforms the state-of-the-art aggregation baselines. We also demonstrate
that the weights estimated by CPTAM can adequately evaluate constituency
parsers in the absence of ground truth. |
---|---|
DOI: | 10.48550/arxiv.2201.07905 |