A multi-cascaded model with data augmentation for enhanced paraphrase detection in short texts
| Published in: | Information Processing & Management, 2020-05, Vol. 57 (3), p. 102204, Article 102204 |
|---|---|
| Main authors: | , , |
| Format: | Article |
| Language: | English |
| Online access: | Full text |
Highlights:
• We present a strategy to augment existing paraphrase and non-paraphrase annotations in a sound manner for deep learning models.
• We develop a novel multi-cascaded learning model for robust paraphrase detection in both clean and noisy texts.
• We address both clean and noisy texts and report current best performance on benchmark datasets.
• We study the impact of the individual components of our multi-cascaded model on paraphrase detection performance.
• We study the impact of the individual data augmentation steps on paraphrase detection performance.

Abstract:
Paraphrase detection is an important task in text analytics with numerous applications such as plagiarism detection, duplicate question identification, and enhanced customer support helpdesks. Deep models have been proposed for representing and classifying paraphrases. These models, however, require large quantities of human-labeled data, which is expensive to obtain. In this work, we present a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts. Our data augmentation strategy considers the notions of paraphrases and non-paraphrases as binary relations over the set of texts. Subsequently, it uses graph theoretic concepts to efficiently generate additional paraphrase and non-paraphrase pairs in a sound manner. Our multi-cascaded model employs three supervised feature learners (cascades) based on CNN and LSTM networks with and without soft-attention. The learned features, together with hand-crafted linguistic features, are then forwarded to a discriminator network for final classification. Our model is both wide and deep and provides greater robustness across clean and noisy short texts. We evaluate our approach on three benchmark datasets and show that it produces a comparable or state-of-the-art performance on all three. |
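The augmentation strategy treats paraphrase and non-paraphrase as binary relations over the set of texts and closes them with graph-theoretic reasoning. Below is a minimal Python sketch of that idea, assuming the closure rules implied by the abstract: all texts in one connected component of the paraphrase graph are mutual paraphrases, and a labeled non-paraphrase edge between two components marks every cross-component pair as a non-paraphrase. All function and variable names here are illustrative, not from the paper.

```python
from collections import defaultdict
from itertools import combinations, product

def augment(paraphrase_pairs, non_paraphrase_pairs):
    """Generate additional labeled pairs by graph closure (hypothetical sketch)."""
    # Union-find over texts: roots identify connected components
    # of the paraphrase graph.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in paraphrase_pairs:
        parent[find(a)] = find(b)
    for a, b in non_paraphrase_pairs:
        find(a)  # register texts that appear only in negative pairs
        find(b)

    components = defaultdict(set)
    for text in list(parent):
        components[find(text)].add(text)

    # Transitivity: every pair inside a component is a paraphrase.
    positives = {frozenset(p)
                 for comp in components.values()
                 for p in combinations(comp, 2)}

    # A non-paraphrase edge between two components propagates to
    # every cross-component pair.
    negatives = set()
    for a, b in non_paraphrase_pairs:
        ca, cb = components[find(a)], components[find(b)]
        if ca is cb:
            continue  # contradictory annotation; skip rather than propagate
        negatives.update(frozenset((x, y)) for x, y in product(ca, cb))

    return positives, negatives
```

The multi-cascaded model is described as three supervised feature learners (a CNN, an LSTM, and an LSTM with soft attention) whose learned features are concatenated with hand-crafted linguistic features and passed to a discriminator network. The PyTorch sketch below shows one plausible wiring of that architecture; the embedding and hidden sizes, the pooling choices, and the `ling_dim` width of the linguistic features are assumptions, and the per-cascade supervision applied during training is omitted.

```python
import torch
import torch.nn as nn

class MultiCascaded(nn.Module):
    """Wide-and-deep paraphrase classifier (illustrative dimensions)."""

    def __init__(self, vocab_size, emb_dim=300, hidden=128, ling_dim=17, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # Cascade 1: CNN feature learner with max-over-time pooling.
        self.cnn = nn.Sequential(
            nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        # Cascade 2: plain LSTM; cascade 3: LSTM pooled by soft attention.
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.att_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.att = nn.Linear(hidden, 1)
        # Discriminator over learned features of both texts plus
        # hand-crafted linguistic features of the pair.
        self.disc = nn.Sequential(
            nn.Linear(2 * 3 * hidden + ling_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def encode(self, ids):
        x = self.emb(ids)                                  # (B, T, E)
        f_cnn = self.cnn(x.transpose(1, 2)).squeeze(-1)    # (B, H)
        h, _ = self.lstm(x)
        f_lstm = h[:, -1, :]                               # last hidden state
        h2, _ = self.att_lstm(x)
        w = torch.softmax(self.att(h2), dim=1)             # (B, T, 1)
        f_att = (w * h2).sum(dim=1)                        # attention pooling
        return torch.cat([f_cnn, f_lstm, f_att], dim=-1)   # (B, 3H)

    def forward(self, ids_a, ids_b, ling_feats):
        pair = torch.cat([self.encode(ids_a), self.encode(ids_b)], dim=-1)
        return self.disc(torch.cat([pair, ling_feats], dim=-1))
```

Concatenating shallow hand-crafted features with the deep learned ones is what makes such a model "both wide and deep": the wide part supplies signals that survive noisy, misspelled text, while the deep cascades capture compositional similarity.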
ISSN: 0306-4573, 1873-5371
DOI: 10.1016/j.ipm.2020.102204