A Survey of Recent Abstract Summarization Techniques
Format: Article
Language: English
Online access: Order full text
Abstract: This paper surveys several recent abstract summarization methods: T5, Pegasus, and ProphetNet. We implement the systems in two languages: English and Indonesian. We investigate the impact of pre-trained models (one T5, three Pegasus variants, three ProphetNet variants) on several Wikipedia datasets in English and Indonesian and compare the results to the Wikipedia systems' summaries. T5-Large, Pegasus-XSum, and ProphetNet-CNNDM provide the best summaries. The most significant factors influencing ROUGE performance are coverage, density, and compression; the higher these scores, the better the summary. Other factors that influence the ROUGE scores are the pre-training objective, the characteristics of the dataset, the dataset used for testing the pre-trained model, and the cross-lingual function. Suggestions for addressing this paper's limitations are: 1) ensure that the dataset used for the pre-trained model is sufficiently large and contains adequate instances for handling cross-lingual purposes; 2) keep the advanced process (fine-tuning) reasonable. We recommend using a large dataset with comprehensive coverage of topics from many languages before applying advanced processes such as the train-infer-train procedure for zero-shot translation in the training stage of the pre-trained model.
DOI: 10.48550/arxiv.2105.00824
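The abstract names specific pre-trained checkpoints (T5-Large, Pegasus-XSum, ProphetNet-CNNDM) as the best performers. Below is a minimal sketch, not taken from the paper, of how such publicly released checkpoints can be queried for abstractive summaries with the Hugging Face `transformers` library; the checkpoint identifiers, task prefix handling, and decoding settings are assumptions rather than the authors' exact configuration.

```python
# Sketch: generating abstractive summaries from publicly released seq2seq
# checkpoints (assumed setup, not the paper's code).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


def summarize(text: str, model_name: str) -> str:
    """Generate an abstractive summary of `text` with a seq2seq checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # T5 checkpoints expect a task prefix; Pegasus and ProphetNet do not.
    if "t5" in model_name.lower():
        text = "summarize: " + text

    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    summary_ids = model.generate(
        **inputs,
        num_beams=4,          # beam search, a common decoding default
        max_length=64,        # cap on generated summary length
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


article = "..."  # a Wikipedia article body
print(summarize(article, "t5-large"))
print(summarize(article, "google/pegasus-xsum"))
```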
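The abstract also reports that coverage, density, and compression are the factors most strongly associated with ROUGE scores. The sketch below, assuming the standard extractive-fragment definitions of Grusky et al. (2018) rather than any code from this paper, shows how these three statistics can be computed for an article and summary pair: coverage is the share of summary tokens that appear inside copied fragments, density is the average squared fragment length, and compression is the ratio of article length to summary length.

```python
# Sketch: extractive-fragment coverage, density, and compression
# (assumed definitions following Grusky et al., 2018).
from typing import List


def extractive_fragments(article: List[str], summary: List[str]) -> List[List[str]]:
    """Greedily match the longest shared token spans between article and summary."""
    fragments, i = [], 0
    while i < len(summary):
        best: List[str] = []
        j = 0
        while j < len(article):
            if summary[i] == article[j]:
                k = 0
                while (i + k < len(summary) and j + k < len(article)
                       and summary[i + k] == article[j + k]):
                    k += 1
                if k > len(best):
                    best = summary[i:i + k]
                j += max(k, 1)
            else:
                j += 1
        if best:
            fragments.append(best)
            i += len(best)
        else:
            i += 1
    return fragments


def coverage_density_compression(article: List[str], summary: List[str]):
    """Compute the three statistics for a tokenized article/summary pair."""
    frags = extractive_fragments(article, summary)
    n = len(summary)
    coverage = sum(len(f) for f in frags) / n          # fraction of copied summary tokens
    density = sum(len(f) ** 2 for f in frags) / n      # average squared fragment length
    compression = len(article) / n                     # article length / summary length
    return coverage, density, compression


article = "the cat sat on the mat near the old oak tree".split()
summary = "the cat sat near the tree".split()
print(coverage_density_compression(article, summary))
```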