Abstractive Summarization of Large Document Collections Using GPT
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | This paper proposes a method of abstractive summarization designed to scale
to document collections instead of individual documents. Our approach applies a
combination of semantic clustering, document size reduction within topic
clusters, semantic chunking of a cluster's documents, GPT-based summarization
and concatenation, and a combined sentiment and text visualization of each
topic to support exploratory data analysis. Statistical comparison of our
results to existing state-of-the-art systems BART, BRIO, PEGASUS, and MoCa
using ROUGE summary scores showed statistically equivalent performance with
BART and PEGASUS on the CNN/Daily Mail test dataset, and with BART on the
Gigaword test dataset. This finding is promising since we view document
collection summarization as more challenging than individual document
summarization. We conclude with a discussion of how issues of scale are |
---|---|
DOI: | 10.48550/arxiv.2310.05690 |