Topic modeling combined with classification technique for extractive multi-document text summarization


Bibliographic details
Published in: Soft computing (Berlin, Germany), 2021, Vol.25 (2), p.1113-1127
Main author: Roul, Rajendra Kumar
Format: Article
Language: English
Description
Summary: The quality of the human-readable summaries available in the datasets is not up to the mark, which makes it difficult to build an accurate model for text summarization. Although recent works have largely addressed this issue and set up a strong platform for further improvements, they still have many limitations. With this in mind, the paper proposes a novel methodology for summarizing a corpus of documents to generate a coherent summary using topic modeling and a classification technique. The objectives of the proposed work are: (1) to introduce a novel heuristic approach for finding the actual number of topics that exist in a corpus of documents, one that handles the stochastic nature of latent Dirichlet allocation; (2) to handle a large corpus of documents by reducing the huge set of sentences to a small set without losing the important ones, thus providing a concise and information-rich summary; (3) to ensure that the sentences are arranged according to their importance in the coherent summary. The experimental results are compared with state-of-the-art summarization systems. The outcomes of the empirical work show that the proposed model is more promising than well-known text summarization models.
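The reduce-and-rank idea in objectives (2) and (3) can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not specify how sentences are scored, so corpus-wide term frequency is used here as an assumed stand-in for the topic-model and classifier scores, and the names `split_sentences`, `tokenize`, and `summarize` are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens; no stop-word removal, for brevity."""
    return re.findall(r"[a-z]+", sentence.lower())


def summarize(documents, k=2):
    """Return the k highest-scoring sentences, most important first."""
    sentences = [s for doc in documents for s in split_sentences(doc)]
    # Assumption: corpus-wide term frequency stands in for topic-based importance.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favoured by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable, so tied sentences keep their original corpus order.
    ranked = sorted(sentences, key=score, reverse=True)
    return ranked[:k]


docs = [
    "Topic models find topics. Cats sleep.",
    "Topic models group documents by topics.",
]
summary = summarize(docs, k=2)
print(summary)
```

In the sketch, the multi-document corpus is flattened into one sentence pool, reduced to `k` sentences, and returned in descending order of score, which mirrors the "small set, arranged by importance" goal only in outline.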
ISSN: 1432-7643, 1433-7479
DOI: 10.1007/s00500-020-05207-w