Topic modeling combined with classification technique for extractive multi-document text summarization
Published in: | Soft computing (Berlin, Germany), 2021, Vol.25 (2), p.1113-1127 |
---|---|
Main author: | |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | The quality of the human-readable summaries available in the datasets is not up to the mark, which makes it difficult to build an accurate model for text summarization. Although recent works have largely focused on this issue and set up a strong platform for further improvements, they still have many limitations. To address this, the paper proposes a novel methodology for summarizing a corpus of documents into a coherent summary using topic modeling and a classification technique. The objectives of the proposed work are highlighted below:
A novel heuristic approach is introduced to find the actual number of topics in a corpus of documents; it handles the stochastic nature of latent Dirichlet allocation.
A large corpus of documents is handled by reducing the huge set of sentences to a small set without losing the important ones, thus providing a concise and information-rich summary at the end.
The sentences are arranged according to their importance in the coherent summary.
Results of the experiment are compared with state-of-the-art summarization systems.
The outcomes of the empirical work show that the proposed model is more promising than well-known text summarization models. |
---|---|
ISSN: | 1432-7643 1433-7479 |
DOI: | 10.1007/s00500-020-05207-w |
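
The abstract's second and third objectives (reducing a large set of sentences to a small one, then ordering the result by importance) can be illustrated with a minimal sketch. The record does not describe the paper's actual scoring, so the code below swaps in a plain word-frequency score as a stand-in for the topic modeling and classification steps; the names `tokenize`, `summarize`, and `corpus` are hypothetical and chosen only for this illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def summarize(sentences, k):
    """Return the k highest-scoring sentences, most important first.

    Assumption: the score is the average corpus frequency of a sentence's
    words. This is a stand-in, NOT the topic-modeling-plus-classification
    scoring that the paper proposes.
    """
    # Word frequencies over the whole set of sentences.
    freq = Counter(word for s in sentences for word in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable, so tied sentences keep their original order.
    ranked = sorted(sentences, key=score, reverse=True)
    return ranked[:k]


# Illustrative input only; not data from the paper.
corpus = [
    "Topic models describe documents as mixtures of topics.",
    "The weather was pleasant yesterday.",
    "Topic models group words into topics.",
]
summary = summarize(corpus, k=2)
```

Here `summary` holds the two sentences about topic models and drops the off-topic one. In a fuller pipeline, the frequency score would presumably be replaced by a score derived from the fitted topics.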