The combination of term relations analysis and weighted frequent itemset model for multidocument summarization
Published in: | Computational intelligence 2020-05, Vol.36 (2), p.783-812 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Abstract: | Nowadays, it is necessary that users have access to information in a concise form without losing any critical information. Document summarization is an automatic process of generating a short form from a document. In itemset‐based document summarization, the weights of all terms are considered the same. In this paper, a new approach is proposed for multidocument summarization based on weighted patterns and term association measures. In the present study, the weights of the terms are not equal in the context and are computed based on weighted frequent itemset mining. Indeed, the proposed method enriches frequent itemset mining by weighting the terms in the corpus. In addition, the relationships among the terms in the corpus have been considered using term association measures. Also, the statistical features such as sentence length and sentence position have been modified and matched to generate a summary based on the greedy method. Based on the results of the DUC 2002 and DUC 2004 datasets obtained by the ROUGE toolkit, the proposed approach can outperform the state‐of‐the‐art approaches significantly. |
---|---|
ISSN: | 0824-7935 1467-8640 |
DOI: | 10.1111/coin.12270 |
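
The abstract describes weighting terms during frequent itemset mining and then building the summary greedily. The record does not give the paper's formulas, so the following is only a rough sketch of the general idea and not the authors' method. It assumes that each sentence is one transaction, that an itemset's weighted support is its mean term weight times the fraction of sentences containing it, and that sentences are picked by the weight of the itemsets they newly cover. The function names, the example term weights, and the thresholds are all hypothetical, and the term association measures and the sentence length and position features mentioned in the abstract are left out.

```python
from itertools import combinations


def weighted_itemsets(sentences, weights, min_wsup=0.2, max_size=2):
    """Map each itemset (up to max_size terms) to its weighted support.

    Assumption: weighted support = mean term weight * plain support.
    Terms missing from `weights` count as weight 0.0.
    """
    transactions = [frozenset(s.lower().split()) for s in sentences]
    vocab = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            items = frozenset(combo)
            support = sum(items <= t for t in transactions) / len(transactions)
            avg_weight = sum(weights.get(w, 0.0) for w in items) / size
            wsup = avg_weight * support
            if wsup >= min_wsup:
                result[items] = wsup
    return result


def greedy_summary(sentences, itemsets, max_sentences=2):
    """Repeatedly pick the sentence that covers the most uncovered itemset weight."""
    transactions = [frozenset(s.lower().split()) for s in sentences]
    chosen, covered = [], set()
    while len(chosen) < max_sentences:
        best, best_gain = None, 0.0
        for i, t in enumerate(transactions):
            if i in chosen:
                continue
            gain = sum(w for iset, w in itemsets.items()
                       if iset not in covered and iset <= t)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining sentence adds new itemsets
            break
        chosen.append(best)
        covered |= {iset for iset in itemsets if iset <= transactions[best]}
    return [sentences[i] for i in sorted(chosen)]  # keep document order


# Toy input; the weights are made up for illustration.
sentences = [
    "storm hits coast",
    "storm damages coast homes",
    "officials open shelters",
    "weather was mild",
]
weights = {"storm": 1.0, "coast": 0.8, "homes": 0.4,
           "shelters": 0.9, "officials": 0.5}

itemsets = weighted_itemsets(sentences, weights)
summary = greedy_summary(sentences, itemsets)
print(summary)
```

On this toy input the second storm sentence is skipped because it covers no itemsets beyond those already covered by the first, which is the redundancy-avoiding behaviour a greedy selection is meant to give. The brute-force enumeration of candidate itemsets is only workable for tiny vocabularies; a real miner would prune candidates.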