Automatic Text Summarization Method Based on Improved TextRank Algorithm and K-Means Clustering

Bibliographic details
Published in: Knowledge-based systems 2024-03, Vol. 287, p. 111447, Article 111447
Main authors: Liu, Wenjun; Sun, Yuyan; Yu, Bao; Wang, Hailan; Peng, Qingcheng; Hou, Mengshu; Guo, Huan; Wang, Hai; Liu, Cheng
Format: Article
Language: English
Description
Abstract: Automatic text summarization compresses a text into a summary while retaining its important information, so that users can obtain the important content of the text by reading the summary. In the research literature, extractive summarization is widely used and is one of the main families of summarization methods. However, extractive methods still have some problems: the initial cluster centers are not selected carefully, and the extracted sentences are highly redundant for articles with complex sentences. To solve these problems, this paper proposes an automatic text summarization method based on an improved TextRank algorithm and K-Means clustering. The method combines an improved BM25 model with the TextRank algorithm: it calculates the BM25 similarity between sentences and from it obtains the TR score of each sentence. The TR scores are then used to select the initial cluster centers, based on a similarity-difference judgment and a maximum judgment. The final summary is obtained by combining the cluster scores and the sentence scores. Experimental results on the DUC2004 dataset show that the proposed method achieves better ROUGE-1, ROUGE-2 and ROUGE-L scores than the comparison algorithms Lead-3, TextRank and MBM25EMB. In conclusion, the proposed method improves the accuracy of automatic text summarization and reduces the redundancy of the content extracted from documents.
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2024.111447
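
The abstract names the stages of the pipeline but gives no formulas, so the following is only a minimal Python sketch of the general idea, not the authors' method. Everything specific in it is an assumption: it uses plain Okapi BM25 (not the paper's improved BM25), the standard weighted TextRank update, a greedy seed rule that loosely imitates "highest score, low similarity to seeds already chosen", and a single nearest-seed assignment in place of full K-Means iterations. The function names (`bm25_similarity`, `textrank`, `pick_seeds`, `summarize`) and the parameters `k1`, `b`, `damping` and `k` are hypothetical.

```python
import math
from collections import Counter


def bm25_similarity(sentences, k1=1.5, b=0.75):
    """Pairwise Okapi BM25: sentence i is the query, sentence j the document."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    df = Counter(word for d in docs for word in set(d))
    # Smoothed IDF variant that stays positive for every word.
    idf = {w: math.log(1 + (n - c + 0.5) / (c + 0.5)) for w, c in df.items()}
    sim = [[0.0] * n for _ in range(n)]
    for i, query in enumerate(docs):
        for j, doc in enumerate(docs):
            if i == j:
                continue  # no self-similarity edge in the sentence graph
            tf = Counter(doc)
            norm = k1 * (1 - b + b * len(doc) / avg_len)
            sim[i][j] = sum(
                idf[w] * tf[w] * (k1 + 1) / (tf[w] + norm)
                for w in set(query) if w in tf
            )
    return sim


def textrank(sim, damping=0.85, iterations=50):
    """Fixed number of weighted TextRank updates over the similarity graph."""
    n = len(sim)
    out_weight = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def pick_seeds(sim, scores, k):
    """Assumed rule: take the top-scoring sentence, then repeatedly add the
    sentence whose strongest similarity to the chosen seeds is lowest."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    seeds = [order[0]]
    while len(seeds) < k:
        rest = [i for i in order if i not in seeds]
        seeds.append(
            min(rest, key=lambda i: max(sim[i][s] + sim[s][i] for s in seeds))
        )
    return seeds


def summarize(sentences, k=2):
    """Return one sentence per cluster, in original document order."""
    sim = bm25_similarity(sentences)
    scores = textrank(sim)
    seeds = pick_seeds(sim, scores, k)
    clusters = {s: [] for s in seeds}
    for i in range(len(sentences)):
        # A seed stays in its own cluster; others join the most similar seed.
        nearest = i if i in clusters else max(
            seeds, key=lambda s: sim[i][s] + sim[s][i]
        )
        clusters[nearest].append(i)
    # Assumed selection rule: the highest-scoring sentence of each cluster.
    best = sorted(max(m, key=lambda i: scores[i]) for m in clusters.values())
    return [sentences[i] for i in best]
```

Calling `summarize(sentences, k=3)` on a list of at least three non-empty sentence strings returns three sentences, one per cluster. Seeding the clusters from the ranking scores, rather than at random, is the part of the design the abstract emphasizes; the exact judgments the paper uses for that step are not stated in this record.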