Automatic text summarization for government news reports based on multiple features
Published in: | The Journal of Supercomputing 2024-02, Vol.80 (3), p.3212-3228 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | The purpose of government news summarization is to extract the most important information from official government news reports. In an age of information overload, readers need to be able to understand government news quickly. Compared with other types of news, government news reports have more detailed content and follow a more normative format, which makes them longer. To resolve the tension between the length of these reports and the trend toward fragmented reading, this research proposes an automatic text summarization (ATS) model based on multiple features. First, features are extracted using the TF–IDF algorithm and a word-vector embedding method based on the Bidirectional Encoder Representations from Transformers (BERT) model. Second, sentences are scored on position, keyword, and similarity features. Finally, the top-ranked sentence is selected to form the summarization. To verify the effectiveness and superiority of the proposed method, the Edmundson and ROUGE evaluation criteria were adopted. First, based on the Edmundson evaluation criteria, the summarization results of various methods were scored. The score differences between ATS summarization based on the proposed method and manual summarization were minimal across consistency, grammaticality, time sequence, conciseness, and readability, with values of 0.14, 0.18, 0.12, 0.10, and 0.16, respectively. This suggests that our summarization exhibits the highest similarity to manual summarization. Second, we evaluated the summarization results using the ROUGE evaluation criteria. The results indicate that the proposed method achieved significantly higher scores than other models. Specifically, for character-level ROUGE-1, the P, R, and F scores reached 0.84, 0.93, and 0.88, respectively. For word-level ROUGE-1, the P, R, and F scores were 0.81, 0.89, and 0.85, respectively, a noticeable improvement over other models. Furthermore, compared to manual methods, the proposed method has advantages in helping the reader obtain important information rapidly. |
---|---|
ISSN: | 0920-8542 1573-0484 |
DOI: | 10.1007/s11227-023-05599-0 |
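
The summary reports ROUGE-1 precision (P), recall (R), and F scores at both the character and the word level. As a reading aid, the following is a minimal sketch of how ROUGE-1 is commonly computed from clipped unigram overlap. It is not the authors' evaluation code: the function names `rouge_1`, `char_tokens`, and `word_tokens` are hypothetical, and the whitespace-based word splitting is an assumption that would need to be replaced by a language-appropriate tokenizer in practice.

```python
from collections import Counter


def rouge_1(candidate_tokens, reference_tokens):
    """Return (precision, recall, F1) for ROUGE-1 over two token lists.

    Overlap is the clipped unigram match count: each token counts at most
    as often as it appears in both the candidate and the reference.
    """
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    overlap = sum((cand & ref).values())  # multiset intersection

    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def char_tokens(text):
    """Character-level units: every non-whitespace character (assumption)."""
    return [ch for ch in text if not ch.isspace()]


def word_tokens(text):
    """Word-level units: lowercase whitespace split (illustrative assumption)."""
    return text.lower().split()


if __name__ == "__main__":
    generated = "the cat sat"
    manual = "the cat sat on the mat"
    print(rouge_1(word_tokens(generated), word_tokens(manual)))
    print(rouge_1(char_tokens(generated), char_tokens(manual)))
```

With the same summary pair, the character-level and word-level scores generally differ, which is why the record lists the two sets of P, R, and F values separately.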