Social context summarization using user-generated content and third-party sources


Detailed description

Bibliographic details
Published in: Knowledge-Based Systems, 2018-03, Vol. 144, p. 51-64
Main authors: Nguyen, Minh-Tien; Tran, Duc-Vu; Nguyen, Le-Minh
Format: Article
Language: English
Description
Summary:
• A novel framework for social context summarization is proposed.
• The framework relies on the reinforcement support of external information.
• 23 features in three groups (local, user-generated, and third-party) are proposed.
• A new open-domain dataset is created and manually annotated.
• Combining internal and external information benefits the summarization.

In the context of social media, users share their interest in an event mentioned in a Web document. The same content can also be found across different news providers, with variations in wording. This paper presents a framework that exploits the support of social context (user-generated content such as comments or tweets, and third-party sources such as relevant documents retrieved from a search engine) to extract high-quality summaries. The extraction is formulated in two steps: sentence scoring and selection. The scoring is modeled as a learning-to-rank problem, which employs Ranking SVM to jointly exploit sentences, user-generated content, and third-party sources in the form of features to cover summary aspects. For the selection, summaries are extracted using a score-based or voting method. For evaluation, three datasets of sentence and highlight extraction in two languages were taken as a case study. Experimental results indicate that by integrating user-generated content and third-party sources, the framework obtains improvements in ROUGE scores over state-of-the-art methods for single-document summarization.
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2017.12.023
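
The two-step extraction described in the summary (sentence scoring, then selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the three feature columns stand in for the local, user-generated, and third-party feature groups, the weight vector is a hypothetical stand-in for what a trained Ranking SVM would produce, and all names (`score_sentence`, `select_top_m`) are made up for illustration. Only the score-based selection is shown; the voting variant is not detailed in the record and is omitted.

```python
from typing import List, Sequence


def score_sentence(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear ranking score: dot product of a feature vector and a weight vector."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))


def select_top_m(
    sentences: List[str],
    feature_rows: List[Sequence[float]],
    weights: Sequence[float],
    m: int,
) -> List[str]:
    """Score every sentence, keep the m highest-scoring, return them in document order."""
    scores = [score_sentence(row, weights) for row in feature_rows]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:m])  # restore original sentence order
    return [sentences[i] for i in chosen]


# Toy data (assumed): one row per sentence, columns are
# [local feature, user-generated feature, third-party feature].
sentences = ["s0", "s1", "s2", "s3"]
feature_rows = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.7],
    [0.1, 0.1, 0.1],
    [0.5, 0.6, 0.9],
]
weights = [0.5, 0.3, 0.2]  # hypothetical; a real system would learn these

summary = select_top_m(sentences, feature_rows, weights, m=2)
print(summary)
```

In practice the weights would come from a learning-to-rank model trained on annotated summaries, and the feature set would cover all 23 features mentioned in the highlights rather than one value per group.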