Social context summarization using user-generated content and third-party sources

Detailed description

Bibliographic details
Published in:	Knowledge-based systems 2018-03, Vol.144, p.51-64
Authors:	Nguyen, Minh-Tien; Tran, Duc-Vu; Nguyen, Le-Minh
Format:	Article
Language:	English
Subjects:	Case studies; Data mining; Digital media; Document summarization; Electronic documents; Feature extraction; Information retrieval; Learning to rank; Sentences; Social context summarization; Social networks; Summaries; User generated content
Online access:	Full text

Description
Summary:	• A novel framework for social context summarization is proposed.
• The framework relies on the reinforcement support of external information.
• 23 features in three groups (local, user-generated, and third-party) are proposed.
• A new open-domain dataset is created and manually annotated.
• Combining internal and external information benefits the summarization.
In the context of social media, users share their interest in an event mentioned in a Web document. The same content can also be found at different news providers, with variations in wording. This paper presents a framework that exploits the support of social context (user-generated content such as comments or tweets, and third-party sources such as relevant documents retrieved from a search engine) to extract high-quality summaries. The extraction is formulated in two steps: sentence scoring and sentence selection. Scoring is modeled as a learning-to-rank problem, which employs Ranking SVM to jointly exploit sentences, user-generated content, and third-party sources in the form of features that cover summary aspects. For selection, summaries are extracted using a score-based or a voting method. For evaluation, three datasets for sentence and highlight extraction in two languages are used as a case study. Experimental results indicate that, by integrating user-generated content and third-party sources, the framework improves ROUGE scores over state-of-the-art methods for single-document summarization.
ISSN:	0950-7051 1872-7409
DOI:	10.1016/j.knosys.2017.12.023
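
The two-step extraction described in the summary (learning-to-rank scoring, then selection) can be illustrated with a minimal sketch. It is not the authors' implementation: the three toy feature columns, the relevance labels, the subgradient trainer standing in for Ranking SVM, and the majority-style voting rule are all assumptions made for illustration, since the record does not specify the 23 features, the solver, or the voting scheme.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(features, relevance):
    """Build difference vectors x_hi - x_lo for every pair with unequal relevance."""
    pairs = []
    for i, j in combinations(range(len(features)), 2):
        if relevance[i] == relevance[j]:
            continue  # equally relevant sentences give no ranking constraint
        hi, lo = (i, j) if relevance[i] > relevance[j] else (j, i)
        pairs.append([a - b for a, b in zip(features[hi], features[lo])])
    return pairs


def train_ranking_svm(pairs, epochs=200, lr=0.1, reg=0.01):
    """Linear pairwise ranker: L2 penalty plus mean hinge loss, plain subgradient descent.

    Assumption: this simple trainer stands in for a real Ranking SVM solver.
    """
    w = [0.0] * len(pairs[0])
    for _ in range(epochs):
        grad = [reg * wk for wk in w]
        for d in pairs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            if margin < 1.0:  # pair is misordered or inside the margin
                for k, dk in enumerate(d):
                    grad[k] -= dk / len(pairs)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def score(w, x):
    """Ranking score of one sentence: dot product of weights and features."""
    return sum(wk * xk for wk, xk in zip(w, x))


def select_top(sentences, features, w, m):
    """Score-based selection: keep the m best-scored sentences in document order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(w, features[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:m])]


def vote_select(rankings, m):
    """Hypothetical voting rule: each ranker votes for its own top-m indices."""
    votes = Counter(i for ranking in rankings for i in ranking[:m])
    winners = sorted(votes, key=lambda i: (-votes[i], i))[:m]
    return sorted(winners)


# Toy data (hypothetical): one row per sentence, columns stand for a local,
# a user-generated, and a third-party feature.
sentences = ["s0: main event", "s1: side remark", "s2: key detail", "s3: filler"]
features = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3], [0.6, 0.9, 0.5], [0.4, 0.2, 0.1]]
relevance = [1, 0, 1, 0]  # 1 = sentence belongs in the reference summary

w = train_ranking_svm(pairwise_differences(features, relevance))
print(select_top(sentences, features, w, m=2))
print(vote_select([[0, 2, 1, 3], [2, 0, 3, 1], [0, 1, 2, 3]], m=2))
```

The pairwise transform is what makes this a ranking model rather than a classifier: the weights are fitted so that more relevant sentences score higher than less relevant ones from the same document. In practice a dedicated Ranking SVM solver and the paper's full feature set would replace the toy trainer and data used here.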