Crowdsourcing for web genre annotation

Bibliographic Details
Published in: Language Resources and Evaluation 2016-09, Vol. 50 (3), p. 603-641
Main authors: Asheghi, Noushin Rezapour; Sharoff, Serge; Markert, Katja
Format: Article
Language: English
Online access: Full text
Description
Summary: Recently, genre collection and automatic genre identification for the web have attracted much attention. However, there is currently no genre-annotated corpus of web pages for which inter-annotator reliability has been established; i.e., existing corpora are either not tested for inter-annotator reliability or exhibit low inter-coder agreement. Annotation has also mostly been carried out by a small number of experts, raising concerns about the scalability of these annotation efforts and the transferability of the schemes to annotators outside these small expert groups. In this paper, we tackle these problems by using crowdsourcing for genre annotation, leading to the Leeds Web Genre Corpus, the first web corpus which is demonstrably reliably annotated for genre and which can be easily and cost-effectively expanded using naive annotators. We also show that the corpus is source and topic diverse.
ISSN: 1574-020X, 1572-8412, 1574-0218
DOI: 10.1007/s10579-015-9331-6