Fine-grained Czech News Article Dataset: An Interdisciplinary Approach to Trustworthiness Analysis
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | We present the Verifee Dataset: a novel dataset of news articles with
fine-grained trustworthiness annotations. We develop a detailed methodology
that assesses the texts based on their parameters encompassing editorial
transparency, journalistic conventions, and objective reporting while penalizing
manipulative techniques. We bring aboard a diverse set of researchers from
social, media, and computer sciences to overcome barriers and limited framing
of this interdisciplinary problem. We collect over $10,000$ unique articles
from almost $60$ Czech online news sources. These are categorized into one of
the $4$ classes across the credibility spectrum we propose, ranging from
entirely trustworthy articles all the way to the manipulative ones. We produce
detailed statistics and study trends emerging throughout the set. Lastly, we
fine-tune multiple popular sequence-to-sequence language models using our
dataset on the trustworthiness classification task and report the best testing
F-1 score of $0.52$. We open-source the dataset, annotation methodology, and
annotators' instructions in full length at https://verifee.ai/research to
enable easy follow-up work. We believe similar methods can help prevent
disinformation and educate in the realm of media literacy. |
---|---|
DOI: | 10.48550/arxiv.2212.08550 |