NLP and machine learning to measure peace from news media
Main authors: | , , , , , |
---|---|
Format: | Dataset |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | “Hate speech” can mobilize violence and destruction. What are the
characteristics of “peace speech” that reflect and support the social
processes that maintain peace? In this study we used a data-driven,
machine learning approach to identify the words most associated with
lower-peace versus higher-peace countries. Logistic regression and random
forest classifiers were trained using five respected, traditional peace
indices: Global Peace Index, Positive Peace Index, World Happiness Index,
Fragile States Index, and Human Development Index. The feature inputs into
the machine learning model were the word frequencies from the news media
in each country and the output classifications were the level of peace in
that country. The machine learning model accurately classified the level
of peace in a country from its news media (both accuracy and F1: 96% to
100%). We also used that trained machine learning model to
create a machine learning peace index that measured the level of peace in
countries, including countries not in the training set, which correlated
with the average of those five traditional peace indices (r-squared =
0.8349). Using the random forest feature importance method we found that
the words in news media in lower-peace countries were characterized by
words related to government, order, control and fear (such as government,
state, law, security and court), while higher-peace countries were
characterized by an increased prevalence of words related to optimism for
the future and fun (such as time, like, home, believe and game). |
---|---|
DOI: | 10.5061/dryad.2v6wwpzv6 |
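
The pipeline described in the summary (word frequencies in, peace level out) can be illustrated with a minimal sketch. This is not the authors' code and does not use the dataset. The texts, country names, labels, and the helper names `word_frequencies`, `predict_proba`, and `train_logistic` are all invented for illustration. Only a small hand-written logistic regression is shown. The study also used random forest classifiers, and here the logistic regression weights stand in for the random forest feature importance it reports.

```python
import math
from collections import Counter

# Toy, invented texts for four hypothetical countries (NOT data from the study).
DOCS = {
    "A": "government state law security court government law",
    "B": "state security court law government control order",
    "C": "time like home believe game time home",
    "D": "game like believe home time fun like",
}
# Toy labels: 0 = lower-peace, 1 = higher-peace.
LABELS = {"A": 0, "B": 0, "C": 1, "D": 1}

VOCAB = sorted({word for text in DOCS.values() for word in text.split()})


def word_frequencies(text):
    """Relative frequency of each vocabulary word in one text."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return [counts[word] / total for word in VOCAB]


def predict_proba(x, w, b):
    """Probability of the higher-peace class under the logistic model."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=1.0, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


countries = sorted(DOCS)
X = [word_frequencies(DOCS[c]) for c in countries]
y = [LABELS[c] for c in countries]
w, b = train_logistic(X, y)

# The predicted probability doubles as a crude "peace score" per country.
for c, x in zip(countries, X):
    print(c, round(predict_proba(x, w, b), 3))

# Words with the most negative and most positive weights.
ranked = sorted(zip(w, VOCAB))
print("lower-peace words:", [word for _, word in ranked[:3]])
print("higher-peace words:", [word for _, word in ranked[-3:]])
```

In this sketch the predicted probability plays the role of a per-country score, loosely analogous to the machine learning peace index mentioned in the summary. A real analysis would need held-out countries and far larger corpora before any such score could be interpreted.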