Discrete-Choice Mining of Social Processes

Bibliographic details
Author:	Kristof, Victor
Format:	Web Resource
Language:	English

Description
Abstract:	Poor decisions and selfish behaviors give rise to seemingly intractable global problems, such as the lack of transparency in democratic processes, the spread of conspiracy theories, and the rise in greenhouse gas emissions. However, people are more predictable than we think, and with machine-learning algorithms and sufficiently large datasets, we can design accurate models of human behavior in a variety of settings. In this thesis, we develop highly interpretable probabilistic choice models to gain insight into social processes. We draw on the econometrics literature on discrete-choice models and combine these models with matrix factorization methods, Bayesian statistics, and generalized linear models. These predictive models are interpretable through their learned parameters and latent factors. First, we study the social dynamics behind group collaborations for the collective creation of content, such as in Wikipedia, the Linux kernel, and the European Union law-making process. By combining the Bradley-Terry and Rasch models with matrix factorization and natural language processing, we develop a model of edit acceptance in peer-production systems. We discover controversial components (e.g., Wikipedia articles and European laws) and influential users (e.g., Wikipedia editors and parliamentarians), as well as features that correlate with a high probability of edit acceptance. The latent representations capture non-linear interactions between components and users, and they cluster well into different topics (e.g., historical figures and TV characters in Wikipedia, business and environment in European laws). Second, we develop an algorithm for predicting the outcome of elections and referenda by combining matrix factorization and generalized linear models. Our algorithm learns representations of votes and regions, which capture ideological and cultural voting patterns (e.g., liberal/conservative, rural/urban), and it predicts the vote results in unobserved regions from partial observations. We test our model on voting data in Germany, Switzerland, and the US, and we deploy it on a Web platform to predict Swiss referendum votes in real time. On average, our predictions reach a mean absolute error of 1% after observing only 5% of the regions. Third, we study how people perceive the carbon footprint of their day-to-day actions. We cast this problem as a comparison problem between pairs of actions (e.g., the difference between flying across continents and usi
DOI:	10.5075/epfl-thesis-7186
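The abstract describes a model of edit acceptance obtained by combining the Bradley-Terry and Rasch models with matrix factorization. The Python sketch below illustrates that kind of model under stated assumptions: the probability that an edit by user u on component i is accepted is taken to be sigmoid(s_u - d_i + x_u · y_i), with a user skill s_u, a component difficulty d_i, and a low-rank user/component interaction x_u · y_i. This parametrization, the names `fit` and `predict`, and the plain SGD training loop are hypothetical illustrations, not the thesis's exact formulation; the text (NLP) features mentioned in the abstract are omitted.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(params, user, comp):
    """Probability that an edit by `user` on `comp` is accepted.

    Assumed (hypothetical) form: sigmoid(skill[user] - difficulty[comp] + x[user] . y[comp]).
    """
    skill, difficulty, x, y = params
    interaction = sum(a * b for a, b in zip(x[user], y[comp]))
    return sigmoid(skill[user] - difficulty[comp] + interaction)


def fit(edits, n_users, n_comps, rank=2, lr=0.1, reg=0.01, epochs=200, seed=0):
    """Fit all parameters by SGD on the L2-regularized log-likelihood.

    `edits` is a list of (user, component, accepted) triples with accepted in {0, 1}.
    """
    rng = random.Random(seed)
    skill = [0.0] * n_users
    difficulty = [0.0] * n_comps
    x = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    y = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_comps)]
    params = (skill, difficulty, x, y)
    data = list(edits)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, c, accepted in data:
            # d(log-likelihood)/d(logit) for a Bernoulli outcome is (label - p).
            g = accepted - predict(params, u, c)
            skill[u] += lr * (g - reg * skill[u])
            difficulty[c] += lr * (-g - reg * difficulty[c])
            xu, yc = x[u][:], y[c][:]  # copy so both factor updates use the old values
            for k in range(rank):
                x[u][k] += lr * (g * yc[k] - reg * xu[k])
                y[c][k] += lr * (g * xu[k] - reg * yc[k])
    return params


# Toy usage: user 0's edits are always accepted, user 1's never are.
edits = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)] * 5
params = fit(edits, n_users=2, n_comps=2)
print(round(predict(params, 0, 0), 2), round(predict(params, 1, 0), 2))
```

Because skill and difficulty enter the logit additively, they can be read off directly after fitting, which is the kind of interpretability the abstract emphasizes; the latent vectors x and y stand in for the matrix-factorization component.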
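The abstract also describes predicting vote results in unobserved regions from partial observations by combining matrix factorization with generalized linear models. A minimal sketch of that idea, under assumptions: region embeddings are learned by alternating least squares on the logit of past vote shares (a fully observed regions × votes matrix), and a new vote is then fit on the regions observed so far with a ridge-regularized linear model on logit shares, used here as a simple stand-in for a GLM with a logistic link. The functions `factorize`, `ridge`, and `predict_unobserved` are hypothetical names, and this is not the thesis's exact algorithm.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p):
    return math.log(p / (1.0 - p))


def ridge(features, targets, lam):
    """Solve min_w sum_i (t_i - f_i . w)^2 + lam * |w|^2 via the normal equations."""
    d = len(features[0])
    a = [[sum(f[i] * f[j] for f in features) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(d)]
    # Gaussian elimination with partial pivoting (a is positive definite for lam > 0).
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            m = a[r][col] / a[col][col]
            for c in range(col, d):
                a[r][c] -= m * a[col][c]
            b[r] -= m * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, d))) / a[i][i]
    return w


def factorize(shares, rank=1, lam=0.1, iters=20, seed=0):
    """Alternating least squares on logit(shares), a regions x past-votes matrix.

    Returns one embedding (list of length `rank`) per region.
    """
    rng = random.Random(seed)
    z = [[logit(p) for p in row] for row in shares]
    n_regions, n_votes = len(z), len(z[0])
    regions = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(n_regions)]
    for _ in range(iters):
        votes = [ridge(regions, [z[r][v] for r in range(n_regions)], lam)
                 for v in range(n_votes)]
        regions = [ridge(votes, z[r], lam) for r in range(n_regions)]
    return regions


def predict_unobserved(regions, observed, lam=1e-6):
    """Fit a new vote on the observed regions and predict the remaining ones.

    `observed` maps region index -> observed yes share for the new vote.
    """
    feats = [regions[r] + [1.0] for r in observed]  # embedding plus a bias term
    targets = [logit(p) for p in observed.values()]
    w = ridge(feats, targets, lam)
    return {r: sigmoid(sum(f * wk for f, wk in zip(regions[r] + [1.0], w)))
            for r in range(len(regions)) if r not in observed}


# Toy usage on synthetic data: six regions placed along one latent axis.
axis = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
past = [[sigmoid(a * b) for b in (1.0, -1.0, 0.5, 2.0)] for a in axis]
regions = factorize(past, rank=1)
new_vote = [sigmoid(1.5 * a + 0.2) for a in axis]
observed = {r: new_vote[r] for r in (0, 2, 4, 5)}
print(predict_unobserved(regions, observed))  # predicted shares for regions 1 and 3
```

Working in logit space keeps predicted shares inside (0, 1), and the learned region embeddings play the role of the representations that capture shared voting patterns across regions.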