Inference of Users Demographic Attributes based on Homophily in Communication Networks
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Over the past decade, mobile phones have become prevalent in all parts of the
world, across all demographic backgrounds. Mobile phones are used by men and
women across a wide age range in both developed and developing countries.
Consequently, they have become one of the most important mechanisms for social
interaction within a population, making them an increasingly important source
of information to understand human demographics and human behaviour.
In this work we combine two sources of information: communication logs from a
major mobile operator in a Latin American country, and information on the
demographics of a subset of the user population. This allows us to perform an
observational study of mobile phone usage, differentiated by age-group
categories. This study is interesting in its own right, since it provides
knowledge on the structure and demographics of the mobile phone market in the
studied country.
We then tackle the problem of inferring the age group for all users in the
network. We present here an exclusively graph-based inference method relying
solely on the topological structure of the mobile network, together with a
topological analysis of the performance of the algorithm. The equations for our
algorithm can be described as a diffusion process with two added properties:
(i) memory of its initial state, and (ii) the information is propagated as a
probability vector for each node attribute (instead of the value of the
attribute itself). Our algorithm can successfully infer different age groups
within the network population given known values for a subset of nodes (seed
nodes). Most interestingly, we show that by carefully analysing the topological
relationships between correctly predicted nodes and the seed nodes, we can
characterize particular subsets of nodes for which our inference method has
significantly higher accuracy. |
---|---|
DOI: | 10.48550/arxiv.1808.00527 |
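
The summary describes the inference method only in words: a diffusion over the network that keeps a memory of its initial state and propagates a probability vector per node. The record does not give the equations, so the following is a minimal sketch under stated assumptions, not the authors' algorithm. The update rule (a weighted average of the neighbours' vectors and the node's initial vector), the function name `propagate_labels`, and the parameters `memory` and `n_iter` are all hypothetical, chosen for illustration.

```python
import numpy as np


def propagate_labels(adjacency, initial_probs, memory=0.5, n_iter=30):
    """Diffuse per-node probability vectors over a graph.

    Assumed update rule (not taken from the paper):
        P <- (1 - memory) * W @ P + memory * P0
    where W is the row-normalised adjacency matrix and P0 holds the
    initial probability vector of every node.
    """
    A = np.array(adjacency, dtype=float)       # copy, so the input is untouched
    P0 = np.asarray(initial_probs, dtype=float)

    # Give isolated nodes a self-loop so that they keep their own vector.
    isolated = A.sum(axis=1) == 0
    A += np.diag(isolated.astype(float))

    # Row-normalise: each node averages over its neighbours.
    W = A / A.sum(axis=1, keepdims=True)

    P = P0.copy()
    for _ in range(n_iter):
        # Diffusion step plus memory of the initial state.
        P = (1.0 - memory) * (W @ P) + memory * P0
    return P


# Toy example: a path graph 0 - 1 - 2 - 3 with two age groups.
# Nodes 0 and 3 are seed nodes with known groups; nodes 1 and 2 are
# unknown and start from a uniform vector.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
initial_probs = [
    [1.0, 0.0],   # seed: group 0
    [0.5, 0.5],   # unknown
    [0.5, 0.5],   # unknown
    [0.0, 1.0],   # seed: group 1
]

result = propagate_labels(adjacency, initial_probs)
predicted_groups = result.argmax(axis=1)
```

In this toy graph each unknown node ends up leaning toward the group of its nearest seed, which is the homophily assumption in miniature. Because `W` is row-stochastic and every initial vector sums to one, each row of `result` remains a probability vector after every step.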