Predicting Race and Ethnicity From the Sequence of Characters in a Name
Main authors: | , , |
---|---|
Format: | Article |
Language: | English |
Summary: | To answer questions about racial inequality and fairness, we often need a way
to infer race and ethnicity from names. One common approach relies on the Census
Bureau's list of popular last names. The list, however, suffers from at least
three limitations: (1) it contains only last names, (2) it includes only popular
last names, and (3) it is updated only once every 10 years. To provide better
generalization, and higher accuracy when first names are available, we model the
relationship between the characters in a name and race and ethnicity using
various techniques. A model using Long Short-Term Memory (LSTM) works best, with
an out-of-sample accuracy of 0.85. The best-performing last-name model achieves
an out-of-sample accuracy of 0.81. To illustrate the utility of the models, we
apply them to campaign finance data to estimate the share of donations made by
people of various racial groups, and to news data to estimate the coverage of
various races and ethnicities in the news. |
---|---|
DOI: | 10.48550/arxiv.1805.02109 |
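
The summary describes modeling a name as a sequence of characters. As a rough illustration only, the sketch below shows one way a name might be turned into a fixed-length sequence of integer ids, which is the kind of input a sequence model such as an LSTM typically consumes. The vocabulary, the `encode_name` helper, the padding scheme, and the `max_len` default are all assumptions made for this example; the record does not describe the authors' actual preprocessing.

```python
import string

# Hypothetical vocabulary (an assumption for this sketch): lowercase ASCII
# letters plus space, hyphen, and apostrophe.
VOCAB = string.ascii_lowercase + " -'"

PAD_ID = 0  # reserved id used to pad short names
UNK_ID = 1  # reserved id for characters outside the vocabulary

# Real characters start at id 2 so they never collide with PAD_ID or UNK_ID.
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(VOCAB)}


def encode_name(name, max_len=20):
    """Map a name to a fixed-length list of integer character ids."""
    cleaned = name.strip().lower()
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in cleaned]
    ids = ids[:max_len]                           # truncate long names
    return ids + [PAD_ID] * (max_len - len(ids))  # right-pad short names


if __name__ == "__main__":
    print(encode_name("Smith", max_len=8))
```

Each position in the resulting list corresponds to one character of the name, so a sequence model can learn from character order rather than from a fixed lookup list of known names.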