Predicting the Gender of Indonesian Names
We investigated a way to predict the gender of a name using character-level Long-Short Term Memory (char-LSTM). We compared our method with some conventional machine learning methods, namely Naive Bayes, logistic regression, and XGBoost with n-grams as the features. We evaluated the models on a data...
Gespeichert in:
Veröffentlicht in: | arXiv.org 2017-09 |
---|---|
1. Verfasser: | |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | We investigated a way to predict the gender of a name using character-level Long-Short Term Memory (char-LSTM). We compared our method with some conventional machine learning methods, namely Naive Bayes, logistic regression, and XGBoost with n-grams as the features. We evaluated the models on a dataset consisting of the names of Indonesian people. It is not common to use a family name as the surname in Indonesian culture, except in some ethnicities. Therefore, we inferred the gender from both full names and first names. The results show that we can achieve 92.25% accuracy from full names, while using first names only yields 90.65% accuracy. These results are better than the ones from applying the classical machine learning algorithms to n-grams. |
---|---|
ISSN: | 2331-8422 |