Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions

Bibliographic details
Published in: Data & Knowledge Engineering, 2022-11, Vol. 142, p. 102098, Article 102098
Main authors: Brenon, Alice; Moncla, Ludovic; McDonough, Katherine
Format: Article
Language: English
Online access: Full text
Description
Summary: This article presents a comparative study of supervised classification approaches applied to the automatic classification of encyclopedia articles written in French. Our dataset includes all 70k text articles from Diderot and d’Alembert’s Encyclopédie (1751-72). In a two-task experiment we test combinations of (1) text vectorization methods (bags-of-words and word embeddings) and (2) traditional Machine Learning and newer Deep Learning classification methods (including transformer architectures). In addition to evaluating each approach, we review the results quantitatively and qualitatively. The best model obtains an average F-score of 86% for 38 classes. Using network analysis, we highlight the difficulty of labeling semantically close classes. We also discuss misclassifications in order to understand the relationship between content and different ways of ordering knowledge. We openly release all code and results (https://gitlab.liris.cnrs.fr/geode/EDdA-Classification), and data is available on request.
ISSN: 0169-023X, 1872-6933
DOI: 10.1016/j.datak.2022.102098
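
The summary mentions bag-of-words vectorization, traditional machine learning classifiers, and an average F-score over classes. As a rough illustration of that kind of pipeline, and not the authors' implementation, the sketch below pairs raw word counts with a multinomial Naive Bayes classifier and computes a macro-averaged F1. The choice of Naive Bayes, the macro averaging, the whitespace tokenizer, all function names, and the toy sentences and labels are assumptions made here for illustration; none of them are taken from the article or its released code.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; a real pipeline would do more.
    return text.lower().split()


def train_nb(docs, labels):
    """Collect the bag-of-words counts needed by multinomial Naive Bayes."""
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # per-class token counts
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented toy data: two hypothetical classes, four training "articles".
train_docs = [
    "ville de france sur la rivière",
    "île de la mer méditerranée",
    "maladie du foie avec fièvre",
    "remède contre la fièvre et la douleur",
]
train_labels = ["Géographie", "Géographie", "Médecine", "Médecine"]
model = train_nb(train_docs, train_labels)

test_docs = ["petite ville sur la mer", "fièvre et douleur du foie"]
test_labels = ["Géographie", "Médecine"]
predictions = [predict_nb(model, doc) for doc in test_docs]
print(predictions, macro_f1(test_labels, predictions))
```

Macro averaging gives every class the same weight regardless of its size, which matters when a label set has many small classes; a support-weighted average would be an equally plausible reading of "average F-score".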