Information Extraction From Free-Form CV Documents in Multiple Languages

Bibliographic Details
Published in:	IEEE Access, 2021, Vol. 9, pp. 84559-84575
Main authors:	Vukadin, Davor; Kurdija, Adrian Satja; Delac, Goran; Silic, Marin
Format:	Article
Language:	English
Subjects:	Bit error rate; Coders; Context modeling; CV parsing; Data mining; Free form; Hidden Markov models; Information retrieval; Model accuracy; Multilingualism; Natural language processing; recurrent neural networks; Solid modeling; text analysis; Training; Unstructured data
Online access:	Full text

Description
Abstract:	This paper proposes two natural language processing models for extracting useful information from multilingual, unstructured (free form) CV documents. The model identifies the relevant document sections (personal information, education, employment, etc.) and the corresponding specific information at the lower hierarchy level (names, addresses, roles, skill competences, etc.). Our approach employs the transformer architecture and its multilingual implementation of the encoder part in the form of the BERT language model. The models are trained and tested on a large, manually annotated CV dataset, achieving high scores on standard accuracy measures. The proposed models exhibit important properties of end-to-end training and interpretability, which was investigated by visualizing the model attention and its vector representations.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2021.3087913