Comparing free reference extraction pipelines

Bibliographic Details
Published in: International Journal on Digital Libraries, 2024-12, Vol. 25 (4), pp. 841-853
Main authors: Backes, Tobias; Iurshina, Anastasiia; Shahid, Muhammad Ahsan; Mayr, Philipp
Format: Article
Language: English
Online access: Full text

Abstract

In this paper, we compare the performance of several popular pre-trained reference extraction and segmentation toolkits, combined in different pipeline configurations, on three different datasets. The extraction is end-to-end, i.e. the input is PDF documents and the output is parsed reference objects. The evaluation covers both reference strings and individual fields of the reference objects, using alignment by identical fields and close-to-identical values. Our results show that Grobid and AnyStyle perform best of all compared tools, although one may want to use them in combination. Our work is meant to serve as a reference for researchers interested in applying out-of-the-box reference extraction and parsing tools, for example as a preprocessing step for a more complex research question. Our detailed results on different datasets, with results for individual parsed fields, will allow them to focus on the aspects that are particularly important to them.
ISSN: 1432-5012, 1432-1300
DOI: 10.1007/s00799-024-00404-6
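
As an informal illustration of the end-to-end setup described in the abstract (PDF documents in, parsed reference objects out), the sketch below sends a PDF to a locally running Grobid service and retrieves the extracted references as TEI XML. The host and port, the file name, and the small Python wrapper are assumptions made for this example only; the paper evaluates Grobid, AnyStyle and other toolkits in several pipeline configurations and does not prescribe any particular client code.

    import requests

    # Assumed location of a locally running Grobid service (not taken from the paper).
    GROBID_URL = "http://localhost:8070/api/processReferences"

    def extract_references(pdf_path: str) -> str:
        """Send a PDF to Grobid and return the extracted references as TEI XML."""
        with open(pdf_path, "rb") as pdf:
            response = requests.post(
                GROBID_URL,
                files={"input": pdf},                # the PDF document to process
                data={"consolidateCitations": "0"},  # skip external consolidation
                timeout=120,
            )
        response.raise_for_status()
        # The TEI XML contains one <biblStruct> element per parsed reference,
        # with fields such as authors, title, venue, year and pages.
        return response.text

    if __name__ == "__main__":
        print(extract_references("example.pdf")[:500])  # hypothetical input file

A comparable command-line step with AnyStyle would be, for instance, "anystyle parse references.txt" on a file of extracted reference strings, which emits parsed reference objects as JSON; chaining such tools is the kind of pipeline configuration compared in the paper.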