On the implementation of Latin part-of-speech taggers in intertextuality analysis: TreeTagger, CLTK, Cracovia system, LatinCy, and ChatGPT compared
Published in: | Digital Scholarship in the Humanities 2024-12 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Abstract: | Digitally assisted intertextuality analysis often yields large numbers of results, many of which are irrelevant to researchers from a hermeneutic point of view. One strategy for minimizing these hermeneutically non-meaningful findings is to apply a filter that sorts the results based on specified parts of speech. Building on Marie Revellio’s historical text-reuse grammar (HTRG), we demonstrate that implementing such a filter in a way that accommodates the hermeneutical context is advantageous. To refine our filtering process, we assessed the performance of various Latin part-of-speech (POS) taggers, using evaluation data on text congruencies from the Latin authors Virgil and Jerome. Among the Classical Language Toolkit (CLTK), the TreeTagger, the Cracovia system, LatinCy, and ChatGPT, the Cracovia system surpassed the other taggers by approximately 2 percentage points in accuracy. While this tagger leverages transformer-based machine learning algorithms, the older, probabilistic TreeTagger demonstrated competitive performance. Although GPT-4 showed remarkable results, it still lags behind the state-of-the-art taggers for Latin. For building a powerful digital citation detection tool for intertextual relationships in ancient texts, the most accurate POS analysis possible is crucial to filtering and evaluating valid citations. |
---|---|
ISSN: | 2055-7671 2055-768X |
DOI: | 10.1093/llc/fqae078 |
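
The abstract describes two steps that a short sketch may make more concrete: filtering text-reuse candidates by the parts of speech of their shared words, and scoring a tagger by token-level accuracy against gold annotations. The Python below is only an illustration under stated assumptions. The tag set (Universal Dependencies style), the `CONTENT_POS` selection, the `min_content_words` threshold, and all names are hypothetical and are not taken from the article or from the HTRG, and the tagged tokens are toy examples rather than evaluation data.

```python
from dataclasses import dataclass

# Assumption: content-word tags in Universal Dependencies style.
# The article's actual filter categories are not given in this record.
CONTENT_POS = frozenset({"NOUN", "VERB", "ADJ", "PROPN"})


@dataclass(frozen=True)
class Token:
    form: str
    pos: str


def keep_candidate(shared_tokens, min_content_words=2):
    """Keep a text-reuse candidate only if enough shared words are content words."""
    n_content = sum(1 for tok in shared_tokens if tok.pos in CONTENT_POS)
    return n_content >= min_content_words


def tagging_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy candidates: words shared between two passages, already tagged.
content_match = [Token("arma", "NOUN"), Token("virum", "NOUN"), Token("cano", "VERB")]
function_match = [Token("et", "CCONJ"), Token("in", "ADP")]

kept = [c for c in (content_match, function_match) if keep_candidate(c)]
accuracy = tagging_accuracy(
    ["NOUN", "NOUN", "VERB", "ADP"],
    ["NOUN", "NOUN", "VERB", "CCONJ"],
)
```

In this toy run only the candidate sharing content words survives the filter, and the hypothetical tagger output agrees with the gold tags on three of four tokens. A tagging error on a shared word can flip the filter decision for a candidate, which is why differences of a few percentage points in tagger accuracy matter for such a pipeline.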