SQSE: A Measure to Assess Sample Quality of Authorial Style as a Cognitive Biometric Trait

Bibliographic details
Published in: IEEE Transactions on Biometrics, Behavior, and Identity Science, October 2021, Vol. 3 (4), pp. 583-596
Authors: Wilson, Ronald; Bhandarkar, Avanti; Lyons, Princess; Woodard, Damon L.
Format: Article
Language: English
Description
Abstract: Stylistic analysis of text is a widely researched topic in both cognitive biometrics and linguistics. This problem, often referred to as Authorship Attribution (AA), has expanded in scope from a few hundred authors with similar data characteristics to large-scale corpora with several thousand authors and cross-domain samples. Even though AA algorithms have evolved to keep up with the requirements of the community, the process for choosing an appropriate text sample with good style characteristics has remained poorly defined. This paper, for the first time, formalizes the sample selection process using a style quality evaluation measure for AA, called Sample Quality for Style Extraction (SQSE). Furthermore, we demonstrate the utility of the measure on multiple large-scale cross-domain corpora with over 6,500 authors and 250,000 text samples. Supported by over 200 experiments and 4 million comparisons, the SQSE measure exhibits a strong positive correlation with matching performance across a wide variety of AA algorithms, with a Pearson correlation coefficient of 0.87, and positively identifies samples of good stylometric quality.
ISSN: 2637-6407
DOI: 10.1109/TBIOM.2021.3120985
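The abstract reports the agreement between SQSE scores and matching performance as a Pearson correlation coefficient (0.87). The following is a minimal Python sketch of that statistic only. It does not implement SQSE, whose definition is not given in this record. The function name `pearson_r` and the sample values are hypothetical, chosen for illustration, and are not data from the paper.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences.

    Hypothetical helper for illustration; not code from the paper.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance numerator and the two variance terms (the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustration values: per-sample quality scores and the matching
# accuracy observed for those samples. NOT results from the paper.
quality_scores = [0.2, 0.4, 0.6, 0.8]
match_accuracy = [0.30, 0.45, 0.65, 0.80]

print(round(pearson_r(quality_scores, match_accuracy), 3))
```

A value near +1 means higher quality scores go with higher matching performance, which is the relationship the abstract claims for SQSE. On Python 3.10 or later, `statistics.correlation` from the standard library computes the same coefficient.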