SQSE: A Measure to Assess Sample Quality of Authorial Style as a Cognitive Biometric Trait
Published in: | IEEE Transactions on Biometrics, Behavior, and Identity Science, 2021-10, Vol.3 (4), p.583-596 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Stylistic analysis of text is a widely researched topic in both cognitive biometrics and linguistics. Often referred to as Authorship Attribution (AA), the problem has expanded in scope from a few hundred authors with similar data characteristics to large-scale corpora with several thousand authors and cross-domain samples. Although AA algorithms have evolved to keep up with the requirements of the community, the process for choosing an appropriate text sample with good style characteristics has remained poorly defined. This paper formalizes, for the first time, the sample selection process using a style quality evaluation measure for AA called Sample Quality for Style Extraction (SQSE). We demonstrate the utility of the measure on multiple large-scale cross-domain corpora with over 6,500 authors and 250,000 text samples. Supported by over 200 experiments and 4 million comparisons, the SQSE measure exhibits a strong positive correlation with matching performance across a wide variety of AA algorithms (Pearson correlation coefficient of 0.87) and positively identifies samples of good stylometric quality. |
---|---|
ISSN: | 2637-6407 |
DOI: | 10.1109/TBIOM.2021.3120985 |