Evaluation of a semi-automated data extraction tool for public health literature-based reviews: Dextr


Bibliographic details
Published in: Environment International, 2022-01, Vol. 159, p. 107025-107025, Article 107025
Main authors: Walker, Vickie R., Schmitt, Charles P., Wolfe, Mary S., Nowak, Artur J., Kulesza, Kuba, Williams, Ashley R., Shin, Rob, Cohen, Jonathan, Burch, Dave, Stout, Matthew D., Shipkowski, Kelly A., Rooney, Andrew A.
Format: Article
Language: English
Description
Summary:
• Dextr is a semi-automated data extraction tool that can capture complex data.
• Dextr connects extracted entities to support hierarchical data extraction.
• Dextr supports development of annotated datasets within a standard review workflow.
• Dextr’s user verification option assures user-driven semi-automated data extraction.

There has been limited development and uptake of machine-learning methods to automate data extraction for literature-based assessments. Although advanced extraction approaches have been applied to some clinical research reviews, existing methods are not well suited for addressing toxicology or environmental health questions because reviews in these fields have unique data needs.

The objective was to develop and evaluate a flexible, web-based tool for semi-automated data extraction that: 1) makes data extraction predictions with user verification, 2) integrates token-level annotations, and 3) connects extracted entities to support hierarchical data extraction.

Dextr was developed with Agile software methodology using a two-team approach. The development team outlined proposed features and coded the software. The advisory team guided developers and evaluated Dextr’s performance on precision, recall, and extraction time by comparing a manual extraction workflow to a semi-automated extraction workflow using a dataset of 51 environmental health animal studies.

The semi-automated workflow did not appear to affect precision rate (96.0% vs. 95.4% manual, p = 0.38), resulted in a small reduction in recall rate (91.8% vs. 97.0% manual, p
ISSN: 0160-4120, 1873-6750
DOI: 10.1016/j.envint.2021.107025