Automated Text Structuring: Natural Language Processing and Regular Expressions in XML Tag Filling
The conversion of documents into XML markup requires efficient algorithms and automated solutions. The focus is on tagging documents to meet NISO STS standards, ensuring compatibility across systems. A method combining Natural Language Processing (NLP) and Regular Expressions (regex) for automated X...
Gespeichert in:
Veröffentlicht in: | IEEE access 2024, Vol.12, p.190582-190597 |
---|---|
Hauptverfasser: | , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The conversion of documents into XML markup requires efficient algorithms and automated solutions. The focus is on tagging documents to meet NISO STS standards, ensuring compatibility across systems. A method combining Natural Language Processing (NLP) and Regular Expressions (regex) for automated XML tag filling is proposed. NLP enhances content understanding, while regex enables precise pattern matching. This approach streamlines the conversion process, reducing manual effort and ensuring standardized tagging. Through experiments, the effectiveness of the method in achieving accurate XML markup aligned with NISO STS guidelines is validated. This research advances automated data structuring, exemplified by the GOST R ontology within NISO STS standards, providing a template for other ontology-based document XML-structuring. |
---|---|
ISSN: | 2169-3536 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3511674 |