Automated Text Structuring: Natural Language Processing and Regular Expressions in XML Tag Filling

Bibliographic details
Published in: IEEE Access 2024, Vol. 12, p. 190582-190597
Main authors: Malashin, Ivan P., Tynchenko, Vadim S., Gantimurov, Andrei P., Nelyub, Vladimir A., Borodulin, Aleksei S.
Format: Article
Language: English
Description
Summary: The conversion of documents into XML markup requires efficient algorithms and automated solutions. The focus is on tagging documents to meet NISO STS standards, ensuring compatibility across systems. A method combining Natural Language Processing (NLP) and Regular Expressions (regex) for automated XML tag filling is proposed. NLP enhances content understanding, while regex enables precise pattern matching. This approach streamlines the conversion process, reducing manual effort and ensuring standardized tagging. Through experiments, the effectiveness of the method in achieving accurate XML markup aligned with NISO STS guidelines is validated. This research advances automated data structuring, exemplified by the GOST R ontology within NISO STS standards, providing a template for other ontology-based document XML-structuring.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3511674
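
Illustrative note: the summary describes regex-based pattern matching used to fill XML tags. The following is a minimal sketch of that idea only, not the authors' implementation. The designation pattern, the function name `fill_std_ident`, and the sample sentence are hypothetical. The element names are modelled loosely on NISO STS and have not been validated against the schema; in particular, placing `year` inside `std-ident` is an assumption. The NLP step of the proposed method is omitted here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for a designation such as "GOST R 12345-2020":
# originator, document number (optionally dotted), and a four-digit year.
DESIGNATION = re.compile(
    r"\b(?P<originator>GOST(?: R)?)\s+"
    r"(?P<number>\d+(?:\.\d+)*)-(?P<year>\d{4})\b"
)


def fill_std_ident(text):
    """Build an XML element from the first designation found in text.

    Returns None when no designation matches. Element names are
    assumptions modelled loosely on NISO STS, not a checked content model.
    """
    match = DESIGNATION.search(text)
    if match is None:
        return None
    ident = ET.Element("std-ident")
    ET.SubElement(ident, "originator").text = match.group("originator")
    ET.SubElement(ident, "doc-number").text = match.group("number")
    # Assumption: the year is stored as a child of std-ident.
    ET.SubElement(ident, "year").text = match.group("year")
    return ident


# Hypothetical input sentence, used only to exercise the sketch.
sample = "This standard replaces GOST R 12345-2020 on document markup."
element = fill_std_ident(sample)
xml_string = ET.tostring(element, encoding="unicode")
print(xml_string)
```

In a fuller pipeline, an NLP component would presumably decide which text spans are candidates for each tag before patterns like this one are applied; that division of labour is inferred from the summary, not confirmed by it.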