HuSpaCy: an industrial-strength Hungarian natural language processing toolkit
Although there are a couple of open-source language processing pipelines available for Hungarian, none of them satisfies the requirements of today's NLP applications. A language processing pipeline should consist of close to state-of-the-art lemmatization, morphosyntactic analysis, entity recog...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Although there are a couple of open-source language processing pipelines
available for Hungarian, none of them satisfies the requirements of today's NLP
applications. A language processing pipeline should consist of close to
state-of-the-art lemmatization, morphosyntactic analysis, entity recognition
and word embeddings. Industrial text processing applications have to satisfy
non-functional software quality requirements, what is more, frameworks
supporting multiple languages are more and more favored. This paper introduces
HuSpaCy, an industry-ready Hungarian language processing toolkit. The presented
tool provides components for the most important basic linguistic analysis
tasks. It is open-source and is available under a permissive license. Our
system is built upon spaCy's NLP components resulting in an easily usable, fast
yet accurate application. Experiments confirm that HuSpaCy has high accuracy
while maintaining resource-efficient prediction capabilities. |
---|---|
DOI: | 10.48550/arxiv.2201.01956 |