BanglaAutoKG: Automatic Bangla Knowledge Graph Construction with Semantic Neural Graph Filtering
The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Saved in:
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Knowledge Graphs (KGs) have proven essential in information processing and reasoning applications because they link related entities and provide context-rich information, supporting efficient information retrieval, knowledge discovery, and effective presentation of information flow. Despite being widely used globally, Bangla is relatively underrepresented in KGs due to a lack of comprehensive datasets, encoders, NER (named entity recognition) models, POS (part-of-speech) taggers, and lemmatizers, hindering efficient information processing and reasoning applications in the language. Addressing the KG scarcity in Bengali, we propose BanglaAutoKG, a pioneering framework that can automatically construct Bengali KGs from any Bangla text. We utilize multilingual LLMs to understand various languages and correlate entities and relations universally. By employing a translation dictionary to identify English equivalents and extracting word features from pre-trained BERT models, we construct the foundational KG. To reduce noise and align word embeddings with our goal, we employ graph-based polynomial filters. Lastly, we implement a GNN-based semantic filter, which elevates contextual understanding and trims unnecessary edges, culminating in the formation of the definitive KG. Empirical findings and case studies demonstrate the universal effectiveness of our model, capable of autonomously constructing semantically enriched KGs from any text. |
---|---|
DOI: | 10.48550/arxiv.2404.03528 |
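The graph-based polynomial filtering step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustrative implementation, not the paper's actual method: the coefficients, the symmetric degree normalization, and the toy graph are all assumptions made here for demonstration.

```python
import numpy as np

def polynomial_graph_filter(adj, X, coeffs=(0.5, 0.3, 0.2)):
    """Apply a polynomial filter H = sum_k c_k * A_norm^k to node features X.

    adj    : (n, n) adjacency matrix of the entity graph
    X      : (n, d) node (word) embeddings
    coeffs : illustrative polynomial coefficients (an assumption,
             not values from the paper)
    """
    # Symmetric degree normalization: D^{-1/2} A D^{-1/2}
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    a_norm = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

    # Accumulate c_0 * X + c_1 * A_norm X + c_2 * A_norm^2 X + ...
    out = np.zeros_like(X, dtype=float)
    power = np.eye(adj.shape[0])
    for c in coeffs:
        out += c * (power @ X)
        power = power @ a_norm
    return out

# Toy 3-node path graph (0 - 1 - 2) with 2-d embeddings
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Smoothed embeddings: each node's vector is blended with its neighborhood,
# which is one way such a filter can suppress embedding noise.
X_smooth = polynomial_graph_filter(A, X)
```

In a low-pass configuration like this, neighboring entities end up with more similar embeddings, which is consistent with the abstract's stated goal of reducing noise before the GNN-based semantic filter prunes edges.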