Automatically Constructing Multi-Dimensional Resource Space by Extracting Class Trees From Texts for Operating and Analyzing Texts From Multiple Abstraction Dimensions

Bibliographic details
Published in: IEEE Access 2025, Vol. 13, p. 4737-4758
Main authors: Zhou, Jian; Li, Jiazheng; Zhuge, Sirui; Zhuge, Hai
Format: Article
Language: English
Description
Summary: Abstraction is a key part of understanding and representation. Discovering different abstraction dimensions in a large set of texts helps in understanding the texts from multiple dimensions and therefore supports the multi-dimensional operations required by advanced applications. This paper proposes a low-cost approach to automatically discovering abstraction dimensions, represented as class trees, in texts. The approach consists of three steps: 1) extract subclass relations from the input texts based on modifier patterns and syntactic patterns; 2) construct class trees from the extracted subclass relations; and 3) select independent class trees with high coverage of the texts as abstraction dimensions. The correctness and feasibility of the approach are validated on seven data sets of different types. The average precision, recall, and F1-score of the subclass relations extracted by the proposed approach are all greater than 85%. Applying the approach to the management of GitHub projects demonstrates that searching on the class trees ensures strong relevance between queries and results, quickly reduces the search space, and supports effective project management. The proposed approach not only greatly extends the pattern-based approach to finding abstraction relations in texts with high coverage, but also verifies the feasibility of automatically extracting abstraction dimensions from texts. It can be applied to efficiently manage large-scale text resources from different dimensions in support of advanced applications.
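The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the modifier pattern, the syntactic pattern, the independence test, or any threshold. The sketch assumes that dropping the leading modifier of a noun phrase yields its superclass, omits the syntactic pattern entirely, measures coverage by plain substring matching, and treats trees with different roots as independent. All function names, the `min_coverage` parameter, and the sample data are hypothetical.

```python
from collections import defaultdict


def modifier_subclass_relations(noun_phrases):
    """Step 1 (assumed heuristic): removing the leading modifier of a
    noun phrase gives its superclass, e.g.
    'deep neural network' -> 'neural network' -> 'network'."""
    relations = set()
    for phrase in noun_phrases:
        words = phrase.lower().split()
        for i in range(len(words) - 1):
            child = " ".join(words[i:])
            parent = " ".join(words[i + 1:])
            relations.add((child, parent))
    return relations


def build_class_trees(relations):
    """Step 2: group (subclass, superclass) pairs into trees.
    Returns a dict mapping each root class to the set of classes in its tree."""
    children = defaultdict(set)
    has_parent = set()
    for child, parent in relations:
        children[parent].add(child)
        has_parent.add(child)
    roots = [c for c in list(children) if c not in has_parent]
    trees = {}
    for root in roots:
        members, stack = set(), [root]
        while stack:
            node = stack.pop()
            if node not in members:
                members.add(node)
                stack.extend(children.get(node, ()))
        trees[root] = members
    return trees


def select_dimensions(trees, texts, min_coverage=0.5):
    """Step 3 (assumed criterion): keep trees whose classes are mentioned
    in at least `min_coverage` of the texts. Returns root -> coverage."""
    if not texts:
        return {}
    selected = {}
    for root, classes in trees.items():
        covered = sum(any(c in t.lower() for c in classes) for t in texts)
        ratio = covered / len(texts)
        if len(classes) > 1 and ratio >= min_coverage:
            selected[root] = ratio
    return selected


# Hypothetical sample data, for illustration only.
phrases = [
    "deep neural network",
    "convolutional neural network",
    "relational database",
    "graph database",
]
texts = [
    "A deep neural network stores weights.",
    "We use a graph database.",
    "A relational database and a neural network.",
]

relations = modifier_subclass_relations(phrases)
trees = build_class_trees(relations)
dimensions = select_dimensions(trees, texts)
```

On this sample data the sketch yields two class trees, rooted at "network" and "database", each mentioned in two of the three texts. A realistic pipeline would obtain the noun phrases from a parser rather than from a hand-written list.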
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3516872