An entropy-based corpus method for improving keyword extraction: An example of sustainability corpus
Published in: | Engineering applications of artificial intelligence 2024-07, Vol.133, p.108049, Article 108049 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Natural language processing (NLP), a subfield of artificial intelligence (AI), has progressively influenced corpus-based methods, and keyword extraction, an integral technique within those methods, often relies on complex NLP algorithms or models. With growing concern for sustainability issues, keyword extraction affects information acquisition in decision-making and policy development. However, traditional corpus-based keyword extraction methods have limitations: they cannot automatically exclude meaningless words, evaluate the relative importance of keyword parameters, or integrate parameters for comprehensive keyword evaluation. To address these limitations, this paper proposes an entropy-based corpus method. The proposed method first optimizes the keyword list by excluding function and generic words using a machine-based technique (word types decrease by 5.76%; total words decrease by 72.2%). Second, it calculates the objective weights of the log-likelihood (0.5518), frequency (0.4048), and range (0.0433) parameters to define their relative importance, which allows the parameters to be integrated before keyword importance is evaluated. Third, it calculates the aggregated value of each keyword to assess its level of importance. As a result, it streamlines the manual word selection process and comprehensively evaluates the importance of keywords. Compared with the four traditional methods, the keywords extracted by the proposed method, which account for only 1.77% of the original list, better reflect the linguistic patterns of the target corpus, potentially facilitating future corpus-based keyword analysis research. |
ISSN: | 0952-1976 1873-6769 |
DOI: | 10.1016/j.engappai.2024.108049 |
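
The weighting and aggregation steps named in the abstract can be illustrated with the textbook entropy weight method. The sketch below is an assumption about the general technique, not the paper's exact formulation: the function names `entropy_weights` and `aggregate_scores`, the min-max scaling, and all numeric values are hypothetical and are not taken from the article.

```python
import math


def entropy_weights(matrix):
    """Objective column weights via the textbook entropy weight method."""
    n = len(matrix)
    if n < 2:
        raise ValueError("need at least two keywords (rows)")
    k = 1.0 / math.log(n)  # scales each column's entropy into [0, 1]
    divergences = []
    for column in zip(*matrix):
        total = sum(column)
        if total <= 0:
            raise ValueError("each column needs a positive sum")
        entropy = 0.0
        for value in column:
            p = value / total  # share of this keyword in the column
            if p > 0:  # treat 0 * ln(0) as 0
                entropy -= k * p * math.log(p)
        # A more uneven column has lower entropy and so a larger divergence.
        divergences.append(1.0 - entropy)
    d_sum = sum(divergences)
    if d_sum <= 0:
        raise ValueError("all columns are uniform; weights are undefined")
    return [d / d_sum for d in divergences]


def aggregate_scores(matrix, weights):
    """Weighted sum of min-max scaled parameters, one score per keyword."""
    columns = list(zip(*matrix))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scores = []
    for row in matrix:
        score = 0.0
        for value, low, high, w in zip(row, lows, highs, weights):
            # Constant columns carry no ranking information, so scale them to 0.
            scaled = (value - low) / (high - low) if high > low else 0.0
            score += w * scaled
        scores.append(score)
    return scores


# Hypothetical values, not from the paper: (log-likelihood, frequency, range)
keywords = ["sustainability", "emission", "report"]
matrix = [
    [120.0, 300.0, 10.0],
    [80.0, 150.0, 9.0],
    [5.0, 200.0, 10.0],
]
weights = entropy_weights(matrix)
scores = aggregate_scores(matrix, weights)
ranking = sorted(zip(keywords, scores), key=lambda pair: pair[1], reverse=True)
```

In this scheme a parameter whose values barely differ across keywords receives a weight close to zero, which is consistent with the small weight the abstract reports for range, although the paper's own normalization and aggregation may differ.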