An Improved Corpus-Based NLP Method for Facilitating Keyword Extraction: An Example of the COVID-19 Vaccine Hesitancy Corpus

In the current COVID-19 post-pandemic era, COVID-19 vaccine hesitancy is hindering the herd immunity generated by widespread vaccination. It is critical to identify the factors that may cause COVID-19 vaccine hesitancy, enabling the relevant authorities to propose appropriate interventions for mitig...

Bibliographic details
Published in: Sustainability 2023-02, Vol.15 (4), p.3402
Main author: Chen, Liang-Ching
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue 4
container_start_page 3402
container_title Sustainability
container_volume 15
creator Chen, Liang-Ching
description In the current COVID-19 post-pandemic era, COVID-19 vaccine hesitancy is hindering the herd immunity generated by widespread vaccination. It is critical to identify the factors that may cause COVID-19 vaccine hesitancy, enabling the relevant authorities to propose appropriate interventions for mitigating such a phenomenon. Keyword extraction, a sub-field of natural language processing (NLP) applications, plays a vital role in modern medical informatics. When traditional corpus-based NLP methods are used to conduct keyword extraction, they only consider a word’s log-likelihood value to determine whether it is a keyword, which leaves room for concerns about the efficiency and accuracy of this keyword extraction technique. These concerns include the fact that the method is unable to (1) optimize the keyword list by the machine-based approach, (2) effectively evaluate the keyword’s importance level, and (3) integrate the variables to conduct data clustering. Thus, to address the aforementioned issues, this study integrated a machine-based word removal technique, the i10-index, and the importance–performance analysis (IPA) technique to develop an improved corpus-based NLP method for facilitating keyword extraction. The top 200 most-cited Science Citation Index (SCI) research articles discussing COVID-19 vaccine hesitancy were adopted as the target corpus for verification. The results showed that the keywords of Quadrant I (n = 98) reached the highest lexical coverage (9.81%), indicating that the proposed method successfully identified and extracted the most important keywords from the target corpus, thus achieving more domain-oriented and accurate keyword extraction results.
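The description above refers to a word's log-likelihood value and to lexical coverage without giving formulas. The following is a minimal Python sketch, assuming the log-likelihood keyness statistic commonly used in corpus linguistics (observed versus expected frequencies in a target and a reference corpus) and assuming lexical coverage means the percentage of corpus tokens accounted for by the keywords. The function names, parameters, and the choice of statistic are illustrative assumptions and are not taken from the article itself.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of one word: target corpus vs. reference corpus.

    Assumed formula (not confirmed by the record above):
    LL = 2 * sum(observed * ln(observed / expected)) over both corpora.
    """
    total = size_target + size_ref
    combined = freq_target + freq_ref
    # Expected frequencies if the word were equally likely in both corpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    # A zero observed frequency contributes nothing (x * ln(x) -> 0 as x -> 0).
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * ll


def lexical_coverage(keyword_freqs, corpus_size):
    """Percentage of corpus tokens accounted for by the given keyword counts."""
    return 100.0 * sum(keyword_freqs) / corpus_size
```

A word that occurs at the same rate in both corpora scores 0, while a word that is much more frequent in the target corpus scores high. Under these assumptions, a keyword list would be built by ranking words on this score before any further filtering or clustering step.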
doi_str_mv 10.3390/su15043402
format Article
publisher Basel: MDPI AG
rights 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
orcid https://orcid.org/0000-0002-7896-1990
oa free_for_read
fulltext fulltext
identifier ISSN: 2071-1050
ispartof Sustainability, 2023-02, Vol.15 (4), p.3402
issn 2071-1050
2071-1050
language eng
recordid cdi_proquest_journals_2779697250
source MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals
subjects Algorithms
Analysis
Bibliometrics
Clustering
Computational linguistics
Coronaviruses
COVID-19 vaccines
Disease transmission
Efficiency
Epidemics
Health informatics
Herd immunity
Informatics
Information retrieval
Keywords
Knowledge acquisition
Language processing
Medical informatics
Medical research
Medicine, Experimental
Methods
Natural language
Natural language interfaces
Natural language processing
Pandemics
Public health
Software
Sustainability
Vaccination
Vaccines
Variables
title An Improved Corpus-Based NLP Method for Facilitating Keyword Extraction: An Example of the COVID-19 Vaccine Hesitancy Corpus
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T14%3A56%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Improved%20Corpus-Based%20NLP%20Method%20for%20Facilitating%20Keyword%20Extraction:%20An%20Example%20of%20the%20COVID-19%20Vaccine%20Hesitancy%20Corpus&rft.jtitle=Sustainability&rft.au=Chen,%20Liang-Ching&rft.date=2023-02-01&rft.volume=15&rft.issue=4&rft.spage=3402&rft.pages=3402-&rft.issn=2071-1050&rft.eissn=2071-1050&rft_id=info:doi/10.3390/su15043402&rft_dat=%3Cgale_proqu%3EA743493966%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2779697250&rft_id=info:pmid/&rft_galeid=A743493966&rfr_iscdi=true