Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance

The performance of text classification methods has improved greatly over the last decade for text instances of less than 512 tokens. This limit has been adopted by most state-of-the-research transformer models due to the high computational cost of analyzing longer text instances. To mitigate this pr...

Detailed description

Bibliographic details
Published in: IEEE Access, 2021, Vol. 9, p. 105439-105450
Main authors: Fiok, Krzysztof; Karwowski, Waldemar; Gutierrez-Franco, Edgar; Davahli, Mohammad Reza; Wilamowski, Maciej; Ahram, Tareq; Al-Juaid, Awad; Zurada, Jozef
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 105450
container_issue
container_start_page 105439
container_title IEEE access
container_volume 9
creator Fiok, Krzysztof
Karwowski, Waldemar
Gutierrez-Franco, Edgar
Davahli, Mohammad Reza
Wilamowski, Maciej
Ahram, Tareq
Al-Juaid, Awad
Zurada, Jozef
description The performance of text classification methods has improved greatly over the last decade for text instances of less than 512 tokens. This limit has been adopted by most state-of-the-research transformer models due to the high computational cost of analyzing longer text instances. To mitigate this problem and to improve classification for longer texts, researchers have sought to resolve the underlying causes of the computational cost and have proposed optimizations for the attention mechanism, which is the key element of every transformer model. In our study, we are not pursuing the ultimate goal of long text classification, i.e., the ability to analyze entire text instances at one time while preserving high performance at a reasonable computational cost. Instead, we propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit in a manner that improves performance over naive and semi-naive approaches while preserving low computational costs. Text Guide benefits from the concept of feature importance, a notion from the explainable artificial intelligence domain. We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification, such as Longformer. Moreover, we discovered that parameter optimization is the key to Text Guide performance and must be conducted before the method is deployed. Future experiments may reveal additional benefits provided by this new method.
doi_str_mv 10.1109/ACCESS.2021.3099758
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1109_ACCESS_2021_3099758</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9494560</ieee_id><doaj_id>oai_doaj_org_article_95251632594b470da62df0c74adbbe9f</doaj_id><sourcerecordid>2557978636</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-2c7a94235e2e9ea5a7c473ece616166eb282dec6308ff2a7b9a7c7ab3faa5f63</originalsourceid><addsrcrecordid>eNpNUcFuGyEUXFWt1CjJF-SC1LNdFhZYektWSWrJURTZd_SWfSRYG5MCG9V_X-yNovIOwDAz74mpqquaLuua6p_XXXe72SwZZfWSU62VaL9UZ6yWesEFl1__O3-vLlPa0bLaAgl1VuUt_s3kfvID_iKr17cY3v3-meQXJE8TjD4fSHBkHQp2YnYjpOSdt5B92JP-QGB-2OCI9oQ9YH4JA7mBhAMp9zuEPEU8uoeYYW_xovrmYEx4-bGfV9u72233e7F-vF911-uFbWibF8wq0A3jAhlqBAHKNoqjRVmXktizlg1oJaetcwxUrwtDQc8dgHCSn1er2XYIsDNv0b9CPJgA3pyAEJ8NxOztiEYLJmrJmdBN3yg6gGSDo1Y1MPQ9ale8fsxe5Yf-TJiy2YUp7sv0hgmhtGolP3bkM8vGkFJE99m1puYYlpnDMsewzEdYRXU1qzwifip0oxshKf8H4FOQ4Q</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2557978636</pqid></control><display><type>article</type><title>Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Fiok, Krzysztof ; Karwowski, Waldemar ; Gutierrez-Franco, Edgar ; Davahli, Mohammad Reza ; Wilamowski, Maciej ; Ahram, Tareq ; Al-Juaid, Awad ; Zurada, Jozef</creator><creatorcontrib>Fiok, Krzysztof ; Karwowski, Waldemar ; Gutierrez-Franco, Edgar ; Davahli, Mohammad Reza ; Wilamowski, Maciej ; Ahram, Tareq ; Al-Juaid, Awad ; Zurada, Jozef</creatorcontrib><description>The performance of text classification methods has improved greatly over the last decade for text instances of less than 512 tokens. 
This limit has been adopted by most state-of-the-research transformer models due to the high computational cost of analyzing longer text instances. To mitigate this problem and to improve classification for longer texts, researchers have sought to resolve the underlying causes of the computational cost and have proposed optimizations for the attention mechanism, which is the key element of every transformer model. In our study, we are not pursuing the ultimate goal of long text classification, i.e., the ability to analyze entire text instances at one time while preserving high performance at a reasonable computational cost. Instead, we propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit in a manner that improves performance over naive and semi-naive approaches while preserving low computational costs. Text Guide benefits from the concept of feature importance, a notion from the explainable artificial intelligence domain. We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification, such as Longformer. Moreover, we discovered that parameter optimization is the key to Text Guide performance and must be conducted before the method is deployed. 
Future experiments may reveal additional benefits provided by this new method.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3099758</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Analytical models ; Classification ; Computational efficiency ; Computational modeling ; Computing costs ; Cost benefit analysis ; Explainable artificial intelligence ; feature importance ; language model ; long text ; method ; Optimization ; Performance enhancement ; Task analysis ; Text categorization ; Transformers</subject><ispartof>IEEE access, 2021, Vol.9, p.105439-105450</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-2c7a94235e2e9ea5a7c473ece616166eb282dec6308ff2a7b9a7c7ab3faa5f63</citedby><cites>FETCH-LOGICAL-c408t-2c7a94235e2e9ea5a7c473ece616166eb282dec6308ff2a7b9a7c7ab3faa5f63</cites><orcidid>0000-0001-5711-1498 ; 0000-0002-8128-5356 ; 0000-0002-3793-4814 ; 0000-0002-9134-3441 ; 0000-0003-4021-1235</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9494560$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,860,2095,4009,27612,27902,27903,27904,54912</link.rule.ids></links><search><creatorcontrib>Fiok, Krzysztof</creatorcontrib><creatorcontrib>Karwowski, Waldemar</creatorcontrib><creatorcontrib>Gutierrez-Franco, Edgar</creatorcontrib><creatorcontrib>Davahli, Mohammad Reza</creatorcontrib><creatorcontrib>Wilamowski, Maciej</creatorcontrib><creatorcontrib>Ahram, Tareq</creatorcontrib><creatorcontrib>Al-Juaid, 
Awad</creatorcontrib><creatorcontrib>Zurada, Jozef</creatorcontrib><title>Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance</title><title>IEEE access</title><addtitle>Access</addtitle><description>The performance of text classification methods has improved greatly over the last decade for text instances of less than 512 tokens. This limit has been adopted by most state-of-the-research transformer models due to the high computational cost of analyzing longer text instances. To mitigate this problem and to improve classification for longer texts, researchers have sought to resolve the underlying causes of the computational cost and have proposed optimizations for the attention mechanism, which is the key element of every transformer model. In our study, we are not pursuing the ultimate goal of long text classification, i.e., the ability to analyze entire text instances at one time while preserving high performance at a reasonable computational cost. Instead, we propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit in a manner that improves performance over naive and semi-naive approaches while preserving low computational costs. Text Guide benefits from the concept of feature importance, a notion from the explainable artificial intelligence domain. We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification, such as Longformer. Moreover, we discovered that parameter optimization is the key to Text Guide performance and must be conducted before the method is deployed. 
Future experiments may reveal additional benefits provided by this new method.</description><subject>Analytical models</subject><subject>Classification</subject><subject>Computational efficiency</subject><subject>Computational modeling</subject><subject>Computing costs</subject><subject>Cost benefit analysis</subject><subject>Explainable artificial intelligence</subject><subject>feature importance</subject><subject>language model</subject><subject>long text</subject><subject>method</subject><subject>Optimization</subject><subject>Performance enhancement</subject><subject>Task analysis</subject><subject>Text categorization</subject><subject>Transformers</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUcFuGyEUXFWt1CjJF-SC1LNdFhZYektWSWrJURTZd_SWfSRYG5MCG9V_X-yNovIOwDAz74mpqquaLuua6p_XXXe72SwZZfWSU62VaL9UZ6yWesEFl1__O3-vLlPa0bLaAgl1VuUt_s3kfvID_iKr17cY3v3-meQXJE8TjD4fSHBkHQp2YnYjpOSdt5B92JP-QGB-2OCI9oQ9YH4JA7mBhAMp9zuEPEU8uoeYYW_xovrmYEx4-bGfV9u72233e7F-vF911-uFbWibF8wq0A3jAhlqBAHKNoqjRVmXktizlg1oJaetcwxUrwtDQc8dgHCSn1er2XYIsDNv0b9CPJgA3pyAEJ8NxOztiEYLJmrJmdBN3yg6gGSDo1Y1MPQ9ale8fsxe5Yf-TJiy2YUp7sv0hgmhtGolP3bkM8vGkFJE99m1puYYlpnDMsewzEdYRXU1qzwifip0oxshKf8H4FOQ4Q</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Fiok, Krzysztof</creator><creator>Karwowski, Waldemar</creator><creator>Gutierrez-Franco, Edgar</creator><creator>Davahli, Mohammad Reza</creator><creator>Wilamowski, Maciej</creator><creator>Ahram, Tareq</creator><creator>Al-Juaid, Awad</creator><creator>Zurada, Jozef</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-5711-1498</orcidid><orcidid>https://orcid.org/0000-0002-8128-5356</orcidid><orcidid>https://orcid.org/0000-0002-3793-4814</orcidid><orcidid>https://orcid.org/0000-0002-9134-3441</orcidid><orcidid>https://orcid.org/0000-0003-4021-1235</orcidid></search><sort><creationdate>2021</creationdate><title>Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance</title><author>Fiok, Krzysztof ; Karwowski, Waldemar ; Gutierrez-Franco, Edgar ; Davahli, Mohammad Reza ; Wilamowski, Maciej ; Ahram, Tareq ; Al-Juaid, Awad ; Zurada, Jozef</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-2c7a94235e2e9ea5a7c473ece616166eb282dec6308ff2a7b9a7c7ab3faa5f63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Analytical models</topic><topic>Classification</topic><topic>Computational efficiency</topic><topic>Computational modeling</topic><topic>Computing costs</topic><topic>Cost benefit analysis</topic><topic>Explainable artificial intelligence</topic><topic>feature importance</topic><topic>language model</topic><topic>long text</topic><topic>method</topic><topic>Optimization</topic><topic>Performance enhancement</topic><topic>Task analysis</topic><topic>Text categorization</topic><topic>Transformers</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Fiok, Krzysztof</creatorcontrib><creatorcontrib>Karwowski, Waldemar</creatorcontrib><creatorcontrib>Gutierrez-Franco, Edgar</creatorcontrib><creatorcontrib>Davahli, 
Mohammad Reza</creatorcontrib><creatorcontrib>Wilamowski, Maciej</creatorcontrib><creatorcontrib>Ahram, Tareq</creatorcontrib><creatorcontrib>Al-Juaid, Awad</creatorcontrib><creatorcontrib>Zurada, Jozef</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Fiok, Krzysztof</au><au>Karwowski, Waldemar</au><au>Gutierrez-Franco, Edgar</au><au>Davahli, Mohammad Reza</au><au>Wilamowski, Maciej</au><au>Ahram, Tareq</au><au>Al-Juaid, Awad</au><au>Zurada, Jozef</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance</atitle><jtitle>IEEE 
access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>105439</spage><epage>105450</epage><pages>105439-105450</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>The performance of text classification methods has improved greatly over the last decade for text instances of less than 512 tokens. This limit has been adopted by most state-of-the-research transformer models due to the high computational cost of analyzing longer text instances. To mitigate this problem and to improve classification for longer texts, researchers have sought to resolve the underlying causes of the computational cost and have proposed optimizations for the attention mechanism, which is the key element of every transformer model. In our study, we are not pursuing the ultimate goal of long text classification, i.e., the ability to analyze entire text instances at one time while preserving high performance at a reasonable computational cost. Instead, we propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit in a manner that improves performance over naive and semi-naive approaches while preserving low computational costs. Text Guide benefits from the concept of feature importance, a notion from the explainable artificial intelligence domain. We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification, such as Longformer. Moreover, we discovered that parameter optimization is the key to Text Guide performance and must be conducted before the method is deployed. 
Future experiments may reveal additional benefits provided by this new method.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3099758</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-5711-1498</orcidid><orcidid>https://orcid.org/0000-0002-8128-5356</orcidid><orcidid>https://orcid.org/0000-0002-3793-4814</orcidid><orcidid>https://orcid.org/0000-0002-9134-3441</orcidid><orcidid>https://orcid.org/0000-0003-4021-1235</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2021, Vol.9, p.105439-105450
issn 2169-3536
2169-3536
language eng
recordid cdi_crossref_primary_10_1109_ACCESS_2021_3099758
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects Analytical models
Classification
Computational efficiency
Computational modeling
Computing costs
Cost benefit analysis
Explainable artificial intelligence
feature importance
language model
long text
method
Optimization
Performance enhancement
Task analysis
Text categorization
Transformers
title Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T11%3A09%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Text%20Guide:%20Improving%20the%20Quality%20of%20Long%20Text%20Classification%20by%20a%20Text%20Selection%20Method%20Based%20on%20Feature%20Importance&rft.jtitle=IEEE%20access&rft.au=Fiok,%20Krzysztof&rft.date=2021&rft.volume=9&rft.spage=105439&rft.epage=105450&rft.pages=105439-105450&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3099758&rft_dat=%3Cproquest_cross%3E2557978636%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2557978636&rft_id=info:pmid/&rft_ieee_id=9494560&rft_doaj_id=oai_doaj_org_article_95251632594b470da62df0c74adbbe9f&rfr_iscdi=true