Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance

The performance of text classification methods has improved greatly over the last decade for text instances of less than 512 tokens. This limit has been adopted by most state-of-the-research transformer models due to the high computational cost of analyzing longer text instances. To mitigate this pr...

Detailed description

Bibliographic details
Published in:	IEEE access 2021, Vol.9, p.105439-105450
Main authors:	Fiok, Krzysztof; Karwowski, Waldemar; Gutierrez-Franco, Edgar; Davahli, Mohammad Reza; Wilamowski, Maciej; Ahram, Tareq; Al-Juaid, Awad; Zurada, Jozef
Format:	Article
Language:	eng
Subjects:	Analytical models; Classification; Computational efficiency; Computational modeling; Computing costs; Cost benefit analysis; Explainable artificial intelligence; feature importance; language model; long text; method; Optimization; Performance enhancement; Task analysis; Text categorization; Transformers
Online access:	Full text

container_end_page	105450
container_issue
container_start_page	105439
container_title	IEEE access
container_volume	9
creator	Fiok, Krzysztof; Karwowski, Waldemar; Gutierrez-Franco, Edgar; Davahli, Mohammad Reza; Wilamowski, Maciej; Ahram, Tareq; Al-Juaid, Awad; Zurada, Jozef
description	The performance of text classification methods has improved greatly over the last decade for text instances of less than 512 tokens. This limit has been adopted by most state-of-the-research transformer models due to the high computational cost of analyzing longer text instances. To mitigate this problem and to improve classification for longer texts, researchers have sought to resolve the underlying causes of the computational cost and have proposed optimizations for the attention mechanism, which is the key element of every transformer model. In our study, we are not pursuing the ultimate goal of long text classification, i.e., the ability to analyze entire text instances at one time while preserving high performance at a reasonable computational cost. Instead, we propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit in a manner that improves performance over naive and semi-naive approaches while preserving low computational costs. Text Guide benefits from the concept of feature importance, a notion from the explainable artificial intelligence domain. We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification, such as Longformer. Moreover, we discovered that parameter optimization is the key to Text Guide performance and must be conducted before the method is deployed. Future experiments may reveal additional benefits provided by this new method.
doi_str_mv	10.1109/ACCESS.2021.3099758
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1109_ACCESS_2021_3099758</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9494560</ieee_id><doaj_id>oai_doaj_org_article_95251632594b470da62df0c74adbbe9f</doaj_id><sourcerecordid>2557978636</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-2c7a94235e2e9ea5a7c473ece616166eb282dec6308ff2a7b9a7c7ab3faa5f63</originalsourceid><addsrcrecordid>eNpNUcFuGyEUXFWt1CjJF-SC1LNdFhZYektWSWrJURTZd_SWfSRYG5MCG9V_X-yNovIOwDAz74mpqquaLuua6p_XXXe72SwZZfWSU62VaL9UZ6yWesEFl1__O3-vLlPa0bLaAgl1VuUt_s3kfvID_iKr17cY3v3-meQXJE8TjD4fSHBkHQp2YnYjpOSdt5B92JP-QGB-2OCI9oQ9YH4JA7mBhAMp9zuEPEU8uoeYYW_xovrmYEx4-bGfV9u72233e7F-vF911-uFbWibF8wq0A3jAhlqBAHKNoqjRVmXktizlg1oJaetcwxUrwtDQc8dgHCSn1er2XYIsDNv0b9CPJgA3pyAEJ8NxOztiEYLJmrJmdBN3yg6gGSDo1Y1MPQ9ale8fsxe5Yf-TJiy2YUp7sv0hgmhtGolP3bkM8vGkFJE99m1puYYlpnDMsewzEdYRXU1qzwifip0oxshKf8H4FOQ4Q</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2557978636</pqid></control><display><type>article</type><title>Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Fiok, Krzysztof ; Karwowski, Waldemar ; Gutierrez-Franco, Edgar ; Davahli, Mohammad Reza ; Wilamowski, Maciej ; Ahram, Tareq ; Al-Juaid, Awad ; Zurada, Jozef</creator><creatorcontrib>Fiok, Krzysztof ; Karwowski, Waldemar ; Gutierrez-Franco, Edgar ; Davahli, Mohammad Reza ; Wilamowski, Maciej ; Ahram, Tareq ; Al-Juaid, Awad ; Zurada, Jozef</creatorcontrib><description>The performance of text classification methods has improved greatly over the last decade for text instances of less than 512 tokens. 
This limit has been adopted by most state-of-the-research transformer models due to the high computational cost of analyzing longer text instances. To mitigate this problem and to improve classification for longer texts, researchers have sought to resolve the underlying causes of the computational cost and have proposed optimizations for the attention mechanism, which is the key element of every transformer model. In our study, we are not pursuing the ultimate goal of long text classification, i.e., the ability to analyze entire text instances at one time while preserving high performance at a reasonable computational cost. Instead, we propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit in a manner that improves performance over naive and semi-naive approaches while preserving low computational costs. Text Guide benefits from the concept of feature importance, a notion from the explainable artificial intelligence domain. We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification, such as Longformer. Moreover, we discovered that parameter optimization is the key to Text Guide performance and must be conducted before the method is deployed. 
Future experiments may reveal additional benefits provided by this new method.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3099758</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Analytical models ; Classification ; Computational efficiency ; Computational modeling ; Computing costs ; Cost benefit analysis ; Explainable artificial intelligence ; feature importance ; language model ; long text ; method ; Optimization ; Performance enhancement ; Task analysis ; Text categorization ; Transformers</subject><ispartof>IEEE access, 2021, Vol.9, p.105439-105450</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-2c7a94235e2e9ea5a7c473ece616166eb282dec6308ff2a7b9a7c7ab3faa5f63</citedby><cites>FETCH-LOGICAL-c408t-2c7a94235e2e9ea5a7c473ece616166eb282dec6308ff2a7b9a7c7ab3faa5f63</cites><orcidid>0000-0001-5711-1498 ; 0000-0002-8128-5356 ; 0000-0002-3793-4814 ; 0000-0002-9134-3441 ; 0000-0003-4021-1235</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9494560$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,860,2095,4009,27612,27902,27903,27904,54912</link.rule.ids></links><search><creatorcontrib>Fiok, Krzysztof</creatorcontrib><creatorcontrib>Karwowski, Waldemar</creatorcontrib><creatorcontrib>Gutierrez-Franco, Edgar</creatorcontrib><creatorcontrib>Davahli, Mohammad Reza</creatorcontrib><creatorcontrib>Wilamowski, Maciej</creatorcontrib><creatorcontrib>Ahram, Tareq</creatorcontrib><creatorcontrib>Al-Juaid, 
Awad</creatorcontrib><creatorcontrib>Zurada, Jozef</creatorcontrib><title>Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance</title><title>IEEE access</title><addtitle>Access</addtitle><description>The performance of text classification methods has improved greatly over the last decade for text instances of less than 512 tokens. This limit has been adopted by most state-of-the-research transformer models due to the high computational cost of analyzing longer text instances. To mitigate this problem and to improve classification for longer texts, researchers have sought to resolve the underlying causes of the computational cost and have proposed optimizations for the attention mechanism, which is the key element of every transformer model. In our study, we are not pursuing the ultimate goal of long text classification, i.e., the ability to analyze entire text instances at one time while preserving high performance at a reasonable computational cost. Instead, we propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit in a manner that improves performance over naive and semi-naive approaches while preserving low computational costs. Text Guide benefits from the concept of feature importance, a notion from the explainable artificial intelligence domain. We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification, such as Longformer. Moreover, we discovered that parameter optimization is the key to Text Guide performance and must be conducted before the method is deployed. 
Future experiments may reveal additional benefits provided by this new method.</description><subject>Analytical models</subject><subject>Classification</subject><subject>Computational efficiency</subject><subject>Computational modeling</subject><subject>Computing costs</subject><subject>Cost benefit analysis</subject><subject>Explainable artificial intelligence</subject><subject>feature importance</subject><subject>language model</subject><subject>long text</subject><subject>method</subject><subject>Optimization</subject><subject>Performance enhancement</subject><subject>Task analysis</subject><subject>Text categorization</subject><subject>Transformers</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUcFuGyEUXFWt1CjJF-SC1LNdFhZYektWSWrJURTZd_SWfSRYG5MCG9V_X-yNovIOwDAz74mpqquaLuua6p_XXXe72SwZZfWSU62VaL9UZ6yWesEFl1__O3-vLlPa0bLaAgl1VuUt_s3kfvID_iKr17cY3v3-meQXJE8TjD4fSHBkHQp2YnYjpOSdt5B92JP-QGB-2OCI9oQ9YH4JA7mBhAMp9zuEPEU8uoeYYW_xovrmYEx4-bGfV9u72233e7F-vF911-uFbWibF8wq0A3jAhlqBAHKNoqjRVmXktizlg1oJaetcwxUrwtDQc8dgHCSn1er2XYIsDNv0b9CPJgA3pyAEJ8NxOztiEYLJmrJmdBN3yg6gGSDo1Y1MPQ9ale8fsxe5Yf-TJiy2YUp7sv0hgmhtGolP3bkM8vGkFJE99m1puYYlpnDMsewzEdYRXU1qzwifip0oxshKf8H4FOQ4Q</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Fiok, Krzysztof</creator><creator>Karwowski, Waldemar</creator><creator>Gutierrez-Franco, Edgar</creator><creator>Davahli, Mohammad Reza</creator><creator>Wilamowski, Maciej</creator><creator>Ahram, Tareq</creator><creator>Al-Juaid, Awad</creator><creator>Zurada, Jozef</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-5711-1498</orcidid><orcidid>https://orcid.org/0000-0002-8128-5356</orcidid><orcidid>https://orcid.org/0000-0002-3793-4814</orcidid><orcidid>https://orcid.org/0000-0002-9134-3441</orcidid><orcidid>https://orcid.org/0000-0003-4021-1235</orcidid></search><sort><creationdate>2021</creationdate><title>Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance</title><author>Fiok, Krzysztof ; Karwowski, Waldemar ; Gutierrez-Franco, Edgar ; Davahli, Mohammad Reza ; Wilamowski, Maciej ; Ahram, Tareq ; Al-Juaid, Awad ; Zurada, Jozef</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-2c7a94235e2e9ea5a7c473ece616166eb282dec6308ff2a7b9a7c7ab3faa5f63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Analytical models</topic><topic>Classification</topic><topic>Computational efficiency</topic><topic>Computational modeling</topic><topic>Computing costs</topic><topic>Cost benefit analysis</topic><topic>Explainable artificial intelligence</topic><topic>feature importance</topic><topic>language model</topic><topic>long text</topic><topic>method</topic><topic>Optimization</topic><topic>Performance enhancement</topic><topic>Task analysis</topic><topic>Text categorization</topic><topic>Transformers</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Fiok, Krzysztof</creatorcontrib><creatorcontrib>Karwowski, Waldemar</creatorcontrib><creatorcontrib>Gutierrez-Franco, Edgar</creatorcontrib><creatorcontrib>Davahli, 
Mohammad Reza</creatorcontrib><creatorcontrib>Wilamowski, Maciej</creatorcontrib><creatorcontrib>Ahram, Tareq</creatorcontrib><creatorcontrib>Al-Juaid, Awad</creatorcontrib><creatorcontrib>Zurada, Jozef</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Fiok, Krzysztof</au><au>Karwowski, Waldemar</au><au>Gutierrez-Franco, Edgar</au><au>Davahli, Mohammad Reza</au><au>Wilamowski, Maciej</au><au>Ahram, Tareq</au><au>Al-Juaid, Awad</au><au>Zurada, Jozef</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance</atitle><jtitle>IEEE 
access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>105439</spage><epage>105450</epage><pages>105439-105450</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>The performance of text classification methods has improved greatly over the last decade for text instances of less than 512 tokens. This limit has been adopted by most state-of-the-research transformer models due to the high computational cost of analyzing longer text instances. To mitigate this problem and to improve classification for longer texts, researchers have sought to resolve the underlying causes of the computational cost and have proposed optimizations for the attention mechanism, which is the key element of every transformer model. In our study, we are not pursuing the ultimate goal of long text classification, i.e., the ability to analyze entire text instances at one time while preserving high performance at a reasonable computational cost. Instead, we propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit in a manner that improves performance over naive and semi-naive approaches while preserving low computational costs. Text Guide benefits from the concept of feature importance, a notion from the explainable artificial intelligence domain. We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification, such as Longformer. Moreover, we discovered that parameter optimization is the key to Text Guide performance and must be conducted before the method is deployed. 
Future experiments may reveal additional benefits provided by this new method.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3099758</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-5711-1498</orcidid><orcidid>https://orcid.org/0000-0002-8128-5356</orcidid><orcidid>https://orcid.org/0000-0002-3793-4814</orcidid><orcidid>https://orcid.org/0000-0002-9134-3441</orcidid><orcidid>https://orcid.org/0000-0003-4021-1235</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2169-3536
ispartof	IEEE access, 2021, Vol.9, p.105439-105450
issn	2169-3536 2169-3536
language	eng
recordid	cdi_crossref_primary_10_1109_ACCESS_2021_3099758
source	IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects	Analytical models; Classification; Computational efficiency; Computational modeling; Computing costs; Cost benefit analysis; Explainable artificial intelligence; feature importance; language model; long text; method; Optimization; Performance enhancement; Task analysis; Text categorization; Transformers
title	Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T11%3A09%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Text%20Guide:%20Improving%20the%20Quality%20of%20Long%20Text%20Classification%20by%20a%20Text%20Selection%20Method%20Based%20on%20Feature%20Importance&rft.jtitle=IEEE%20access&rft.au=Fiok,%20Krzysztof&rft.date=2021&rft.volume=9&rft.spage=105439&rft.epage=105450&rft.pages=105439-105450&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3099758&rft_dat=%3Cproquest_cross%3E2557978636%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2557978636&rft_id=info:pmid/&rft_ieee_id=9494560&rft_doaj_id=oai_doaj_org_article_95251632594b470da62df0c74adbbe9f&rfr_iscdi=true
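
Illustrative note on the description field: the abstract characterizes Text Guide only in outline, as a truncation method that reduces a text to a predefined token limit and uses feature importance to decide what to keep. The record does not give the algorithm itself, so the Python sketch below is not the authors' implementation. It shows one possible reading under stated assumptions: the function name `select_tokens`, the parameters `head_fraction` and `window`, and the `importance` dictionary (hypothetically obtained from some simpler, explainable classifier) are all invented for illustration.

```python
from typing import Dict, List


def select_tokens(tokens: List[str], importance: Dict[str, float],
                  limit: int = 512, head_fraction: float = 0.25,
                  window: int = 2) -> List[str]:
    """Reduce `tokens` to at most `limit` tokens, preferring important regions.

    Hypothetical sketch: `importance` maps lowercase tokens to scores and is
    assumed to come from a separate feature-importance analysis.
    """
    if len(tokens) <= limit:
        return list(tokens)

    # Always keep the opening of the text (naive truncation would stop here).
    n_head = int(limit * head_fraction)
    keep = set(range(n_head))

    # Rank the remaining positions by the importance of the token found there.
    candidates = sorted(
        (i for i in range(n_head, len(tokens)) if tokens[i].lower() in importance),
        key=lambda i: importance[tokens[i].lower()],
        reverse=True,
    )

    # Add a small window around each important token until the budget is used.
    for i in candidates:
        for j in range(max(n_head, i - window), min(len(tokens), i + window + 1)):
            if len(keep) >= limit:
                break
            keep.add(j)
        if len(keep) >= limit:
            break

    # Fill any leftover budget with the tokens that directly follow the head.
    j = n_head
    while len(keep) < limit and j < len(tokens):
        keep.add(j)
        j += 1

    # Emit the kept tokens in their original order.
    return [tokens[i] for i in sorted(keep)]
```

With a 20-token text, a limit of 8, and a single important token near the end, this sketch keeps the first tokens plus a short window around the important one, whereas plain head truncation would drop that region entirely. As the abstract stresses, parameters of this kind would need to be tuned before use.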