An automated text categorization framework based on hyperparameter optimization

A great variety of text tasks, such as topic or spam identification, user profiling, and sentiment analysis, can be posed as supervised learning problems and tackled using a text classifier. A text classifier consists of several subprocesses; some of them are general enough to be applied to any super...

Bibliographic details
Published in:	Knowledge-based systems 2018-06, Vol.149, p.110-123
Main authors:	Tellez, Eric S.; Moctezuma, Daniela; Miranda-Jiménez, Sabino; Graff, Mario
Format:	Article
Language:	eng
Subjects:	Algorithms; Artificial intelligence; Authoring; Classification; Classifiers; Data mining; Datasets; Hyperparameter optimization; Machine learning; Natural language processing; Sentiment analysis; Text analysis; Text categorization; Text classification; Text modelling
Online access:	Full text

container_end_page	123
container_issue
container_start_page	110
container_title	Knowledge-based systems
container_volume	149
creator	Tellez, Eric S.; Moctezuma, Daniela; Miranda-Jiménez, Sabino; Graff, Mario
description	A great variety of text tasks, such as topic or spam identification, user profiling, and sentiment analysis, can be posed as supervised learning problems and tackled using a text classifier. A text classifier consists of several subprocesses; some are general enough to be applied to any supervised learning problem, whereas others are specifically designed to tackle a particular task using complex and computationally expensive processes such as lemmatization and syntactic analysis. Contrary to traditional approaches, we propose a minimalist and multi-purpose text classifier able to tackle tasks independently of domain and language. We named our approach μTC. Our approach is composed of several easy-to-implement text transformations, text representations, and a supervised learning algorithm. These pieces produce a competitive classifier in several challenging domains such as informally written text. We provide a detailed description of μTC along with an extensive experimental comparison with relevant state-of-the-art methods; specifically, μTC was compared on 30 different datasets. Regarding accuracy, μTC obtained the best performance on 20 datasets while achieving competitive results on the remaining ones. The compared datasets include several problems such as topic and polarity classification, spam detection, user profiling, and authorship attribution. Furthermore, our approach allows the technology to be used even without in-depth knowledge of machine learning and natural language processing.
doi_str_mv	10.1016/j.knosys.2018.03.003
format	Article
publisher	Amsterdam: Elsevier B.V
rights	2018 Elsevier B.V.; Copyright Elsevier Science Ltd. Jun 1, 2018
toplevel	peer_reviewed; online_resources
eissn	1872-7409
date	2018-06-01
tpages	14
els_id	S0950705118301217
collection	CrossRef; Computer and Information Systems Abstracts; Technology Research Database; Library & Information Sciences Abstracts (LISA); Library & Information Science Abstracts (LISA); ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional
fulltext	fulltext
identifier	ISSN: 0950-7051
ispartof	Knowledge-based systems, 2018-06, Vol.149, p.110-123
issn	0950-7051 1872-7409
language	eng
recordid	cdi_proquest_journals_2052718832
source	ScienceDirect Journals (5 years ago - present)
subjects	Algorithms; Artificial intelligence; Authoring; Classification; Classifiers; Data mining; Datasets; Hyperparameter optimization; Machine learning; Natural language processing; Sentiment analysis; Text analysis; Text categorization; Text classification; Text modelling
title	An automated text categorization framework based on hyperparameter optimization
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T09%3A35%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20automated%20text%20categorization%20framework%20based%20on%20hyperparameter%20optimization&rft.jtitle=Knowledge-based%20systems&rft.au=Tellez,%20Eric%20S.&rft.date=2018-06-01&rft.volume=149&rft.spage=110&rft.epage=123&rft.pages=110-123&rft.issn=0950-7051&rft.eissn=1872-7409&rft_id=info:doi/10.1016/j.knosys.2018.03.003&rft_dat=%3Cproquest_cross%3E2052718832%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2052718832&rft_id=info:pmid/&rft_els_id=S0950705118301217&rfr_iscdi=true