Automated Categorization of Research Papers with MONO Supervised Term Weighting in RECApp

Natural Language Processing, specifically text classification or text categorization, has become a trend in computer science. Commonly, text classification is used to categorize large amounts of data to allocate less time to retrieve information. Students, as well as research advisers and panelists,...

Bibliographic Details
Published in: International journal of advanced computer science & applications 2023, Vol.14 (2)
Main authors: Biol, Ivic Jan A., Depositario, Rhey Marc A., Noangay, Glenn Geo T., Melchor, Julian Michael F., Abalorio, Cristopher C., Bustillo, James Cloyd M.
Format: Article
Language: English
Online access: Full text
container_end_page
container_issue 2
container_start_page
container_title International journal of advanced computer science & applications
container_volume 14
creator Biol, Ivic Jan A.
Depositario, Rhey Marc A.
Noangay, Glenn Geo T.
Melchor, Julian Michael F.
Abalorio, Cristopher C.
Bustillo, James Cloyd M.
description Natural Language Processing, specifically text classification (also called text categorization), has become a trend in computer science. Text classification is commonly used to categorize large amounts of data so that less time is spent retrieving information. Students, research advisers, and panelists spend extra time and effort classifying research documents. To address this problem, the researchers applied two state-of-the-art supervised term weighting schemes, TF-MONO and SQRTF-MONO, to three machine learning algorithms: K-Nearest Neighbor, Linear Support Vector, and Naive Bayes classifiers. This produced six classifier models, which were compared to determine which performs best at classifying research documents, with Optical Character Recognition used for text extraction. The results showed that, among all trained classification models, SQRTF-MONO with Linear SVC outperformed the others, with an F1 score of 0.94 on both the abstract and the background-of-the-study datasets. In conclusion, the developed classification model and application prototype can help researchers, advisers, and panelists reduce the time spent classifying research documents.
doi_str_mv 10.14569/IJACSA.2023.0140240
format Article
publisher West Yorkshire: Science and Information (SAI) Organization Limited
rights 2023. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
creationdate 2023
fulltext fulltext
identifier ISSN: 2158-107X
ispartof International journal of advanced computer science & applications, 2023, Vol.14 (2)
issn 2158-107X
2156-5570
language eng
recordid cdi_proquest_journals_2791786423
source Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Algorithms
Classification
Classifiers
Documents
Information retrieval
Machine learning
Natural language processing
Optical character recognition
Text categorization
Weighting
title Automated Categorization of Research Papers with MONO Supervised Term Weighting in RECApp
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-16T04%3A09%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automated%20Categorization%20of%20Research%20Papers%20with%20MONO%20Supervised%20Term%20Weighting%20in%20RECApp&rft.jtitle=International%20journal%20of%20advanced%20computer%20science%20&%20applications&rft.au=Biol,%20Ivic%20Jan%20A.&rft.date=2023&rft.volume=14&rft.issue=2&rft.issn=2158-107X&rft.eissn=2156-5570&rft_id=info:doi/10.14569/IJACSA.2023.0140240&rft_dat=%3Cproquest_cross%3E2791786423%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2791786423&rft_id=info:pmid/&rfr_iscdi=true
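
The comparison described in the abstract (a raw or square-root term frequency multiplied by a supervised global weight, then fed to a classifier and scored with F1) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the global weight is a simplified class-concentration stand-in and not the published MONO formula, only a nearest-neighbour classifier is shown (the article also used Linear SVC and Naive Bayes), and the toy documents, labels, and function names are hypothetical.

```python
import math
from collections import Counter


def global_weights(docs, labels, alpha=7.0):
    """Stand-in supervised global weight per term (NOT the published MONO formula).

    A term scores high when it occurs in most documents of one class
    and is absent from most documents of the remaining classes.
    """
    classes = sorted(set(labels))
    size = Counter(labels)
    # df[c][t] = number of class-c documents that contain term t
    df = {c: Counter() for c in classes}
    for doc, y in zip(docs, labels):
        df[y].update(set(doc))
    weights = {}
    for t in {t for doc in docs for t in doc}:
        occ = {c: df[c][t] / size[c] for c in classes}
        best = max(classes, key=lambda c: occ[c])
        others = [1.0 - occ[c] for c in classes if c != best]
        absent = sum(others) / len(others) if others else 1.0
        weights[t] = 1.0 + alpha * occ[best] * absent
    return weights


def vectorize(doc, weights, local):
    """Sparse L2-normalised vector; `local` maps a raw term count to a local weight."""
    vec = {t: local(tf) * weights[t] for t, tf in Counter(doc).items() if t in weights}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def knn_predict(x, train_vecs, train_labels, k=1):
    """Majority vote among the k training vectors with highest cosine similarity."""
    sims = sorted(
        ((sum(v * tv.get(t, 0.0) for t, v in x.items()), y)
         for tv, y in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return Counter(y for _, y in sims[:k]).most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy corpus: tokenised documents with a research-area label.
train_docs = [["neural", "network", "training"], ["network", "classifier", "training"],
              ["soil", "crop", "yield"], ["crop", "irrigation", "yield"]]
train_labels = ["cs", "cs", "agri", "agri"]
test_docs = [["network", "training", "data"], ["crop", "soil", "data"]]
test_labels = ["cs", "agri"]

weights = global_weights(train_docs, train_labels)


def evaluate(local):
    """Macro F1 of the 1-NN classifier under the given local weighting."""
    train_vecs = [vectorize(d, weights, local) for d in train_docs]
    preds = [knn_predict(vectorize(d, weights, local), train_vecs, train_labels)
             for d in test_docs]
    return macro_f1(test_labels, preds)


for name, local in [("TF", float), ("SQRT-TF", math.sqrt)]:
    print(name, evaluate(local))
```

Passing the local weighting as a function keeps the two schemes (raw count and square root of the count) interchangeable, so the same loop could be extended to other classifiers to reproduce a six-model grid like the one the abstract reports.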