The effect of Chi-Square Feature Selection on Question Classification using Multinomial Naïve Bayes

Question classification is an essential task for a question answering system: it determines the expected answer type (EAT) of the question given to the system. The Multinomial Naïve Bayes algorithm is one of the learning algorithms that can be used to classify questions. At the classif...

Bibliographic details
Published in: Sinkron 2022-10, Vol.7 (4), p.2430-2436
Main authors: Yusliani, Novi; Aruda, Syechky Al Qodrin; Marieska, Mastura Diana; Saputra, Danny Mathew; Abdiansah, Abdiansah
Format: Article
Language: English
Online access: Full text
container_end_page 2436
container_issue 4
container_start_page 2430
container_title Sinkron
container_volume 7
creator Yusliani, Novi
Aruda, Syechky Al Qodrin
Marieska, Mastura Diana
Saputra, Danny Mathew
Abdiansah, Abdiansah
description Question classification is an essential task for a question answering system: it determines the expected answer type (EAT) of the question given to the system. The Multinomial Naïve Bayes algorithm is one of the learning algorithms that can be used to classify questions. At the classification stage, this algorithm uses a set of features in the knowledge model. A large number of features can lead to the curse of dimensionality when the feature space is high-dimensional. Feature selection can reduce the feature dimension and may improve system performance. The Chi-Square algorithm can be used to select features that describe each category. In this research, Multinomial Naïve Bayes is used to classify question sentences and the Chi-Square algorithm is used for feature selection. The dataset is a set of Indonesian question sentences consisting of 519 labeled factoid, 491 labeled non-factoid, and 185 labeled other. The test results show that feature selection increased accuracy by 0.1: with feature selection, accuracy is 0.87 using 248 features; without feature selection, accuracy is 0.77 using 1374 features.
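The pipeline summarized in the description (Chi-Square scoring of terms, keeping the top-scoring features, then Multinomial Naïve Bayes classification) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record does not give their code, so the function names, the toy English questions, the use of the maximum score over categories, and Laplace smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def chi_square(docs, labels, term, category):
    """Chi-square statistic for one term/category pair (2x2 document counts)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_cat = term in tokens, label == category
        if has_term and in_cat:
            a += 1          # term present, document in category
        elif has_term:
            b += 1          # term present, document in another category
        elif in_cat:
            c += 1          # term absent, document in category
        else:
            d += 1          # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def select_features(docs, labels, k):
    """Keep the k terms with the highest score over any category (assumed rule)."""
    vocab = sorted({t for tokens in docs for t in tokens})
    cats = sorted(set(labels))
    score = {t: max(chi_square(docs, labels, t, c) for c in cats) for t in vocab}
    return set(sorted(vocab, key=lambda t: (-score[t], t))[:k])

def train_mnb(docs, labels, features):
    """Count class priors and per-class term frequencies over selected features."""
    priors, counts = Counter(labels), defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        counts[label].update(t for t in tokens if t in features)
    return priors, counts, features

def predict(model, tokens):
    """Return the class with the highest log-posterior (Laplace smoothing assumed)."""
    priors, counts, features = model
    n_docs = sum(priors.values())
    best, best_lp = None, -math.inf
    for cat in sorted(priors):
        total = sum(counts[cat].values())
        lp = math.log(priors[cat] / n_docs)
        for t in tokens:
            if t in features:
                lp += math.log((counts[cat][t] + 1) / (total + len(features)))
        if lp > best_lp:
            best, best_lp = cat, lp
    return best

# Hypothetical toy data; the study itself used Indonesian question sentences.
docs = [q.split() for q in [
    "who is the president", "who wrote the book", "when was the city founded",
    "why is the sky blue", "why does the river flood", "how does the engine work",
]]
labels = ["factoid"] * 3 + ["non-factoid"] * 3

features = select_features(docs, labels, k=3)
model = train_mnb(docs, labels, features)
print(sorted(features))                              # ['does', 'who', 'why']
print(predict(model, "who built the bridge".split()))  # factoid
```

In this toy set, "the" occurs in every question and scores 0, so it is dropped, while "who", "why", and "does" each occur in only one category and score highest. The counts here are purely illustrative and say nothing about the accuracy figures reported in the description.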
doi_str_mv 10.33395/sinkron.v7i4.11788
format Article
fulltext fulltext
identifier ISSN: 2541-044X
ispartof Sinkron, 2022-10, Vol.7 (4), p.2430-2436
issn 2541-044X
2541-2019
language eng
recordid cdi_crossref_primary_10_33395_sinkron_v7i4_11788
source Alma/SFX Local Collection
title The effect of Chi-Square Feature Selection on Question Classification using Multinomial Naïve Bayes
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T21%3A36%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20effect%20of%20Chi-Square%20Feature%20Selection%20on%20Question%20Classification%20using%20Multinomial%20Na%C3%AFve%20Bayes&rft.jtitle=Sinkron&rft.au=Yusliani,%20Novi&rft.date=2022-10-09&rft.volume=7&rft.issue=4&rft.spage=2430&rft.epage=2436&rft.pages=2430-2436&rft.issn=2541-044X&rft.eissn=2541-2019&rft_id=info:doi/10.33395/sinkron.v7i4.11788&rft_dat=%3Ccrossref%3E10_33395_sinkron_v7i4_11788%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true