Cross-domain sentiment analysis model on Indonesian YouTube comment

Cross-domain sentiment analysis (CDSA) for the Indonesian language using tree-based ensemble machine learning is a topic of interest. CDSA can support the labeling of cross-domain sentiment and reduce dependence on experts; however, the mechanism in the opinion unstructur...

Bibliographic details
Published in: International journal of advances in intelligent informatics 2021-03, Vol.7 (1), p.12-25
Main authors: Aribowo, Agus Sasmito; Basiron, Halizah; Yusof, Noor Fazilla Abd; Khomsah, Siti
Format: Article
Language: eng
Online access: Full text
container_end_page 25
container_issue 1
container_start_page 12
container_title International journal of advances in intelligent informatics
container_volume 7
creator Aribowo, Agus Sasmito
Basiron, Halizah
Yusof, Noor Fazilla Abd
Khomsah, Siti
description Cross-domain sentiment analysis (CDSA) for the Indonesian language using tree-based ensemble machine learning is a topic of interest. CDSA can support the labeling of cross-domain sentiment and reduce dependence on experts; however, the mechanism for opinions made unstructured by stop words, language expressions, and Indonesian slang words has not yet been identified. This study aimed to obtain the best CDSA model for opinions in the Indonesian language, which are commonly full of stop words and slang words in Indonesian dialects. It examined the benefits of stop word cleaning and slang word conversion for CDSA in Indonesian and sought the machine learning method best suited to this model. The study started by crawling five datasets of YouTube comments from five different domains. The datasets were copied into two groups: one without stop word cleaning and slang word conversion, and one with stop word cleaning and slang word conversion applied. A CDSA model was built for each dataset group and tested using two tree-based ensemble machine learning classifiers, Random Forest (RF) and Extra Tree (ET), and, for comparison, three non-ensemble classifiers: Naive Bayes (NB), SVM, and Decision Tree (DT). The results suggest that CDSA accuracy in the Indonesian language increased when stop words were removed and slang words were converted. The best classifier was built with tree-based ensemble machine learning, particularly ET, which achieved the highest accuracy in this study, 91.19%. This model is expected to serve as an alternative CDSA technique for the Indonesian language.
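The two dataset groups described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop word list, the slang dictionary, the function names (`tokenize`, `clean`, `build_groups`), and the choice to convert slang before removing stop words are all assumptions made here for illustration, since the record does not specify them.

```python
import re

# Illustrative entries only (assumption): the study's actual stop word list
# and slang dictionary are not given in this record.
STOP_WORDS = {"yang", "dan", "di", "ini", "itu"}
SLANG_MAP = {"gak": "tidak", "bgt": "banget", "yg": "yang", "udah": "sudah"}

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(comment):
    """Lowercase a comment and split it into alphabetic tokens."""
    return TOKEN_RE.findall(comment.lower())


def clean(comment, stop_words=STOP_WORDS, slang_map=SLANG_MAP):
    """Convert slang tokens to standard forms, then drop stop words.

    The order (slang first, stop words second) is an assumption; it lets a
    slang spelling of a stop word, such as "yg", be removed as well.
    """
    converted = [slang_map.get(tok, tok) for tok in tokenize(comment)]
    return " ".join(tok for tok in converted if tok not in stop_words)


def build_groups(comments):
    """Return two parallel copies of the data: untouched and cleaned."""
    raw_group = list(comments)
    cleaned_group = [clean(c) for c in comments]
    return raw_group, cleaned_group


# Hypothetical example comments, not taken from the study's datasets.
comments = ["Videonya bagus bgt!", "Gak suka yg ini"]
raw_group, cleaned_group = build_groups(comments)
print(cleaned_group)
```

Each group would then be vectorized and passed to the classifiers named in the abstract (RF, ET, NB, SVM, DT), so that accuracy can be compared with and without the cleaning step.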
doi_str_mv 10.26555/ijain.v7il.554
format Article
fullrecord <record>
<control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2604083038</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2604083038</sourcerecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2604083038</pqid></control>
<display><type>article</type><title>Cross-domain sentiment analysis model on Indonesian YouTube comment</title><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Aribowo, Agus Sasmito ; Basiron, Halizah ; Yusof, Noor Fazilla Abd ; Khomsah, Siti</creator><identifier>ISSN: 2442-6571</identifier><identifier>EISSN: 2442-6571</identifier><identifier>DOI: 10.26555/ijain.v7il.554</identifier><language>eng</language><publisher>Yogyakarta: Universitas Ahmad Dahlan</publisher><subject>Accuracy ; Annotations ; Classifiers ; Cleaning ; Conversion ; Data mining ; Datasets ; Decision trees ; Domains ; Language ; Machine learning ; Neural networks ; Sentiment analysis ; Slang ; Social networks ; Support vector machines</subject><ispartof>International journal of advances in intelligent informatics, 2021-03, Vol.7 (1), p.12-25</ispartof><rights>2021. This article is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa></display>
<addata><au>Aribowo, Agus Sasmito</au><au>Basiron, Halizah</au><au>Yusof, Noor Fazilla Abd</au><au>Khomsah, Siti</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Cross-domain sentiment analysis model on Indonesian YouTube comment</atitle><jtitle>International journal of advances in intelligent informatics</jtitle><date>2021-03-01</date><risdate>2021</risdate><volume>7</volume><issue>1</issue><spage>12</spage><epage>25</epage><pages>12-25</pages><issn>2442-6571</issn><eissn>2442-6571</eissn><cop>Yogyakarta</cop><pub>Universitas Ahmad Dahlan</pub><doi>10.26555/ijain.v7il.554</doi><oa>free_for_read</oa></addata>
</record>
fulltext fulltext
identifier ISSN: 2442-6571
ispartof International journal of advances in intelligent informatics, 2021-03, Vol.7 (1), p.12-25
issn 2442-6571
2442-6571
language eng
recordid cdi_proquest_journals_2604083038
source DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects Accuracy
Annotations
Classifiers
Cleaning
Conversion
Data mining
Datasets
Decision trees
Domains
Language
Machine learning
Neural networks
Sentiment analysis
Slang
Social networks
Support vector machines
title Cross-domain sentiment analysis model on Indonesian YouTube comment
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T01%3A01%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cross-domain%20sentiment%20analysis%20model%20on%20Indonesian%20YouTube%20comment&rft.jtitle=International%20journal%20of%20advances%20in%20intelligent%20informatics&rft.au=Aribowo,%20Agus%20Sasmito&rft.date=2021-03-01&rft.volume=7&rft.issue=1&rft.spage=12&rft.epage=25&rft.pages=12-25&rft.issn=2442-6571&rft.eissn=2442-6571&rft_id=info:doi/10.26555/ijain.v7il.554&rft_dat=%3Cproquest%3E2604083038%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2604083038&rft_id=info:pmid/&rfr_iscdi=true