Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions

In recent years, the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age. Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting. Howev...

Bibliographic details
Published in:	Computers, materials & continua, 2021-01, Vol.68 (1), p.521-535
Main authors:	Thanh Vo, Minh; H. Vo, Anh; Nguyen, Trang; Sharma, Rohit; Le, Tuong
Format:	Article
Language:	eng
Subjects:	Datasets; Descriptions; Feature extraction; Fraud; Internet; Job descriptions; Job hunting; Machine learning; Modules; Oversampling; Performance measurement; Training
Online access:	Full text

container_end_page	535
container_issue	1
container_start_page	521
container_title	Computers, materials & continua
container_volume	68
creator	Thanh Vo, Minh; H. Vo, Anh; Nguyen, Trang; Sharma, Rohit; Le, Tuong
description	In recent years, the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age. Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting. However, the problem of detecting fake job descriptions comes up against the problem of class imbalance when the number of genuine jobs exceeds the number of fake jobs. This causes a reduction in the predictability and performance of traditional machine learning models. We therefore present an efficient framework that uses an oversampling technique called FJD-OT (Fake Job Description Detection Using Oversampling Techniques) to improve the predictability of detecting fake job descriptions. In the proposed framework, we apply several techniques including the removal of stop words and the use of a tokenizer to preprocess the text data in the first module. We then use a bag of words in combination with the term frequency-inverse document frequency (TF-IDF) approach to extract the features from the text data to create the feature dataset in the second module. Next, our framework applies k-fold cross-validation, a commonly used technique to test the effectiveness of machine learning models, that splits the experimental dataset [the Employment Scam Aegean (ESA) dataset in our study] into training and test sets for evaluation. The training set is passed through the third module, an oversampling module in which the SVMSMOTE method is used to balance data before training the classifiers in the last module. The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.
doi_str_mv	10.32604/cmc.2021.015645
format	Article
publisher	Henderson: Tech Science Press
rights	2021. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa	free_for_read
eissn	1546-2226
tpages	15
fulltext	fulltext
identifier	ISSN: 1546-2226
ispartof	Computers, materials & continua, 2021-01, Vol.68 (1), p.521-535
issn	1546-2226; 1546-2218
language	eng
recordid	cdi_proquest_journals_2507804829
source	Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects	Datasets; Descriptions; Feature extraction; Fraud; Internet; Job descriptions; Job hunting; Machine learning; Modules; Oversampling; Performance measurement; Training
title	Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T10%3A14%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Dealing%20with%20the%20Class%20Imbalance%20Problem%20in%20the%20Detection%20of%20Fake%20Job%20Descriptions&rft.jtitle=Computers,%20materials%20&%20continua&rft.au=Thanh%20Vo,%20Minh&rft.date=2021-01-01&rft.volume=68&rft.issue=1&rft.spage=521&rft.epage=535&rft.pages=521-535&rft.issn=1546-2226&rft.eissn=1546-2226&rft_id=info:doi/10.32604/cmc.2021.015645&rft_dat=%3Cproquest_cross%3E2507804829%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2507804829&rft_id=info:pmid/&rfr_iscdi=true