Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions
Published in: | Computers, materials & continua, 2021-01, Vol.68 (1), p.521-535 |
---|---|
Main authors: | Thanh Vo, Minh ; H. Vo, Anh ; Nguyen, Trang ; Sharma, Rohit ; Le, Tuong |
Format: | Article |
Language: | eng |
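Subjects: | Datasets; Descriptions; Feature extraction; Fraud; Internet; Job descriptions; Job hunting; Machine learning; Modules; Oversampling; Performance measurement; Training |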
Online access: | Full text |
container_end_page | 535 |
---|---|
container_issue | 1 |
container_start_page | 521 |
container_title | Computers, materials & continua |
container_volume | 68 |
creator | Thanh Vo, Minh ; H. Vo, Anh ; Nguyen, Trang ; Sharma, Rohit ; Le, Tuong |
description | In recent years, detecting fake job descriptions has become increasingly necessary because social networking has changed the way people access the rapidly growing volume of information in the internet age. Identifying fraudulent job descriptions can help jobseekers avoid many of the risks of job hunting. However, fake job description detection runs into a class imbalance problem: genuine postings outnumber fake ones, which reduces the predictive performance of traditional machine learning models. We therefore present an efficient framework, FJD-OT (Fake Job Description Detection Using Oversampling Techniques), that uses oversampling to improve the detection of fake job descriptions. In the first module, we preprocess the text data by removing stop words and applying a tokenizer. In the second module, we combine a bag-of-words representation with term frequency-inverse document frequency (TF-IDF) weighting to extract features from the text and build the feature dataset. Next, the framework applies k-fold cross-validation, a commonly used technique for evaluating machine learning models, to split the experimental dataset [the Employment Scam Aegean (ESA) dataset in our study] into training and test sets. The training set is passed to the third module, an oversampling module that uses the SVMSMOTE method to balance the data before the classifiers are trained in the last module. The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset according to several popular performance metrics. |
doi_str_mv | 10.32604/cmc.2021.015645 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2507804829</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2507804829</sourcerecordid><originalsourceid>FETCH-LOGICAL-c313t-12c29124699231a89c70c12e9a1fc78dee503b69a101d63bc8aec8cd3656d02e3</originalsourceid><addsrcrecordid>eNpNkDFPwzAQhS0EEqWwM1piTjnbiRuPqKVQVAmEYLacy4WmJHGxUyH-PWnLwHTv3j29kz7GrgVMlNSQ3mKLEwlSTEBkOs1O2EhkqU6klPr0nz5nFzFuAJRWBkbsdU6uqbsP_l33a96vic8aFyNftoVrXIfEX4IvGmp53R3Oc-oJ-9p33Fd84T6JP_licCOGerv34yU7q1wT6epvjtn74v5t9pisnh-Ws7tVgkqoPhESpREy1cZIJVxucAooJBknKpzmJVEGqtDDCqLUqsDcEeZYKp3pEiSpMbs59m6D_9pR7O3G70I3vLQyg2kOaS7NkIJjCoOPMVBlt6FuXfixAuyBnB3I2T05eySnfgGHCGAQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2507804829</pqid></control><display><type>article</type><title>Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Thanh Vo, Minh ; H. Vo, Anh ; Nguyen, Trang ; Sharma, Rohit ; Le, Tuong</creator><creatorcontrib>Thanh Vo, Minh ; H. Vo, Anh ; Nguyen, Trang ; Sharma, Rohit ; Le, Tuong</creatorcontrib><description>In recent years, the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age. Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting. However, the problem of detecting fake job descriptions comes up against the problem of class imbalance when the number of genuine jobs exceeds the number of fake jobs. This causes a reduction in the predictability and performance of traditional machine learning models. 
We therefore present an efficient framework that uses an oversampling technique called FJD-OT (Fake Job Description Detection Using Oversampling Techniques) to improve the predictability of detecting fake job descriptions. In the proposed framework, we apply several techniques including the removal of stop words and the use of a tokenizer to preprocess the text data in the first module. We then use a bag of words in combination with the term frequency-inverse document frequency (TF-IDF) approach to extract the features from the text data to create the feature dataset in the second module. Next, our framework applies k-fold cross-validation, a commonly used technique to test the effectiveness of machine learning models, that splits the experimental dataset [the Employment Scam Aegean (ESA) dataset in our study] into training and test sets for evaluation. The training set is passed through the third module, an oversampling module in which the SVMSMOTE method is used to balance data before training the classifiers in the last module. The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.</description><identifier>ISSN: 1546-2226</identifier><identifier>ISSN: 1546-2218</identifier><identifier>EISSN: 1546-2226</identifier><identifier>DOI: 10.32604/cmc.2021.015645</identifier><language>eng</language><publisher>Henderson: Tech Science Press</publisher><subject>Datasets ; Descriptions ; Feature extraction ; Fraud ; Internet ; Job descriptions ; Job hunting ; Machine learning ; Modules ; Oversampling ; Performance measurement ; Training</subject><ispartof>Computers, materials & continua, 2021-01, Vol.68 (1), p.521-535</ispartof><rights>2021. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c313t-12c29124699231a89c70c12e9a1fc78dee503b69a101d63bc8aec8cd3656d02e3</citedby><cites>FETCH-LOGICAL-c313t-12c29124699231a89c70c12e9a1fc78dee503b69a101d63bc8aec8cd3656d02e3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Thanh Vo, Minh</creatorcontrib><creatorcontrib>H. Vo, Anh</creatorcontrib><creatorcontrib>Nguyen, Trang</creatorcontrib><creatorcontrib>Sharma, Rohit</creatorcontrib><creatorcontrib>Le, Tuong</creatorcontrib><title>Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions</title><title>Computers, materials & continua</title><description>In recent years, the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age. Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting. However, the problem of detecting fake job descriptions comes up against the problem of class imbalance when the number of genuine jobs exceeds the number of fake jobs. This causes a reduction in the predictability and performance of traditional machine learning models. We therefore present an efficient framework that uses an oversampling technique called FJD-OT (Fake Job Description Detection Using Oversampling Techniques) to improve the predictability of detecting fake job descriptions. In the proposed framework, we apply several techniques including the removal of stop words and the use of a tokenizer to preprocess the text data in the first module. 
We then use a bag of words in combination with the term frequency-inverse document frequency (TF-IDF) approach to extract the features from the text data to create the feature dataset in the second module. Next, our framework applies k-fold cross-validation, a commonly used technique to test the effectiveness of machine learning models, that splits the experimental dataset [the Employment Scam Aegean (ESA) dataset in our study] into training and test sets for evaluation. The training set is passed through the third module, an oversampling module in which the SVMSMOTE method is used to balance data before training the classifiers in the last module. The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.</description><subject>Datasets</subject><subject>Descriptions</subject><subject>Feature extraction</subject><subject>Fraud</subject><subject>Internet</subject><subject>Job descriptions</subject><subject>Job hunting</subject><subject>Machine learning</subject><subject>Modules</subject><subject>Oversampling</subject><subject>Performance measurement</subject><subject>Training</subject><issn>1546-2226</issn><issn>1546-2218</issn><issn>1546-2226</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNpNkDFPwzAQhS0EEqWwM1piTjnbiRuPqKVQVAmEYLacy4WmJHGxUyH-PWnLwHTv3j29kz7GrgVMlNSQ3mKLEwlSTEBkOs1O2EhkqU6klPr0nz5nFzFuAJRWBkbsdU6uqbsP_l33a96vic8aFyNftoVrXIfEX4IvGmp53R3Oc-oJ-9p33Fd84T6JP_licCOGerv34yU7q1wT6epvjtn74v5t9pisnh-Ws7tVgkqoPhESpREy1cZIJVxucAooJBknKpzmJVEGqtDDCqLUqsDcEeZYKp3pEiSpMbs59m6D_9pR7O3G70I3vLQyg2kOaS7NkIJjCoOPMVBlt6FuXfixAuyBnB3I2T05eySnfgGHCGAQ</recordid><startdate>20210101</startdate><enddate>20210101</enddate><creator>Thanh Vo, Minh</creator><creator>H. 
Vo, Anh</creator><creator>Nguyen, Trang</creator><creator>Sharma, Rohit</creator><creator>Le, Tuong</creator><general>Tech Science Press</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope></search><sort><creationdate>20210101</creationdate><title>Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions</title><author>Thanh Vo, Minh ; H. Vo, Anh ; Nguyen, Trang ; Sharma, Rohit ; Le, Tuong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c313t-12c29124699231a89c70c12e9a1fc78dee503b69a101d63bc8aec8cd3656d02e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Datasets</topic><topic>Descriptions</topic><topic>Feature extraction</topic><topic>Fraud</topic><topic>Internet</topic><topic>Job descriptions</topic><topic>Job hunting</topic><topic>Machine learning</topic><topic>Modules</topic><topic>Oversampling</topic><topic>Performance measurement</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Thanh Vo, Minh</creatorcontrib><creatorcontrib>H. 
Vo, Anh</creatorcontrib><creatorcontrib>Nguyen, Trang</creatorcontrib><creatorcontrib>Sharma, Rohit</creatorcontrib><creatorcontrib>Le, Tuong</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><jtitle>Computers, materials & continua</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Thanh Vo, Minh</au><au>H. 
Vo, Anh</au><au>Nguyen, Trang</au><au>Sharma, Rohit</au><au>Le, Tuong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions</atitle><jtitle>Computers, materials & continua</jtitle><date>2021-01-01</date><risdate>2021</risdate><volume>68</volume><issue>1</issue><spage>521</spage><epage>535</epage><pages>521-535</pages><issn>1546-2226</issn><issn>1546-2218</issn><eissn>1546-2226</eissn><abstract>In recent years, the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age. Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting. However, the problem of detecting fake job descriptions comes up against the problem of class imbalance when the number of genuine jobs exceeds the number of fake jobs. This causes a reduction in the predictability and performance of traditional machine learning models. We therefore present an efficient framework that uses an oversampling technique called FJD-OT (Fake Job Description Detection Using Oversampling Techniques) to improve the predictability of detecting fake job descriptions. In the proposed framework, we apply several techniques including the removal of stop words and the use of a tokenizer to preprocess the text data in the first module. We then use a bag of words in combination with the term frequency-inverse document frequency (TF-IDF) approach to extract the features from the text data to create the feature dataset in the second module. Next, our framework applies k-fold cross-validation, a commonly used technique to test the effectiveness of machine learning models, that splits the experimental dataset [the Employment Scam Aegean (ESA) dataset in our study] into training and test sets for evaluation. 
The training set is passed through the third module, an oversampling module in which the SVMSMOTE method is used to balance data before training the classifiers in the last module. The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.</abstract><cop>Henderson</cop><pub>Tech Science Press</pub><doi>10.32604/cmc.2021.015645</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1546-2226 |
ispartof | Computers, materials & continua, 2021-01, Vol.68 (1), p.521-535 |
issn | 1546-2218 (print); 1546-2226 (online) |
language | eng |
recordid | cdi_proquest_journals_2507804829 |
source | Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals (Electronic Journals Library, freely accessible e-journals) |
subjects | Datasets; Descriptions; Feature extraction; Fraud; Internet; Job descriptions; Job hunting; Machine learning; Modules; Oversampling; Performance measurement; Training |
title | Dealing with the Class Imbalance Problem in the Detection of Fake Job Descriptions |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T10%3A14%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Dealing%20with%20the%20Class%20Imbalance%20Problem%20in%20the%20Detection%20of%20Fake%20Job%20Descriptions&rft.jtitle=Computers,%20materials%20&%20continua&rft.au=Thanh%20Vo,%20Minh&rft.date=2021-01-01&rft.volume=68&rft.issue=1&rft.spage=521&rft.epage=535&rft.pages=521-535&rft.issn=1546-2226&rft.eissn=1546-2226&rft_id=info:doi/10.32604/cmc.2021.015645&rft_dat=%3Cproquest_cross%3E2507804829%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2507804829&rft_id=info:pmid/&rfr_iscdi=true |
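The abstract in this record describes a first module that removes stop words and tokenizes the text, and a second module that combines a bag of words with TF-IDF weighting. The following is a minimal, standard-library-only sketch of those two steps. It is not the authors' code. The regex tokenizer, the tiny `STOP_WORDS` set, the toy postings, and the smoothed, L2-normalised IDF form (the one scikit-learn uses by default) are all illustrative assumptions, because the record does not specify them.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; the record does not give the authors' list.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "is", "are", "with", "we", "our", "you"}


def tokenize(text):
    """Lowercase the text, split it into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return (vocab, rows): rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(t for toks in token_lists for t in set(toks))
    # Smoothed IDF (assumed form, matching scikit-learn's default): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in token_lists:
        counts = Counter(toks)  # bag of words: raw term counts for this posting
        row = [counts[t] * idf[t] for t in vocab]
        # L2-normalise each row so long and short postings are comparable.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


# Toy postings (invented for illustration, not taken from the ESA dataset).
docs = [
    "Work from home and earn money fast",
    "Senior Python developer with cloud experience",
    "Earn money fast, no experience needed",
]
vocab, X = tfidf(docs)
```

Terms that occur in fewer postings receive a larger IDF. In the toy data, "work" therefore outweighs "earn" in the first posting, even though each appears there once.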
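The abstract states that k-fold cross-validation splits the ESA data first, and that oversampling is then applied to the training set only, before the classifiers are trained. The sketch below illustrates that ordering using the standard library alone. It swaps in plain SMOTE (interpolating between a minority sample and one of its nearest minority neighbours) for SVMSMOTE. SVMSMOTE instead fits an SVM and generates synthetic samples around the minority-class support vectors near the decision boundary; imbalanced-learn ships an `SVMSMOTE` class, which this sketch does not reproduce. The stratified fold assignment, `k=2`, the binary labels, and the toy 2-D features are assumptions made for illustration.

```python
import random


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds roughly keep the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class's indices round-robin across the k folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def smote(X, y, minority, k_neighbors=3, seed=0):
    """Plain SMOTE for a binary task: add synthetic minority rows until the classes are equal."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    if len(minority_pts) < 2 or n_needed <= 0:
        return X_new, y_new  # nothing to interpolate between, or already balanced

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    for _ in range(n_needed):
        base = rng.choice(minority_pts)
        # Sort minority points by distance to `base`; skip the first, which is `base`
        # itself (or an exact duplicate of it).
        neighbours = sorted(minority_pts, key=lambda m: dist2(base, m))[1:k_neighbors + 1]
        other = rng.choice(neighbours)
        gap = rng.random()
        # The synthetic point lies on the segment between `base` and its neighbour.
        X_new.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new


# Toy 2-D features: 8 genuine postings (label 0) and 4 fake ones (label 1); values are invented.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [0.0, 0.4], [0.4, 0.0], [0.2, 0.2], [0.3, 0.4],
     [1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1]]
y = [0] * 8 + [1] * 4

for train, test in stratified_k_fold(y, k=2):
    # Oversample the training fold only; the test fold keeps its original imbalance.
    X_bal, y_bal = smote([X[i] for i in train], [y[i] for i in train], minority=1)
    # A classifier would be fitted on (X_bal, y_bal) here and scored on the test fold.
    print(len(test), y_bal.count(0), y_bal.count(1))  # prints "6 4 4" for each fold
```

Oversampling belongs inside the loop. If the full dataset were oversampled before splitting, synthetic points built from test postings would leak into training and could inflate the reported scores.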