Idea plagiarism detection with recurrent neural networks and vector space model

Purpose: Natural languages have a fundamental quality of suppleness that makes it possible to present a single idea in plenty of different ways. This feature is often exploited in the academic world, leading to the theft of work referred to as plagiarism. Many approaches have been put forward to detec...


Bibliographic details
Published in: International journal of intelligent computing and cybernetics, 2021-07, Vol.14 (3), p.321-332
Main authors: Nazir, Azra; Mir, Roohie Naaz; Qureshi, Shaima
Format: Article
Language: eng
Online access: Full text
container_end_page 332
container_issue 3
container_start_page 321
container_title International journal of intelligent computing and cybernetics
container_volume 14
creator Nazir, Azra
Mir, Roohie Naaz
Qureshi, Shaima
description Purpose: Natural languages have a fundamental quality of suppleness that makes it possible to present a single idea in plenty of different ways. This feature is often exploited in the academic world, leading to the theft of work referred to as plagiarism. Many approaches have been put forward to detect such cases based on various text features and grammatical structures of languages. However, there is a huge scope of improvement for detecting intelligent plagiarism. Design/methodology/approach: To realize this, the paper introduces a hybrid model to detect intelligent plagiarism by breaking the entire process into three stages: (1) clustering, (2) vector formulation in each cluster based on semantic roles, normalization and similarity index calculation and (3) Summary generation using encoder-decoder. An effective weighing scheme has been introduced to select terms used to build vectors based on K-means, which is calculated on the synonym set for the said term. If the value calculated in the last stage lies above a predefined threshold, only then the next semantic argument is analyzed. When the similarity score for two documents is beyond the threshold, a short summary for plagiarized documents is created. Findings: Experimental results show that this method is able to detect connotation and concealment used in idea plagiarism besides detecting literal plagiarism. Originality/value: The proposed model can help academics stay updated by providing summaries of relevant articles. It would eliminate the practice of plagiarism infesting the academic community at an unprecedented pace. The model will also accelerate the process of reviewing academic documents, aiding in the speedy publishing of research articles.
doi_str_mv 10.1108/IJICC-11-2020-0178
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2551430127</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2551430127</sourcerecordid><originalsourceid>FETCH-LOGICAL-c317t-4bb5d2991def0f3277208dde8f9ada32c5587650e3c8939c67a3d5527fbe59043</originalsourceid><addsrcrecordid>eNptkEtLxDAUhYMoOI7-AVcB19U8Jk2ylOKjMjAbBXchk9xqx7apSevgv7fjiCC4OmdxzrncD6FzSi4pJeqqfCiLIqM0Y4SRjFCpDtCMSpFnXGp1-OvV8zE6SWlDSK6E4jO0Kj1Y3Df2pbaxTi32MIAb6tDhbT284ghujBG6AXcwRttMMmxDfEvYdh5_TNEQceqtA9wGD80pOqpsk-DsR-fo6fbmsbjPlqu7srheZo5TOWSL9Vp4pjX1UJGKMykZUd6DqrT1ljMnhJK5IMCd0ly7XFruhWCyWoPQZMHn6GK_28fwPkIazCaMsZtOGiYEXXBCmZxSbJ9yMaQUoTJ9rFsbPw0lZgfOfIObrNmBMztwU4nuS9DC9LH_v_MHNv8CoaZv5g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2551430127</pqid></control><display><type>article</type><title>Idea plagiarism detection with recurrent neural networks and vector space model</title><source>Standard: Emerald eJournal Premier Collection</source><source>Emerald A-Z Current Journals</source><creator>Nazir, Azra ; Mir, Roohie Naaz ; Qureshi, Shaima</creator><creatorcontrib>Nazir, Azra ; Mir, Roohie Naaz ; Qureshi, Shaima</creatorcontrib><description>PurposeNatural languages have a fundamental quality of suppleness that makes it possible to present a single idea in plenty of different ways. This feature is often exploited in the academic world, leading to the theft of work referred to as plagiarism. Many approaches have been put forward to detect such cases based on various text features and grammatical structures of languages. 
However, there is a huge scope of improvement for detecting intelligent plagiarism.Design/methodology/approachTo realize this, the paper introduces a hybrid model to detect intelligent plagiarism by breaking the entire process into three stages: (1) clustering, (2) vector formulation in each cluster based on semantic roles, normalization and similarity index calculation and (3) Summary generation using encoder-decoder. An effective weighing scheme has been introduced to select terms used to build vectors based on K-means, which is calculated on the synonym set for the said term. If the value calculated in the last stage lies above a predefined threshold, only then the next semantic argument is analyzed. When the similarity score for two documents is beyond the threshold, a short summary for plagiarized documents is created.FindingsExperimental results show that this method is able to detect connotation and concealment used in idea plagiarism besides detecting literal plagiarism.Originality/valueThe proposed model can help academics stay updated by providing summaries of relevant articles. It would eliminate the practice of plagiarism infesting the academic community at an unprecedented pace. 
The model will also accelerate the process of reviewing academic documents, aiding in the speedy publishing of research articles.</description><identifier>ISSN: 1756-378X</identifier><identifier>EISSN: 1756-3798</identifier><identifier>DOI: 10.1108/IJICC-11-2020-0178</identifier><language>eng</language><publisher>Bingley: Emerald Publishing Limited</publisher><subject>Algorithms ; Clustering ; Coders ; Deep learning ; Documents ; Encoders-Decoders ; Hybrid systems ; Labeling ; Languages ; Machine learning ; Neural networks ; Plagiarism ; Recurrent neural networks ; Semantics ; Similarity ; Theft ; Vector space</subject><ispartof>International journal of intelligent computing and cybernetics, 2021-07, Vol.14 (3), p.321-332</ispartof><rights>Emerald Publishing Limited</rights><rights>Emerald Publishing Limited 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c317t-4bb5d2991def0f3277208dde8f9ada32c5587650e3c8939c67a3d5527fbe59043</citedby><cites>FETCH-LOGICAL-c317t-4bb5d2991def0f3277208dde8f9ada32c5587650e3c8939c67a3d5527fbe59043</cites><orcidid>0000-0002-6267-8111</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.emerald.com/insight/content/doi/10.1108/IJICC-11-2020-0178/full/html$$EHTML$$P50$$Gemerald$$H</linktohtml><link.rule.ids>314,780,784,967,11635,21695,27924,27925,52689,53244</link.rule.ids></links><search><creatorcontrib>Nazir, Azra</creatorcontrib><creatorcontrib>Mir, Roohie Naaz</creatorcontrib><creatorcontrib>Qureshi, Shaima</creatorcontrib><title>Idea plagiarism detection with recurrent neural networks and vector space model</title><title>International journal of intelligent computing and cybernetics</title><description>PurposeNatural languages have a fundamental quality of suppleness that makes it possible to present a single idea in 
plenty of different ways. This feature is often exploited in the academic world, leading to the theft of work referred to as plagiarism. Many approaches have been put forward to detect such cases based on various text features and grammatical structures of languages. However, there is a huge scope of improvement for detecting intelligent plagiarism.Design/methodology/approachTo realize this, the paper introduces a hybrid model to detect intelligent plagiarism by breaking the entire process into three stages: (1) clustering, (2) vector formulation in each cluster based on semantic roles, normalization and similarity index calculation and (3) Summary generation using encoder-decoder. An effective weighing scheme has been introduced to select terms used to build vectors based on K-means, which is calculated on the synonym set for the said term. If the value calculated in the last stage lies above a predefined threshold, only then the next semantic argument is analyzed. When the similarity score for two documents is beyond the threshold, a short summary for plagiarized documents is created.FindingsExperimental results show that this method is able to detect connotation and concealment used in idea plagiarism besides detecting literal plagiarism.Originality/valueThe proposed model can help academics stay updated by providing summaries of relevant articles. It would eliminate the practice of plagiarism infesting the academic community at an unprecedented pace. 
The model will also accelerate the process of reviewing academic documents, aiding in the speedy publishing of research articles.</description><subject>Algorithms</subject><subject>Clustering</subject><subject>Coders</subject><subject>Deep learning</subject><subject>Documents</subject><subject>Encoders-Decoders</subject><subject>Hybrid systems</subject><subject>Labeling</subject><subject>Languages</subject><subject>Machine learning</subject><subject>Neural networks</subject><subject>Plagiarism</subject><subject>Recurrent neural networks</subject><subject>Semantics</subject><subject>Similarity</subject><subject>Theft</subject><subject>Vector space</subject><issn>1756-378X</issn><issn>1756-3798</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNptkEtLxDAUhYMoOI7-AVcB19U8Jk2ylOKjMjAbBXchk9xqx7apSevgv7fjiCC4OmdxzrncD6FzSi4pJeqqfCiLIqM0Y4SRjFCpDtCMSpFnXGp1-OvV8zE6SWlDSK6E4jO0Kj1Y3Df2pbaxTi32MIAb6tDhbT284ghujBG6AXcwRttMMmxDfEvYdh5_TNEQceqtA9wGD80pOqpsk-DsR-fo6fbmsbjPlqu7srheZo5TOWSL9Vp4pjX1UJGKMykZUd6DqrT1ljMnhJK5IMCd0ly7XFruhWCyWoPQZMHn6GK_28fwPkIazCaMsZtOGiYEXXBCmZxSbJ9yMaQUoTJ9rFsbPw0lZgfOfIObrNmBMztwU4nuS9DC9LH_v_MHNv8CoaZv5g</recordid><startdate>20210715</startdate><enddate>20210715</enddate><creator>Nazir, Azra</creator><creator>Mir, Roohie Naaz</creator><creator>Qureshi, Shaima</creator><general>Emerald Publishing Limited</general><general>Emerald Group Publishing 
Limited</general><scope>AAYXX</scope><scope>CITATION</scope><scope>0U~</scope><scope>1-H</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L.0</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M2P</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PYYUZ</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0002-6267-8111</orcidid></search><sort><creationdate>20210715</creationdate><title>Idea plagiarism detection with recurrent neural networks and vector space model</title><author>Nazir, Azra ; Mir, Roohie Naaz ; Qureshi, Shaima</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c317t-4bb5d2991def0f3277208dde8f9ada32c5587650e3c8939c67a3d5527fbe59043</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Clustering</topic><topic>Coders</topic><topic>Deep learning</topic><topic>Documents</topic><topic>Encoders-Decoders</topic><topic>Hybrid systems</topic><topic>Labeling</topic><topic>Languages</topic><topic>Machine learning</topic><topic>Neural networks</topic><topic>Plagiarism</topic><topic>Recurrent neural networks</topic><topic>Semantics</topic><topic>Similarity</topic><topic>Theft</topic><topic>Vector space</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Nazir, Azra</creatorcontrib><creatorcontrib>Mir, Roohie Naaz</creatorcontrib><creatorcontrib>Qureshi, 
Shaima</creatorcontrib><collection>CrossRef</collection><collection>Global News &amp; ABI/Inform Professional</collection><collection>Trade PRO</collection><collection>Computer and Information Systems Abstracts</collection><collection>Access via ABI/INFORM (ProQuest)</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ABI/INFORM Professional Standard</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>One Business (ProQuest)</collection><collection>ProQuest One Academic 
Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ABI/INFORM Collection China</collection><collection>ProQuest Central Basic</collection><jtitle>International journal of intelligent computing and cybernetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Nazir, Azra</au><au>Mir, Roohie Naaz</au><au>Qureshi, Shaima</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Idea plagiarism detection with recurrent neural networks and vector space model</atitle><jtitle>International journal of intelligent computing and cybernetics</jtitle><date>2021-07-15</date><risdate>2021</risdate><volume>14</volume><issue>3</issue><spage>321</spage><epage>332</epage><pages>321-332</pages><issn>1756-378X</issn><eissn>1756-3798</eissn><abstract>PurposeNatural languages have a fundamental quality of suppleness that makes it possible to present a single idea in plenty of different ways. This feature is often exploited in the academic world, leading to the theft of work referred to as plagiarism. Many approaches have been put forward to detect such cases based on various text features and grammatical structures of languages. However, there is a huge scope of improvement for detecting intelligent plagiarism.Design/methodology/approachTo realize this, the paper introduces a hybrid model to detect intelligent plagiarism by breaking the entire process into three stages: (1) clustering, (2) vector formulation in each cluster based on semantic roles, normalization and similarity index calculation and (3) Summary generation using encoder-decoder. An effective weighing scheme has been introduced to select terms used to build vectors based on K-means, which is calculated on the synonym set for the said term. 
If the value calculated in the last stage lies above a predefined threshold, only then the next semantic argument is analyzed. When the similarity score for two documents is beyond the threshold, a short summary for plagiarized documents is created.FindingsExperimental results show that this method is able to detect connotation and concealment used in idea plagiarism besides detecting literal plagiarism.Originality/valueThe proposed model can help academics stay updated by providing summaries of relevant articles. It would eliminate the practice of plagiarism infesting the academic community at an unprecedented pace. The model will also accelerate the process of reviewing academic documents, aiding in the speedy publishing of research articles.</abstract><cop>Bingley</cop><pub>Emerald Publishing Limited</pub><doi>10.1108/IJICC-11-2020-0178</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0002-6267-8111</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1756-378X
ispartof International journal of intelligent computing and cybernetics, 2021-07, Vol.14 (3), p.321-332
issn 1756-378X
1756-3798
language eng
recordid cdi_proquest_journals_2551430127
source Standard: Emerald eJournal Premier Collection; Emerald A-Z Current Journals
subjects Algorithms
Clustering
Coders
Deep learning
Documents
Encoders-Decoders
Hybrid systems
Labeling
Languages
Machine learning
Neural networks
Plagiarism
Recurrent neural networks
Semantics
Similarity
Theft
Vector space
title Idea plagiarism detection with recurrent neural networks and vector space model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T15%3A51%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Idea%20plagiarism%20detection%20with%20recurrent%20neural%20networks%20and%20vector%20space%20model&rft.jtitle=International%20journal%20of%20intelligent%20computing%20and%20cybernetics&rft.au=Nazir,%20Azra&rft.date=2021-07-15&rft.volume=14&rft.issue=3&rft.spage=321&rft.epage=332&rft.pages=321-332&rft.issn=1756-378X&rft.eissn=1756-3798&rft_id=info:doi/10.1108/IJICC-11-2020-0178&rft_dat=%3Cproquest_cross%3E2551430127%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2551430127&rft_id=info:pmid/&rfr_iscdi=true