An application for plagiarized source code detection based on a parse tree kernel

Program plagiarism detection is the task of detecting plagiarized code pairs among a set of source code files. In this paper, we propose a code plagiarism detection system that uses a parse tree kernel. The kernel computes a similarity value between two source files in terms of their parse tre...

Bibliographic details
Published in: Engineering applications of artificial intelligence 2013-09, Vol.26 (8), p.1911-1918
Main authors: Son, Jeong-Woo, Noh, Tae-Gil, Song, Hyun-Je, Park, Seong-Bae
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1918
container_issue 8
container_start_page 1911
container_title Engineering applications of artificial intelligence
container_volume 26
creator Son, Jeong-Woo
Noh, Tae-Gil
Song, Hyun-Je
Park, Seong-Bae
description Program plagiarism detection is the task of detecting plagiarized code pairs among a set of source code files. In this paper, we propose a code plagiarism detection system that uses a parse tree kernel. The kernel computes a similarity value between two source files from the similarity of their parse trees. Because parse trees capture the essential syntactic structure of source code, the system handles structural information effectively. The contributions of this paper are twofold. First, we propose a parse tree kernel that is optimized for program source code; the evaluation shows that our system based on this kernel outperforms well-known baseline systems. Second, we collected a large number of real-world Java source files from a university programming class. Two independent human annotators manually analyzed and tagged this test set to mark plagiarized code, so it can be used to evaluate the performance of various detection systems in real-world environments. Experiments on the test set show that our plagiarism detection system reaches 93% of the performance level of the human annotators.
• Program plagiarism detection method that relies on parse tree similarities.
• Parse trees are compared in a kernel space.
• A new source code parse tree kernel is proposed to improve detection performance.
• Evaluation with real-world data showed a maximum F1 score of 0.93.
doi_str_mv 10.1016/j.engappai.2013.06.007
format Article
publisher Elsevier Ltd
rights 2013 Elsevier Ltd
tpages 8
fulltext fulltext
identifier ISSN: 0952-1976
ispartof Engineering applications of artificial intelligence, 2013-09, Vol.26 (8), p.1911-1918
issn 0952-1976
1873-6769
language eng
recordid cdi_proquest_miscellaneous_1506394913
source Elsevier ScienceDirect Journals
subjects Expert systems
Parse tree kernel
Plagiarism detection
Software plagiarism
Tree kernel
title An application for plagiarized source code detection based on a parse tree kernel
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T22%3A54%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20application%20for%20plagiarized%20source%20code%20detection%20based%20on%20a%20parse%20tree%20kernel&rft.jtitle=Engineering%20applications%20of%20artificial%20intelligence&rft.au=Son,%20Jeong-Woo&rft.date=2013-09&rft.volume=26&rft.issue=8&rft.spage=1911&rft.epage=1918&rft.pages=1911-1918&rft.issn=0952-1976&rft.eissn=1873-6769&rft_id=info:doi/10.1016/j.engappai.2013.06.007&rft_dat=%3Cproquest_cross%3E1506394913%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1506394913&rft_id=info:pmid/&rft_els_id=S0952197613001085&rfr_iscdi=true
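
The description above says the system scores two programs by the similarity of their parse trees. The record gives no formula, so the following is only a minimal sketch of a generic subtree-counting tree kernel, not the kernel proposed in the paper. It assumes Python's standard `ast` module as a stand-in parser (the paper's data is Java), labels nodes by type only, and uses hypothetical names (`tree_kernel`, `similarity`, `decay`) chosen for illustration.

```python
import ast
import math


def label(node):
    # Assumption: a node is identified only by its AST type name,
    # so identifier names and literal values are ignored.
    return type(node).__name__


def children(node):
    return list(ast.iter_child_nodes(node))


def production(node):
    # A node's "production": its label plus the ordered labels of its children.
    return (label(node), tuple(label(c) for c in children(node)))


def tree_kernel(src_a, src_b, decay=0.5):
    """Sum, over all node pairs, a decayed count of shared tree fragments."""
    nodes_a = list(ast.walk(ast.parse(src_a)))
    nodes_b = list(ast.walk(ast.parse(src_b)))
    memo = {}

    def common(n1, n2):
        key = (id(n1), id(n2))
        if key in memo:
            return memo[key]
        if production(n1) != production(n2):
            result = 0.0
        else:
            # Equal productions imply equally many children, so zip is safe.
            result = decay
            for c1, c2 in zip(children(n1), children(n2)):
                result *= 1.0 + common(c1, c2)
        memo[key] = result
        return result

    return sum(common(n1, n2) for n1 in nodes_a for n2 in nodes_b)


def similarity(src_a, src_b, decay=0.5):
    # Normalize so that a program compared with itself scores 1.0.
    k_ab = tree_kernel(src_a, src_b, decay)
    k_aa = tree_kernel(src_a, src_a, decay)
    k_bb = tree_kernel(src_b, src_b, decay)
    return k_ab / math.sqrt(k_aa * k_bb)


original = "def f(x):\n    return x + 1\n"
renamed = "def g(y):\n    return y + 1\n"
unrelated = "for i in range(3):\n    print(i)\n"

print(similarity(original, renamed))
print(similarity(original, unrelated))
```

Because only node types are compared, renaming identifiers leaves the score unchanged, which is the kind of structural robustness the description attributes to parse trees. A pair would be flagged as suspicious when the score exceeds a threshold tuned on annotated data; that thresholding step is also an assumption here.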