Duplicate code section detection for source code

Techniques for duplicate code section detection for source code are described herein. An aspect includes receiving a plurality of input files corresponding to a software project comprising source code written in a computer programming language. Another aspect includes segmenting each of the pluralit...

Detailed description

Bibliographic details
Main authors:	Jochem, Jerome; Baudel, Thomas; Le Bars, Hervé
Format:	Patent
Language:	eng
Subjects:	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

creator	Jochem, Jerome; Baudel, Thomas; Le Bars, Hervé
description	Techniques for duplicate code section detection for source code are described herein. An aspect includes receiving a plurality of input files corresponding to a software project comprising source code written in a computer programming language. Another aspect includes segmenting each of the plurality of input files into a plurality of statements based on instruction boundaries corresponding to the computer programming language, wherein a respective statement start index is determined for each of the plurality of statements. Another aspect includes populating an enhanced generalized suffix array (eGSA) based on the determined statement start indices, wherein each statement start index corresponds to a respective suffix in a row in the eGSA, and wherein each row comprises a longest common prefix (LCP) field and a preceding statement value corresponding to the row's respective suffix. Another aspect includes identifying duplicate code sections in the plurality of input files based on the eGSA.
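The steps named in the description (segment into statements, build a suffix array whose rows carry an LCP field and a preceding statement value, then read duplicates off the array) can be illustrated with a minimal Python sketch. This is not the patented implementation. It rests on several assumptions that are not in the record: statements are split on `;` as a stand-in for language-specific instruction boundaries, the start index is the statement's ordinal position in its file rather than a character offset, the array is built by naive sorting, and the names `Row`, `segment`, `build_egsa`, and `find_duplicates` are hypothetical.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    file: str            # input file the suffix belongs to
    start: int           # statement start index (ordinal, an assumption)
    lcp: int             # statements shared with the previous row's suffix
    prev: Optional[str]  # statement preceding this suffix, None at file start


def segment(text: str) -> list[str]:
    """Split on ';' (assumed boundary) and normalise whitespace."""
    return [" ".join(s.split()) for s in text.split(";") if s.strip()]


def build_egsa(files: dict[str, str]) -> list[Row]:
    """Build one row per statement start index, sorted by suffix content."""
    stmts = {name: segment(text) for name, text in files.items()}
    suffixes = [(name, i) for name, seq in stmts.items() for i in range(len(seq))]
    # Naive construction: materialise and sort every suffix. A production
    # builder would use a dedicated suffix-array algorithm instead.
    suffixes.sort(key=lambda s: (stmts[s[0]][s[1]:], s[0], s[1]))

    rows: list[Row] = []
    previous: list[str] = []
    for name, i in suffixes:
        current = stmts[name][i:]
        lcp = 0
        for a, b in zip(previous, current):
            if a != b:
                break
            lcp += 1
        rows.append(Row(name, i, lcp, stmts[name][i - 1] if i > 0 else None))
        previous = current
    return rows


def find_duplicates(rows: list[Row], min_statements: int = 2):
    """Report adjacent rows sharing at least min_statements statements."""
    found = []
    for above, below in zip(rows, rows[1:]):
        if below.lcp < min_statements:
            continue
        # Same preceding statement: the match extends one statement to the
        # left, so this pair is skipped in favour of that longer match.
        if above.prev is not None and above.prev == below.prev:
            continue
        found.append(((above.file, above.start), (below.file, below.start), below.lcp))
    return found


files = {
    "a.c": "int x = 0; x += 1; print(x); return x;",
    "b.c": "int y = 2; x += 1; print(x); return y;",
}
print(find_duplicates(build_egsa(files)))
# [(('a.c', 1), ('b.c', 1), 2)]
```

The sketch only compares neighbouring rows, so with three or more copies of a section it may miss some pairs, and it does not filter overlapping matches inside a single file. The preceding statement value is used here only to suppress matches that could be extended to the left, which is one plausible reading of why the description stores it in each row.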
format	Patent
fullrecord	<record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US10970066B1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US10970066B1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US10970066B13</originalsourceid><addsrcrecordid>eNrjZDBwKS3IyUxOLElVSM5PSVUoTk0uyczPU0hJLYGy0vKLFIrzS4uSISp4GFjTEnOKU3mhNDeDoptriLOHbmpBfnxqcUFicmpeakl8aLChgaW5gYGZmZOhMTFqAF6EK2A</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Duplicate code section detection for source code</title><source>esp@cenet</source><creator>Jochem, Jerome ; Baudel, Thomas ; Le Bars, Hervé</creator><creatorcontrib>Jochem, Jerome ; Baudel, Thomas ; Le Bars, Hervé</creatorcontrib><description>Techniques for duplicate code section detection for source code are described herein. An aspect includes receiving a plurality of input files corresponding to a software project comprising source code written in a computer programming language. Another aspect includes segmenting each of the plurality of input files into a plurality of statements based on instruction boundaries corresponding to the computer programming language, wherein a respective statement start index is determined for each of the plurality of statements. Another aspect includes populating an enhanced generalized suffix array (eGSA) based on the determined statement start indices, wherein each statement start index corresponds to a respective suffix in a row in the eGSA, and wherein each row comprises a longest common prefix (LCP) field and a preceding statement value corresponding to the row's respective suffix. 
Another aspect includes identifying duplicate code sections in the plurality of input files based on the eGSA.</description><language>eng</language><subject>CALCULATING ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; PHYSICS</subject><creationdate>2021</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210406&DB=EPODOC&CC=US&NR=10970066B1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,780,885,25562,76317</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210406&DB=EPODOC&CC=US&NR=10970066B1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Jochem, Jerome</creatorcontrib><creatorcontrib>Baudel, Thomas</creatorcontrib><creatorcontrib>Le Bars, Hervé</creatorcontrib><title>Duplicate code section detection for source code</title><description>Techniques for duplicate code section detection for source code are described herein. An aspect includes receiving a plurality of input files corresponding to a software project comprising source code written in a computer programming language. Another aspect includes segmenting each of the plurality of input files into a plurality of statements based on instruction boundaries corresponding to the computer programming language, wherein a respective statement start index is determined for each of the plurality of statements. 
Another aspect includes populating an enhanced generalized suffix array (eGSA) based on the determined statement start indices, wherein each statement start index corresponds to a respective suffix in a row in the eGSA, and wherein each row comprises a longest common prefix (LCP) field and a preceding statement value corresponding to the row's respective suffix. Another aspect includes identifying duplicate code sections in the plurality of input files based on the eGSA.</description><subject>CALCULATING</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC DIGITAL DATA PROCESSING</subject><subject>PHYSICS</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2021</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZDBwKS3IyUxOLElVSM5PSVUoTk0uyczPU0hJLYGy0vKLFIrzS4uSISp4GFjTEnOKU3mhNDeDoptriLOHbmpBfnxqcUFicmpeakl8aLChgaW5gYGZmZOhMTFqAF6EK2A</recordid><startdate>20210406</startdate><enddate>20210406</enddate><creator>Jochem, Jerome</creator><creator>Baudel, Thomas</creator><creator>Le Bars, Hervé</creator><scope>EVB</scope></search><sort><creationdate>20210406</creationdate><title>Duplicate code section detection for source code</title><author>Jochem, Jerome ; Baudel, Thomas ; Le Bars, Hervé</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US10970066B13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2021</creationdate><topic>CALCULATING</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC DIGITAL DATA PROCESSING</topic><topic>PHYSICS</topic><toplevel>online_resources</toplevel><creatorcontrib>Jochem, Jerome</creatorcontrib><creatorcontrib>Baudel, Thomas</creatorcontrib><creatorcontrib>Le Bars, Hervé</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jochem, Jerome</au><au>Baudel, Thomas</au><au>Le Bars, Hervé</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Duplicate code section detection for source code</title><date>2021-04-06</date><risdate>2021</risdate><abstract>Techniques for duplicate code section detection for source code are described herein. An aspect includes receiving a plurality of input files corresponding to a software project comprising source code written in a computer programming language. Another aspect includes segmenting each of the plurality of input files into a plurality of statements based on instruction boundaries corresponding to the computer programming language, wherein a respective statement start index is determined for each of the plurality of statements. Another aspect includes populating an enhanced generalized suffix array (eGSA) based on the determined statement start indices, wherein each statement start index corresponds to a respective suffix in a row in the eGSA, and wherein each row comprises a longest common prefix (LCP) field and a preceding statement value corresponding to the row's respective suffix. Another aspect includes identifying duplicate code sections in the plurality of input files based on the eGSA.</abstract><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
language	eng
recordid	cdi_epo_espacenet_US10970066B1
source	esp@cenet
subjects	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS
title	Duplicate code section detection for source code
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T23%3A00%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Jochem,%20Jerome&rft.date=2021-04-06&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS10970066B1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true