Duplicate code section detection for source code

Techniques for duplicate code section detection for source code are described herein. An aspect includes receiving a plurality of input files corresponding to a software project comprising source code written in a computer programming language. Another aspect includes segmenting each of the pluralit...

Detailed description

Bibliographic details
Main authors: Jochem, Jerome; Baudel, Thomas; Le Bars, Hervé
Format: Patent
Language: eng
creator Jochem, Jerome
Baudel, Thomas
Le Bars, Hervé
description Techniques for duplicate code section detection for source code are described herein. An aspect includes receiving a plurality of input files corresponding to a software project comprising source code written in a computer programming language. Another aspect includes segmenting each of the plurality of input files into a plurality of statements based on instruction boundaries corresponding to the computer programming language, wherein a respective statement start index is determined for each of the plurality of statements. Another aspect includes populating an enhanced generalized suffix array (eGSA) based on the determined statement start indices, wherein each statement start index corresponds to a respective suffix in a row in the eGSA, and wherein each row comprises a longest common prefix (LCP) field and a preceding statement value corresponding to the row's respective suffix. Another aspect includes identifying duplicate code sections in the plurality of input files based on the eGSA.
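The steps in the description can be illustrated with a minimal sketch. It is not the patented implementation: the statement splitter (`segment`), the sentinel scheme, the naive suffix sort, and the names `build_egsa` and `find_duplicates` are all assumptions made for illustration. The sketch indexes suffixes only at statement starts, stores an LCP value (counted in statements) and the preceding statement for each row, and reports a duplicate when two adjacent rows share at least `min_len` statements and differ in their preceding statement, so the match cannot be extended to the left.

```python
from typing import List, Optional, Tuple

def segment(text: str) -> List[str]:
    """Toy splitter (assumption): a statement ends at ';' or at a newline."""
    parts = text.replace(";", ";\n").splitlines()
    return [" ".join(p.split()) for p in parts if p.strip()]

def build_egsa(files: List[str]):
    """Return (rows, owner); each row is (start index, LCP, preceding statement)."""
    seq: List[str] = []                          # all statements, files concatenated
    owner: List[Optional[Tuple[int, int]]] = []  # (file, statement) per position
    for f, text in enumerate(files):
        for s, stmt in enumerate(segment(text)):
            seq.append(stmt)
            owner.append((f, s))
        seq.append(f"\x00EOF{f}")                # unique sentinel closes each file
        owner.append(None)

    # Suffixes start only at statement boundaries; naive sort for clarity.
    starts = [i for i in range(len(seq)) if owner[i] is not None]
    starts.sort(key=lambda i: seq[i:])

    rows = []
    for r, i in enumerate(starts):
        lcp = 0
        if r > 0:
            j = starts[r - 1]
            # Count leading statements shared with the previous row.
            while (i + lcp < len(seq) and j + lcp < len(seq)
                   and seq[i + lcp] == seq[j + lcp]):
                lcp += 1
        preceding = seq[i - 1] if i > 0 else None
        rows.append((i, lcp, preceding))
    return rows, owner

def find_duplicates(files: List[str], min_len: int = 2):
    """Report adjacent-row matches that are long enough and left-maximal."""
    rows, owner = build_egsa(files)
    found = []
    for r in range(1, len(rows)):
        start, lcp, preceding = rows[r]
        prev_start, _, prev_preceding = rows[r - 1]
        # Same preceding statement: a longer match already covers this one.
        if lcp >= min_len and preceding != prev_preceding:
            found.append((owner[prev_start], owner[start], lcp))
    return found

a = "x = 1;\ny = 2;\nz = x + y;\nprint(z);"
b = "q = 0;\ny = 2;\nz = x + y;\nprint(z);\nw = 3;"
print(find_duplicates([a, b]))
```

Each reported tuple holds the (file, statement) position of the two occurrences and the length of the shared run in statements. A production tool would replace the quadratic sort with a linear-time suffix array construction and would group more than two occurrences; this sketch compares only adjacent rows.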
format Patent
creationdate 2021
date 2021-04-06
oa free_for_read
linktohtml https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210406&DB=EPODOC&CC=US&NR=10970066B1
fulltext fulltext_linktorsrc
language eng
recordid cdi_epo_espacenet_US10970066B1
source esp@cenet
subjects CALCULATING
COMPUTING
COUNTING
ELECTRIC DIGITAL DATA PROCESSING
PHYSICS
title Duplicate code section detection for source code
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T23%3A00%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Jochem,%20Jerome&rft.date=2021-04-06&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS10970066B1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true