SecuGuard: Leveraging pattern-exploiting training in language models for advanced software vulnerability detection

Identifying vulnerabilities within source code remains paramount in assuring software quality and security. This study introduces a refined semi-supervised learning methodology that capitalizes on pattern-exploiting training coupled with cloze-style interrogation techniques. The research strategy em...

Bibliographic Details
Published in:	International Journal of Mathematics and Computer in Engineering 2025-06, Vol.3 (1), p.47-56
Main authors:	Basharat, Mahmoud; Omar, Marwan
Format:	Article
Language:	eng
Subjects:	68T07; cloze-style questions; Language models; pattern-exploiting training; RoBERTa; software vulnerabilities; vulnerability detection
Online access:	Full text

container_end_page	56
container_issue	1
container_start_page	47
container_title	International Journal of Mathematics and Computer in Engineering
container_volume	3
creator	Basharat, Mahmoud; Omar, Marwan
description	Identifying vulnerabilities within source code remains paramount in assuring software quality and security. This study introduces a refined semi-supervised learning methodology that capitalizes on pattern-exploiting training coupled with cloze-style interrogation techniques. The research strategy employed involves the training of a linguistic model on the Software Assurance Reference Dataset (SARD) and Devign datasets, which are replete with vulnerable code fragments. The training procedure entails obscuring specific segments of the code and subsequently prompting the model to ascertain the obfuscated tokens. Empirical analyses underscore the efficacy of our method in pinpointing vulnerabilities in source code, benefiting substantially from patterns discerned within the code fragments. This investigation underscores the potential of integrating pattern-exploiting training and cloze-based queries to enhance the precision of vulnerability detection within source code.
doi_str_mv	10.2478/ijmce-2025-0005
format	Article
fullrecord	<record><control><sourceid>walterdegruyter_cross</sourceid><recordid>TN_cdi_crossref_primary_10_2478_ijmce_2025_0005</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_2478_ijmce_2025_00053147</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1337-23142310dcf93a91565336a522367b9befdbe1505b36f616886f1d4215fcb6a83</originalsourceid><addsrcrecordid>eNp1kM1OwzAQhC0EElXpmatfINQ_sZPACVVQkCpxAM6RY68jV65TOU5L356EcuDCYbWjlWa08yF0S8kdy4ty6bY7DRkjTGSEEHGBZqwSMiuILC__6Gu06HvXEEHLvJAFnaH4DnpYDyqae7yBA0TVutDivUoJYsjga-87l6ZTisqFSbiAvQrtoFrAu86A77HtIlbmoIIGg_vOpqOKgA-DD2Ng47xLJ2wggU6uCzfoyirfw-J3z9Hn89PH6iXbvK1fV4-bTFPOi4xxmo9DjLYVVxUVUnAulWCMy6KpGrCmASqIaLi0ksqylJaanFFhdSNVyedoec7Vsev7CLbeR7dT8VRTUk_U6h9q9UStnqiNjoez46j8WN9AG4fTKOptN8Qw_vqvk-YF_wavrncO</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>SecuGuard: Leveraging pattern-exploiting training in language models for advanced software vulnerability detection</title><source>Walter De Gruyter: Open Access Journals</source><creator>Basharat, Mahmoud ; Omar, Marwan</creator><creatorcontrib>Basharat, Mahmoud ; Omar, Marwan</creatorcontrib><description>Identifying vulnerabilities within source code remains paramount in assuring software quality and security. This study introduces a refined semi-supervised learning methodology that capitalizes on pattern-exploiting training coupled with cloze-style interrogation techniques. The research strategy employed involves the training of a linguistic model on the Software Assurance Reference Dataset (SARD) and Devign datasets, which are replete with vulnerable code fragments. The training procedure entails obscuring specific segments of the code and subsequently prompting the model to ascertain the obfuscated tokens. 
Empirical analyses underscore the efficacy of our method in pinpointing vulnerabilities in source code, benefiting substantially from patterns discerned within the code fragments. This investigation underscores the potential of integrating pattern-exploiting training and cloze-based queries to enhance the precision of vulnerability detection within source code.</description><identifier>ISSN: 2956-7068</identifier><identifier>EISSN: 2956-7068</identifier><identifier>DOI: 10.2478/ijmce-2025-0005</identifier><language>eng</language><publisher>Sciendo</publisher><subject>68T07 ; cloze-style questions ; Language models ; pattern-exploiting training ; RoBERTa ; software vulnerabilities ; vulnerability detection</subject><ispartof>International Journal of Mathematics and Computer in Engineering, 2025-06, Vol.3 (1), p.47-56</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c1337-23142310dcf93a91565336a522367b9befdbe1505b36f616886f1d4215fcb6a83</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://sciendo.com/pdf/10.2478/ijmce-2025-0005$$EPDF$$P50$$Gwalterdegruyter$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://sciendo.com/article/10.2478/ijmce-2025-0005$$EHTML$$P50$$Gwalterdegruyter$$Hfree_for_read</linktohtml><link.rule.ids>314,778,782,27907,27908,75915,75916</link.rule.ids></links><search><creatorcontrib>Basharat, Mahmoud</creatorcontrib><creatorcontrib>Omar, Marwan</creatorcontrib><title>SecuGuard: Leveraging pattern-exploiting training in language models for advanced software vulnerability detection</title><title>International Journal of Mathematics and Computer in Engineering</title><description>Identifying vulnerabilities within source code remains paramount in assuring software quality and security. 
This study introduces a refined semi-supervised learning methodology that capitalizes on pattern-exploiting training coupled with cloze-style interrogation techniques. The research strategy employed involves the training of a linguistic model on the Software Assurance Reference Dataset (SARD) and Devign datasets, which are replete with vulnerable code fragments. The training procedure entails obscuring specific segments of the code and subsequently prompting the model to ascertain the obfuscated tokens. Empirical analyses underscore the efficacy of our method in pinpointing vulnerabilities in source code, benefiting substantially from patterns discerned within the code fragments. This investigation underscores the potential of integrating pattern-exploiting training and cloze-based queries to enhance the precision of vulnerability detection within source code.</description><subject>68T07</subject><subject>cloze-style questions</subject><subject>Language models</subject><subject>pattern-exploiting training</subject><subject>RoBERTa</subject><subject>software vulnerabilities</subject><subject>vulnerability detection</subject><issn>2956-7068</issn><issn>2956-7068</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2025</creationdate><recordtype>article</recordtype><recordid>eNp1kM1OwzAQhC0EElXpmatfINQ_sZPACVVQkCpxAM6RY68jV65TOU5L356EcuDCYbWjlWa08yF0S8kdy4ty6bY7DRkjTGSEEHGBZqwSMiuILC__6Gu06HvXEEHLvJAFnaH4DnpYDyqae7yBA0TVutDivUoJYsjga-87l6ZTisqFSbiAvQrtoFrAu86A77HtIlbmoIIGg_vOpqOKgA-DD2Ng47xLJ2wggU6uCzfoyirfw-J3z9Hn89PH6iXbvK1fV4-bTFPOi4xxmo9DjLYVVxUVUnAulWCMy6KpGrCmASqIaLi0ksqylJaanFFhdSNVyedoec7Vsev7CLbeR7dT8VRTUk_U6h9q9UStnqiNjoez46j8WN9AG4fTKOptN8Qw_vqvk-YF_wavrncO</recordid><startdate>20250601</startdate><enddate>20250601</enddate><creator>Basharat, Mahmoud</creator><creator>Omar, Marwan</creator><general>Sciendo</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20250601</creationdate><title>SecuGuard: Leveraging 
pattern-exploiting training in language models for advanced software vulnerability detection</title><author>Basharat, Mahmoud ; Omar, Marwan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1337-23142310dcf93a91565336a522367b9befdbe1505b36f616886f1d4215fcb6a83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2025</creationdate><topic>68T07</topic><topic>cloze-style questions</topic><topic>Language models</topic><topic>pattern-exploiting training</topic><topic>RoBERTa</topic><topic>software vulnerabilities</topic><topic>vulnerability detection</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Basharat, Mahmoud</creatorcontrib><creatorcontrib>Omar, Marwan</creatorcontrib><collection>CrossRef</collection><jtitle>International Journal of Mathematics and Computer in Engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Basharat, Mahmoud</au><au>Omar, Marwan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SecuGuard: Leveraging pattern-exploiting training in language models for advanced software vulnerability detection</atitle><jtitle>International Journal of Mathematics and Computer in Engineering</jtitle><date>2025-06-01</date><risdate>2025</risdate><volume>3</volume><issue>1</issue><spage>47</spage><epage>56</epage><pages>47-56</pages><issn>2956-7068</issn><eissn>2956-7068</eissn><abstract>Identifying vulnerabilities within source code remains paramount in assuring software quality and security. This study introduces a refined semi-supervised learning methodology that capitalizes on pattern-exploiting training coupled with cloze-style interrogation techniques. 
The research strategy employed involves the training of a linguistic model on the Software Assurance Reference Dataset (SARD) and Devign datasets, which are replete with vulnerable code fragments. The training procedure entails obscuring specific segments of the code and subsequently prompting the model to ascertain the obfuscated tokens. Empirical analyses underscore the efficacy of our method in pinpointing vulnerabilities in source code, benefiting substantially from patterns discerned within the code fragments. This investigation underscores the potential of integrating pattern-exploiting training and cloze-based queries to enhance the precision of vulnerability detection within source code.</abstract><pub>Sciendo</pub><doi>10.2478/ijmce-2025-0005</doi><tpages>10</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2956-7068
ispartof	International Journal of Mathematics and Computer in Engineering, 2025-06, Vol.3 (1), p.47-56
issn	2956-7068 2956-7068
language	eng
recordid	cdi_crossref_primary_10_2478_ijmce_2025_0005
source	Walter De Gruyter: Open Access Journals
subjects	68T07; cloze-style questions; Language models; pattern-exploiting training; RoBERTa; software vulnerabilities; vulnerability detection
title	SecuGuard: Leveraging pattern-exploiting training in language models for advanced software vulnerability detection
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T13%3A31%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-walterdegruyter_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SecuGuard:%20Leveraging%20pattern-exploiting%20training%20in%20language%20models%20for%20advanced%20software%20vulnerability%20detection&rft.jtitle=International%20Journal%20of%20Mathematics%20and%20Computer%20in%20Engineering&rft.au=Basharat,%20Mahmoud&rft.date=2025-06-01&rft.volume=3&rft.issue=1&rft.spage=47&rft.epage=56&rft.pages=47-56&rft.issn=2956-7068&rft.eissn=2956-7068&rft_id=info:doi/10.2478/ijmce-2025-0005&rft_dat=%3Cwalterdegruyter_cross%3E10_2478_ijmce_2025_00053147%3C/walterdegruyter_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true