Type-migrating C-to-Rust translation using a large language model

Rust, a modern system programming language, introduces new types that prevent memory bugs and data races. This makes translating legacy system programs from C to Rust a promising approach to enhance their reliability. Since manual code translation is time-consuming, it is desirable to automate the t...

Detailed description

Bibliographic details
Published in: Empirical software engineering : an international journal 2025-02, Vol.30 (1), p.3, Article 3
Main authors: Hong, Jaemin; Ryu, Sukyoung
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 1
container_start_page 3
container_title Empirical software engineering : an international journal
container_volume 30
creator Hong, Jaemin
Ryu, Sukyoung
description Rust, a modern system programming language, introduces new types that prevent memory bugs and data races. This makes translating legacy system programs from C to Rust a promising approach to enhance their reliability. Since manual code translation is time-consuming, it is desirable to automate the translation. To yield satisfactory results, the translator should have the ability to perform type migration, i.e., removing C types and introducing Rust types in the code. In this work, we aim to automatically port an entire C program to Rust by translating each C function to a Rust function with a signature containing proper Rust types through type migration. This goal is challenging because (1) type migration cannot be achieved through syntactic mappings between type names, and (2) after type migration, function bodies should be correctly restructured based on the precise understanding of the functions’ behavior. To address these difficulties, we leverage large language models (LLMs), which possess knowledge of program semantics and programming idioms. However, naïvely instructing LLMs to translate each function produces unsatisfactory Rust code, containing unmigrated or improperly migrated types and a huge number of type errors. To resolve these issues, we propose three techniques: (1) generating candidate signatures, (2) providing translated callees’ signatures to LLMs, and (3) iteratively fixing type errors using compiler feedback. Our evaluation shows that the proposed approach yields a 63.5% increase in migrated types and a 71.5% decrease in type errors compared to the baseline (the naïve LLM-based translation) with modest performance overhead.
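To make the notion of type migration in the description concrete, the following is a minimal sketch and is not taken from the article. The C function `divide` and both Rust translations are hypothetical illustrations. The first translation maps C type names to Rust syntactically and keeps a raw pointer and a status code; the second migrates the signature to `Option<i32>`, which forces the body to be restructured.

```rust
// Hypothetical C original, shown for reference only:
//   int divide(int a, int b, int *out) {
//       if (b == 0) return -1;
//       *out = a / b;
//       return 0;
//   }

/// Syntactic translation (assumed for illustration): C types are mapped
/// name by name, so the raw output pointer and integer status code remain.
unsafe fn divide_raw(a: i32, b: i32, out: *mut i32) -> i32 {
    if b == 0 {
        return -1;
    }
    // The caller must guarantee that `out` is valid and writable.
    unsafe {
        *out = a / b;
    }
    0
}

/// Type-migrated translation (assumed for illustration): the output
/// parameter and the status code are merged into a single Option<i32>.
fn divide(a: i32, b: i32) -> Option<i32> {
    if b == 0 {
        None
    } else {
        Some(a / b)
    }
}

fn main() {
    let mut out = 0;
    let status = unsafe { divide_raw(7, 2, &mut out) };
    println!("raw: status={status}, out={out}");
    println!("migrated: {:?}", divide(7, 2));
    println!("migrated: {:?}", divide(7, 0));
}
```

In the migrated version the caller no longer supplies a pointer or checks a status code, which is why a change of signature alone is not enough and the function body has to change with it.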
doi_str_mv 10.1007/s10664-024-10573-2
format Article
addtitle Empir Software Eng
collection Springer Nature OA Free Journals
CrossRef
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
date 2025-02-01
eissn 1573-7616
linktohtml https://link.springer.com/10.1007/s10664-024-10573-2
linktopdf https://link.springer.com/content/pdf/10.1007/s10664-024-10573-2
lds50 peer_reviewed
oa free_for_read
orcidid https://orcid.org/0000-0003-4067-7369
https://orcid.org/0000-0002-0019-9772
publisher New York: Springer US
rights The Author(s) 2024
The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Copyright Springer Nature B.V. Jan 2025
sourcetype Aggregation Database
fulltext fulltext
identifier ISSN: 1382-3256
ispartof Empirical software engineering : an international journal, 2025-02, Vol.30 (1), p.3, Article 3
issn 1382-3256
1573-7616
language eng
recordid cdi_proquest_journals_3152309671
source SpringerLink (Online service)
subjects Compilers
Computer Science
Errors
Interpreters
Large language models
Programming Languages
Rust prevention
Semantics
Signatures
Software Engineering/Programming and Operating Systems
Translating
title Type-migrating C-to-Rust translation using a large language model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T08%3A45%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Type-migrating%20C-to-Rust%20translation%20using%20a%20large%20language%20model&rft.jtitle=Empirical%20software%20engineering%20:%20an%20international%20journal&rft.au=Hong,%20Jaemin&rft.date=2025-02-01&rft.volume=30&rft.issue=1&rft.spage=3&rft.pages=3-&rft.artnum=3&rft.issn=1382-3256&rft.eissn=1573-7616&rft_id=info:doi/10.1007/s10664-024-10573-2&rft_dat=%3Cproquest_cross%3E3117783207%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3117783207&rft_id=info:pmid/&rfr_iscdi=true