Type-migrating C-to-Rust translation using a large language model
Rust, a modern system programming language, introduces new types that prevent memory bugs and data races. This makes translating legacy system programs from C to Rust a promising approach to enhance their reliability. Since manual code translation is time-consuming, it is desirable to automate the t...
Published in: | Empirical software engineering : an international journal 2025-02, Vol.30 (1), p.3, Article 3 |
---|---|
Main authors: | Hong, Jaemin ; Ryu, Sukyoung |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 1 |
container_start_page | 3 |
container_title | Empirical software engineering : an international journal |
container_volume | 30 |
creator | Hong, Jaemin ; Ryu, Sukyoung |
description | Rust, a modern system programming language, introduces new types that prevent memory bugs and data races. This makes translating legacy system programs from C to Rust a promising approach to enhance their reliability. Since manual code translation is time-consuming, it is desirable to automate the translation. To yield satisfactory results, the translator should have the ability to perform type migration, i.e., removing C types and introducing Rust types in the code. In this work, we aim to automatically port an entire C program to Rust by translating each C function to a Rust function with a signature containing proper Rust types through type migration. This goal is challenging because (1) type migration cannot be achieved through syntactic mappings between type names, and (2) after type migration, function bodies should be correctly restructured based on the precise understanding of the functions’ behavior. To address these difficulties, we leverage large language models (LLMs), which possess knowledge of program semantics and programming idioms. However, naïvely instructing LLMs to translate each function produces unsatisfactory Rust code, containing unmigrated or improperly migrated types and a huge number of type errors. To resolve these issues, we propose three techniques: (1) generating candidate signatures, (2) providing translated callees’ signatures to LLMs, and (3) iteratively fixing type errors using compiler feedback. Our evaluation shows that the proposed approach yields a 63.5% increase in migrated types and a 71.5% decrease in type errors compared to the baseline (the naïve LLM-based translation) with modest performance overhead. |
doi_str_mv | 10.1007/s10664-024-10573-2 |
format | Article |
publisher | New York: Springer US |
addtitle | Empir Software Eng |
rights | The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. Copyright Springer Nature B.V. Jan 2025 |
orcidid | https://orcid.org/0000-0003-4067-7369 ; https://orcid.org/0000-0002-0019-9772 |
toplevel | peer_reviewed ; online_resources |
oa | free_for_read |
linktopdf | https://link.springer.com/content/pdf/10.1007/s10664-024-10573-2 |
linktohtml | https://link.springer.com/10.1007/s10664-024-10573-2 |
fulltext | fulltext |
identifier | ISSN: 1382-3256 |
ispartof | Empirical software engineering : an international journal, 2025-02, Vol.30 (1), p.3, Article 3 |
issn | 1382-3256 ; 1573-7616 |
language | eng |
recordid | cdi_proquest_journals_3152309671 |
source | SpringerLink (Online service) |
subjects | Compilers ; Computer Science ; Errors ; Interpreters ; Large language models ; Programming Languages ; Rust prevention ; Semantics ; Signatures ; Software Engineering/Programming and Operating Systems ; Translating |
title | Type-migrating C-to-Rust translation using a large language model |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T08%3A45%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Type-migrating%20C-to-Rust%20translation%20using%20a%20large%20language%20model&rft.jtitle=Empirical%20software%20engineering%20:%20an%20international%20journal&rft.au=Hong,%20Jaemin&rft.date=2025-02-01&rft.volume=30&rft.issue=1&rft.spage=3&rft.pages=3-&rft.artnum=3&rft.issn=1382-3256&rft.eissn=1573-7616&rft_id=info:doi/10.1007/s10664-024-10573-2&rft_dat=%3Cproquest_cross%3E3117783207%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3117783207&rft_id=info:pmid/&rfr_iscdi=true |
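
The description above says that type migration cannot be achieved by mapping type names syntactically, and that function bodies must be restructured once the signature changes. The following minimal sketch illustrates that distinction. It is not taken from the article: the C function and the names `find_unmigrated` and `find` are hypothetical, chosen only to show the idea.

```rust
// Illustrative sketch only; not from the article or its benchmarks.
//
// Hypothetical C original:
//   int find(const int *arr, int len, int key) {
//       for (int i = 0; i < len; i++)
//           if (arr[i] == key) return i;
//       return -1;
//   }

/// Syntactic mapping of type names: `const int *` becomes `*const i32` and
/// `int` becomes `i32`. The C types survive, so the function stays `unsafe`
/// and still signals "not found" with the sentinel -1.
///
/// # Safety
/// `arr` must point to at least `len` readable `i32` values.
pub unsafe fn find_unmigrated(arr: *const i32, len: i32, key: i32) -> i32 {
    let mut i = 0;
    while i < len {
        // Raw pointer arithmetic and dereference, exactly as in the C code.
        if unsafe { *arr.add(i as usize) } == key {
            return i;
        }
        i += 1;
    }
    -1
}

/// Type-migrated version: the pointer/length pair becomes a slice and the
/// sentinel return value becomes `Option<usize>`. The body has to be
/// restructured to fit the new signature; a rename alone would not compile.
pub fn find(arr: &[i32], key: i32) -> Option<usize> {
    arr.iter().position(|&x| x == key)
}
```

A caller of the migrated version writes `find(&data, key)` and matches on the returned `Option`, whereas a caller of the unmigrated version must pass a pointer and a length inside an `unsafe` block and compare the result against -1.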