Deep code comment generation with hybrid lexical and syntactical information

Bibliographic details
Published in:	Empirical software engineering : an international journal, 2020-05, Vol.25 (3), p.2179-2217
Main authors:	Hu, Xing; Li, Ge; Xia, Xin; Lo, David; Jin, Zhi
Format:	Article
Language:	English
Subjects:	Artificial neural networks; Compilers; Computer Science; Information retrieval; Interpreters; Machine translation; Natural language processing; Programming Languages; Software; Software Engineering/Programming and Operating Systems; Source code
Online access:	Full text

container_end_page	2217
container_issue	3
container_start_page	2179
container_title	Empirical software engineering : an international journal
container_volume	25
creator	Hu, Xing; Li, Ge; Xia, Xin; Lo, David; Jin, Zhi
description	During software maintenance, developers spend a lot of time understanding source code. Existing studies show that code comments help developers comprehend programs and reduce the additional time spent reading and navigating source code. Unfortunately, these comments are often mismatched, missing, or outdated in software projects, so developers have to infer functionality from the source code. This paper proposes a new approach named Hybrid-DeepCom to automatically generate code comments for the functional units of the Java language, namely Java methods. The generated comments aim to help developers understand the functionality of Java methods. Hybrid-DeepCom applies Natural Language Processing (NLP) techniques to learn from a large code corpus and generates comments from the learned features. It formulates comment generation as a machine translation problem. Hybrid-DeepCom exploits a deep neural network that combines the lexical and structural information of Java methods to generate better comments. We conduct experiments on a large-scale Java corpus built from 9,714 open source projects on GitHub and evaluate the results with both machine translation metrics and information retrieval metrics. Experimental results demonstrate that Hybrid-DeepCom outperforms the state-of-the-art by a substantial margin. In addition, we evaluate the influence of out-of-vocabulary tokens on comment generation. The results show that reducing the number of out-of-vocabulary tokens effectively improves accuracy.
doi_str_mv	10.1007/s10664-019-09730-9
format	Article
publisher	New York: Springer US
date	2020-05-01
stitle	Empir Software Eng
tpages	39
rights	Springer Science+Business Media, LLC, part of Springer Nature 2019
orcidid	https://orcid.org/0000-0003-0093-3292
toplevel	peer_reviewed
oa	free_for_read
fulltext	fulltext
identifier	ISSN: 1382-3256
ispartof	Empirical software engineering : an international journal, 2020-05, Vol.25 (3), p.2179-2217
issn	1382-3256 1573-7616
language	eng
recordid	cdi_proquest_journals_2400150108
source	SpringerLink Journals - AutoHoldings
subjects	Artificial neural networks; Compilers; Computer Science; Information retrieval; Interpreters; Machine translation; Natural language processing; Programming Languages; Software; Software Engineering/Programming and Operating Systems; Source code
title	Deep code comment generation with hybrid lexical and syntactical information
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T22%3A34%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20code%20comment%20generation%20with%20hybrid%20lexical%20and%20syntactical%20information&rft.jtitle=Empirical%20software%20engineering%20:%20an%20international%20journal&rft.au=Hu,%20Xing&rft.date=2020-05-01&rft.volume=25&rft.issue=3&rft.spage=2179&rft.epage=2217&rft.pages=2179-2217&rft.issn=1382-3256&rft.eissn=1573-7616&rft_id=info:doi/10.1007/s10664-019-09730-9&rft_dat=%3Cproquest_cross%3E2400150108%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2400150108&rft_id=info:pmid/&rfr_iscdi=true