Analyzing Techniques for Duplicate Question Detection on Q&A Websites for Game Developers

Game development is currently the largest industry in the entertainment segment and has a high demand for skilled game developers that can produce high-quality games. To satiate this demand, game developers need resources that can provide them with the knowledge they need to learn and improve their...

Detailed description

Bibliographic details
Published in: Empirical software engineering : an international journal, 2023-01, Vol.28 (1), p.17, Article 17
Main authors: Kamienski, Arthur; Hindle, Abram; Bezemer, Cor-Paul
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 1
container_start_page 17
container_title Empirical software engineering : an international journal
container_volume 28
creator Kamienski, Arthur
Hindle, Abram
Bezemer, Cor-Paul
description Game development is currently the largest industry in the entertainment segment and has a high demand for skilled game developers that can produce high-quality games. To satiate this demand, game developers need resources that can provide them with the knowledge they need to learn and improve their skills. Question and Answer (Q&A) websites are one of such resources that provide a valuable source of knowledge about game development practices. However, the presence of duplicate questions on Q&A websites hinders their ability to effectively provide information for their users. While several researchers created and analyzed techniques for duplicate question detection on websites such as Stack Overflow, so far no studies have explored how well those techniques work on Q&A websites for game development. With that in mind, in this paper we analyze how we can use pre-trained and unsupervised techniques to detect duplicate questions on Q&A websites focused on game development using data extracted from the Game Development Stack Exchange and Stack Overflow. We also explore how we can leverage a small set of labelled data to improve the performance of those techniques. The pre-trained technique based on MPNet achieved the highest results in identifying duplicate questions about game development, and we could achieve a better performance when combining multiple unsupervised techniques into a single supervised model. Furthermore, the supervised models could identify duplicate questions on websites different from those they were trained on with little to no decrease in performance. Our results lay the groundwork for building better duplicate question detection systems in Q&A websites for game developers and ultimately providing game developers with a more effective Q&A community.
doi_str_mv 10.1007/s10664-022-10256-w
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2748040365</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2748040365</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-adcb7c5a072ceebd5de257bb7da509ecf5abc5835f560963df4d5d2294787a03</originalsourceid><addsrcrecordid>eNp9kN9LwzAQx4MoOKf_gE8FwbfoJWma9nFsOoWBDAbiU0jT6-zo2pp0jvnXm_0A34SDO-4-3-PuS8gtgwcGoB49gySJKXBOGXCZ0O0ZGTCpBFUJS85DLVJORZhckivvVwCQqVgOyMeoMfXup2qW0QLtZ1N9bdBHZeuiyaarK2t6jOah1VdtE02wR3uoQszvR9E75r7qT4KpWWNAvrFuO3T-mlyUpvZ4c8pDsnh-Woxf6Oxt-joezagVLOupKWyurDSguEXMC1kglyrPVWEkZGhLaXIrUyFLmUCWiKKMA8N5FqtUGRBDcndc27l2f3uvV-3Ghae85ipOIQaRyEDxI2Vd673DUneuWhu30wz03kF9dFAHB_XBQb0NInEU-QA3S3R_q_9R_QIYkXUn</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2748040365</pqid></control><display><type>article</type><title>Analyzing Techniques for Duplicate Question Detection on Q&amp;A Websites for Game Developers</title><source>SpringerLink Journals - AutoHoldings</source><creator>Kamienski, Arthur ; Hindle, Abram ; Bezemer, Cor-Paul</creator><creatorcontrib>Kamienski, Arthur ; Hindle, Abram ; Bezemer, Cor-Paul</creatorcontrib><description><![CDATA[Game development is currently the largest industry in the entertainment segment and has a high demand for skilled game developers that can produce high-quality games. To satiate this demand, game developers need resources that can provide them with the knowledge they need to learn and improve their skills. Question and Answer (Q&A) websites are one of such resources that provide a valuable source of knowledge about game development practices. However, the presence of duplicate questions on Q&A websites hinders their ability to effectively provide information for their users. 
While several researchers created and analyzed techniques for duplicate question detection on websites such as Stack Overflow, so far no studies have explored how well those techniques work on Q&A websites for game development. With that in mind, in this paper we analyze how we can use pre-trained and unsupervised techniques to detect duplicate questions on Q&A websites focused on game development using data extracted from the Game Development Stack Exchange and Stack Overflow. We also explore how we can leverage a small set of labelled data to improve the performance of those techniques. The pre-trained technique based on MPNet achieved the highest results in identifying duplicate questions about game development, and we could achieve a better performance when combining multiple unsupervised techniques into a single supervised model. Furthermore, the supervised models could identify duplicate questions on websites different from those they were trained on with little to no decrease in performance. Our results lay the groundwork for building better duplicate question detection systems in Q&A websites for game developers and ultimately providing game developers with a more effective Q&A community.]]></description><identifier>ISSN: 1382-3256</identifier><identifier>EISSN: 1573-7616</identifier><identifier>DOI: 10.1007/s10664-022-10256-w</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Compilers ; Computer &amp; video games ; Computer Science ; Games ; Interpreters ; Performance enhancement ; Programming Languages ; Questions ; Reproduction (copying) ; Software engineering ; Software Engineering/Programming and Operating Systems ; Websites</subject><ispartof>Empirical software engineering : an international journal, 2023-01, Vol.28 (1), p.17, Article 17</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-adcb7c5a072ceebd5de257bb7da509ecf5abc5835f560963df4d5d2294787a03</citedby><cites>FETCH-LOGICAL-c319t-adcb7c5a072ceebd5de257bb7da509ecf5abc5835f560963df4d5d2294787a03</cites><orcidid>0000-0003-3851-8262</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10664-022-10256-w$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10664-022-10256-w$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Kamienski, Arthur</creatorcontrib><creatorcontrib>Hindle, Abram</creatorcontrib><creatorcontrib>Bezemer, Cor-Paul</creatorcontrib><title>Analyzing Techniques for Duplicate Question Detection on Q&amp;A Websites for Game Developers</title><title>Empirical software engineering : an international journal</title><addtitle>Empir Software Eng</addtitle><description><![CDATA[Game development is currently the largest industry in the entertainment segment and has a high demand for skilled game developers that can produce high-quality games. To satiate this demand, game developers need resources that can provide them with the knowledge they need to learn and improve their skills. Question and Answer (Q&A) websites are one of such resources that provide a valuable source of knowledge about game development practices. 
However, the presence of duplicate questions on Q&A websites hinders their ability to effectively provide information for their users. While several researchers created and analyzed techniques for duplicate question detection on websites such as Stack Overflow, so far no studies have explored how well those techniques work on Q&A websites for game development. With that in mind, in this paper we analyze how we can use pre-trained and unsupervised techniques to detect duplicate questions on Q&A websites focused on game development using data extracted from the Game Development Stack Exchange and Stack Overflow. We also explore how we can leverage a small set of labelled data to improve the performance of those techniques. The pre-trained technique based on MPNet achieved the highest results in identifying duplicate questions about game development, and we could achieve a better performance when combining multiple unsupervised techniques into a single supervised model. Furthermore, the supervised models could identify duplicate questions on websites different from those they were trained on with little to no decrease in performance. 
Our results lay the groundwork for building better duplicate question detection systems in Q&A websites for game developers and ultimately providing game developers with a more effective Q&A community.]]></description><subject>Compilers</subject><subject>Computer &amp; video games</subject><subject>Computer Science</subject><subject>Games</subject><subject>Interpreters</subject><subject>Performance enhancement</subject><subject>Programming Languages</subject><subject>Questions</subject><subject>Reproduction (copying)</subject><subject>Software engineering</subject><subject>Software Engineering/Programming and Operating Systems</subject><subject>Websites</subject><issn>1382-3256</issn><issn>1573-7616</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>AFKRA</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNp9kN9LwzAQx4MoOKf_gE8FwbfoJWma9nFsOoWBDAbiU0jT6-zo2pp0jvnXm_0A34SDO-4-3-PuS8gtgwcGoB49gySJKXBOGXCZ0O0ZGTCpBFUJS85DLVJORZhckivvVwCQqVgOyMeoMfXup2qW0QLtZ1N9bdBHZeuiyaarK2t6jOah1VdtE02wR3uoQszvR9E75r7qT4KpWWNAvrFuO3T-mlyUpvZ4c8pDsnh-Woxf6Oxt-joezagVLOupKWyurDSguEXMC1kglyrPVWEkZGhLaXIrUyFLmUCWiKKMA8N5FqtUGRBDcndc27l2f3uvV-3Ghae85ipOIQaRyEDxI2Vd673DUneuWhu30wz03kF9dFAHB_XBQb0NInEU-QA3S3R_q_9R_QIYkXUn</recordid><startdate>20230101</startdate><enddate>20230101</enddate><creator>Kamienski, Arthur</creator><creator>Hindle, Abram</creator><creator>Bezemer, Cor-Paul</creator><general>Springer US</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>L6V</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>S0W</scope><orcidid>https://orcid.org/0000-0003-3851-8262</orcidid></search><sort><creationdate>20230101</creationdate><title>Analyzing Techniques for Duplicate Question Detection on Q&amp;A Websites for Game Developers</title><author>Kamienski, Arthur ; Hindle, Abram ; Bezemer, Cor-Paul</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-adcb7c5a072ceebd5de257bb7da509ecf5abc5835f560963df4d5d2294787a03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Compilers</topic><topic>Computer &amp; video games</topic><topic>Computer Science</topic><topic>Games</topic><topic>Interpreters</topic><topic>Performance enhancement</topic><topic>Programming Languages</topic><topic>Questions</topic><topic>Reproduction (copying)</topic><topic>Software engineering</topic><topic>Software Engineering/Programming and Operating Systems</topic><topic>Websites</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kamienski, Arthur</creatorcontrib><creatorcontrib>Hindle, Abram</creatorcontrib><creatorcontrib>Bezemer, Cor-Paul</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology 
Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Engineering Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>DELNET Engineering &amp; Technology Collection</collection><jtitle>Empirical software engineering : an international journal</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kamienski, Arthur</au><au>Hindle, Abram</au><au>Bezemer, Cor-Paul</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Analyzing Techniques for Duplicate Question Detection on Q&amp;A Websites for Game Developers</atitle><jtitle>Empirical software engineering : an international journal</jtitle><stitle>Empir Software 
Eng</stitle><date>2023-01-01</date><risdate>2023</risdate><volume>28</volume><issue>1</issue><spage>17</spage><pages>17-</pages><artnum>17</artnum><issn>1382-3256</issn><eissn>1573-7616</eissn><abstract><![CDATA[Game development is currently the largest industry in the entertainment segment and has a high demand for skilled game developers that can produce high-quality games. To satiate this demand, game developers need resources that can provide them with the knowledge they need to learn and improve their skills. Question and Answer (Q&A) websites are one of such resources that provide a valuable source of knowledge about game development practices. However, the presence of duplicate questions on Q&A websites hinders their ability to effectively provide information for their users. While several researchers created and analyzed techniques for duplicate question detection on websites such as Stack Overflow, so far no studies have explored how well those techniques work on Q&A websites for game development. With that in mind, in this paper we analyze how we can use pre-trained and unsupervised techniques to detect duplicate questions on Q&A websites focused on game development using data extracted from the Game Development Stack Exchange and Stack Overflow. We also explore how we can leverage a small set of labelled data to improve the performance of those techniques. The pre-trained technique based on MPNet achieved the highest results in identifying duplicate questions about game development, and we could achieve a better performance when combining multiple unsupervised techniques into a single supervised model. Furthermore, the supervised models could identify duplicate questions on websites different from those they were trained on with little to no decrease in performance. 
Our results lay the groundwork for building better duplicate question detection systems in Q&A websites for game developers and ultimately providing game developers with a more effective Q&A community.]]></abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10664-022-10256-w</doi><orcidid>https://orcid.org/0000-0003-3851-8262</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1382-3256
ispartof Empirical software engineering : an international journal, 2023-01, Vol.28 (1), p.17, Article 17
issn 1382-3256
1573-7616
language eng
recordid cdi_proquest_journals_2748040365
source SpringerLink Journals - AutoHoldings
subjects Compilers
Computer & video games
Computer Science
Games
Interpreters
Performance enhancement
Programming Languages
Questions
Reproduction (copying)
Software engineering
Software Engineering/Programming and Operating Systems
Websites
title Analyzing Techniques for Duplicate Question Detection on Q&A Websites for Game Developers
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T14%3A39%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Analyzing%20Techniques%20for%20Duplicate%20Question%20Detection%20on%20Q&A%20Websites%20for%20Game%20Developers&rft.jtitle=Empirical%20software%20engineering%20:%20an%20international%20journal&rft.au=Kamienski,%20Arthur&rft.date=2023-01-01&rft.volume=28&rft.issue=1&rft.spage=17&rft.pages=17-&rft.artnum=17&rft.issn=1382-3256&rft.eissn=1573-7616&rft_id=info:doi/10.1007/s10664-022-10256-w&rft_dat=%3Cproquest_cross%3E2748040365%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2748040365&rft_id=info:pmid/&rfr_iscdi=true