Big Data Software Engineering: Analysis of Knowledge Domains and Skill Sets Using LDA-Based Topic Modeling

Software engineering is a data-driven discipline and an integral part of data science. The introduction of big data systems has led to a great transformation in the architecture, methodologies, knowledge domains, and skills related to software engineering. Accordingly, education programs are now req...

Detailed description

Bibliographic details
Published in: IEEE access 2019, Vol.7, p.82541-82552
Main authors: Gurcan, Fatih; Cagiltay, Nergiz Ercil
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 82552
container_issue
container_start_page 82541
container_title IEEE access
container_volume 7
creator Gurcan, Fatih
Cagiltay, Nergiz Ercil
description Software engineering is a data-driven discipline and an integral part of data science. The introduction of big data systems has led to a great transformation in the architecture, methodologies, knowledge domains, and skills related to software engineering. Accordingly, education programs are now required to adapt themselves to up-to-date developments by first identifying the competencies concerning big data software engineering to meet the industrial needs and follow the latest trends. This paper aims to reveal the knowledge domains and skill sets required for big data software engineering and develop a taxonomy by mapping these competencies. A semi-automatic methodology is proposed for the semantic analysis of the textual contents of online job advertisements related to big data software engineering. This methodology uses the latent Dirichlet allocation (LDA), a probabilistic topic-modeling technique to discover the hidden semantic structures from a given textual corpus. The output of this paper is a systematic competency map comprising the essential knowledge domains, skills, and tools for big data software engineering. The findings of this paper are expected to help evaluate and improve IT professionals' vocational knowledge and skills, identify professional roles and competencies in personnel recruitment processes of companies, and meet the skill requirements of the industry through software engineering education programs. Additionally, the proposed model can be extended to blogs, social networks, forums, and other online communities to allow automatic identification of emerging trends and generate contextual tags.
doi_str_mv 10.1109/ACCESS.2019.2924075
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2455636911</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8742548</ieee_id><doaj_id>oai_doaj_org_article_4f3d8277ac244582b5d19302db713d92</doaj_id><sourcerecordid>2455636911</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-9890053cbce56c0665c5a1ce4ae3c9b8d7a86f3205e082e6ee4a0908ab575fc13</originalsourceid><addsrcrecordid>eNpNUU1P4zAQjVastAj4BVws7TnF37H3VtryIYr2EDhbjj2J3A1x1w6q-PekBCHmMqP35r2R5hXFJcELQrC-Wq5Wm7peUEz0gmrKcSV-FKeUSF0yweTJt_lXcZHzDk-lJkhUp8XuOnRobUeL6tiOB5sAbYYuDAApDN0ftBxs_5ZDRrFFD0M89OA7QOv4YsOQkR08qv-Fvkc1jBk950mDtutleW0zePQU98Ghx-ihn4jz4mdr-wwXn_2seL7ZPK3uyu3f2_vVcls6jtVYaqUxFsw1DoR0WErhhCUOuAXmdKN8ZZVsGcUCsKIgYWKwxso2ohKtI-ysuJ99fbQ7s0_hxaY3E20wH0BMnbFpDK4Hw1vmFa0q6yjnQtFGeKIZpr6pCPOaTl6_Z699iv9fIY9mF1_T9JNsKBdCMqnJ8SKbt1yKOSdov64SbI4ZmTkjc8zIfGY0qS5nVQCAL4WqOBVcsXcgBotk</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2455636911</pqid></control><display><type>article</type><title>Big Data Software Engineering: Analysis of Knowledge Domains and Skill Sets Using LDA-Based Topic Modeling</title><source>Directory of Open Access Journals</source><source>IEEE Xplore Open Access Journals</source><source>EZB Electronic Journals Library</source><creator>Gurcan, Fatih ; Cagiltay, Nergiz Ercil</creator><creatorcontrib>Gurcan, Fatih ; Cagiltay, Nergiz Ercil</creatorcontrib><description>Software engineering is a data-driven discipline and an integral part of data science. The introduction of big data systems has led to a great transformation in the architecture, methodologies, knowledge domains, and skills related to software engineering. 
Accordingly, education programs are now required to adapt themselves to up-to-date developments by first identifying the competencies concerning big data software engineering to meet the industrial needs and follow the latest trends. This paper aims to reveal the knowledge domains and skill sets required for big data software engineering and develop a taxonomy by mapping these competencies. A semi-automatic methodology is proposed for the semantic analysis of the textual contents of online job advertisements related to big data software engineering. This methodology uses the latent Dirichlet allocation (LDA), a probabilistic topic-modeling technique to discover the hidden semantic structures from a given textual corpus. The output of this paper is a systematic competency map comprising the essential knowledge domains, skills, and tools for big data software engineering. The findings of this paper are expected to help evaluate and improve IT professionals' vocational knowledge and skills, identify professional roles and competencies in personnel recruitment processes of companies, and meet the skill requirements of the industry through software engineering education programs. 
Additionally, the proposed model can be extended to blogs, social networks, forums, and other online communities to allow automatic identification of emerging trends and generate contextual tags.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2019.2924075</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Big Data ; Big data software engineering ; competency map ; Computer architecture ; Data systems ; Dirichlet problem ; Domains ; Engineering education ; Industries ; knowledge domains and skill sets ; Knowledge engineering ; latent Dirichlet allocation ; Modelling ; Semantics ; Skills ; Social networks ; Software ; Software engineering ; Taxonomy ; topic modeling ; Trends</subject><ispartof>IEEE access, 2019, Vol.7, p.82541-82552</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-9890053cbce56c0665c5a1ce4ae3c9b8d7a86f3205e082e6ee4a0908ab575fc13</citedby><cites>FETCH-LOGICAL-c408t-9890053cbce56c0665c5a1ce4ae3c9b8d7a86f3205e082e6ee4a0908ab575fc13</cites><orcidid>0000-0001-9915-6686</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8742548$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,860,2096,4010,27610,27900,27901,27902,54908</link.rule.ids></links><search><creatorcontrib>Gurcan, Fatih</creatorcontrib><creatorcontrib>Cagiltay, Nergiz Ercil</creatorcontrib><title>Big Data Software Engineering: Analysis of Knowledge Domains and Skill Sets Using LDA-Based Topic Modeling</title><title>IEEE 
access</title><addtitle>Access</addtitle><description>Software engineering is a data-driven discipline and an integral part of data science. The introduction of big data systems has led to a great transformation in the architecture, methodologies, knowledge domains, and skills related to software engineering. Accordingly, education programs are now required to adapt themselves to up-to-date developments by first identifying the competencies concerning big data software engineering to meet the industrial needs and follow the latest trends. This paper aims to reveal the knowledge domains and skill sets required for big data software engineering and develop a taxonomy by mapping these competencies. A semi-automatic methodology is proposed for the semantic analysis of the textual contents of online job advertisements related to big data software engineering. This methodology uses the latent Dirichlet allocation (LDA), a probabilistic topic-modeling technique to discover the hidden semantic structures from a given textual corpus. The output of this paper is a systematic competency map comprising the essential knowledge domains, skills, and tools for big data software engineering. The findings of this paper are expected to help evaluate and improve IT professionals' vocational knowledge and skills, identify professional roles and competencies in personnel recruitment processes of companies, and meet the skill requirements of the industry through software engineering education programs. 
Additionally, the proposed model can be extended to blogs, social networks, forums, and other online communities to allow automatic identification of emerging trends and generate contextual tags.</description><subject>Big Data</subject><subject>Big data software engineering</subject><subject>competency map</subject><subject>Computer architecture</subject><subject>Data systems</subject><subject>Dirichlet problem</subject><subject>Domains</subject><subject>Engineering education</subject><subject>Industries</subject><subject>knowledge domains and skill sets</subject><subject>Knowledge engineering</subject><subject>latent Dirichlet allocation</subject><subject>Modelling</subject><subject>Semantics</subject><subject>Skills</subject><subject>Social networks</subject><subject>Software</subject><subject>Software engineering</subject><subject>Taxonomy</subject><subject>topic modeling</subject><subject>Trends</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUU1P4zAQjVastAj4BVws7TnF37H3VtryIYr2EDhbjj2J3A1x1w6q-PekBCHmMqP35r2R5hXFJcELQrC-Wq5Wm7peUEz0gmrKcSV-FKeUSF0yweTJt_lXcZHzDk-lJkhUp8XuOnRobUeL6tiOB5sAbYYuDAApDN0ftBxs_5ZDRrFFD0M89OA7QOv4YsOQkR08qv-Fvkc1jBk950mDtutleW0zePQU98Ghx-ihn4jz4mdr-wwXn_2seL7ZPK3uyu3f2_vVcls6jtVYaqUxFsw1DoR0WErhhCUOuAXmdKN8ZZVsGcUCsKIgYWKwxso2ohKtI-ysuJ99fbQ7s0_hxaY3E20wH0BMnbFpDK4Hw1vmFa0q6yjnQtFGeKIZpr6pCPOaTl6_Z699iv9fIY9mF1_T9JNsKBdCMqnJ8SKbt1yKOSdov64SbI4ZmTkjc8zIfGY0qS5nVQCAL4WqOBVcsXcgBotk</recordid><startdate>2019</startdate><enddate>2019</enddate><creator>Gurcan, Fatih</creator><creator>Cagiltay, Nergiz Ercil</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-9915-6686</orcidid></search><sort><creationdate>2019</creationdate><title>Big Data Software Engineering: Analysis of Knowledge Domains and Skill Sets Using LDA-Based Topic Modeling</title><author>Gurcan, Fatih ; Cagiltay, Nergiz Ercil</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-9890053cbce56c0665c5a1ce4ae3c9b8d7a86f3205e082e6ee4a0908ab575fc13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Big Data</topic><topic>Big data software engineering</topic><topic>competency map</topic><topic>Computer architecture</topic><topic>Data systems</topic><topic>Dirichlet problem</topic><topic>Domains</topic><topic>Engineering education</topic><topic>Industries</topic><topic>knowledge domains and skill sets</topic><topic>Knowledge engineering</topic><topic>latent Dirichlet allocation</topic><topic>Modelling</topic><topic>Semantics</topic><topic>Skills</topic><topic>Social networks</topic><topic>Software</topic><topic>Software engineering</topic><topic>Taxonomy</topic><topic>topic modeling</topic><topic>Trends</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gurcan, Fatih</creatorcontrib><creatorcontrib>Cagiltay, Nergiz Ercil</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><collection>Computer and 
Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Gurcan, Fatih</au><au>Cagiltay, Nergiz Ercil</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Big Data Software Engineering: Analysis of Knowledge Domains and Skill Sets Using LDA-Based Topic Modeling</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2019</date><risdate>2019</risdate><volume>7</volume><spage>82541</spage><epage>82552</epage><pages>82541-82552</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Software engineering is a data-driven discipline and an integral part of data science. The introduction of big data systems has led to a great transformation in the architecture, methodologies, knowledge domains, and skills related to software engineering. Accordingly, education programs are now required to adapt themselves to up-to-date developments by first identifying the competencies concerning big data software engineering to meet the industrial needs and follow the latest trends. This paper aims to reveal the knowledge domains and skill sets required for big data software engineering and develop a taxonomy by mapping these competencies. 
A semi-automatic methodology is proposed for the semantic analysis of the textual contents of online job advertisements related to big data software engineering. This methodology uses the latent Dirichlet allocation (LDA), a probabilistic topic-modeling technique to discover the hidden semantic structures from a given textual corpus. The output of this paper is a systematic competency map comprising the essential knowledge domains, skills, and tools for big data software engineering. The findings of this paper are expected to help evaluate and improve IT professionals' vocational knowledge and skills, identify professional roles and competencies in personnel recruitment processes of companies, and meet the skill requirements of the industry through software engineering education programs. Additionally, the proposed model can be extended to blogs, social networks, forums, and other online communities to allow automatic identification of emerging trends and generate contextual tags.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2019.2924075</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-9915-6686</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2019, Vol.7, p.82541-82552
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_2455636911
source Directory of Open Access Journals; IEEE Xplore Open Access Journals; EZB Electronic Journals Library
subjects Big Data
Big data software engineering
competency map
Computer architecture
Data systems
Dirichlet problem
Domains
Engineering education
Industries
knowledge domains and skill sets
Knowledge engineering
latent Dirichlet allocation
Modelling
Semantics
Skills
Social networks
Software
Software engineering
Taxonomy
topic modeling
Trends
title Big Data Software Engineering: Analysis of Knowledge Domains and Skill Sets Using LDA-Based Topic Modeling
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-11T18%3A40%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Big%20Data%20Software%20Engineering:%20Analysis%20of%20Knowledge%20Domains%20and%20Skill%20Sets%20Using%20LDA-Based%20Topic%20Modeling&rft.jtitle=IEEE%20access&rft.au=Gurcan,%20Fatih&rft.date=2019&rft.volume=7&rft.spage=82541&rft.epage=82552&rft.pages=82541-82552&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2019.2924075&rft_dat=%3Cproquest_cross%3E2455636911%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2455636911&rft_id=info:pmid/&rft_ieee_id=8742548&rft_doaj_id=oai_doaj_org_article_4f3d8277ac244582b5d19302db713d92&rfr_iscdi=true