DIGDUG: Scalable Separable Dense Graph Pruning and Join Operations in MapReduce

Linking topics to specific experts in technical documents and finding connections between experts are crucial for detecting the evolution of emerging topics and the relationships between their influencers in state-of-the-art research. Current techniques that make such connections are limited to simi...

Bibliographic Details
Published in:	IEEE transactions on big data 2021-12, Vol.7 (6), p.930-951
Main Authors:	Shukla, Manu; Dharme, Dinesh; Ramnarain, Pallavi; Santos, Ray Dos; Lu, Chang-Tien
Format:	Article
Language:	eng
Subjects:	Big data; Data mining; distributed graphs; graph joins; Iterative methods; Joining processes; MapReduce; Organizations; Pruning; Social networking (online); Technical information
Online Access:	Order full text

container_end_page	951
container_issue	6
container_start_page	930
container_title	IEEE transactions on big data
container_volume	7
creator	Shukla, Manu; Dharme, Dinesh; Ramnarain, Pallavi; Santos, Ray Dos; Lu, Chang-Tien
description	Linking topics to specific experts in technical documents and finding connections between experts are crucial for detecting the evolution of emerging topics and the relationships between their influencers in state-of-the-art research. Current techniques that make such connections are limited to similarity measures. Methods based on weights such as TF-IDF and frequency to identify important topics and self joins between topics and experts are generally utilized to identify connections between experts. However, such approaches are inadequate for identifying emerging keywords and experts since the most useful terms in technical documents tend to be infrequent and concentrated in just a few documents. This makes connecting experts through joins on large dense graphs challenging. In this article, we present DIGDUG, a framework that identifies emerging topics by applying graph operations to technical terms. The framework identifies connections between authors of patents and journal papers by performing joins on connected topics and topics associated with the authors at scale. The problem of scaling the graph operations for topics and experts is solved through dense graph pruning and graph joins categorized under their own scalable separable dense graph class. Experiments were performed on technical domains to validate the utility of the connections between interests and experts. Comparing our graph join and pruning technique against multiple graph and join methods in MapReduce revealed a significant improvement in performance using our approach.
doi_str_mv	10.1109/TBDATA.2020.2983650
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2604919718</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9050872</ieee_id><sourcerecordid>2604919718</sourcerecordid><originalsourceid>FETCH-LOGICAL-c297t-e2d3d51bbc69796fc66af0c54d3297dc92ba536275883e5e852fbdbc8e8db7763</originalsourceid><addsrcrecordid>eNpNkEtPwkAUhSdGEwnyC9hM4ro4D-blDkErBoMRWE-mM7dagm2d0oX_3mKNcXXPzT3nnuRDaEzJhFJibrZ3i9l2NmGEkQkzmktBztCAcc4SpQw5_6cv0ahp9oQQKgnhhg3QerFMF7v0Fm-8O7jsAHgDtYs_agFlAziNrn7HL7Eti_INuzLgp6oo8bqG6I5FVTa4255d_Qqh9XCFLnJ3aGD0O4do93C_nT8mq3W6nM9WiWdGHRNggQdBs8xLo4zMvZQuJ15MA-_uwRuWOcElU0JrDgK0YHkWMq9Bh0wpyYfouv9bx-qzheZo91Uby67SMkmmhhpFdefivcvHqmki5LaOxYeLX5YSe4Jne3j2BM_-wutS4z5VAMBfwhBBtGL8G33faX0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2604919718</pqid></control><display><type>article</type><title>DIGDUG: Scalable Separable Dense Graph Pruning and Join Operations in MapReduce</title><source>IEEE Electronic Library (IEL)</source><creator>Shukla, Manu ; Dharme, Dinesh ; Ramnarain, Pallavi ; Santos, Ray Dos ; Lu, Chang-Tien</creator><creatorcontrib>Shukla, Manu ; Dharme, Dinesh ; Ramnarain, Pallavi ; Santos, Ray Dos ; Lu, Chang-Tien</creatorcontrib><description>Linking topics to specific experts in technical documents and finding connections between experts are crucial for detecting the evolution of emerging topics and the relationships between their influencers in state-of-the-art research. Current techniques that make such connections are limited to similarity measures. Methods based on weights such as TF-IDF and frequency to identify important topics and self joins between topics and experts are generally utilized to identify connections between experts. 
However, such approaches are inadequate for identifying emerging keywords and experts since the most useful terms in technical documents tend to be infrequent and concentrated in just a few documents. This makes connecting experts through joins on large dense graphs challenging. In this article, we present DIGDUG, a framework that identifies emerging topics by applying graph operations to technical terms. The framework identifies connections between authors of patents and journal papers by performing joins on connected topics and topics associated with the authors at scale. The problem of scaling the graph operations for topics and experts is solved through dense graph pruning and graph joins categorized under their own scalable separable dense graph class. Experiments were performed on technical domains to validate the utility of the connections between interests and experts. Comparing our graph join and pruning technique against multiple graph and join methods in MapReduce revealed a significant improvement in performance using our approach.</description><identifier>ISSN: 2332-7790</identifier><identifier>EISSN: 2332-7790</identifier><identifier>EISSN: 2372-2096</identifier><identifier>DOI: 10.1109/TBDATA.2020.2983650</identifier><identifier>CODEN: ITBDAX</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Big data ; Data mining ; distributed graphs ; graph joins ; Iterative methods ; Joining processes ; MapReduce ; Organizations ; Pruning ; Social networking (online) ; Technical information</subject><ispartof>IEEE transactions on big data, 2021-12, Vol.7 (6), p.930-951</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c297t-e2d3d51bbc69796fc66af0c54d3297dc92ba536275883e5e852fbdbc8e8db7763</citedby><cites>FETCH-LOGICAL-c297t-e2d3d51bbc69796fc66af0c54d3297dc92ba536275883e5e852fbdbc8e8db7763</cites><orcidid>0000-0003-3411-5933</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9050872$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9050872$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Shukla, Manu</creatorcontrib><creatorcontrib>Dharme, Dinesh</creatorcontrib><creatorcontrib>Ramnarain, Pallavi</creatorcontrib><creatorcontrib>Santos, Ray Dos</creatorcontrib><creatorcontrib>Lu, Chang-Tien</creatorcontrib><title>DIGDUG: Scalable Separable Dense Graph Pruning and Join Operations in MapReduce</title><title>IEEE transactions on big data</title><addtitle>TBData</addtitle><description>Linking topics to specific experts in technical documents and finding connections between experts are crucial for detecting the evolution of emerging topics and the relationships between their influencers in state-of-the-art research. Current techniques that make such connections are limited to similarity measures. Methods based on weights such as TF-IDF and frequency to identify important topics and self joins between topics and experts are generally utilized to identify connections between experts. However, such approaches are inadequate for identifying emerging keywords and experts since the most useful terms in technical documents tend to be infrequent and concentrated in just a few documents. 
This makes connecting experts through joins on large dense graphs challenging. In this article, we present DIGDUG, a framework that identifies emerging topics by applying graph operations to technical terms. The framework identifies connections between authors of patents and journal papers by performing joins on connected topics and topics associated with the authors at scale. The problem of scaling the graph operations for topics and experts is solved through dense graph pruning and graph joins categorized under their own scalable separable dense graph class. Experiments were performed on technical domains to validate the utility of the connections between interests and experts. Comparing our graph join and pruning technique against multiple graph and join methods in MapReduce revealed a significant improvement in performance using our approach.</description><subject>Big data</subject><subject>Data mining</subject><subject>distributed graphs</subject><subject>graph joins</subject><subject>Iterative methods</subject><subject>Joining processes</subject><subject>MapReduce</subject><subject>Organizations</subject><subject>Pruning</subject><subject>Social networking (online)</subject><subject>Technical information</subject><issn>2332-7790</issn><issn>2332-7790</issn><issn>2372-2096</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkEtPwkAUhSdGEwnyC9hM4ro4D-blDkErBoMRWE-mM7dagm2d0oX_3mKNcXXPzT3nnuRDaEzJhFJibrZ3i9l2NmGEkQkzmktBztCAcc4SpQw5_6cv0ahp9oQQKgnhhg3QerFMF7v0Fm-8O7jsAHgDtYs_agFlAziNrn7HL7Eti_INuzLgp6oo8bqG6I5FVTa4255d_Qqh9XCFLnJ3aGD0O4do93C_nT8mq3W6nM9WiWdGHRNggQdBs8xLo4zMvZQuJ15MA-_uwRuWOcElU0JrDgK0YHkWMq9Bh0wpyYfouv9bx-qzheZo91Uby67SMkmmhhpFdefivcvHqmki5LaOxYeLX5YSe4Jne3j2BM_-wutS4z5VAMBfwhBBtGL8G33faX0</recordid><startdate>20211201</startdate><enddate>20211201</enddate><creator>Shukla, Manu</creator><creator>Dharme, 
Dinesh</creator><creator>Ramnarain, Pallavi</creator><creator>Santos, Ray Dos</creator><creator>Lu, Chang-Tien</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0003-3411-5933</orcidid></search><sort><creationdate>20211201</creationdate><title>DIGDUG: Scalable Separable Dense Graph Pruning and Join Operations in MapReduce</title><author>Shukla, Manu ; Dharme, Dinesh ; Ramnarain, Pallavi ; Santos, Ray Dos ; Lu, Chang-Tien</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c297t-e2d3d51bbc69796fc66af0c54d3297dc92ba536275883e5e852fbdbc8e8db7763</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Big data</topic><topic>Data mining</topic><topic>distributed graphs</topic><topic>graph joins</topic><topic>Iterative methods</topic><topic>Joining processes</topic><topic>MapReduce</topic><topic>Organizations</topic><topic>Pruning</topic><topic>Social networking (online)</topic><topic>Technical information</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Shukla, Manu</creatorcontrib><creatorcontrib>Dharme, Dinesh</creatorcontrib><creatorcontrib>Ramnarain, Pallavi</creatorcontrib><creatorcontrib>Santos, Ray Dos</creatorcontrib><creatorcontrib>Lu, Chang-Tien</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with 
Aerospace</collection><jtitle>IEEE transactions on big data</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Shukla, Manu</au><au>Dharme, Dinesh</au><au>Ramnarain, Pallavi</au><au>Santos, Ray Dos</au><au>Lu, Chang-Tien</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>DIGDUG: Scalable Separable Dense Graph Pruning and Join Operations in MapReduce</atitle><jtitle>IEEE transactions on big data</jtitle><stitle>TBData</stitle><date>2021-12-01</date><risdate>2021</risdate><volume>7</volume><issue>6</issue><spage>930</spage><epage>951</epage><pages>930-951</pages><issn>2332-7790</issn><eissn>2332-7790</eissn><eissn>2372-2096</eissn><coden>ITBDAX</coden><abstract>Linking topics to specific experts in technical documents and finding connections between experts are crucial for detecting the evolution of emerging topics and the relationships between their influencers in state-of-the-art research. Current techniques that make such connections are limited to similarity measures. Methods based on weights such as TF-IDF and frequency to identify important topics and self joins between topics and experts are generally utilized to identify connections between experts. However, such approaches are inadequate for identifying emerging keywords and experts since the most useful terms in technical documents tend to be infrequent and concentrated in just a few documents. This makes connecting experts through joins on large dense graphs challenging. In this article, we present DIGDUG, a framework that identifies emerging topics by applying graph operations to technical terms. The framework identifies connections between authors of patents and journal papers by performing joins on connected topics and topics associated with the authors at scale. 
The problem of scaling the graph operations for topics and experts is solved through dense graph pruning and graph joins categorized under their own scalable separable dense graph class. Experiments were performed on technical domains to validate the utility of the connections between interests and experts. Comparing our graph join and pruning technique against multiple graph and join methods in MapReduce revealed a significant improvement in performance using our approach.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TBDATA.2020.2983650</doi><tpages>22</tpages><orcidid>https://orcid.org/0000-0003-3411-5933</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 2332-7790
ispartof	IEEE transactions on big data, 2021-12, Vol.7 (6), p.930-951
issn	2332-7790 2332-7790 2372-2096
language	eng
recordid	cdi_proquest_journals_2604919718
source	IEEE Electronic Library (IEL)
subjects	Big data; Data mining; distributed graphs; graph joins; Iterative methods; Joining processes; MapReduce; Organizations; Pruning; Social networking (online); Technical information
title	DIGDUG: Scalable Separable Dense Graph Pruning and Join Operations in MapReduce
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T22%3A47%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DIGDUG:%20Scalable%20Separable%20Dense%20Graph%20Pruning%20and%20Join%20Operations%20in%20MapReduce&rft.jtitle=IEEE%20transactions%20on%20big%20data&rft.au=Shukla,%20Manu&rft.date=2021-12-01&rft.volume=7&rft.issue=6&rft.spage=930&rft.epage=951&rft.pages=930-951&rft.issn=2332-7790&rft.eissn=2332-7790&rft.coden=ITBDAX&rft_id=info:doi/10.1109/TBDATA.2020.2983650&rft_dat=%3Cproquest_RIE%3E2604919718%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2604919718&rft_id=info:pmid/&rft_ieee_id=9050872&rfr_iscdi=true