DIGDUG: Scalable Separable Dense Graph Pruning and Join Operations in MapReduce
Linking topics to specific experts in technical documents and finding connections between experts are crucial for detecting the evolution of emerging topics and the relationships between their influencers in state-of-the-art research. Current techniques that make such connections are limited to simi...
Saved in:
Published in: | IEEE transactions on big data 2021-12, Vol.7 (6), p.930-951 |
---|---|
Main authors: | Shukla, Manu; Dharme, Dinesh; Ramnarain, Pallavi; Santos, Ray Dos; Lu, Chang-Tien |
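Format: | Article |
Language: | eng |
Subjects: | Big data; Data mining; distributed graphs; graph joins; Iterative methods; Joining processes; MapReduce; Organizations; Pruning; Social networking (online); Technical information |
Online access: | Order full text |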
container_end_page | 951 |
---|---|
container_issue | 6 |
container_start_page | 930 |
container_title | IEEE transactions on big data |
container_volume | 7 |
creator | Shukla, Manu; Dharme, Dinesh; Ramnarain, Pallavi; Santos, Ray Dos; Lu, Chang-Tien |
description | Linking topics to specific experts in technical documents and finding connections between experts are crucial for detecting the evolution of emerging topics and the relationships between their influencers in state-of-the-art research. Current techniques that make such connections are limited to similarity measures. Methods based on weights such as TF-IDF and frequency to identify important topics and self joins between topics and experts are generally utilized to identify connections between experts. However, such approaches are inadequate for identifying emerging keywords and experts since the most useful terms in technical documents tend to be infrequent and concentrated in just a few documents. This makes connecting experts through joins on large dense graphs challenging. In this article, we present DIGDUG, a framework that identifies emerging topics by applying graph operations to technical terms. The framework identifies connections between authors of patents and journal papers by performing joins on connected topics and topics associated with the authors at scale. The problem of scaling the graph operations for topics and experts is solved through dense graph pruning and graph joins categorized under their own scalable separable dense graph class. Experiments were performed on technical domains to validate the utility of the connections between interests and experts. Comparing our graph join and pruning technique against multiple graph and join methods in MapReduce revealed a significant improvement in performance using our approach. |
doi_str_mv | 10.1109/TBDATA.2020.2983650 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2604919718</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9050872</ieee_id><sourcerecordid>2604919718</sourcerecordid><originalsourceid>FETCH-LOGICAL-c297t-e2d3d51bbc69796fc66af0c54d3297dc92ba536275883e5e852fbdbc8e8db7763</originalsourceid><addsrcrecordid>eNpNkEtPwkAUhSdGEwnyC9hM4ro4D-blDkErBoMRWE-mM7dagm2d0oX_3mKNcXXPzT3nnuRDaEzJhFJibrZ3i9l2NmGEkQkzmktBztCAcc4SpQw5_6cv0ahp9oQQKgnhhg3QerFMF7v0Fm-8O7jsAHgDtYs_agFlAziNrn7HL7Eti_INuzLgp6oo8bqG6I5FVTa4255d_Qqh9XCFLnJ3aGD0O4do93C_nT8mq3W6nM9WiWdGHRNggQdBs8xLo4zMvZQuJ15MA-_uwRuWOcElU0JrDgK0YHkWMq9Bh0wpyYfouv9bx-qzheZo91Uby67SMkmmhhpFdefivcvHqmki5LaOxYeLX5YSe4Jne3j2BM_-wutS4z5VAMBfwhBBtGL8G33faX0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2604919718</pqid></control><display><type>article</type><title>DIGDUG: Scalable Separable Dense Graph Pruning and Join Operations in MapReduce</title><source>IEEE Electronic Library (IEL)</source><creator>Shukla, Manu ; Dharme, Dinesh ; Ramnarain, Pallavi ; Santos, Ray Dos ; Lu, Chang-Tien</creator><creatorcontrib>Shukla, Manu ; Dharme, Dinesh ; Ramnarain, Pallavi ; Santos, Ray Dos ; Lu, Chang-Tien</creatorcontrib><description>Linking topics to specific experts in technical documents and finding connections between experts are crucial for detecting the evolution of emerging topics and the relationships between their influencers in state-of-the-art research. Current techniques that make such connections are limited to similarity measures. Methods based on weights such as TF-IDF and frequency to identify important topics and self joins between topics and experts are generally utilized to identify connections between experts. 
However, such approaches are inadequate for identifying emerging keywords and experts since the most useful terms in technical documents tend to be infrequent and concentrated in just a few documents. This makes connecting experts through joins on large dense graphs challenging. In this article, we present DIGDUG, a framework that identifies emerging topics by applying graph operations to technical terms. The framework identifies connections between authors of patents and journal papers by performing joins on connected topics and topics associated with the authors at scale. The problem of scaling the graph operations for topics and experts is solved through dense graph pruning and graph joins categorized under their own scalable separable dense graph class. Experiments were performed on technical domains to validate the utility of the connections between interests and experts. Comparing our graph join and pruning technique against multiple graph and join methods in MapReduce revealed a significant improvement in performance using our approach.</description><identifier>ISSN: 2332-7790</identifier><identifier>EISSN: 2332-7790</identifier><identifier>EISSN: 2372-2096</identifier><identifier>DOI: 10.1109/TBDATA.2020.2983650</identifier><identifier>CODEN: ITBDAX</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Big data ; Data mining ; distributed graphs ; graph joins ; Iterative methods ; Joining processes ; MapReduce ; Organizations ; Pruning ; Social networking (online) ; Technical information</subject><ispartof>IEEE transactions on big data, 2021-12, Vol.7 (6), p.930-951</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c297t-e2d3d51bbc69796fc66af0c54d3297dc92ba536275883e5e852fbdbc8e8db7763</citedby><cites>FETCH-LOGICAL-c297t-e2d3d51bbc69796fc66af0c54d3297dc92ba536275883e5e852fbdbc8e8db7763</cites><orcidid>0000-0003-3411-5933</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9050872$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9050872$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Shukla, Manu</creatorcontrib><creatorcontrib>Dharme, Dinesh</creatorcontrib><creatorcontrib>Ramnarain, Pallavi</creatorcontrib><creatorcontrib>Santos, Ray Dos</creatorcontrib><creatorcontrib>Lu, Chang-Tien</creatorcontrib><title>DIGDUG: Scalable Separable Dense Graph Pruning and Join Operations in MapReduce</title><title>IEEE transactions on big data</title><addtitle>TBData</addtitle><description>Linking topics to specific experts in technical documents and finding connections between experts are crucial for detecting the evolution of emerging topics and the relationships between their influencers in state-of-the-art research. Current techniques that make such connections are limited to similarity measures. Methods based on weights such as TF-IDF and frequency to identify important topics and self joins between topics and experts are generally utilized to identify connections between experts. However, such approaches are inadequate for identifying emerging keywords and experts since the most useful terms in technical documents tend to be infrequent and concentrated in just a few documents. 
This makes connecting experts through joins on large dense graphs challenging. In this article, we present DIGDUG, a framework that identifies emerging topics by applying graph operations to technical terms. The framework identifies connections between authors of patents and journal papers by performing joins on connected topics and topics associated with the authors at scale. The problem of scaling the graph operations for topics and experts is solved through dense graph pruning and graph joins categorized under their own scalable separable dense graph class. Experiments were performed on technical domains to validate the utility of the connections between interests and experts. Comparing our graph join and pruning technique against multiple graph and join methods in MapReduce revealed a significant improvement in performance using our approach.</description><subject>Big data</subject><subject>Data mining</subject><subject>distributed graphs</subject><subject>graph joins</subject><subject>Iterative methods</subject><subject>Joining processes</subject><subject>MapReduce</subject><subject>Organizations</subject><subject>Pruning</subject><subject>Social networking (online)</subject><subject>Technical information</subject><issn>2332-7790</issn><issn>2332-7790</issn><issn>2372-2096</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkEtPwkAUhSdGEwnyC9hM4ro4D-blDkErBoMRWE-mM7dagm2d0oX_3mKNcXXPzT3nnuRDaEzJhFJibrZ3i9l2NmGEkQkzmktBztCAcc4SpQw5_6cv0ahp9oQQKgnhhg3QerFMF7v0Fm-8O7jsAHgDtYs_agFlAziNrn7HL7Eti_INuzLgp6oo8bqG6I5FVTa4255d_Qqh9XCFLnJ3aGD0O4do93C_nT8mq3W6nM9WiWdGHRNggQdBs8xLo4zMvZQuJ15MA-_uwRuWOcElU0JrDgK0YHkWMq9Bh0wpyYfouv9bx-qzheZo91Uby67SMkmmhhpFdefivcvHqmki5LaOxYeLX5YSe4Jne3j2BM_-wutS4z5VAMBfwhBBtGL8G33faX0</recordid><startdate>20211201</startdate><enddate>20211201</enddate><creator>Shukla, Manu</creator><creator>Dharme, 
Dinesh</creator><creator>Ramnarain, Pallavi</creator><creator>Santos, Ray Dos</creator><creator>Lu, Chang-Tien</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0003-3411-5933</orcidid></search><sort><creationdate>20211201</creationdate><title>DIGDUG: Scalable Separable Dense Graph Pruning and Join Operations in MapReduce</title><author>Shukla, Manu ; Dharme, Dinesh ; Ramnarain, Pallavi ; Santos, Ray Dos ; Lu, Chang-Tien</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c297t-e2d3d51bbc69796fc66af0c54d3297dc92ba536275883e5e852fbdbc8e8db7763</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Big data</topic><topic>Data mining</topic><topic>distributed graphs</topic><topic>graph joins</topic><topic>Iterative methods</topic><topic>Joining processes</topic><topic>MapReduce</topic><topic>Organizations</topic><topic>Pruning</topic><topic>Social networking (online)</topic><topic>Technical information</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Shukla, Manu</creatorcontrib><creatorcontrib>Dharme, Dinesh</creatorcontrib><creatorcontrib>Ramnarain, Pallavi</creatorcontrib><creatorcontrib>Santos, Ray Dos</creatorcontrib><creatorcontrib>Lu, Chang-Tien</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with 
Aerospace</collection><jtitle>IEEE transactions on big data</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Shukla, Manu</au><au>Dharme, Dinesh</au><au>Ramnarain, Pallavi</au><au>Santos, Ray Dos</au><au>Lu, Chang-Tien</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>DIGDUG: Scalable Separable Dense Graph Pruning and Join Operations in MapReduce</atitle><jtitle>IEEE transactions on big data</jtitle><stitle>TBData</stitle><date>2021-12-01</date><risdate>2021</risdate><volume>7</volume><issue>6</issue><spage>930</spage><epage>951</epage><pages>930-951</pages><issn>2332-7790</issn><eissn>2332-7790</eissn><eissn>2372-2096</eissn><coden>ITBDAX</coden><abstract>Linking topics to specific experts in technical documents and finding connections between experts are crucial for detecting the evolution of emerging topics and the relationships between their influencers in state-of-the-art research. Current techniques that make such connections are limited to similarity measures. Methods based on weights such as TF-IDF and frequency to identify important topics and self joins between topics and experts are generally utilized to identify connections between experts. However, such approaches are inadequate for identifying emerging keywords and experts since the most useful terms in technical documents tend to be infrequent and concentrated in just a few documents. This makes connecting experts through joins on large dense graphs challenging. In this article, we present DIGDUG, a framework that identifies emerging topics by applying graph operations to technical terms. The framework identifies connections between authors of patents and journal papers by performing joins on connected topics and topics associated with the authors at scale. 
The problem of scaling the graph operations for topics and experts is solved through dense graph pruning and graph joins categorized under their own scalable separable dense graph class. Experiments were performed on technical domains to validate the utility of the connections between interests and experts. Comparing our graph join and pruning technique against multiple graph and join methods in MapReduce revealed a significant improvement in performance using our approach.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TBDATA.2020.2983650</doi><tpages>22</tpages><orcidid>https://orcid.org/0000-0003-3411-5933</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2332-7790 |
ispartof | IEEE transactions on big data, 2021-12, Vol.7 (6), p.930-951 |
issn | ISSN: 2332-7790; EISSN: 2332-7790; EISSN: 2372-2096 |
language | eng |
recordid | cdi_proquest_journals_2604919718 |
source | IEEE Electronic Library (IEL) |
subjects | Big data; Data mining; distributed graphs; graph joins; Iterative methods; Joining processes; MapReduce; Organizations; Pruning; Social networking (online); Technical information |
title | DIGDUG: Scalable Separable Dense Graph Pruning and Join Operations in MapReduce |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T22%3A47%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DIGDUG:%20Scalable%20Separable%20Dense%20Graph%20Pruning%20and%20Join%20Operations%20in%20MapReduce&rft.jtitle=IEEE%20transactions%20on%20big%20data&rft.au=Shukla,%20Manu&rft.date=2021-12-01&rft.volume=7&rft.issue=6&rft.spage=930&rft.epage=951&rft.pages=930-951&rft.issn=2332-7790&rft.eissn=2332-7790&rft.coden=ITBDAX&rft_id=info:doi/10.1109/TBDATA.2020.2983650&rft_dat=%3Cproquest_RIE%3E2604919718%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2604919718&rft_id=info:pmid/&rft_ieee_id=9050872&rfr_iscdi=true |