InferSpark: Statistical Inference at Scale
The Apache Spark stack has enabled fast large-scale data processing. Despite a rich library of statistical models and inference algorithms, it does not give domain users the ability to develop their own models. The emergence of probabilistic programming languages has shown the promise of developing...
Main authors: | Zhao, Zhuoyue ; Pei, Jialing ; Lo, Eric ; Zhu, Kenny Q ; Liu, Chris |
---|---|
Format: | Article |
Language: | eng |
Online access: | Order full text |
field | value |
---|---|
creator | Zhao, Zhuoyue ; Pei, Jialing ; Lo, Eric ; Zhu, Kenny Q ; Liu, Chris |
description | The Apache Spark stack has enabled fast large-scale data processing. Despite
a rich library of statistical models and inference algorithms, it does not give
domain users the ability to develop their own models. The emergence of
probabilistic programming languages has shown the promise of developing
sophisticated probabilistic models in a succinct and programmatic way. These
frameworks have the potential to automatically generate inference algorithms
for user-defined models and to answer various statistical queries about the
model. It is a perfect time to unite these two great directions to produce a
programmable big data analysis framework. We thus propose InferSpark, a
probabilistic programming framework on top of Apache Spark. Efficient
statistical inference can be easily implemented on this framework, and the
inference process can leverage the distributed main-memory processing power of
Spark. This framework makes statistical inference on big data possible and
speeds up the penetration of probabilistic programming into the data
engineering domain. |
doi_str_mv | 10.48550/arxiv.1707.02047 |
format | Article |
date | 2017-07-07 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
links | arXiv: https://arxiv.org/abs/1707.02047 ; DOI: https://doi.org/10.48550/arXiv.1707.02047 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.1707.02047 |
language | eng |
recordid | cdi_arxiv_primary_1707_02047 |
source | arXiv.org |
subjects | Computer Science - Databases |
title | InferSpark: Statistical Inference at Scale |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T21%3A48%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=InferSpark:%20Statistical%20Inference%20at%20Scale&rft.au=Zhao,%20Zhuoyue&rft.date=2017-07-07&rft_id=info:doi/10.48550/arxiv.1707.02047&rft_dat=%3Carxiv_GOX%3E1707_02047%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |