MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization

Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysi...

Full description

Bibliographic details
Published in: IEEE transactions on knowledge and data engineering 2017-10, Vol.30 (3)
Main authors: Kannan, Ramakrishnan; Ballard, Grey; Park, Haesun
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page
container_issue 3
container_start_page
container_title IEEE transactions on knowledge and data engineering
container_volume 30
creator Kannan, Ramakrishnan
Ballard, Grey
Park, Haesun
description Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solve alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices with sizes ranging from a few hundred million to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.
format Article
fullrecord <record><control><sourceid>osti</sourceid><recordid>TN_cdi_osti_scitechconnect_1429224</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1429224</sourcerecordid><originalsourceid>FETCH-osti_scitechconnect_14292243</originalsourceid><addsrcrecordid>eNqNjE8LgjAAxUcUZH--w-g-cHOidrNI6pB0yEsXGXPayrbYRkWfPo0-QKf3ez8ebwA8HIYxIjjBw459ihENaDQGE2svvu_HUYw9cNofdihLi3wJUwX7smJWVDAz7Cae2lxhrQ1MWyeMYk6qBhX36gsw10qJpuOHgHvmjHzBjHGnjXx3UqsZGNWstWL-yylYZJvjeou0dbK0XDrBz7w_4a7ElCSE0OCv0Qe_3EKi</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization</title><source>IEEE Electronic Library (IEL)</source><creator>Kannan, Ramakrishnan ; Ballard, Grey ; Park, Haesun</creator><creatorcontrib>Kannan, Ramakrishnan ; Ballard, Grey ; Park, Haesun ; Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)</creatorcontrib><description>Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solves alternating non-negative least squares (NLS) subproblems for W and H. 
It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of size that spans from few hundreds of millions to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.</description><identifier>ISSN: 1041-4347</identifier><identifier>EISSN: 1558-2191</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Algorithm design and analysis ; Analytical models ; Approximation algorithms ; Collaboration ; Computational modeling ; HPC ; MATHEMATICS AND COMPUTING ; MPI ; NMF ; Program processors ; Sparse matrices</subject><ispartof>IEEE transactions on knowledge and data engineering, 2017-10, Vol.30 (3)</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000000258524806 ; 0000000315578027</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885</link.rule.ids><backlink>$$Uhttps://www.osti.gov/servlets/purl/1429224$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Kannan, Ramakrishnan</creatorcontrib><creatorcontrib>Ballard, Grey</creatorcontrib><creatorcontrib>Park, Haesun</creatorcontrib><creatorcontrib>Oak Ridge National Lab. 
(ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)</creatorcontrib><title>MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization</title><title>IEEE transactions on knowledge and data engineering</title><description>Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solves alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of size that spans from few hundreds of millions to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. 
The code and the datasets used for conducting the experiments are available online.</description><subject>Algorithm design and analysis</subject><subject>Analytical models</subject><subject>Approximation algorithms</subject><subject>Collaboration</subject><subject>Computational modeling</subject><subject>HPC</subject><subject>MATHEMATICS AND COMPUTING</subject><subject>MPI</subject><subject>NMF</subject><subject>Program processors</subject><subject>Sparse matrices</subject><issn>1041-4347</issn><issn>1558-2191</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><recordid>eNqNjE8LgjAAxUcUZH--w-g-cHOidrNI6pB0yEsXGXPayrbYRkWfPo0-QKf3ez8ebwA8HIYxIjjBw459ihENaDQGE2svvu_HUYw9cNofdihLi3wJUwX7smJWVDAz7Cae2lxhrQ1MWyeMYk6qBhX36gsw10qJpuOHgHvmjHzBjHGnjXx3UqsZGNWstWL-yylYZJvjeou0dbK0XDrBz7w_4a7ElCSE0OCv0Qe_3EKi</recordid><startdate>20171030</startdate><enddate>20171030</enddate><creator>Kannan, Ramakrishnan</creator><creator>Ballard, Grey</creator><creator>Park, Haesun</creator><general>IEEE</general><scope>OIOZB</scope><scope>OTOTI</scope><orcidid>https://orcid.org/0000000258524806</orcidid><orcidid>https://orcid.org/0000000315578027</orcidid></search><sort><creationdate>20171030</creationdate><title>MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization</title><author>Kannan, Ramakrishnan ; Ballard, Grey ; Park, Haesun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-osti_scitechconnect_14292243</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Algorithm design and analysis</topic><topic>Analytical models</topic><topic>Approximation algorithms</topic><topic>Collaboration</topic><topic>Computational modeling</topic><topic>HPC</topic><topic>MATHEMATICS AND COMPUTING</topic><topic>MPI</topic><topic>NMF</topic><topic>Program processors</topic><topic>Sparse 
matrices</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kannan, Ramakrishnan</creatorcontrib><creatorcontrib>Ballard, Grey</creatorcontrib><creatorcontrib>Park, Haesun</creatorcontrib><creatorcontrib>Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)</creatorcontrib><collection>OSTI.GOV - Hybrid</collection><collection>OSTI.GOV</collection><jtitle>IEEE transactions on knowledge and data engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kannan, Ramakrishnan</au><au>Ballard, Grey</au><au>Park, Haesun</au><aucorp>Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization</atitle><jtitle>IEEE transactions on knowledge and data engineering</jtitle><date>2017-10-30</date><risdate>2017</risdate><volume>30</volume><issue>3</issue><issn>1041-4347</issn><eissn>1558-2191</eissn><abstract>Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solves alternating non-negative least squares (NLS) subproblems for W and H. 
It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of size that spans from few hundreds of millions to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.</abstract><cop>United States</cop><pub>IEEE</pub><orcidid>https://orcid.org/0000000258524806</orcidid><orcidid>https://orcid.org/0000000315578027</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1041-4347
ispartof IEEE transactions on knowledge and data engineering, 2017-10, Vol.30 (3)
issn 1041-4347
1558-2191
language eng
recordid cdi_osti_scitechconnect_1429224
source IEEE Electronic Library (IEL)
subjects Algorithm design and analysis
Analytical models
Approximation algorithms
Collaboration
Computational modeling
HPC
MATHEMATICS AND COMPUTING
MPI
NMF
Program processors
Sparse matrices
title MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T17%3A34%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-osti&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MPI-FAUN:%20An%20MPI-Based%20Framework%20for%20Alternating-Updating%20Nonnegative%20Matrix%20Factorization&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Kannan,%20Ramakrishnan&rft.aucorp=Oak%20Ridge%20National%20Lab.%20(ORNL),%20Oak%20Ridge,%20TN%20(United%20States).%20Oak%20Ridge%20Leadership%20Computing%20Facility%20(OLCF)&rft.date=2017-10-30&rft.volume=30&rft.issue=3&rft.issn=1041-4347&rft.eissn=1558-2191&rft_id=info:doi/&rft_dat=%3Costi%3E1429224%3C/osti%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
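
The description field above summarizes NMF as alternating updates of W and H so that A≈WH, and names Multiplicative Update as one of the supported algorithms. The following is a minimal serial sketch of that alternating pattern using the standard multiplicative update rules; it is not the MPI-FAUN implementation and does not attempt the distributed, MPI-based communication the record describes. All function names, the parameters `iterations`, `seed`, and `eps`, and the list-of-rows matrix layout are assumptions made for illustration.

```python
import random


def matmul(X, Y):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def frobenius_error(A, W, H):
    """Return the Frobenius norm of A - WH."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


def nmf_multiplicative_update(A, k, iterations=200, seed=0, eps=1e-12):
    """Approximate a non-negative m x n matrix A by W (m x k) times H (k x n).

    Hypothetical helper: alternates between updating H with W fixed and
    updating W with H fixed, using multiplicative update rules.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Strictly positive random start so the element-wise updates stay non-negative.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # Update H with W fixed: H <- H * (W^T A) / (W^T W H), element-wise.
        Wt = transpose(W)
        numer = matmul(Wt, A)
        denom = matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, numer, denom)]
        # Update W with H fixed: W <- W * (A H^T) / (W H H^T), element-wise.
        Ht = transpose(H)
        numer = matmul(A, Ht)
        denom = matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, numer, denom)]
    return W, H
```

Calling `nmf_multiplicative_update(A, 2)` on a small non-negative matrix returns factors whose product can be compared with A through `frobenius_error`. The small `eps` only guards against division by zero; other NLS solvers named in the description, such as Block Principal Pivoting, would replace the two update steps while keeping the same alternating loop.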