Reconstructing Householder vectors from Tall-Skinny QR

The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development t...

Bibliographic details
Published in: Journal of parallel and distributed computing, 2015-11, Vol.85 (C), p.3-31
Main authors: Ballard, G., Demmel, J., Grigori, L., Jacquelin, M., Knight, N., Nguyen, H.D.
Format: Article
Language: eng
Online access: Full text
container_end_page 31
container_issue C
container_start_page 3
container_title Journal of parallel and distributed computing
container_volume 85
creator Ballard, G.
Demmel, J.
Grigori, L.
Jacquelin, M.
Knight, N.
Nguyen, H.D.
description The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with the Communication-Avoiding QR (CAQR) algorithm. Experiments on supercomputers demonstrate the benefits of the communication cost improvements: in particular, our experiments show substantial improvements over tuned library implementations for tall-and-skinny matrices. We also provide algorithmic improvements to the Householder QR and CAQR algorithms, and we investigate several alternatives to the Householder reconstruction algorithm that sacrifice guarantees on numerical stability in some cases in order to obtain higher performance.
• We reconstruct Householder vectors representing the Q-factor from Tall-Skinny QR.
• Our approach has the same asymptotic communication efficiency as TSQR.
• Additionally, it enables more communication-efficient parallel QR algorithms.
• We also provide algorithmic improvements to the Householder QR and CAQR algorithms.
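The idea summarized in the abstract (run TSQR, then recover a Householder representation of the orthogonal factor) can be illustrated with a small serial NumPy sketch. This is not the authors' parallel implementation. It assumes a single-level reduction in place of a reduction tree, uses `np.linalg.qr` as a stand-in for the local QR factorizations, and recovers the compact form I - Y T Y^T through an LU factorization of Q - S without pivoting, where S is a diagonal sign matrix chosen during elimination. The function names `tsqr` and `reconstruct_householder` are hypothetical.

```python
import numpy as np

def tsqr(A, nblocks=4):
    """One-level TSQR sketch: local QR per row block, then QR of the stacked R factors.
    Assumes every row block has at least as many rows as A has columns."""
    n = A.shape[1]
    local = [np.linalg.qr(B) for B in np.array_split(A, nblocks, axis=0)]
    Q2, R = np.linalg.qr(np.vstack([Ri for _, Ri in local]))
    # Combine each local Q with its n-row slice of the second-level Q.
    Q = np.vstack([Qi @ Q2[i * n:(i + 1) * n] for i, (Qi, _) in enumerate(local)])
    return Q, R

def reconstruct_householder(Q):
    """Return Y (unit lower trapezoidal), T (upper triangular) and signs s such that
    the first n columns of I - Y T Y^T equal Q @ diag(s)."""
    m, n = Q.shape
    W = Q.copy()
    s = np.empty(n)
    for j in range(n):
        # Pick the sign so the pivot W[j, j] - s[j] has magnitude at least 1.
        s[j] = -1.0 if W[j, j] >= 0 else 1.0
        W[j, j] -= s[j]
        W[j + 1:, j] /= W[j, j]
        W[j + 1:, j + 1:] -= np.outer(W[j + 1:, j], W[j, j + 1:])
    Y = np.tril(W, -1) + np.eye(m, n)   # L factor of Q - S
    U = np.triu(W[:n])                  # U factor of Q - S
    # T = -U S Y1^{-T}, computed by solving Y1 T^T = (-U S)^T.
    T = np.linalg.solve(Y[:n], (-U * s).T).T
    return Y, T, s

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 4))
Q, R = tsqr(A)
Y, T, s = reconstruct_householder(Q)
H = np.eye(64) - Y @ T @ Y.T            # explicit orthogonal factor, for checking only
print(np.allclose(H[:, :4] @ (s[:, None] * R), A))
```

In this sketch the triangular factor that pairs with the reconstructed representation is diag(s) @ R rather than R, because the sign choice flips some columns of Q.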
doi_str_mv 10.1016/j.jpdc.2015.06.003
format Article
fullrecord <record><control><sourceid>proquest_osti_</sourceid><recordid>TN_cdi_osti_scitechconnect_1236219</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S074373151500101X</els_id><sourcerecordid>1778036077</sourcerecordid><originalsourceid>FETCH-LOGICAL-c508t-438e8eb2c3f25a3f4905696fdf0b3aa9ca281a070d66333920fdfe7d51f7b0123</originalsourceid><addsrcrecordid>eNp9kE9LxDAQxYMouP75Ap6KJz20Tpo2ScGLiLrCgriu55BNp27WbrMm3QW_vSkVj54GZn7v8eYRckEho0D5zTpbb2uT5UDLDHgGwA7IhELFU5CFPCQTEAVLBaPlMTkJYQ1AaSnkhPA5GteF3u9Mb7uPZOp2AVeurdEnezS98yFpvNskC9226dun7brv5HV-Ro4a3QY8_52n5P3xYXE_TWcvT8_3d7PUlCD7tGASJS5zw5q81KwpKih5xZu6gSXTujI6l1SDgJpzxliVQzyhqEvaiCXQnJ2Sy9HXhd6qYGyPZhUDdzGainee0ypC1yO00q3aervR_ls5bdX0bqaGXXQqqJDlnkb2amS33n3tMPRqY4PBttUdxtcVFUIC4yBERPMRNd6F4LH586aghtbVWg2tq6F1BVzF1qPodhRhbGVv0Q-hsTNYWz9krp39T_4D1W-IxA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1778036077</pqid></control><display><type>article</type><title>Reconstructing Householder vectors from Tall-Skinny QR</title><source>ScienceDirect Journals (5 years ago - present)</source><creator>Ballard, G. ; Demmel, J. ; Grigori, L. ; Jacquelin, M. ; Knight, N. ; Nguyen, H.D.</creator><creatorcontrib>Ballard, G. ; Demmel, J. ; Grigori, L. ; Jacquelin, M. ; Knight, N. ; Nguyen, H.D. ; Sandia National Lab. (SNL-CA), Livermore, CA (United States)</creatorcontrib><description>The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. 
We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with Communication-Avoiding QR (CAQR) algorithm. Experiments on supercomputers demonstrate the benefits of the communication cost improvements: in particular, our experiments show substantial improvements over tuned library implementations for tall-and-skinny matrices. We also provide algorithmic improvements to the Householder QR and CAQR algorithms, and we investigate several alternatives to the Householder reconstruction algorithm that sacrifice guarantees on numerical stability in some cases in order to obtain higher performance. 
•We reconstruct Householder vectors representing the Q-factor from Tall-Skinny QR.•Our approach has the same asymptotic communication efficiency as TSQR.•Additionally, it enables more communication-efficient parallel QR algorithms.•We also provide algorithmic improvements to the Householder QR and CAQR algorithms.</description><identifier>ISSN: 0743-7315</identifier><identifier>EISSN: 1096-0848</identifier><identifier>DOI: 10.1016/j.jpdc.2015.06.003</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Algorithms ; Asymptotic properties ; Communication-avoiding algorithms ; Computer Science ; Dense linear algebra ; Distributed, Parallel, and Cluster Computing ; Mathematical analysis ; Mathematical models ; MATHEMATICS AND COMPUTING ; Numerical stability ; QR decomposition ; Reconstruction ; Representations ; Vectors (mathematics)</subject><ispartof>Journal of parallel and distributed computing, 2015-11, Vol.85 (C), p.3-31</ispartof><rights>2015 Elsevier Inc.</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c508t-438e8eb2c3f25a3f4905696fdf0b3aa9ca281a070d66333920fdfe7d51f7b0123</citedby><cites>FETCH-LOGICAL-c508t-438e8eb2c3f25a3f4905696fdf0b3aa9ca281a070d66333920fdfe7d51f7b0123</cites><orcidid>0000-0002-5880-1076</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.jpdc.2015.06.003$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>230,314,780,784,885,3548,27923,27924,45994</link.rule.ids><backlink>$$Uhttps://inria.hal.science/hal-01241785$$DView record in HAL$$Hfree_for_read</backlink><backlink>$$Uhttps://www.osti.gov/servlets/purl/1236219$$D View this record in 
Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Ballard, G.</creatorcontrib><creatorcontrib>Demmel, J.</creatorcontrib><creatorcontrib>Grigori, L.</creatorcontrib><creatorcontrib>Jacquelin, M.</creatorcontrib><creatorcontrib>Knight, N.</creatorcontrib><creatorcontrib>Nguyen, H.D.</creatorcontrib><creatorcontrib>Sandia National Lab. (SNL-CA), Livermore, CA (United States)</creatorcontrib><title>Reconstructing Householder vectors from Tall-Skinny QR</title><title>Journal of parallel and distributed computing</title><description>The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with Communication-Avoiding QR (CAQR) algorithm. Experiments on supercomputers demonstrate the benefits of the communication cost improvements: in particular, our experiments show substantial improvements over tuned library implementations for tall-and-skinny matrices. 
We also provide algorithmic improvements to the Householder QR and CAQR algorithms, and we investigate several alternatives to the Householder reconstruction algorithm that sacrifice guarantees on numerical stability in some cases in order to obtain higher performance. •We reconstruct Householder vectors representing the Q-factor from Tall-Skinny QR.•Our approach has the same asymptotic communication efficiency as TSQR.•Additionally, it enables more communication-efficient parallel QR algorithms.•We also provide algorithmic improvements to the Householder QR and CAQR algorithms.</description><subject>Algorithms</subject><subject>Asymptotic properties</subject><subject>Communication-avoiding algorithms</subject><subject>Computer Science</subject><subject>Dense linear algebra</subject><subject>Distributed, Parallel, and Cluster Computing</subject><subject>Mathematical analysis</subject><subject>Mathematical models</subject><subject>MATHEMATICS AND COMPUTING</subject><subject>Numerical stability</subject><subject>QR decomposition</subject><subject>Reconstruction</subject><subject>Representations</subject><subject>Vectors (mathematics)</subject><issn>0743-7315</issn><issn>1096-0848</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNp9kE9LxDAQxYMouP75Ap6KJz20Tpo2ScGLiLrCgriu55BNp27WbrMm3QW_vSkVj54GZn7v8eYRckEho0D5zTpbb2uT5UDLDHgGwA7IhELFU5CFPCQTEAVLBaPlMTkJYQ1AaSnkhPA5GteF3u9Mb7uPZOp2AVeurdEnezS98yFpvNskC9226dun7brv5HV-Ro4a3QY8_52n5P3xYXE_TWcvT8_3d7PUlCD7tGASJS5zw5q81KwpKih5xZu6gSXTujI6l1SDgJpzxliVQzyhqEvaiCXQnJ2Sy9HXhd6qYGyPZhUDdzGainee0ypC1yO00q3aervR_ls5bdX0bqaGXXQqqJDlnkb2amS33n3tMPRqY4PBttUdxtcVFUIC4yBERPMRNd6F4LH586aghtbVWg2tq6F1BVzF1qPodhRhbGVv0Q-hsTNYWz9krp39T_4D1W-IxA</recordid><startdate>20151101</startdate><enddate>20151101</enddate><creator>Ballard, G.</creator><creator>Demmel, J.</creator><creator>Grigori, L.</creator><creator>Jacquelin, M.</creator><creator>Knight, 
N.</creator><creator>Nguyen, H.D.</creator><general>Elsevier Inc</general><general>Elsevier</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>1XC</scope><scope>OIOZB</scope><scope>OTOTI</scope><orcidid>https://orcid.org/0000-0002-5880-1076</orcidid></search><sort><creationdate>20151101</creationdate><title>Reconstructing Householder vectors from Tall-Skinny QR</title><author>Ballard, G. ; Demmel, J. ; Grigori, L. ; Jacquelin, M. ; Knight, N. ; Nguyen, H.D.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c508t-438e8eb2c3f25a3f4905696fdf0b3aa9ca281a070d66333920fdfe7d51f7b0123</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Algorithms</topic><topic>Asymptotic properties</topic><topic>Communication-avoiding algorithms</topic><topic>Computer Science</topic><topic>Dense linear algebra</topic><topic>Distributed, Parallel, and Cluster Computing</topic><topic>Mathematical analysis</topic><topic>Mathematical models</topic><topic>MATHEMATICS AND COMPUTING</topic><topic>Numerical stability</topic><topic>QR decomposition</topic><topic>Reconstruction</topic><topic>Representations</topic><topic>Vectors (mathematics)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ballard, G.</creatorcontrib><creatorcontrib>Demmel, J.</creatorcontrib><creatorcontrib>Grigori, L.</creatorcontrib><creatorcontrib>Jacquelin, M.</creatorcontrib><creatorcontrib>Knight, N.</creatorcontrib><creatorcontrib>Nguyen, H.D.</creatorcontrib><creatorcontrib>Sandia National Lab. 
(SNL-CA), Livermore, CA (United States)</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>OSTI.GOV - Hybrid</collection><collection>OSTI.GOV</collection><jtitle>Journal of parallel and distributed computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ballard, G.</au><au>Demmel, J.</au><au>Grigori, L.</au><au>Jacquelin, M.</au><au>Knight, N.</au><au>Nguyen, H.D.</au><aucorp>Sandia National Lab. (SNL-CA), Livermore, CA (United States)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Reconstructing Householder vectors from Tall-Skinny QR</atitle><jtitle>Journal of parallel and distributed computing</jtitle><date>2015-11-01</date><risdate>2015</risdate><volume>85</volume><issue>C</issue><spage>3</spage><epage>31</epage><pages>3-31</pages><issn>0743-7315</issn><eissn>1096-0848</eissn><abstract>The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. 
We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with Communication-Avoiding QR (CAQR) algorithm. Experiments on supercomputers demonstrate the benefits of the communication cost improvements: in particular, our experiments show substantial improvements over tuned library implementations for tall-and-skinny matrices. We also provide algorithmic improvements to the Householder QR and CAQR algorithms, and we investigate several alternatives to the Householder reconstruction algorithm that sacrifice guarantees on numerical stability in some cases in order to obtain higher performance. •We reconstruct Householder vectors representing the Q-factor from Tall-Skinny QR.•Our approach has the same asymptotic communication efficiency as TSQR.•Additionally, it enables more communication-efficient parallel QR algorithms.•We also provide algorithmic improvements to the Householder QR and CAQR algorithms.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><doi>10.1016/j.jpdc.2015.06.003</doi><tpages>29</tpages><orcidid>https://orcid.org/0000-0002-5880-1076</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0743-7315
ispartof Journal of parallel and distributed computing, 2015-11, Vol.85 (C), p.3-31
issn 0743-7315
1096-0848
language eng
recordid cdi_osti_scitechconnect_1236219
source ScienceDirect Journals (5 years ago - present)
subjects Algorithms
Asymptotic properties
Communication-avoiding algorithms
Computer Science
Dense linear algebra
Distributed, Parallel, and Cluster Computing
Mathematical analysis
Mathematical models
MATHEMATICS AND COMPUTING
Numerical stability
QR decomposition
Reconstruction
Representations
Vectors (mathematics)
title Reconstructing Householder vectors from Tall-Skinny QR
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T23%3A26%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_osti_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Reconstructing%20Householder%20vectors%20from%20Tall-Skinny%20QR&rft.jtitle=Journal%20of%20parallel%20and%20distributed%20computing&rft.au=Ballard,%20G.&rft.aucorp=Sandia%20National%20Lab.%20(SNL-CA),%20Livermore,%20CA%20(United%20States)&rft.date=2015-11-01&rft.volume=85&rft.issue=C&rft.spage=3&rft.epage=31&rft.pages=3-31&rft.issn=0743-7315&rft.eissn=1096-0848&rft_id=info:doi/10.1016/j.jpdc.2015.06.003&rft_dat=%3Cproquest_osti_%3E1778036077%3C/proquest_osti_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1778036077&rft_id=info:pmid/&rft_els_id=S074373151500101X&rfr_iscdi=true