Cache Optimization for Coarse Grain Task Parallel Processing Using Inter-Array Padding

The wide use of multiprocessor systems has made automatic parallelizing compilers increasingly important. To further improve multiprocessor performance through compilation, multigrain parallelization is needed: coarse grain task parallelism among loops and subroutines and near fine grain parallelism among statements are exploited in addition to traditional loop parallelism. Locality optimization that uses the cache effectively is also important for performance. This paper describes inter-array padding to minimize cache conflict misses among macro-tasks, combined with a data localization scheme that decomposes loops sharing the same arrays so that their working sets fit the cache size and executes the decomposed loops consecutively on the same processor. In a performance evaluation on a Sun Ultra 80 (4 processors), the OSCAR compiler, in which the proposed scheme is implemented, achieved a 2.5 times speedup over the maximum performance of the Sun Forte compiler's automatic loop parallelization, averaged over the SPEC CFP95 programs tomcatv, swim, hydro2d, and turb3d. The OSCAR compiler also showed a 2.1 times speedup over the XLF compiler on an IBM RS/6000 44p-270 (4 processors).
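
To make the padding idea in the abstract concrete, here is a minimal C sketch written under stated assumptions: the cache size, array names, and pad size are invented for illustration, and the paper itself targets Fortran programs (the SPEC CFP95 codes), where the OSCAR compiler applies the transformation automatically as part of the data localization scheme.

```c
/*
 * Illustrative sketch of inter-array padding (not taken from the paper).
 * When two arrays are laid out a multiple of the cache size apart, a[i] and
 * b[i] map to the same set of a direct-mapped or low-associativity cache, so
 * a loop touching both evicts one stream with the other on every iteration
 * (conflict misses).  Inserting a small pad array between them shifts the
 * mapping of b so the two streams no longer collide while the decomposed
 * loops' working sets stay cache-resident.
 */
#include <stddef.h>
#include <stdio.h>

#define CACHE_BYTES (64 * 1024)                     /* assumed cache capacity    */
#define N           (CACHE_BYTES / sizeof(double))  /* one array fills the cache */
#define PAD_DOUBLES (128 / sizeof(double))          /* pad by one 128-byte block */

static double a[N];
static double pad[PAD_DOUBLES];   /* inter-array padding between a and b */
static double b[N];

int main(void)
{
    /* Two loops sharing a and b; under the data localization scheme such
     * loops are decomposed to fit the cache and executed back to back on
     * the same processor, so avoiding conflicts between a and b is what
     * keeps the shared data cache-resident between the loops. */
    for (size_t i = 0; i < N; i++)
        a[i] = (double)i;
    for (size_t i = 0; i < N; i++)
        b[i] = 2.0 * a[i];

    /* Print the layout; whether the pad actually separates a and b here
     * depends on how the toolchain places globals. */
    printf("a   at %p (%zu bytes)\n", (void *)a, sizeof a);
    printf("pad at %p (%zu bytes)\n", (void *)pad, sizeof pad);
    printf("b   at %p, b[N-1] = %f\n", (void *)b, b[N - 1]);
    return 0;
}
```

In source code the placement of globals is ultimately up to the compiler and linker, which is one reason the padding is best inserted by the compiler itself, as proposed in the paper.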


Bibliographic Details
Main authors: Ishizaka, Kazuhisa; Obata, Motoki; Kasahara, Hironori
Format: Book Chapter
Language: English
Online access: Full text
container_end_page 76
container_issue
container_start_page 64
container_title Languages and Compilers for Parallel Computing
container_volume 2958
creator Ishizaka, Kazuhisa
Obata, Motoki
Kasahara, Hironori
description The wide use of multiprocessor systems has made automatic parallelizing compilers increasingly important. To further improve multiprocessor performance through compilation, multigrain parallelization is needed: coarse grain task parallelism among loops and subroutines and near fine grain parallelism among statements are exploited in addition to traditional loop parallelism. Locality optimization that uses the cache effectively is also important for performance. This paper describes inter-array padding to minimize cache conflict misses among macro-tasks, combined with a data localization scheme that decomposes loops sharing the same arrays so that their working sets fit the cache size and executes the decomposed loops consecutively on the same processor. In a performance evaluation on a Sun Ultra 80 (4 processors), the OSCAR compiler, in which the proposed scheme is implemented, achieved a 2.5 times speedup over the maximum performance of the Sun Forte compiler's automatic loop parallelization, averaged over the SPEC CFP95 programs tomcatv, swim, hydro2d, and turb3d. The OSCAR compiler also showed a 2.1 times speedup over the XLF compiler on an IBM RS/6000 44p-270 (4 processors).
doi_str_mv 10.1007/978-3-540-24644-2_5
format Book Chapter
contributor Rauchwerger, Lawrence
publisher Springer Berlin / Heidelberg
series Lecture Notes in Computer Science
isbn 9783540211990
3540211993
eisbn 9783540246442
3540246444
oclc 934980672
lccallnum QA76.76.C65
rights Springer-Verlag Berlin Heidelberg 2004
2004 INIST-CNRS
link https://link.springer.com/10.1007/978-3-540-24644-2_5
fulltext fulltext
identifier ISSN: 0302-9743
ispartof Languages and Compilers for Parallel Computing, 2004, Vol.2958, p.64-76
issn 0302-9743
1611-3349
language eng
recordid cdi_pascalfrancis_primary_15758886
source Springer Books
subjects Applied sciences
Computer science; control theory; systems
Computer systems and distributed systems. User interface
Exact sciences and technology
Padding
Paral
Programming languages
Software
Timothy
title Cache Optimization for Coarse Grain Task Parallel Processing Using Inter-Array Padding
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T06%3A18%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=Cache%20Optimization%20for%20Coarse%20Grain%20Task%20Parallel%20Processing%20Using%20Inter-Array%20Padding&rft.btitle=Languages%20and%20Compilers%20for%20Parallel%20Computing&rft.au=Ishizaka,%20Kazuhisa&rft.date=2004&rft.volume=2958&rft.spage=64&rft.epage=76&rft.pages=64-76&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783540211990&rft.isbn_list=3540211993&rft_id=info:doi/10.1007/978-3-540-24644-2_5&rft_dat=%3Cproquest_pasca%3EEBC3088278_11_72%3C/proquest_pasca%3E%3Curl%3E%3C/url%3E&rft.eisbn=9783540246442&rft.eisbn_list=3540246444&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=EBC3088278_11_72&rft_id=info:pmid/&rfr_iscdi=true