Automatic code mapping on an intelligent memory architecture

This paper presents an algorithm to automatically map code on a generic intelligent memory system that consists of a high-end host processor and a simpler memory processor. To achieve high performance with this type of architecture, the code needs to be partitioned and scheduled such that each secti...

Bibliographic details
Published in: IEEE transactions on computers, 2001-11, Vol. 50 (11), p. 1248-1266
Main authors: Yan Solihin, Jaejin Lee, Torrellas, J.
Format: Article
Language: English
container_end_page 1266
container_issue 11
container_start_page 1248
container_title IEEE transactions on computers
container_volume 50
creator Yan Solihin
Jaejin Lee
Torrellas, J.
description This paper presents an algorithm to automatically map code on a generic intelligent memory system that consists of a high-end host processor and a simpler memory processor. To achieve high performance with this type of architecture, the code needs to be partitioned and scheduled such that each section is assigned to the processor on which it runs most efficiently. In addition, the two processors should overlap their execution as much as possible. With our algorithm, applications are mapped fully automatically using both static and dynamic information. Using a set of standard applications and a simulated architecture, we obtain average speedups of 1.7 for numerical applications and 1.2 for nonnumerical applications over a single host with plain memory. The speedups are very close and often higher than ideal speedups on a more expensive multiprocessor system composed of two identical host processors. Our work shows that heterogeneity can be cost-effectively exploited and represents one step toward effectively mapping code on heterogeneous intelligent memory systems.
doi_str_mv 10.1109/12.966498
format Article
eissn 1557-9956
coden ITCOB4
publisher New York: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2001
toplevel peer_reviewed
date 2001-11-01
tpages 19
fulltext fulltext_linktorsrc
identifier ISSN: 0018-9340
ispartof IEEE transactions on computers, 2001-11, Vol.50 (11), p.1248-1266
issn 0018-9340
1557-9956
language eng
recordid cdi_proquest_miscellaneous_27033842
source IEEE Electronic Library (IEL)
subjects Algorithms
Application software
Architecture
Computer architecture
Computer simulation
Coprocessors
Delay
Dynamical systems
Dynamics
Heterogeneity
Intelligent systems
Mapping
Memory architecture
Microprocessors
Multiprocessing systems
Partitioning algorithms
Processor scheduling
Proposals
Studies
title Automatic code mapping on an intelligent memory architecture
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T23%3A19%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automatic%20code%20mapping%20on%20an%20intelligent%20memory%20architecture&rft.jtitle=IEEE%20transactions%20on%20computers&rft.au=Yan%20Solihin&rft.date=2001-11-01&rft.volume=50&rft.issue=11&rft.spage=1248&rft.epage=1266&rft.pages=1248-1266&rft.issn=0018-9340&rft.eissn=1557-9956&rft.coden=ITCOB4&rft_id=info:doi/10.1109/12.966498&rft_dat=%3Cproquest_RIE%3E27033842%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=884828523&rft_id=info:pmid/&rft_ieee_id=966498&rfr_iscdi=true
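
The description above says only that each code section should go to the processor on which it runs most efficiently, and that the two processors should overlap their execution. As a rough illustration of that idea, and not the authors' algorithm, the toy sketch below assigns each section to whichever processor has the lower estimated run time and then compares the result against a host-only run. The `Section` type, the function names, and all timing numbers are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """One code section with hypothetical run-time estimates (arbitrary units)."""
    name: str
    host_time: float    # assumed estimate on the high-end host processor
    memory_time: float  # assumed estimate on the simpler memory processor


def map_sections(sections):
    """Assign each section to the processor with the lower estimated time."""
    return {
        s.name: "host" if s.host_time <= s.memory_time else "memory"
        for s in sections
    }


def per_processor_time(sections, mapping):
    """Total estimated time spent on each processor under a given mapping."""
    host = sum(s.host_time for s in sections if mapping[s.name] == "host")
    memory = sum(s.memory_time for s in sections if mapping[s.name] == "memory")
    return host, memory


def speedup_serial(sections, mapping):
    """Speedup over host-only execution when sections run one after another."""
    baseline = sum(s.host_time for s in sections)
    host, memory = per_processor_time(sections, mapping)
    return baseline / (host + memory)


def speedup_full_overlap(sections, mapping):
    """Optimistic bound: assumes the two processors overlap perfectly,
    which ignores any dependences between sections."""
    baseline = sum(s.host_time for s in sections)
    host, memory = per_processor_time(sections, mapping)
    return baseline / max(host, memory)


# Made-up example: one compute-heavy section, one memory-bound section.
sections = [
    Section("compute_kernel", host_time=4.0, memory_time=10.0),
    Section("pointer_scan", host_time=6.0, memory_time=2.0),
]
mapping = map_sections(sections)
print(mapping)
print(speedup_serial(sections, mapping))
print(speedup_full_overlap(sections, mapping))
```

A real mapper would also have to account for data dependences and communication between the two processors; the description mentions that the published algorithm combines static and dynamic information, which this sketch does not attempt to model.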