Reducing operand communication overhead using instruction clustering for multimedia applications

As technology trends yield shorter cycle times and larger, wider datapaths in architectures for multimedia systems, global broadcast networks for operand communication are becoming a major bottleneck in processor performance. New low latency operand transport techniques are needed. This paper propos...

Bibliographic details
Main authors: Hongkyu Kim; Wills, D.S.; Wills, L.M.
Format: Conference proceeding
Language: eng
Subjects:
Online access: Order full text
container_end_page
container_issue
container_start_page 8 pp.
container_title
container_volume
creator Hongkyu Kim
Wills, D.S.
Wills, L.M.
description As technology trends yield shorter cycle times and larger, wider datapaths in architectures for multimedia systems, global broadcast networks for operand communication are becoming a major bottleneck in processor performance. New low latency operand transport techniques are needed. This paper proposes and evaluates lower cost mechanisms than traditional bypass networks, exploiting regular operand distribution patterns in multimedia applications. To reduce latency associated with operand movement within a datapath, our mechanism, called dynamic instruction clustering, groups chains of dependent instructions within a basic block at runtime, identifies intermediate value transportation, and schedules it on networked ALUs which are connected by a local dedicated network. By converting global communication into local, the transport latency can be minimized and the critical path of the application code can be executed in consecutive, shortened cycles, resulting in improved performance. We demonstrated that 28% and 30% of total dependence edges residing in the instruction window can be localized on 8 and 16-way machines, respectively. Our results show that the overall performance gains over a wide range of multimedia applications are 16% for 8-way and 35% for 16-way on average.
doi_str_mv 10.1109/ISM.2005.95
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_1565852</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1565852</ieee_id><sourcerecordid>1565852</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-c6ca060cea1d3e6263ae810a6cfe9622e6a80cc6574ad8480b7725a646f5fa583</originalsourceid><addsrcrecordid>eNotjEtLxDAUhQMiqOOsXLrpH2i9TZrbZCmDj4ERQWc_XpNbjfRF0gr-e51xzubA-Q6fEFclFGUJ9mb9-lRIAF1YfSIuoEarZWWsOhPLlL7gL8pqBHku3l7Yzy70H9kwcqTeZ27ourkPjqYw9NnwzfGTyWdz2p9Cn6Y4uwNy7Zwmjvu5GWLWze0UOvaBMhrH9ihIl-K0oTbx8tgLsb2_264e883zw3p1u8mDhSl36AgQHFPpFaNERWxKIHQNW5SSkQw4h7quyJvKwHtdS01YYaMb0kYtxPW_NjDzboyho_izKzVqo6X6Be55VAc</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Reducing operand communication overhead using instruction clustering for multimedia applications</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Hongkyu Kim ; Wills, D.S. ; Wills, L.M.</creator><creatorcontrib>Hongkyu Kim ; Wills, D.S. ; Wills, L.M.</creatorcontrib><description>As technology trends yield shorter cycle times and larger, wider datapaths in architectures for multimedia systems, global broadcast networks for operand communication are becoming a major bottleneck in processor performance. New low latency operand transport techniques are needed. This paper proposes and evaluates lower cost mechanisms than traditional bypass networks, exploiting regular operand distribution patterns in multimedia applications. To reduce latency associated with operand movement within a datapath, our mechanism, called dynamic instruction clustering, groups chains of dependent instructions within a basic block at runtime, identifies intermediate value transportation, and schedules it on networked ALUs which are connected by a local dedicated network. 
By converting global communication into local, the transport latency can be minimized and the critical path of the application code can be executed in consecutive, shortened cycles, resulting in improved performance. We demonstrated that 28% and 30% of total dependence edges residing in the instruction window can be localized on 8 and 16-way machines, respectively. Our results show that the overall performance gains over a wide range of multimedia applications are 16% for 8-way and 35% for 16-way on average.</description><identifier>ISBN: 0769524893</identifier><identifier>ISBN: 9780769524894</identifier><identifier>DOI: 10.1109/ISM.2005.95</identifier><language>eng</language><publisher>IEEE</publisher><subject>Broadcast technology ; Costs ; Delay ; Dynamic scheduling ; Global communication ; Multimedia communication ; Multimedia systems ; Performance gain ; Runtime ; Transportation</subject><ispartof>Seventh IEEE International Symposium on Multimedia (ISM'05), 2005, p.8 pp.</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1565852$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,4036,4037,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/1565852$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Hongkyu Kim</creatorcontrib><creatorcontrib>Wills, D.S.</creatorcontrib><creatorcontrib>Wills, L.M.</creatorcontrib><title>Reducing operand communication overhead using instruction clustering for multimedia applications</title><title>Seventh IEEE International Symposium on Multimedia (ISM'05)</title><addtitle>ISM</addtitle><description>As technology trends yield shorter cycle times and larger, wider datapaths in architectures for multimedia 
systems, global broadcast networks for operand communication are becoming a major bottleneck in processor performance. New low latency operand transport techniques are needed. This paper proposes and evaluates lower cost mechanisms than traditional bypass networks, exploiting regular operand distribution patterns in multimedia applications. To reduce latency associated with operand movement within a datapath, our mechanism, called dynamic instruction clustering, groups chains of dependent instructions within a basic block at runtime, identifies intermediate value transportation, and schedules it on networked ALUs which are connected by a local dedicated network. By converting global communication into local, the transport latency can be minimized and the critical path of the application code can be executed in consecutive, shortened cycles, resulting in improved performance. We demonstrated that 28% and 30% of total dependence edges residing in the instruction window can be localized on 8 and 16-way machines, respectively. 
Our results show that the overall performance gains over a wide range of multimedia applications are 16% for 8-way and 35% for 16-way on average.</description><subject>Broadcast technology</subject><subject>Costs</subject><subject>Delay</subject><subject>Dynamic scheduling</subject><subject>Global communication</subject><subject>Multimedia communication</subject><subject>Multimedia systems</subject><subject>Performance gain</subject><subject>Runtime</subject><subject>Transportation</subject><isbn>0769524893</isbn><isbn>9780769524894</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2005</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotjEtLxDAUhQMiqOOsXLrpH2i9TZrbZCmDj4ERQWc_XpNbjfRF0gr-e51xzubA-Q6fEFclFGUJ9mb9-lRIAF1YfSIuoEarZWWsOhPLlL7gL8pqBHku3l7Yzy70H9kwcqTeZ27ourkPjqYw9NnwzfGTyWdz2p9Cn6Y4uwNy7Zwmjvu5GWLWze0UOvaBMhrH9ihIl-K0oTbx8tgLsb2_264e883zw3p1u8mDhSl36AgQHFPpFaNERWxKIHQNW5SSkQw4h7quyJvKwHtdS01YYaMb0kYtxPW_NjDzboyho_izKzVqo6X6Be55VAc</recordid><startdate>2005</startdate><enddate>2005</enddate><creator>Hongkyu Kim</creator><creator>Wills, D.S.</creator><creator>Wills, L.M.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>2005</creationdate><title>Reducing operand communication overhead using instruction clustering for multimedia applications</title><author>Hongkyu Kim ; Wills, D.S. 
; Wills, L.M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-c6ca060cea1d3e6263ae810a6cfe9622e6a80cc6574ad8480b7725a646f5fa583</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Broadcast technology</topic><topic>Costs</topic><topic>Delay</topic><topic>Dynamic scheduling</topic><topic>Global communication</topic><topic>Multimedia communication</topic><topic>Multimedia systems</topic><topic>Performance gain</topic><topic>Runtime</topic><topic>Transportation</topic><toplevel>online_resources</toplevel><creatorcontrib>Hongkyu Kim</creatorcontrib><creatorcontrib>Wills, D.S.</creatorcontrib><creatorcontrib>Wills, L.M.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Hongkyu Kim</au><au>Wills, D.S.</au><au>Wills, L.M.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Reducing operand communication overhead using instruction clustering for multimedia applications</atitle><btitle>Seventh IEEE International Symposium on Multimedia (ISM'05)</btitle><stitle>ISM</stitle><date>2005</date><risdate>2005</risdate><spage>8 pp.</spage><pages>8 pp.-</pages><isbn>0769524893</isbn><isbn>9780769524894</isbn><abstract>As technology trends yield shorter cycle times and larger, wider datapaths in architectures for multimedia systems, global broadcast networks for operand communication are becoming a major bottleneck in processor performance. 
New low latency operand transport techniques are needed. This paper proposes and evaluates lower cost mechanisms than traditional bypass networks, exploiting regular operand distribution patterns in multimedia applications. To reduce latency associated with operand movement within a datapath, our mechanism, called dynamic instruction clustering, groups chains of dependent instructions within a basic block at runtime, identifies intermediate value transportation, and schedules it on networked ALUs which are connected by a local dedicated network. By converting global communication into local, the transport latency can be minimized and the critical path of the application code can be executed in consecutive, shortened cycles, resulting in improved performance. We demonstrated that 28% and 30% of total dependence edges residing in the instruction window can be localized on 8 and 16-way machines, respectively. Our results show that the overall performance gains over a wide range of multimedia applications are 16% for 8-way and 35% for 16-way on average.</abstract><pub>IEEE</pub><doi>10.1109/ISM.2005.95</doi></addata></record>
fulltext fulltext_linktorsrc
identifier ISBN: 0769524893
ispartof Seventh IEEE International Symposium on Multimedia (ISM'05), 2005, p.8 pp.
issn
language eng
recordid cdi_ieee_primary_1565852
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Broadcast technology
Costs
Delay
Dynamic scheduling
Global communication
Multimedia communication
Multimedia systems
Performance gain
Runtime
Transportation
title Reducing operand communication overhead using instruction clustering for multimedia applications
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T02%3A25%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Reducing%20operand%20communication%20overhead%20using%20instruction%20clustering%20for%20multimedia%20applications&rft.btitle=Seventh%20IEEE%20International%20Symposium%20on%20Multimedia%20(ISM'05)&rft.au=Hongkyu%20Kim&rft.date=2005&rft.spage=8%20pp.&rft.pages=8%20pp.-&rft.isbn=0769524893&rft.isbn_list=9780769524894&rft_id=info:doi/10.1109/ISM.2005.95&rft_dat=%3Cieee_6IE%3E1565852%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=1565852&rfr_iscdi=true