Reducing operand communication overhead using instruction clustering for multimedia applications

Bibliographic details
Main authors:	Hongkyu Kim; Wills, D.S.; Wills, L.M.
Format:	Conference proceedings
Language:	eng
Subjects:	Broadcast technology; Costs; Delay; Dynamic scheduling; Global communication; Multimedia communication; Multimedia systems; Performance gain; Runtime; Transportation

container_start_page	8 pp.
creator	Hongkyu Kim; Wills, D.S.; Wills, L.M.
description	As technology trends yield shorter cycle times and larger, wider datapaths in multimedia system architectures, global broadcast networks for operand communication are becoming a major bottleneck in processor performance, and new low-latency operand transport techniques are needed. This paper proposes and evaluates mechanisms that cost less than traditional bypass networks by exploiting the regular operand distribution patterns of multimedia applications. To reduce the latency of operand movement within a datapath, our mechanism, called dynamic instruction clustering, groups chains of dependent instructions within a basic block at runtime, identifies the intermediate values that must be transported between them, and schedules them on networked ALUs connected by a dedicated local network. Converting global communication into local communication minimizes transport latency and lets the critical path of the application code execute in consecutive, shortened cycles, which improves performance. We demonstrate that 28% and 30% of all dependence edges in the instruction window can be localized on 8-way and 16-way machines, respectively. Across a wide range of multimedia applications, our results show average performance gains of 16% for 8-way and 35% for 16-way machines.
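The abstract describes dynamic instruction clustering only at a high level. The Python sketch below is a hypothetical illustration, not the authors' implementation: it assumes a toy instruction format (`Instr`, with made-up register names), finds read-after-write dependence edges inside one basic block, greedily strings dependent instructions into chains, and reports the share of edges that join adjacent chain members, i.e. values that could plausibly travel over a local ALU-to-ALU link instead of the global bypass network. The function names `dependence_edges`, `cluster_chains`, and `localized_fraction` and the tail-extension heuristic are assumptions; the sketch models neither ALU assignment, the local network, nor timing, and its output is unrelated to the 28%/30% figures reported in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: at most one destination register plus sources."""
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def dependence_edges(block: List[Instr]) -> List[Tuple[int, int]]:
    """Read-after-write edges (producer, consumer) inside one basic block."""
    last_writer: Dict[str, int] = {}
    edges: List[Tuple[int, int]] = []
    for i, ins in enumerate(block):
        # One edge per distinct producer, even if a register is read twice.
        for p in sorted({last_writer[r] for r in ins.srcs if r in last_writer}):
            edges.append((p, i))
        if ins.dest is not None:
            last_writer[ins.dest] = i  # update after reading, so r1 = r1 + 1 works
    return edges


def cluster_chains(block: List[Instr]) -> Tuple[List[List[int]], List[Tuple[int, int]]]:
    """Greedily group dependent instructions into straight-line chains.

    An instruction extends a producer's chain only while that producer is
    still the chain's tail; otherwise it starts a new chain. This is an
    assumed heuristic, not the paper's runtime grouping logic.
    """
    edges = dependence_edges(block)
    producers: Dict[int, List[int]] = {i: [] for i in range(len(block))}
    for p, c in edges:
        producers[c].append(p)
    chain_of: Dict[int, int] = {}
    chains: List[List[int]] = []
    for i in range(len(block)):
        for p in producers[i]:
            chain = chains[chain_of[p]]
            if chain[-1] == p:  # producer still at the tail: extend its chain
                chain.append(i)
                chain_of[i] = chain_of[p]
                break
        else:  # no extendable producer: start a new chain
            chain_of[i] = len(chains)
            chains.append([i])
    return chains, edges


def localized_fraction(block: List[Instr]) -> float:
    """Share of dependence edges that link adjacent members of one chain."""
    chains, edges = cluster_chains(block)
    pos: Dict[int, Tuple[int, int]] = {}
    for cid, chain in enumerate(chains):
        for k, i in enumerate(chain):
            pos[i] = (cid, k)
    local = sum(
        1 for p, c in edges
        if pos[p][0] == pos[c][0] and pos[c][1] == pos[p][1] + 1
    )
    return local / len(edges) if edges else 0.0


# Example basic block (register names are illustrative; r0 is a live-in).
example_block = [
    Instr("load", "r1", ("r0",)),
    Instr("mul", "r2", ("r1", "r5")),
    Instr("add", "r3", ("r2", "r6")),
    Instr("sub", "r4", ("r1", "r7")),
]

if __name__ == "__main__":
    print(cluster_chains(example_block)[0])              # [[0, 1, 2], [3]]
    print(round(localized_fraction(example_block), 3))   # 0.667
```

In the example block the load feeds both the multiply and the subtract, so only one of them can continue the load's chain; two of the three edges end up local. The sketch places no limit on chain length or on the number of chains.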
doi_str_mv	10.1109/ISM.2005.95
format	Conference Proceeding
publisher	IEEE
creationdate	2005
linktorsrc	https://ieeexplore.ieee.org/document/1565852
identifier	ISBN: 0769524893; ISBN: 9780769524894
ispartof	Seventh IEEE International Symposium on Multimedia (ISM'05), 2005, 8 pp.
language	eng
recordid	cdi_ieee_primary_1565852
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Broadcast technology; Costs; Delay; Dynamic scheduling; Global communication; Multimedia communication; Multimedia systems; Performance gain; Runtime; Transportation
title	Reducing operand communication overhead using instruction clustering for multimedia applications
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T02%3A25%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Reducing%20operand%20communication%20overhead%20using%20instruction%20clustering%20for%20multimedia%20applications&rft.btitle=Seventh%20IEEE%20International%20Symposium%20on%20Multimedia%20(ISM'05)&rft.au=Hongkyu%20Kim&rft.date=2005&rft.spage=8%20pp.&rft.pages=8%20pp.-&rft.isbn=0769524893&rft.isbn_list=9780769524894&rft_id=info:doi/10.1109/ISM.2005.95&rft_dat=%3Cieee_6IE%3E1565852%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=1565852&rfr_iscdi=true