Low cost and high throughput multiplierless design of a 16 point 1-D DCT of the new HEVC video coding standard
This paper presents the hardware design of a 16-point 1-D DCT used in the emerging video coding standard HEVC (High Efficiency Video Coding). The 1-D DCT is used by the 16×16 2-D DCT of the HEVC standard. The transform stage is one of the innovations proposed by HEVC, not only because of the varia...
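Main authors: | Jeske, R. ; de Souza, J. C. ; Wrege, G. ; Conceicao, R. ; Grellert, M. ; Mattos, J. ; Agostini, L. |
---|---|
Format: | Conference proceeding |
Language: | eng ; jpn |
Subjects: | 1-D DCT ; Algorithm design and analysis ; Computer architecture ; Discrete cosine transforms ; Equations ; FPGA design ; Hardware ; HEVC ; Mathematical model ; video coding |
Online access: | Order full text |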
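The abstract's real-time claim (a worst-case throughput of 376 Msamples/s being enough for more than 30 QFHD frames per second) can be sanity-checked with a little arithmetic. The sketch below is not taken from the paper. It assumes 4:2:0 chroma subsampling (1.5 samples per pixel) and counts each sample as passing through the 1-D transform once, ignoring that a full 2-D transform needs a row pass and a column pass. The function name `max_fps` and the constants are hypothetical.

```python
# Rough frame-rate check for the throughput figures quoted in the abstract.
# Assumption (not stated in the record): 4:2:0 video, i.e. one luma sample
# per pixel plus two chroma planes at quarter resolution = 1.5 samples/pixel.

QFHD_WIDTH = 3840
QFHD_HEIGHT = 2160
SAMPLES_PER_PIXEL_420 = 1.5  # assumed chroma format


def max_fps(throughput_samples_per_s: float,
            width: int = QFHD_WIDTH,
            height: int = QFHD_HEIGHT,
            samples_per_pixel: float = SAMPLES_PER_PIXEL_420) -> float:
    """Frames per second sustainable at the given sample throughput."""
    samples_per_frame = width * height * samples_per_pixel
    return throughput_samples_per_s / samples_per_frame


if __name__ == "__main__":
    # Lowest and highest throughput reported in the abstract.
    print(f"worst case: {max_fps(376e6):.1f} fps")
    print(f"best case:  {max_fps(1.4e9):.1f} fps")
```

Under these assumptions the lowest reported throughput lands just above 30 frames per second, which is consistent with the abstract's wording. A different chroma format or a different accounting of the two 1-D passes would change the result.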
container_end_page | 6 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | |
container_volume | |
creator | Jeske, R. ; de Souza, J. C. ; Wrege, G. ; Conceicao, R. ; Grellert, M. ; Mattos, J. ; Agostini, L. |
description | This paper presents the hardware design of a 16-point 1-D DCT used in the emerging video coding standard HEVC (High Efficiency Video Coding). The 1-D DCT is used by the 16×16 2-D DCT of the HEVC standard. The transform stage is one of the innovations proposed by HEVC, not only because of the variable transform size (from 4×4 to 32×32) but also because transforms larger than the traditional 4×4 and 8×8 are used. The hardware design presented in this work focuses on low cost and high throughput. To achieve these objectives, the 16-point algorithm from HEVC was simplified so that a more efficient hardware design could be implemented. The strategies used during this simplification include operation reordering, factoring to reduce the bit length of the operators, replacing multiplications by constants with shifts and adds, and sub-expression sharing, among others. The architecture was designed in a fully combinational way in order to reduce hardware overhead. Synthesis results for Altera FPGAs from the Cyclone II and Stratix III families showed a reduction in hardware resources of up to 72% compared to an architecture described as a direct transcription of the non-optimized version of the algorithm. Even with a purely combinational implementation, the designed architecture achieved a throughput between 376 Msamples/s and 1.4 Gsamples/s. With these results, the architecture can process, in the worst case, more than 30 QFHD frames (3840×2160 pixels) per second, and is therefore capable of processing very high resolution video in real time. To the best of our knowledge, this is the first work in the literature that presents hardware results for the HEVC transforms. |
doi_str_mv | 10.1109/SPL.2012.6211786 |
format | Conference Proceeding |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 9781467301848 |
ispartof | 2012 VIII Southern Conference on Programmable Logic, 2012, p.1-6 |
issn | |
language | eng ; jpn |
recordid | cdi_ieee_primary_6211786 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | 1-D DCT ; Algorithm design and analysis ; Computer architecture ; Discrete cosine transforms ; Equations ; FPGA design ; Hardware ; HEVC ; Mathematical model ; video coding |
title | Low cost and high throughput multiplierless design of a 16 point 1-D DCT of the new HEVC video coding standard |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T20%3A53%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Low%20cost%20and%20high%20throughput%20multiplierless%20design%20of%20a%2016%20point%201-D%20DCT%20of%20the%20new%20HEVC%20video%20coding%20standard&rft.btitle=2012%20VIII%20Southern%20Conference%20on%20Programmable%20Logic&rft.au=Jeske,%20R.&rft.date=2012-03&rft.spage=1&rft.epage=6&rft.pages=1-6&rft.isbn=9781467301848&rft.isbn_list=1467301841&rft_id=info:doi/10.1109/SPL.2012.6211786&rft_dat=%3Cieee_6IE%3E6211786%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781467301862&rft.eisbn_list=1467301868&rft.eisbn_list=146730185X&rft.eisbn_list=9781467301855&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6211786&rfr_iscdi=true |
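The description field mentions replacing multiplications by constants with shifts and adds, and sharing sub-expressions. The following is a minimal sketch of that general idea, not the decomposition used by the authors. The constants 9, 18, 36, 83 and 90 appear in the HEVC core transform matrix; the function name `times_constants` and the particular factorings chosen are illustrative assumptions.

```python
# Multiplierless constant multiplication with sub-expression sharing.
# Illustrative only: the paper's actual decompositions are not given in
# this record, so the factorings below are assumptions.


def times_constants(x: int) -> dict[int, int]:
    """Return {c: c * x} for a few constants using only shifts and adds."""
    # Shared sub-expression: 9x = 8x + x (one shift, one add).
    x9 = (x << 3) + x
    # 18x and 36x reuse 9x and cost only a shift each.
    x18 = x9 << 1
    x36 = x9 << 2
    # 90x = 72x + 18x, again built from the shared 9x term.
    x90 = (x9 << 3) + x18
    # 83x = 64x + 16x + 2x + x, a direct power-of-two expansion.
    x83 = (x << 6) + (x << 4) + (x << 1) + x
    return {9: x9, 18: x18, 36: x36, 83: x83, 90: x90}


if __name__ == "__main__":
    # Compare against ordinary multiplication for a few sample inputs,
    # including negative values (Python's << is arithmetic on signed ints).
    for sample in (-255, -1, 0, 1, 7, 255):
        for constant, product in times_constants(sample).items():
            assert product == constant * sample
    print("all shift-and-add products match")
```

In hardware, each shift is just wiring, so the cost is dominated by the adders. Reusing the 9x term for 18x, 36x and 90x is the kind of saving that sub-expression sharing aims for.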