Low cost and high throughput multiplierless design of a 16 point 1-D DCT of the new HEVC video coding standard

This paper presents the hardware design of a 16-points 1-D DCT used in the emerging video coding standard HEVC - High Efficiency Video Coding. The 1-D DCT is used by the 16×16 2-D DCT of the HEVC standard. The transforms stage is one of the innovations proposed by HEVC, not only because of the varia...

Bibliographic details
Main authors: Jeske, R., de Souza, J. C., Wrege, G., Conceicao, R., Grellert, M., Mattos, J., Agostini, L.
Format: Conference proceeding
Language: eng ; jpn
container_end_page 6
container_issue
container_start_page 1
container_title
container_volume
creator Jeske, R.
de Souza, J. C.
Wrege, G.
Conceicao, R.
Grellert, M.
Mattos, J.
Agostini, L.
description This paper presents the hardware design of a 16-point 1-D DCT used in the emerging video coding standard HEVC (High Efficiency Video Coding). The 1-D DCT is used by the 16×16 2-D DCT of the HEVC standard. The transform stage is one of the innovations proposed by HEVC, not only because of the variable size (from 4×4 to 32×32) but also because transforms larger than the traditional 4×4 and 8×8 are used. The hardware design presented in this work focuses on low cost and high throughput. To achieve these objectives, the 16-point algorithm from HEVC was simplified so that a more efficient hardware design could be implemented. Several strategies were used during this simplification, including operation reordering, factoring to compress the length of the operators, replacing multiplications by constants with shifts and adds, and sub-expression sharing. The architecture was designed in a fully combinational way in order to reduce hardware overhead. Synthesis results obtained using Altera FPGAs from the Cyclone II and Stratix III families showed a hardware resource reduction of up to 72% when compared to an architecture described as a direct transcription of the non-optimized version of the algorithm. Even with a purely combinational implementation, the designed architecture achieved a throughput between 376 Msamples/s and 1.4 Gsamples/s. With these results, the architecture is capable of processing, in the worst case, more than 30 QFHD frames (3840×2160 pixels) per second. Therefore, the architecture is capable of processing very high resolution video in real time. To the best of our knowledge, this is the first work in the literature that presents hardware results for the HEVC transforms.
doi_str_mv 10.1109/SPL.2012.6211786
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_6211786</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6211786</ieee_id><sourcerecordid>6211786</sourcerecordid><originalsourceid>FETCH-LOGICAL-i156t-266f4bf56063f56afb20a2d061dededc07af7ff528528ad086f3ef1d198bebbc3</originalsourceid><addsrcrecordid>eNpVkEFLw0AQhVdEUGrvgpf5A6k7m2SyPUparRBQsHotm-xuspImIbu1-O8bsRdnHu8x7_AdhrE75AtEvnx4fysWgqNYkEDMJF2w-TKTmFAWc5QkLv_dibxmc--_-DQZj0mmN6wr-iNUvQ-gOg2NqxsIzdgf6mY4BNgf2uCG1pmxNd6DNt7VHfQWFCDB0LsuAEYrWOXb3zY0BjpzhM36M4dvp00_obXravBhwqtR37Irq1pv5uecsY-n9TbfRMXr80v-WEQOUwqRILJJaVPiFE-ubCm4EpoTajNtxTNlM2tTIScpzSXZ2FjUuJSlKcsqnrH7P64zxuyG0e3V-LM7fyk-ATqfWpo</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Low cost and high throughput multiplierless design of a 16 point 1-D DCT of the new HEVC video coding standard</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Jeske, R. ; de Souza, J. C. ; Wrege, G. ; Conceicao, R. ; Grellert, M. ; Mattos, J. ; Agostini, L.</creator><creatorcontrib>Jeske, R. ; de Souza, J. C. ; Wrege, G. ; Conceicao, R. ; Grellert, M. ; Mattos, J. ; Agostini, L.</creatorcontrib><description>This paper presents the hardware design of a 16-points 1-D DCT used in the emerging video coding standard HEVC - High Efficiency Video Coding. The 1-D DCT is used by the 16×16 2-D DCT of the HEVC standard. The transforms stage is one of the innovations proposed by HEVC, not only because of the variable size (from 4×4 to 32×32) but also because higher dimension transforms other than the traditional 4×4 and 8×8 are used. The hardware design presented in this work focuses on low cost and high throughput. To achieve such objectives, the 16-points algorithm from HEVC was simplified, so that a more efficient hardware design could be implemented. 
Some strategies were used during this simplification, such as operations reordering, factoring to compress the length of the operators, multiplications by constant turned into shifts and adds, sub-expressions sharing, among others. The architecture was designed in a fully combinational way in order to reduce hardware overhead. Synthesis results obtained using Altera FPGAs from the Cyclone II and Stratix III families showed hardware resources reduction reaching 72% when compared to an architecture described as a direct transcription of the non-optimized version of the algorithm. Even with a purely combinational implementation, the designed architecture achieved a throughput between 376 Msamples/s and 1.4 Gsamples/s. With these results, the architecture is capable of processing, in the worst case, more than 30 QFHD frames (3840×2160 pixels) per second. Therefore, the architecture is capable of processing videos with significantly high resolutions in real time. To the best of our knowledge, this is the first work in the literature that presents hardware results for the HEVC transforms.</description><identifier>ISBN: 9781467301848</identifier><identifier>ISBN: 1467301841</identifier><identifier>EISBN: 9781467301862</identifier><identifier>EISBN: 1467301868</identifier><identifier>EISBN: 146730185X</identifier><identifier>EISBN: 9781467301855</identifier><identifier>DOI: 10.1109/SPL.2012.6211786</identifier><language>eng ; jpn</language><publisher>IEEE</publisher><subject>1-D DCT ; Algorithm design and analysis ; Computer architecture ; Discrete cosine transforms ; Equations ; FPGA design ; Hardware ; HEVC ; Mathematical model ; video coding</subject><ispartof>2012 VIII Southern Conference on Programmable Logic, 2012, 
p.1-6</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6211786$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6211786$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Jeske, R.</creatorcontrib><creatorcontrib>de Souza, J. C.</creatorcontrib><creatorcontrib>Wrege, G.</creatorcontrib><creatorcontrib>Conceicao, R.</creatorcontrib><creatorcontrib>Grellert, M.</creatorcontrib><creatorcontrib>Mattos, J.</creatorcontrib><creatorcontrib>Agostini, L.</creatorcontrib><title>Low cost and high throughput multiplierless design of a 16 point 1-D DCT of the new HEVC video coding standard</title><title>2012 VIII Southern Conference on Programmable Logic</title><addtitle>SPL</addtitle><description>This paper presents the hardware design of a 16-points 1-D DCT used in the emerging video coding standard HEVC - High Efficiency Video Coding. The 1-D DCT is used by the 16×16 2-D DCT of the HEVC standard. The transforms stage is one of the innovations proposed by HEVC, not only because of the variable size (from 4×4 to 32×32) but also because higher dimension transforms other than the traditional 4×4 and 8×8 are used. The hardware design presented in this work focuses on low cost and high throughput. To achieve such objectives, the 16-points algorithm from HEVC was simplified, so that a more efficient hardware design could be implemented. Some strategies were used during this simplification, such as operations reordering, factoring to compress the length of the operators, multiplications by constant turned into shifts and adds, sub-expressions sharing, among others. 
The architecture was designed in a fully combinational way in order to reduce hardware overhead. Synthesis results obtained using Altera FPGAs from the Cyclone II and Stratix III families showed hardware resources reduction reaching 72% when compared to an architecture described as a direct transcription of the non-optimized version of the algorithm. Even with a purely combinational implementation, the designed architecture achieved a throughput between 376 Msamples/s and 1.4 Gsamples/s. With these results, the architecture is capable of processing, in the worst case, more than 30 QFHD frames (3840×2160 pixels) per second. Therefore, the architecture is capable of processing videos with significantly high resolutions in real time. To the best of our knowledge, this is the first work in the literature that presents hardware results for the HEVC transforms.</description><subject>1-D DCT</subject><subject>Algorithm design and analysis</subject><subject>Computer architecture</subject><subject>Discrete cosine transforms</subject><subject>Equations</subject><subject>FPGA design</subject><subject>Hardware</subject><subject>HEVC</subject><subject>Mathematical model</subject><subject>video coding</subject><isbn>9781467301848</isbn><isbn>1467301841</isbn><isbn>9781467301862</isbn><isbn>1467301868</isbn><isbn>146730185X</isbn><isbn>9781467301855</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2012</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNpVkEFLw0AQhVdEUGrvgpf5A6k7m2SyPUparRBQsHotm-xuspImIbu1-O8bsRdnHu8x7_AdhrE75AtEvnx4fysWgqNYkEDMJF2w-TKTmFAWc5QkLv_dibxmc--_-DQZj0mmN6wr-iNUvQ-gOg2NqxsIzdgf6mY4BNgf2uCG1pmxNd6DNt7VHfQWFCDB0LsuAEYrWOXb3zY0BjpzhM36M4dvp00_obXravBhwqtR37Irq1pv5uecsY-n9TbfRMXr80v-WEQOUwqRILJJaVPiFE-ubCm4EpoTajNtxTNlM2tTIScpzSXZ2FjUuJSlKcsqnrH7P64zxuyG0e3V-LM7fyk-ATqfWpo</recordid><startdate>201203</startdate><enddate>201203</enddate><creator>Jeske, 
R.</creator><creator>de Souza, J. C.</creator><creator>Wrege, G.</creator><creator>Conceicao, R.</creator><creator>Grellert, M.</creator><creator>Mattos, J.</creator><creator>Agostini, L.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201203</creationdate><title>Low cost and high throughput multiplierless design of a 16 point 1-D DCT of the new HEVC video coding standard</title><author>Jeske, R. ; de Souza, J. C. ; Wrege, G. ; Conceicao, R. ; Grellert, M. ; Mattos, J. ; Agostini, L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i156t-266f4bf56063f56afb20a2d061dededc07af7ff528528ad086f3ef1d198bebbc3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng ; jpn</language><creationdate>2012</creationdate><topic>1-D DCT</topic><topic>Algorithm design and analysis</topic><topic>Computer architecture</topic><topic>Discrete cosine transforms</topic><topic>Equations</topic><topic>FPGA design</topic><topic>Hardware</topic><topic>HEVC</topic><topic>Mathematical model</topic><topic>video coding</topic><toplevel>online_resources</toplevel><creatorcontrib>Jeske, R.</creatorcontrib><creatorcontrib>de Souza, J. 
C.</creatorcontrib><creatorcontrib>Wrege, G.</creatorcontrib><creatorcontrib>Conceicao, R.</creatorcontrib><creatorcontrib>Grellert, M.</creatorcontrib><creatorcontrib>Mattos, J.</creatorcontrib><creatorcontrib>Agostini, L.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jeske, R.</au><au>de Souza, J. C.</au><au>Wrege, G.</au><au>Conceicao, R.</au><au>Grellert, M.</au><au>Mattos, J.</au><au>Agostini, L.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Low cost and high throughput multiplierless design of a 16 point 1-D DCT of the new HEVC video coding standard</atitle><btitle>2012 VIII Southern Conference on Programmable Logic</btitle><stitle>SPL</stitle><date>2012-03</date><risdate>2012</risdate><spage>1</spage><epage>6</epage><pages>1-6</pages><isbn>9781467301848</isbn><isbn>1467301841</isbn><eisbn>9781467301862</eisbn><eisbn>1467301868</eisbn><eisbn>146730185X</eisbn><eisbn>9781467301855</eisbn><abstract>This paper presents the hardware design of a 16-points 1-D DCT used in the emerging video coding standard HEVC - High Efficiency Video Coding. The 1-D DCT is used by the 16×16 2-D DCT of the HEVC standard. The transforms stage is one of the innovations proposed by HEVC, not only because of the variable size (from 4×4 to 32×32) but also because higher dimension transforms other than the traditional 4×4 and 8×8 are used. The hardware design presented in this work focuses on low cost and high throughput. 
To achieve such objectives, the 16-points algorithm from HEVC was simplified, so that a more efficient hardware design could be implemented. Some strategies were used during this simplification, such as operations reordering, factoring to compress the length of the operators, multiplications by constant turned into shifts and adds, sub-expressions sharing, among others. The architecture was designed in a fully combinational way in order to reduce hardware overhead. Synthesis results obtained using Altera FPGAs from the Cyclone II and Stratix III families showed hardware resources reduction reaching 72% when compared to an architecture described as a direct transcription of the non-optimized version of the algorithm. Even with a purely combinational implementation, the designed architecture achieved a throughput between 376 Msamples/s and 1.4 Gsamples/s. With these results, the architecture is capable of processing, in the worst case, more than 30 QFHD frames (3840×2160 pixels) per second. Therefore, the architecture is capable of processing videos with significantly high resolutions in real time. To the best of our knowledge, this is the first work in the literature that presents hardware results for the HEVC transforms.</abstract><pub>IEEE</pub><doi>10.1109/SPL.2012.6211786</doi><tpages>6</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISBN: 9781467301848
ispartof 2012 VIII Southern Conference on Programmable Logic, 2012, p.1-6
issn
language eng ; jpn
recordid cdi_ieee_primary_6211786
source IEEE Electronic Library (IEL) Conference Proceedings
subjects 1-D DCT
Algorithm design and analysis
Computer architecture
Discrete cosine transforms
Equations
FPGA design
Hardware
HEVC
Mathematical model
video coding
title Low cost and high throughput multiplierless design of a 16 point 1-D DCT of the new HEVC video coding standard
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T20%3A53%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Low%20cost%20and%20high%20throughput%20multiplierless%20design%20of%20a%2016%20point%201-D%20DCT%20of%20the%20new%20HEVC%20video%20coding%20standard&rft.btitle=2012%20VIII%20Southern%20Conference%20on%20Programmable%20Logic&rft.au=Jeske,%20R.&rft.date=2012-03&rft.spage=1&rft.epage=6&rft.pages=1-6&rft.isbn=9781467301848&rft.isbn_list=1467301841&rft_id=info:doi/10.1109/SPL.2012.6211786&rft_dat=%3Cieee_6IE%3E6211786%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781467301862&rft.eisbn_list=1467301868&rft.eisbn_list=146730185X&rft.eisbn_list=9781467301855&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6211786&rfr_iscdi=true
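
Illustrative note on the abstract: the description mentions multiplications by constants turned into shifts and adds, and a worst-case rate of more than 30 QFHD frames per second at 376 Msamples/s. The Python sketch below is only a rough illustration of both ideas and is not taken from the paper. The function names, the choice of the coefficient 90 (one of the HEVC 16-point transform constants), its particular decomposition 64 + 16 + 8 + 2, and the assumption of 1.5 samples per pixel (4:2:0 chroma subsampling) are all assumptions made here for the example; the authors' actual factorizations and frame-rate accounting may differ.

```python
def shift_add_multiply(x: int, constant: int) -> int:
    """Multiply x by a non-negative constant using only shifts and adds.

    Generic bit-by-bit decomposition (hypothetical helper, not the paper's
    optimized factorization): one shifted copy of x is added for every
    set bit of the constant.
    """
    if constant < 0:
        raise ValueError("constant must be non-negative")
    result = 0
    shift = 0
    while constant:
        if constant & 1:
            result += x << shift
        constant >>= 1
        shift += 1
    return result


def times_90(x: int) -> int:
    """Hand-unrolled version for 90 = 64 + 16 + 8 + 2 (assumed decomposition)."""
    return (x << 6) + (x << 4) + (x << 3) + (x << 1)


def frames_per_second(samples_per_second: float, width: int, height: int,
                      samples_per_pixel: float = 1.5) -> float:
    """Frame rate a given sample throughput can sustain.

    samples_per_pixel = 1.5 assumes 4:2:0 video (luma plus two quarter-size
    chroma planes); this is an assumption, not a figure from the abstract.
    """
    return samples_per_second / (width * height * samples_per_pixel)


if __name__ == "__main__":
    print(shift_add_multiply(7, 90), times_90(7))
    print(frames_per_second(376e6, 3840, 2160))
```

Under the 4:2:0 assumption, one QFHD frame carries 3840 × 2160 × 1.5 = 12,441,600 samples, so 376 Msamples/s corresponds to slightly more than 30 frames per second, which is consistent with the worst-case figure quoted in the abstract.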