SIMD- and cache-friendly algorithm for sorting an array of structures

This paper describes our new algorithm for sorting an array of structures by efficiently exploiting the SIMD instructions and cache memory of today's processors. Recently, multiway mergesort implemented with SIMD instructions has been used as a high-performance in-memory sorting algorithm for s...

Bibliographic details
Published in: Proceedings of the VLDB Endowment 2015-07, Vol.8 (11), p.1274-1285
Main authors: Inoue, Hiroshi; Taura, Kenjiro
Format: Article
Language: eng
Online access: Full text
container_end_page 1285
container_issue 11
container_start_page 1274
container_title Proceedings of the VLDB Endowment
container_volume 8
creator Inoue, Hiroshi
Taura, Kenjiro
description This paper describes our new algorithm for sorting an array of structures by efficiently exploiting the SIMD instructions and cache memory of today's processors. Recently, multiway mergesort implemented with SIMD instructions has been used as a high-performance in-memory algorithm for sorting integer values. For sorting an array of structures with SIMD instructions, a frequently used approach is to first pack the key and index of each record into an integer value, sort the key-index pairs using SIMD instructions, and then rearrange the records based on the sorted key-index pairs. This approach can efficiently exploit SIMD instructions because it sorts the key-index pairs while they are packed into integer values; hence, it can use existing high-performance implementations of the SIMD-based multiway mergesort for integers. However, this approach incurs frequent cache misses in the final rearranging phase because of its random and scattered memory accesses, so this phase limits both single-thread performance and scalability with multiple cores. Our approach is also based on multiway mergesort, but it avoids costly random accesses for rearranging the records while still efficiently exploiting the SIMD instructions. Our results showed that our approach exhibited up to 2.1x better single-thread performance than the key-index approach implemented with SIMD instructions when sorting 512M 16-byte records on one core. Our approach also yielded better performance when we used multiple cores. Compared to an optimized radix sort, our vectorized multiway mergesort achieved better performance when each record is large. Our vectorized multiway mergesort also yielded higher scalability with multiple cores than the radix sort.
doi_str_mv 10.14778/2809974.2809988
format Article
date 2015-07-01
eissn 2150-8097
genre article
lds50 peer_reviewed
oa free_for_read
ristype JOUR
tpages 12
fulltext fulltext
identifier ISSN: 2150-8097
ispartof Proceedings of the VLDB Endowment, 2015-07, Vol.8 (11), p.1274-1285
issn 2150-8097
2150-8097
language eng
recordid cdi_crossref_primary_10_14778_2809974_2809988
source ACM Digital Library Complete
title SIMD- and cache-friendly algorithm for sorting an array of structures
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T02%3A03%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SIMD-%20and%20cache-friendly%20algorithm%20for%20sorting%20an%20array%20of%20structures&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Inoue,%20Hiroshi&rft.date=2015-07-01&rft.volume=8&rft.issue=11&rft.spage=1274&rft.epage=1285&rft.pages=1274-1285&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/2809974.2809988&rft_dat=%3Ccrossref%3E10_14778_2809974_2809988%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
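
The key-index approach that the abstract names as the baseline (pack, sort, rearrange) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the record layout, the 32-bit key and index widths, and the names `pack_key_index` and `sort_by_key_index` are hypothetical, and Python's built-in `list.sort` stands in for the SIMD-based multiway mergesort for integers.

```python
from typing import List, Tuple

# Hypothetical record layout: (unsigned 32-bit key, payload bytes).
Record = Tuple[int, bytes]

INDEX_BITS = 32                      # assumed width of the index field
INDEX_MASK = (1 << INDEX_BITS) - 1   # also the assumed maximum key value


def pack_key_index(records: List[Record]) -> List[int]:
    """Pack each record's key (high bits) and array index (low bits) into one integer."""
    if len(records) > INDEX_MASK + 1:
        raise ValueError("too many records for a 32-bit index")
    packed = []
    for index, (key, _payload) in enumerate(records):
        if not 0 <= key <= INDEX_MASK:
            raise ValueError("key must fit in 32 bits")
        packed.append((key << INDEX_BITS) | index)
    return packed


def sort_by_key_index(records: List[Record]) -> List[Record]:
    """Sort records by key using the pack / sort / rearrange scheme."""
    packed = pack_key_index(records)
    # Stand-in for the SIMD integer mergesort: any integer sort works here.
    packed.sort()
    # Rearranging phase: each lookup jumps to an arbitrary position in
    # `records`, which is the random access pattern the abstract blames
    # for cache misses.
    return [records[value & INDEX_MASK] for value in packed]


if __name__ == "__main__":
    data = [(30, b"c"), (10, b"a"), (20, b"b"), (10, b"d")]
    print(sort_by_key_index(data))
```

Because the index occupies the low bits, records with equal keys keep their original relative order. The sort itself touches only the packed integers; the payloads are read once, in key order, during the final gather, which is where the scattered memory accesses occur.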