SIMD- and cache-friendly algorithm for sorting an array of structures

This paper describes our new algorithm for sorting an array of structures by efficiently exploiting the SIMD instructions and cache memory of today's processors. Recently, multiway mergesort implemented with SIMD instructions has been used as a high-performance in-memory sorting algorithm for s...

Detailed description

Bibliographic details
Published in:	Proceedings of the VLDB Endowment 2015-07, Vol.8 (11), p.1274-1285
Main authors:	Inoue, Hiroshi; Taura, Kenjiro
Format:	Article
Language:	eng
Online access:	Full text

container_end_page	1285
container_issue	11
container_start_page	1274
container_title	Proceedings of the VLDB Endowment
container_volume	8
creator	Inoue, Hiroshi ; Taura, Kenjiro
description	This paper describes our new algorithm for sorting an array of structures by efficiently exploiting the SIMD instructions and cache memory of today's processors. Recently, multiway mergesort implemented with SIMD instructions has been used as a high-performance in-memory algorithm for sorting integer values. For sorting an array of structures with SIMD instructions, a frequently used approach is to first pack the key and index of each record into an integer value, sort the key-index pairs using SIMD instructions, and then rearrange the records based on the sorted key-index pairs. This approach can efficiently exploit SIMD instructions because it sorts the key-index pairs while they are packed into integer values; hence, it can use existing high-performance implementations of the SIMD-based multiway mergesort for integers. However, this approach incurs frequent cache misses in the final rearranging phase because of its random and scattered memory accesses; as a result, this phase limits both single-thread performance and scalability with multiple cores. Our approach is also based on multiway mergesort, but it avoids costly random accesses for rearranging the records while still efficiently exploiting the SIMD instructions. Our results showed that our approach exhibited up to 2.1x better single-thread performance than the key-index approach implemented with SIMD instructions when sorting 512M 16-byte records on one core. Our approach also yielded better performance when we used multiple cores. Compared to an optimized radix sort, our vectorized multiway mergesort achieved better performance when each record is large. Our vectorized multiway mergesort also yielded higher scalability with multiple cores than the radix sort.
doi_str_mv	10.14778/2809974.2809988
format	Article
fulltext	fulltext
identifier	ISSN: 2150-8097
ispartof	Proceedings of the VLDB Endowment, 2015-07, Vol.8 (11), p.1274-1285
issn	2150-8097 2150-8097
language	eng
recordid	cdi_crossref_primary_10_14778_2809974_2809988
source	ACM Digital Library Complete
title	SIMD- and cache-friendly algorithm for sorting an array of structures
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T02%3A03%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SIMD-%20and%20cache-friendly%20algorithm%20for%20sorting%20an%20array%20of%20structures&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Inoue,%20Hiroshi&rft.date=2015-07-01&rft.volume=8&rft.issue=11&rft.spage=1274&rft.epage=1285&rft.pages=1274-1285&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/2809974.2809988&rft_dat=%3Ccrossref%3E10_14778_2809974_2809988%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
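
The key-index approach that the description field summarizes (pack, sort, rearrange) can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout, the names `key_index_sort`, `INDEX_BITS`, and `INDEX_MASK`, and the 32-bit key and index widths are all assumptions made for illustration, and Python's built-in `list.sort` stands in for the SIMD-based multiway mergesort. The last step is the gather whose scattered reads the abstract identifies as the source of cache misses.

```python
from typing import List, Tuple

# Hypothetical record layout: (non-negative 32-bit key, payload bytes).
Record = Tuple[int, bytes]

INDEX_BITS = 32                      # assumed width of the index field
INDEX_MASK = (1 << INDEX_BITS) - 1   # mask that extracts the index again


def key_index_sort(records: List[Record]) -> List[Record]:
    """Sort records by key using the pack / sort / rearrange scheme."""
    # Phase 1: pack each key (high bits) and its position (low bits)
    # into a single integer, so integer order equals (key, index) order.
    packed = [(key << INDEX_BITS) | i for i, (key, _) in enumerate(records)]

    # Phase 2: sort the packed integers. A real implementation would use a
    # SIMD multiway mergesort here; list.sort is only a stand-in.
    packed.sort()

    # Phase 3: rearrange the records. Each lookup jumps to an arbitrary
    # position in `records`, which is the random-access pattern that
    # causes cache misses on large inputs.
    return [records[p & INDEX_MASK] for p in packed]


if __name__ == "__main__":
    data = [(3, b"c"), (1, b"a"), (2, b"b"), (1, b"z")]
    print(key_index_sort(data))
```

Because the index occupies the low bits, records with equal keys keep their original relative order. The sketch assumes keys fit in 32 bits and that there are fewer than 2^32 records.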