Faster Population Counts Using AVX2 Instructions

Abstract: Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g. popcnt on x64 processors). Maybe surprisingly, we...

Detailed description

Bibliographic details
Published in: Computer journal 2018-01, Vol.61 (1), p.111-120
Main authors: Muła, Wojciech; Kurz, Nathan; Lemire, Daniel
Format: Article
Language: English
Online access: Full text
container_end_page 120
container_issue 1
container_start_page 111
container_title Computer journal
container_volume 61
creator Muła, Wojciech
Kurz, Nathan
Lemire, Daniel
description Abstract Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g. popcnt on x64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors. The benefits can be even greater for applications such as similarity measures (e.g. the Jaccard index) that require additional Boolean operations. Our approach has been adopted by LLVM: it is used by its popular C compiler (Clang).
doi_str_mv 10.1093/comjnl/bxx046
format Article
fullrecord <record><control><sourceid>oup_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1093_comjnl_bxx046</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/comjnl/bxx046</oup_id><sourcerecordid>10.1093/comjnl/bxx046</sourcerecordid><originalsourceid>FETCH-LOGICAL-c336t-b7af9c3ebf90290ac97e6404bd54497410057d542e1ada8257e6c7f15ce7a0b13</originalsourceid><addsrcrecordid>eNqFj0FLwzAAhYMoWKdH7z16iXtJ08QcR3E6GOjBibeSZql0dE1JWpj_3o569_QevI8HHyH3DI8MOltafzx07bI6nSDkBUmYkKAcUl2SBGCgQnJck5sYDwA4tEwI1iYOLqTvvh9bMzS-Sws_dkNMd7HpvtPV5xdPN10cwmjPa7wlV7Vpo7v7ywXZrZ8_ile6fXvZFKsttVkmB1opU2ubuarW4BrGauWkgKj2uRBaCQbkaurcMbM3TzyfZqtqllunDCqWLQidf23wMQZXl31ojib8lAzlWbecdctZd-IfZt6P_T_oL_JrWBk</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Faster Population Counts Using AVX2 Instructions</title><source>Oxford University Press Journals All Titles (1996-Current)</source><creator>Muła, Wojciech ; Kurz, Nathan ; Lemire, Daniel</creator><creatorcontrib>Muła, Wojciech ; Kurz, Nathan ; Lemire, Daniel</creatorcontrib><description>Abstract Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g. popcnt on ×64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors. The benefits can be even greater for applications such as similarity measures (e.g. the Jaccard index) that require additional Boolean operations. 
Our approach has been adopted by LLVM: it is used by its popular C compiler (Clang).</description><identifier>ISSN: 0010-4620</identifier><identifier>EISSN: 1460-2067</identifier><identifier>DOI: 10.1093/comjnl/bxx046</identifier><language>eng</language><publisher>Oxford University Press</publisher><ispartof>Computer journal, 2018-01, Vol.61 (1), p.111-120</ispartof><rights>The British Computer Society 2017. All rights reserved. For Permissions, please email: journals.permissions@oup.com 2017</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c336t-b7af9c3ebf90290ac97e6404bd54497410057d542e1ada8257e6c7f15ce7a0b13</citedby><cites>FETCH-LOGICAL-c336t-b7af9c3ebf90290ac97e6404bd54497410057d542e1ada8257e6c7f15ce7a0b13</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,1578,27901,27902</link.rule.ids></links><search><creatorcontrib>Muła, Wojciech</creatorcontrib><creatorcontrib>Kurz, Nathan</creatorcontrib><creatorcontrib>Lemire, Daniel</creatorcontrib><title>Faster Population Counts Using AVX2 Instructions</title><title>Computer journal</title><description>Abstract Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g. popcnt on ×64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors. The benefits can be even greater for applications such as similarity measures (e.g. the Jaccard index) that require additional Boolean operations. 
Our approach has been adopted by LLVM: it is used by its popular C compiler (Clang).</description><issn>0010-4620</issn><issn>1460-2067</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><recordid>eNqFj0FLwzAAhYMoWKdH7z16iXtJ08QcR3E6GOjBibeSZql0dE1JWpj_3o569_QevI8HHyH3DI8MOltafzx07bI6nSDkBUmYkKAcUl2SBGCgQnJck5sYDwA4tEwI1iYOLqTvvh9bMzS-Sws_dkNMd7HpvtPV5xdPN10cwmjPa7wlV7Vpo7v7ywXZrZ8_ile6fXvZFKsttVkmB1opU2ubuarW4BrGauWkgKj2uRBaCQbkaurcMbM3TzyfZqtqllunDCqWLQidf23wMQZXl31ojib8lAzlWbecdctZd-IfZt6P_T_oL_JrWBk</recordid><startdate>20180101</startdate><enddate>20180101</enddate><creator>Muła, Wojciech</creator><creator>Kurz, Nathan</creator><creator>Lemire, Daniel</creator><general>Oxford University Press</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20180101</creationdate><title>Faster Population Counts Using AVX2 Instructions</title><author>Muła, Wojciech ; Kurz, Nathan ; Lemire, Daniel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c336t-b7af9c3ebf90290ac97e6404bd54497410057d542e1ada8257e6c7f15ce7a0b13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Muła, Wojciech</creatorcontrib><creatorcontrib>Kurz, Nathan</creatorcontrib><creatorcontrib>Lemire, Daniel</creatorcontrib><collection>CrossRef</collection><jtitle>Computer journal</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Muła, Wojciech</au><au>Kurz, Nathan</au><au>Lemire, Daniel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Faster Population Counts Using AVX2 Instructions</atitle><jtitle>Computer 
journal</jtitle><date>2018-01-01</date><risdate>2018</risdate><volume>61</volume><issue>1</issue><spage>111</spage><epage>120</epage><pages>111-120</pages><issn>0010-4620</issn><eissn>1460-2067</eissn><abstract>Abstract Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g. popcnt on ×64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors. The benefits can be even greater for applications such as similarity measures (e.g. the Jaccard index) that require additional Boolean operations. Our approach has been adopted by LLVM: it is used by its popular C compiler (Clang).</abstract><pub>Oxford University Press</pub><doi>10.1093/comjnl/bxx046</doi><tpages>10</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0010-4620
ispartof Computer journal, 2018-01, Vol.61 (1), p.111-120
issn 0010-4620
1460-2067
language eng
recordid cdi_crossref_primary_10_1093_comjnl_bxx046
source Oxford University Press Journals All Titles (1996-Current)
title Faster Population Counts Using AVX2 Instructions
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T06%3A17%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-oup_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Faster%20Population%20Counts%20Using%20AVX2%20Instructions&rft.jtitle=Computer%20journal&rft.au=Mu%C5%82a,%20Wojciech&rft.date=2018-01-01&rft.volume=61&rft.issue=1&rft.spage=111&rft.epage=120&rft.pages=111-120&rft.issn=0010-4620&rft.eissn=1460-2067&rft_id=info:doi/10.1093/comjnl/bxx046&rft_dat=%3Coup_cross%3E10.1093/comjnl/bxx046%3C/oup_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_oup_id=10.1093/comjnl/bxx046&rfr_iscdi=true