Rapid profiling via stratified sampling

Bibliographic details
Main authors: Sastry, S. Subramanya; Bodík, Rastislav; Smith, James E.
Format: Conference proceeding
Language: English
Online access: Full text
container_end_page 289
container_issue
container_start_page 278
container_title
container_volume
creator Sastry, S. Subramanya
Bodík, Rastislav
Smith, James E.
description Sophisticated binary translators and dynamic optimizers demand a program profiler with low overhead, high accuracy, and the ability to collect a variety of profile types. A profiling scheme that achieves these goals is proposed. Conceptually, the hardware compresses a stream of profile data by counting identical events; the compressed profile data is passed to software for analysis. Compressing the high-bandwidth event stream greatly reduces software overhead. Because optimizations can tolerate some profiling errors, we allow the stream compressor to be lossy, thereby enabling a low-cost sampling-based hardware design. Because the hardware compressor is insensitive to the event content, it supports various profile types and can process multiple types simultaneously. Basic components of our framework are periodic and random samplers, counters, and hash functions. These components are composed to form a variety of stream compressors. One design is both simple and very effective: the input stream is hash-split into multiple substreams, each of which is fed into a simple periodic sampler that selects every k-th event. This stratified periodic sampler performs better than conventional random sampling because it biases each substream towards a small number of unique events, thereby reducing sampling error and allowing faster convergence to an accurate profile. For example, convergence to a given level of accuracy is about twice as fast for gcc. When sampling overhead is considered, the stratified periodic profiler achieves less than 3% error while incurring an overhead of only 3.5% for gcc.
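The stratified periodic sampler described in the abstract can be modeled in software roughly as follows. This is a minimal sketch under stated assumptions, not the authors' hardware design: the names `Sampler`, `PeriodicSampler`, `RandomSampler`, `crc_hash`, `stratified_profile`, `profile_error`, and `num_substreams` are hypothetical, CRC-32 merely stands in for whatever hash function the hardware uses, and frequencies are estimated by scaling sampled counts by k. With `num_substreams=1` the model degenerates to a single periodic (or random) sampler.

```python
import random
import zlib
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Mapping, Protocol


class Sampler(Protocol):
    def offer(self, event: Hashable) -> bool: ...


class PeriodicSampler:
    """Selects every k-th event offered to it (the k-th, 2k-th, ...)."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be >= 1")
        self.k = k
        self.seen = 0

    def offer(self, event: Hashable) -> bool:
        self.seen += 1
        return self.seen % self.k == 0


class RandomSampler:
    """Selects each offered event independently with probability 1/k."""

    def __init__(self, k: int, rng: random.Random) -> None:
        if k < 1:
            raise ValueError("k must be >= 1")
        self.k = k
        self.rng = rng

    def offer(self, event: Hashable) -> bool:
        return self.rng.random() < 1.0 / self.k


def crc_hash(event: Hashable) -> int:
    """Deterministic stand-in for the hardware hash function (assumption)."""
    return zlib.crc32(repr(event).encode())


def stratified_profile(
    events: Iterable[Hashable],
    k: int,
    num_substreams: int,
    make_sampler: Callable[[int], Sampler] = PeriodicSampler,
    hash_fn: Callable[[Hashable], int] = crc_hash,
) -> Dict[Hashable, int]:
    """Hash-split the stream, sample each substream, count, rescale by k."""
    samplers = [make_sampler(k) for _ in range(num_substreams)]
    counts: Counter = Counter()
    for event in events:
        # Route the event to the substream selected by its hash.
        sampler = samplers[hash_fn(event) % num_substreams]
        if sampler.offer(event):
            counts[event] += 1  # identical sampled events share one counter
    # Each sample stands for about k events of its substream.
    return {event: c * k for event, c in counts.items()}


def profile_error(truth: Mapping[Hashable, int],
                  estimate: Mapping[Hashable, int]) -> float:
    """Sum of absolute count errors, normalized by the total event count."""
    keys = set(truth) | set(estimate)
    total = sum(truth.values())
    return sum(abs(truth.get(e, 0) - estimate.get(e, 0)) for e in keys) / total


if __name__ == "__main__":
    stream = [1, 2, 3, 4] * 25  # toy stream with period 4, not a benchmark
    truth = Counter(stream)
    plain = stratified_profile(stream, k=4, num_substreams=1)
    strat = stratified_profile(stream, k=4, num_substreams=4,
                               hash_fn=lambda e: e)
    rng = random.Random(0)
    rand = stratified_profile(stream, k=4, num_substreams=1,
                              make_sampler=lambda k: RandomSampler(k, rng))
    print(plain)                        # {4: 100}
    print(profile_error(truth, plain))  # 1.5
    print(profile_error(truth, strat))  # 0.04
    print(profile_error(truth, rand))
```

The toy stream is chosen only to make the effect visible and is not one of the paper's experiments. A single periodic sampler with k = 4 aliases with the period-4 pattern and sees nothing but event 4, whereas four substreams (split here with an identity hash so the buckets are explicit) each hold one unique event, so the only remaining error is rounding: an estimate of 24 for a true count of 25. This is an exaggerated illustration of the abstract's argument that biasing each substream towards few unique events reduces sampling error; the paper's own comparison is against conventional random sampling.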
doi_str_mv 10.1145/379240.379273
format Conference Proceeding
isbn 0769511627; 9780769511627
publisher New York, NY, USA: ACM
creationdate 2001
rights 2001 Authors
fulltext fulltext
identifier ISSN: 0163-5964
ispartof Computer architecture news, 2001, p.278-289
issn 0163-5964
language eng
recordid cdi_acm_books_10_1145_379240_379273
source IEEE Electronic Library (IEL) Conference Proceedings; ACM Digital Library
subjects Computer systems organization
Computer systems organization -- Dependable and fault-tolerant systems and networks
General and reference -- Cross-computing tools and techniques -- Evaluation
General and reference -- Cross-computing tools and techniques -- Metrics
General and reference -- Cross-computing tools and techniques -- Performance
Hardware -- Electronic design automation -- Logic synthesis -- Circuit optimization
Networks -- Network performance evaluation
title Rapid profiling via stratified sampling
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T16%3A03%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_acm_b&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Rapid%20profiling%20via%20stratified%20sampling&rft.btitle=Computer%20architecture%20news&rft.au=Sastry,%20S.%20Subramanya&rft.date=2001&rft.spage=278&rft.epage=289&rft.pages=278-289&rft.issn=0163-5964&rft.isbn=0769511627&rft.isbn_list=9780769511627&rft_id=info:doi/10.1145/379240.379273&rft_dat=%3Cproquest_acm_b%3E31528648%3C/proquest_acm_b%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=31528648&rft_id=info:pmid/&rfr_iscdi=true