Rapid profiling via stratified sampling
Sophisticated binary translators and dynamic optimizers demand a program profiler with low overhead, high accuracy, and the ability to collect a variety of profile types. A profiling scheme that achieves these goals is proposed. Conceptually, the hardware compresses a stream of profile data by count...
Main authors: | , , |
---|---|
Format: | Conference proceeding |
Language: | eng |
Keywords: | |
Online access: | Full text |
container_end_page | 289 |
---|---|
container_issue | |
container_start_page | 278 |
container_title | |
container_volume | |
creator | Sastry, S. Subramanya ; Bodík, Rastislav ; Smith, James E. |
description | Sophisticated binary translators and dynamic optimizers demand a program profiler with low overhead, high accuracy, and the ability to collect a variety of profile types. A profiling scheme that achieves these goals is proposed. Conceptually, the hardware compresses a stream of profile data by counting identical events; the compressed profile data is passed to software for analysis. Compressing the high-bandwidth event stream greatly reduces software overhead. Because optimizations can tolerate some profiling errors, we allow the stream compressor to be lossy, thereby enabling a low-cost sampling-based hardware design. Because the hardware compressor is insensitive to the event content, it supports various profile types and can process multiple types simultaneously.
Basic components of our framework are periodic and random samplers, counters, and hash functions. These components are composed to form a variety of stream compressors. One design is both simple and very effective: the input stream is hash-split into multiple substreams, each of which is fed into a simple periodic sampler that selects every kth event. This stratified periodic sampler performs better than conventional random sampling because it biases each substream towards a small number of unique events, thereby reducing sampling error, and allowing faster convergence to an accurate profile. For example, convergence to a given level of accuracy is about twice as fast for gcc. When sampling overhead is considered, the stratified periodic profiler achieves less than 3% error while incurring an overhead of only 3.5% for gcc. |
doi_str_mv | 10.1145/379240.379273 |
format | Conference Proceeding |
isbn | 0769511627; 9780769511627 |
publisher | New York, NY, USA: ACM |
oa | free_for_read |
tpages | 12 |
fulltext | fulltext |
identifier | ISSN: 0163-5964 |
ispartof | Computer architecture news, 2001, p.278-289 |
issn | 0163-5964 |
language | eng |
recordid | cdi_acm_books_10_1145_379240_379273 |
source | IEEE Electronic Library (IEL) Conference Proceedings; ACM Digital Library |
subjects | Computer systems organization ; Computer systems organization -- Dependable and fault-tolerant systems and networks ; General and reference -- Cross-computing tools and techniques -- Evaluation ; General and reference -- Cross-computing tools and techniques -- Metrics ; General and reference -- Cross-computing tools and techniques -- Performance ; Hardware -- Electronic design automation -- Logic synthesis -- Circuit optimization ; Networks -- Network performance evaluation |
title | Rapid profiling via stratified sampling |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T16%3A03%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_acm_b&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Rapid%20profiling%20via%20stratified%20sampling&rft.btitle=Computer%20architecture%20news&rft.au=Sastry,%20S.%20Subramanya&rft.date=2001&rft.spage=278&rft.epage=289&rft.pages=278-289&rft.issn=0163-5964&rft.isbn=0769511627&rft.isbn_list=9780769511627&rft_id=info:doi/10.1145/379240.379273&rft_dat=%3Cproquest_acm_b%3E31528648%3C/proquest_acm_b%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=31528648&rft_id=info:pmid/&rfr_iscdi=true |
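
The stratified periodic sampler in the description (hash-split the input stream into substreams, then select every kth event of each substream) can be illustrated in software. The following is a minimal Python sketch under stated assumptions, not the authors' hardware design: the class and method names, the use of Python's built-in `hash` on integer event IDs as the hash function, and the scaling of sampled counts by k to estimate frequencies are all hypothetical choices made for illustration.

```python
from collections import Counter


class StratifiedPeriodicSampler:
    """Illustrative sketch: hash-split a stream, keep every k-th event per substream."""

    def __init__(self, num_substreams, k):
        if num_substreams < 1 or k < 1:
            raise ValueError("num_substreams and k must be positive")
        self.num_substreams = num_substreams
        self.k = k
        # One periodic counter per substream (assumed layout, not from the paper).
        self.pending = [0] * num_substreams
        # Sampled events and how often each was selected.
        self.samples = Counter()

    def _substream(self, event):
        # Assumption: events are integers (e.g. addresses), so hash() is deterministic.
        return hash(event) % self.num_substreams

    def observe(self, event):
        """Feed one event; it is sampled only if it is the k-th in its substream."""
        s = self._substream(event)
        self.pending[s] += 1
        if self.pending[s] == self.k:
            self.pending[s] = 0
            self.samples[event] += 1

    def estimate(self):
        """Estimate event frequencies by scaling each sampled count by k."""
        return {event: n * self.k for event, n in self.samples.items()}


# Usage: four distinct events that each hash to their own substream.
sampler = StratifiedPeriodicSampler(num_substreams=4, k=10)
stream = [0] * 100 + [1] * 50 + [2] * 30 + [3] * 20
for event in stream:
    sampler.observe(event)
profile = sampler.estimate()
print(profile)
```

In this toy stream each substream holds a single unique event, which is the favorable case the description alludes to: the periodic sampler then sees a stream dominated by few unique events, so its scaled counts track the true frequencies closely.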