Performance debugging for distributed systems of black boxes

Many interesting large-scale systems are distributed systems of multiple communicating components. Such systems can be very hard to debug, especially when they exhibit poor performance. The problem becomes much harder when systems are composed of "black-box" components: software from many...

Full description

Saved in:
Bibliographic details
Main authors: AGUILERA, Marcos K, MOGUL, Jeffrey C, WIENER, Janet L, REYNOLDS, Patrick, MUTHITACHAROEN, Athicha
Format: Conference proceeding
Language: eng
Subjects:
Online access: Full text
container_end_page 89
container_issue 5
container_start_page 74
container_title
container_volume 37
creator AGUILERA, Marcos K
MOGUL, Jeffrey C
WIENER, Janet L
REYNOLDS, Patrick
MUTHITACHAROEN, Athicha
description Many interesting large-scale systems are distributed systems of multiple communicating components. Such systems can be very hard to debug, especially when they exhibit poor performance. The problem becomes much harder when systems are composed of "black-box" components: software from many different (perhaps competing) vendors, usually without source code available. Typical solutions-provider employees are not always skilled or experienced enough to debug these systems efficiently. Our goal is to design tools that enable modestly-skilled programmers (and experts, too) to isolate performance bottlenecks in distributed systems composed of black-box nodes. We approach this problem by obtaining message-level traces of system activity, as passively as possible and without any knowledge of node internals or message semantics. We have developed two very different algorithms for inferring the dominant causal paths through a distributed system from these traces. One uses timing information from RPC messages to infer inter-call causality; the other uses signal-processing techniques. Our algorithms can ascribe delay to specific nodes on specific causal paths. Unlike previous approaches to similar problems, our approach requires no modifications to applications, middleware, or messages.
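The abstract's idea of ascribing delay to nodes from message-level timing alone can be illustrated with a small sketch. This is not the authors' algorithm: the `Message` record, the FIFO pairing of calls with returns, the "call"/"return" labels, and the rule that a call is nested when it falls entirely inside its caller's call/return interval are all simplifying assumptions made here for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Message(NamedTuple):
    """One passively captured message (hypothetical trace format)."""
    time: float  # capture timestamp in seconds
    src: str     # sending node
    dst: str     # receiving node
    kind: str    # "call" or "return" (assumed to be distinguishable)


def pair_calls(trace):
    """Pair each return with the oldest unmatched call between the same
    two nodes (a FIFO assumption). Yields (caller, callee, start, end)."""
    pending = defaultdict(list)
    calls = []
    for m in sorted(trace, key=lambda m: m.time):
        if m.kind == "call":
            pending[(m.src, m.dst)].append(m)
        else:
            queue = pending[(m.dst, m.src)]
            if queue:
                c = queue.pop(0)
                calls.append((c.src, c.dst, c.time, m.time))
    return calls


def node_delays(calls):
    """Charge each callee the duration of the call minus the time covered
    by calls it issued itself within that interval."""
    delays = defaultdict(float)
    for _caller, callee, start, end in calls:
        nested = sum(
            e - s
            for c2, _, s, e in calls
            if c2 == callee and start <= s and e <= end
        )
        delays[callee] += (end - start) - nested
    return dict(delays)


# Toy trace: client -> web -> db, then the returns in reverse order.
trace = [
    Message(0.00, "client", "web", "call"),
    Message(0.01, "web", "db", "call"),
    Message(0.06, "db", "web", "return"),
    Message(0.08, "web", "client", "return"),
]
delays = node_delays(pair_calls(trace))
print(delays)
```

In this toy trace the database accounts for roughly 0.05 s and the web tier for roughly 0.03 s of the 0.08 s request. With concurrent requests, the timing-based nesting rule becomes ambiguous, so a sketch this simple would misattribute delay; it only shows the kind of information that timestamps, sources, and destinations can yield without inspecting node internals.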
doi_str_mv 10.1145/1165389.945454
format Conference Proceeding
fullrecord <record><control><sourceid>pascalfrancis_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1145_1165389_945454</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>15408560</sourcerecordid><originalsourceid>FETCH-LOGICAL-c184t-2a4447e239751e2d0f7bbc0fdbb5fd5378f6e3fb930c82928be51a3c875236e43</originalsourceid><addsrcrecordid>eNpFj0tLxDAUhYMoWEe3rrNx2ZrbPJqAGxl8wYAuFNyVJL0p1XY6JB1w_r2VDshdnMuB78BHyDWwAkDIWwAluTaFEXK-E5KBETyXWn2ekoyBmn-j2Tm5SOmLMdCgICN3bxjDGAe79UgbdPu27bYtnSvadGmKndtP2NB0SBMOiY6But76b-rGH0yX5CzYPuHVMVfk4_Hhff2cb16fXtb3m9yDFlNeWiFEhSU3lQQsGxYq5zwLjXMyNJJXOijkwRnOvC5NqR1KsNzrSpZcoeArUiy7Po4pRQz1LnaDjYcaWP3nXh_d68V9Bm4WYGeTt32Is16X_ikpmJaK8V-lvljo</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Performance debugging for distributed systems of black boxes</title><source>ACM Digital Library Complete</source><creator>AGUILERA, Marcos K ; MOGUL, Jeffrey C ; WIENER, Janet L ; REYNOLDS, Patrick ; MUTHITACHAROEN, Athicha</creator><creatorcontrib>AGUILERA, Marcos K ; MOGUL, Jeffrey C ; WIENER, Janet L ; REYNOLDS, Patrick ; MUTHITACHAROEN, Athicha</creatorcontrib><description>Many interesting large-scale systems are distributed systems of multiple communicating components. Such systems can be very hard to debug, especially when they exhibit poor performance. The problem becomes much harder when systems are composed of "black-box" components: software from many different (perhaps competing) vendors, usually without source code available. Typical solutions-provider employees are not always skilled or experienced enough to debug these systems efficiently. 
Our goal is to design tools that enable modestly-skilled programmers (and experts, too) to isolate performance bottlenecks in distributed systems composed of black-box nodes.We approach this problem by obtaining message-level traces of system activity, as passively as possible and without any knowledge of node internals or message semantics. We have developed two very different algorithms for inferring the dominant causal paths through a distributed system from these traces. One uses timing information from RPC messages to infer inter-call causality; the other uses signal-processing techniques. Our algorithms can ascribe delay to specific nodes on specific causal paths. Unlike previous approaches to similar problems, our approach requires no modifications to applications, middleware, or messages.</description><identifier>ISSN: 0163-5980</identifier><identifier>EISSN: 1943-586X</identifier><identifier>DOI: 10.1145/1165389.945454</identifier><identifier>CODEN: OSRED8</identifier><language>eng</language><publisher>New York, NY: Association for Computing Machinery</publisher><subject>Applied sciences ; Computer science; control theory; systems ; Computer systems and distributed systems. User interface ; Computer systems performance. 
Reliability ; Exact sciences and technology ; Software ; Software engineering</subject><ispartof>Operating systems review, 2003, Vol.37 (5), p.74-89</ispartof><rights>2004 INIST-CNRS</rights><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c184t-2a4447e239751e2d0f7bbc0fdbb5fd5378f6e3fb930c82928be51a3c875236e43</citedby><cites>FETCH-LOGICAL-c184t-2a4447e239751e2d0f7bbc0fdbb5fd5378f6e3fb930c82928be51a3c875236e43</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>309,310,314,776,780,785,786,23909,23910,25118,27901,27902</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=15408560$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>AGUILERA, Marcos K</creatorcontrib><creatorcontrib>MOGUL, Jeffrey C</creatorcontrib><creatorcontrib>WIENER, Janet L</creatorcontrib><creatorcontrib>REYNOLDS, Patrick</creatorcontrib><creatorcontrib>MUTHITACHAROEN, Athicha</creatorcontrib><title>Performance debugging for distributed systems of black boxes</title><title>Operating systems review</title><description>Many interesting large-scale systems are distributed systems of multiple communicating components. Such systems can be very hard to debug, especially when they exhibit poor performance. The problem becomes much harder when systems are composed of "black-box" components: software from many different (perhaps competing) vendors, usually without source code available. Typical solutions-provider employees are not always skilled or experienced enough to debug these systems efficiently. 
Our goal is to design tools that enable modestly-skilled programmers (and experts, too) to isolate performance bottlenecks in distributed systems composed of black-box nodes.We approach this problem by obtaining message-level traces of system activity, as passively as possible and without any knowledge of node internals or message semantics. We have developed two very different algorithms for inferring the dominant causal paths through a distributed system from these traces. One uses timing information from RPC messages to infer inter-call causality; the other uses signal-processing techniques. Our algorithms can ascribe delay to specific nodes on specific causal paths. Unlike previous approaches to similar problems, our approach requires no modifications to applications, middleware, or messages.</description><subject>Applied sciences</subject><subject>Computer science; control theory; systems</subject><subject>Computer systems and distributed systems. User interface</subject><subject>Computer systems performance. 
Reliability</subject><subject>Exact sciences and technology</subject><subject>Software</subject><subject>Software engineering</subject><issn>0163-5980</issn><issn>1943-586X</issn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2003</creationdate><recordtype>conference_proceeding</recordtype><recordid>eNpFj0tLxDAUhYMoWEe3rrNx2ZrbPJqAGxl8wYAuFNyVJL0p1XY6JB1w_r2VDshdnMuB78BHyDWwAkDIWwAluTaFEXK-E5KBETyXWn2ekoyBmn-j2Tm5SOmLMdCgICN3bxjDGAe79UgbdPu27bYtnSvadGmKndtP2NB0SBMOiY6But76b-rGH0yX5CzYPuHVMVfk4_Hhff2cb16fXtb3m9yDFlNeWiFEhSU3lQQsGxYq5zwLjXMyNJJXOijkwRnOvC5NqR1KsNzrSpZcoeArUiy7Po4pRQz1LnaDjYcaWP3nXh_d68V9Bm4WYGeTt32Is16X_ikpmJaK8V-lvljo</recordid><startdate>200312</startdate><enddate>200312</enddate><creator>AGUILERA, Marcos K</creator><creator>MOGUL, Jeffrey C</creator><creator>WIENER, Janet L</creator><creator>REYNOLDS, Patrick</creator><creator>MUTHITACHAROEN, Athicha</creator><general>Association for Computing Machinery</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>200312</creationdate><title>Performance debugging for distributed systems of black boxes</title><author>AGUILERA, Marcos K ; MOGUL, Jeffrey C ; WIENER, Janet L ; REYNOLDS, Patrick ; MUTHITACHAROEN, Athicha</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c184t-2a4447e239751e2d0f7bbc0fdbb5fd5378f6e3fb930c82928be51a3c875236e43</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2003</creationdate><topic>Applied sciences</topic><topic>Computer science; control theory; systems</topic><topic>Computer systems and distributed systems. User interface</topic><topic>Computer systems performance. 
Reliability</topic><topic>Exact sciences and technology</topic><topic>Software</topic><topic>Software engineering</topic><toplevel>online_resources</toplevel><creatorcontrib>AGUILERA, Marcos K</creatorcontrib><creatorcontrib>MOGUL, Jeffrey C</creatorcontrib><creatorcontrib>WIENER, Janet L</creatorcontrib><creatorcontrib>REYNOLDS, Patrick</creatorcontrib><creatorcontrib>MUTHITACHAROEN, Athicha</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>AGUILERA, Marcos K</au><au>MOGUL, Jeffrey C</au><au>WIENER, Janet L</au><au>REYNOLDS, Patrick</au><au>MUTHITACHAROEN, Athicha</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Performance debugging for distributed systems of black boxes</atitle><btitle>Operating systems review</btitle><date>2003-12</date><risdate>2003</risdate><volume>37</volume><issue>5</issue><spage>74</spage><epage>89</epage><pages>74-89</pages><issn>0163-5980</issn><eissn>1943-586X</eissn><coden>OSRED8</coden><abstract>Many interesting large-scale systems are distributed systems of multiple communicating components. Such systems can be very hard to debug, especially when they exhibit poor performance. The problem becomes much harder when systems are composed of "black-box" components: software from many different (perhaps competing) vendors, usually without source code available. Typical solutions-provider employees are not always skilled or experienced enough to debug these systems efficiently. Our goal is to design tools that enable modestly-skilled programmers (and experts, too) to isolate performance bottlenecks in distributed systems composed of black-box nodes.We approach this problem by obtaining message-level traces of system activity, as passively as possible and without any knowledge of node internals or message semantics. 
We have developed two very different algorithms for inferring the dominant causal paths through a distributed system from these traces. One uses timing information from RPC messages to infer inter-call causality; the other uses signal-processing techniques. Our algorithms can ascribe delay to specific nodes on specific causal paths. Unlike previous approaches to similar problems, our approach requires no modifications to applications, middleware, or messages.</abstract><cop>New York, NY</cop><pub>Association for Computing Machinery</pub><doi>10.1145/1165389.945454</doi><tpages>16</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0163-5980
ispartof Operating systems review, 2003, Vol.37 (5), p.74-89
issn 0163-5980
1943-586X
language eng
recordid cdi_crossref_primary_10_1145_1165389_945454
source ACM Digital Library Complete
subjects Applied sciences
Computer science; control theory; systems
Computer systems and distributed systems. User interface
Computer systems performance. Reliability
Exact sciences and technology
Software
Software engineering
title Performance debugging for distributed systems of black boxes
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T14%3A37%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Performance%20debugging%20for%20distributed%20systems%20of%20black%20boxes&rft.btitle=Operating%20systems%20review&rft.au=AGUILERA,%20Marcos%20K&rft.date=2003-12&rft.volume=37&rft.issue=5&rft.spage=74&rft.epage=89&rft.pages=74-89&rft.issn=0163-5980&rft.eissn=1943-586X&rft.coden=OSRED8&rft_id=info:doi/10.1145/1165389.945454&rft_dat=%3Cpascalfrancis_cross%3E15408560%3C/pascalfrancis_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true