Investigating Distribution of Data of HTTP Traffic: An Empirical Study

Internet traffic today is dominated by that of the hypertext transfer protocol (HTTP). Understanding the statistical characteristics of the data transferred via HTTP helps better model traffic patterns. In this work, we conduct an empirical study by employing an experiment that accesses roughly 34,0...

Bibliographic details
Main authors: Chehadeh, Y.C., Hatahet, A.Z., Agamy, A.E., Bamakhrama, M.A., Banawan, S.A.
Format: Conference proceeding
Language: eng
Subjects:
Online access: Order full text
container_end_page 5
container_issue
container_start_page 1
container_title
container_volume
creator Chehadeh, Y.C.
Hatahet, A.Z.
Agamy, A.E.
Bamakhrama, M.A.
Banawan, S.A.
description Internet traffic today is dominated by the Hypertext Transfer Protocol (HTTP). Understanding the statistical characteristics of the data transferred via HTTP helps to model traffic patterns better. In this work, we conduct an empirical study in which an experiment accesses roughly 34,000 of the most popular Web sites on the Internet today and crawls their Web pages. We collect metadata on the roughly two million objects retrieved. We determine statistics and distributions based on object sizes, occurrence of specific types, and sizes of specific types. The resulting distribution data can be used as a template model for Web-traffic modeling in future research. We further note an intriguing result: 5.7% of HTTP traffic from Web servers to clients is due to sending spacer objects (image files representing a 1×1 white-space pixel) or to stale links referencing non-existent files. Such wasted bandwidth is not due to overhead and can be minimized by simple additions to the HTML standard and by automating the removal of stale links.
doi_str_mv 10.1109/INNOVATIONS.2006.301928
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_4085443</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4085443</ieee_id><sourcerecordid>4085443</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-959f0d39bd8bf0b5b2dbe27b1c1075e0889d2b54b542faf3e75cd4040f42ffd63</originalsourceid><addsrcrecordid>eNpVjF1LwzAYRiMiKLO_wAvzB1rffLWNd2UfrjBaYcHbkTTJiGzdaDNh_96K3vhw4HBuHoSeCWSEgHypm6b9qFTdNtuMAuQZAyJpeYMSWZSEU84hL3h--6-ZvEfJOH7CNCYFZfkDWtX9lxtj2OsY-j1ehDEOwVxiOPX45PFCR_3jtVLvWA3a-9C94qrHy-M5DKHTB7yNF3t9RHdeH0aX_HmG1Gqp5ut0077V82qTBgkxlUJ6sEwaWxoPRhhqjaOFIR2BQjgoS2mpEXyCeu2ZK0RnOXDwU3ubsxl6-r0NzrndeQhHPVx3HErBOWPf-rROhQ</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Investigating Distribution of Data of HTTP Traffic: An Empirical Study</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Chehadeh, Y.C. ; Hatahet, A.Z. ; Agamy, A.E. ; Bamakhrama, M.A. ; Banawan, S.A.</creator><creatorcontrib>Chehadeh, Y.C. ; Hatahet, A.Z. ; Agamy, A.E. ; Bamakhrama, M.A. ; Banawan, S.A.</creatorcontrib><description>Internet traffic today is dominated by that of the hypertext transfer protocol (HTTP). Understanding the statistical characteristics of the data transferred via HTTP helps better model traffic patterns. In this work, we conduct an empirical study by employing an experiment that accesses roughly 34,000 of the most popular Web sites on the Internet today and crawls their Web pages. We collect metadata information on the retrieved roughly two million objects. We determine statistics and distributions based on object sizes, occurrence of specific types, and sizes of specific types. The data of the distributions produced can be used as a template model for Web-traffic modeling in future research. 
We further note an intriguing result that 5.7% of HTTP traffic from Web servers to clients is due to sending spacer objects (image files representing a 1×1 white-space pixel) or to stale links referencing non-existing files. Such squander in bandwidth is not due to overhead and can be minimized by simple additions to the HTML standard and by automating the process of removing stale links</description><identifier>ISBN: 9781424406739</identifier><identifier>ISBN: 1424406730</identifier><identifier>EISBN: 9781424406746</identifier><identifier>EISBN: 1424406749</identifier><identifier>DOI: 10.1109/INNOVATIONS.2006.301928</identifier><language>eng</language><publisher>IEEE</publisher><subject>Access protocols ; Bandwidth ; Information retrieval ; Internet ; Pixel ; Statistical distributions ; Traffic control ; Web pages ; Web server ; White spaces</subject><ispartof>2006 Innovations in Information Technology, 2006, p.1-5</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4085443$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2057,27924,54919</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4085443$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Chehadeh, Y.C.</creatorcontrib><creatorcontrib>Hatahet, A.Z.</creatorcontrib><creatorcontrib>Agamy, A.E.</creatorcontrib><creatorcontrib>Bamakhrama, M.A.</creatorcontrib><creatorcontrib>Banawan, S.A.</creatorcontrib><title>Investigating Distribution of Data of HTTP Traffic: An Empirical Study</title><title>2006 Innovations in Information Technology</title><addtitle>INNOVATIONS</addtitle><description>Internet traffic today is dominated by that of the hypertext transfer protocol (HTTP). 
Understanding the statistical characteristics of the data transferred via HTTP helps better model traffic patterns. In this work, we conduct an empirical study by employing an experiment that accesses roughly 34,000 of the most popular Web sites on the Internet today and crawls their Web pages. We collect metadata information on the retrieved roughly two million objects. We determine statistics and distributions based on object sizes, occurrence of specific types, and sizes of specific types. The data of the distributions produced can be used as a template model for Web-traffic modeling in future research. We further note an intriguing result that 5.7% of HTTP traffic from Web servers to clients is due to sending spacer objects (image files representing a 1×1 white-space pixel) or to stale links referencing non-existing files. Such squander in bandwidth is not due to overhead and can be minimized by simple additions to the HTML standard and by automating the process of removing stale links</description><subject>Access protocols</subject><subject>Bandwidth</subject><subject>Information retrieval</subject><subject>Internet</subject><subject>Pixel</subject><subject>Statistical distributions</subject><subject>Traffic control</subject><subject>Web pages</subject><subject>Web server</subject><subject>White 
spaces</subject><isbn>9781424406739</isbn><isbn>1424406730</isbn><isbn>9781424406746</isbn><isbn>1424406749</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2006</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNpVjF1LwzAYRiMiKLO_wAvzB1rffLWNd2UfrjBaYcHbkTTJiGzdaDNh_96K3vhw4HBuHoSeCWSEgHypm6b9qFTdNtuMAuQZAyJpeYMSWZSEU84hL3h--6-ZvEfJOH7CNCYFZfkDWtX9lxtj2OsY-j1ehDEOwVxiOPX45PFCR_3jtVLvWA3a-9C94qrHy-M5DKHTB7yNF3t9RHdeH0aX_HmG1Gqp5ut0077V82qTBgkxlUJ6sEwaWxoPRhhqjaOFIR2BQjgoS2mpEXyCeu2ZK0RnOXDwU3ubsxl6-r0NzrndeQhHPVx3HErBOWPf-rROhQ</recordid><startdate>200611</startdate><enddate>200611</enddate><creator>Chehadeh, Y.C.</creator><creator>Hatahet, A.Z.</creator><creator>Agamy, A.E.</creator><creator>Bamakhrama, M.A.</creator><creator>Banawan, S.A.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200611</creationdate><title>Investigating Distribution of Data of HTTP Traffic: An Empirical Study</title><author>Chehadeh, Y.C. ; Hatahet, A.Z. ; Agamy, A.E. ; Bamakhrama, M.A. 
; Banawan, S.A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-959f0d39bd8bf0b5b2dbe27b1c1075e0889d2b54b542faf3e75cd4040f42ffd63</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Access protocols</topic><topic>Bandwidth</topic><topic>Information retrieval</topic><topic>Internet</topic><topic>Pixel</topic><topic>Statistical distributions</topic><topic>Traffic control</topic><topic>Web pages</topic><topic>Web server</topic><topic>White spaces</topic><toplevel>online_resources</toplevel><creatorcontrib>Chehadeh, Y.C.</creatorcontrib><creatorcontrib>Hatahet, A.Z.</creatorcontrib><creatorcontrib>Agamy, A.E.</creatorcontrib><creatorcontrib>Bamakhrama, M.A.</creatorcontrib><creatorcontrib>Banawan, S.A.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Chehadeh, Y.C.</au><au>Hatahet, A.Z.</au><au>Agamy, A.E.</au><au>Bamakhrama, M.A.</au><au>Banawan, S.A.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Investigating Distribution of Data of HTTP Traffic: An Empirical Study</atitle><btitle>2006 Innovations in Information Technology</btitle><stitle>INNOVATIONS</stitle><date>2006-11</date><risdate>2006</risdate><spage>1</spage><epage>5</epage><pages>1-5</pages><isbn>9781424406739</isbn><isbn>1424406730</isbn><eisbn>9781424406746</eisbn><eisbn>1424406749</eisbn><abstract>Internet traffic today is dominated by that of the hypertext transfer 
protocol (HTTP). Understanding the statistical characteristics of the data transferred via HTTP helps better model traffic patterns. In this work, we conduct an empirical study by employing an experiment that accesses roughly 34,000 of the most popular Web sites on the Internet today and crawls their Web pages. We collect metadata information on the retrieved roughly two million objects. We determine statistics and distributions based on object sizes, occurrence of specific types, and sizes of specific types. The data of the distributions produced can be used as a template model for Web-traffic modeling in future research. We further note an intriguing result that 5.7% of HTTP traffic from Web servers to clients is due to sending spacer objects (image files representing a 1×1 white-space pixel) or to stale links referencing non-existing files. Such squander in bandwidth is not due to overhead and can be minimized by simple additions to the HTML standard and by automating the process of removing stale links</abstract><pub>IEEE</pub><doi>10.1109/INNOVATIONS.2006.301928</doi><tpages>5</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISBN: 9781424406739
ispartof 2006 Innovations in Information Technology, 2006, p.1-5
issn
language eng
recordid cdi_ieee_primary_4085443
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Access protocols
Bandwidth
Information retrieval
Internet
Pixel
Statistical distributions
Traffic control
Web pages
Web server
White spaces
title Investigating Distribution of Data of HTTP Traffic: An Empirical Study
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T17%3A14%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Investigating%20Distribution%20of%20Data%20of%20HTTP%20Traffic:%20An%20Empirical%20Study&rft.btitle=2006%20Innovations%20in%20Information%20Technology&rft.au=Chehadeh,%20Y.C.&rft.date=2006-11&rft.spage=1&rft.epage=5&rft.pages=1-5&rft.isbn=9781424406739&rft.isbn_list=1424406730&rft_id=info:doi/10.1109/INNOVATIONS.2006.301928&rft_dat=%3Cieee_6IE%3E4085443%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424406746&rft.eisbn_list=1424406749&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4085443&rfr_iscdi=true