Rotten Apples or Bad Harvest? What We Are Measuring When We Are Measuring Abuse

Internet security and technology policy research regularly uses technical indicators of abuse to identify culprits and to tailor mitigation strategies. As a major obstacle, current inferences from abuse data that aim to characterize providers with poor security practices often use a naive normalizat...

Bibliographic Details
Published in: ACM transactions on Internet technology 2018-11, Vol.18 (4), p.1-25
Main authors: Tajalizadehkhoob, Samaneh; Böhme, Rainer; Gañán, Carlos; Korczyński, Maciej; Eeten, Michel Van
Format: Article
Language: eng
Online access: Full text
container_end_page 25
container_issue 4
container_start_page 1
container_title ACM transactions on Internet technology
container_volume 18
creator Tajalizadehkhoob, Samaneh
Böhme, Rainer
Gañán, Carlos
Korczyński, Maciej
Eeten, Michel Van
description Internet security and technology policy research regularly uses technical indicators of abuse to identify culprits and to tailor mitigation strategies. As a major obstacle, current inferences from abuse data that aim to characterize providers with poor security practices often use a naive normalization of abuse (abuse counts divided by network size) and do not take into account other inherent or structural properties of providers. Even the size estimates are subject to measurement errors relating to attribution, aggregation, and various sources of heterogeneity. More precise indicators are costly to measure at Internet scale. We address these issues for the case of hosting providers with a statistical model of the abuse data generation process, using phishing sites in hosting networks as a case study. We decompose error sources and then estimate key parameters of the model, controlling for heterogeneity in size and business model. We find that 84% of the variation in abuse counts across 45,358 hosting providers can be explained with structural factors alone. Informed by the fitted model, we systematically select and enrich a subset of 105 homogeneous “statistical twins” with additional explanatory variables, unreasonable to collect for all hosting providers. We find that abuse is positively associated with the popularity of websites hosted and with the prevalence of popular content management systems. Moreover, hosting providers who charge higher prices (after controlling for level differences between countries) witness less abuse. These structural factors together explain a further 77% of the remaining variation. This calls into question premature inferences from raw abuse indicators about the security efforts of actors, and suggests the adoption of similar analysis frameworks in all domains where network measurement aims at informing technology policy.
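The contrast the abstract draws between naive normalization (abuse counts divided by network size) and a model that controls for size can be sketched in a few lines. The sketch below is illustrative only: the provider rows are hypothetical numbers, not data from the article, and the log-log least-squares fit is a simple stand-in, not the authors' statistical model. The names `PROVIDERS`, `naive_rates`, `fit_log_log`, and `residuals` are assumptions made for this example.

```python
import math

# Hypothetical (provider, size, abuse_count) rows for illustration only;
# these are NOT figures from the article.
PROVIDERS = [
    ("A", 100, 4),
    ("B", 1_000, 18),
    ("C", 10_000, 90),
    ("D", 100_000, 300),
    ("E", 1_000_000, 2_000),
]


def naive_rates(rows):
    """Naive normalization: abuse count divided by size."""
    return {name: count / size for name, size, count in rows}


def fit_log_log(rows):
    """Least-squares fit of log(count) on log(size).

    A stand-in for a size-controlled model (assumption, not the
    authors' method). Returns (intercept, slope, r_squared).
    """
    xs = [math.log(size) for _, size, _ in rows]
    ys = [math.log(count) for _, _, count in rows]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, 1 - ss_res / ss_tot


def residuals(rows, intercept, slope):
    """Log-scale gap between observed abuse and what size alone predicts."""
    return {
        name: math.log(count) - (intercept + slope * math.log(size))
        for name, size, count in rows
    }


if __name__ == "__main__":
    rates = naive_rates(PROVIDERS)
    intercept, slope, r2 = fit_log_log(PROVIDERS)
    res = residuals(PROVIDERS, intercept, slope)
    print("worst by naive rate:", max(rates, key=rates.get))
    print("worst by size-adjusted residual:", max(res, key=res.get))
    print("slope:", round(slope, 3), "R^2:", round(r2, 3))
```

In this made-up data the smallest provider has the highest naive rate simply because counts grow more slowly than size, while the size-adjusted residual singles out a different provider. That is the kind of ranking reversal that makes raw ratios a weak basis for judging security effort.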
doi_str_mv 10.1145/3122985
format Article
fulltext fulltext
identifier ISSN: 1533-5399
ispartof ACM transactions on Internet technology, 2018-11, Vol.18 (4), p.1-25
issn 1533-5399
1557-6051
language eng
recordid cdi_crossref_primary_10_1145_3122985
source ACM Digital Library
title Rotten Apples or Bad Harvest? What We Are Measuring When We Are Measuring Abuse
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T08%3A26%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Rotten%20Apples%20or%20Bad%20Harvest?%20What%20We%20Are%20Measuring%20When%20We%20Are%20Measuring%20Abuse&rft.jtitle=ACM%20transactions%20on%20Internet%20technology&rft.au=Tajalizadehkhoob,%20Samaneh&rft.date=2018-11-30&rft.volume=18&rft.issue=4&rft.spage=1&rft.epage=25&rft.pages=1-25&rft.issn=1533-5399&rft.eissn=1557-6051&rft_id=info:doi/10.1145/3122985&rft_dat=%3Ccrossref%3E10_1145_3122985%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true