Malicious web content detection by machine learning

The recent development of dynamic HTML gives attackers a new and powerful technique to compromise computer systems. Malicious dynamic HTML code is usually embedded in a normal webpage. The malicious webpage infects the victim when a user browses it. Furthermore, such DHTML code can disguise it...

Bibliographic details
Published in: Expert systems with applications 2010, Vol.37 (1), p.55-60
Main authors: Hou, Yung-Tsung, Chang, Yimeng, Chen, Tsuhan, Laih, Chi-Sung, Chen, Chia-Mei
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 60
container_issue 1
container_start_page 55
container_title Expert systems with applications
container_volume 37
creator Hou, Yung-Tsung
Chang, Yimeng
Chen, Tsuhan
Laih, Chi-Sung
Chen, Chia-Mei
description The recent development of dynamic HTML gives attackers a new and powerful technique to compromise computer systems. Malicious dynamic HTML code is usually embedded in a normal webpage. The malicious webpage infects the victim when a user browses it. Furthermore, such DHTML code can easily disguise itself through obfuscation or transformation, which makes detection even harder. Anti-virus software packages commonly use signature-based approaches, which might not be able to efficiently identify camouflaged malicious HTML code. Therefore, our paper proposes a malicious web page detection method using machine learning. Our study systematically analyzes the characteristics of a malicious webpage and presents important features for machine learning. Experimental results demonstrate that our method is resilient to code obfuscation and can correctly determine whether a webpage is malicious or not.
doi_str_mv 10.1016/j.eswa.2009.05.023
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_34881505</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S095741740900445X</els_id><sourcerecordid>21071708</sourcerecordid><originalsourceid>FETCH-LOGICAL-c395t-c20f45139ba689622dc86b3dc2f61a42d3d6f5f94a926bbfe9e096591919d41b3</originalsourceid><addsrcrecordid>eNqFkD1PwzAQhi0EEuXjDzBlQiwJZzt2YokFIb6kIhaYLce-gKvUKXYK6r_HVZmLbrjleV_dPYRcUKgoUHm9qDD9mIoBqApEBYwfkBltG17KRvFDMgMlmrKmTX1MTlJaANAGoJkR_mIGb_24TsUPdoUdw4RhKhxOaCc_hqLbFEtjP33AYkATgw8fZ-SoN0PC8799St4f7t_unsr56-Pz3e28tFyJqbQM-lpQrjojWyUZc7aVHXeW9ZKamjnuZC96VRvFZNf1qBCUFIrmcTXt-Cm53PWu4vi1xjTppU8Wh8EEzAdrXrctFSD-BRmFJv_bZvBqL5ghCpIDbzLKdqiNY0oRe72KfmniRlPQW-d6obfO9da5BqGz8xy62YUwa_n2GHWyHoNF52P2qd3o98V_ASjsiQg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1701063037</pqid></control><display><type>article</type><title>Malicious web content detection by machine learning</title><source>Elsevier ScienceDirect Journals</source><creator>Hou, Yung-Tsung ; Chang, Yimeng ; Chen, Tsuhan ; Laih, Chi-Sung ; Chen, Chia-Mei</creator><creatorcontrib>Hou, Yung-Tsung ; Chang, Yimeng ; Chen, Tsuhan ; Laih, Chi-Sung ; Chen, Chia-Mei</creatorcontrib><description>The recent development of the dynamic HTML gives attackers a new and powerful technique to compromise computer systems. A malicious dynamic HTML code is usually embedded in a normal webpage. The malicious webpage infects the victim when a user browses it. Furthermore, such DHTML code can disguise itself easily through obfuscation or transformation, which makes the detection even harder. Anti-virus software packages commonly use signature-based approaches which might not be able to efficiently identify camouflaged malicious HTML codes. Therefore, our paper proposes a malicious web page detection using the technique of machine learning. 
Our study analyzes the characteristic of a malicious webpage systematically and presents important features for machine learning. Experimental results demonstrate that our method is resilient to code obfuscations and can correctly determine whether a webpage is malicious or not.</description><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2009.05.023</identifier><language>eng</language><publisher>Elsevier Ltd</publisher><subject>Dynamic HTML ; Dynamical systems ; Dynamics ; Expert systems ; HTML ; HyperText Markup Language ; Machine learning ; Malicious webpage ; Software packages ; Transformations</subject><ispartof>Expert systems with applications, 2010, Vol.37 (1), p.55-60</ispartof><rights>2009 Elsevier Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c395t-c20f45139ba689622dc86b3dc2f61a42d3d6f5f94a926bbfe9e096591919d41b3</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S095741740900445X$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,4010,27900,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Hou, Yung-Tsung</creatorcontrib><creatorcontrib>Chang, Yimeng</creatorcontrib><creatorcontrib>Chen, Tsuhan</creatorcontrib><creatorcontrib>Laih, Chi-Sung</creatorcontrib><creatorcontrib>Chen, Chia-Mei</creatorcontrib><title>Malicious web content detection by machine learning</title><title>Expert systems with applications</title><description>The recent development of the dynamic HTML gives attackers a new and powerful technique to compromise computer systems. A malicious dynamic HTML code is usually embedded in a normal webpage. The malicious webpage infects the victim when a user browses it. 
Furthermore, such DHTML code can disguise itself easily through obfuscation or transformation, which makes the detection even harder. Anti-virus software packages commonly use signature-based approaches which might not be able to efficiently identify camouflaged malicious HTML codes. Therefore, our paper proposes a malicious web page detection using the technique of machine learning. Our study analyzes the characteristic of a malicious webpage systematically and presents important features for machine learning. Experimental results demonstrate that our method is resilient to code obfuscations and can correctly determine whether a webpage is malicious or not.</description><subject>Dynamic HTML</subject><subject>Dynamical systems</subject><subject>Dynamics</subject><subject>Expert systems</subject><subject>HTML</subject><subject>HyperText Markup Language</subject><subject>Machine learning</subject><subject>Malicious webpage</subject><subject>Software packages</subject><subject>Transformations</subject><issn>0957-4174</issn><issn>1873-6793</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><recordid>eNqFkD1PwzAQhi0EEuXjDzBlQiwJZzt2YokFIb6kIhaYLce-gKvUKXYK6r_HVZmLbrjleV_dPYRcUKgoUHm9qDD9mIoBqApEBYwfkBltG17KRvFDMgMlmrKmTX1MTlJaANAGoJkR_mIGb_24TsUPdoUdw4RhKhxOaCc_hqLbFEtjP33AYkATgw8fZ-SoN0PC8799St4f7t_unsr56-Pz3e28tFyJqbQM-lpQrjojWyUZc7aVHXeW9ZKamjnuZC96VRvFZNf1qBCUFIrmcTXt-Cm53PWu4vi1xjTppU8Wh8EEzAdrXrctFSD-BRmFJv_bZvBqL5ghCpIDbzLKdqiNY0oRe72KfmniRlPQW-d6obfO9da5BqGz8xy62YUwa_n2GHWyHoNF52P2qd3o98V_ASjsiQg</recordid><startdate>2010</startdate><enddate>2010</enddate><creator>Hou, Yung-Tsung</creator><creator>Chang, Yimeng</creator><creator>Chen, Tsuhan</creator><creator>Laih, Chi-Sung</creator><creator>Chen, Chia-Mei</creator><general>Elsevier 
Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7U9</scope><scope>H94</scope></search><sort><creationdate>2010</creationdate><title>Malicious web content detection by machine learning</title><author>Hou, Yung-Tsung ; Chang, Yimeng ; Chen, Tsuhan ; Laih, Chi-Sung ; Chen, Chia-Mei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c395t-c20f45139ba689622dc86b3dc2f61a42d3d6f5f94a926bbfe9e096591919d41b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Dynamic HTML</topic><topic>Dynamical systems</topic><topic>Dynamics</topic><topic>Expert systems</topic><topic>HTML</topic><topic>HyperText Markup Language</topic><topic>Machine learning</topic><topic>Malicious webpage</topic><topic>Software packages</topic><topic>Transformations</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hou, Yung-Tsung</creatorcontrib><creatorcontrib>Chang, Yimeng</creatorcontrib><creatorcontrib>Chen, Tsuhan</creatorcontrib><creatorcontrib>Laih, Chi-Sung</creatorcontrib><creatorcontrib>Chen, Chia-Mei</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Virology and AIDS Abstracts</collection><collection>AIDS and Cancer Research Abstracts</collection><jtitle>Expert systems with applications</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Hou, Yung-Tsung</au><au>Chang, Yimeng</au><au>Chen, Tsuhan</au><au>Laih, Chi-Sung</au><au>Chen, Chia-Mei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Malicious web content detection by machine learning</atitle><jtitle>Expert systems with applications</jtitle><date>2010</date><risdate>2010</risdate><volume>37</volume><issue>1</issue><spage>55</spage><epage>60</epage><pages>55-60</pages><issn>0957-4174</issn><eissn>1873-6793</eissn><abstract>The recent development of the dynamic HTML gives attackers a new and powerful technique to compromise computer systems. A malicious dynamic HTML code is usually embedded in a normal webpage. The malicious webpage infects the victim when a user browses it. Furthermore, such DHTML code can disguise itself easily through obfuscation or transformation, which makes the detection even harder. Anti-virus software packages commonly use signature-based approaches which might not be able to efficiently identify camouflaged malicious HTML codes. Therefore, our paper proposes a malicious web page detection using the technique of machine learning. Our study analyzes the characteristic of a malicious webpage systematically and presents important features for machine learning. Experimental results demonstrate that our method is resilient to code obfuscations and can correctly determine whether a webpage is malicious or not.</abstract><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2009.05.023</doi><tpages>6</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0957-4174
ispartof Expert systems with applications, 2010, Vol.37 (1), p.55-60
issn 0957-4174
1873-6793
language eng
recordid cdi_proquest_miscellaneous_34881505
source Elsevier ScienceDirect Journals
subjects Dynamic HTML
Dynamical systems
Dynamics
Expert systems
HTML
HyperText Markup Language
Machine learning
Malicious webpage
Software packages
Transformations
title Malicious web content detection by machine learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T00%3A05%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Malicious%20web%20content%20detection%20by%20machine%20learning&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Hou,%20Yung-Tsung&rft.date=2010&rft.volume=37&rft.issue=1&rft.spage=55&rft.epage=60&rft.pages=55-60&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2009.05.023&rft_dat=%3Cproquest_cross%3E21071708%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1701063037&rft_id=info:pmid/&rft_els_id=S095741740900445X&rfr_iscdi=true