User Identification in the Process of Web Usage Data Preprocessing

Bibliographic details
Published in: International Journal of Emerging Technologies in Learning, 2019-01, Vol. 14 (9), p. 21
Main authors: Kapusta, Jozef; Munk, Michal; Halvoník, Dominik; Drlík, Martin
Format: Article
Language: English
Subjects: User behavior
Online access: Full text
container_end_page
container_issue 9
container_start_page 21
container_title International journal of emerging technologies in learning
container_volume 14
creator Kapusta, Jozef
Munk, Michal
Halvoník, Dominik
Drlík, Martin
description If we are talking about user behavior analytics, we have to understand what the main source of valuable information is. One of these sources is definitely a web server. There are multiple places from which we can extract the necessary data. The most common ways are to search for these data in the access log, error log, and custom log files of the web server, the proxy server log file, the web browser log, browser cookies, etc. In its default form, a web server log follows the Common Log Format (W3C, 1995) and records the IP address, the date and time of the visit, and the accessed and referenced resource. There are standardized methodologies which contain several steps leading to the extraction of new knowledge from the provided data. Usually, the first step in each of them is to identify users, users’ sessions, page views, and clickstreams. This process is called pre-processing. The main goal of this stage is to receive the unprocessed web server log file as input and, after processing, to output meaningful representations which can be used in the next phase. In this paper, we describe in detail user session identification, which can be considered the most important part of data pre-processing. Our paper aims to compare user/session identification using the STT with user/session identification using cookies. This comparison was performed with respect to the quality of the generated sequential rules, i.e., a comparison was made regarding the generation of useful, trivial, and inexplicable rules.
doi_str_mv 10.3991/ijet.v14i09.9854
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2666953172</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2666953172</sourcerecordid><originalsourceid>FETCH-LOGICAL-c271t-5e4e09ba0d5f09b91bd8926523f3880fb7391641da741c447689ee32c8897bf53</originalsourceid><addsrcrecordid>eNpNkE1LxDAYhIMouK7ePQY8d036pmly1PVrYUEPFo8hbd-sKdquSVbw39tSD55mYIYZeAi55GwFWvNr32FafXPhmV5pVYgjsuBKQsZAwfE_f0rOYuwYk6BBL8htFTHQTYt98s43Nvmhp76n6R3pSxgajJEOjr5hTatod0jvbLJjgvs59P3unJw4-xHx4k-XpHq4f10_Zdvnx836Zps1eclTVqBApmvL2sKNqnndKp3LIgcHSjFXl6C5FLy1peCNEKVUGhHyRild1q6AJbmad8frrwPGZLrhEPrx0uRSSl0AL_OxxeZWE4YYAzqzD_7Thh_DmZlImYmUmUmZiRT8AiH1XAc</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2666953172</pqid></control>
<display><type>article</type><title>User Identification in the Process of Web Usage Data Preprocessing</title><source>EZB-FREE-00999 freely available EZB journals</source><source>EBSCOhost Education Source</source><creator>Kapusta, Jozef ; Munk, Michal ; Halvoník, Dominik ; Drlík, Martin</creator><creatorcontrib>Kapusta, Jozef ; Munk, Michal ; Halvoník, Dominik ; Drlík, Martin</creatorcontrib><description>If we are talking about user behavior analytics, we have to understand what the main source of valuable information is. One of these sources is definitely a web server. There are multiple places from which we can extract the necessary data. The most common ways are to search for these data in the access log, error log, and custom log files of the web server, the proxy server log file, the web browser log, browser cookies, etc. In its default form, a web server log follows the Common Log Format (W3C, 1995) and records the IP address, the date and time of the visit, and the accessed and referenced resource. There are standardized methodologies which contain several steps leading to the extraction of new knowledge from the provided data. Usually, the first step in each of them is to identify users, users’ sessions, page views, and clickstreams. This process is called pre-processing. The main goal of this stage is to receive the unprocessed web server log file as input and, after processing, to output meaningful representations which can be used in the next phase. In this paper, we describe in detail user session identification, which can be considered the most important part of data pre-processing. Our paper aims to compare user/session identification using the STT with user/session identification using cookies. This comparison was performed with respect to the quality of the generated sequential rules, i.e., a comparison was made regarding the generation of useful, trivial, and inexplicable rules.</description><identifier>ISSN: 1863-0383</identifier><identifier>EISSN: 1863-0383</identifier><identifier>DOI: 10.3991/ijet.v14i09.9854</identifier><language>eng</language><publisher>Vienna: International Association of Online Engineering (IAOE)</publisher><subject>User behavior</subject><ispartof>International journal of emerging technologies in learning, 2019-01, Vol.14 (9), p.21</ispartof><rights>2019. This work is published under http://creativecommons.org/licenses/by/3.0/at/deed.en_GB (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c271t-5e4e09ba0d5f09b91bd8926523f3880fb7391641da741c447689ee32c8897bf53</citedby></display>
<links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links>
<search><creatorcontrib>Kapusta, Jozef</creatorcontrib><creatorcontrib>Munk, Michal</creatorcontrib><creatorcontrib>Halvoník, Dominik</creatorcontrib><creatorcontrib>Drlík, Martin</creatorcontrib><title>User Identification in the Process of Web Usage Data Preprocessing</title><title>International journal of emerging technologies in learning</title><description>If we are talking about user behavior analytics, we have to understand what the main source of valuable information is. One of these sources is definitely a web server. There are multiple places from which we can extract the necessary data. The most common ways are to search for these data in the access log, error log, and custom log files of the web server, the proxy server log file, the web browser log, browser cookies, etc. In its default form, a web server log follows the Common Log Format (W3C, 1995) and records the IP address, the date and time of the visit, and the accessed and referenced resource. There are standardized methodologies which contain several steps leading to the extraction of new knowledge from the provided data. Usually, the first step in each of them is to identify users, users’ sessions, page views, and clickstreams. This process is called pre-processing. The main goal of this stage is to receive the unprocessed web server log file as input and, after processing, to output meaningful representations which can be used in the next phase. In this paper, we describe in detail user session identification, which can be considered the most important part of data pre-processing. Our paper aims to compare user/session identification using the STT with user/session identification using cookies. This comparison was performed with respect to the quality of the generated sequential rules, i.e., a comparison was made regarding the generation of useful, trivial, and inexplicable rules.</description><subject>User behavior</subject><issn>1863-0383</issn><issn>1863-0383</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNpNkE1LxDAYhIMouK7ePQY8d036pmly1PVrYUEPFo8hbd-sKdquSVbw39tSD55mYIYZeAi55GwFWvNr32FafXPhmV5pVYgjsuBKQsZAwfE_f0rOYuwYk6BBL8htFTHQTYt98s43Nvmhp76n6R3pSxgajJEOjr5hTatod0jvbLJjgvs59P3unJw4-xHx4k-XpHq4f10_Zdvnx836Zps1eclTVqBApmvL2sKNqnndKp3LIgcHSjFXl6C5FLy1peCNEKVUGhHyRild1q6AJbmad8frrwPGZLrhEPrx0uRSSl0AL_OxxeZWE4YYAzqzD_7Thh_DmZlImYmUmUmZiRT8AiH1XAc</recordid><startdate>20190101</startdate><enddate>20190101</enddate><creator>Kapusta, Jozef</creator><creator>Munk, Michal</creator><creator>Halvoník, Dominik</creator><creator>Drlík, Martin</creator><general>International Association of Online Engineering (IAOE)</general><scope>AAYXX</scope><scope>CITATION</scope><scope>0-V</scope><scope>3V.</scope><scope>7XB</scope><scope>88B</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>CCPQU</scope><scope>CJNVE</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>M0P</scope><scope>PIMPY</scope><scope>PQEDU</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope></search>
<sort><creationdate>20190101</creationdate><title>User Identification in the Process of Web Usage Data Preprocessing</title><author>Kapusta, Jozef ; Munk, Michal ; Halvoník, Dominik ; Drlík, Martin</author></sort>
<facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c271t-5e4e09ba0d5f09b91bd8926523f3880fb7391641da741c447689ee32c8897bf53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>User behavior</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kapusta, Jozef</creatorcontrib><creatorcontrib>Munk, Michal</creatorcontrib><creatorcontrib>Halvoník, Dominik</creatorcontrib><creatorcontrib>Drlík, Martin</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Social Sciences Premium Collection</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Education Database (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Social Science Premium Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest One Community College</collection><collection>Education Collection</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>ProQuest Education Journals</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Education</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><jtitle>International journal of emerging technologies in learning</jtitle></facets>
<delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery>
<addata><au>Kapusta, Jozef</au><au>Munk, Michal</au><au>Halvoník, Dominik</au><au>Drlík, Martin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>User Identification in the Process of Web Usage Data Preprocessing</atitle><jtitle>International journal of emerging technologies in learning</jtitle><date>2019-01-01</date><risdate>2019</risdate><volume>14</volume><issue>9</issue><spage>21</spage><pages>21-</pages><issn>1863-0383</issn><eissn>1863-0383</eissn><abstract>If we are talking about user behavior analytics, we have to understand what the main source of valuable information is. One of these sources is definitely a web server. There are multiple places from which we can extract the necessary data. The most common ways are to search for these data in the access log, error log, and custom log files of the web server, the proxy server log file, the web browser log, browser cookies, etc. In its default form, a web server log follows the Common Log Format (W3C, 1995) and records the IP address, the date and time of the visit, and the accessed and referenced resource. There are standardized methodologies which contain several steps leading to the extraction of new knowledge from the provided data. Usually, the first step in each of them is to identify users, users’ sessions, page views, and clickstreams. This process is called pre-processing. The main goal of this stage is to receive the unprocessed web server log file as input and, after processing, to output meaningful representations which can be used in the next phase. In this paper, we describe in detail user session identification, which can be considered the most important part of data pre-processing. Our paper aims to compare user/session identification using the STT with user/session identification using cookies. This comparison was performed with respect to the quality of the generated sequential rules, i.e., a comparison was made regarding the generation of useful, trivial, and inexplicable rules.</abstract><cop>Vienna</cop><pub>International Association of Online Engineering (IAOE)</pub><doi>10.3991/ijet.v14i09.9854</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1863-0383
ispartof International journal of emerging technologies in learning, 2019-01, Vol.14 (9), p.21
issn 1863-0383
1863-0383
language eng
recordid cdi_proquest_journals_2666953172
source EZB-FREE-00999 freely available EZB journals; EBSCOhost Education Source
subjects User behavior
title User Identification in the Process of Web Usage Data Preprocessing
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T21%3A36%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=User%20Identification%20in%20the%20Process%20of%20Web%20Usage%20Data%20Preprocessing&rft.jtitle=International%20journal%20of%20emerging%20technologies%20in%20learning&rft.au=Kapusta,%20Jozef&rft.date=2019-01-01&rft.volume=14&rft.issue=9&rft.spage=21&rft.pages=21-&rft.issn=1863-0383&rft.eissn=1863-0383&rft_id=info:doi/10.3991/ijet.v14i09.9854&rft_dat=%3Cproquest_cross%3E2666953172%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2666953172&rft_id=info:pmid/&rfr_iscdi=true
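The abstract in this record describes pre-processing as turning raw web server log entries into users and sessions. As a rough illustration only, the Python sketch below parses lines in the classic Common Log Format layout (`host ident authuser [date] "request" status bytes`) and splits each visitor's requests into sessions whenever the gap between two consecutive requests exceeds a time threshold. It assumes, without confirmation from this record, that the STT mentioned in the abstract is such a session time threshold; the 30-minute value, the helper names `parse_clf_line` and `sessionize`, and the sample log lines are hypothetical and are not taken from the paper. Visitors are identified by IP address only, because plain CLF has no cookie field.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Classic CLF layout: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf_line(line):
    """Parse one CLF line into a dict; return None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # CLF timestamps look like 10/Oct/2000:13:55:36 -0700
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    # The request line is usually "METHOD URL PROTOCOL"; keep only the URL.
    parts = record["request"].split()
    record["url"] = parts[1] if len(parts) >= 2 else None
    return record


def sessionize(records, visitor_field="host", timeout=timedelta(minutes=30)):
    """Group records by visitor and cut sessions on long gaps (hypothetical helper).

    A new session starts whenever two consecutive requests of the same
    visitor are more than `timeout` apart. The 30-minute default is an
    illustrative assumption, not a value from the paper.
    """
    by_visitor = defaultdict(list)
    for record in records:
        by_visitor[record[visitor_field]].append(record)

    sessions = []
    for visitor, visits in by_visitor.items():
        visits.sort(key=lambda r: r["time"])
        current = [visits[0]]
        for prev, nxt in zip(visits, visits[1:]):
            if nxt["time"] - prev["time"] > timeout:
                sessions.append((visitor, current))
                current = []
            current.append(nxt)
        sessions.append((visitor, current))
    return sessions


# Hypothetical sample lines (documentation IP ranges), not data from the paper.
log = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:14:05:00 -0700] "GET /course.html HTTP/1.0" 200 512',
    '198.51.100.7 - - [10/Oct/2000:13:56:10 -0700] "GET /quiz.html HTTP/1.0" 200 1024',
    '192.0.2.1 - - [10/Oct/2000:15:00:00 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'not a log line',
]
records = [r for r in map(parse_clf_line, log) if r is not None]
for visitor, session in sessionize(records):
    print(visitor, [r["url"] for r in session])
# 192.0.2.1 ['/index.html', '/course.html']
# 192.0.2.1 ['/index.html']
# 198.51.100.7 ['/quiz.html']
```

Grouping by a field name keeps the two identification strategies compared in the abstract interchangeable in this sketch: with a custom log format that records a cookie value, one could extend the parser and pass a hypothetical `visitor_field="cookie"`, which changes only how visitors are distinguished, not how sessions are cut.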