Extracted external domains from Wikipedia dump - 20/03/2024

This dataset contains 6,459,779 distinct domains derived from the external links section of Wikipedia pages. The external links section of a page such as OpenWeb contains only one link. The primary objective of assembling this dataset is to improve content prioritization and filtering in web crawli...

Bibliographic details
Main authors: Istaiti, Mahmoud, Al-Maamari, Mohammed, Zerhoudi, Saber, Dinzinger, Michael, Granitzer, Michael, Mitrovic, Jelena
Format: Dataset
Language: eng
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Istaiti, Mahmoud
Al-Maamari, Mohammed
Zerhoudi, Saber
Dinzinger, Michael
Granitzer, Michael
Mitrovic, Jelena
description This dataset contains 6,459,779 distinct domains derived from the external links section of Wikipedia pages. The external links section of a page such as OpenWeb contains only one link. The primary objective of assembling this dataset is to improve content prioritization and filtering in web crawling techniques. The dataset is structured as a text file, with each line representing a distinct domain.
doi_str_mv 10.5281/zenodo.11076686
format Dataset
fullrecord <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_5281_zenodo_11076686</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_5281_zenodo_11076686</sourcerecordid><originalsourceid>FETCH-datacite_primary_10_5281_zenodo_110766863</originalsourceid><addsrcrecordid>eNqVzb0KwjAUQOEsDqLOrvcF2vy0xoKjVHwAwTFcmlsINklJI1SfXkR9AKczHT7GtlKUO9VI_qQQbSylFHutG71kh3bOCbtMFmjOlAIOYKNHFyboU_RwdTc3knUI9u5HKEAJLiquhKrXbNHjMNHm2xXjp_ZyPBcWM3YukxmT85geRgrz5s2HNz---v94AS6KPXg</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>Extracted external domains from Wikipedia dump - 20/03/2024</title><source>DataCite</source><creator>Istaiti, Mahmoud ; Al-Maamari, Mohammed ; Zerhoudi, Saber ; Dinzinger, Michael ; Granitzer, Michael ; Mitrovic, Jelena</creator><creatorcontrib>Istaiti, Mahmoud ; Al-Maamari, Mohammed ; Zerhoudi, Saber ; Dinzinger, Michael ; Granitzer, Michael ; Mitrovic, Jelena</creatorcontrib><description>This dataset contains 6,459,779 distinct domains derived from the external links section of Wikipedia pages. The external links section of a page such as OpenWeb  contains only one link. The primary objective of assembling this dataset is to improve content prioritization and filtering in web crawling techniques. 
The dataset is structured as a text file, with each line representing a distinct domain.</description><identifier>DOI: 10.5281/zenodo.11076686</identifier><language>eng</language><publisher>Zenodo</publisher><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0009-0006-5635-5945 ; 0000-0002-0127-8034 ; 0000-0003-3566-5507</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,1887</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.5281/zenodo.11076686$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Istaiti, Mahmoud</creatorcontrib><creatorcontrib>Al-Maamari, Mohammed</creatorcontrib><creatorcontrib>Zerhoudi, Saber</creatorcontrib><creatorcontrib>Dinzinger, Michael</creatorcontrib><creatorcontrib>Granitzer, Michael</creatorcontrib><creatorcontrib>Mitrovic, Jelena</creatorcontrib><title>Extracted external domains from Wikipedia dump - 20/03/2024</title><description>This dataset contains 6,459,779 distinct domains derived from the external links section of Wikipedia pages. The external links section of a page such as OpenWeb  contains only one link. The primary objective of assembling this dataset is to improve content prioritization and filtering in web crawling techniques. 
The dataset is structured as a text file, with each line representing a distinct domain.</description><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2024</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNqVzb0KwjAUQOEsDqLOrvcF2vy0xoKjVHwAwTFcmlsINklJI1SfXkR9AKczHT7GtlKUO9VI_qQQbSylFHutG71kh3bOCbtMFmjOlAIOYKNHFyboU_RwdTc3knUI9u5HKEAJLiquhKrXbNHjMNHm2xXjp_ZyPBcWM3YukxmT85geRgrz5s2HNz---v94AS6KPXg</recordid><startdate>20240427</startdate><enddate>20240427</enddate><creator>Istaiti, Mahmoud</creator><creator>Al-Maamari, Mohammed</creator><creator>Zerhoudi, Saber</creator><creator>Dinzinger, Michael</creator><creator>Granitzer, Michael</creator><creator>Mitrovic, Jelena</creator><general>Zenodo</general><scope>DYCCY</scope><scope>PQ8</scope><orcidid>https://orcid.org/0009-0006-5635-5945</orcidid><orcidid>https://orcid.org/0000-0002-0127-8034</orcidid><orcidid>https://orcid.org/0000-0003-3566-5507</orcidid></search><sort><creationdate>20240427</creationdate><title>Extracted external domains from Wikipedia dump - 20/03/2024</title><author>Istaiti, Mahmoud ; Al-Maamari, Mohammed ; Zerhoudi, Saber ; Dinzinger, Michael ; Granitzer, Michael ; Mitrovic, Jelena</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-datacite_primary_10_5281_zenodo_110766863</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2024</creationdate><toplevel>online_resources</toplevel><creatorcontrib>Istaiti, Mahmoud</creatorcontrib><creatorcontrib>Al-Maamari, Mohammed</creatorcontrib><creatorcontrib>Zerhoudi, Saber</creatorcontrib><creatorcontrib>Dinzinger, Michael</creatorcontrib><creatorcontrib>Granitzer, Michael</creatorcontrib><creatorcontrib>Mitrovic, Jelena</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Istaiti, Mahmoud</au><au>Al-Maamari, Mohammed</au><au>Zerhoudi, Saber</au><au>Dinzinger, Michael</au><au>Granitzer, Michael</au><au>Mitrovic, Jelena</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>Extracted external domains from Wikipedia dump - 20/03/2024</title><date>2024-04-27</date><risdate>2024</risdate><abstract>This dataset contains 6,459,779 distinct domains derived from the external links section of Wikipedia pages. The external links section of a page such as OpenWeb  contains only one link. The primary objective of assembling this dataset is to improve content prioritization and filtering in web crawling techniques. The dataset is structured as a text file, with each line representing a distinct domain.</abstract><pub>Zenodo</pub><doi>10.5281/zenodo.11076686</doi><orcidid>https://orcid.org/0009-0006-5635-5945</orcidid><orcidid>https://orcid.org/0000-0002-0127-8034</orcidid><orcidid>https://orcid.org/0000-0003-3566-5507</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.5281/zenodo.11076686
ispartof
issn
language eng
recordid cdi_datacite_primary_10_5281_zenodo_11076686
source DataCite
title Extracted external domains from Wikipedia dump - 20/03/2024
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T20%3A22%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Istaiti,%20Mahmoud&rft.date=2024-04-27&rft_id=info:doi/10.5281/zenodo.11076686&rft_dat=%3Cdatacite_PQ8%3E10_5281_zenodo_11076686%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
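
The description field states that the dataset is a plain text file with one domain per line, intended to support content prioritization and filtering in web crawling. A minimal sketch of how such a list might be used follows. It is not the authors' method: the file name `domains.txt`, the helper names, the lowercasing, and the choice to match parent domains (so that `www.example.org` counts as listed when `example.org` is in the file) are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def load_domains(lines):
    """Build a set of domains from an iterable of text lines (one domain per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_listed(url, domains):
    """Return True if the URL's host, or any parent domain of it, is in the set."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try the full host first, then progressively shorter parent domains.
    # Parent-domain matching is an assumption of this sketch.
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))


def prioritize(urls, domains):
    """Order URLs so that those on listed domains come first (stable sort)."""
    return sorted(urls, key=lambda u: not is_listed(u, domains))


# Hypothetical sample standing in for the real file. With the downloaded
# dataset one might instead write:
#   with open("domains.txt", encoding="utf-8") as f:
#       domains = load_domains(f)
sample_lines = ["example.org\n", "Example.COM\n", "\n"]
domains = load_domains(sample_lines)

frontier = [
    "https://unknown.test/a",
    "https://www.example.org/b",
    "http://example.com/c",
]
ordered = prioritize(frontier, domains)
print(ordered)
```

Holding the domains in a set keeps each membership check at roughly constant time, which matters for a list of several million entries. Whether a crawler should boost, restrict to, or exclude listed domains is a policy decision that the record does not specify.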