Extracted external domains from Wikipedia dump - 20/03/2024
This dataset contains 6,459,779 distinct domains derived from the external links section of Wikipedia pages. The external links section of a page such as OpenWeb contains only one link. The primary objective of assembling this dataset is to improve content prioritization and filtering in web crawli...
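Main authors: | Istaiti, Mahmoud; Al-Maamari, Mohammed; Zerhoudi, Saber; Dinzinger, Michael; Granitzer, Michael; Mitrovic, Jelena |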
---|---|
Format: | Dataset |
Language: | eng |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Istaiti, Mahmoud; Al-Maamari, Mohammed; Zerhoudi, Saber; Dinzinger, Michael; Granitzer, Michael; Mitrovic, Jelena |
description | This dataset contains 6,459,779 distinct domains derived from the external links section of Wikipedia pages.
The external links section of a page such as OpenWeb contains only one link.
The primary objective of assembling this dataset is to improve content prioritization and filtering in web crawling techniques.
The dataset is structured as a text file, with each line representing a distinct domain. |
doi_str_mv | 10.5281/zenodo.11076686 |
format | Dataset |
fullrecord | <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_5281_zenodo_11076686</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_5281_zenodo_11076686</sourcerecordid><originalsourceid>FETCH-datacite_primary_10_5281_zenodo_110766863</originalsourceid><addsrcrecordid>eNqVzb0KwjAUQOEsDqLOrvcF2vy0xoKjVHwAwTFcmlsINklJI1SfXkR9AKczHT7GtlKUO9VI_qQQbSylFHutG71kh3bOCbtMFmjOlAIOYKNHFyboU_RwdTc3knUI9u5HKEAJLiquhKrXbNHjMNHm2xXjp_ZyPBcWM3YukxmT85geRgrz5s2HNz---v94AS6KPXg</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>Extracted external domains from Wikipedia dump - 20/03/2024</title><source>DataCite</source><creator>Istaiti, Mahmoud ; Al-Maamari, Mohammed ; Zerhoudi, Saber ; Dinzinger, Michael ; Granitzer, Michael ; Mitrovic, Jelena</creator><creatorcontrib>Istaiti, Mahmoud ; Al-Maamari, Mohammed ; Zerhoudi, Saber ; Dinzinger, Michael ; Granitzer, Michael ; Mitrovic, Jelena</creatorcontrib><description>This dataset contains 6,459,779 distinct domains derived from the external links section of Wikipedia pages.
The external links section of a page such as OpenWeb contains only one link.
The primary objective of assembling this dataset is to improve content prioritization and filtering in web crawling techniques.
The dataset is structured as a text file, with each line representing a distinct domain.</description><identifier>DOI: 10.5281/zenodo.11076686</identifier><language>eng</language><publisher>Zenodo</publisher><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0009-0006-5635-5945 ; 0000-0002-0127-8034 ; 0000-0003-3566-5507</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,1887</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.5281/zenodo.11076686$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Istaiti, Mahmoud</creatorcontrib><creatorcontrib>Al-Maamari, Mohammed</creatorcontrib><creatorcontrib>Zerhoudi, Saber</creatorcontrib><creatorcontrib>Dinzinger, Michael</creatorcontrib><creatorcontrib>Granitzer, Michael</creatorcontrib><creatorcontrib>Mitrovic, Jelena</creatorcontrib><title>Extracted external domains from Wikipedia dump - 20/03/2024</title><description>This dataset contains 6,459,779 distinct domains derived from the external links section of Wikipedia pages.
The external links section of a page such as OpenWeb contains only one link.
The primary objective of assembling this dataset is to improve content prioritization and filtering in web crawling techniques.
The dataset is structured as a text file, with each line representing a distinct domain.</description><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2024</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNqVzb0KwjAUQOEsDqLOrvcF2vy0xoKjVHwAwTFcmlsINklJI1SfXkR9AKczHT7GtlKUO9VI_qQQbSylFHutG71kh3bOCbtMFmjOlAIOYKNHFyboU_RwdTc3knUI9u5HKEAJLiquhKrXbNHjMNHm2xXjp_ZyPBcWM3YukxmT85geRgrz5s2HNz---v94AS6KPXg</recordid><startdate>20240427</startdate><enddate>20240427</enddate><creator>Istaiti, Mahmoud</creator><creator>Al-Maamari, Mohammed</creator><creator>Zerhoudi, Saber</creator><creator>Dinzinger, Michael</creator><creator>Granitzer, Michael</creator><creator>Mitrovic, Jelena</creator><general>Zenodo</general><scope>DYCCY</scope><scope>PQ8</scope><orcidid>https://orcid.org/0009-0006-5635-5945</orcidid><orcidid>https://orcid.org/0000-0002-0127-8034</orcidid><orcidid>https://orcid.org/0000-0003-3566-5507</orcidid></search><sort><creationdate>20240427</creationdate><title>Extracted external domains from Wikipedia dump - 20/03/2024</title><author>Istaiti, Mahmoud ; Al-Maamari, Mohammed ; Zerhoudi, Saber ; Dinzinger, Michael ; Granitzer, Michael ; Mitrovic, Jelena</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-datacite_primary_10_5281_zenodo_110766863</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2024</creationdate><toplevel>online_resources</toplevel><creatorcontrib>Istaiti, Mahmoud</creatorcontrib><creatorcontrib>Al-Maamari, Mohammed</creatorcontrib><creatorcontrib>Zerhoudi, Saber</creatorcontrib><creatorcontrib>Dinzinger, Michael</creatorcontrib><creatorcontrib>Granitzer, Michael</creatorcontrib><creatorcontrib>Mitrovic, Jelena</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Istaiti, Mahmoud</au><au>Al-Maamari, Mohammed</au><au>Zerhoudi, Saber</au><au>Dinzinger, Michael</au><au>Granitzer, Michael</au><au>Mitrovic, Jelena</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>Extracted external domains from Wikipedia dump - 20/03/2024</title><date>2024-04-27</date><risdate>2024</risdate><abstract>This dataset contains 6,459,779 distinct domains derived from the external links section of Wikipedia pages.
The external links section of a page such as OpenWeb contains only one link.
The primary objective of assembling this dataset is to improve content prioritization and filtering in web crawling techniques.
The dataset is structured as a text file, with each line representing a distinct domain.</abstract><pub>Zenodo</pub><doi>10.5281/zenodo.11076686</doi><orcidid>https://orcid.org/0009-0006-5635-5945</orcidid><orcidid>https://orcid.org/0000-0002-0127-8034</orcidid><orcidid>https://orcid.org/0000-0003-3566-5507</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.5281/zenodo.11076686 |
ispartof | |
issn | |
language | eng |
recordid | cdi_datacite_primary_10_5281_zenodo_11076686 |
source | DataCite |
title | Extracted external domains from Wikipedia dump - 20/03/2024 |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T20%3A22%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Istaiti,%20Mahmoud&rft.date=2024-04-27&rft_id=info:doi/10.5281/zenodo.11076686&rft_dat=%3Cdatacite_PQ8%3E10_5281_zenodo_11076686%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
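
The description states that the dataset is a text file with one distinct domain per line, intended to support content prioritization and filtering in web crawling. A minimal sketch of how such a file might be consumed follows. The function names, the lower-casing, and the parent-domain matching rule are assumptions made for illustration; the record does not describe how the authors normalize or match domains.

```python
from typing import Iterable, List, Set
from urllib.parse import urlsplit


def load_domains(lines: Iterable[str]) -> Set[str]:
    """Build a lookup set from an iterable of lines, one domain per line.

    Blank lines are skipped; lower-casing is an assumption, not something
    the dataset description specifies.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def is_listed(url: str, domains: Set[str]) -> bool:
    """Return True if the URL's host, or any parent domain of it, is in the set."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # Try the full host first, then progressively shorter parent domains.
    return any(".".join(parts[i:]) in domains for i in range(len(parts)))


def prioritize(urls: Iterable[str], domains: Set[str]) -> List[str]:
    """Stable sort that places URLs on listed domains before all others."""
    return sorted(urls, key=lambda u: not is_listed(u, domains))


if __name__ == "__main__":
    # In-memory stand-in for the dataset file; the entries are made up.
    sample_lines = ["example.org\n", "\n", "zenodo.org\n"]
    known = load_domains(sample_lines)
    frontier = ["https://other.net/", "https://zenodo.org/records/1"]
    for url in prioritize(frontier, known):
        print(url)
```

To read the real file, pass an open file handle to `load_domains`, for example `open(path, encoding="utf-8")`, where `path` points to the downloaded text file. A set keeps each membership check at roughly constant time, which matters with about 6.5 million entries.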