Replication Data for: Quantifying gender biases towards politicians on Reddit

This dataset contains ~10 million comments posted on Reddit between July 2018 and December 2019 that mention a cis-male or cis-female politician. They were extracted from pushshift's historical data dumps of Reddit comments (https://files.pushshift.io/reddit/comments/). We extracted subreddits...

Bibliographic details
Main authors: Marjanovic, Sara; Stanczak, Karolina; Augenstein, Isabelle
Format: Dataset
Language: eng
creator Marjanovic, Sara
Stanczak, Karolina
Augenstein, Isabelle
description This dataset contains ~10 million comments posted on Reddit between July 2018 and December 2019 that mention a cis-male or cis-female politician. The comments were extracted from Pushshift's historical data dumps of Reddit comments (https://files.pushshift.io/reddit/comments/). We selected subreddits of political relevance and then isolated comments about politicians using a pre-trained named entity linker. These comments were then used to examine gender biases in comment content (e.g. sentiment and specific adjectives used) and structure (e.g. comment length). We present this dataset so that others can investigate political gender biases on public fora. The data is distributed as a .7z archive that decompresses into a 13 GB .csv file containing all comments used in our paper. The CSV contains the Reddit comment IDs, the comment texts (with the politician's name replaced by the token [NAME]), the Wikidata ID of the mentioned politician, the name used to refer to that politician, and further information about the politician linked to their Wikidata ID (e.g. gender and country of origin). All comments should be in English, as they were extracted from predominantly English-speaking communities. More details on our comment collection methodology and our investigation of the gender biases are available in our preprint (https://arxiv.org/pdf/2112.12014).
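Because the decompressed CSV is about 13 GB, it is usually more practical to stream it row by row than to load it in one go. The following is a minimal sketch of that idea using only the Python standard library. The column names (`comment_id`, `body`, `gender`) are assumptions made for illustration, since this record describes the fields but does not give the actual CSV header, and the sample rows are made up.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Iterator, TextIO

# Hypothetical column names: adjust these to the real CSV header.
TEXT_COLUMN = "body"
GENDER_COLUMN = "gender"


def iter_rows(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one comment per row without reading the whole file into memory."""
    yield from csv.DictReader(handle)


def summarize(rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, float]]:
    """Count comments and compute mean comment length (in words) per gender."""
    comments = Counter()
    words = Counter()
    for row in rows:
        gender = row[GENDER_COLUMN]
        comments[gender] += 1
        # The masking token [NAME] counts as a single word here.
        words[gender] += len(row[TEXT_COLUMN].split())
    return {
        g: {"comments": comments[g], "mean_words": words[g] / comments[g]}
        for g in comments
    }


# Tiny in-memory stand-in for the decompressed CSV (made-up rows).
SAMPLE = io.StringIO(
    "comment_id,body,gender\n"
    "c1,[NAME] gave a long speech today,female\n"
    "c2,I disagree with [NAME],male\n"
    "c3,[NAME] won,male\n"
)

summary = summarize(iter_rows(SAMPLE))
print(summary)
```

For the real file, the same functions could be applied to a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points to the decompressed CSV. Comment length is only one of the structural measures mentioned in the description; the sketch does not reproduce the analysis in the paper.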
doi_str_mv 10.7910/dvn/ywrxep
format Dataset
publisher Harvard Dataverse
creationdate 2022
oa free_for_read
orcidid https://orcid.org/0000-0001-7326-9594
https://orcid.org/0000-0002-7378-7698
https://orcid.org/0000-0003-1562-7909
rsrctype dataset
fulltext fulltext_linktorsrc
identifier DOI: 10.7910/dvn/ywrxep
language eng
recordid cdi_datacite_primary_10_7910_dvn_ywrxep
source DataCite
subjects Computer and Information Science
Social Sciences
title Replication Data for: Quantifying gender biases towards politicians on Reddit