Replication Data for: Quantifying gender biases towards politicians on Reddit
This dataset contains ~10 million comments posted on Reddit between July 2018 and December 2019 that mention a cis-male or cis-female politician. They were extracted from pushshift's historical data dumps of Reddit comments (https://files.pushshift.io/reddit/comments/). We extracted subreddits...
Main authors: | Marjanovic, Sara; Stanczak, Karolina; Augenstein, Isabelle |
---|---|
Format: | Dataset |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Marjanovic, Sara ; Stanczak, Karolina ; Augenstein, Isabelle |
description | This dataset contains ~10 million comments posted on Reddit between July 2018 and December 2019 that mention a cis-male or cis-female politician. They were extracted from Pushshift's historical data dumps of Reddit comments (https://files.pushshift.io/reddit/comments/). We extracted subreddits of political relevance and then isolated comments about politicians using a pre-trained named entity linker. These comments were then used to examine gender biases in comment content (e.g. sentiment and specific adjectives used) and structure (e.g. comment length). We present this dataset for others to use to investigate political gender biases presented on public fora.
The file is compressed as a .7z file and decompresses into a 13 GB .csv file containing all comments used in our paper. The CSV contains the Reddit comment IDs, comment texts (with the politician's name obscured with the token [NAME]), the Wikidata ID of the mentioned politician, the name used to refer to the politician in question, and various information about the politician as linked to their Wikidata ID (e.g. gender, country of origin, etc.). All comments should be in English, as they were extracted from predominantly English-speaking communities.
You can read more about our comment-collection methodology and our investigation of the presented gender biases in our preprint (https://arxiv.org/pdf/2112.12014). |
doi_str_mv | 10.7910/dvn/ywrxep |
format | Dataset |
fullrecord | <record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_7910_dvn_ywrxep</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_7910_dvn_ywrxep</sourcerecordid><originalsourceid>FETCH-LOGICAL-d71p-5b3b8e3b2c68dfc16cd6cd8235bcf1b69d1a6128c8d4fc8101865e379c2bf7c43</originalsourceid><addsrcrecordid>eNotj8tqwzAURLXpoqTd9Au0LjixrFiWsyvpE1JKQ_ZCulcKF1zZyGpT_31dEhiY1RzmMHYnymXTinKFP3E1ndKvH67Z-94PHYHN1Ef-aLPloU8b_vltY6YwUTzyo4_oE3dkRz_y3J9swpEPfUeZgGwc-Tzde0TKN-wq2G70t5desMPz02H7Wuw-Xt62D7sCGzEUtZNOe-kqUBoDCAU4R1eydhCEUy0Kq0SlQeM6gBal0Kr2smmhcqGBtVyw-zMW58NA2Zsh0ZdNkxGl-Vc0s6I5K8o_jC9OXg</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>Replication Data for: Quantifying gender biases towards politicians on Reddit</title><source>DataCite</source><creator>Marjanovic, Sara ; Stanczak, Karolina ; Augenstein, Isabelle</creator><creatorcontrib>Marjanovic, Sara ; Stanczak, Karolina ; Augenstein, Isabelle</creatorcontrib><description>This dataset contains ~10 million comments posted on Reddit between July 2018 and December 2019 that mention a cis-male or cis-female politician. They were extracted from pushshift's historical data dumps of Reddit comments (https://files.pushshift.io/reddit/comments/). We extracted subreddits of political relevance and then isolated comments about politicians using a pre-trained named entity linker. These comments were then used to look at gender biases in comment content (e.g. sentiment and specific adjectives used) and structure (e.g. comment length). We present this dataset for others to use to investigate political gender biases presented on public fora.
The file is compressed as a .7z file and decompresses into a 13 GB .csv file containing all comments used in our paper. The CSV contains the Reddit comment IDs, comment texts (with the politician's name obscured with the token [NAME]), Wikidata ID of the mentioned politician, name used to refer to the politician in question, and various information about the politician as linked to their Wikidata ID (e.g. gender, country of origin, etc.). All comments should be in the English language as they were extracted from predominantly English-speaking communities.
You can read more details about our methodology on comment collection and our investigation on the presented gender biases at our preprint (https://arxiv.org/pdf/2112.12014).</description><identifier>DOI: 10.7910/dvn/ywrxep</identifier><language>eng</language><publisher>Harvard Dataverse</publisher><subject>Computer and Information Science ; Social Sciences</subject><creationdate>2022</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0001-7326-9594 ; 0000-0002-7378-7698 ; 0000-0003-1562-7909</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,1894</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.7910/dvn/ywrxep$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Marjanovic, Sara</creatorcontrib><creatorcontrib>Stanczak, Karolina</creatorcontrib><creatorcontrib>Augenstein, Isabelle</creatorcontrib><title>Replication Data for: Quantifying gender biases towards politicians on Reddit</title><description>This dataset contains ~10 million comments posted on Reddit between July 2018 and December 2019 that mention a cis-male or cis-female politician. They were extracted from pushshift's historical data dumps of Reddit comments (https://files.pushshift.io/reddit/comments/). We extracted subreddits of political relevance and then isolated comments about politicians using a pre-trained named entity linker. These comments were then used to look at gender biases in comment content (e.g. sentiment and specific adjectives used) and structure (e.g. comment length). We present this dataset for others to use to investigate political gender biases presented on public fora.
The file is compressed as a .7z file and decompresses into a 13 GB .csv file containing all comments used in our paper. The CSV contains the Reddit comment IDs, comment texts (with the politician's name obscured with the token [NAME]), Wikidata ID of the mentioned politician, name used to refer to the politician in question, and various information about the politician as linked to their Wikidata ID (e.g. gender, country of origin, etc.). All comments should be in the English language as they were extracted from predominantly English-speaking communities.
You can read more details about our methodology on comment collection and our investigation on the presented gender biases at our preprint (https://arxiv.org/pdf/2112.12014).</description><subject>Computer and Information Science</subject><subject>Social Sciences</subject><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2022</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNotj8tqwzAURLXpoqTd9Au0LjixrFiWsyvpE1JKQ_ZCulcKF1zZyGpT_31dEhiY1RzmMHYnymXTinKFP3E1ndKvH67Z-94PHYHN1Ef-aLPloU8b_vltY6YwUTzyo4_oE3dkRz_y3J9swpEPfUeZgGwc-Tzde0TKN-wq2G70t5desMPz02H7Wuw-Xt62D7sCGzEUtZNOe-kqUBoDCAU4R1eydhCEUy0Kq0SlQeM6gBal0Kr2smmhcqGBtVyw-zMW58NA2Zsh0ZdNkxGl-Vc0s6I5K8o_jC9OXg</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Marjanovic, Sara</creator><creator>Stanczak, Karolina</creator><creator>Augenstein, Isabelle</creator><general>Harvard Dataverse</general><scope>DYCCY</scope><scope>PQ8</scope><orcidid>https://orcid.org/0000-0001-7326-9594</orcidid><orcidid>https://orcid.org/0000-0002-7378-7698</orcidid><orcidid>https://orcid.org/0000-0003-1562-7909</orcidid></search><sort><creationdate>2022</creationdate><title>Replication Data for: Quantifying gender biases towards politicians on Reddit</title><author>Marjanovic, Sara ; Stanczak, Karolina ; Augenstein, Isabelle</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-d71p-5b3b8e3b2c68dfc16cd6cd8235bcf1b69d1a6128c8d4fc8101865e379c2bf7c43</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Computer and Information Science</topic><topic>Social Sciences</topic><toplevel>online_resources</toplevel><creatorcontrib>Marjanovic, Sara</creatorcontrib><creatorcontrib>Stanczak, Karolina</creatorcontrib><creatorcontrib>Augenstein, Isabelle</creatorcontrib><collection>DataCite (Open 
Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Marjanovic, Sara</au><au>Stanczak, Karolina</au><au>Augenstein, Isabelle</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>Replication Data for: Quantifying gender biases towards politicians on Reddit</title><date>2022</date><risdate>2022</risdate><abstract>This dataset contains ~10 million comments posted on Reddit between July 2018 and December 2019 that mention a cis-male or cis-female politician. They were extracted from pushshift's historical data dumps of Reddit comments (https://files.pushshift.io/reddit/comments/). We extracted subreddits of political relevance and then isolated comments about politicians using a pre-trained named entity linker. These comments were then used to look at gender biases in comment content (e.g. sentiment and specific adjectives used) and structure (e.g. comment length). We present this dataset for others to use to investigate political gender biases presented on public fora.
The file is compressed as a .7z file and decompresses into a 13 GB .csv file containing all comments used in our paper. The CSV contains the Reddit comment IDs, comment texts (with the politician's name obscured with the token [NAME]), Wikidata ID of the mentioned politician, name used to refer to the politician in question, and various information about the politician as linked to their Wikidata ID (e.g. gender, country of origin, etc.). All comments should be in the English language as they were extracted from predominantly English-speaking communities.
You can read more details about our methodology on comment collection and our investigation on the presented gender biases at our preprint (https://arxiv.org/pdf/2112.12014).</abstract><pub>Harvard Dataverse</pub><doi>10.7910/dvn/ywrxep</doi><orcidid>https://orcid.org/0000-0001-7326-9594</orcidid><orcidid>https://orcid.org/0000-0002-7378-7698</orcidid><orcidid>https://orcid.org/0000-0003-1562-7909</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.7910/dvn/ywrxep |
ispartof | |
issn | |
language | eng |
recordid | cdi_datacite_primary_10_7910_dvn_ywrxep |
source | DataCite |
subjects | Computer and Information Science ; Social Sciences |
title | Replication Data for: Quantifying gender biases towards politicians on Reddit |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T05%3A17%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Marjanovic,%20Sara&rft.date=2022&rft_id=info:doi/10.7910/dvn/ywrxep&rft_dat=%3Cdatacite_PQ8%3E10_7910_dvn_ywrxep%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
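
The description above notes that the archive decompresses into a 13 GB CSV, which is likely too large to load into memory at once on many machines. The following is a minimal sketch of one way to stream such a file row by row with Python's standard library. The column names (`comment_id`, `text`, `gender`) and the helper `count_by_column` are hypothetical, introduced only for illustration; the record does not list the actual CSV header, so check it before adapting this. Extracting the .7z archive is assumed to have been done beforehand with a tool such as 7-Zip.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# Comment texts can be long, so raise the per-field limit above csv's default.
csv.field_size_limit(10**9)


def count_by_column(handle: TextIO, column: str) -> Counter:
    """Stream a CSV row by row and tally the values found in one column.

    Only one row is held in memory at a time, so the file size does not matter.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(handle):
        counts[row[column]] += 1
    return counts


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file; the header is an assumption.
    sample = io.StringIO(
        "comment_id,text,gender\n"
        "a1,[NAME] spoke today,female\n"
        "a2,I agree with [NAME],male\n"
        "a3,[NAME] voted again,female\n"
    )
    print(count_by_column(sample, "gender"))
```

For the real file, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points to the extracted CSV; the UTF-8 encoding is likewise an assumption.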