PolicyQA: A Reading Comprehension Dataset for Privacy Policies

Bibliographic details
Main authors:	Ahmad, Wasi Uddin; Chi, Jianfeng; Tian, Yuan; Chang, Kai-Wei
Format:	Article
Language:	eng
Subjects:	Computer Science - Computation and Language

creator	Ahmad, Wasi Uddin; Chi, Jianfeng; Tian, Yuan; Chang, Kai-Wei
description	Privacy policy documents are long and verbose. A question answering (QA) system can assist users in finding the information that is relevant and important to them. Prior studies in this domain frame the QA task as retrieving the most relevant text segment or a list of sentences from the policy document given a question. In contrast, we argue that providing users with a short text span from policy documents reduces the burden of searching for the target information in a lengthy text segment. In this paper, we present PolicyQA, a dataset that contains 25,017 reading-comprehension-style examples curated from an existing corpus of 115 website privacy policies. PolicyQA provides 714 human-annotated questions written for a wide range of privacy practices. We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.
doi_str_mv	10.48550/arxiv.2010.02557
format	Article
creationdate	2020-10-06
rights	http://arxiv.org/licenses/nonexclusive-distrib/1.0 (free to read)
source_url	https://arxiv.org/abs/2010.02557
language	eng
source	arXiv.org
subjects	Computer Science - Computation and Language
title	PolicyQA: A Reading Comprehension Dataset for Privacy Policies
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T04%3A10%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=PolicyQA:%20A%20Reading%20Comprehension%20Dataset%20for%20Privacy%20Policies&rft.au=Ahmad,%20Wasi%20Uddin&rft.date=2020-10-06&rft_id=info:doi/10.48550/arxiv.2010.02557&rft_dat=%3Carxiv_GOX%3E2010_02557%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true