PolicyQA: A Reading Comprehension Dataset for Privacy Policies
Privacy policy documents are long and verbose. A question answering (QA) system can assist users in finding the information that is relevant and important to them. Prior studies in this domain frame the QA task as retrieving the most relevant text segment or a list of sentences from the policy docum...
Main authors: | Ahmad, Wasi Uddin; Chi, Jianfeng; Tian, Yuan; Chang, Kai-Wei |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computation and Language |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Ahmad, Wasi Uddin ; Chi, Jianfeng ; Tian, Yuan ; Chang, Kai-Wei |
description | Privacy policy documents are long and verbose. A question answering (QA)
system can assist users in finding the information that is relevant and
important to them. Prior studies in this domain frame the QA task as retrieving
the most relevant text segment or a list of sentences from the policy document
given a question. On the contrary, we argue that providing users with a short
text span from policy documents reduces the burden of searching the target
information from a lengthy text segment. In this paper, we present PolicyQA, a
dataset that contains 25,017 reading comprehension style examples curated from
an existing corpus of 115 website privacy policies. PolicyQA provides 714
human-annotated questions written for a wide range of privacy practices. We
evaluate two existing neural QA models and perform rigorous analysis to reveal
the advantages and challenges offered by PolicyQA. |
doi_str_mv | 10.48550/arxiv.2010.02557 |
format | Article |
fullrecord | Record TN_cdi_arxiv_primary_2010_02557 (source: arXiv.org, Open Access Repository; type: article; date: 2020-10-06; rights: http://arxiv.org/licenses/nonexclusive-distrib/1.0; open access: free_for_read; arXiv link: https://arxiv.org/abs/2010.02557; DOI link: https://doi.org/10.48550/arXiv.2010.02557). Title, creators, abstract, subject, and language match the corresponding fields in this table. |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2010.02557 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2010_02557 |
source | arXiv.org |
subjects | Computer Science - Computation and Language |
title | PolicyQA: A Reading Comprehension Dataset for Privacy Policies |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T04%3A10%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=PolicyQA:%20A%20Reading%20Comprehension%20Dataset%20for%20Privacy%20Policies&rft.au=Ahmad,%20Wasi%20Uddin&rft.date=2020-10-06&rft_id=info:doi/10.48550/arxiv.2010.02557&rft_dat=%3Carxiv_GOX%3E2010_02557%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |