MK-SQuIT: Synthesizing Questions using Iterative Template-filling

The aim of this work is to create a framework for synthetically generating question/query pairs with as little human input as possible. These datasets can be used to train machine translation systems to convert natural language questions into queries, a useful tool that could allow for more natural...


Bibliographic details
Main authors:	Spiegel, Benjamin A; Cheong, Vincent; Kaplan, James E; Sanchez, Anthony
Format:	Article
Language:	eng
Subjects:	Computer Science - Computation and Language
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Spiegel, Benjamin A ; Cheong, Vincent ; Kaplan, James E ; Sanchez, Anthony
description	The aim of this work is to create a framework for synthetically generating question/query pairs with as little human input as possible. These datasets can be used to train machine translation systems to convert natural language questions into queries, a useful tool that could allow for more natural access to database information. Existing methods of dataset generation require human input that scales linearly with the size of the dataset, resulting in small datasets. Aside from a short initial configuration task, no human input is required during the query generation process of our system. We leverage WikiData, a knowledge base of RDF triples, as a source for generating the main content of questions and queries. Using multiple layers of question templating we are able to sidestep some of the most challenging parts of query generation that have been handled by humans in previous methods; humans never have to modify, aggregate, inspect, annotate, or generate any questions or queries at any step in the process. Our system is easily configurable to multiple domains and can be modified to generate queries in natural languages other than English. We also present an example dataset of 110,000 question/query pairs across four WikiData domains. We then present a baseline model that we train using the dataset which shows promise in a commercial QA setting.
doi_str_mv	10.48550/arxiv.2011.02566
format	Article
date	2020-11-04
rights	http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa	free_for_read
linktorsrc	https://arxiv.org/abs/2011.02566
backlink	https://doi.org/10.48550/arXiv.2011.02566
collection	arXiv Computer Science ; arXiv.org
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2011.02566
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2011_02566
source	arXiv.org
subjects	Computer Science - Computation and Language
title	MK-SQuIT: Synthesizing Questions using Iterative Template-filling
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T15%3A42%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MK-SQuIT:%20Synthesizing%20Questions%20using%20Iterative%20Template-filling&rft.au=Spiegel,%20Benjamin%20A&rft.date=2020-11-04&rft_id=info:doi/10.48550/arxiv.2011.02566&rft_dat=%3Carxiv_GOX%3E2011_02566%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true