Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources
To address the data scarcity issue in Conversational question answering (ConvQA), a dialog inpainting method, which utilizes documents to generate ConvQA datasets, has been proposed. However, the original dialog inpainting model is trained solely on the dialog reconstruction task, resulting in the g...
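Main Authors: | Hwang, Yerin; Kim, Yongil; Bae, Hyunkyung; Bang, Jeesoo; Lee, Hwanhee; Jung, Kyomin |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Computation and Language |
Online Access: | Order full text |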
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Hwang, Yerin ; Kim, Yongil ; Bae, Hyunkyung ; Bang, Jeesoo ; Lee, Hwanhee ; Jung, Kyomin |
description | To address the data scarcity issue in Conversational question answering
(ConvQA), a dialog inpainting method, which utilizes documents to generate
ConvQA datasets, has been proposed. However, the original dialog inpainting
model is trained solely on the dialog reconstruction task, resulting in the
generation of questions with low contextual relevance due to insufficient
learning of question-answer alignment. To overcome this limitation, we propose
a novel framework called Dialogizer, which has the capability to automatically
generate ConvQA datasets with high contextual relevance from textual sources.
The framework incorporates two training tasks: question-answer matching (QAM)
and topic-aware dialog generation (TDG). Moreover, re-ranking is conducted
during the inference phase based on the contextual relevance of the generated
questions. Using our framework, we produce four ConvQA datasets by utilizing
documents from multiple domains as the primary source. Through automatic
evaluation using diverse metrics, as well as human evaluation, we validate that
our proposed framework exhibits the ability to generate datasets of higher
quality compared to the baseline dialog inpainting model. |
doi_str_mv | 10.48550/arxiv.2311.07589 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2311_07589</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2311_07589</sourcerecordid><originalsourceid>FETCH-LOGICAL-a679-7bf57c4ab973969bc0ab4cf37ddefce955ea597b4aad6e9329f6e1a10ee1cd143</originalsourceid><addsrcrecordid>eNotj9FKwzAYhXPjhWw-gFfmBVKTJWmW3Y1OpzAQWe_Ln_SPBLpG0m5On15Xd3U4B74DHyH3ghdqqTV_hHyOp2IhhSi40Ut7S_abCF36iD-YV7RK_YjnkcEXZLy0E-YBxph66Nj7mm5ghAFHusUe87TTkNOB1n_QETq6T8fscZiTmwDdgHfXnJH6-amuXtjubftarXcMSmOZcUEbr8BZI21pnefglA_StC0Gj1ZrBG2NUwBtiVYubChRgOCIwrdCyRl5-L-drJrPHA-Qv5uLXTPZyV_ms0xk</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources</title><source>arXiv.org</source><creator>Hwang, Yerin ; Kim, Yongil ; Bae, Hyunkyung ; Bang, Jeesoo ; Lee, Hwanhee ; Jung, Kyomin</creator><creatorcontrib>Hwang, Yerin ; Kim, Yongil ; Bae, Hyunkyung ; Bang, Jeesoo ; Lee, Hwanhee ; Jung, Kyomin</creatorcontrib><description>To address the data scarcity issue in Conversational question answering
(ConvQA), a dialog inpainting method, which utilizes documents to generate
ConvQA datasets, has been proposed. However, the original dialog inpainting
model is trained solely on the dialog reconstruction task, resulting in the
generation of questions with low contextual relevance due to insufficient
learning of question-answer alignment. To overcome this limitation, we propose
a novel framework called Dialogizer, which has the capability to automatically
generate ConvQA datasets with high contextual relevance from textual sources.
The framework incorporates two training tasks: question-answer matching (QAM)
and topic-aware dialog generation (TDG). Moreover, re-ranking is conducted
during the inference phase based on the contextual relevance of the generated
questions. Using our framework, we produce four ConvQA datasets by utilizing
documents from multiple domains as the primary source. Through automatic
evaluation using diverse metrics, as well as human evaluation, we validate that
our proposed framework exhibits the ability to generate datasets of higher
quality compared to the baseline dialog inpainting model.</description><identifier>DOI: 10.48550/arxiv.2311.07589</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language</subject><creationdate>2023-11</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2311.07589$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2311.07589$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Hwang, Yerin</creatorcontrib><creatorcontrib>Kim, Yongil</creatorcontrib><creatorcontrib>Bae, Hyunkyung</creatorcontrib><creatorcontrib>Bang, Jeesoo</creatorcontrib><creatorcontrib>Lee, Hwanhee</creatorcontrib><creatorcontrib>Jung, Kyomin</creatorcontrib><title>Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources</title><description>To address the data scarcity issue in Conversational question answering
(ConvQA), a dialog inpainting method, which utilizes documents to generate
ConvQA datasets, has been proposed. However, the original dialog inpainting
model is trained solely on the dialog reconstruction task, resulting in the
generation of questions with low contextual relevance due to insufficient
learning of question-answer alignment. To overcome this limitation, we propose
a novel framework called Dialogizer, which has the capability to automatically
generate ConvQA datasets with high contextual relevance from textual sources.
The framework incorporates two training tasks: question-answer matching (QAM)
and topic-aware dialog generation (TDG). Moreover, re-ranking is conducted
during the inference phase based on the contextual relevance of the generated
questions. Using our framework, we produce four ConvQA datasets by utilizing
documents from multiple domains as the primary source. Through automatic
evaluation using diverse metrics, as well as human evaluation, we validate that
our proposed framework exhibits the ability to generate datasets of higher
quality compared to the baseline dialog inpainting model.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computation and Language</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj9FKwzAYhXPjhWw-gFfmBVKTJWmW3Y1OpzAQWe_Ln_SPBLpG0m5On15Xd3U4B74DHyH3ghdqqTV_hHyOp2IhhSi40Ut7S_abCF36iD-YV7RK_YjnkcEXZLy0E-YBxph66Nj7mm5ghAFHusUe87TTkNOB1n_QETq6T8fscZiTmwDdgHfXnJH6-amuXtjubftarXcMSmOZcUEbr8BZI21pnefglA_StC0Gj1ZrBG2NUwBtiVYubChRgOCIwrdCyRl5-L-drJrPHA-Qv5uLXTPZyV_ms0xk</recordid><startdate>20231109</startdate><enddate>20231109</enddate><creator>Hwang, Yerin</creator><creator>Kim, Yongil</creator><creator>Bae, Hyunkyung</creator><creator>Bang, Jeesoo</creator><creator>Lee, Hwanhee</creator><creator>Jung, Kyomin</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20231109</creationdate><title>Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources</title><author>Hwang, Yerin ; Kim, Yongil ; Bae, Hyunkyung ; Bang, Jeesoo ; Lee, Hwanhee ; Jung, Kyomin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a679-7bf57c4ab973969bc0ab4cf37ddefce955ea597b4aad6e9329f6e1a10ee1cd143</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computation and Language</topic><toplevel>online_resources</toplevel><creatorcontrib>Hwang, Yerin</creatorcontrib><creatorcontrib>Kim, Yongil</creatorcontrib><creatorcontrib>Bae, Hyunkyung</creatorcontrib><creatorcontrib>Bang, Jeesoo</creatorcontrib><creatorcontrib>Lee, Hwanhee</creatorcontrib><creatorcontrib>Jung, Kyomin</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Hwang, Yerin</au><au>Kim, Yongil</au><au>Bae, Hyunkyung</au><au>Bang, Jeesoo</au><au>Lee, Hwanhee</au><au>Jung, Kyomin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources</atitle><date>2023-11-09</date><risdate>2023</risdate><abstract>To address the data scarcity issue in Conversational question answering
(ConvQA), a dialog inpainting method, which utilizes documents to generate
ConvQA datasets, has been proposed. However, the original dialog inpainting
model is trained solely on the dialog reconstruction task, resulting in the
generation of questions with low contextual relevance due to insufficient
learning of question-answer alignment. To overcome this limitation, we propose
a novel framework called Dialogizer, which has the capability to automatically
generate ConvQA datasets with high contextual relevance from textual sources.
The framework incorporates two training tasks: question-answer matching (QAM)
and topic-aware dialog generation (TDG). Moreover, re-ranking is conducted
during the inference phase based on the contextual relevance of the generated
questions. Using our framework, we produce four ConvQA datasets by utilizing
documents from multiple domains as the primary source. Through automatic
evaluation using diverse metrics, as well as human evaluation, we validate that
our proposed framework exhibits the ability to generate datasets of higher
quality compared to the baseline dialog inpainting model.</abstract><doi>10.48550/arxiv.2311.07589</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2311.07589 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2311_07589 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence ; Computer Science - Computation and Language |
title | Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T12%3A53%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Dialogizer:%20Context-aware%20Conversational-QA%20Dataset%20Generation%20from%20Textual%20Sources&rft.au=Hwang,%20Yerin&rft.date=2023-11-09&rft_id=info:doi/10.48550/arxiv.2311.07589&rft_dat=%3Carxiv_GOX%3E2311_07589%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
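
The description field mentions that generated questions are re-ranked at inference time by contextual relevance. The record does not say how relevance is scored, so the following is only a minimal sketch of generic candidate re-ranking. The names `token_overlap` and `rerank` are hypothetical, and the token-overlap score is a toy placeholder, not the scoring used by Dialogizer.

```python
from typing import Callable, List, Tuple


def token_overlap(question: str, answer: str) -> float:
    """Toy relevance score (assumption): Jaccard overlap of lowercase tokens."""
    q_tokens = set(question.lower().split())
    a_tokens = set(answer.lower().split())
    if not q_tokens or not a_tokens:
        return 0.0
    return len(q_tokens & a_tokens) / len(q_tokens | a_tokens)


def rerank(
    candidates: List[str],
    answer: str,
    score: Callable[[str, str], float] = token_overlap,
) -> List[Tuple[str, float]]:
    """Return (question, score) pairs sorted from most to least relevant."""
    scored = [(candidate, score(candidate, answer)) for candidate in candidates]
    # sorted() is stable, so candidates with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranked = rerank(
        ["who wrote hamlet", "what color is the sky"],
        answer="the sky is blue",
    )
    for question, relevance in ranked:
        print(f"{relevance:.2f}  {question}")
```

Any callable that maps a (question, answer) pair to a number, such as a learned relevance model, could be passed as `score` in place of the placeholder.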