Learning When Not to Answer: A Ternary Reward Structure for Reinforcement Learning based Question Answering

In this paper, we investigate the challenges of using reinforcement learning agents for question-answering over knowledge graphs in real-world applications. We examine the performance metrics used by state-of-the-art systems and determine that they are inadequate for such settings. More specifically, they do not evaluate the systems correctly in situations where no answer is available, and thus agents optimized for these metrics are poor at modeling confidence. We introduce a simple new performance metric for evaluating question-answering agents that is more representative of practical usage conditions, and optimize for this metric by extending the binary reward structure used in prior work to a ternary reward structure that also rewards an agent for not answering a question rather than giving an incorrect answer. We show that this can drastically improve the precision of answered questions while forgoing answers to only a limited number of previously correctly answered questions. Employing a supervised learning strategy that uses depth-first-search paths to bootstrap the reinforcement learning algorithm further improves performance.
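
The abstract describes the core idea only at a high level: the binary reward used in prior work (correct vs. incorrect) is extended to a ternary reward that also compensates the agent for abstaining instead of giving a wrong answer. The sketch below is a minimal Python illustration of that idea, assuming specific reward values and an abstention convention (returning None); the function name and all numbers are illustrative assumptions, not taken from the paper.

```python
def ternary_reward(predicted_answer, gold_answers,
                   r_correct=1.0, r_abstain=0.0, r_incorrect=-1.0):
    """Ternary reward sketch: correct answer, abstention, incorrect answer.

    The reward magnitudes and the use of None to signal "no answer" are
    illustrative assumptions, not values reported in the paper.
    """
    if predicted_answer is None:           # the agent chose not to answer
        return r_abstain
    if predicted_answer in gold_answers:   # the answer matches a gold entity
        return r_correct
    return r_incorrect                     # answering incorrectly is penalised


# Toy usage: a wrong answer is penalised, abstaining is not.
gold = {"Barack Obama"}
print(ternary_reward("Barack Obama", gold))  # 1.0
print(ternary_reward(None, gold))            # 0.0
print(ternary_reward("Joe Biden", gold))     # -1.0
```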

Bibliographic Details
Main Authors: Godin, Fréderic; Kumar, Anjishnu; Mittal, Arpit
Format: Article
Language: English
Subjects: Computer Science - Computation and Language
Online Access: Order full text
doi_str_mv 10.48550/arxiv.1902.10236
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1902.10236
language eng
recordid cdi_arxiv_primary_1902_10236
source arXiv.org
subjects Computer Science - Computation and Language
title Learning When Not to Answer: A Ternary Reward Structure for Reinforcement Learning based Question Answering
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T04%3A45%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20When%20Not%20to%20Answer:%20A%20Ternary%20Reward%20Structure%20for%20Reinforcement%20Learning%20based%20Question%20Answering&rft.au=Godin,%20Fr%C3%A9deric&rft.date=2019-02-26&rft_id=info:doi/10.48550/arxiv.1902.10236&rft_dat=%3Carxiv_GOX%3E1902_10236%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true