Hierarchical Reinforcement Learning With Guidance for Multi-Domain Dialogue Policy

Achieving high performance in a multi-domain dialogue system with low computation is undoubtedly challenging. Previous works applying an end-to-end approach have been very successful. However, the computational cost remains a major issue, since a large language model such as GPT-2 is required. Meanwhile, the optimization of individual components in the dialogue system has not shown promising results, especially for the dialogue management component, due to the complexity of multi-domain state and action representation. To cope with these issues, this article presents an efficient guidance learning scheme in which imitation learning and hierarchical reinforcement learning (HRL) with a human in the loop are performed to achieve high performance via an inexpensive dialogue agent. Behavior cloning with auxiliary tasks is exploited to identify the important features in the latent representation. In particular, the proposed HRL is designed to treat each goal of a dialogue with a corresponding sub-policy, so as to provide efficient dialogue policy learning by utilizing human guidance through action pruning and action evaluation, as well as the reward obtained from interaction with the simulated user in the environment. Experimental results on the ConvLab-2 framework show that the proposed method achieves state-of-the-art performance in dialogue policy optimization and outperforms GPT-2-based solutions in end-to-end system evaluation.
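The abstract describes a hierarchical policy that routes each dialogue goal to its own sub-policy, with human guidance applied as action pruning before the sub-policy acts. The following is a minimal Python sketch of that structure only; all names here (SubPolicy, prune_actions, HierarchicalPolicy) are hypothetical, the learned sub-policies are replaced by a uniform-random stand-in, and this is not the authors' implementation.

```python
import random

# Illustrative sketch, not the paper's code: one sub-policy per dialogue
# domain/goal, with human guidance applied as action pruning beforehand.

class SubPolicy:
    """Per-domain policy; a stand-in for a learned (e.g., neural) policy."""
    def __init__(self, domain, actions):
        self.domain = domain
        self.actions = actions

    def act(self, state, allowed):
        # A trained policy would score the allowed actions given the state;
        # here we pick uniformly at random to keep the sketch runnable.
        return random.choice(allowed)

def prune_actions(domain, state, actions):
    """Human-guidance stand-in: drop actions a human would mark invalid
    for this state (e.g., booking before any venue has been offered)."""
    if not state.get("offer_made", False):
        return [a for a in actions if a != "book"]
    return actions

class HierarchicalPolicy:
    """Top-level controller: route the turn to the sub-policy of the
    active domain, applying guidance-based pruning first."""
    def __init__(self, sub_policies):
        self.sub_policies = {p.domain: p for p in sub_policies}

    def act(self, state):
        sub = self.sub_policies[state["active_domain"]]
        allowed = prune_actions(sub.domain, state, sub.actions)
        return sub.act(state, allowed)

if __name__ == "__main__":
    policy = HierarchicalPolicy([
        SubPolicy("hotel", ["request", "inform", "offer", "book"]),
        SubPolicy("restaurant", ["request", "inform", "offer", "book"]),
    ])
    state = {"active_domain": "hotel", "offer_made": False}
    print(policy.act(state))  # never "book" before an offer is made
```

In this reading, the action-evaluation and reward signals mentioned in the abstract would then update each sub-policy separately, which is what makes the per-goal decomposition cheap to train compared with a single monolithic policy.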

Bibliographic Details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, Vol. 31, p. 748-761
Main authors: Rohmatillah, Mahdin; Chien, Jen-Tzung
Format: Article
Language: English
Subjects: Cloning; Costs; Dialogue system; Domains; guidance learning; hierarchical reinforcement learning; Human performance; Interactive computer systems; Machine learning; Optimization; Pipelines; policy optimization; Reinforcement learning; Representations; Task analysis; Training; Transformers
Online access: Full text
DOI: 10.1109/TASLP.2023.3235202
ISSN: 2329-9290
EISSN: 2329-9304