On Preference Learning Based on Sequential Bayesian Optimization with Pairwise Comparison

User preference learning is generally a hard problem. Individual preferences are typically unknown even to users themselves, while the space of choices is infinite. Here we study user preference learning from an information-theoretic perspective. We model preference learning as a system with two intera...

Detailed description

Bibliographic details
Main authors:	Ignatenko, Tanya ; Kondrashov, Kirill ; Cox, Marco ; de Vries, Bert
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence ; Computer Science - Information Theory ; Computer Science - Learning ; Mathematics - Information Theory ; Statistics - Machine Learning

creator	Ignatenko, Tanya ; Kondrashov, Kirill ; Cox, Marco ; de Vries, Bert
description	User preference learning is generally a hard problem. Individual preferences are typically unknown even to users themselves, while the space of choices is infinite. Here we study user preference learning from an information-theoretic perspective. We model preference learning as a system with two interacting sub-systems, one representing a user with his/her preferences and the other representing an agent that has to learn these preferences. The user's behaviour is modeled by a parametric preference function. To learn the preferences efficiently and reduce the search space quickly, we propose an agent that interacts with the user to collect the most informative data for learning. The agent presents two proposals to the user for evaluation, and the user rates them based on his/her preference function. We show that the optimum agent strategy for data collection and preference learning is the result of a maximin optimization of the normalized weighted Kullback-Leibler (KL) divergence between the true and agent-assigned predictive user response distributions. The resulting value of the KL divergence, which we also call remaining system uncertainty (RSU), provides an efficient performance metric in the absence of ground truth. This metric characterises how well the agent can predict the user and, thus, the quality of the underlying learned user (preference) model. Our proposed agent comprises sequential mechanisms for user model inference and proposal generation. To infer the user model (preference function), the agent uses approximate Bayesian inference. The data collection strategy is to generate the proposals whose responses most help resolve the uncertainty associated with predicting the user's responses. The efficiency of our approach is validated by numerical simulations. A real-life example of a preference learning application is also provided.
doi_str_mv	10.48550/arxiv.2103.13192
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2103_13192</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2103_13192</sourcerecordid><originalsourceid>FETCH-LOGICAL-a672-b70ff9aed02a5f6f5d59f012a1ca2295443d3793387788689b804dbdd68e3a103</originalsourceid><addsrcrecordid>eNotz81OhDAUBWA2LszoA7iyLwD2h0K7VOJfQsIkzsYVuUxv9SZDwYKO49PLjK5OcnJyki9JrgTPcqM1v4H4TV-ZFFxlQgkrz5PXJrB1RI8RwxZZjRADhTd2BxM6NgT2gh-fGGaC3dIdcCIIrBln6ukHZloGe5rf2Roo7mlCVg39CJGmIVwkZx52E17-5yrZPNxvqqe0bh6fq9s6haKUaVdy7y2g4xK0L7x22nouJIgtSGl1niunSquUKUtjCmM7w3PXOVcYVLBAVsn13-3J1o6ReoiH9mhsT0b1C3LRTMw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>On Preference Learning Based on Sequential Bayesian Optimization with Pairwise Comparison</title><source>arXiv.org</source><creator>Ignatenko, Tanya ; Kondrashov, Kirill ; Cox, Marco ; de Vries, Bert</creator><creatorcontrib>Ignatenko, Tanya ; Kondrashov, Kirill ; Cox, Marco ; de Vries, Bert</creatorcontrib><description>User preference learning is generally a hard problem. Individual preferences are typically unknown even to users themselves, while the space of choices is infinite. Here we study user preference learning from information-theoretic perspective. We model preference learning as a system with two interacting sub-systems, one representing a user with his/her preferences and another one representing an agent that has to learn these preferences. The user with his/her behaviour is modeled by a parametric preference function. To efficiently learn the preferences and reduce search space quickly, we propose the agent that interacts with the user to collect the most informative data for learning. The agent presents two proposals to the user for evaluation, and the user rates them based on his/her preference function. 
We show that the optimum agent strategy for data collection and preference learning is a result of maximin optimization of the normalized weighted Kullback-Leibler (KL) divergence between true and agent-assigned predictive user response distributions. The resulting value of KL-divergence, which we also call remaining system uncertainty (RSU), provides an efficient performance metric in the absence of the ground truth. This metric characterises how well the agent can predict user and, thus, the quality of the underlying learned user (preference) model. Our proposed agent comprises sequential mechanisms for user model inference and proposal generation. To infer the user model (preference function), Bayesian approximate inference is used in the agent. The data collection strategy is to generate proposals, responses to which help resolving uncertainty associated with prediction of the user responses the most. The efficiency of our approach is validated by numerical simulations. Also a real-life example of preference learning application is provided.</description><identifier>DOI: 10.48550/arxiv.2103.13192</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Information Theory ; Computer Science - Learning ; Mathematics - Information Theory ; Statistics - Machine Learning</subject><creationdate>2021-03</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2103.13192$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2103.13192$$DView paper in 
arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Ignatenko, Tanya</creatorcontrib><creatorcontrib>Kondrashov, Kirill</creatorcontrib><creatorcontrib>Cox, Marco</creatorcontrib><creatorcontrib>de Vries, Bert</creatorcontrib><title>On Preference Learning Based on Sequential Bayesian Optimization with Pairwise Comparison</title><description>User preference learning is generally a hard problem. Individual preferences are typically unknown even to users themselves, while the space of choices is infinite. Here we study user preference learning from information-theoretic perspective. We model preference learning as a system with two interacting sub-systems, one representing a user with his/her preferences and another one representing an agent that has to learn these preferences. The user with his/her behaviour is modeled by a parametric preference function. To efficiently learn the preferences and reduce search space quickly, we propose the agent that interacts with the user to collect the most informative data for learning. The agent presents two proposals to the user for evaluation, and the user rates them based on his/her preference function. We show that the optimum agent strategy for data collection and preference learning is a result of maximin optimization of the normalized weighted Kullback-Leibler (KL) divergence between true and agent-assigned predictive user response distributions. The resulting value of KL-divergence, which we also call remaining system uncertainty (RSU), provides an efficient performance metric in the absence of the ground truth. This metric characterises how well the agent can predict user and, thus, the quality of the underlying learned user (preference) model. Our proposed agent comprises sequential mechanisms for user model inference and proposal generation. To infer the user model (preference function), Bayesian approximate inference is used in the agent. 
The data collection strategy is to generate proposals, responses to which help resolving uncertainty associated with prediction of the user responses the most. The efficiency of our approach is validated by numerical simulations. Also a real-life example of preference learning application is provided.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Information Theory</subject><subject>Computer Science - Learning</subject><subject>Mathematics - Information Theory</subject><subject>Statistics - Machine Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz81OhDAUBWA2LszoA7iyLwD2h0K7VOJfQsIkzsYVuUxv9SZDwYKO49PLjK5OcnJyki9JrgTPcqM1v4H4TV-ZFFxlQgkrz5PXJrB1RI8RwxZZjRADhTd2BxM6NgT2gh-fGGaC3dIdcCIIrBln6ukHZloGe5rf2Roo7mlCVg39CJGmIVwkZx52E17-5yrZPNxvqqe0bh6fq9s6haKUaVdy7y2g4xK0L7x22nouJIgtSGl1niunSquUKUtjCmM7w3PXOVcYVLBAVsn13-3J1o6ReoiH9mhsT0b1C3LRTMw</recordid><startdate>20210324</startdate><enddate>20210324</enddate><creator>Ignatenko, Tanya</creator><creator>Kondrashov, Kirill</creator><creator>Cox, Marco</creator><creator>de Vries, Bert</creator><scope>AKY</scope><scope>AKZ</scope><scope>EPD</scope><scope>GOX</scope></search><sort><creationdate>20210324</creationdate><title>On Preference Learning Based on Sequential Bayesian Optimization with Pairwise Comparison</title><author>Ignatenko, Tanya ; Kondrashov, Kirill ; Cox, Marco ; de Vries, Bert</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a672-b70ff9aed02a5f6f5d59f012a1ca2295443d3793387788689b804dbdd68e3a103</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Information Theory</topic><topic>Computer Science - Learning</topic><topic>Mathematics - Information 
Theory</topic><topic>Statistics - Machine Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Ignatenko, Tanya</creatorcontrib><creatorcontrib>Kondrashov, Kirill</creatorcontrib><creatorcontrib>Cox, Marco</creatorcontrib><creatorcontrib>de Vries, Bert</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv Mathematics</collection><collection>arXiv Statistics</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ignatenko, Tanya</au><au>Kondrashov, Kirill</au><au>Cox, Marco</au><au>de Vries, Bert</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>On Preference Learning Based on Sequential Bayesian Optimization with Pairwise Comparison</atitle><date>2021-03-24</date><risdate>2021</risdate><abstract>User preference learning is generally a hard problem. Individual preferences are typically unknown even to users themselves, while the space of choices is infinite. Here we study user preference learning from information-theoretic perspective. We model preference learning as a system with two interacting sub-systems, one representing a user with his/her preferences and another one representing an agent that has to learn these preferences. The user with his/her behaviour is modeled by a parametric preference function. To efficiently learn the preferences and reduce search space quickly, we propose the agent that interacts with the user to collect the most informative data for learning. The agent presents two proposals to the user for evaluation, and the user rates them based on his/her preference function. We show that the optimum agent strategy for data collection and preference learning is a result of maximin optimization of the normalized weighted Kullback-Leibler (KL) divergence between true and agent-assigned predictive user response distributions. 
The resulting value of KL-divergence, which we also call remaining system uncertainty (RSU), provides an efficient performance metric in the absence of the ground truth. This metric characterises how well the agent can predict user and, thus, the quality of the underlying learned user (preference) model. Our proposed agent comprises sequential mechanisms for user model inference and proposal generation. To infer the user model (preference function), Bayesian approximate inference is used in the agent. The data collection strategy is to generate proposals, responses to which help resolving uncertainty associated with prediction of the user responses the most. The efficiency of our approach is validated by numerical simulations. Also a real-life example of preference learning application is provided.</abstract><doi>10.48550/arxiv.2103.13192</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2103.13192
language	eng
recordid	cdi_arxiv_primary_2103_13192
source	arXiv.org
subjects	Computer Science - Artificial Intelligence ; Computer Science - Information Theory ; Computer Science - Learning ; Mathematics - Information Theory ; Statistics - Machine Learning
title	On Preference Learning Based on Sequential Bayesian Optimization with Pairwise Comparison
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T09%3A47%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20Preference%20Learning%20Based%20on%20Sequential%20Bayesian%20Optimization%20with%20Pairwise%20Comparison&rft.au=Ignatenko,%20Tanya&rft.date=2021-03-24&rft_id=info:doi/10.48550/arxiv.2103.13192&rft_dat=%3Carxiv_GOX%3E2103_13192%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true