Reward data determination method and device and server

Detailed description

Bibliographic details
Main authors:	LIANG ZHONGPING, ZHANG LIN
Format:	Patent
Language:	Chinese; English
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Description
Abstract:	The invention provides a method, a device, and a server for determining reward data. In one embodiment, the method first obtains the click state data of a first sample user for a current label, together with the current action strategy data that a preset questioning model determines from that click state data. It then calls a preset reward model, trained in advance, which uses the click state data of the first sample user for the current label and the current action strategy data to determine the reward data that is fed back to the preset questioning model for reinforcement learning. In this way, reward data for reinforcement learning can be obtained quickly and accurately.
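The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify data layouts or model architectures, so the encoding of the click state as a 0/1 list, the integer action, and every name below (`determine_reward`, `toy_questioning_model`, `toy_reward_model`) are hypothetical stand-ins chosen only to show the data flow.

```python
from typing import Callable, Sequence, Tuple

# Assumed encodings (not given in the abstract):
# click state = one 0/1 flag per label, action = index of a label to ask about.
ClickState = Sequence[int]
Action = int


def determine_reward(
    click_state: ClickState,
    questioning_model: Callable[[ClickState], Action],
    reward_model: Callable[[ClickState, Action], float],
) -> Tuple[Action, float]:
    """Run the two steps from the abstract and return (action, reward)."""
    # Step 1: the questioning model derives the current action strategy
    # from the sample user's click state.
    action = questioning_model(click_state)
    # Step 2: the pre-trained reward model scores the (state, action) pair;
    # this score is what would be fed back for reinforcement learning.
    reward = reward_model(click_state, action)
    return action, reward


def toy_questioning_model(click_state: ClickState) -> Action:
    """Hypothetical policy: pick the first label not yet clicked, else label 0."""
    for index, clicked in enumerate(click_state):
        if not clicked:
            return index
    return 0


def toy_reward_model(click_state: ClickState, action: Action) -> float:
    """Hypothetical reward: 1.0 for proposing an unclicked label, else 0.0."""
    return 0.0 if click_state[action] else 1.0


if __name__ == "__main__":
    print(determine_reward([1, 0, 1], toy_questioning_model, toy_reward_model))
```

In a real system both callables would be learned models, and the returned reward would drive an update of the questioning model. The sketch leaves that update out because the abstract does not describe it.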