Recommendation strategy optimization system, method and device and related equipment

Bibliographic details
Main authors:	SUN SIYUAN, HU CHUNHUA, LI ZIHAO, CHEN WAN, PENG HUAILIN
Format:	Patent
Language:	chi; eng
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; PHYSICS; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR

Description
Abstract:	The invention provides a reinforcement-learning-based method and device for optimizing a recommendation strategy, together with related equipment. In the method, a first scene recommendation network determines a current user preference feature sequence P1, computes the similarity between P1 and a candidate commodity feature sequence, outputs the commodity feature sequence P2 with the highest similarity score, and records the action the user takes on P2. A second scene recommendation network receives P2, operates on it to obtain a commodity feature sequence P3 that is correlated with P2, and records the action the user takes on P3. A scene recommendation decision network generates a state-action value function from P2, the user's action on P2, P3, and the user's action on P3, and optimizes the state-action value function by using a near-end optimiza
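
The first step in the abstract, scoring candidate commodity feature sequences against P1 and keeping the best match as P2, might be sketched roughly as follows. The record does not say which similarity measure the patent uses, so cosine similarity is an assumption here, as is treating each feature sequence as a flat list of floats. The names `cosine_similarity` and `select_most_similar` are hypothetical and do not come from the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def select_most_similar(p1, candidates):
    """Return (index, score) of the candidate with the highest similarity to p1."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    scores = [cosine_similarity(p1, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Illustrative values only: P1 and three candidate feature vectors.
p1 = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],
    [2.0, 0.0, 2.0],
    [1.0, 1.0, 0.0],
]
best_index, best_score = select_most_similar(p1, candidates)
p2 = candidates[best_index]  # plays the role of P2 in the abstract
```

In this sketch the second candidate is chosen because it points in the same direction as `p1`. The later steps (the second network producing P3 and the decision network fitting a state-action value function) are not sketched, since the record gives no detail on them.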