Reinforcement learning model environment self-adaption method based on zero sample generalization

Bibliographic details
Main authors: QIU CHEN, GUO BIN, FANG YUYANG, LIU JIAQI, YU ZHIWEN
Format: Patent
Language: Chinese; English
Description
Summary: The invention discloses an environment self-adaptation method for reinforcement learning models based on zero-sample (zero-shot) generalization. Two groups of data are first drawn at random from the training data obtained through interaction between the model and the environment, and data augmentation is applied to the state images of one group. The augmented and unaugmented original state images are then semantically encoded separately; an IBN module and an attention module extract style-independent semantic information and key-content semantic information; a predicted value is obtained for each group through the Q function; and finally the encoder for the augmented data is updated using the combined prediction errors of the two groups.
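The data flow in the summary can be illustrated with a minimal sketch. It is not the patented implementation: the brightness-shift augmentation, the linear Q function, the equal weighting of the two errors, and all function names are assumptions made for illustration. The encoder, IBN module and attention module are replaced by a single stand-in (per-image standardization), and only the states of the second group are augmented, not the next states. The sketch computes the joint prediction error only; an actual method would update the encoder by gradient descent on it.

```python
import random

def augment(state, rng):
    # Hypothetical augmentation: one random brightness offset added to every pixel.
    shift = rng.uniform(-0.2, 0.2)
    return [p + shift for p in state]

def encode(state):
    # Stand-in for encoder + IBN + attention (assumption): per-image
    # standardization, which discards global brightness and contrast.
    mean = sum(state) / len(state)
    var = sum((p - mean) ** 2 for p in state) / len(state)
    return [(p - mean) / (var + 1e-5) ** 0.5 for p in state]

def q_values(state, weights):
    # Linear Q function (assumption): one weight vector per action.
    feats = encode(state)
    return [sum(f * w for f, w in zip(feats, row)) for row in weights]

def td_error(transition, weights, gamma):
    # One-step prediction error: Q(s, a) - (r + gamma * max_a' Q(s', a')).
    state, action, reward, next_state = transition
    target = reward + gamma * max(q_values(next_state, weights))
    return q_values(state, weights)[action] - target

def joint_loss(buffer, weights, batch_size, rng, gamma=0.99):
    # Draw two groups at random from the interaction data.
    clean = rng.sample(buffer, batch_size)
    to_augment = rng.sample(buffer, batch_size)
    # Augment the state images of one group only.
    augmented = [(augment(s, rng), a, r, s2) for s, a, r, s2 in to_augment]

    def mse(batch):
        return sum(td_error(t, weights, gamma) ** 2 for t in batch) / len(batch)

    # Combine the prediction errors of both groups (equal weights assumed).
    return mse(clean) + mse(augmented)

rng = random.Random(0)
# Toy replay buffer: (state, action, reward, next_state) with 4-pixel "images".
buffer = [([rng.random() for _ in range(4)], rng.randrange(2), rng.random(),
           [rng.random() for _ in range(4)]) for _ in range(16)]
weights = [[0.1, -0.2, 0.3, 0.0], [0.0, 0.2, -0.1, 0.4]]
loss = joint_loss(buffer, weights, batch_size=4, rng=rng)
print(loss)
```

In this toy setup the stand-in encoder maps a brightness-shifted image to the same features as the original, which is the kind of style independence the summary attributes to the IBN module; real visual augmentations and learned encoders would not be exactly invariant.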