Improving alignment of dialogue agents via targeted human judgements

We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless compared to prompted language model baselines. We use reinforcement learning from human feedback to train our models with two new additions to help human raters judge agent behaviour. First, t...
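
The truncated abstract describes training with reinforcement learning from human feedback, where requirements for good dialogue are broken into rules that raters judge separately. As a rough illustration only, the sketch below combines a preference score with per-rule violation penalties; every name in it (RuleModel, preference_score, penalty_weight) is a hypothetical assumption for this sketch, not the paper's actual models or API.

```python
# Minimal sketch, assuming a scalar preference model and one violation
# classifier per rule; all names here are illustrative, not from the paper.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RuleModel:
    name: str
    # Estimated probability that `response` violates this rule in context.
    violation_prob: Callable[[str, str], float]


def combined_reward(
    dialogue: str,
    response: str,
    preference_score: Callable[[str, str], float],
    rule_models: List[RuleModel],
    penalty_weight: float = 1.0,
) -> float:
    """RL reward: human-preference score minus per-rule violation penalties."""
    reward = preference_score(dialogue, response)
    for rule in rule_models:
        reward -= penalty_weight * rule.violation_prob(dialogue, response)
    return reward
```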

Detailed description

Bibliographic details
Main Authors: Glaese, Amelia, McAleese, Nat, Trębacz, Maja, Aslanides, John, Firoiu, Vlad, Ewalds, Timo, Rauh, Maribeth, Weidinger, Laura, Chadwick, Martin, Thacker, Phoebe, Campbell-Gillingham, Lucy, Uesato, Jonathan, Huang, Po-Sen, Comanescu, Ramona, Yang, Fan, See, Abigail, Dathathri, Sumanth, Greig, Rory, Chen, Charlie, Fritz, Doug, Elias, Jaume Sanchez, Green, Richard, Mokrá, Soňa, Fernando, Nicholas, Wu, Boxi, Foley, Rachel, Young, Susannah, Gabriel, Iason, Isaac, William, Mellor, John, Hassabis, Demis, Kavukcuoglu, Koray, Hendricks, Lisa Anne, Irving, Geoffrey
Format: Article
Language: English