Those Aren't Your Memories, They're Somebody Else's: Seeding Misinformation in Chat Bot Memories
One of the new developments in chit-chat bots is a long-term memory mechanism that remembers information from past conversations for increasing engagement and consistency of responses. The bot is designed to extract knowledge of personal nature from their conversation partner, e.g., stating preferen...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | One of the new developments in chit-chat bots is a long-term memory mechanism
that remembers information from past conversations for increasing engagement
and consistency of responses. The bot is designed to extract knowledge of
personal nature from their conversation partner, e.g., stating preference for a
particular color. In this paper, we show that this memory mechanism can result
in unintended behavior. In particular, we found that one can combine a personal
statement with an informative statement that would lead the bot to remember the
informative statement alongside personal knowledge in its long term memory.
This means that the bot can be tricked into remembering misinformation which it
would regurgitate as statements of fact when recalling information relevant to
the topic of conversation. We demonstrate this vulnerability on the BlenderBot
2 framework implemented on the ParlAI platform and provide examples on the more
recent and significantly larger BlenderBot 3 model. We generate 150 examples of
misinformation, of which 114 (76%) were remembered by BlenderBot 2 when
combined with a personal statement. We further assessed the risk of this
misinformation being recalled after intervening innocuous conversation and in
response to multiple questions relevant to the injected memory. Our evaluation
was performed on both the memory-only and the combination of memory and
internet search modes of BlenderBot 2. From the combinations of these
variables, we generated 12,890 conversations and analyzed recalled
misinformation in the responses. We found that when the chat bot is questioned
on the misinformation topic, it was 328% more likely to respond with the
misinformation as fact when the misinformation was in the long-term memory. |
---|---|
DOI: | 10.48550/arxiv.2304.05371 |