I wonder how long it will be before an AI is able to create bogus websites to link to to back up its hallucinations rather than admit the truth…
Great point!
I had a similar experience the first time I talked to an OpenAI model, way back in the early days of the modern LLM boom. In general, developers have got much better at dealing with this (Claude Sonnet 4.5 in particular is, in my personal experience and as borne out by formal benchmarks, particularly good on this metric), but xAI do seem to struggle with it, especially with Grok as accessed by tweeting at it directly on X, because they're pushing so hard for coherence and trying to make it feel like a consistent character.
This is why, once @grok had been essentially jailbroken into saying it was Hitler, it became trivial for others to get the same behaviour out of it, because that version strives for consistency not just within a single interaction but across the whole of X, so once it's gone somewhere, it's very hard to get it back.
It's a good argument for why persistent AI agents are a much more dangerous and unreliable use of LLM technology than ephemeral chatbots.
Thanks! That's very interesting.
This is wholly speculative, but I think something in the way Grok has been put through RLHF might make this more likely. Grok has a strong tendency to make authoritative statements, 'in character' as a super-powerful computer: it uses expressions like "searches indicate X" or "video appears consistent with Y", as if it were a science-fiction computer rather than an LLM. I don't know this, but I'd guess this habit means it's more likely to talk as if it has actually done the research it has in fact only pretended to do. It's deeply weird to encounter, because it's a failure mode human beings don't really exhibit.
Yeah, that really accords with Ben Women's thoughts upthread.
I have a friend who likes the same sort of music as you (I just followed your link), and while I don't believe liking Mozart or the Beatles needs much of an explanation, liking Merchant Ships perhaps does (in the same way liking chocolate doesn't need an explanation but liking durian does). Do you have one?
Catharsis, verging into nostalgia the older one becomes. I'm a lot likelier to be listening to Songs: Ohia or Nick Cave now, but I'll still stick on "For Cameron" if I have a very "23-year-old" moment.