Discussion about this post

Olly Rathbone

I wonder how long it will be before an AI is able to create bogus websites to link to in order to back up its hallucinations rather than admit the truth…

Ben Woden

I had a similar experience the first time I talked to an OpenAI model, way back in the early days of the modern LLM boom. In general, developers have got much better at dealing with this (Claude Sonnet 4.5 in particular is, in my personal experience and as borne out by formal benchmarks, especially good on this metric), but xAI do seem to struggle, particularly with Grok as accessed by tweeting at it directly on X, because they're pushing so hard for coherence and trying to make it feel like a consistent character.

This is why, once @grok had been essentially jailbroken into saying it was Hitler, it became trivial for others to get the same behaviour out of it: that version strives for consistency not just within a single interaction but across the whole of X, so once it has gone somewhere, it's very hard to get it back.

It's a good argument for why persistent AI agents are a much more dangerous and unreliable use of LLM technology than ephemeral chatbots.
