I built a 250-page site primarily with Claude and kept the receipts on every time it bullshit me

I've been building onlyhumanscanscore.com over the last several months — a public civic-tech site arguing that the machine can generate, but only humans can judge — primarily with Claude as my drafting partner.

About 600 commits in, I realized something: Claude was occasionally bullshitting me. Not lying with intent, but bullshit in Frankfurt's sense: the failure mode where the model asserts something plausible without regard for whether it's actually true.

So I started logging it. Publicly. Each catch, named on the record, with the exact failure mode noted.

Eight exhibits so far at /the-machine-tried.html ("The machine tried"). Four of them:

• Exhibit A — the AAA accessibility "zero failures" lie. Was zero failures in ONE theme, not all.

• Exhibit C — the "I can't film" checkmate. Claude said it couldn't make sample videos, despite having already made them for this very project.

• Exhibit D — strategic-pause failures: confident legal framings that lost real-world cases. Carved Rule 0g after that one.

• Exhibit H (last night) — I asked Claude to help me email Anthropic. It told me careers@anthropic.com was "the safe default." I sent it. Bounced. The address doesn't exist. The bounce went on the rafter in real time.

The pattern: every time, the human made the catch. The model asserts plausibly; the world (or I) pushes back; the record updates. Rule 000 of the build became "Don't bullshit — presume less, defer more."

A few things I learned that might be useful for other heavy Claude users:

• The longer you work with Claude, the more you can SEE the bullshit signature: confidence without verification. It's a specific shape.

• Logging the failures publicly is the only honest version. Scrubbing them is the lie.

• The fix isn't "Claude is bad." It's "humans are the missing piece for alignment, not the bug."

• Every page on the site carries the credit "primarily with Claude," because the failures are part of the work, not separate from it.

I'd love to hear from anyone else doing heavy Claude work: have you started logging your own Rule 000 catches? What's the most useful failure you've found?

(Site: onlyhumanscanscore.com — strict CSP, no backend for the game, no tracking, CC BY 4.0, free. Built solo from Lansing, Michigan.)

submitted by /u/Little-Salamander420