AIs can do world-modeling now, as seen via the Anthropic Fable standoff

Many people are speculating about how the Anthropic saga with Fable will end. Prediction markets cover it too, giving a less than 50% chance of a re-release by July 1.

This post isn't about my conclusion. Instead I want to share how AI can be used for world-modeling such situations, and gesture towards what the world will look like when autonomous AI systems get better at this than humans.

I see three challenges with modeling the Anthropic situation:

  • I can't rule out any of 4 different versions of what happened to cause the June 12 order in the first place.
  • There are many outcomes to forecast, from who gets access and when, to what new policies are enacted, to how Anthropic might change Fable.
  • There are informational updates almost every day, each requiring a re-evaluation of almost everything.

Claude generated the image here of the causal graph that models this all out, starting with (a) Scenarios for what happened so far, (b) Moves each side can make, and (c) Outcomes.

(I did this part mostly by hand, choosing the key scenarios and outcomes myself, but in the future it shouldn't be too hard for an LLM-agent system to do it.)

I ended up with a large combination of unconditional and conditional forecasting questions: 33 in total that I consider critical to answer. Then I had to forecast.
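As a rough illustration of how conditional and unconditional questions fit together, here is a minimal sketch that combines scenario probabilities with conditional forecasts using the law of total probability. The scenario labels A1 to A4 follow the naming in the forecast excerpt, but every number and the function name `marginal` are hypothetical placeholders, not the forecasts or tooling from this post.

```python
# Hypothetical numbers for illustration only; NOT the forecasts from this post.
# P(scenario): how likely each version of "what happened" is.
scenario_prior = {"A1": 0.40, "A2": 0.30, "A3": 0.20, "A4": 0.10}

# P(outcome | scenario): one conditional forecast per scenario.
p_rerelease_given = {"A1": 0.60, "A2": 0.50, "A3": 0.20, "A4": 0.10}


def marginal(prior, conditional):
    """Law of total probability: P(O) = sum over s of P(s) * P(O | s)."""
    if abs(sum(prior.values()) - 1.0) > 1e-9:
        raise ValueError("scenario priors must sum to 1")
    return sum(prior[s] * conditional[s] for s in prior)


p_rerelease = marginal(scenario_prior, p_rerelease_given)
print(f"Implied P(re-release): {p_rerelease:.2f}")
```

With these placeholder inputs the implied unconditional probability comes out to about 0.44. The point is only that each unconditional outcome depends on both the scenario weights and the per-scenario conditionals, so a daily update to either one changes the answer.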

LLM agents can shine here, since AI forecasters are now about as good as human crowds (see e.g. ForecastBench). And in any case, 33 forecasts at the quality of human crowds would take humans 100+ hours, which isn't an option for a fast-moving situation. I used FutureSearch for all of these. The forecasts have reasoning like:

Conditional on the assumption that the security rationale is substantially pretextual and the but-for driver is White House political leverage tied to the Department of War feud and Anthropic's impending IPO (Scenario A3), this dispute must be analyzed as a power negotiation rather than a technical remediation problem...

These are already very good forecasts, and will only get better.

The final step was to reconcile everything. The research behind each forecast was done independently by LLM agents, so the forecasts were not consistent with each other. I handled this by raising all the inconsistencies in Claude Code and addressing them manually, but again you can imagine a world-model-reconciliation module in which a new set of LLM agents fixes up all the inconsistencies.
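One simple kind of inconsistency such a reconciliation step could look for is a directly forecast probability that disagrees with the value implied by the scenario weights and conditional forecasts. The sketch below is an assumption about what a check like that might look like, not the process actually used here; the question names, the numbers, the 0.05 tolerance, and `find_inconsistencies` are all hypothetical.

```python
# Hypothetical data for illustration only; NOT the forecasts from this post.
scenario_prior = {"A1": 0.40, "A2": 0.30, "A3": 0.20, "A4": 0.10}

# Conditional forecasts P(question | scenario), produced independently.
conditionals = {
    "rerelease": {"A1": 0.60, "A2": 0.50, "A3": 0.20, "A4": 0.10},
    "new_policy": {"A1": 0.50, "A2": 0.50, "A3": 0.50, "A4": 0.50},
}

# Direct (unconditional) forecasts for the same questions.
direct = {"rerelease": 0.60, "new_policy": 0.52}


def find_inconsistencies(prior, conditionals, direct, tol=0.05):
    """Flag questions whose direct forecast differs from the implied marginal."""
    issues = []
    for question, cond in conditionals.items():
        implied = sum(prior[s] * cond[s] for s in prior)
        if question in direct and abs(implied - direct[question]) > tol:
            issues.append((question, round(implied, 3), direct[question]))
    return issues


issues = find_inconsistencies(scenario_prior, conditionals, direct)
for question, implied, stated in issues:
    print(f"{question}: implied {implied} vs direct {stated}")
```

With these placeholder inputs only `rerelease` is flagged (implied 0.44 against a direct 0.60). A flagged question still needs a human or another agent to decide which forecast to revise; the check only surfaces the disagreement.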

More detail on the process, and all the results, are in https://www.lesswrong.com/posts/zhRe3tdBpsZbGCdDK/world-modeling-the-us-vs-anthropic-standoff-on-claude-fable

https://preview.redd.it/4kpdghqmen8h1.png?width=1600&format=png&auto=webp&s=e2736b822a4c0117567a5821ac049aa542b8bb32

submitted by /u/ddp26