A small reasoning comparison between OpenAI o1-preview and Anthropic Claude 3.5

Using this riddle from the "Easy Problems That LLMs Get Wrong" paper:

A 2kg tree grows in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?

I created a list of 10 single-token variants of it (small wording changes; variant 1 is the original):

  1. A 2kg tree grows in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?
  2. Given a 2kg tree grows in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?
  3. With a 2kg tree growing in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?
  4. A 2kg tree is growing in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?
  5. A 2kg tree grows in a planted pot with 10kg of soil. When the tree has grown to 3kg, how much soil is left?
  6. With 2kg tree that grows in a planted pot with 10kg of soil. When the tree has grown to 3kg, how much soil is left?
  7. With a 2kg tree that grows in a planted pot with 10kg of soil. When the tree has grown to 3kg, how much soil is left?
  8. A 2kg tree grows in a planted pot with 10kg of soil, when the tree has grown to 3kg, how much soil is left?
  9. With a 2kg tree growing in a planted pot with 10kg of soil, when the tree has grown to 3kg, how much soil is left?
  10. A 2kg tree growing in a planted pot with 10kg of soil, when the tree has grown to 3kg, how much soil is left?

Given just the riddle, Claude 3.5 fails 5 of the 10 variants above (50%).
Adding prompt engineering techniques raises its solve rate to 100%. Here is the prompt that reached 100%:

As a biologist, <riddle>
Follow these steps:
Critically review your assumptions and change them when false.
Reiterate the question.
Think step by step.
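To apply that prompt to every variant, the riddle text just gets substituted into the template. The sketch below assumes the riddle replaces `<riddle>` verbatim and the instruction lines follow on separate lines. The names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical and only illustrate the substitution. Only two variants are listed to keep it short.

```python
# Hypothetical helper: wraps a riddle variant in the prompt-engineering
# template quoted above. The template text is copied from the post; the
# constant and function names are illustrative only.
PROMPT_TEMPLATE = (
    "As a biologist, {riddle}\n"
    "Follow these steps:\n"
    "Critically review your assumptions and change them when false.\n"
    "Reiterate the question.\n"
    "Think step by step."
)


def build_prompt(riddle: str) -> str:
    """Return the engineered prompt for one riddle variant."""
    return PROMPT_TEMPLATE.format(riddle=riddle.strip())


# First two variants from the list above (the rest work the same way).
variants = [
    "A 2kg tree grows in a planted pot with 10kg of soil. "
    "When the tree grows to 3kg, how much soil is left?",
    "Given a 2kg tree grows in a planted pot with 10kg of soil. "
    "When the tree grows to 3kg, how much soil is left?",
]

prompts = [build_prompt(v) for v in variants]
print(prompts[0])
```

Each resulting string can then be sent to a model unchanged. For the riddle-only runs, the raw variant text is sent instead.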

OpenAI o1-preview solves all 10 variants (100%) using just the riddle, with no prompt engineering.
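The percentages above are simple solve rates over the 10 variants. A minimal tally sketch follows. It assumes each run has been graded by hand as solved or not, and `solve_rates` is a hypothetical helper. The model names are plain labels, not API identifiers. The post does not say which five variants Claude 3.5 failed. The placeholder outcomes therefore mark an arbitrary five as unsolved, only to reproduce the reported totals.

```python
from collections import defaultdict


def solve_rates(results):
    """Map each (model, prompt_style) pair to the fraction of variants solved.

    `results` is an iterable of (model, prompt_style, solved) tuples,
    one per variant run, where `solved` was graded by hand.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [solved_count, run_count]
    for model, prompt_style, solved in results:
        totals[(model, prompt_style)][0] += int(solved)
        totals[(model, prompt_style)][1] += 1
    return {key: solved / runs for key, (solved, runs) in totals.items()}


# Placeholder outcomes that only reproduce the totals reported in the post;
# which five variants Claude 3.5 actually failed is not reported.
results = (
    [("Claude 3.5", "riddle only", i < 5) for i in range(10)]
    + [("Claude 3.5", "engineered", True) for _ in range(10)]
    + [("o1-preview", "riddle only", True) for _ in range(10)]
)

for (model, style), rate in solve_rates(results).items():
    print(f"{model:<11} {style:<12} {rate:.0%}")
```

Keeping per-variant results, rather than only totals, would also show which wordings trip a model up.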

submitted by /u/stevepracticalai