Using this riddle from the "Easy Problems That LLMs Get Wrong" paper:
A 2kg tree grows in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?
I created a list of 10 single-token variants:
- A 2kg tree grows in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?
- Given a 2kg tree grows in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?
- With a 2kg tree growing in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?
- A 2kg tree is growing in a planted pot with 10kg of soil. When the tree grows to 3kg, how much soil is left?
- A 2kg tree grows in a planted pot with 10kg of soil. When the tree has grown to 3kg, how much soil is left?
- With 2kg tree that grows in a planted pot with 10kg of soil. When the tree has grown to 3kg, how much soil is left?
- With a 2kg tree that grows in a planted pot with 10kg of soil. When the tree has grown to 3kg, how much soil is left?
- A 2kg tree grows in a planted pot with 10kg of soil, when the tree has grown to 3kg, how much soil is left?
- With a 2kg tree growing in a planted pot with 10kg of soil, when the tree has grown to 3kg, how much soil is left?
- A 2kg tree growing in a planted pot with 10kg of soil, when the tree has grown to 3kg, how much soil is left?
Claude 3.5 fails on 50% of the variants above when given just the riddle.
The solve rate rises from 50% to 100% as you add prompt engineering techniques. Here is the prompt that reaches 100%:
As a biologist, <riddle>
Follow these steps:
Critically review your assumptions and change them when false.
Reiterate the question.
Think step by step.
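For anyone who wants to reproduce this, the prompt above can be wrapped around each variant with a few lines of Python. This is only a sketch: the names `PROMPT_TEMPLATE` and `build_prompt` are made up for illustration, and the line breaks between the steps are an assumption about how the prompt was laid out.

```python
# Hypothetical helper: wraps one riddle variant in the prompt from the post.
# The newline layout between the steps is an assumption.
PROMPT_TEMPLATE = (
    "As a biologist, {riddle}\n"
    "Follow these steps:\n"
    "Critically review your assumptions and change them when false.\n"
    "Reiterate the question.\n"
    "Think step by step."
)


def build_prompt(riddle: str) -> str:
    """Return the full prompt with <riddle> replaced by the given variant."""
    return PROMPT_TEMPLATE.format(riddle=riddle)


if __name__ == "__main__":
    variant = (
        "A 2kg tree grows in a planted pot with 10kg of soil. "
        "When the tree grows to 3kg, how much soil is left?"
    )
    print(build_prompt(variant))
```

Keeping the riddle as a parameter makes it easy to run the same prompt over all 10 variants.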
OpenAI o1-preview solves 100% using just the riddle with no prompt engineering.
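The percentages above are just the fraction of variants answered correctly. A minimal sketch of how that could be computed is below. Both `ask_model` (sends a prompt to whichever model you are testing and returns its reply) and `is_correct` (decides whether a reply counts as solved) are hypothetical callables you would supply yourself; nothing here is tied to a specific API or to how the answers in this post were actually graded.

```python
from typing import Callable, Iterable


def solve_rate(
    variants: Iterable[str],
    ask_model: Callable[[str], str],    # hypothetical: prompt in, model reply out
    is_correct: Callable[[str], bool],  # hypothetical: grades a single reply
) -> float:
    """Return the fraction of riddle variants the model answers correctly."""
    variants = list(variants)
    if not variants:
        raise ValueError("need at least one variant")
    # Ask the model each variant once and count the replies graded as correct.
    solved = sum(1 for v in variants if is_correct(ask_model(v)))
    return solved / len(variants)
```

With 10 variants, 5 correct replies gives 0.5 and 10 correct replies gives 1.0. A single run per variant is noisy, so repeating each variant several times would give a more stable number.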