A reward for ‘I don’t know’?

So I know this has been an issue for a long time: LLMs give incorrect answers and we have to work out whether they’re true or not. A big issue, obviously.

This is such a simple question, and I don’t know whether there’s a reason it can’t work, but can we get around this by giving a reward for saying ‘I don’t know’?

I’ve had a few different jobs where you need to give accurate information, and where the consequences of giving wrong information vary in severity. I recall a training moment where a new guy was asked a question in front of everyone. He didn’t know the answer but took a guess. His boss went ape shit on him, because giving wrong information can kill your credibility and, even worse, can get someone hurt or killed.

So when you don’t know, you’re encouraged to say so: ‘I don’t know, but I’ll research that and get back to you.’

Children will often guess or lie because they don’t want to be shamed for not having an answer.

Is this the same thing that is happening with LLMs? Are companies basically telling them that they have to give an answer even when they aren’t sure? Because I personally would rather get an answer of ‘I don’t know’ than something that’s false.

I’m guessing it’s not that easy, but I’m wondering if this has been tried. I’m guessing someone could say, ‘Well, if the reward is the same as for a good answer, then why won’t they just say I don’t know all the time?’ Well, maybe you could give 2 rewards for correct answers and 1 for ‘I don’t know’ answers? And there could perhaps be a secondary fact-checking mechanism that runs after the interaction and removes rewards for incorrect information.
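To make the 2-versus-1 idea concrete, here is a rough Python sketch of what that scheme would push a model to do. It assumes the model has some internal probability of being right and simply picks whichever action has the higher expected reward. The function names and the `r_wrong` parameter (the reward left over after the fact checker catches a wrong answer) are my own inventions for illustration, not anything a real training setup is known to use.

```python
def expected_answer_reward(p_correct, r_correct=2.0, r_wrong=0.0):
    """Average reward for answering when the answer is right with probability p_correct."""
    return p_correct * r_correct + (1.0 - p_correct) * r_wrong


def best_action(p_correct, r_correct=2.0, r_idk=1.0, r_wrong=0.0):
    """Pick 'answer' or 'abstain' (say 'I don't know'), whichever pays more on average."""
    answer_value = expected_answer_reward(p_correct, r_correct, r_wrong)
    return "answer" if answer_value > r_idk else "abstain"


def break_even_confidence(r_correct=2.0, r_idk=1.0, r_wrong=0.0):
    """Confidence at which answering and abstaining pay the same on average."""
    return (r_idk - r_wrong) / (r_correct - r_wrong)


if __name__ == "__main__":
    # 2 for correct, 1 for 'I don't know', wrong answers lose their reward (0).
    print(break_even_confidence())
    # Hypothetical harsher variant: wrong answers cost 2 instead of just earning 0.
    print(break_even_confidence(r_wrong=-2.0))
    for p in (0.3, 0.7):
        print(p, best_action(p), best_action(p, r_wrong=-2.0))
```

Under these assumptions, the plain 2/1/0 scheme makes guessing worthwhile whenever the model is more than 50% sure, and making wrong answers cost something raises that bar. So the size of the penalty for wrong answers is what sets how cautious the model is encouraged to be.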

And knowing that this is running might cause the LLMs to hesitate before giving answers that they aren’t certain of. Maybe they could get more rewards when they give answers that have qualifiers such as ‘this is estimated to be 70% accurate’?
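One way the ‘70% accurate’ qualifier could be rewarded is with a scoring rule that pays out based on the stated confidence and whether the answer turned out to be right. The sketch below uses a Brier-style score, which is a standard tool from forecasting that I am swapping in here as an assumption; nothing in this post says LLM training actually uses it. The function names are hypothetical.

```python
def brier_reward(stated_confidence, was_correct):
    """Reward in [0, 1]: 1 minus the squared gap between stated confidence and outcome."""
    outcome = 1.0 if was_correct else 0.0
    return 1.0 - (stated_confidence - outcome) ** 2


def expected_brier_reward(stated_confidence, true_probability):
    """Average reward if the answer is really right with probability true_probability."""
    return (true_probability * brier_reward(stated_confidence, True)
            + (1.0 - true_probability) * brier_reward(stated_confidence, False))


if __name__ == "__main__":
    # Suppose the model is really right 70% of the time on this kind of question.
    candidates = [i / 10 for i in range(11)]
    best = max(candidates, key=lambda c: expected_brier_reward(c, 0.7))
    print(best)
```

With this kind of rule, claiming 100% and being wrong earns nothing, and the best average reward comes from stating the confidence the model really has, so there is no payoff for bluffing or for hedging more than necessary.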

Just a thought and wondering if this has been attempted and maybe why it has or hasn’t worked.

Thanks!

submitted by /u/endrid