![]() | I pulled the dataset from the paper and looked at broke out task time by if a model actually succeeded at completing or not, and here's what's happening:
Thought this was worth sharing. I've dug into this quite a bit more, but don't have time write it all out tonight. Happy to answer questions if anybody has them. [link] [comments] |