Evals, benchmarking, and more
This is more of a general question for the entire community (developers, end users, curious individuals). How do you view evals and benchmarking? Are they really relevant to your decision to use a certain AI model? Are AI model releases (such as Llam…