New Thematic Generalization Benchmark: measures how effectively LLMs infer a specific "theme" from a small set of examples and anti-examples
New Thematic Generalization Benchmark: measures how effectively LLMs infer a specific "theme" from a small set of examples and anti-examples