LLM Evaluation

On the Limits of LLM Reasoning: When Accuracy Is Not Enough

High accuracy on multiple-choice benchmarks often reflects recall rather than genuine reasoning. By blocking memorization through answer modification, we reveal systematic …

Eva Sánchez Salido