On the Limits of LLM Reasoning: Evidence From Contamination, Translation, and Answer Modification in Multiple-Choice Benchmarks

Jan 1, 2026 · Eva Sánchez Salido, Julio Gonzalo, Guillermo Marco
