On the Limits of LLM Reasoning: Evidence From Contamination, Translation, and Answer Modification in Multiple-Choice Benchmarks
Jan 1, 2026
Eva Sánchez Salido
Julio Gonzalo
Guillermo Marco