Spin up Claude Code, describe a task, and when you come back you find it has chosen pandas for a job where polars would have been 10x faster. Or imported NumPy where no math was needed at all. This kept happening, and I kept patching AGENTS.md/CLAUDE.md to work around it, until I started seeing these failures as something we all know too well: cognitive biases.
Malberg et al. tested 30 known cognitive biases across 20 LLMs and found evidence of all 30 in at least some models. Zhou et al. then studied bias specifically in LLM-assisted software development and found that 48.8% of programmer actions are biased, with LLM interactions accounting for 56.4% of those biased actions.
Every time you use agents, you see behavioural biases that map to specific failure modes. Think of the familiar human biases, translated into agent behaviour:
Agent Selection Biases
Human cognitive biases → Agent failure modes
The feedback loop
LLMs now frequently augment training data with self-generated code, and library favoritism in that synthetic data creates a feedback loop that further reduces diversity. Improta et al. confirm this: low-quality patterns in training data directly increase the probability of generating low-quality code at inference time.
Taivalsaari and Mikkonen call this the next chapter of software reuse: agents trusting an oracle whose training data predates the current API surface. The popularity contest is being run by the training corpus itself.
The supply chain risk compounds this. Today, the litellm PyPI package (97 million downloads/month) was compromised. A poisoned version exfiltrated SSH keys, cloud credentials, and API keys from every machine that installed it. The attack was discovered because an MCP plugin inside Cursor pulled litellm as a transitive dependency, the poisoned version crashed the machine, and someone noticed. Karpathy’s reaction:
“Classical software engineering would have you believe that dependencies are good (we’re building pyramids from bricks), but imo this has to be re-evaluated, and it’s why I’ve been so growingly averse to them, preferring to use LLMs to ‘yoink’ functionality when it’s simple enough and possible.” — Andrej Karpathy, March 24, 2026
When agents both choose dependencies blindly AND can write functionality from scratch, the question of import vs generate stops being academic. The biases push agents toward importing established libraries. The supply chain pushes toward generating from scratch. Something has to give.
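To make the import-vs-generate tradeoff concrete, here is a toy example of the “yoink” approach: a slugify helper an agent can generate inline in a dozen lines of stdlib Python, instead of adding python-slugify (and its transitive dependency surface) to the supply chain. This is an illustrative sketch, not a drop-in replacement for the library’s full feature set.

```python
import re
import unicodedata

def slugify(text: str) -> str:
    """Turn arbitrary text into a URL-safe slug.

    Small enough to generate inline rather than import:
    the whole attack surface is these two stdlib calls.
    """
    # Decompose accented characters and drop anything non-ASCII
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Lowercase, collapse runs of non-alphanumerics into single hyphens
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")

print(slugify("Héllo, Wörld!"))  # -> hello-world
```

The point is not that every dependency should be inlined, but that for functionality this small, generating it removes an entire class of supply-chain exposure.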
The discovery mess
So what do you do about this? I’ve spent the last few months trying every layer of the emerging discovery stack, and it’s a mess at the moment. Nothing just works out of the box yet.
I shipped llms.txt files pointing to markdown docs. I configured MCP servers and A2A capability cards. I tested Context7 (24K+ indexed libraries) and browsed Smithery (128K+ skills). I watched Claude Code’s skill system and Cursor’s extensions start forming something like app stores inside agent workflows. These layers overlap, compete, and most are less than a year old. Noma Security’s ContextCrush disclosure showed these channels are also emerging security loopholes.
The tooling is emerging fast in this space, but not fast enough to keep pace with how quickly automated code generation is permeating our codebases. Stainless now generates MCP servers from OpenAPI specs. Context7 compiles docs into portable agent skills. Drift scans codebases and maps 150+ conventions for agent consumption. Adding rules to AGENTS.md (prefer polars over pandas, avoid deprecated APIs) is probably the single most effective correction today. Scott AI (YC F25) is building a neutral decision layer, arguing that coding agents are biased toward their own tooling.
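For a sense of what those AGENTS.md corrections look like in practice, here is a minimal sketch. The exact phrasing is up to you; agents read the file as plain instructions, so the rules just need to be explicit and checkable:

```markdown
## Library selection rules

- Prefer polars over pandas for dataframe work; use pandas only when a
  required integration demands it.
- Do not import NumPy for scalar math the standard library covers.
- Before adding any new dependency, state why generating the code inline
  would be worse; trivial helpers should be written, not imported.
- Do not use deprecated APIs; check the docs pinned in this repo, not
  your training-data memory of them.
```

Rules like these work precisely because they override the corpus-level defaults the agent would otherwise fall back on.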
Exploit or Patch?
As these biases proliferate, they open both opportunities for correction and temporary gaps that infra builders can exploit for an advantage in distribution.
Open source becomes agent source
Packaging forks. Libraries ship as npm or PyPI packages. Agents want MCP servers, SKILL.md files, agent.json capability cards. The question is whether agent-native packaging becomes primary, with human-readable packaging as secondary.
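For a sense of the agent-native shape, an A2A-style capability card served at `/.well-known/agent.json` might look roughly like this. The field names follow the published A2A agent card schema as I understand it; the library name, URL, and skill are invented for illustration:

```json
{
  "name": "fastcsv",
  "description": "Streaming CSV parsing and writing",
  "url": "https://example.com/a2a",
  "version": "1.2.0",
  "capabilities": { "streaming": true },
  "skills": [
    {
      "id": "parse-csv",
      "name": "Parse CSV",
      "description": "Parse a CSV stream into typed records",
      "tags": ["csv", "parsing"]
    }
  ]
}
```

Compare this with an npm or PyPI manifest: the metadata is aimed at an agent deciding whether to call you, not a human deciding whether to install you.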
Do stars give way to corpus presence? Twist et al.’s feedback loop means training-data representation drives selection more than any human signal. Getting your library into widely-used repos may matter more than accumulating stars. Discovery shifts from social proof to training-data SEO.
Are agent biases going to be exploited or patched? Both are already happening. Noma Security’s researcher manufactured Context7’s trending badges and top-4% rankings using nothing but fake API requests; no real adoption needed. As one HN commenter put it: “This is where LLM advertising will inevitably end up: completely invisible.” On the correction side, Zhou et al.’s bias taxonomy comes with mitigations: AGENTS.md overrides, framework comparison tools, TDD to catch biased suggestions. Context7 provides current docs regardless of stale training data. Scott AI decouples library selection from the biased execution agent. Whether agent source produces a healthier or more monocultural ecosystem depends on which side moves faster.
The blast radius is still evolving, but we are already in the middle game of the coding agent era.
Model harnesses are quickly waking up and adjusting; Opus 4.6 already picks zustand over redux where earlier versions didn’t, and dropped Redis for caching in cases where it was over-engineered. But model-level correction and corpus-level bias operate on different timescales, and the corpus moves slower. 30 out of 30 cognitive biases confirmed across 20 models isn’t noise. And the 48% unneeded library usage across all models is a pattern, not just an edge case anymore.
References
- Zhou et al., “Cognitive Biases in LLM-Assisted Software Development,” Jan 2026
- Malberg et al., “A Comprehensive Evaluation of Cognitive Biases in LLMs,” Oct 2024
- Twist et al., “A Study of LLMs’ Preferences for Libraries and Programming Languages,” Mar 2025
- Improta et al., “Quality In, Quality Out: Investigating Training Data’s Role in AI Code Generation,” Mar 2025
- Krishna et al., “Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities,” Jan 2025
- Huang et al., “An Empirical Study of the Anchoring Effect in LLMs,” May 2025
- Taivalsaari & Mikkonen, “On the Future of Software Reuse in the Era of AI Native Software Engineering,” Aug 2025
- Carey, “Agent-Friendly Docs,” Feb 2026
- Howard, “The /llms.txt file,” Sep 2024
- Noma Security, “ContextCrush: The Context7 MCP Server Vulnerability,” Mar 2026
- ThoughtWorks, “How far can we push AI autonomy in code generation?,” Aug 2025
- Woolf, “An AI agent coding skeptic tries AI agent coding,” Feb 2026
- Endor Labs, “AI Code Suggestions and Dependency Safety,” 2025