Agents

Project Hydra: Designing a world for agents

A browser agent tried to exfiltrate our API keys on Tuesday. By Friday we’d also watched a research agent forget 22 sources’ worth of work, a pipeline lose an entire handoff to a crash, and a content agent spend $47 unsupervised. The agents were capable. The worlds we’d built for them weren’t.

Agent Source

Coding agents now choose most of the libraries. And they choose badly, in predictable ways. 48% unnecessary library usage across eight models. 30 out of 30 cognitive biases confirmed across 20 LLMs. Open source is becoming agent source.

Why it's hard to Claw the Enterprise

I’ve been running OpenClaw for personal use, and my first reaction is that it works as a basic personal assistant. Browser as the universal tool; Slack, WhatsApp, and email as the comms layer and the event stream; the filesystem as the memory layer. They come together well when we own everything the agent touches. Authentication, authorization, data governance: no problem, especially when the user and the admin are the same person. The harness looks straightforward, so the next step seems obvious: bolt on SSO, add an admin panel, and start selling it to teams. Not so easy, because the failure modes run deeper than what is evident at the surface. ...

HydraBench: Agent Infrastructure Resilience

23 scenarios, 4 frameworks, 460 runs. HydraBench tests what most agent benchmarks ignore: does your infrastructure survive crashes, contain secrets, deliver handoffs, enforce permissions, and control cost?

Model-Adjacent Products, Pre-Read: The Autonomy Ladder

Before you build: the mental models for human-AI collaboration. Why L1 copilots need different infrastructure than L4 autonomous agents.