Agent infrastructure resilience benchmark. 23 scenarios, 5 runs each, 4 frameworks.
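Scores range from 0 to 100. A score of 0 means the framework lacks the capability entirely. Statistical method: results are reported as mean +/- std dev over the 5 runs of each scenario. Pairwise framework comparisons use the Wilcoxon signed-rank test, with p < 0.05 as the significance threshold.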
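
The statistical method above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the benchmark's actual harness. Several things here are assumptions: the function names `summarize` and `wilcoxon_signed_rank` are hypothetical, the std dev is taken as the sample (n-1) standard deviation, the pairs fed to the test are per-scenario mean scores of two frameworks, zero differences are dropped, and the p-value is the exact two-sided one. The scores in the usage lines are made up for illustration.

```python
from statistics import mean, stdev


def summarize(runs):
    """Return (mean, sample std dev) for the runs of one scenario."""
    return mean(runs), stdev(runs)


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test on paired samples.

    Assumptions: zero differences are dropped, tied absolute
    differences share their average rank. Returns (W, p) with
    W = min(W+, W-).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank |d| in ascending order. Ranks are stored doubled so that
    # average ranks for ties (e.g. 1.5) stay integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        # 0-based positions i..j hold ranks i+1..j+1; doubled average is i+j+2.
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2
        i = j + 1

    total2 = sum(ranks2)
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    w2 = min(w_plus2, total2 - w_plus2)

    # counts[s] = number of sign assignments whose doubled W+ equals s.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]

    # The null distribution is symmetric, so double the lower tail.
    tail = sum(counts[: w2 + 1])
    p = min(1.0, 2 * tail / 2 ** n)
    return w2 / 2, p


# Hypothetical data: 5 runs of one scenario for one framework.
m, s = summarize([80, 82, 78, 81, 79])
print(f"{m:.1f} +/- {s:.1f}")

# Hypothetical per-scenario means for two frameworks (6 scenarios shown).
framework_a = [71.0, 64.5, 88.0, 52.0, 90.5, 77.0]
framework_b = [65.0, 60.0, 80.5, 53.0, 79.0, 70.0]
w, p = wilcoxon_signed_rank(framework_a, framework_b)
print(f"W = {w}, p = {p:.4f}, significant = {p < 0.05}")
```

With all 23 scenarios the pairing works the same way, one pair of means per scenario. If SciPy is available, `scipy.stats.wilcoxon` provides the same test and would be the more usual choice than a hand-written version.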