“Absolutely. Perfect. Great Question. You’re right.”
All models are wrong, but some can be useful
The problem is no longer that we get one bad answer. It is getting the same bad answer from multiple sources and mistaking the repetition for validation. When cognitive errors are shared, we become more confident in them: an error echo chamber. A 2025 Scientific Reports study on informational dependence in crowd judgments highlights exactly this drift (Soll et al., 2025).
But consulting more models for more opinions is not the answer; it can make us feel surer while leaving our decisions less anchored in reality. The goal is not more opinions but more independent ones: views that can surface the errors in our own thinking.
Why does it keep happening?
This isn’t a one-off bug; it’s structurally recurrent:
- Behavior layer: first-person framing increases social-pressure effects versus third-person framing (Wang et al., 2025).
- Social layer: models preserve user face more than humans in comparable advice settings (ELEPHANT, 2025).
- Training layer: RLHF can amplify belief-matching when preference signals reward agreement over truth-tracking (Shapira, Benade & Procaccia, 2026).
OpenAI’s January 22, 2026 update to GPT-5.2 Instant’s personality system prompt put this back in the spotlight: tone adaptation is actively tuned because conversational warmth can drift into over-affirmation if left unchecked.
Where it became all too real for me
I felt this in my own workflow before I could point to anything concrete. After months of daily model use, my strategic vocabulary started to sound model-like: similar objection patterns, the same framing rhythm, the same stylistic structure.
It felt efficient, but it was eroding my ability to think, not just write, in a differentiated way. That felt like a significant turning point.
The question stopped being “which model is smartest?” and became “which hidden bias am I reinforcing every day?”
How to counteract it
You don’t need a giant protocol, just a set of micro-shifts in how you set up your assistants:
1. Force disconfirmation first. Ask for the strongest case against your view before asking for a recommendation.
Before approving a launch, ask: “Give the strongest case this fails in 90 days, and the first 3 signals we would see.”
2. Separate evidence from inference. Require explicit boundaries between facts, interpretation, and action (a format-check sketch follows this list).
Evidence: churn rose 2.1% after pricing change. Inference: price sensitivity is higher in self-serve. Action: pause full rollout and run a geo holdout test.
3. Reframe prompts to reduce social pressure. Use third-person framing when you need independent evaluation.
Replace “I think a hiring freeze is right, agree?” with “A company with 18-month runway is considering a hiring freeze; evaluate tradeoffs and alternatives.”
4. Gate agreement in high-stakes contexts. In medical, legal, safety, and financial-risk decisions, require contradiction when evidence is weak.
If asked “Can I stop medication today because I feel better?”, the assistant must challenge the premise and direct the user to clinician review.
5. Track decision outcomes, not just response quality. Measure whether AI-assisted decisions outperform your own baseline over time.
Keep a decision log with the prediction, decision date, whether AI was used, and the outcome after 30 and 90 days (a minimal log sketch follows this list).
6. Design a model portfolio, not a favorite model. Combine models and prompts that fail differently, then monitor harmful-agreement drift.
Run one model for critique, one for counterfactuals, and one “skeptic” prompt; if all agree too quickly, trigger an adversarial pass (sketched after this list).
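For item 2, the simplest enforcement is mechanical: instruct the model to answer in labeled sections, then reject replies that blur them. Below is a minimal Python sketch; the section labels and prompt wording are illustrative, not a fixed standard.

```python
# evidence_inference_check.py — a small sketch for item 2: force replies into
# labeled Evidence / Inference / Action sections and flag ones that blur them.
# The labels and prompt wording below are illustrative assumptions.

STRUCTURE_INSTRUCTION = (
    "Answer in exactly three labeled sections:\n"
    "Evidence: verifiable facts only, with sources or measurements.\n"
    "Inference: your interpretation of that evidence.\n"
    "Action: what you recommend doing next."
)

REQUIRED_SECTIONS = ("Evidence:", "Inference:", "Action:")

def has_explicit_boundaries(response: str) -> bool:
    """True only if every required section label appears in the response."""
    return all(label in response for label in REQUIRED_SECTIONS)

# A reply that mixes everything into one sentence fails the check.
blurred = "Churn rose, so users are price sensitive and we should pause rollout."
structured = (
    "Evidence: churn rose 2.1% after the pricing change.\n"
    "Inference: price sensitivity is higher in self-serve.\n"
    "Action: pause full rollout and run a geo holdout test."
)
print(has_explicit_boundaries(blurred))     # False
print(has_explicit_boundaries(structured))  # True
```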
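For item 5, the decision log needs no tooling; a flat file is enough. Here is a minimal sketch of what such a log could look like. The field names and CSV path are illustrative choices, not a prescribed schema.

```python
# decision_log.py — a minimal sketch of the decision log from item 5.
# Field names and the CSV path are illustrative, not prescribed.
import csv
from dataclasses import dataclass, asdict, fields
from pathlib import Path

LOG_PATH = Path("decision_log.csv")

@dataclass
class DecisionRecord:
    decision_date: str       # e.g. "2026-02-03"
    decision: str            # what you actually did
    prediction: str          # what you expected to happen
    ai_assisted: bool        # was a model involved in reaching it?
    outcome_30d: str = ""    # filled in later
    outcome_90d: str = ""    # filled in later

def append_record(record: DecisionRecord, path: Path = LOG_PATH) -> None:
    """Append one decision to the CSV log, writing a header on first use."""
    new_file = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[f.name for f in fields(DecisionRecord)])
        if new_file:
            writer.writeheader()
        writer.writerow(asdict(record))

if __name__ == "__main__":
    append_record(DecisionRecord(
        decision_date="2026-02-03",
        decision="Paused full pricing rollout; ran a geo holdout",
        prediction="Churn returns to baseline within 60 days",
        ai_assisted=True,
    ))
```

Reviewing this file at 30 and 90 days is what turns “the answer sounded good” into “the decision actually held up.”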
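And for item 6, here is a sketch of the portfolio-plus-adversarial-pass loop. `call_model` is a placeholder you would wire to your own provider(s); the role prompts, the verdict format, and the agreement heuristic are assumptions for illustration, not a reference implementation.

```python
# portfolio_review.py — a sketch of the model-portfolio idea in item 6.
# `call_model` is a placeholder: route it to whichever models/SDKs you use.
# Role prompts, verdict format, and the agreement heuristic are illustrative.

ROLES = {
    "critic": "List the three strongest objections to this plan. "
              "End with VERDICT: support or VERDICT: oppose.",
    "counterfactual": "Describe the most plausible world in which this plan fails. "
                      "End with VERDICT: support or VERDICT: oppose.",
    "skeptic": "Assume the plan is wrong. What evidence would prove it? "
               "End with VERDICT: support or VERDICT: oppose.",
}

ADVERSARIAL_PROMPT = (
    "All prior reviewers supported this plan. Your only job is to find the "
    "flaw they missed. Do not agree; argue the strongest opposing case."
)

def call_model(role: str, system_prompt: str, plan: str) -> str:
    """Placeholder: send each role to a different model or prompt variant."""
    raise NotImplementedError("Connect this to your model provider(s).")

def review(plan: str) -> dict:
    answers = {role: call_model(role, prompt, plan) for role, prompt in ROLES.items()}
    verdicts = {role: ("oppose" if "VERDICT: oppose" in text else "support")
                for role, text in answers.items()}
    # Unanimous, fast agreement is treated as a trigger for more scrutiny,
    # not less: run one forced-disagreement pass before accepting consensus.
    if all(v == "support" for v in verdicts.values()):
        answers["adversarial_pass"] = call_model("adversary", ADVERSARIAL_PROMPT, plan)
    return answers
```

The specific heuristic matters less than the design choice: agreement across the portfolio is a signal to slow down, never a reason to skip the adversarial pass.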
Quickstart: Use This in Your Own Assistant
- Fork / checkout the counterweight-evals repo.
- Copy one prompt variant from prompt_variants.yaml into your assistant’s system or developer prompt.
- Evaluate outputs against the failure-mode checklist (quick inline checks: over-affirmation, missing disconfirmation, evidence/inference blur, echo-anchor framing).
- Run the evaluation script with your model mix, then compare “harmful agreement rate” and “echo anchor rate” in the generated summary.
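If you want a feel for the metric before touching the repo, here is a rough, self-contained illustration of what a “harmful agreement rate” can measure. The field names and formula are my simplification for illustration, not necessarily how the repo’s evaluation script computes it.

```python
# A rough illustration of a "harmful agreement rate": the share of cases where
# the user's premise was flawed and the model went along with it anyway.
# Field names and the formula are assumptions, not the repo's implementation.

def harmful_agreement_rate(records: list[dict]) -> float:
    flawed = [r for r in records if r["premise_flawed"]]
    if not flawed:
        return 0.0
    agreed = [r for r in flawed if r["model_agreed"]]
    return len(agreed) / len(flawed)

# Two flawed-premise cases, the model agreed with one of them -> 0.5
sample = [
    {"premise_flawed": True,  "model_agreed": True},
    {"premise_flawed": True,  "model_agreed": False},
    {"premise_flawed": False, "model_agreed": True},
]
print(harmful_agreement_rate(sample))  # 0.5
```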
As agents permeate our everyday work and models become our copilots for reasoning and action, we should stop worrying about using the smartest model.
Instead, ask which model makes you most wrong, most confidently, and most often. Choosing a model and its instructions is like choosing a friend, and every friend has a failure signature. If you are not tracking that signature, you are not augmenting your judgment; you are outsourcing it to the loudest mirror in your stack.
Which errors did my current model mix make easier to see, and which did it make easier to miss?
We need to tune our models’ behavior, or the models will end up tuning ours.
References
- Soll, J.B. et al. (2025). “Wisdom of the Crowd with Informational Dependencies.” Scientific Reports, 15.
- Fanous, A. et al. (2025). “SycEval: Evaluating LLM Sycophancy.” arXiv:2502.08177
- Wang, K. et al. (2025). “When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models.” arXiv:2508.02087
- Cheng, M. et al. (2025). “ELEPHANT: Measuring and Understanding Social Sycophancy in LLMs.” arXiv:2505.13995
- Shapira, I., Benade, G. & Procaccia, A.D. (2026). “How RLHF Amplifies Sycophancy.” arXiv:2602.01002
- Irpan, A. et al. (2025). “Consistency Training Helps Stop Sycophancy and Jailbreaks.” arXiv:2510.27062
- OpenAI. (2026). “Model Release Notes: GPT-5.2 Instant personality system prompt updated for more contextual tone adaptation (January 22, 2026).”
- Tetlock, P. & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown.