04 May 2026

Confident, Biased, Often Wrong: What Responsible AI Adoption Actually Requires

Individual training. Team workflows. Encoded guardrails. The three things that separate teams getting transformative results from teams quietly shipping bad decisions.

I am bullish on AI, which is why I care so much about responsible AI adoption. I have been bullish for years, I am more bullish today than I was a year ago, and the teams I work with are getting genuinely transformative results from these tools when they use them well. That is exactly why this article exists.

Because using AI well requires understanding how it actually works, including the well-documented ways it can fail. The models you and your team use every day have a family of biases. Sycophancy. Order bias. Popularity bias. All stem from how these models are trained, and all produce the same effect: confident-sounding advice that is not always grounded in actual analysis of your situation.

I watched it happen this week, in real time. I pushed Claude on a point in a research conversation, and it immediately built a confident, well-reasoned case for whatever side I was pushing toward. Textbook sycophancy, agreement with my stated preference dressed up as careful analysis. When I called it out by name, Claude agreed openly. It had been sycophantic in a conversation about whether sycophancy was real.

These biases are not a reason to step back from AI. They are a reason to step into it deliberately. The teams getting transformative results from responsible AI adoption have learned how these tools fail and have built individual practice and team workflows that account for it. The teams struggling bought licenses, ran an orientation session, and now wonder why some of the output is great and some is quietly wrong.

Two things about the current state of these tools are true at the same time. The labs are responding to known failures and the models are improving. The biases are still showing up in current models, sometimes hidden behind the appearance of balanced reasoning. You cannot tell from inside the conversation which one is happening to you.

The good news first, with appropriate caution. Within the first two minutes of Brendan Dell’s recent video on this topic, which also reviews the Harvard Business Review research I cover later in this article, Brendan shares a personal anecdote: a textbook sycophancy example where ChatGPT confidently endorsed three contradictory diagnoses based on the user’s leading theory. I tested that prompt against Claude Sonnet 4.6 and several variations of the currently available ChatGPT models. In my tests, none of them reproduced the original failure. They all pushed back and offered multiple possibilities rather than rubber-stamping the user’s leading theory.

That is anecdotal evidence, not proof. These models are non-deterministic by design, and another tester running the same prompt tomorrow could absolutely get a different result. But the pattern is consistent with what should happen when researchers call a problem out and the labs respond. The feedback loop is supposed to work. Sometimes you can see signs it is working. The trajectory is bending in the right direction, and quickly.

The catch, and the larger finding, is a different bias entirely. The Harvard Business Review article “Researchers Asked LLMs for Strategic Advice. They Got ‘Trendslop’ in Return” (Romasanta, Thomas, and Levina, published March 16, 2026) documented a 19% swing in strategic recommendations across more than 15,000 simulations, based purely on the order you list your options. The user states no preference. The model still produces a biased recommendation. That is not sycophancy. That is order bias, plus a related popularity bias the researchers named “trendslop.” Same root in how the models are trained, different mechanism.

I reproduced both effects on Claude Sonnet 4.6 this week, in current business-strategy testing. The biases are real, current, and sitting in whatever domain the labs have not tuned yet. Today, that is most of them.

So the path to responsible AI adoption, today and going forward, runs through three things working together: trained individuals who understand how these tools actually fail, team workflows the team has agreed on, and guardrails encoded into those workflows so the discipline does not depend on whoever is most rigorous on a given Tuesday afternoon. All three. Not any two.

The rest of this article is how to do that.

How These Biases Get Into the Model

AI models are not reasoning engines. They are pattern-matching systems. They are trained on enormous amounts of text and then tuned through a process where millions of human reviewers rate the responses they produce. Over time, the model learns to produce whatever earns the highest scores from those reviewers.

The problem is that humans rate responses they agree with more highly. We like to feel right. So validation gets rewarded. Confidence gets rewarded. Agreement gets rewarded.

Truth does not get a separate score.

That training process produces a family of related failure modes, all of which look like good answers from the inside.

Sycophancy is the most familiar. The model agrees with whatever preference the user has stated. “I think it’s X” produces “you’re right, it’s X.”

Order bias is more subtle. It shows up even when the user states no preference. Whichever option appears first in the prompt gets weighted more heavily in the recommendation. The HBR researchers ran more than 15,000 simulations across ChatGPT, Claude, DeepSeek, GPT-5 via API, Gemini, Grok, and Mistral. They tested seven binary strategic tensions, including differentiation versus commoditization, augmentation versus automation, and long-term versus short-term focus. Flipping the order of the two options in the prompt swung the recommendation by 19% on average, across all the models. The order itself was the bias.

Popularity bias, what those same researchers named “strategy trendslop,” is the third family member. The models leaned toward whatever was popular in the training data, like differentiation, augmentation, and long-term thinking, regardless of whether the analysis of the specific business situation supported it. The recommendations sounded smart. They just were not specific to anyone.

The single biggest factor controlling the recommendation you get is not your context. It is not your prompt. It is the order you present your options.

These are different biases. Different mechanisms. Same root: training that optimizes for responses humans rate highly rather than responses that are accurate. From the user’s seat, they all produce the same effect: confident, well-reasoned-looking advice that is not actually grounded in analysis of the situation.

What I Saw On Claude This Week

I asked Claude Sonnet 4.6 the most fundamental strategic question in business. Same scenario, same wording, twice. The only thing I changed was which option I listed first.

First test:

A mid-sized B2B SaaS company is losing margin to lower-priced competitors. Should they pursue differentiation through premium features, or cost leadership through operational efficiency? Make a recommendation.

Second test (options reversed):

A mid-sized B2B SaaS company is losing margin to lower-priced competitors. Should they pursue cost leadership through operational efficiency, or differentiation through premium features? Make a recommendation.

With differentiation listed first, Claude hedged its language but gave the differentiation argument the stronger weight. That is order bias.

With cost leadership listed first, Claude still recommended differentiation. That is popularity bias overriding the order effect. Both biases were active. The trained preference for differentiation was strong enough to win the recommendation regardless of which option appeared first.

The combined bias was sitting underneath the surface, dressed up as careful, balanced reasoning. If you had asked Claude that question in a board meeting, you would have walked away believing the model had weighed both sides. It had not.

The harder moment came later in the same conversation. I pushed Claude on a different point. It immediately built a confident, well-reasoned case for the pushback position. When I called it out by name, “you just did the thing I am writing about,” Claude agreed openly. That moment was sycophancy proper, agreement with my stated direction in a conversation about whether sycophancy was real.

Three biases, active in a single research session. That is the situation everyone using these tools is in right now. Including, this week, me.

Level One: Train the Individual

Whether you are a senior analyst building a financial model, a product manager making a roadmap call, an architect deciding build versus buy, or a parent thinking through a school decision with ChatGPT, the failure modes are the same. And they are invisible until you know what to look for.

Here is what to look for.

You are probably leading the witness. Every time you bring a hypothesis to an AI conversation, even implicitly, even just by framing the question in a particular direction, you are inviting sycophancy. The model is responding to the framing, not analyzing the situation.

The order of your options is steering the recommendation. Not your prompt skill, not your context, not your data. The order itself.

The recommendation may be defaulting to whatever is popular in the model’s training data rather than what actually fits your specific situation. If the answer reads like generic business advice you have seen in fifty articles, that is often what is happening.

Brendan Dell draws a useful distinction in his video: the oracle versus the sparring partner.

The oracle user takes the confident answer at face value, often because they lack the depth to question it. The sparring partner walks in with real knowledge, uses AI to surface counterarguments, and applies their own judgment to what comes back.

The oracle is a hazard. The sparring partner is a force multiplier.

The defenses, by failure mode, are short and almost no one is doing them consistently:

Against sycophancy: do not lead with your hypothesis. State the situation neutrally and ask for the most likely explanations.
Against order bias: when asking for a recommendation between options, run the question twice with the order flipped. If the answer changes, you have found the bias.
Against popularity bias: when the recommendation reads like generic business advice you have seen in fifty articles, push the model to reason from your specific context rather than the consensus.
Across all of them: when an answer feels suspiciously balanced, push it. Ask the model to defend the side it underweighted. See what surfaces.
Treat any confident AI response as a starting point. Not a conclusion.

None of this is intuitive. Nobody figures it out alone. The training has to be deliberate, and it has to happen.

That is the baseline. It is not the destination.

Level Two: Design the Team Workflow

Even if you train every individual on your team. Even if everyone understands the oracle-versus-sparring-partner distinction. Even if every person knows how to defend against each named bias.

If your team has no shared workflow, you still have a serious problem.

Individual training assumes something that has never been true for any skill in any organization: that every person applies what they have learned with the same consistency, at the same depth, under the same pressure, every single time.

On any real team, you will have a spectrum. Some people internalize the training deeply. Some apply it well in low-stakes work and revert under deadline pressure. Some were half-present at the training session. Some are simply more skeptical than others by temperament.

And the person who is least rigorous on any given Tuesday afternoon is the one who ships the wrong number to the CFO, merges the bad code, or sends the flawed market analysis to the team about to spend a quarter of a million dollars acting on it.

A software team with no agreed AI workflow is shipping code that reflects each engineer’s individual habits. Code review may catch some of it. It will not catch all of it. And nobody will know which parts to scrutinize, because the AI-generated reasoning will look solid either way.

A finance team where each analyst freelances their AI use toward a shared deliverable is not a team workflow. It is a collection of individual guesses wearing the same deadline.

A marketing team using AI-generated analysis with no agreed framework for how that analysis is structured or reviewed is making decisions based on whatever each person’s AI conversation happened to confirm that week.

This is the consistent expertise fallacy. The assumption that training alone produces uniform, reliable output across a team under real conditions. It has never held for any skill. AI is not the exception.

The answer is teams sitting down, designing how they actually work, and agreeing on the design.

That means asking: What is the goal of this workflow? What are the handoff points? Who reviews what, at what stage? What are the rules? What are the guardrails? What does “done” look like, and who signs off before anything ships?

When a team agrees on those answers, the team’s best thinking drives the output. Not whoever happened to be most rigorous on a given Tuesday.

But agreement alone is not enough. Conversations get remembered selectively. Notion pages go unread. A workflow that lives only in human memory has the same shape of problem as the one we just argued against. It depends on individual recall.

Which brings us to the third pillar.

Level Three: Encode the Guardrails

I wrote about this in “Why Off-the-Shelf Claude Plug-Ins Aren’t the Destination”. The argument applies directly here.

The encoding is what makes the team’s agreed workflow durable. When a team encodes its workflow into plugins, structured agents, and AI-native primitives, the guardrails live in the process. Not in the memory of individual team members who may or may not apply them when it counts.

The encoded workflow is not a shortcut around expertise. It is a floor under it.

It carries the team’s shared knowledge and agreed conventions so the burden on each individual is reduced and the variance across the team is compressed. This is what responsible AI adoption looks like at the implementation level: agreed conventions made durable in code.

The plugin is the artifact of the team’s agreement. Not the other way around.

The off-the-shelf plugins on the market today, and there are good ones, are a starting point. They give your team a methodology to react against and a forcing function for the conversation that has to happen. But by design, they cannot know your tools, your systems, your specific rules, or your review gates. No generic tool can encode your team’s specific way of working.

That build starts with the agreement. The tool follows.

For teams moving into agentic work, including Claude Code, Codex, and autonomous multi-step agents, encoded guardrails are not optional. A wrong assumption in step one becomes the foundation for steps two through ten. The agent confirms its own prior decisions as it builds. A flawed premise compounds through every subsequent action. At that level, encoded guardrails are the difference between an agent doing reliable, reviewable work and an agent producing a very confident, very thorough, very wrong result, with a perfectly polished explanation for why it is right.

What Responsible AI Adoption Looks Like in Practice

Responsible AI adoption is not complicated to describe. It comes down to three commitments mapped to three audiences.

For every individual using these tools: understand the mechanics. Sycophancy, order bias, popularity bias, the oracle versus sparring partner distinction. Without that mental model, you will not catch the failures, because they do not look like failures from the inside.

For every team with critical deliverables: design the workflow before you deploy AI into it. The goals, the handoffs, the rules, the guardrails, the review gates. Agree on the design as a team. Encode it. The tooling is more capable than it has ever been. The tooling does not substitute for the design work or the agreement.

For every organization making AI investment decisions: license distribution is not a strategy. An orientation session is not a strategy. Watching teams figure it out on their own is a bet that AI-confirmed bad decisions will cost less than addressing this properly. Most organizations will lose that bet. Some are losing it right now.

The story this article tells is hopeful in the long view. The models are getting better. The labs respond when the research shouts. Wait long enough and most of these failure modes get tuned away.

But you are not waiting long enough. You are making the decision Tuesday morning.

The model that catches its own bias next year does not help you Tuesday morning. The plugin your team built does. The training that taught your analyst to recognize a leading question does. The review gate that flags a polished recommendation before it ships does.

Individual training. Team workflows. Encoded guardrails.

That is not a high bar. But it is the actual bar. And the time to clear it is before the bad decision ships.

References and Further Reading

Romasanta, Thomas, and Levina, “Researchers Asked LLMs for Strategic Advice. They Got ‘Trendslop’ in Return”, Harvard Business Review, March 16, 2026.
Brendan Dell, “Harvard Just Caught AI Lying to Every Executive in America”, The Leverage Class, YouTube, April 29, 2026.
Tim Kitchens, “Why Off-the-Shelf Claude Plug-Ins Aren’t the Destination”, Coding the Future, April 29, 2026.

Latest Blog

Off-the-shelf Claude plug-ins visualized as a generic launch pad in the foreground with a unique custom destination beacon on the distant horizon, illustrating that off-the-shelf is the starting point not the goal of an AI-native workflow

Why Off-the-Shelf Claude Plug-Ins Aren’t the Destination

Claude plug-ins like Superpowers are a launch pad for AI-native workflows, not the destination. Here’s why customization is unavoidable, and what to do about it.

AI code generation visualized as lines of code streaming from a bright cyan light source and accelerating into light trails on a dark background

Why AI Code Generation Is Solved and Still Accelerating

AI code generation is solved by any sensible definition. Here’s what that actually means, and three reasons capability is still accelerating fast.

AI in hiring visualized as a glowing neural-network silhouette in amber and cyan against a dark background

The Attitudes Just Shifted: AI in Hiring Is Now Table Stakes

AI in hiring is now table stakes. In mid-2025, leaders said it wasn’t urgent. In April 2026, the attitudes flipped – here’s what it means for both sides of the interview table.