Whitepaper

Served intent improves AI coding-agent adherence.

Bullseye tested whether current, bounded team intent changes how AI coding agents resolve engineering decisions when the team's actual preference diverges from generic best practice.

Internal evaluation | 100 decision scenarios | Updated June 2026

+48

percentage points of adherence to the team's actual decision, from about 41% without served intent to about 89% with current served intent.

93/100

head-to-head wins for the served-intent agent against the no-intent baseline.

2 runs

stale-intent controls replicated the same core finding: outdated context can be worse than no context.

Executive Summary

AI coding agents are usually evaluated on whether they produce plausible code. Bullseye's evaluation asked a narrower buyer question: when an agent must make a choice, does it follow the team's actual intent or fall back to a generic default?

Across 100 realistic decision scenarios selected because the team's intent diverged from common best practice, agents given Bullseye's current served intent followed the intended decision far more often than agents without it. Adherence rose from about 41% to about 89%, and the served-intent agent won the direct comparison in 93 of 100 scenarios.
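The headline figures follow from simple arithmetic on the rounded rates reported above. The sketch below is illustrative only: the function names are assumptions made for this example, not part of Bullseye, and the inputs are the approximate figures stated in this summary.

```python
def percentage_point_lift(baseline_pct: int, treatment_pct: int) -> int:
    """Absolute change in adherence, in percentage points."""
    return treatment_pct - baseline_pct


def win_rate(wins: int, scenarios: int) -> float:
    """Share of scenarios won in the head-to-head comparison."""
    if scenarios <= 0:
        raise ValueError("scenarios must be positive")
    return wins / scenarios


# Rounded figures reported in this whitepaper.
lift = percentage_point_lift(baseline_pct=41, treatment_pct=89)
rate = win_rate(wins=93, scenarios=100)

print(f"Adherence lift: +{lift} percentage points")
print(f"Head-to-head win rate: {rate:.0%}")
```

The lift is an absolute difference in percentage points, not a relative improvement over the baseline.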

What We Tested

Each scenario described an engineering task with a team-specific decision, constraint, rejection, or unresolved tradeoff. The scenarios intentionally emphasized the cases where a fresh agent is most likely to be wrong: multi-stakeholder decisions, contested calls, conditional constraints, and rejected paths that are not obvious from final code.

The baseline agent received the task without Bullseye's served intent. The treatment agent received the same task plus a concise, current intent block: the team's goal, the constraints that bounded it, the rejected alternatives where relevant, and the provenance needed to understand why the decision existed.
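One way to picture the difference between the two conditions is as a small data structure plus a prompt builder. This is a minimal sketch under stated assumptions: the `IntentBlock` class, its field names, the `build_prompt` helper, and the example task are all hypothetical and do not describe Bullseye's actual schema or serving format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntentBlock:
    """Hypothetical shape for a served intent block (not Bullseye's real schema)."""
    goal: str
    constraints: List[str] = field(default_factory=list)
    rejected_alternatives: List[str] = field(default_factory=list)
    provenance: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the block as concise plain text for an agent prompt."""
        lines = [f"Goal: {self.goal}"]
        sections = (
            ("Constraints", self.constraints),
            ("Rejected alternatives", self.rejected_alternatives),
            ("Provenance", self.provenance),
        )
        for label, items in sections:
            if items:  # omit empty sections, e.g. no rejected alternatives
                lines.append(f"{label}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


def build_prompt(task: str, intent: Optional[IntentBlock] = None) -> str:
    """Baseline: the task alone. Treatment: the same task plus the intent block."""
    if intent is None:
        return task
    return f"{task}\n\nTeam intent:\n{intent.render()}"


# Invented example values, for illustration only.
task = "Add retry handling to the payment webhook."
intent = IntentBlock(
    goal="Keep webhook handling idempotent.",
    constraints=["No new runtime dependencies."],
    rejected_alternatives=["Third-party retry library."],
    provenance=["Design review notes (hypothetical)."],
)

baseline_prompt = build_prompt(task)
treatment_prompt = build_prompt(task, intent)
print(treatment_prompt)
```

Keeping the task text identical in both prompts means the served intent is the only difference between the two conditions.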

Primary Findings

Negative Control: Stale Intent

The evaluation also tested whether any extra context helps, or whether the context has to be current. In separate controlled runs, agents were given stale intent: a decision the team had already superseded.

The stale-intent agents performed worse than agents given no Bullseye context at all and overwhelmingly followed the outdated directive. That result replicated across two model setups. In one run, the scores were 60 with no intent, 91 with current intent, and 43 with stale intent. In a setup with a decoupled judge, the scores were 43 with no intent, 89 with current intent, and 7 with stale intent, and current intent beat stale intent 30 out of 30 times.
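The negative-control claim can be checked directly against the scores reported above. The following sketch assumes nothing beyond those six numbers; the run labels and the `control_holds` helper are names chosen for this example.

```python
# Scores as reported in this section, keyed by condition.
RUNS = {
    "first run": {"none": 60, "current": 91, "stale": 43},
    "decoupled judge": {"none": 43, "current": 89, "stale": 7},
}


def control_holds(scores: dict) -> bool:
    """True when stale intent scores below no intent, and no intent below current."""
    return scores["stale"] < scores["none"] < scores["current"]


for name, scores in RUNS.items():
    penalty = scores["none"] - scores["stale"]  # points lost to stale intent
    print(f"{name}: ordering holds = {control_holds(scores)}, "
          f"stale penalty vs. no intent = {penalty} points")
```

In both runs the ordering is stale, then none, then current, which is the pattern behind the claim that outdated context can be worse than no context.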

The buyer implication is direct: the value is not context volume. The value is current, reconciled, bounded intent. A manually maintained context file that drifts can actively push agents in the wrong direction.

Why This Matters

Teams do not only need agents that understand code. They need agents that understand why the team chose this code, why alternatives were rejected, and when a decision is unresolved enough to ask rather than guess. Those facts often live in tickets, PRs, chats, docs, and prior sessions rather than in the final implementation.

Bullseye captures that intent from work artifacts, reconciles it into current truth, and serves it where coding agents already read. The scorecard is not just a reporting surface; it measures the distance between served intent and agent output so teams can prove the context is improving decisions.

Limitations

Conclusion

Current served intent made agents substantially more likely to follow the team's real decision. Stale intent made them worse. For teams putting AI agents into real engineering workflows, keeping org intent live is not documentation overhead; it is part of the execution path.

Next step

Give your agents the context they keep asking you for.

Bullseye is in early access for teams using Claude Code, Cursor, Copilot, and similar AI coding agents.