The headline of Copilot’s first-ever introduction ended with “AI pair programmer”. It was a signal of what to expect from this first-of-its-kind product: it would pair with you as a programmer. And developers who have paired with another developer in real life know that it is a highly human experience. It only works if expectations are aligned, and if the two developers agree on how to share the one machine, one editor, and one task between them.
Now in 2025, the growing reasoning capabilities of large language models (LLMs), the ability of those models to use tools like the terminal and the browser, and their integration into the development environment are putting Copilot on the path to becoming the “AI peer programmer”. And as with any member of a development team, we have four expectations of these agents to enable a productive working experience: they need to be predictable, steerable, tolerable, and verifiable.
Predictable
As developers become increasingly dependent on AI-based collaboration, it’s important that they can form a meaningful intuition about what their agents can and cannot do for them. Otherwise, if we simply pass the stochastic burden of LLMs onto users and expect them to develop their own expectations through trial and error, their success is left too much to chance. In other words, before handing a task to an agent, the developer must have a high level of confidence that the agent can solve the task to a meaningful degree.
This is why it’s important to design the user experience with constraints and discoverability in mind, so that it effectively guides users toward success and gives them the means to iterate in a way that results in higher value. If we give users too much of a “magical” experience, even if it’s initially fun, it will quickly become frustrating as the novelty wears off and users feel their time is being wasted.
Examples from Copilot today:
- Due to the contextual nature of Copilot’s “ghost text”, many users can intuit that its suggestions are based on the preceding code in the file, and can therefore predict (to some degree) when it will offer suggestions of higher or lower value (see the sketch after this list).
- Highlighting a span of code, and then asking Copilot Chat to explain it, makes it very clear what the AI will take into consideration when responding to you.
- By providing a structured timeline of steps, with specific entry points from GitHub, Copilot Workspace indicates to users what it’s meant to be good at, and how to make forward progress.
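To make the first example concrete: in VS Code, ghost text is rendered through the editor’s inline completion API. A minimal provider like the sketch below (with a hypothetical `fetchCompletion` function standing in for the real model call) makes the contract explicit: the prompt is built from the code preceding the cursor, which is exactly why users can form an intuition about when suggestions will be well grounded.

```ts
import * as vscode from "vscode";

// Hypothetical backend call -- a stand-in for whatever completion model is used.
declare function fetchCompletion(prompt: string): Promise<string>;

export function activate(context: vscode.ExtensionContext) {
  const provider: vscode.InlineCompletionItemProvider = {
    async provideInlineCompletionItems(document, position, _context, token) {
      // The prompt is the code *preceding* the cursor -- the source of the
      // "contextual nature" that makes suggestions predictable to users.
      const prefix = document.getText(
        new vscode.Range(new vscode.Position(0, 0), position)
      );
      const suggestion = await fetchCompletion(prefix);
      if (token.isCancellationRequested || !suggestion) return [];
      // Returned items are rendered as dimmed "ghost text" at the cursor.
      return [new vscode.InlineCompletionItem(suggestion)];
    },
  };
  context.subscriptions.push(
    vscode.languages.registerInlineCompletionItemProvider({ pattern: "**" }, provider)
  );
}
```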
Steerable
LLM-based suggestions are only as good as the context they’re given, which is why providing that context is commonly called “grounding”. As a result, it’s expected that even when assistance is “correct”, it won’t always be “perfect”.
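To illustrate what “grounding” means mechanically, here is a minimal sketch (all names here are invented for this post) of assembling context into a prompt. The model never sees anything beyond this string, so the quality of its answer is bounded by what we choose to include:

```ts
// Illustrative shape of the context an assistant might gather before asking
// the model anything. All names are hypothetical.
interface GroundingContext {
  selectedCode: string;       // code the user highlighted
  openFileSnippets: string[]; // relevant code from open editors
  docs: string[];             // retrieved documentation passages
}

function buildGroundedPrompt(question: string, ctx: GroundingContext): string {
  // Everything the model will "know" about the task is concatenated here.
  return [
    "You are a coding assistant. Answer using only the context below.",
    "## Selected code",
    ctx.selectedCode,
    "## Related snippets",
    ...ctx.openFileSnippets,
    "## Documentation",
    ...ctx.docs,
    "## Question",
    question,
  ].join("\n\n");
}
```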
That’s why it’s critical that the end user can “steer” a suggestion, iterating it toward the exact solution they’re looking for in a simple and lightweight way. If suggestions feel like a binary decision (accept/ignore), then the cost of one being “partially wrong” becomes too high.
Examples from Copilot today:
- When a user accepts a Copilot suggestion, they can easily edit it inline. The suggestion therefore doesn’t need to be 100% accurate to be valuable.
- When a user interacts with Copilot Chat, they can refine a response by asking follow-up questions. This “steerability” is what makes conversational interfaces compelling (see the sketch after this list).
- Copilot Workspace allows editing the task/spec/plan/implementation, and provides an undo stack to easily iterate and refine. Additionally, all code editors are editable, and you can open a terminal or a Codespace if you need to make larger edits or use other tools.
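As a minimal illustration of why conversation is steerable, consider the sketch below, where `chatComplete` is a hypothetical stand-in for any chat-completion backend. Each follow-up is appended to the running transcript, so the model refines its previous answer rather than starting over:

```ts
type Message = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical stand-in for any chat-completion backend.
declare function chatComplete(messages: Message[]): Promise<string>;

async function refine(): Promise<string> {
  const messages: Message[] = [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user", content: "Write a function that parses this CSV file." },
  ];
  const firstTry = await chatComplete(messages);

  // The first answer was "correct" but not "perfect" -- so the user steers.
  messages.push({ role: "assistant", content: firstTry });
  messages.push({
    role: "user",
    content: "Good, but stream rows lazily instead of loading the whole file.",
  });
  return chatComplete(messages); // refined, not regenerated from scratch
}
```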
Tolerable
LLMs are non-deterministic, and therefore they can, and will, be wrong. As such, it’s important that the cost of an agent being wrong feels low, so that the end user doesn’t perceive it as a distraction. In essence, the strengths of an AI assistant need to outweigh its inevitable weaknesses, and a critical part of that is a UX that makes it simple for the user to ignore unwanted or unhelpful suggestions while remaining in their flow.
This is effectively an invariant for any passive AI assistance (e.g. unsolicited pull request bot comments), in order to prevent a “Clippy effect” and too much noise. For interactive experiences, the solution is typically the previous property: steerability.
Examples from Copilot today:
- “Ghost text” displays Copilot completions inline, but with a UX that lets the developer easily ignore them and keep typing.
- Because Copilot Chat streams its responses so quickly, it’s acceptable when it gets an answer wrong at first, since refining and iterating with it feels lightweight (see the sketch after this list).
- By having Copilot Workspace generate an initial spec/plan for an issue, it can act as a jumpstart or thought partner. Even if it gets things wrong, the draft is still likely useful for moving your task and thinking forward.
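A rough sketch of why streaming keeps mistakes cheap (`streamCompletion` is a hypothetical stand-in for any streaming backend): partial output appears immediately, so the user can judge, and abandon, a wrong answer early instead of waiting for the whole response:

```ts
// Hypothetical stand-in for any backend that yields tokens as they arrive.
declare function streamCompletion(prompt: string): AsyncIterable<string>;

async function renderStreaming(
  prompt: string,
  onToken: (token: string) => void
): Promise<void> {
  // Each chunk is shown as soon as it arrives, so a bad answer is visible
  // (and ignorable) within moments rather than after a long wait.
  for await (const token of streamCompletion(prompt)) {
    onToken(token);
  }
}
```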
Verifiable
Despite the increasing ubiquity of AI-infused and AI-native development, the practice is still nascent, in the sense that organizations are still deciding if, and for what, they can confidently use it. Because source code isn’t static, it’s insufficient for a suggestion to simply “look right”. It also needs to “be right”: behave as expected, adhere to best practices, be free of security issues, and so on.
As a result, it’s important that a user can not only steer assistance toward a final solution, but also trust that solution enough to accept and commit it. Empowering this level of trust will become critical, as we expect developers to use AI assistance for new and increasingly complex tasks (example).
Examples from Copilot today:
- After accepting a Copilot suggestion, a developer can immediately see error squiggles in their editor, and run a linter/test suite to validate its correctness (see the sketch after this list).
- Copilot Chat displays citations for the references and external docs it used, as a means of “proving” to the user that it consulted the correct materials when making its suggestion.
- Copilot Workspace provides an integrated terminal to validate code changes, as well as secure port forwarding to view a running web app. It also provides an integrated file browser that makes it easy to verify the current and proposed spec.
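As a minimal sketch of what that machine verification can look like once an AI edit is accepted (the commands are illustrative; substitute your project’s own), the idea is to let the type checker, linter, and test suite confirm that a change doesn’t merely “look right”:

```ts
import { execSync } from "node:child_process";

// Illustrative verification gate: every command must exit cleanly before the
// AI-assisted change is considered trustworthy enough to commit.
function verifyChange(): boolean {
  const checks = ["npx tsc --noEmit", "npx eslint .", "npm test"];
  for (const cmd of checks) {
    try {
      execSync(cmd, { stdio: "inherit" }); // throws on a non-zero exit code
    } catch {
      console.error(`Verification failed at: ${cmd}`);
      return false;
    }
  }
  return true;
}
```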
–
Only those agents that can fulfill these four expectations will be delightful to use.
Only those agents will actually increase the joy of being a developer, letting us amplify our creativity while leaving behind the boilerplate that has slowed us down ever since a bug was taped into this famous log book.
Only those agents will become true peer programmers in our teams.