From reading the article: they offered their developers both Claude Code and Copilot.

What they wanted was for them to use both and give feedback on which was better.

The developers voted with their feet and didn’t use Copilot.

What Microsoft were hoping was that the opposite would happen...

> The developers voted with their feet and didn’t use Copilot.

This was true in January -- since then, the Copilot CLI team has spent countless hours with engineering leaders and the biggest Claude Code users at the company to understand Copilot's shortcomings, define evals to properly test them head-to-head, and close the gap between the products.

The result? Claude Code usage was organically decreasing and Copilot CLI usage was organically increasing -- when this announcement was made, internal Copilot CLI usage had been greater than Claude Code usage for weeks!

For months, employees had the option to choose Claude Code or Copilot. Now they don't.

Underlying model choice still has no restrictions. Opus 4.6 is by far the most popular. There are still big $$$ bills going Anthropic's way.

Curious whether anyone here has stayed on 4.6 despite having the choice to use 4.7.

I went to 4.7, didn't have a choice, found it unsatisfactory, then Claude quietly added in the option to use 4.6, so I'm back on 4.6, and I'm not the only one in my company.

I had far more hallucinations with 4.7 than 4.6.

I'll try it again in a few more months, once they've had time to get it right, but 4.6 is what changed my mind on LLMs as a tool, and 4.7 felt like a step backwards. So for now I'm sticking with something that has delivered me value, instead of arguing with an ostensibly better model that was making shit up 1-2 times a day. It was really disappointing.

I can give examples if needed, I screenshotted the most aggravating ones, but what worries me is which ones I didn't recognise.

How did you manage to do that?

/model command returns only 4 choices for me: Opus 4.7, two Sonnet options and Haiku.

I’ve read (but haven’t tested yet) that you can still enable Opus 4.6 with:

  /model claude-opus-4-6[1M]

That gets billed as extra usage apparently:

  /model claude-opus-4-6[1M]

  ⎿  Set model to Opus 4.6 (1M context) · Billed as extra usage

In my /model list I also have Opus 4.6.

Maybe this is because I'm on API pricing? (Anthropic pushes all new corporate contracts onto it.)


env var

This works, thanks :)

For anyone else who may want this, use:

  export ANTHROPIC_MODEL=claude-opus-4-6

Opus 4.7 went through a major degradation a few weeks ago (way more hallucinations and rabbit holes than usual). Anthropic fixed it. Give it another shot.

I still find it lazy and confused vs 4.6. I don’t like adaptive reasoning.

Opus 4.7 seems very smart, but the adaptive reasoning always leaves me uncertain how hard it is actually trying. And it is far too argumentative. It seems to think it HAS to contradict you in every response.

I have stuck with 4.6. I fully believe 4.7 can be smarter for truly complex and long running agentic use. But I prefer the more direct, literal mechanistic style and 4.6 seems to be peak Opus for that.

Stay with 4.6 if you can; it is disabled (AFAIK) in the VS Code Claude Code extension.

4.7 IMO is around 10-20% worse at understanding your prompt's intention. You need more effort to explain your intention clearly so it doesn't go off course.

Same. 4.7's intelligence is significantly worse than 4.6's on ALL third-party harnesses. You only get decent performance in Claude Code and via the Anthropic API/subscription; on every other harness and/or cloud-provider inference (Bedrock), it performs worse than 4.6 on almost every task. This is not just anecdotal: I've talked to many colleagues from AWS, Microsoft and so on, and they all agree that something fishy is going on.

I even switched back to Sonnet 4.6 over Opus 4.7 in Claude Code. Every day or two I try a new task on Opus 4.7 and regret it.

Looking now I see that "Opus 4.6 Legacy" is an option that was not there before, so maybe Anthropic noticed that others are having the same difficulty.

Never used 4.7 outside the Claude Code extension in VS Code. TIL, will keep that in mind.

I was recently talking to someone about that! I wasn't sure if it was my imagination, but I felt like Opus 4.6 was way more diligent about looking things up online and making sure that its response was accurate. While Opus 4.7 seems content to just throw out an answer as quickly as possible with little care for accuracy; I started to always remind it to do an online search and to double check its work, to the point where I had to add a custom memory.

I switched back to 4.6 thinking, as most did, that 4.7 had introduced some jankiness. I soon switched back to 4.7, though. I think I might have adapted to what 4.7 does and how it does it; 4.6 felt like a step backward.

4.7 is better if your spec is clearer. 4.6 is better if you give it more freedom in doing its tasks. That said, 4.6 felt like it steered off course more often than 4.7 when given detailed specs, so perhaps that's it.

Agreed. 4.7 is a smarter but weirder model. It will get confused in unexpected ways, but when it's not confused it will perform better than 4.6.

It's not a bad idea to skip it and wait until the next model release, but I personally will stick with 4.7.

How does their versioning work? I've assumed they're constantly tweaking their system prompts, so I'm hoping that in a couple of months 4.7 will have improved over my first impressions. I caught significant hallucinations, something I'd rarely experienced with 4.6, if at all (I honestly can't remember one), but what worried me was the hallucinations I didn't catch.

That is a load-bearing decision!

That’s a decision-shaped comment.

I still use 4.6 if I need Opus. It's mostly GPT-5.5 for me. Only when I know it cannot do something, like push without running the tests (because AGENTS.md said so), do I switch to 4.6.

Although GPT's been acting weird since Thursday...

Switched back when 4.7 had an issue last week and it was wayyy faster. I assume mostly because a lot of people have moved off but might consider using it more just for the speed boost.

4.7 turned out to be a disaster in multilingual settings, so I've stuck with 4.6 so far. 4.7 seemed to be optimized for a (very specific slice of) coding at the expense of everything else.

It also seems to be designed to optimize the design/planning phase of a typical programming project.

I’ve stayed on 4.6. Was thinking of trying 4.7 though just today. Still, I did not jump on it day one.

I don't want to change from 4.6 because I'm finding it so good (I could change).

I've spent the last couple of days building Swift bindings to a monster C++ lib and I've actually had fun.

I use 4.6 and I've configured the advisor to be on 4.7, so when something's more complex the advisor can help. At least that's how I do it with Claude Code; not sure if the others have implemented the concept of advisors.

I use copilot cli and I can pick Anthropic models. The Microsoft interface seems fine to me, and equivalent. Not sure what the big deal is.

Funny I had the opposite experience. The Claude models seemed equivalent to GPT-5.4/5 in a generic harness like Copilot CLI or Opencode or Pi, but Claude Code the app/harness is so much better than all the others that I switched at work, even though I'd much prefer to use a non-proprietary harness (and eventually I do want to get Pi set up to be comparable).

Well, maybe I shouldn't have assumed the harness wasn't a big deal. When Microsoft changes its pricing on 6/1 I'll try some of the others.

Harness makes a difference. Also in copilot you have smaller context for Claude models.

And you get token-based pricing from June 1.

Anthropic's Claude harness is much better than Copilot, i.e. the tools and instructions in each harness are different. Anthropic is just that much better (for claude models, likely an amount of co-development).

Personally, I looked into Copilot's prompt and saw things that made me put it down immediately and start working on my own. I'm now using OpenCode for reasons and I like it better than any Big AI tool. Using OC with Qwen3.6-MoE (for context) and generally happy with the results.

Wouldn't they be forced into API pricing instead of per-seat pricing like that, though? That would potentially be a massive cost increase. But I've discovered through talking to colleagues that some companies are already doing that. I can't understand why you'd ever do that when you can get VC-subsidized pricing for now, at least for all initial in-plan usage. I doubt many developers go past the limit anyway, and for those who do, you can switch just the extra usage to on-demand.

Teams is the only one with seat pricing. Teams has a user cap of 150. Enterprise is usage-based pricing only now (with a £20/user service charge).

Most of us never had the option for work to pay for Claude Code -- some internal orgs did this. That being said I had a personal Claude Code subscription for a bit.

Honestly I find GitHub Copilot CLI (and now also the new GitHub Copilot app) quite decent. I mostly use it with Opus 4.7, or rarely with GPT-5.5. The VSCode extension is ok, but CLI or app are the better experience IMO.

Do people bring their own then (considering work doesn’t pay for it)?

Our corp specifically prohibits that, because of code leak/training.

I wish I could understand the appeal of using Claude Code inside VS Code rather than Copilot. I feel like I'm missing something obvious.

I'm with you there. I can't stand the CLI that wants to take you away from the mostly bad code it writes. Give me the structure, let me finesse it - to do that I need to actually see it no matter how much Anthropic pretends that it's perfect.

I run Claude Code inside an Emacs vterm for moderately long-lived work streams, and an ever-shifting set of tmux sessions for quick small features or bug fixes. The way I ensure I read the code at least a bit is the same as for wholly hand-written code: I never do "git add .", only one file at a time, and I git diff each file just before adding it (except sometimes for generated files). I also mostly do incremental dev, sort of agile where I am the client and Claude is the dev team, and I check the utility of each feature one by one, so what I end up with delights me. It does tend to do more than is needed, so I will mostly delete code it has written rather than fix things. Really, not every module's tunable constant needs to be overridable from env vars. I am happy with the resulting systems; they have not collapsed into unmaintainable messes yet. Claude in vterm in Emacs is nice UX: I can think, run shell commands, and look at code or git history while having a longer-running discussion.
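The per-file "diff, then add" habit described above is easy to script. Below is a minimal sketch in POSIX-style shell, assuming git is on PATH and that you run it from the repository root; review_add is a hypothetical helper name, not a git subcommand. It shows each modified tracked file's diff and stages it only if you answer y.

```shell
# Hypothetical helper: review each modified tracked file, then stage it or skip it.
# Run from the repository root: "git diff --name-only" prints root-relative paths.
review_add() {
  {
    git diff --name-only | while IFS= read -r f; do
      git --no-pager diff -- "$f"          # show this file's unstaged changes
      printf 'Stage %s? [y/N] ' "$f" >&2
      # Answers come from the caller's stdin, saved on fd 3, because the
      # loop's own stdin is the list of file names.
      read -r answer <&3 || answer=n
      case "$answer" in
        y|Y) git add -- "$f" ;;
        *) echo "skipped $f" >&2 ;;
      esac
    done
  } 3<&0
}
```

Brand-new untracked files are not listed by git diff, so those still need an explicit git add. If you want the same review at hunk granularity, git's own git add -p already does that interactively.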

I just have git diff open in another terminal. Everything I do is in the terminal.

Slightly related (and something I don't understand either): why is Copilot in VS Code essentially just a CLI interface? Why can't it use the IDE tools (search, LSP, ...)? All it ever does is try to execute grep.

There is an option to turn on semantic indexing and search for Copilot in VS Code, although I notice no difference when I turn it on. The docs mention something about it.

https://code.visualstudio.com/docs/copilot/reference/workspa...

Claude’s prompt heavily pushes it towards grep. We have an internal cross-repo semantic search MCP, and a skill plus prompting was not enough to get Claude to use it consistently. A PreToolUse hook is the answer. Claude will even write one for you if you describe the problem to it :)
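A minimal sketch of such a hook in shell, going by my reading of the Claude Code hooks docs: a PreToolUse hook receives the pending tool call as JSON on stdin, and exiting with code 2 blocks the call and feeds stderr back to Claude. The function name redirect_grep and the MCP tool name semantic_search are placeholders, not real identifiers.

```shell
# Hypothetical PreToolUse hook body. The pending tool call arrives as JSON on
# stdin; "semantic_search" is a placeholder for your internal MCP tool.
redirect_grep() {
  input=$(cat)
  # Naive match on the tool name so the sketch needs no jq dependency.
  if printf '%s' "$input" | grep -Eq '"tool_name"[[:space:]]*:[[:space:]]*"Grep"'; then
    # Exit code 2 blocks the call; this stderr text is shown to Claude.
    echo 'Use the semantic_search MCP tool for cross-repo lookups instead of Grep.' >&2
    return 2
  fi
  # Any other tool call is allowed through unchanged.
  return 0
}
```

In practice you would save this as a script that ends with "redirect_grep; exit $?" and register it under PreToolUse with a matcher of Grep in your Claude Code settings. It will not catch grep run through the Bash tool; that would need a second check on the Bash command string.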

Someone mentioned here the other day that when you try to give Claude those tools through an MCP or skill, it tends to go a bit loopy.

At the moment it seems like the way it's been trained has been tightly coupled with grep.

It does feel bizarre though that it doesn't use the symbol servers.

Because it’s far far easier to make a text-generation machine generate text that has decades of how-to explanations on the Internet than to correctly work an internal editor API that changes often and isn’t as well-documented.

Especially if you want effective results.

I replaced common grep with a semantic search wrapper for some projects. It was amusing. It has a response header that lets Claude know it is not using standard grep. Works fine. Have to outsmart them ;)
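The trick of the wrapper announcing itself in its output can be sketched in shell. This is an assumption-laden illustration: semsearch stands in for whatever semantic-search CLI you actually have, and grep_wrapper is a hypothetical name; to have an agent pick it up you would install it as grep earlier on the agent's PATH only.

```shell
# Hypothetical wrapper; "semsearch" is a placeholder for your semantic-search CLI.
grep_wrapper() {
  # First line tells the agent these results are not plain grep output.
  echo "# NOTE: semantic-search wrapper output, not standard grep"
  if command -v semsearch >/dev/null 2>&1; then
    # A real wrapper would translate grep flags; this passes them through unchanged.
    semsearch "$@"
  else
    # Fall back to the real grep when the backend is missing.
    command grep "$@"
  fi
}
```

The header line changes grep's output format, so keep the wrapper off the PATH of ordinary scripts that parse grep results.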

Claude in Copilot does seem a bit more lost on the interface side than other models, but then again all of them are. Only the baseline tier seems to have been fine-tuned to the platform.

> I wish I could understand the appeal of using Claude Code inside VScode rather than Copilot

MS thinks CoPilot is the Clark Griswold of LLMs when it's really Cousin Eddie...

Same, with regard to TUIs in general. The VS Code Copilot Chat extension has really nice integration for 'human in the loop' style agentic development. I build some tooling (https://www.agentkanban.io) to integrate a task board and git worktrees with Copilot Chat.

Claude Code will write the whole thing for you. Doesn't Copilot require input along the way? I.e., it doesn't do all the programming for you?

It can code the whole thing for you, copilot in vscode is simply better, people just never tried it.

If you give Copilot a file with a list of tasks to complete, it will try to churn through them (just like most other harnesses would these days).

Ah okay, can it work on a whole repo in an agentic way?

Yes, of course. It can also spawn subagents, work for an hour without interaction if that's what you want, etc., just like any other harness.

Actually, due to GitHub's stupid billing system, which charges per "premium request" instead of per token, you could (and still can) abuse it so it costs nothing. They're changing it to usage-based billing from next month, though.

I think they were comparing CLIs, not VS extensions.

I'm a little the opposite, what's the point of using an IDE with AI? I genuinely don't get it?

These days I just use Claude Code Desktop or Claude Code in PowerShell. Standalone, not inside an IDE. Honestly, I'm using Desktop more and more as it gets more features.

The IDE is for me. No AI in it at all. If I want to get Claude to do something specific to a file I just @ the file.

Productivity. You generate the skeleton of the code with Codex/Claude Code/et al. and refactor it manually. It's unlikely that an AI agent will be able to one-shot every bit of code in exactly the way you want, even with a fat AGENTS.md file. With a smart AI-native IDE like Zed, it will quickly pick up what manual change you intend to make without you fully typing it out, especially if the changes are repetitive. This helps enormously when you're debugging or profiling your code.

> Productivity. You generate the skeleton of the code with Codex/Claude Code/et. al. and refactor it manually.

This doesn’t mean much if you are using a terminal editor.

the obvious answer is because it's easier, faster, and more efficient to flip a true to false right in front of you than it is to prompt an llm.

if your response is "my prompts don't produce code that needs values flipped, ever." then I would wager you're only touching very simple things with an LLM.

for me it's not so much the token cost and prompt writing as the fact that it's just faster to change 0 to 1, and it leaves me twiddling my thumbs waiting on llm output less.

The thing that drove me away from manual edits was that I found myself confusing the LLM all the time. It would read or write some code, I'd twiddle with things, and then the LLM's future references to the same code would be a mess.

On balance, and via dictation, it feels likely to be faster overall to just enact the changes I want 'inline' of the conversation thread.

Is this stuff any better now? I think current harnesses probably do have things like file change listeners that automatically inform agents before they act on a file they've previously engaged with if it has changed in the meantime.

I try to remember to tell it that the file changed, and should be reloaded. That usually avoids confusion.

If you do manual edits, I find it best to start a new conversation. But if your instructions and documentation are good enough, new conversations won't have any problem picking up where they need to.

Having said that, I fear what June 1st brings for Copilot. It might suddenly be very useless for me.

Not really. Whenever I manually edit the code, the next turn will overwrite the changes back. You kinda have to let them know not to do that.

But you have IDE for you and cli for agent. Agent works on the same code, you can see the changes right there.

But why did you flip that true to false? It sounds like a missing unit test. So at a minimum it’s do the flip, find the right place to unit test, and write a test. Or I just tell my LLM “this should be false because of X, fix and write a test”

I just use Codex/Claude Code in one window and Neovim in another and navigate around using Niri’s keyboard shortcuts. I much prefer it to VS Code on a traditional desktop in almost every respect.

That said, I never tried copilot.

That’s like asking why anyone would use IDE autoformatting, linting, or build tools rather than constantly swapping to a terminal to run their command line versions. As in, why use tool integration in an integrated development environment? Because that’s the entire point. Classic IDE refactoring and code generation tools are limited to explicitly programmed operations, but a well-integrated LLM can do much more and smarter manipulations without you having to context switch and explain the context of what you want done.

For Windsurf at least, it makes it easier to control context. I can simply drag and drop a file from the IDE into the chat.

I can also click on a file referenced by the AI and have it open immediately in the IDE so that I can inspect it.

Finally, it is a pain to write long, multi-line prompts in a CLI where you can't easily click around to edit different parts.

The primary weakness I've found in IDE based UI is that it struggles to get through the corporate security in order to run commands.

> what's the point

Tab completion.

A smart model can dramatically cut the time it takes to write complex firewall YAML, relying both on the existing file and on the ugly draft I put out (e.g. comma-delimited details of the rules I need). That makes it 5 minutes of lead time and 20 presses of Tab, instead of writing a shell/Python script full of edge cases or copying existing rules as a template and laboriously editing them. A smart model knows what the specific firewall needs.

But I'm not a developer, so I use both - haiku via github for tab completion and CC for cli.

For me, I need to compare the generated code before committing. I also need to read the generated markdown plans for review before committing to execution. The VS Code CC extension also generates clickable links directly to a file if the query has something to do with it.

All of them are valid use cases of the VS Code CC extension for me.


Maybe it's just Microsoft moving to more model-agnostic tech within Copilot. I recently started using Microsoft 365 Copilot because corporate added Cowork, which runs on Opus 4.7 and was better than the alternative we have available. Unlike the "real" Claude Code or Cowork, this only has access to files in a specific OneDrive folder in your personal SharePoint container, so it's much more compliant with things like NIS2.

Technically we're using Copilot and we're paying for it through Microsoft licenses, but it's using Opus 4.7. Even before this, most of our custom agents within M365 Copilot ran on one of the GPT models.

Or maybe you're right and they want their developers to use the copilot models.

Microsoft have historically tended to dogfood their own products.

Obviously you want to be aware of what else is on the market, and use the right tool for the job -- but equally if you have a directly competing product, you'd prefer your org's telemetry and suggestions are directed towards improving your own software rather than your competitors'.

This was always a little weird to me because Microsoft internally is actively hostile to cross-org collaboration. If you worked in most of Azure, you basically had zero lines of communication with someone from the Windows team, and vice versa. Triply so for stuff like Kusto or Teams, which you'd be dogfooding daily. I guess if there's a horrible stop-the-world bug it'd get surfaced through telemetry, but normal user feedback is not a thing.

Compared to working at other big techs, where I was able to direct msg the engineers on the team for internal protobuf or datalake services in addition to user groups that were generally responsive it was just strange. Also Microsoft doesn't have a monorepo so you can't just commit patches to their service because you don't have access to their repos which I pretty regularly do elsewhere.

Copilot was great when folks were still semi-attempting to write their own code and needed autocomplete.

There's a large (and growing!) contingent of people who don't write code these days. (Many don't even use the keyboard.)

Wonder if Amazon will do the same with CC and Kiro now that we internally have access to both.

I think Kiro might have some “first mover” advantage internally, but CC feels better to use.

I never understand why Amazon even bothers to build their own coding agent.

GitHub Copilot is in a somewhat similar place as Microsoft's toy but still different -- it was more or less the first coding agent/assistant, and GitHub/VSCode/Microsoft has enough user base and impact to influence individual users and enterprises' choices.

As for Amazon's coding agent, I just never see anyone outside Amazon even mention Kiro or Amazon Q. Maybe a little when Kiro was offering tons of free credits, but I don't think it's even remotely relevant these days. I don't see news about companies adopting Kiro.

To me, it's just a matter of time before they are sunset, like Chime or a bunch of AWS products.

In fairness, Chime had tons of internal use and I quite liked it.

For Kiro, I agree with you, it seems like wasted effort and Anthropic / OpenAI are miles ahead in their tooling.

Is there any proprietary Amazon end-dev/ops facing service that's worth using? I've never had a good experience with any I've tried - CodeBuild, Cloud9, Q, SageMaker, WorkMail, WorkDocs, Chime, OpsWorks,...

I love AWS at the infrastructure level, but their PaaS tends to be meh, and their end-user directed stuff is usually atrocious.

When I was at AWS I had exactly one customer who used Chime and they loved it.

They were a manufacturing org and only managers had licenses to MS Office and users in Active Directory. Everybody else was registered on a separate OpenLDAP directory to avoid paying MS licenses.

Chime was cheaper per user than onboarding everybody into AD and paying Teams, and they could tack Chime usage into their AWS bill.
