Even without hacks, Copilot is still a cheap way to use Claude models:

- $10/month

- Copilot CLI for a Claude Code-style CLI experience, VS Code for a GUI

- 300 requests (prompts) on Sonnet 4.5, or 100 on Opus 4.6 (billed at a 3x rate)

- One prompt only ever consumes one request, regardless of tokens used

- Agents auto plan tasks and create PRs

- "New Agent" in VS Code runs agent locally

- "New Cloud Agent" runs agent in the cloud (https://github.com/copilot/agents)

- Additional requests cost $0.04 each (rough per-prompt math sketched below)
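For a rough sense of what that works out to per Opus prompt, here's a back-of-envelope sketch. It only uses the figures from the list above (the $10/month price, 300 included requests, the 3x Opus multiplier, the $0.04 overage); treat them as assumptions, not official numbers.

```python
# Back-of-envelope plan economics, using only the figures listed above.
MONTHLY_PRICE = 10.00        # $/month
INCLUDED_REQUESTS = 300      # premium requests included per month
OPUS_MULTIPLIER = 3          # assumed Opus rate multiplier
OVERAGE_PRICE = 0.04         # $ per additional premium request

opus_prompts_included = INCLUDED_REQUESTS // OPUS_MULTIPLIER    # 100 prompts
cost_per_opus_prompt = MONTHLY_PRICE / opus_prompts_included    # $0.10
cost_per_extra_opus_prompt = OVERAGE_PRICE * OPUS_MULTIPLIER    # ≈ $0.12

print(opus_prompts_included, cost_per_opus_prompt, cost_per_extra_opus_prompt)
```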

+1. I see all these posts about tokens, and I'm like "who's paying by the token?"

> +1. I see all these posts about tokens, and I'm like "who's paying by the token?"

When you use the API

Yes. That is the question.

Most LLM usage?

There are some exceptions, e.g. Claude Max

Yes, and VS Code as mentioned above. That's kind of the joke.

I've had a single prompt to Opus consume as many as 13 premium messages. The Copilot harness is gimped so that they can abstract tokens away behind messages. Every person I know who started with Copilot and then tried CC was amazed at the power difference. Like stepping out of a golf cart and into <your favorite fast car>.

It hasn't done that to me. It's worked according to their docs:

> Copilot Chat uses one premium request per user prompt, multiplied by the model's rate.

> Each prompt to Copilot CLI uses one premium request with the default model. For other models, this is multiplied by the model's rate.

> Copilot coding agent uses one premium request per session, multiplied by the model's rate. A session begins when you ask Copilot to create a pull request or make one or more changes to an existing pull request.

https://docs.github.com/en/copilot/concepts/billing/copilot-...
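In other words, the quoted rules boil down to one premium request per unit of work (a prompt for Chat/CLI, a session for the coding agent), scaled by the model's rate. A minimal sketch; the multiplier values below are illustrative, not official:

```python
# Minimal sketch of the quoted billing rule: units * model rate multiplier.
def premium_requests(units: int, model_multiplier: float) -> float:
    """units = user prompts for Chat/CLI, or sessions for the coding agent."""
    return units * model_multiplier

print(premium_requests(10, 1.0))  # 10 CLI prompts on the default model -> 10.0
print(premium_requests(10, 3.0))  # the same prompts on a 3x-rated model -> 30.0
```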

Sorry, I should have specified this was with the GHC CLI. I suppose that might not behave the same as the GUI extension. But it definitely happened on Thursday: one prompt, I ctrl-c'd out, and it said 13 premium messages used. It was reading a couple of large files, and Opus doesn't seem to let the harness restrict it to reading just a couple hundred lines at a time... it reads entire files.

And now I see your comment mentions that explicitly. The output was quite unambiguous. :shrug:

It seems like the cheapest way to access Claude Sonnet 4.5, but the model as served is clearly throttled compared to Claude Sonnet 4.5 on claude.ai.

That being said, I don't know why anyone would want to pay for LLM access anywhere else.

ChatGPT and claude.ai (free) and GitHub Copilot Pro ($100/yr) seem to be the best combination to me at the moment.

So 100 Opus requests a month? That's not a lot.

The cat's out of the bag now, and it seems they'll probably patch it, but:

Use other flows under standard billing to do iterative planning, spec building, and resource loading for a substantive change set, e.g. something 5k+ LOC across 10+ files.

Then throw that spec document as your single prompt to Copilot's per-request-billed agent. Include in the prompt a caveat along the lines of: "We are being billed per user request. Try to go as far as possible given the prompt. If you encounter difficult, underspecified decision points, implement multiple options where feasible and indicate in the completion document where selections must be made by the user. Implement the specified test structures, and run them against your implementation until fully passing."

Most of my major chunks of code are written this way, and I never manage to use up the 100 available prompts.

This is basically my workflow. Claude Code for short edits/repairs, VS Code for long generations from spec. Subagents can work for literally days, generating tens of thousands of lines of code from one prompt that costs 12 cents. There's even a summary of tokens used per session in Copilot CLI, telling me I've used hundreds of millions of tokens. You can calculate the eventual API value of that.
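To put a rough number on that "eventual API value", here's a sketch. The $3/$15 per million token prices are assumed Sonnet-class list prices, and the token counts are made up; plug in your own session summary.

```python
# Rough pay-as-you-go value of one heavy Copilot session.
# Assumed prices (illustrative): $3/M input tokens, $15/M output tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def api_value(input_tokens: int, output_tokens: int) -> float:
    """What the same token volume would cost on the metered API."""
    return (input_tokens / 1e6) * INPUT_PRICE_PER_MTOK \
         + (output_tokens / 1e6) * OUTPUT_PRICE_PER_MTOK

# e.g. 300M input tokens and 5M output tokens over one long session
print(f"${api_value(300_000_000, 5_000_000):,.2f}")  # $975.00
```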

Just about the absolute best deal in the AI market.

For a flat $10, with requests of up to 128k tokens each, they're losing money. 100 requests × 100k tokens is 10M tokens. At current API pricing that's $50 of input tokens alone, not even accounting for output!

And a request can consume more than 128k tokens.

A cloud agent works iteratively on your requests, making multiple commits.

I put large features into my requests and the agent has no problem making hundreds of changes.

You didn't account for cached input tokens: some percentage of input tokens will come from follow-on prompts, which are billed at the cheaper cached-token rate.
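A quick sketch of how caching shifts the parent's $50 estimate. It assumes the same $5 per million input tokens implied by "$50 for 10M tokens" and cached reads at 10% of that rate; the real discount and hit rate depend on the provider and the workload.

```python
# Cache-adjusted input cost. Both prices below are illustrative.
INPUT_PRICE_PER_MTOK = 5.00          # fresh input tokens
CACHED_READ_PRICE_PER_MTOK = 0.50    # assumed 10% of the fresh rate

def input_cost(total_tokens: int, cache_hit_rate: float) -> float:
    cached = total_tokens * cache_hit_rate
    fresh = total_tokens - cached
    return (fresh / 1e6) * INPUT_PRICE_PER_MTOK \
         + (cached / 1e6) * CACHED_READ_PRICE_PER_MTOK

print(input_cost(10_000_000, 0.0))  # 50.0 -> no caching, the $50 figure
print(input_cost(10_000_000, 0.8))  # 14.0 -> 80% of input served from cache
```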

I mean, aren't they losing money on everything, even the API? This isn't going to end well given how expensive it all really is.

It might be a gym-type situation, where the average across all users ends up being profitable. Of course, it could also be a bait-and-switch to get people committed to their platform.

Having worked for some time in huge businesses, I can assure you that there are many corporate Copilot subscribers who never use it; that's where they earn money.

In the past we had to buy an expensive license for some niche software, used by a small team, for a VP "in case he wanted to look".

It's worse in many government agencies: whenever they buy software, if it's relatively cheap, everyone gets it.