Why not just select Gemini 2.5 Pro in Copilot with Edit mode? Virtually unlimited use without extra fees.
Copilot used to be useless, but it has become quite excellent over the last few months since Edit mode was added.
Copilot (and others) try to be too smart and do context reduction (to save their own wallets). I want the ENTIRETY of the files I attach in the context, not a RAG-ed version of them.
This problem is real.
Claude Projects, ChatGPT Projects, Sourcegraph Cody's context building, MCP file systems: all of these are black boxes of what I can only describe as lossy compression of context.
Each is incentivized to deliver ~"pretty good" results at the highest token compression possible.
The best way around this I've found is to just own the web clients by including structured, concatenated files directly in chat contexts.
Self-plug but super relevant: I built FileKitty specifically to aid this; it made the HN front page and I've continued to improve it:
https://news.ycombinator.com/item?id=40226976
If you can quickly prepare your file context yourself with any workflow, and pair it with appropriate additional context such as run output, a problem description, etc., you can get excellent results, and you can pound away at an OpenAI or Anthropic subscription while refining the prompt or updating the file context.
I find myself spending more time assembling complex prompts for big, difficult problems that would not make sense to solve in the IDE.
> The best way around this I've found is to just own the web clients by including structured, concatenated files directly in chat contexts.
Same. I used to run a bash script that concatenates the files I'm interested in and annotates each with its path/name in a comment at the top. I haven't needed that recently, as I think the number of attachments for Claude has increased (or I haven't needed as many small, disparate files at once).
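For reference, a minimal sketch of that kind of concatenation script (the function name and header format here are my own invention, not the original script):

```shell
# concat_files: print each file preceded by a comment header giving its path,
# so the model can tell the files apart when everything is pasted as one blob.
concat_files() {
  for f in "$@"; do
    printf '# ---- %s ----\n' "$f"
    cat "$f"
    printf '\n'
  done
}

# Usage: concat_files src/*.py > context.txt, then paste context.txt into the chat.
```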
filekitty is pretty cool!
Thank you! I was glad to read your comments here and see your project.
I have encountered this issue of reincorporating LLM code recommendations back into a project, so I'm interested in exploring your take.
I told a colleague that I thought excellent use of copy/paste and Markdown were some of the chief skills of working with gen AI for code right now.
This and context management are as important as prompting.
It makes the details of the UI choices for copying web chat conversations, or segments of them, strangely important.
I believe this is the root of the problem for all agentic coding solutions. They are gimping the context through fancy function calling and tool use to reduce what gets sent through the API. The problem with this is that you can never know what context is actually needed for the problem to be solved in the best way. The funny thing is, this type of behavior actually leads many people to believe these models are LESS capable than they actually are, because people don't realize how restricted these models are behind the scenes by the developers. The good news is, we are entering the era of large context windows, and we will all see a huge performance increase in coding as a result of these advancements.
OpenAI shared a chart about the performance drop with large contexts (e.g., 500k tokens). So you still want to limit the context, not only for cost but for performance as well. You also probably want to limit context to speed up inference and get responses faster.
I agree, though, that a lot of those agents are black boxes, and it's hard even to learn how best to combine .rules, llms.txt, PRDs, MCP, web search, function calls, and memory. Most IDEs don't provide output where you can inspect the final prompts to see how those are executed; maybe you have to use something like mitmproxy to inspect requests, but a tool for this would be useful for learning best practices.
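If it helps, a rough sketch of that mitmproxy approach (this assumes mitmproxy is installed and that the agent honors the standard proxy environment variables, which not all do; `api.openai.com` is just an example host):

```shell
# Hypothetical setup for inspecting an agent's API traffic with mitmproxy.

# In one terminal, capture flows to the model API host
# (commented out here; run it yourself):
#   mitmdump --listen-port 8080 "~d api.openai.com"

# Then launch the editor/agent with its traffic routed through the proxy,
# trusting mitmproxy's generated CA certificate. Some runtimes need a
# different variable (e.g., Node uses NODE_EXTRA_CA_CERTS).
export HTTPS_PROXY=http://127.0.0.1:8080
export SSL_CERT_FILE="$HOME/.mitmproxy/mitmproxy-ca-cert.pem"
```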
I will be trying Roo Code and Cline more, since they're open source and you can at least see the system prompts.
This stuff is so easy to do with Cursor. Just pass in the approximate surface area of the context and it doesn't RAG anything if your context isn't too large.
I haven't tried recently, but does it tell you whether it RAG'ed or not, i.e., can I peek at the context it sent to the model?
exactly. I understand the reason behind this but it's too magical for me. I just want dumb tooling between me and my LLM.
Regarding context reduction, this got me wondering. If I use my own API key, there is no way for the IDE or Copilot provider to benefit other than the monthly sub. But if I am using their provided model with tokens from the monthly subscription, they are incentivized to charge me based on the tokens I submit to them, then optimize that and pass on a smaller request to the LLM to get more margin. Is that what you are referring to?
Yup, but there was also good reason to do this: models work better with smaller contexts. Which is why I rely on Gemini for this lazy/inefficient workflow of mine.
FWIW, Edit mode gives the impression of doing this, vs. originally only passing the context visible from the open window.
You can choose files to include, and they don't appear to be truncated in any way. To be fair, I haven't checked the network traffic, but it appears to operate in this fashion in day-to-day use.
I’d be curious to hear what actually goes through the network request.
I will try again, but last time I tried adding a folder in Edit mode and asking it to list the files it sees, it didn't list them all.
I like to use "Open Editors". That way, only the code I'm currently working on is added to the context, which seems a more natural way to work.
Thanks, most people don't understand this fine difference. Copilot does RAG (as do all other subscription-based agents like Cursor) to save $$$, and results with RAG are significantly worse than having the complete context window for complex tasks. That's also why ChatGPT and Claude basically lie to users when they market their file upload features without telling the whole story.
Cline doesn't do this; that's what makes it suitable for working with Gemini and its large context.
Is that why it’s so bad? I’ve been blown away by how bad it is. Never had a single successful edit.
The code completion is chef's kiss, though.
Probably, but also most models start to lose it after a certain context size (usually 10-20k tokens), which is why I use Gemini (via AI Studio) for my workflow.