Interesting, I’ve never needed 1M, or even 250k+ context. I’m usually under 100k per request.
About 80% of my code is AI-generated, with a controlled workflow using dev-chat.md and spec.md. I use Flash for code maps and auto-context, and GPT-5.4 or Opus for coding, all via API with a custom tool.
Gemini Pro and Flash have had 1M context for a long time, but even though I use Flash 3 a lot, and it’s awesome, I’ve never needed more than 200k.
For production coding, I use:
- A code map strategy on a big repo. For each file, I store a summary, when_to_use, public_types, and public_functions, and keep that entry until the file changes. With a concurrency of 32, I can usually code-map a huge repo in minutes. (Typically Flash: cheap, fast, and with very good results.)
- Then, auto-context based on code lensing. Auto-context takes some globs that narrow what the AI can see, and then uses the code map entries for the files inside those globs to ask the AI which files to put in context. (Typically Flash: cheap, relatively fast, and very good.)
- Then, a bigger model, GPT-5.4 or Opus 4.6, does the work. At this point, context is typically between 30k and 80k tokens max.
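A minimal sketch of the first step, assuming the per-file summary comes from a model call that is stubbed out here as a hypothetical `summarize` closure. It shows the four fields named above and a fixed pool of worker threads (32 in the setup described) pulling file indices from a shared counter, so at most that many calls are in flight at once. `CodeMapEntry` and `build_code_map` are illustrative names, not the tool's actual API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// One code-map entry per file, mirroring the four fields described above.
#[derive(Debug, Clone, PartialEq)]
pub struct CodeMapEntry {
    pub path: String,
    pub summary: String,
    pub when_to_use: String,
    pub public_types: Vec<String>,
    pub public_functions: Vec<String>,
}

/// Runs `summarize` over every path with at most `concurrency` workers in flight.
/// `summarize` stands in for the (hypothetical) cheap-model call that builds an entry.
pub fn build_code_map<F>(paths: &[String], concurrency: usize, summarize: F) -> Vec<CodeMapEntry>
where
    F: Fn(&str) -> CodeMapEntry + Sync,
{
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<Option<CodeMapEntry>>> = Mutex::new(vec![None; paths.len()]);
    let workers = concurrency.clamp(1, paths.len().max(1));

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Claim the next unprocessed file index; stop when none are left.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= paths.len() {
                    break;
                }
                let entry = summarize(&paths[i]);
                results.lock().unwrap()[i] = Some(entry);
            });
        }
    });

    // Every index was claimed exactly once, so every slot is filled, in input order.
    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|e| e.expect("every index is processed once"))
        .collect()
}
```

In a real tool, `summarize` would wrap an HTTP client for the model API; scoped std threads are used here only to keep the sketch dependency-free.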
What I’ve found is that this process is surprisingly effective at getting a high-quality response in one shot. It keeps everything focused on what’s needed for the job.
Higher precision on the input typically leads to higher precision on the output. That’s still true with AI.
For context, 75% of my code is Rust, and the other 25% is TS/CSS for web UI.
Anyway, it’s always interesting to learn about different approaches. I’d love to understand the use case where 1M context is really useful.
Yeah, this is the simpler and still effective strategy. A lot of people are building sophisticated AST-based RAG systems, but you really just need to ask Claude to build a semantic index for each large-ish piece of code and reuse it when gathering context.
You have to make sure the semantic summary takes significantly fewer tokens than just reading the code, or it's just a waste of tokens and time.
Then have a skill that uses git version logs to refresh the summary cache lazily, only when needed.
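A rough sketch of that idea, under stated assumptions: token counts are estimated with the common ~4-characters-per-token rule of thumb rather than a real tokenizer, and the `revision` string is whatever the caller gets from git (for example, the last commit touching the file via `git log -1 --format=%H -- <path>`). `SummaryCache`, `get_or_build`, and the `max_ratio` threshold are hypothetical.

```rust
use std::collections::HashMap;

/// Rough token estimate (~4 chars per token); a heuristic, not a tokenizer.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// A cached summary tagged with the revision it was built from.
struct CachedSummary {
    revision: String,
    summary: String,
}

/// Hypothetical lazy cache: a summary is rebuilt only when the file's revision changes.
pub struct SummaryCache {
    entries: HashMap<String, CachedSummary>,
    /// Summaries larger than this fraction of the source are not worth sending.
    pub max_ratio: f64,
}

impl SummaryCache {
    pub fn new(max_ratio: f64) -> Self {
        Self { entries: HashMap::new(), max_ratio }
    }

    /// Returns the summary if it is worth using, or None when reading the source is cheaper.
    pub fn get_or_build<F>(&mut self, path: &str, revision: &str, source: &str, summarize: F) -> Option<String>
    where
        F: FnOnce(&str) -> String,
    {
        // Only call the (expensive) summarizer when the cached revision is missing or stale.
        let stale = self.entries.get(path).map_or(true, |c| c.revision != revision);
        if stale {
            let summary = summarize(source);
            self.entries.insert(
                path.to_string(),
                CachedSummary { revision: revision.to_string(), summary },
            );
        }
        let cached = &self.entries[path];
        let ratio = estimate_tokens(&cached.summary) as f64 / estimate_tokens(source).max(1) as f64;
        (ratio <= self.max_ratio).then(|| cached.summary.clone())
    }
}
```

The ratio check encodes the "significantly fewer tokens" rule: if a summary is not much smaller than the code, the caller should just read the file.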
It seems like a very good use of LLMs. You should write a blog post detailing your process, with examples, for people who are not as deep into AI tools. I only use the web UI. A lot of what you are saying is beyond me, but it does sound like a clever strategy.
Yeah, we all converge on the same workflow. In the AI coding agent I'm working on now, I've added an "index" tool that uses tree-sitter to compress a code file and show the AI its skeleton.
Here's the implementation for anyone interested: https://github.com/tontinton/maki/blob/main/maki-code-index%...
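For readers who want the flavor without pulling in tree-sitter, here is a deliberately naive, line-based stand-in (not tree-sitter, and not the linked implementation): it keeps every line of a Rust file except function bodies, which it collapses to `{ ... }`. It assumes the opening brace sits on the signature line and ignores braces inside strings or comments, both of which a real parser handles correctly. `skeleton` and `is_fn_signature` are hypothetical names.

```rust
/// True if the line starts a function signature, after skipping common qualifiers.
fn is_fn_signature(line: &str) -> bool {
    const QUALIFIERS: [&str; 6] = ["pub", "pub(crate)", "pub(super)", "async", "const", "unsafe"];
    for tok in line.split_whitespace() {
        if !QUALIFIERS.contains(&tok) {
            return tok == "fn";
        }
    }
    false
}

/// Returns the source with every function body replaced by `{ ... }`.
pub fn skeleton(source: &str) -> String {
    let mut out = Vec::new();
    let mut depth: i32 = 0; // > 0 while inside an elided function body
    for line in source.lines() {
        let opens = line.matches('{').count() as i32;
        let closes = line.matches('}').count() as i32;
        if depth > 0 {
            // Skip body lines until the braces balance out again.
            depth += opens - closes;
            continue;
        }
        if is_fn_signature(line) {
            if let Some(pos) = line.find('{') {
                out.push(format!("{} {{ ... }}", line[..pos].trim_end()));
                depth = opens - closes;
                continue;
            }
        }
        // Structs, impls, use lines, trait method declarations, etc. are kept as-is.
        out.push(line.to_string());
    }
    out.join("\n")
}
```

The skeleton keeps types and signatures, which is what the model needs to decide relevance, while dropping the bodies that dominate token count.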
Oh, that's great.
I've always wanted to explore how to fit tree-sitter into this workflow. It's great to know that this works well too.
Thanks for sharing the code.
(Here is the AIPack runtime I built, MIT-licensed: https://github.com/aipack-ai/aipack, and here is the code for pro@coder: https://github.com/aipack-ai/packs-pro/tree/main/pro/coder. AIPack is written in Rust, and AI Packs are written in Markdown/Lua.)
Very good point. I had two options:
1) Deterministic
2) Agentic

So, this is why I started with #2. And then, the results in real coding scenarios have been astonishing.
Way above what I expected.
The way those indexes get combined with the user prompt gets the right files 95% of the time, and with surprisingly high quality.
So, I might add deterministic aspects to it, but since I think I will need the agentic step anyway, I have deprioritized it.
Over roughly the last 6 months, I built myself an AST-based solution for that. I always wondered whether grep and agent-based discovery would be the end of it, and thought a more deterministic approach just had to be better.
In the end, it's hard to measure, but personally I feel that my agent rarely misses any context for a given task, so I'm pretty happy with it.
I used a different approach than tree-sitter because I thought I had found a nice way around writing language-specific code: I use VSCode as a language backend and wrote some logic to rebuild the AST from VSCode's symbol data and other APIs.
That lets me just install the right language extension to enable support for that language. The extension has to provide symbol information, which most do through LSP.
In the end, however, it was way more effort than just using tree-sitter, and I'm thinking of slowly migrating to that approach sooner or later.
Anyway, I created an extension that spins up an MCP server and provides several tools that basically replace the vanilla discovery tools in my workflow.
The approach is similar to yours. I have an overview tool that runs different centrality ranking metrics over the whole codebase to find the most important symbols and presents them to the LLM as an architectural overview.
Then I have a "get-symbol-context" tool which allows the AI to get all the information that the AST holds about a single symbol, including a parameter to include source code which completely replaces grepping and file reading for me.
The tool also lists which other symbols call the one in question and which symbols it calls.
But yeah, sorry for this already being quite a long comment. If you want to give it a try, I published it on the VSCode Marketplace a couple of days ago. It's basically free right now, although I admit I'd still like to earn a little money with it at some point.
Right now, the usage limit is 2,000 tool calls per day, which should be enough for anybody.
Would love to hear what you think :)
<https://marketplace.visualstudio.com/items?itemName=LuGoSoft...>
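A toy sketch of the overview and get-symbol-context tools described in the comment above, assuming the symbol call graph has already been extracted (from LSP data, tree-sitter, or anything else). It uses in-degree (how many distinct symbols call a given symbol) as one simple centrality metric; the extension may use different or additional metrics. `CallGraph`, `top_symbols`, and `symbol_context` are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A toy call graph: caller -> set of callees.
#[derive(Default)]
pub struct CallGraph {
    calls: BTreeMap<String, BTreeSet<String>>,
}

/// What a "get-symbol-context"-style lookup might return (fields are hypothetical).
#[derive(Debug, PartialEq)]
pub struct SymbolContext {
    pub symbol: String,
    pub callers: Vec<String>,
    pub callees: Vec<String>,
}

impl CallGraph {
    pub fn add_call(&mut self, caller: &str, callee: &str) {
        self.calls.entry(caller.to_string()).or_default().insert(callee.to_string());
        // Make sure leaf symbols also exist as nodes.
        self.calls.entry(callee.to_string()).or_default();
    }

    /// In-degree centrality: symbols called from the most distinct places rank first.
    /// Ties are broken alphabetically so the output is deterministic.
    pub fn top_symbols(&self, n: usize) -> Vec<(String, usize)> {
        let mut in_degree: BTreeMap<&str, usize> =
            self.calls.keys().map(|k| (k.as_str(), 0)).collect();
        for callees in self.calls.values() {
            for callee in callees {
                *in_degree.get_mut(callee.as_str()).unwrap() += 1;
            }
        }
        let mut ranked: Vec<(String, usize)> =
            in_degree.into_iter().map(|(k, v)| (k.to_string(), v)).collect();
        ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
        ranked.truncate(n);
        ranked
    }

    /// Callers and callees of one symbol, or None if the symbol is unknown.
    pub fn symbol_context(&self, symbol: &str) -> Option<SymbolContext> {
        let callees = self.calls.get(symbol)?.iter().cloned().collect();
        let callers = self
            .calls
            .iter()
            .filter(|(_, cs)| cs.contains(symbol))
            .map(|(c, _)| c.clone())
            .collect();
        Some(SymbolContext { symbol: symbol.to_string(), callers, callees })
    }
}
```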
Well, out of all the workflows I have seen, this one is rather nice. I might give it a try.
I imagine that if the code map were committed and kept up to date by CI, it would work for others as well.
However, I'm a little confused about the auto-context/glob-narrowing part. Do you, the developer, provide the globs? Or do you feed the full code map plus your prompt to Flash so it returns the globs based on your prompt?
Also, in general, is your map of a file noticeably smaller than the file itself, even for very small files?
- The ..-code-map.json files are per "developer folder," which would create too many conflicts if they were kept in Git.
- I have two main glob settings, each a list of globs: knowledge_globs and context_globs. Knowledge globs can be absolute and should point to relatively static content. context_globs have to be relative to the workspace, since they cover the working files.
- As a dev, you provide them in the top YAML section of the coder-prompt.md.
- The auto-context sub-agent calls the code-map sub-agent. Sub-agents can add to or narrow the given globs, and that is the goal of the auto-context agent.
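As a sketch of the narrowing half of that step: whatever files the model proposes are kept only if they match one of the developer's context_globs. The tiny matcher below supports only `*` within a path segment and `**` across segments; it is a stand-in for a full glob library, and `glob_match` and `narrow` are hypothetical names.

```rust
/// Minimal glob matcher: `**` matches zero or more path segments,
/// `*` matches any (possibly empty) run of characters within one segment.
pub fn glob_match(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.split('/').collect();
    let segs: Vec<&str> = path.split('/').collect();
    match_segments(&pat, &segs)
}

fn match_segments(pat: &[&str], segs: &[&str]) -> bool {
    match pat.split_first() {
        None => segs.is_empty(),
        // `**` may swallow any number of segments, including none.
        Some((&"**", rest)) => (0..=segs.len()).any(|i| match_segments(rest, &segs[i..])),
        Some((p, rest)) => match segs.split_first() {
            Some((s, seg_rest)) => {
                match_star(p.as_bytes(), s.as_bytes()) && match_segments(rest, seg_rest)
            }
            None => false,
        },
    }
}

/// Matches one segment, where `*` stands for any run of bytes.
fn match_star(p: &[u8], s: &[u8]) -> bool {
    match p.split_first() {
        None => s.is_empty(),
        Some((&b'*', rest)) => (0..=s.len()).any(|i| match_star(rest, &s[i..])),
        Some((c, rest)) => s.first() == Some(c) && match_star(rest, &s[1..]),
    }
}

/// Keeps only the model-suggested files that fall inside the developer's context globs.
pub fn narrow<'a>(suggested: &[&'a str], context_globs: &[&str]) -> Vec<&'a str> {
    suggested
        .iter()
        .copied()
        .filter(|f| context_globs.iter().any(|g| glob_match(g, f)))
        .collect()
}
```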
It looks complicated, but it actually works like a charm.
Hopefully, I answered some of your questions.
I need to make a video about it.
But regardless, I really think it's not about the tools, it's about the techniques. This is where the true value is.
point taken.
This is really interesting; I've done very high-level code maps, but the entire project seems wild. It works?
So, a small model figures out which files to use based on the code map, then enriches them with snippets, so the big model ideally gets preloaded with relevant context/snippets up front?
Where does the code map live? Is it one big file?
So, I have a pro@coder/.cache/code-map/context-code-map.json.
I also have a `.tmpl-code-map.jsonl` in the same folder so all of my tasks can add to it, and then it gets merged into context-code-map.json.
I keep mtime, but I also compute a blake3 hash, so if mtime does not match, but it is just a "git restore," I do not redo the code map for that file. So it is very incremental.
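The mtime-then-hash check can be sketched as follows. The file is only read and hashed when its mtime differs from the cached one; if the content hash still matches (as after a `git restore`), only the stored mtime is refreshed. The setup described uses blake3; std's `DefaultHasher` stands in here purely to keep the example dependency-free (it is neither cryptographic nor guaranteed stable across Rust releases, so it should not be persisted in a real cache). `FileStamp`, `Action`, and `decide` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// What the cache remembers per file; `hash` stands in for a blake3 digest.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FileStamp {
    pub mtime: u64,
    pub hash: u64,
}

#[derive(Debug, PartialEq)]
pub enum Action {
    /// mtime unchanged: trust the cached entry without reading the file.
    Skip,
    /// mtime changed but content is identical: just store the new stamp.
    TouchOnly(FileStamp),
    /// Content changed (or no entry yet): regenerate the code-map entry.
    Regenerate(FileStamp),
}

pub fn content_hash(content: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    content.hash(&mut h);
    h.finish()
}

/// `read_content` is only called when the mtime check fails.
pub fn decide<F>(cached: Option<FileStamp>, mtime: u64, read_content: F) -> Action
where
    F: FnOnce() -> Vec<u8>,
{
    if let Some(c) = cached {
        if c.mtime == mtime {
            return Action::Skip;
        }
    }
    let stamp = FileStamp { mtime, hash: content_hash(&read_content()) };
    match cached {
        Some(c) if c.hash == stamp.hash => Action::TouchOnly(stamp),
        _ => Action::Regenerate(stamp),
    }
}
```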
Then the trick is, when sending the code map to the AI, I serialize it in a nice, simple markdown format.
- path/to/file.rs
  - summary: ...
  - when to use: ...
  - public types: .., .., ..
  - public functions: .., .., ..
- ...
So the AI does not have to interpret JSON, just clean, structured markdown.
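A minimal sketch of that serialization, assuming a struct with the four per-file fields described earlier (the struct and function names are illustrative):

```rust
use std::fmt::Write;

/// One code-map entry; field names follow the comment, the struct itself is hypothetical.
pub struct CodeMapEntry {
    pub path: String,
    pub summary: String,
    pub when_to_use: String,
    pub public_types: Vec<String>,
    pub public_functions: Vec<String>,
}

/// Renders entries as the nested markdown list shown above.
pub fn to_markdown(entries: &[CodeMapEntry]) -> String {
    let mut out = String::new();
    for e in entries {
        // Writing into a String cannot fail, so the fmt::Result is safely ignored.
        let _ = writeln!(out, "- {}", e.path);
        let _ = writeln!(out, "  - summary: {}", e.summary);
        let _ = writeln!(out, "  - when to use: {}", e.when_to_use);
        let _ = writeln!(out, "  - public types: {}", e.public_types.join(", "));
        let _ = writeln!(out, "  - public functions: {}", e.public_functions.join(", "));
    }
    out
}
```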
Funny, I worked on this addition to my tool for a week, planning everything, but even today, I am surprised by how well it works.
I have zero sed/grep in my workflow. Just this.
My prompt is pro@coder/coder-prompt.md: the first part is YAML for the globs, and the second part is my prompt.
There is a TUI, but all input and output are files, and the TUI is just there to run it and see the status.
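To make the file-based input concrete, here is a sketch of splitting such a prompt file. It assumes the YAML part is delimited by front-matter-style `---` lines, which may not match the real coder-prompt.md layout, and it reads glob lists with a naive line scan rather than a YAML parser. `split_prompt`, `list_under`, and the example paths in the test are hypothetical.

```rust
/// Splits a prompt file into (yaml_header, prompt_body), assuming `---` delimiter lines.
pub fn split_prompt(text: &str) -> Option<(&str, &str)> {
    let rest = text.strip_prefix("---\n")?;
    let end = rest.find("\n---\n")?;
    Some((&rest[..end], &rest[end + "\n---\n".len()..]))
}

/// Collects `- item` lines directly under `key:` (a naive reader, not a YAML parser).
pub fn list_under<'a>(yaml: &'a str, key: &str) -> Vec<&'a str> {
    let header = format!("{key}:");
    let mut in_list = false;
    let mut items = Vec::new();
    for line in yaml.lines() {
        let t = line.trim();
        if t == header {
            in_list = true;
            continue;
        }
        if in_list {
            match t.strip_prefix("- ") {
                // Strip optional surrounding quotes from the glob.
                Some(item) => items.push(item.trim_matches('"')),
                // Any non-item line ends the list.
                None => in_list = false,
            }
        }
    }
    items
}
```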
Whenever I see a post like this, I think, well yeah, but it's too sophisticated to be practical.
Fair point, but because I spent a year building and refining my custom tool, this is now the reality for all of my AI requests.
I prompt, press run, and then I get this flow:
- dev setup (dev-chat or plan)
- code-map (0s when incremental, ~2m for the initial run)
- auto-context (~20s to 40s)
- final AI query (~30s to 2m)
For example, just now, in my Rust code (about 60k LOC), I wanted to change the data model and brainstorm with the AI to find the right design, and here is the auto-context it gave me:
- Reducing 381 context files (1.62 MB)
- Now 5 context files (27.90 KB)
- Reducing 11 knowledge files (30.16 KB)
- Now 3 knowledge files (5.62 KB)
The knowledge files are my "rust10x" best practices, and the context files are the source files.
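Put as ratios (taking 1 MB as 1,000 KB; with 1,024 KB the first figure still rounds to about 1.7%), the auto-context step kept roughly:

$$
\frac{27.90\ \text{KB}}{1620\ \text{KB}} \approx 1.7\%,\qquad
\frac{5.62\ \text{KB}}{30.16\ \text{KB}} \approx 18.6\%,\qquad
\frac{5}{381} \approx 1.3\%,\qquad
\frac{3}{11} \approx 27\%
$$

That is, under 2% of the source bytes and under a fifth of the knowledge bytes reached the final model.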
It's not sophisticated at all; he just uses one model to write some documentation before asking another model to do the work using that documentation.
I think you've hit on the more important point here: you should keep the work within a sufficiently focused area to have better success, rather than necessarily needing more context.
Your code map compresses signal on the context side. Same principle applies on the prompt side: prompts that front-load specifics (file, error, expected behavior) resolve in 1-2 turns. Vague ones spiral into 5-6. 1M context doesn't change that — it just gives you more room for the spiral.
Very interested in this approach, and I'm sure many other people are too. Please write a blog post.
1M context is super useful with Gemini, not so much for coding, but for data analysis.
Even there, I use AI to augment rows, build the code to put the data into JSON or Polars, and create a quick UI to query the data.