I'd be interested in seeing actual agent benchmarks (e.g., CC or Copilot CLI with grep removed and this tool used instead).

For example, I have explored RTK and various LSP implementations and found that the models are so heavily RL'd with grep that they don't trust results in any other form. They keep retrying or rereading, so all the token savings are lost.

I just put something in my global CLAUDE.md (under ~/.claude) asking it to use the LSP instead of grep, and I haven't had this issue since.

Can you share that prompt?

That would have been my question too. Doesn't the LSP solve this?

Token savings are increasingly important, but it also matters whether the agent trusts the result and stops searching. The benchmark should measure the full agent loop, not just the search output.

Hey, this is something we're actively working on, but it's hard (and expensive) to do well across harnesses and models. The grep pretraining thing is very interesting, though; I've noticed the same. For example, Sonnet 4.6 seems to trust semble, but Opus 4.7 less so. Once we have proper benchmarks for this, I'm hoping we can test it quantitatively and improve it. If you have any feedback, let me know!

>so heavily RL'd with grep

At least Codex listens when I tell it to use rg instead of grep, because grep is often so slow. But once rtk is added, it runs grep through rtk, which is kind of annoying.

Yeah, we're also interested in doing this. It's on the roadmap, together with optimizing the prompt and tool descriptions so that models have an easier time using it.

Anecdotally, we do use this tool ourselves, of course, and it's been working pretty well so far. Anthropic models call it and seem to trust the results.

Codex CLI is quite happy running RTK. Well, with GPT-5.5 xhigh, anyway.

One thing that irks me is that when it doesn't support something, e.g., a particular CLI flag of find, it returns an error message instead of passing through the command's full output. Then the agent wastes tokens retrying or, worse, doesn't even try, because the prompting may make it afraid to run commands without rtk.

How effective is RTK for you? Is it worth using?

I found judicious use of rtk on specific commands that you know it can improve, e.g., go test, pnpm test (vitest), etc., to be worthwhile, at least in CC. But their default setup, which prepends rtk to everything, is more trouble than it's worth. I have a custom-built hook that prepends rtk based on a hierarchical whitelist.
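A minimal sketch of what such a whitelist check might look like, assuming the whitelist is a nested dict of command tokens where `True` means "this prefix, plus any further arguments, is allowed". The names `maybe_prefix_rtk`, `is_whitelisted`, and the whitelist shape are hypothetical, not the commenter's actual hook. Wiring it into a harness's pre-tool hook isn't shown, since the hook input/output format depends on the harness.

```python
import shlex

# Shell operators that make a command compound. Prefixing rtk would only
# wrap the first part, so such commands are conservatively left alone.
COMPOUND_MARKERS = ("|", "&&", "||", ";", ">", "<")


def is_whitelisted(tokens: list[str], whitelist: dict) -> bool:
    """Walk the nested whitelist one token at a time."""
    node = whitelist
    for tok in tokens:
        if node is True:
            return True  # an allowed prefix was reached; remaining args are fine
        if not isinstance(node, dict) or tok not in node:
            return False
        node = node[tok]
    return node is True


def maybe_prefix_rtk(command: str, whitelist: dict) -> str:
    """Return the command with 'rtk ' prepended if whitelisted, else unchanged."""
    if any(marker in command for marker in COMPOUND_MARKERS):
        return command
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes: don't touch it
        return command
    if not tokens or tokens[0] == "rtk":
        return command
    return f"rtk {command}" if is_whitelisted(tokens, whitelist) else command


# Hypothetical whitelist: only `go test ...` and `pnpm test ...` go through rtk.
WHITELIST = {
    "go": {"test": True},
    "pnpm": {"test": True},
}

if __name__ == "__main__":
    for cmd in ["go test ./...", "go build ./...", "pnpm test", "go test ./... | tail -n 20"]:
        print(f"{cmd!r} -> {maybe_prefix_rtk(cmd, WHITELIST)!r}")
```

Anything not explicitly listed runs unchanged, which avoids the "rtk everything" failure mode where an unsupported command turns into an error.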

And you should disable the savings reporting feature, since it's worse than useless: it breaks sandboxing and always reports ~100% savings for me, because rtk obviously doesn't know about the head/tail the agent pipes its output into.

I can't find the specific issues right now, but I've been somewhat skeptical that their tool over-reports token savings, and there are many issues to that effect in the repo.

I'm unlikely to install it again in my latest configuration. Instead, I'm applying some specific tricks to things like `make test`, such as printing no output unless it exits with an error code, that sort of thing. Anecdotally, I often see GPT-5.5 automatically apply context-limiting flags to the bash it writes :shrug:
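A rough sketch of the "quiet unless it fails" trick, assuming you wrap the test command yourself. `run_quiet` is a hypothetical helper, not part of make or any tool mentioned in this thread.

```python
import subprocess
import sys


def run_quiet(cmd: list[str]) -> tuple[int, str]:
    """Run cmd and return (exit code, text worth showing the agent).

    On success the captured output is dropped entirely; on failure,
    stdout and stderr are returned so the agent can see what broke.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return 0, ""
    return proc.returncode, proc.stdout + proc.stderr


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python run_quiet.py make test
    code, output = run_quiet(sys.argv[1:])
    sys.stdout.write(output)
    sys.exit(code)
```

The exit code is still propagated, so the agent can tell success from failure without reading any output.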

I've had the same experience with RTK: my agent got stuck in a loop with a faulty RTK command and couldn't escape it, since RTK automatically and unconditionally rewrites everything. I've uninstalled it for the time being.

I had better results with lean ctx and context mode than with rtk.

[deleted]

Wondering too

I think the best bet is some kind of proxy: when the model calls grep, you intercept the call, run the search with another tool, and return the results to the model.
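A sketch of the interception step, assuming the other search tool is exposed as a Python callable `backend(pattern, paths)` that returns `(path, line_number, text)` tuples. That interface is hypothetical, not semble's or any real tool's API. Results are formatted as grep-style `path:line:text` lines so the model sees a familiar shape, and anything the proxy doesn't understand returns `None`, meaning "run the real grep", rather than an error.

```python
import shlex
from typing import Callable, Optional

# (path, 1-based line number, line text), the shape `grep -n` prints
Hit = tuple[str, int, str]
# Hypothetical interface for the alternative search tool
Backend = Callable[[str, list[str]], list[Hit]]

# Only flags made of these letters are understood (-r, -R, -n and combos like -rn);
# they don't change which lines match. Anything else (-i, -A 3, --include, ...)
# falls through to the real grep.
SIMPLE_FLAG_CHARS = set("rRn")
SHELL_CHARS = set("|;&<>")


def intercept(command: str, backend: Backend) -> Optional[str]:
    """Return grep-style output from backend, or None to run the real command."""
    if SHELL_CHARS & set(command):
        return None  # pipes, redirects, chains: leave them alone
    try:
        tokens = shlex.split(command)
    except ValueError:
        return None
    if not tokens or tokens[0] != "grep":
        return None
    positional = []
    for tok in tokens[1:]:
        if tok.startswith("-") and len(tok) > 1:
            if set(tok[1:]) <= SIMPLE_FLAG_CHARS:
                continue
            return None  # unknown flag: fall back instead of erroring
        positional.append(tok)
    if not positional:
        return None
    pattern, paths = positional[0], positional[1:] or ["."]
    hits = backend(pattern, paths)
    # Always include line numbers, as `grep -n` would.
    return "\n".join(f"{path}:{line}:{text}" for path, line, text in hits)
```

Falling back to the real command on anything unrecognized addresses the rtk complaint above: an unsupported flag shouldn't leave the agent retrying against an error message.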


I made Claude keep a global memory for both RTK and my own AI memory system (GuardRails), and it happily uses both. The only time it doesn't use GuardRails is when I don't mention it at all, and it always uses RTK unless RTK falls apart running a tool it doesn't support.