That's a cool concept. I'd be curious to see it on a more common setup for agentic data analysis (e.g., for use in Claude Code), like:

* Multiple tasks vs 1

* o3/o3-mini + 4o/4o-mini instead of nano

* Extra credit: Inside a fixed cost/length reasoning loop

E.g., does the md-kv benefit disappear with the smarter models that you'd typically use, and thus just become a 2-3x cost?