This is nice! I tried it for a bit and it was indeed quite fast. Are you looking for contributors, or are you building this as a personal tool? I ran into some issues when trying other models, though: gpt-5.5 on Azure doesn't work, even with the OpenAI-compatible endpoint, because "max_tokens" has been replaced with "max_completion_tokens". And there doesn't appear to be a way to pass through custom headers, so I wasn't able to specify reasoning_effort for deepseek models.
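For context, the parameter issue could be handled with a small request-building shim. This is just a minimal sketch, not the tool's actual code: the helper name, the model-prefix check, and the `extra_body`/`extra_headers` parameters are all illustrative assumptions.

```python
def build_chat_request(model, messages, max_output_tokens,
                       extra_body=None, extra_headers=None):
    """Return (payload, headers) for an OpenAI-compatible chat endpoint.

    Newer OpenAI models reject "max_tokens" and require
    "max_completion_tokens"; extra_body lets callers pass fields like
    reasoning_effort, and extra_headers covers gateways that need them.
    """
    payload = {"model": model, "messages": messages}
    # Assumption: which model families require the newer parameter name.
    if model.startswith(("o1", "o3", "gpt-5")):
        payload["max_completion_tokens"] = max_output_tokens
    else:
        payload["max_tokens"] = max_output_tokens
    payload.update(extra_body or {})  # e.g. {"reasoning_effort": "high"}
    headers = {"Content-Type": "application/json", **(extra_headers or {})}
    return payload, headers
```

Something along these lines would cover both endpoints without forking the request path.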

Yes, I'm open to PRs.

What you've described is a clear bug in my codebase. If you can, please open a GitHub issue for each of the bugs you hit.

Thanks!