Over the past year or two, I've just been paying for API access and using open-source frontends like LibreChat to access these models.

This has been working great for occasional use; I'd top up my account by $10 every few months. I figured the number of tokens I use is vastly smaller than what the packaged plans assume, so it made sense to go with the cheaper, pay-as-you-go approach.

But since I've started dabbling in tooling like Claude Code, hoo-boy, those tokens burn _fast_, like really fast. Yesterday I somehow burned through $5 of tokens in the space of about 15 minutes. Sure, the Code tool is vastly different from asking an LLM about some topic, but I wasn't expecting such a huge leap. A lot of the token usage is masked from you, I guess, wrapped up in the ever-increasing context plus the back-and-forth tool orchestration, but still.

The simple reason for this is that Claude Code uses far more context and far more round trips than you would use in a typical chat.
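The cost compounds because each API call in an agent loop resends the whole conversation so far, so total input tokens grow roughly quadratically with the number of turns. A rough back-of-envelope sketch (all numbers below are made-up illustrative assumptions, not Claude Code's actual figures or Anthropic's real prices):

```python
def loop_cost(turns, system_tokens, per_turn_tokens, price_per_mtok):
    """Estimate input-token cost of a loop that resends its full context.

    Each turn appends per_turn_tokens (tool output, model reply) to the
    context, then the *entire* context is billed again as input.
    """
    total_input = 0
    context = system_tokens
    for _ in range(turns):
        context += per_turn_tokens   # history grows every turn
        total_input += context       # whole history resent as input
    return total_input * price_per_mtok / 1_000_000  # cost in dollars

# Short chat: few turns, small messages (hypothetical numbers).
chat = loop_cost(turns=5, system_tokens=1_000,
                 per_turn_tokens=500, price_per_mtok=3.0)

# Agent session: many tool-call turns, bulky file/tool output per turn.
agent = loop_cost(turns=40, system_tokens=20_000,
                  per_turn_tokens=3_000, price_per_mtok=3.0)

print(f"chat  ~${chat:.4f}")   # a few cents
print(f"agent ~${agent:.2f}")  # several dollars, ~260x the chat
```

Under these assumptions the agent run costs hundreds of times more than the chat, which is roughly the gap the parent comment describes.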

$20.00 via DeepSeek's API (yes, China can have my code, idc) has lasted me almost a year. It's slow, but the output quality is better than any of the independently hosted DeepSeek models (ime). I don't really use agents or anything, tho.

Agreed, I'm still trying to use up my first $5 on DeepSeek. The best thing is that the off-peak rate falls during the US work day and is only 55 cents per million tokens. Great for use with agents cuz you never have to worry about cost or throttling.
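At that rate a small balance goes a long way. Quick arithmetic (taking the 55-cents-per-million off-peak figure above at face value; check DeepSeek's current pricing page before relying on it):

```python
def tokens_for_budget(budget_usd, price_per_mtok):
    """How many tokens a dollar budget buys at a given $/million-token rate."""
    return budget_usd / price_per_mtok * 1_000_000

# Hypothetical: a $5 balance at the quoted $0.55/Mtok off-peak rate.
tokens = tokens_for_budget(5.00, 0.55)
print(f"{tokens:,.0f} tokens")  # roughly 9 million tokens
```

Nine-odd million tokens is a lot of agent chatter before the balance runs dry.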

Everyone complains about the prices of other models, but there are much cheaper alternatives out there, and DS is no slouch either.