On API pricing you still pay 10% of the input token price on cache reads. Not sure if the subscription limits count this though.
And of course every conversation now has to compact 80 tokens earlier, and gets marginally worse (since quality degrades the more stuff is in the context)
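The 10%-of-input-price claim is easy to sanity-check with some arithmetic. A minimal sketch, using a hypothetical $3 per million input tokens as a placeholder rate (not an actual published price):

```python
# Hedged sketch: rough cost comparison for cached vs. uncached input tokens.
# The base price below is a hypothetical placeholder, not a real rate.
INPUT_PRICE_PER_MTOK = 3.00    # $ per million uncached input tokens (assumed)
CACHE_READ_MULTIPLIER = 0.10   # cache reads billed at 10% of the input price

def input_cost(tokens: int, cached: bool) -> float:
    """Dollar cost for `tokens` input tokens, cached or not."""
    rate = INPUT_PRICE_PER_MTOK * (CACHE_READ_MULTIPLIER if cached else 1.0)
    return tokens / 1_000_000 * rate

# A 50k-token conversation prefix re-sent on every turn:
print(input_cost(50_000, cached=False))  # 0.15
print(input_cost(50_000, cached=True))   # 0.015
```

So a fully cached prefix is 10x cheaper per turn, but not free, which is the point: the residual cost scales linearly with how much sits in the context.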
Isn't it basically the same as paying dust to crypto exchanges when making a transaction - it's so minuscule that it's not worth caring about?
Well the system prompt is probably permanently cached.
Takes up a portion of the context window, though
And the beginning of the context window gets more attention, right?