I feel the same, but I cannot measure the effect on any context benchmark such as fiction.livebench.
Are they aggressively quantizing, or are our expectations silently increasing?