I've had a bad experience with Neuralwatt's GLM 5.2. It seems like they may be using a quantized version of the model.

Hi, I'm the CTO of Neuralwatt. I'd love to hear your feedback on what your experience was; feel free to email me at scott@neuralwatt.com. Also, for GLM 5.2 we run the FP8 quantization at 1M context, which is a common deployment target.

Hi Scott! I was just considering signing up, and NW looks great (FP8 GLM 5.2 is good!). Standard cached-token pricing for GLM 5.2 is pretty high, though. I'm wondering whether the KV cache for that model really is that expensive to serve on average, or whether Neuralwatt's energy pricing for long-running GLM 5.2 agents is especially competitive. The live energy stats don't break down by token type; I'd love to see that. Also, 2/3 of the examples given in docs/energy-methodology are models you don't even host anymore. Uncertainty and selective stats put people off signing up, because they tend to assume the worst. Oh, and MiMo or DS4 please :)

Thanks for the feedback! Our primary focus is charging by energy; for token pricing we really just try to stay close to the market. That said, I'll take a look at our token pricing to see if it needs an update (https://portal.neuralwatt.com/energy-pricing). Generally, our users get a much lower cost with energy pricing than with token pricing. On a typical request with a high prefix-cache hit rate, the input and cached-token costs are very small, and the output energy cost is higher.

We definitely don't intend to obfuscate anything. In fact, we try to provide more data than any other provider out there, both about individual requests and about fleet behavior. Since we focus directly on our energy pricing and on optimizing it, the issue is likely where the ROI lies for energy optimization versus token optimization (the two are highly correlated, but we have other levers to reduce energy while keeping token counts the same).

I had a good experience with Neuralwatt in my heavy testing on a real project over the last few days. Price/performance with API pricing was great. When using it with pi, I was a little confused about whether and how it supports different reasoning levels.
