
DeepSeek V4 Pro has about 25GB worth of active parameters, so if you can fit the whole ~870GB of weights + cache in RAM, your tok/s is bounded above by your system memory bandwidth in GB/s divided by 25GB. If you can't fit the whole model in RAM, you'll be bottlenecked to some degree by storage bandwidth, which is in the single or low double digits of GB/s.
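To put rough numbers on that bound, here is a minimal Python sketch. The 25GB figure is the one quoted above; the bandwidth values are hypothetical round numbers picked only for illustration, not measurements of any particular machine, and the function name is made up for this example.

```python
def max_tokens_per_second(active_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every token streams all active weights once."""
    if active_gb <= 0:
        raise ValueError("active_gb must be positive")
    return bandwidth_gb_s / active_gb


ACTIVE_GB = 25.0  # active-parameter size quoted in the comment

# Hypothetical bandwidths (assumptions for illustration only).
examples = {
    "RAM at 100 GB/s": 100.0,
    "RAM at 400 GB/s": 400.0,
    "storage at 7 GB/s": 7.0,
}

for label, bandwidth in examples.items():
    bound = max_tokens_per_second(ACTIVE_GB, bandwidth)
    print(f"{label}: at most {bound:.2f} tok/s")
```

This is only a ceiling for single-stream decoding: it ignores compute, KV-cache reads, and any caching of hot experts, so real throughput should come in lower.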

Mind you, it's an absolutely sensible setup either way if you are just testing a few queries and are willing to run them unattended/overnight, especially since the KV-cache size is apparently really low (~10GB is said to be typical). That gives you a lot of batching potential even on consumer setups, which amortizes the cost of fetching weights.
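A rough way to see the amortization, as a sketch under loud assumptions: each decode step reads the active weights once for the whole batch and reads each sequence's full KV cache once. Both are simplifications. In an MoE model different sequences can route to different experts, so the weight traffic per step will grow somewhat with batch size, and the ~10GB cache figure is just the one quoted above. The function name and the 100 GB/s bandwidth are hypothetical.

```python
def batched_tokens_per_second(
    weights_gb: float,
    kv_gb_per_seq: float,
    bandwidth_gb_s: float,
    batch: int,
) -> float:
    """Aggregate tok/s bound when one weight fetch is shared by `batch` sequences.

    Assumes each step reads weights_gb once in total, plus kv_gb_per_seq
    for every sequence in the batch (optimistic for MoE routing).
    """
    if batch < 1:
        raise ValueError("batch must be at least 1")
    gb_per_step = weights_gb + batch * kv_gb_per_seq
    # Each step yields one token per sequence in the batch.
    return batch * bandwidth_gb_s / gb_per_step


# 25 GB active weights and ~10 GB KV cache are the figures from the comment;
# 100 GB/s is a hypothetical bandwidth.
for batch in (1, 2, 4, 8):
    rate = batched_tokens_per_second(25.0, 10.0, 100.0, batch)
    print(f"batch={batch}: at most {rate:.2f} tok/s in aggregate")
```

Under these assumptions the aggregate bound rises with batch size but flattens out as KV-cache reads start to dominate the traffic, so the smaller the cache per sequence, the more batching helps.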