> Does anyone here have experience running large models (something like Deepseek 4 Flash, Minimax 2.7, etc.) in a multi-GPU setup with several RTX 6000s, under high concurrency and with large context lengths?

Join the RTX6kPRO tribe!

- https://discord.gg/pYCvaQTf

- https://github.com/local-inference-lab/rtx6kpro