What are your settings and tokens/second? Even with 2 GPUs (MI100, RX 6600 XT 8GB) and 32GB of RAM, it was running at a snail's pace for me.
I didn't try sched_spread with a 3090 and the MI100, which would provide 56GB of VRAM combined.