Pick up a used RTX 3090 for its 24 GB of VRAM.
It holds its value, so you won't lose much, if anything, when you resell it.
But otherwise, as said, install Ollama and/or llama.cpp and run the model with the --verbose flag.
This will print the tokens-per-second stats after each prompt is answered.
Then find the best model that gives you a tokens-per-second speed you're happy with.
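For example, with Ollama installed, a quick benchmark looks like this (the model name is just an example; substitute whatever you want to test):

```shell
# Pull a model, then run a prompt with --verbose to get timing stats
ollama pull llama3.1:8b
ollama run llama3.1:8b --verbose "Explain RAID levels in one paragraph."
# After the response, Ollama prints stats including an
# "eval rate" line, which is your generation speed in tokens/s.
```

Try a few model sizes and quantizations this way and compare the eval rate numbers.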
And as also said, 'abliterated' models are less censored versions of normal ones.