I think you'll find that on that card, most models that come close to filling its 16 GB of memory will still be more than fast enough for chat. You're in the happy position where your requirements would have to get steeper before you'd need faster hardware! :D
Ollama is the easiest way to get started trying things out IMO: https://ollama.com/
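For example, once it's running and you've pulled a model (e.g. ollama pull llama3), it serves a local HTTP API on port 11434 that any client can hit. A minimal sketch in Python (the model name is just whatever you pulled, nothing special):

    # Minimal chat call against a local Ollama server (default port 11434).
    # Assumes you've already done `ollama pull llama3` -- swap in your own model.
    # pip install requests
    import requests

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "llama3",
            "messages": [{"role": "user", "content": "Say hello in one sentence."}],
            "stream": False,  # one JSON response instead of a token stream
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["message"]["content"])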
I found LM Studio so much easier than ollama, given it has a UI: https://lmstudio.ai/ Did you know about LM Studio? Why is ollama still recommended when it's just a CLI with worse UX?
I recommended ollama because IMO that is the easiest way to get started (as I said).
LM Studio is closed source.
Any FOSS solutions that let you browse models and guesstimate whether you have enough VRAM to fully load them? That's the only selling point of LM Studio for me.
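In the meantime, the back-of-the-envelope check I do by hand is just weights ≈ parameter count × bits-per-weight / 8, plus a couple of GB for KV cache and overhead. Very rough sketch (all the constants are my own guesses, nothing authoritative):

    # Rough heuristic: will a quantized model plausibly fit on the card?
    # The overhead number is a guess for KV cache / runtime context, not a measurement.
    def fits_in_vram(params_billions, bits_per_weight, vram_gb, overhead_gb=2.0):
        weights_gb = params_billions * bits_per_weight / 8  # billions of params * bytes/param ~= GB
        needed_gb = weights_gb + overhead_gb
        print(f"{params_billions:g}B @ {bits_per_weight}-bit ~ {needed_gb:.1f} GB needed")
        return needed_gb <= vram_gb

    print(fits_in_vram(13, 4.5, vram_gb=16))  # ~9.3 GB  -> True
    print(fits_in_vram(70, 4.5, vram_gb=16))  # ~41.4 GB -> False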
Ollama's default context length is frustratingly short in the era of 100k+ context windows.
My solution so far has been to boot up LM Studio to check whether a model will work well on my machine, manually download the model myself through huggingface, run llama.cpp, and hook it up to open-webui. That's less than ideal, and it means LM Studio's proprietary code has access to my machine specs.
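At least the download step is scriptable with huggingface_hub; a sketch of how I do it (the repo and filename are placeholders, and the llama-server flags are from memory, so check them against your llama.cpp build):

    # Fetch a GGUF from Hugging Face, then point llama.cpp's server at it.
    # Repo and filename are placeholders -- substitute the model you actually want.
    # pip install huggingface_hub
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="TheBloke/SomeModel-GGUF",   # placeholder repo
        filename="somemodel.Q4_K_M.gguf",    # placeholder quant file
    )
    print(path)  # local cache path to hand to llama.cpp, roughly:
    # llama-server -m <that path> -c 8192 --port 8080
    # then point open-webui (or any OpenAI-compatible client) at it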
> Ollama's default context length is frustratingly short in the era of 100k+ context windows.
Nobody uses Ollama as is. It's a model server. In clients you can specify the proper context lengths. This has never been a problem.
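For example, with Ollama's HTTP API you can pass the context length per request through the options field (or bake it into a Modelfile with PARAMETER num_ctx). Rough sketch, assuming the model actually supports the window you ask for:

    # Override Ollama's default context length per request via options.num_ctx.
    # (Alternatively, create a derived model with a Modelfile: PARAMETER num_ctx 32768)
    # pip install requests
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",               # whatever model you've pulled
            "prompt": "Hello!",
            "options": {"num_ctx": 32768},   # ask for a 32k-token context window
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["response"])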
For sure, though it's tripped me up a few times with clients that don't pass a reasonable context length with each call.
https://huggingface.co/docs/accelerate/v0.32.0/en/usage_guid...
Thanks! That's really helpful.
And I think LM Studio has restrictions on commercial use.