Any FOSS solutions that let you browse models and guesstimate for you whether you have enough VRAM to fully load the model? That's the only selling point of LM Studio for me.
Ollama's default context length is frustratingly short in the era of 100k+ context windows.
My solution so far has been to boot up LM Studio to check whether a model will run well on my machine, manually download the model through Hugging Face, run llama.cpp, and hook it up to open-webui. That's less than ideal, and it means LM Studio's proprietary code has access to my machine specs.
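Roughly the kind of check I mean, as a sketch: sum the GGUF file sizes listed on the Hub and compare against your VRAM, with some headroom for KV cache. The repo id, quant name, and 20% margin below are just placeholder assumptions, not a real estimator:

    # Rough "will it fit?" check: total GGUF size on the Hub vs. available VRAM.
    from huggingface_hub import HfApi

    def gguf_fits(repo_id: str, quant: str, vram_gb: float, margin: float = 0.2) -> bool:
        info = HfApi().model_info(repo_id, files_metadata=True)
        weights_bytes = sum(
            s.size or 0
            for s in info.siblings
            if s.rfilename.endswith(".gguf") and quant.lower() in s.rfilename.lower()
        )
        needed_gb = weights_bytes / 1e9 * (1 + margin)  # crude headroom for KV cache/overhead
        print(f"{repo_id} [{quant}]: ~{needed_gb:.1f} GB needed, {vram_gb} GB available")
        return needed_gb <= vram_gb

    # Hypothetical example:
    # gguf_fits("TheBloke/Mistral-7B-Instruct-v0.2-GGUF", "Q4_K_M", vram_gb=24)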
> Ollama's default context length is frustratingly short in the era of 100k+ context windows.
Nobody uses Ollama as-is; it's a model server. In clients you can specify the proper context length. This has never been a problem.
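For example, a minimal sketch against Ollama's native /api/chat endpoint; the model tag and the 32k value are placeholders:

    # Override Ollama's default context length per request via options.num_ctx.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "llama3.1",  # assumed model tag
            "messages": [{"role": "user", "content": "Summarize this long document..."}],
            "options": {"num_ctx": 32768},  # raise the context window for this call
            "stream": False,
        },
        timeout=600,
    )
    print(resp.json()["message"]["content"])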
For sure, though it's tripped me up a few times with clients that don't pass a reasonable context length with each call.
https://huggingface.co/docs/accelerate/v0.32.0/en/usage_guid...
Thanks! That's really helpful.
And I think LM Studio has non-commercial restrictions.