Hacker News

andyferris 2 months ago

The notes explicitly call out that you may want to dial the effort setting back to medium to reduce latency and token usage (high is the default, and apparently there is a max setting too).

gverrilla 2 months ago

There are 3 options to choose from on /model: low, medium, and high effort.