Quick question, for average joe do we still need to "train" LLM or we can just use off the shelf model and use it ("inference"?) for normal use cases like business process augmentation (e.g. helping read paper receipts, or generate cat videos)?
Modern smaller LLMs like Qwen3.6 27B are quite good at visual tasks like describing images. I wouldn't trust it on receipts unless you're fine with somewhat less than 100% accuracy, say around 90%. For descriptions of images and the like, I've found they do quite well indeed. A key change was the introduction of more, or even dynamic, visual tokens, which really helped the model "see" more details.
Generating cat videos is the domain of diffusion models. If you have at least a 16GB GPU and a fair bit of patience, you can get quite good results; check out the ComfyUI subreddit, for example.
Just as an example, here's what Qwen3.6 27B Q5_K_XL can do given this[1] image. I didn't do any prompt engineering here, just a dead simple prompt: "Transcribe the following receipt. Put line items in a separate section, each line item separated by a double newline". Temperature was set to 0.5.
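If you want to try this yourself, here's a minimal sketch of how a prompt plus image can be packaged for a locally hosted vision model behind an OpenAI-compatible endpoint (e.g. a llama.cpp server). The endpoint URL and the model name are assumptions for illustration, not what was actually run here:

```python
# Sketch: build an OpenAI-compatible chat request carrying an image.
# The model name "qwen-vl" and the localhost URL below are placeholders.
import base64

def build_request(image_bytes: bytes, prompt: str, model: str = "qwen-vl"):
    # Images go in as a base64 data URL alongside the text prompt.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "temperature": 0.5,  # same setting as in the experiment above
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

payload = build_request(
    b"...jpeg bytes of the receipt photo...",  # read from disk in practice
    "Transcribe the following receipt. Put line items in a separate "
    "section, each line item separated by a double newline",
)
# POST this payload as JSON to e.g. http://localhost:8080/v1/chat/completions
```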
Here's the output:
[1]: https://i.pinimg.com/originals/41/08/dc/4108dcf51f15af464bb6...

What is the difference between this and using normal OCR and then running that output through an LLM? Using a model like Qwen seems such a bazooka way to kill a fly to me.
For most tasks I agree. However, once you've run OCR you've already lost a lot of positional and contextual information, so for some tasks it might not be good enough.
If you have scanned PDFs that follow a template, like an invoice from a repeat supplier, then yeah OCR is definitely the way to go.
I think nowadays a lot of models are trained more at doing this than at knowing things, while being smaller. So I’d say yes!
At least that’s my impression.
You can use modern off-the-shelf models for those kinds of tasks; however, a smaller but bespoke model will usually be more cost-efficient when used at scale.
And smaller bespoke models running locally are better for regulated workflows (healthcare, banking, etc.) as well.