Just as an example, here's what Qwen3.6 27B Q5_K_XL can do given this [1] image. I didn't do any prompt engineering here, just a dead simple prompt: "Transcribe the following receipt. Put line items in a separate section, each line item separated by a double newline". Temperature set to 0.5.
Here's the output:
Publix.
Bradenton Commons Shopping Center
4651 Cortez Rd. W.
Bradenton, FL 34210
Store Manager: Joe Galati
941-792-7195
N/O LF WHEAT BREAD 3.99 F
PBX THCK L/S BACON 7.82 F
PUBLIX BROWN GRAVY 0.83 F
TOP SIRLOIN STEAK 11.74 F
You Saved 3.92
VITA PRTY SNK WINE 6.99 F
You Saved 3.00
ORGANIC CARROTS 1.69 F
BRC FLRT EAT SMART 3.34 F
1 @ 3 FOR 10.00
You Saved 0.15
GINGER ROOT 0.65 F
0.13 lb @ 4.99/ lb
POTATOES RUSSET 0.84 F
0.65 lb @ 1.29/ lb
POTATOES SWEET 0.49 F
0.49 lb @ 0.99/ lb
DELECT BSQUE CK/TN 10.99 T
FS OUTSTRETCH UNSC 15.99 T
Order Total 65.36
Sales Tax 1.89
Grand Total 67.25
Credit Payment 67.25
Change 0.00
Savings Summary
Special Price Savings 7.07
************************************************************
* Your Savings at Publix *
* 7.07 *
************************************************************
Receipt ID: 5957 6249 2191 1277 712
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
PRESTO!
Trace #: 766630
Reference #: 0098440513
Acct #: XXXXXXXXXXXX2034
Purchase VISA
[1]: https://i.pinimg.com/originals/41/08/dc/4108dcf51f15af464bb6...
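A transcription like this is easy to sanity-check against itself, because a receipt carries its own checksums. The line items should sum to the Order Total, tax should bridge to the Grand Total, weighed items should equal weight times unit price, and the "You Saved" lines should add up to the savings summary. Below is a minimal Python sketch of that idea, with the numbers typed in by hand from the output above. `check_receipt` and the data layout are made up for illustration, and rounding weighed items half-up to the cent is an assumption about how the register rounds:

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Values typed in by hand from the transcription above (illustrative layout).
ITEMS = [  # (description, printed price)
    ("N/O LF WHEAT BREAD", "3.99"), ("PBX THCK L/S BACON", "7.82"),
    ("PUBLIX BROWN GRAVY", "0.83"), ("TOP SIRLOIN STEAK", "11.74"),
    ("VITA PRTY SNK WINE", "6.99"), ("ORGANIC CARROTS", "1.69"),
    ("BRC FLRT EAT SMART", "3.34"), ("GINGER ROOT", "0.65"),
    ("POTATOES RUSSET", "0.84"), ("POTATOES SWEET", "0.49"),
    ("DELECT BSQUE CK/TN", "10.99"), ("FS OUTSTRETCH UNSC", "15.99"),
]
WEIGHED = [  # (description, pounds, price per pound)
    ("GINGER ROOT", "0.13", "4.99"),
    ("POTATOES RUSSET", "0.65", "1.29"),
    ("POTATOES SWEET", "0.49", "0.99"),
]
SAVINGS = ["3.92", "3.00", "0.15"]  # the "You Saved" lines


def check_receipt(items, weighed, savings, order_total, sales_tax, grand_total, savings_total):
    """Return a list of inconsistencies found; an empty list means every check passed."""
    problems = []
    prices = {name: Decimal(price) for name, price in items}

    # 1. Line items should add up to the printed Order Total.
    item_sum = sum(prices.values(), Decimal("0"))
    if item_sum != Decimal(order_total):
        problems.append(f"items sum to {item_sum}, Order Total is {order_total}")

    # 2. Order Total + Sales Tax should equal Grand Total.
    if Decimal(order_total) + Decimal(sales_tax) != Decimal(grand_total):
        problems.append("Order Total + Sales Tax != Grand Total")

    # 3. Weighed items: pounds x unit price, rounded half-up to the cent (assumed), should match.
    for name, pounds, per_lb in weighed:
        expected = (Decimal(pounds) * Decimal(per_lb)).quantize(CENT, rounding=ROUND_HALF_UP)
        if prices.get(name) != expected:
            problems.append(f"{name}: {pounds} lb x {per_lb}/lb = {expected}, printed {prices.get(name)}")

    # 4. The per-item savings should add up to the savings summary.
    if sum((Decimal(s) for s in savings), Decimal("0")) != Decimal(savings_total):
        problems.append("'You Saved' lines don't add up to the savings total")
    return problems


print(check_receipt(ITEMS, WEIGHED, SAVINGS, "65.36", "1.89", "67.25", "7.07"))  # → []
```

For this output, all four checks pass. A misread digit in any price (say 11.47 for 11.74) would surface as a mismatch. Items are keyed by description, so a receipt with repeated descriptions would need a list rather than a dict.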
What is the difference between this and using normal OCR and then running that output through an LLM? Using a model like Qwen for this seems like bringing a bazooka to kill a fly.
For most tasks I agree. However, once you've done your OCR you've already lost a lot of positional and contextual information, so for some tasks it might not be good enough.
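To make the positional point concrete: OCR engines such as Tesseract can return word-level bounding boxes, not just a flat string. If you keep those boxes, you can rebuild rows so that a price stays attached to its item. Here is a minimal sketch, assuming a hypothetical `Word` record that holds each word's left edge `x` and vertical centre `y` in pixels. Real engines expose this data in their own formats, and `y_tolerance` would need tuning per scan:

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word's bounding box, in pixels (hypothetical format)
    y: float  # vertical centre of the bounding box, in pixels (hypothetical format)


def group_into_lines(words, y_tolerance=8.0):
    """Cluster words into rows by vertical position, then read each row left to right."""
    lines = []
    for word in sorted(words, key=lambda w: w.y):
        # Join the current row if this word sits close enough to the row's first word.
        if lines and abs(word.y - lines[-1][0].y) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [" ".join(w.text for w in sorted(row, key=lambda w: w.x)) for row in lines]


# Made-up coordinates for two rows of the receipt, deliberately out of order.
words = [
    Word("11.74", 300, 101), Word("TOP", 10, 100), Word("SIRLOIN", 50, 99),
    Word("STEAK", 120, 100), Word("You", 20, 130), Word("Saved", 60, 131),
    Word("3.92", 300, 130),
]
print(group_into_lines(words))  # → ['TOP SIRLOIN STEAK 11.74', 'You Saved 3.92']
```

This greedy row-clustering is only a rough heuristic. Skewed or crumpled receipts break it quickly, and that is the kind of case where a vision model that sees the whole image has an edge.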
If you have scanned PDFs that follow a template, like an invoice from a repeat supplier, then yeah, OCR is definitely the way to go.
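In the templated case, once OCR gives you clean lines, a fixed pattern is often all you need. Below is a sketch against the item-line layout of the receipt above (description, price, one-letter tax flag). The regex and `parse_items` are assumptions about this particular store's format, and another supplier's template would need its own pattern:

```python
import re

# Hypothetical pattern for this store's item lines: description, price, tax flag (F or T).
ITEM_LINE = re.compile(r"^(?P<name>.+?)\s+(?P<price>\d+\.\d{2})\s+(?P<flag>[FT])$")


def parse_items(lines):
    """Return (description, price, flag) for every line that matches the item template."""
    items = []
    for line in lines:
        m = ITEM_LINE.match(line.strip())
        if m:  # non-item lines ("You Saved ...", weights, totals) simply don't match
            items.append((m["name"], m["price"], m["flag"]))
    return items


sample = [
    "TOP SIRLOIN STEAK 11.74 F",
    "You Saved 3.92",
    "GINGER ROOT 0.65 F",
    "0.13 lb @ 4.99/ lb",
    "DELECT BSQUE CK/TN 10.99 T",
    "Order Total 65.36",
]
print(parse_items(sample))
# → [('TOP SIRLOIN STEAK', '11.74', 'F'), ('GINGER ROOT', '0.65', 'F'), ('DELECT BSQUE CK/TN', '10.99', 'T')]
```

The trade-off is brittleness: the pattern is cheap and deterministic, but it silently skips any line that drifts from the template, which is why it suits repeat suppliers rather than arbitrary receipts.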