Just as an example, here's what Qwen3.6 27B Q5_K_XL can do given this[1] image. I didn't do any prompt engineering here, just used a dead simple prompt: "Transcribe the following receipt. Put line items in a separate section, each line item separated by a double newline". Temperature was set to 0.5.

Here's the output:

  Publix.
  Bradenton Commons Shopping Center
  4651 Cortez Rd. W.
  Bradenton, FL 34210
  Store Manager: Joe Galati
  941-792-7195
  
  N/O LF WHEAT BREAD 3.99 F
  
  PBX THCK L/S BACON 7.82 F
  
  PUBLIX BROWN GRAVY 0.83 F
  
  TOP SIRLOIN STEAK 11.74 F
  You Saved 3.92
  
  VITA PRTY SNK WINE 6.99 F
  You Saved 3.00
  
  ORGANIC CARROTS 1.69 F
  
  BRC FLRT EAT SMART 3.34 F
  1 @ 3 FOR 10.00
  You Saved 0.15
  
  GINGER ROOT 0.65 F
  0.13 lb @ 4.99/ lb
  
  POTATOES RUSSET 0.84 F
  0.65 lb @ 1.29/ lb
  
  POTATOES SWEET 0.49 F
  0.49 lb @ 0.99/ lb
  
  DELECT BSQUE CK/TN 10.99 T
  
  FS OUTSTRETCH UNSC 15.99 T
  
  Order Total 65.36
  Sales Tax 1.89
  Grand Total 67.25
  Credit Payment 67.25
  Change 0.00
  
  Savings Summary
  Special Price Savings 7.07
  ************************************************************
  * Your Savings at Publix *
  * 7.07 *
  ************************************************************
  
  Receipt ID: 5957 6249 2191 1277 712
  - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  PRESTO!
  Trace #: 766630
  Reference #: 0098440513
  Acct #: XXXXXXXXXXXX2034
  Purchase VISA

[1]: https://i.pinimg.com/originals/41/08/dc/4108dcf51f15af464bb6...
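
One cheap way to sanity-check a transcription like the one above is to see whether the numbers agree with each other. The sketch below is only an illustration: the regexes and the `check_receipt` helper are assumptions written for this one receipt layout, and it checks internal consistency only, not whether the text matches the image.

```python
import re
from decimal import Decimal

# Item, savings and total lines copied from the model output above.
TRANSCRIPT = """\
N/O LF WHEAT BREAD 3.99 F
PBX THCK L/S BACON 7.82 F
PUBLIX BROWN GRAVY 0.83 F
TOP SIRLOIN STEAK 11.74 F
You Saved 3.92
VITA PRTY SNK WINE 6.99 F
You Saved 3.00
ORGANIC CARROTS 1.69 F
BRC FLRT EAT SMART 3.34 F
1 @ 3 FOR 10.00
You Saved 0.15
GINGER ROOT 0.65 F
0.13 lb @ 4.99/ lb
POTATOES RUSSET 0.84 F
0.65 lb @ 1.29/ lb
POTATOES SWEET 0.49 F
0.49 lb @ 0.99/ lb
DELECT BSQUE CK/TN 10.99 T
FS OUTSTRETCH UNSC 15.99 T
Order Total 65.36
Sales Tax 1.89
Grand Total 67.25
Special Price Savings 7.07
"""

# Assumed line shapes for this receipt: "<name> <price> <F|T>" is a line item.
ITEM_RE = re.compile(r"^(?P<name>.+?) (?P<price>\d+\.\d{2}) [FT]$")
SAVED_RE = re.compile(r"^You Saved (\d+\.\d{2})$")
TOTAL_RE = re.compile(
    r"^(Order Total|Sales Tax|Grand Total|Special Price Savings) (\d+\.\d{2})$"
)


def check_receipt(text):
    """Compare summed line items and savings against the printed totals."""
    items, saved, totals = [], Decimal("0"), {}
    for line in text.splitlines():
        line = line.strip()
        if m := ITEM_RE.match(line):
            items.append(Decimal(m["price"]))
        elif m := SAVED_RE.match(line):
            saved += Decimal(m[1])
        elif m := TOTAL_RE.match(line):
            totals[m[1]] = Decimal(m[2])
        # Anything else (weights, "1 @ 3 FOR 10.00") is ignored.
    items_sum = sum(items, Decimal("0"))
    return {
        "items_sum": items_sum,
        "items_match_order_total": items_sum == totals["Order Total"],
        "grand_total_ok": totals["Order Total"] + totals["Sales Tax"]
        == totals["Grand Total"],
        "savings_ok": saved == totals["Special Price Savings"],
    }


result = check_receipt(TRANSCRIPT)
print(result)
```

For this transcript all three checks pass: the twelve line items add up to the printed Order Total, Order Total plus Sales Tax equals the Grand Total, and the "You Saved" lines add up to the Special Price Savings.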

What is the difference between this and using normal OCR and then running that output through an LLM? Using a model like Qwen for this seems like using a bazooka to kill a fly.

For most tasks I agree. However, once you've run OCR you've already lost a lot of positional and contextual information, so for some tasks the plain text might not be good enough.
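
To make the point about positional information concrete, here is a minimal sketch. It assumes an OCR engine that returns word boxes; the `Word` type, the `group_rows` helper and all coordinates are made up for illustration. With coordinates, prices can be paired with their items by grouping words on the same row. Once the words are flattened in the wrong order, that pairing is gone.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word box (hypothetical units)
    y: float  # vertical center of the word box


def group_rows(words, y_tol=5.0):
    """Group words into rows by vertical position, then read each row left to right."""
    rows = []
    for w in sorted(words, key=lambda w: w.y):
        if rows and abs(rows[-1][-1].y - w.y) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    return [" ".join(w.text for w in sorted(r, key=lambda w: w.x)) for r in rows]


def flatten_column_first(words):
    """A naive reading order (column by column) that discards the row structure."""
    return " ".join(w.text for w in sorted(words, key=lambda w: (w.x, w.y)))


# Made-up boxes for two receipt lines: names on the left, prices on the right.
words = [
    Word("GINGER", 10, 100), Word("ROOT", 70, 101),
    Word("0.65", 300, 99), Word("F", 340, 100),
    Word("POTATOES", 10, 140), Word("RUSSET", 90, 141),
    Word("0.84", 300, 139), Word("F", 340, 140),
]

print(group_rows(words))
print(flatten_column_first(words))
```

The row-grouped output keeps each price next to its item, while the column-first string no longer says which price belongs to which product. A vision model sees the layout directly, so it never has to recover this from flattened text.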

If you have scanned PDFs that follow a template, like an invoice from a repeat supplier, then yeah, OCR is definitely the way to go.
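
For the templated case, a few regexes over the OCR text are often all that's needed. The following is a rough sketch only: the field names, patterns and sample invoice text are hypothetical and would have to be adapted to the actual supplier's layout.

```python
import re

# Hypothetical patterns for one supplier's invoice template.
FIELDS = {
    "invoice_no": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", re.I),
    "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.I),
    "total": re.compile(r"Total\s+Due\s*:?\s*\$?([\d,]+\.\d{2})", re.I),
}


def extract_fields(ocr_text):
    """Return the first match for each field, or None if the field is missing."""
    out = {}
    for name, pattern in FIELDS.items():
        m = pattern.search(ocr_text)
        out[name] = m.group(1) if m else None
    return out


# Made-up OCR output for a made-up invoice.
SAMPLE = "ACME SUPPLY\nInvoice No: INV-0042\nDate: 2024-03-01\nTotal Due: $1,234.50\n"

print(extract_fields(SAMPLE))
```

This only works because the layout is fixed. As soon as the documents vary, the patterns start missing fields, and that is where a vision model becomes more attractive.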