In my experience, LLMs tend to take noticeably longer to process images than text.
It has to fetch the image data first, which is basically just I/O time before any processing happens.
IIRC there's also a pre-processing step (embedding/tokenization?) before images are fed to the LLM?
I hit this issue while optimizing LLM request times and ended up lowering the image resolution. It cost some accuracy, but I could live with that.
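For anyone wanting to try the same trade-off, here is a rough sketch of why lowering resolution helps. It assumes a ViT-style encoder that splits the image into fixed-size patches, one token per patch; the 16-pixel patch size, the 1024-pixel cap, and the function names are illustrative guesses, not any particular model's real tokenizer.

```python
import math


def downscale_size(width, height, max_side):
    """Scale (width, height) so the longer side is at most max_side.

    Keeps the aspect ratio and never upscales.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def patch_count(width, height, patch=16):
    """Count the patch x patch tiles needed to cover the image.

    Assumption: one token per patch. Real models differ (tiling,
    pooling, fixed token budgets), so treat this as a rough proxy.
    """
    return math.ceil(width / patch) * math.ceil(height / patch)


original = (4032, 3024)  # e.g. a typical phone photo
resized = downscale_size(*original, max_side=1024)

print("resized to:", resized)
print("patches before:", patch_count(*original))
print("patches after:", patch_count(*resized))
```

Under those assumptions the patch count drops roughly with the square of the scale factor, which would fit the speedup I saw. The actual resize could then be done with whatever image library you already use.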
I wonder whether processed images stay in the prefix cache?