You probably know this, but it can already generate accurate diagrams. Just ask for the output in a diagram language like Mermaid or Graphviz.
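For example, asking it to "diagram a password-reset flow, output Mermaid" might come back with something like this (illustrative, not actual model output):

    graph TD
        A[User requests reset] --> B[Send reset email]
        B --> C{Link clicked?}
        C -->|Yes| D[Show new-password form]
        C -->|No| E[Link expires]
        D --> F[Update password]

You then paste the source into any Mermaid renderer; the layout engine, not the model, decides positions.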
My experience is that it often produces terrible diagrams: things clearly overlap, and lines make no sense. I'm not surprised; if you told me to lay out a diagram in XML/YAML, there would be obvious mistakes and layout issues too.
I'm not really certain a text-only model can ever do well here.
FWIW, I think a multimodal model could be trained to do extremely well at this given sufficient training data: triples of a textual description of the system and/or diagram, the diagram source (Mermaid, SVG, etc.), and the resulting image, with training to translate between all three.
Agreed. I'm sure a service like this already exists (or could easily exist) where the workflow is something like the following (rough Python sketch below):
1. User provides information
2. LLM generates structured output in whatever modeling language
3. The same or another multimodal LLM reviews the generated graph for styling/positioning issues and ensures it matches the user's request.
4. LLM generates structured output based on the feedback.
5. etc...
But you could probably fine-tune a multimodal model to do it in one shot, or way more effectively.
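A rough sketch of that loop in Python; call_llm, render_mermaid, and review_image are hypothetical stand-ins, not any real API, so wire them to whatever model client and renderer you actually use:

    # Generate -> render -> review -> regenerate loop.
    # All three helpers below are hypothetical placeholders.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("text model call goes here")

    def render_mermaid(source: str) -> bytes:
        raise NotImplementedError("e.g. shell out to a Mermaid renderer")

    def review_image(image: bytes, request: str) -> str:
        raise NotImplementedError("multimodal model returns 'OK' or feedback")

    def generate_diagram(request: str, max_rounds: int = 3) -> str:
        # Step 2: generate structured output in the modeling language.
        source = call_llm(f"Output Mermaid source for: {request}")
        for _ in range(max_rounds):
            image = render_mermaid(source)
            # Step 3: review the rendered result against the request.
            feedback = review_image(image, request)
            if feedback.strip() == "OK":
                break
            # Step 4: regenerate from the reviewer's feedback.
            source = call_llm(
                f"Revise this Mermaid source.\nFeedback: {feedback}\nSource:\n{source}"
            )
        return source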
I had a LaTeX TikZ diagram problem that Sonnet 3.7 couldn't handle even after 10 attempts. Gemini 2.5 Pro solved it on the second try.
Had the same experience. o3-mini failed miserably, Claude 3.7 as well, but Gemini 2.5 Pro solved it perfectly (the task was going from an image of a diagram, without source, to a TikZ diagram).
I've had mixed and inconsistent results, and it hasn't been able to iterate effectively when it gets close. It could be that I need to refine my prompting. I've mostly tried Mermaid and SVG, but I'll also try Graphviz based on your suggestion.
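One thing in Graphviz's favor: DOT describes only nodes and edges, and the layout engine decides all positioning, so there's less for the model to get wrong. A minimal illustrative snippet (not model output):

    digraph pipeline {
        rankdir=LR;
        ingest -> parse -> validate;
        validate -> store [label="ok"];
        validate -> deadletter [label="error"];
    }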
PlantUML (activity) diagrams are my go-to.
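For anyone who hasn't used it, a minimal PlantUML activity diagram looks like this (a minimal illustrative sketch):

    @startuml
    start
    :Receive request;
    if (Valid?) then (yes)
      :Process;
    else (no)
      :Reject;
    endif
    stop
    @enduml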