really appreciate you taking the time to write this!

we've started trying to work through adding agents like this: https://x.com/barbinbrad/status/1903047303180464586

the trouble is that there are 1000s of possible mutations -- and the quality of an agent tends to diminish with the number of "tools" you give it. i need to figure out the right abstraction for this.

I pray you focus on your core product and don’t fall into an agentification rabbit hole.

If you do want everything to be automatable, take a page from Blender and give every action a key binding plus a Python method, so Python scripts can take the same actions a human would, but as function calls instead of clicks. Then maybe maybe maybe you can have a text field that translates natural language into an action, but please god stay away from chat interfaces.
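
For concreteness, here's roughly what that pattern looks like in Blender itself (a minimal sketch that runs in Blender's scripting console, where the `bpy` module is available):

```python
# Runs inside Blender's Python console / Text Editor; bpy ships with Blender.
import bpy

# The same operator a user reaches via Shift+A > Mesh > Cube,
# invoked as a function call instead of a click:
bpy.ops.mesh.primitive_cube_add(size=2.0, location=(0.0, 0.0, 1.0))

# The operator bound to the G key (grab/move), again as a function call:
bpy.ops.transform.translate(value=(1.0, 0.0, 0.0))

# Every button and key binding routes through an operator like these, which is
# what makes the whole application scriptable without a separate API layer.
```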

Rhino CAD is another interesting interface to look at, there’s a million buttons and menus but there’s also a text field at the top of the viewport where you can just type a command if you already know the name instead of rummaging through submenus. Kind of a CLI within the GUI.
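
A hypothetical sketch of that "CLI within the GUI" idea -- a registry that maps typed command names to the same functions the buttons and key bindings call (all command names and behaviors here are invented for illustration):

```python
from typing import Callable, Dict

# Hypothetical registry: every GUI button and key binding routes through here,
# so a typed command (or a script, or an LLM) hits the exact same code path.
COMMANDS: Dict[str, Callable[..., None]] = {}

def command(name: str):
    """Register a function under a human-typeable command name."""
    def register(fn: Callable[..., None]) -> Callable[..., None]:
        COMMANDS[name] = fn
        return fn
    return register

@command("extrude")
def extrude(distance: float = 1.0) -> None:
    print(f"extruding selection by {distance}")

@command("fillet")
def fillet(radius: float = 0.5) -> None:
    print(f"filleting edges with radius {radius}")

def run_command(line: str) -> None:
    """What the text field at the top of the viewport would call."""
    name, *args = line.split()
    COMMANDS[name](*(float(a) for a in args))

run_command("extrude 2.5")  # same effect as clicking the Extrude button
```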

I somewhat agree with you, especially with the idea that one could identify a common abstraction that an LLM could later piggyback on top of.

Genuine question though - have you implemented an AI assistant/chat interface recently using LLMs on top of a UI?

I agree it can be a rabbit hole, but I just got through doing it on an app and there were definitely some things it really made way simpler and some complex scenarios that I'm not sure could have been done any more simply.

I built a chat interface in 2017 (this was with ChatScript dialog trees, with hole-filling and semantic search) that was ostensibly meant to save our data scientists from redundant work: before spending all day writing a SQL script, they could describe the job of the script and see if one already existed. The chatbot would then ask for the parameters the script required, run the job, and present a CSV of the returned data.
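
Roughly the shape of that flow, as a hypothetical sketch (the script catalog, the matching step, and every name here are invented for illustration):

```python
# Hypothetical catalog of existing scripts: description, required parameters,
# and a callable that runs the job and returns rows for the CSV.
SCRIPTS = {
    "daily_active_users": {
        "description": "Count distinct active users per day for a date range",
        "params": ["start_date", "end_date"],
        "run": lambda start_date, end_date: [("2017-01-01", 1234)],
    },
}

def find_script(request: str):
    """Stand-in for the semantic-search step: match a description to a script."""
    for name, spec in SCRIPTS.items():
        if any(word in spec["description"].lower() for word in request.lower().split()):
            return name, spec
    return None, None

def run_job(request: str, answers: dict):
    name, spec = find_script(request)
    if spec is None:
        return None  # no existing script; a data scientist writes a new one
    missing = [p for p in spec["params"] if p not in answers]
    if missing:
        raise ValueError(f"chatbot still needs: {missing}")  # the hole-filling step
    return spec["run"](**{p: answers[p] for p in spec["params"]})

rows = run_job("how many active users per day?",
               {"start_date": "2017-01-01", "end_date": "2017-01-31"})
print(rows)
```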

As we collected user feedback and refined the UX, we got closer and closer to an option tree that could be better represented by a drop-down menu. It was kind of depressing, but I learned that the actual job of that R&D wasn't to come up with a superintelligent chatbot that replaced data scientists; it was to build the infrastructure that let data scientists put their Python scripts in a common repository for re-use, without re-installing everything locally and screwing around with pyenvs.

Anyway, I'm also traumatized by my involvement with a YC startup that actually had a very good (if ENRONish) product around peer-to-peer energy futures trading, which completely fell apart when investors demanded they make it "AI".

Cool! Yeah, that's the kind of UI/UX I meant.

I agree about finding the right abstraction, and it's tough to strike the balance. In our data pipeline app, what we did was expose the app's key core functionality so the assistant can use it, and implement a handful of basic agents out of the box, including one default agent that could shell out work to the others. We also made it easy, as an extension point, for users to add a new agent that uses the core functionality/tools just by defining the agent in a markdown file.
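
A hypothetical sketch of that extension point -- parsing a markdown agent definition into a name, a tool list, and a system prompt. The file format and field names here are invented, not the actual ones from our app:

```python
def load_agent(markdown: str) -> dict:
    """Parse a (hypothetical) markdown agent definition of the form:

        # Reconciliation Agent
        tools: run_pipeline, query_catalog
        ---
        You reconcile yesterday's loads against source counts...
    """
    lines = markdown.splitlines()
    name = lines[0].lstrip("# ").strip()
    tools, prompt_lines, in_prompt = [], [], False
    for line in lines[1:]:
        if line.strip() == "---":
            in_prompt = True                      # everything after --- is the prompt
        elif in_prompt:
            prompt_lines.append(line)
        elif line.startswith("tools:"):
            tools = [t.strip() for t in line.split(":", 1)[1].split(",")]
    return {"name": name, "tools": tools, "system_prompt": "\n".join(prompt_lines).strip()}

agent = load_agent(
    "# Reconciliation Agent\n"
    "tools: run_pipeline, query_catalog\n"
    "---\n"
    "You reconcile yesterday's loads against source counts and report mismatches."
)
print(agent["name"], agent["tools"])
```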

We found it useful to start small with the critical use cases that saved the most time, while still thinking in terms of building blocks.

Because the AI assistant's responses come back and are processed by the UI, we found we could give the LLM our UI docs as well as knowledge about UI element IDs, etc., so it could respond with input commands that would drive the UI.

This way, we could provide the LLM with a prompt whose context includes things like: what page/view the user is on, what their intent is, what tools are available in general, what sub-agents are available for specialized tasks, etc.
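
As a rough, hypothetical sketch of what that request/response loop can look like (the JSON shape, action names, and element IDs are all made up, not our actual protocol):

```python
import json

def build_context(page: str, intent: str, tools: list[str], agents: list[str]) -> str:
    """Assemble the context the assistant sees alongside the UI docs."""
    return json.dumps({
        "current_view": page,       # which page/view the user is on
        "user_intent": intent,      # what they said they want to do
        "available_tools": tools,   # general-purpose tools
        "sub_agents": agents,       # specialized agents it can hand work to
    })

prompt_context = build_context(
    page="part_entry",
    intent="fix spelling in the description for part 12345",
    tools=["lookup_part", "update_field"],
    agents=["inventory_agent"],
)

# The model is instructed to answer with a structured UI command, which the
# front end validates against known element IDs before executing it.
response = '{"action": "set_field", "element_id": "part-description", "value": "Corrected text"}'
command = json.loads(response)
assert command["action"] in {"navigate", "set_field", "click"}
```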

Please don't let my suggestions sway you away from core progress in your app (take them with a grain of salt). But it's great you're already experimenting -- keep your eyes open for the use cases where it genuinely accelerates a workflow.

Another HNer mentioned people not reading docs -- that's a low-hanging-fruit use case we had too: "how do I use this view?", "what does this field mean?", or retrieving information from other parts of the app without having to navigate away. It can save having to hunt for answers in a doc or click around elsewhere.

Edit: perhaps a useful exercise - imagine a workflow of "talking to the app to achieve a task" as a way to explore.

"Hey ERP, open the part entry screen for part 12345"

"Hey ERP, can you update the description for part 12345 to correct the spelling error?"

"Hey ERP, how many of widget XYZ are in stock? If there are enough in stock, can you transfer quantity 10 from warehouse A to B?"

"Hey ERP, how do I cancel a sales order?"

"Hey ERP, how does this screen work?"

I think if you break these down, you'll find common abstractions that map to features, API endpoints, user interface sequences and interactions, triggering workflows, looking things up in docs, etc.
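
For example, the stock-check-and-transfer request above might decompose into something like this (the tool names, signatures, and workflow are hypothetical, just to show the shape of the breakdown):

```python
def handle_stock_transfer(part: str, qty: int, src: str, dst: str) -> str:
    """Hypothetical plan the assistant produces for:
    'How many of widget XYZ are in stock? If enough, transfer 10 from A to B.'
    """
    on_hand = get_on_hand(part)                          # API endpoint / DB lookup
    if on_hand < qty:
        return f"Only {on_hand} of {part} in stock; transfer not started."
    transfer_id = create_transfer(part, qty, src, dst)   # triggers an existing workflow
    open_screen("transfer_detail", record=transfer_id)   # UI navigation sequence
    return f"Transferred {qty} of {part} from {src} to {dst} (order {transfer_id})."

# Stubs standing in for the real ERP integrations:
def get_on_hand(part: str) -> int: return 42
def create_transfer(part: str, qty: int, src: str, dst: str) -> str: return "TR-1001"
def open_screen(name: str, **kwargs) -> None: print(f"navigating to {name} with {kwargs}")

print(handle_stock_transfer("XYZ", 10, "A", "B"))
```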