I've been kicking around an idea for a similar open source project, with the caveats that:
1. I'd like the backend to be configured for any LLM the user might happen to have access to (be that the API for a paid service or something locally hosted on-prem).
2. I'm also wondering how feasible it is to hook it up to a touchscreen running on some hopped-up Raspberry Pi platform so that it can be interacted with like an Alexa device or any of the similar offerings from other companies. Ideally, that means voice controls as well, which are potentially another technical problem (OpenAI's API will accept an audio file, but for most other services you'd have to do speech-to-text before sending the prompt off to the API).
3. I'd like to make the integrations extensible. Calendar, weather, but maybe also Homebridge, Spotify, etc. I'm wondering if MCP servers are the right avenue for that.
I don't have the bandwidth to commit a lot of time to a project like this right now, but if anyone else is heading in this direction I'd love to participate.
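For what it's worth, points 1 and 2 could be sketched roughly like this: each backend declares whether it takes audio directly, and anything that doesn't gets a speech-to-text pass first. Everything here is hypothetical (`Backend`, `EchoBackend`, `ask`, and the `transcribe` callable are made-up names, not any real service's API); a real backend would wrap an HTTP client or a local model.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Backend(Protocol):
    """Hypothetical interface every LLM backend would implement."""

    accepts_audio: bool

    def complete(self, prompt: str) -> str: ...

    def complete_audio(self, audio: bytes) -> str: ...


@dataclass
class EchoBackend:
    """Stand-in backend for illustration only; it just echoes its input."""

    name: str
    accepts_audio: bool = False

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"

    def complete_audio(self, audio: bytes) -> str:
        if not self.accepts_audio:
            raise NotImplementedError(f"{self.name} does not take audio")
        return f"[{self.name}] <{len(audio)} bytes of audio>"


def ask(backend: Backend, audio: bytes, transcribe: Callable[[bytes], str]) -> str:
    """Send audio straight through if the backend accepts it, else transcribe first."""
    if backend.accepts_audio:
        return backend.complete_audio(audio)
    return backend.complete(transcribe(audio))
```

Swapping providers then comes down to passing a different `Backend` object to `ask`; the speech-to-text function is injected the same way, so it can be local or hosted.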
I've created exactly this for myself: https://v3rtical.tech/public/sshot.png
It runs locally, but it uses API keys for various LLMs. Currently I much prefer QwQ-32B hosted at Groq. Very fast, pretty smart. Various tools use various LLMs. It can currently generate 3 types of documents I need in my daily work (work reports, invoices, regulatory time-sheets).
It has weather integration. It can parse invoices and generate QR codes for easy mobile banking payments. It can work with my calendars.
Next I plan to do the email integration. But I want to do it properly. This means locally synchronized, indexable IMAP mail. Might evolve into an actually usable desktop email client (the existing ones are all awful). We'll see...
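To illustrate what "indexable" could mean here, a toy sketch (not the actual design of this project): raw messages, as they might be fetched with `imaplib`, are parsed with the standard `email` module and fed into a small in-memory inverted index. `MailIndex` and its methods are hypothetical names, and a real client would persist the index and handle multipart and encoded bodies.

```python
import re
from collections import defaultdict
from email import message_from_string


class MailIndex:
    """Tiny in-memory inverted index over message subjects and plain-text bodies."""

    def __init__(self):
        self.subjects = {}             # uid -> subject line
        self.index = defaultdict(set)  # word -> set of uids containing it

    def add(self, uid: int, raw: str) -> None:
        msg = message_from_string(raw)
        subject = msg.get("Subject", "")
        # Only plain single-part bodies are indexed in this sketch.
        body = "" if msg.is_multipart() else msg.get_payload()
        self.subjects[uid] = subject
        for word in re.findall(r"[a-z0-9]+", f"{subject} {body}".lower()):
            self.index[word].add(uid)

    def search(self, query: str) -> list:
        """Return sorted uids of messages containing every query term."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)
```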
You might want to take a look at SillyTavern. Supports multiple backends, accepts voice input, and has a plugin system.
Also Open WebUI. It's a very nice piece of software that provides a ChatGPT/Claude-like interface, but with lots of extra features.
https://docs.openwebui.com/
I keep hearing about it but never got around to checking it out; the name suggests that it may be a waste of time. Maybe it’s a fantastic project but the name lets it down?
You are on Hacker News, typing on an Apple, listening to Daft Punk, reading an article about Steven, the AI butler hosted on Val Town, and the comment chain you're replying to talks about using self-hosted models (probably Llama) and a Raspberry Pi, yet SillyTavern is the name that trips you up?
SillyTavern started out as a roleplaying frontend.
As in "you meet a person at a tavern" and then you start chatting.
People contribute different character personalities to the project, sometimes with avatars, and I think some can even change avatars based on their "mood".
Having multiple backends can be a good approach, with various LLMs for different specialized tasks. I haven't tried it yet, but WilmerAI is an option for routing your inputs to the appropriate LLM, and it's said to work well with SillyTavern.
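A toy illustration of the routing idea (not WilmerAI's actual mechanism, which I haven't looked into): pick a backend per task category with plain keyword matching. The categories, keywords, and backend names below are all made up.

```python
import re

# Hypothetical task categories and the keywords that trigger them.
ROUTES = {
    "code": {"python", "function", "bug", "compile"},
    "calendar": {"meeting", "schedule", "appointment"},
}

# Hypothetical backend names; in practice these would map to client objects.
BACKENDS = {"code": "coder-model", "calendar": "small-fast-model"}
DEFAULT_BACKEND = "general-model"


def route(prompt: str) -> str:
    """Return the backend name for the first category whose keywords appear."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    for category, keywords in ROUTES.items():
        if words & keywords:
            return BACKENDS[category]
    return DEFAULT_BACKEND
```

A real router would more likely use a small classifier model instead of keywords, but the shape (classify, then look up a backend) stays the same.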
I also want an OSS framework that lets me extend it with my own scripting/modules, and is focused around being an assistant for me and my family. There's a shared set of features (memory storage/retrieval, integrations with chat/email/etc. interfaces, syncing to calendar/Notion/etc., notifications), and putting those into an OSS framework would be really powerful.
I also don't have time to run such a thing but would be up for helping and giving money for it. I'm working on other things including a local-first decentralized database/object store that could be used as storage, similar to OrbitDB, though it's not yet usable.
Mostly I've just been unhappy with the choice between a heavily constrained chat interface and having to create my own full agent framework like the OP did.
So something like MCP, but with a slightly different, more focused, scope?
Why not use a smartphone for the user interface?