I’m a bit confused by what you’re offering. Is it a voice assistant / AI as described on your GitHub? Or is it a more general-purpose LLM?

How does the RAG fit in? Voice-to-RAG seems a bit random as a feature.

I don’t mean to come across as dismissive, I’m genuinely confused as to what you’re offering.

RunAnywhere builds software that makes AI models run fast locally on devices instead of sending requests to the cloud.

Right now, our focus is Apple Silicon.

Today there are two parts:

MetalRT - our proprietary inference engine for Apple Silicon. It speeds up local LLM, speech-to-text, and text-to-speech workloads. We’re expanding model coverage over time, with more modalities and broader support coming next.

RCLI - our open-source CLI that shows this in practice. You can talk to your Mac, query local docs, and trigger actions, all fully on-device.

So the simplest way to think about us is: we’re building the runtime / infrastructure layer for on-device AI, and RCLI is one example of what that enables.

Longer term, we want to bring the same approach to more chips and device types, not just Apple Silicon.

For people asking whether the speedups are real, we’ve published our benchmark methodology and results here:

LLM: https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...

Speech: https://www.runanywhere.ai/blog/metalrt-speech-fastest-stt-t...

From the TFA: Document Intelligence (RAG): Ingest docs, ask questions by voice — ~4ms hybrid retrieval.

Seems pretty clear. You can supply documents to the model as input and then verbally ask questions about them.
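For anyone unfamiliar with the term, "hybrid retrieval" generally means combining a lexical (keyword) signal with a vector-similarity signal when ranking documents. Below is a minimal, illustrative sketch of that idea in Python with toy data and a stand-in "embedding" (a character-count vector). It is not RunAnywhere's implementation; the corpus, scoring weights, and `embed` function are all made up for illustration.

```python
import math
from collections import Counter

# Toy corpus standing in for ingested documents (illustrative only).
DOCS = [
    "MetalRT accelerates local LLM inference on Apple Silicon",
    "RCLI lets you query local documents by voice",
    "Text-to-speech and speech-to-text run fully on-device",
]

def keyword_score(query, doc):
    """Lexical signal: fraction of query terms that appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def embed(text):
    """Stand-in 'embedding': a bag-of-characters count vector.
    A real system would use a learned embedding model here."""
    return Counter(text.lower())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_retrieve(query, docs, alpha=0.5):
    """Blend lexical and vector scores; return docs ranked best-first."""
    q_vec = embed(query)
    scored = [
        (alpha * keyword_score(query, d) + (1 - alpha) * cosine(q_vec, embed(d)), d)
        for d in docs
    ]
    return [d for _, d in sorted(scored, key=lambda t: t[0], reverse=True)]

print(hybrid_retrieve("query local documents", DOCS)[0])
```

The voice part is just the front end: speech-to-text turns your question into the query string, retrieval pulls the relevant chunks, and the LLM answers from them, all on-device.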

I came to the comments here to see if anyone had worked out what it is, so you're not alone.