RCLI: On‑Device Voice AI for macOS – Zero‑Cloud, Fast

What is RCLI?

RCLI (RunAnywhere Command‑Line Interface) is a fully local, open‑source voice assistant for macOS. It bundles a Speech‑to‑Text (STT) engine, a large language model (LLM), and a Text‑to‑Speech (TTS) engine, all running on Apple Silicon’s GPU through the proprietary MetalRT inference engine. The result is a speech‑activated Mac that can control applications, retrieve information from your local documents, and respond in real time—all without sending data to the cloud.

Key points:

  • 38 macOS actions (play Spotify, adjust volume, take screenshots, open URLs, and more) reachable by voice or text.
  • Local Retrieval‑Augmented Generation (RAG) that indexes PDF, DOCX, and plain‑text files with hybrid vector + BM25 search in ~4 ms.
  • Sub‑200 ms end‑to‑end latency from speaking to hearing the reply.
  • Zero reliance on external APIs; no API keys required.
  • A terminal‑based interactive UI for managing models, actions, and the MetalRT engine.

Installation

RCLI is available through Homebrew or a single‑script install. For the quickest setup, run:

curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash

Or use Homebrew:

brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
brew install rcli
rcli setup   # downloads ~1 GB of local models on first run

If your Mac is running macOS 13+ with an Apple Silicon chip (M3 or later recommended), the MetalRT GPU engine will be used automatically. On M1/M2 machines, RCLI falls back to the fast open‑source llama.cpp inference implementation.
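The engine fallback described above can be sketched as a small selection routine. This is an illustrative sketch, not RCLI's actual code; the function names (`detect_chip`, `pick_engine`) and the string match on the chip name are assumptions made for the example.

```python
import subprocess

def detect_chip() -> str:
    """Return the CPU brand string (e.g. 'Apple M3 Max') via sysctl,
    which is available on macOS; returns '' elsewhere."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", "machdep.cpu.brand_string"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return ""

def pick_engine(chip: str) -> str:
    """Mirror the documented fallback: MetalRT on M3 or later,
    llama.cpp on M1/M2 or unknown hardware."""
    if any(gen in chip for gen in ("M3", "M4", "M5")):
        return "metalrt"
    return "llamacpp"
```

On an M3 Max, `pick_engine(detect_chip())` would choose `"metalrt"`; on M1/M2 or non-Apple hardware it falls back to `"llamacpp"`.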

Quick‑Start Commands

Command                  What it does
rcli                     Launches the interactive TUI (push‑to‑talk or text input)
rcli listen              Continuous voice mode (you just speak)
rcli ask "open Safari"   One‑shot text or voice command
rcli metalrt             Manages the MetalRT GPU engine
rcli llamacpp            Manages the llama.cpp engine

In the TUI you can press A to enable or disable actions, M to view models, R to import documents for RAG, and X to clear conversation context.

Features in Detail

1. Full‑Featured Voice Pipeline

  • VAD – Silero voice activity detection.
  • STT – Whisper Tiny/Small/Medium or Zipformer streaming.
  • LLM – Qwen3, LFM2 variants, or Qwen3.5; all loaded into MetalRT with Flash Attention.
  • TTS – Kokoro voices or alternative TTS engines.
  • Tool Calling – Works with Qwen3 and LFM2 native tool calling for macOS actions.
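The stage order above (VAD gating audio, STT transcribing, the LLM replying, TTS speaking) can be modeled as a simple composable pipeline. This is a toy sketch, not RCLI's implementation; the class name and callable signatures are assumptions chosen to keep the example engine‑agnostic.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class VoicePipeline:
    """Each stage is injected as a plain callable, so any VAD/STT/LLM/TTS
    backend (Silero, Whisper, Qwen, Kokoro, ...) could slot in."""
    vad: Callable[[bytes], bool]   # is speech present in the audio?
    stt: Callable[[bytes], str]    # audio -> transcript
    llm: Callable[[str], str]      # transcript -> reply text
    tts: Callable[[str], bytes]    # reply text -> audio

    def run(self, audio: bytes) -> Optional[bytes]:
        if not self.vad(audio):    # silence: skip the expensive stages
            return None
        return self.tts(self.llm(self.stt(audio)))
```

The VAD-first ordering matters for latency: silence never reaches the STT or LLM stages at all.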

2. 38 Mac Actions

RCLI maps intents from the LLM to AppleScript or shell commands. Common categories:

  • Productivity – create notes, reminders, or run shortcuts.
  • Communication – send messages, start FaceTime calls.
  • Media – control Spotify, Apple Music, adjust volume.
  • System – open/quit apps, lock screen, toggle dark mode.
  • Web – search, open URLs or maps.
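The intent‑to‑command mapping can be pictured as a lookup table of command templates. This is a hypothetical sketch: the real registry covers 38 actions, and the intent names and templates below are invented for illustration.

```python
# Hypothetical intent table; RCLI's real action registry is larger.
# Each entry is a shell/AppleScript command template with an {arg} slot.
ACTIONS = {
    "open_app":    "osascript -e 'tell application \"{arg}\" to activate'",
    "set_volume":  "osascript -e 'set volume output volume {arg}'",
    "lock_screen": "pmset displaysleepnow",
}

def build_command(intent: str, arg: str = "") -> str:
    """Render the shell command for an intent; raises KeyError on
    intents the table does not know."""
    return ACTIONS[intent].format(arg=arg)
```

A tool call like `open_app("Safari")` from the LLM would render to `osascript -e 'tell application "Safari" to activate'`, which the assistant can then execute.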

3. Local RAG

Index your folders with rcli rag ingest ~/Documents. Queries over the index are answered via a hybrid retrieval engine that stays entirely on‑device. With ~4 ms retrieval over thousands of chunks, document‑based Q&A happens in real time.
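Hybrid vector + BM25 retrieval typically blends a lexical score and a semantic similarity per chunk. The sketch below shows one common way to do that (min‑max normalizing BM25, then mixing with a weight); the blend weight and normalization are assumptions, not details taken from RCLI.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_scores(bm25, vec_sims, alpha=0.5):
    """Blend lexical (BM25) and semantic (cosine) scores per chunk.
    BM25 is min-max normalized so both signals live on [0, 1]."""
    lo, hi = min(bm25), max(bm25)
    span = (hi - lo) or 1.0
    norm = [(s - lo) / span for s in bm25]
    return [alpha * n + (1 - alpha) * v for n, v in zip(norm, vec_sims)]
```

The blend catches both exact keyword matches (which pure embeddings can miss) and paraphrases (which pure BM25 misses).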

4. Benchmarks

  • MetalRT decode throughput: up to 550 tokens/s, outperforming llama.cpp and Apple MLX on M3 Max.
  • Real‑time factor: MetalRT STT is 714× faster than real‑time, and the overall pipeline stays below 200 ms.
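For readers unfamiliar with the metric: a real‑time factor of 714× means the STT stage transcribes audio 714 times faster than the audio's own duration. A minimal definition:

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """'Times faster than real time': audio duration divided by the time
    spent transcribing it. At 714x, one second of audio takes ~1.4 ms."""
    return audio_seconds / processing_seconds
```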

How to Contribute

RCLI welcomes pull requests. Contribute by:

  • Adding new macOS actions or improving existing ones.
  • Adding support for more models (LLMs, STT, TTS).
  • Improving the TUI or adding new documentation.

See CONTRIBUTING.md for build instructions.

Is it Free?

The repository is licensed under MIT. The MetalRT GPU engine itself is proprietary but can be used freely for personal or commercial projects after contacting the vendor.

Summary

RCLI offers a compelling on‑device voice solution for macOS that removes the need for cloud services and API keys. With a growing list of locally‑executed actions, real‑time RAG, and lightning‑fast MetalRT inference, it’s an ideal project for developers looking to build privacy‑first voice assistants or for power users who want instant control over their Mac.

Next Step: clone the repo, install via Homebrew, and experiment with voice commands. Share your custom actions or voice prompts on community forums and help push the project forward.
