Local, real-time STT on Apple Silicon.

Run speech models instantly with no complex setup. Built on MLX for high-performance streaming and real-time processing, entirely on-device. 100% offline: no data leaves your machine.

$ uv run https://dictate.sh/stt.py

INFO: Fetching dependencies...
INFO: Loading mlx-community/Qwen3-ASR-0.6B-8bit...
> "Running locally with no lag."
Intent: SYSTEM_CHECK

Performance meets simplicity.

One-Line Install

Zero configuration. Just uv run and you're streaming in seconds.
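
Under the hood, uv runs the script straight from a URL and resolves its dependencies from inline script metadata (PEP 723), so there is no virtualenv or pip step. A minimal sketch of such a header; the actual dependency list in stt.py may differ:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "mlx",            # assumed dependencies; illustrative only
#     "numpy",
#     "sounddevice",
# ]
# ///

print("running in an ephemeral environment built by uv")

uv reads the block at the top of the file, builds an ephemeral environment with those packages, and executes the script inside it.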

MLX Powered

Optimized specifically for Apple Silicon hardware, delivering maximum efficiency.

Real-Time Stream

A low-latency architecture lets you see words the moment they are spoken, using a rolling-window buffer to prioritize speed.
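
The script's internals aren't shown on this page, but the idea behind a rolling-window buffer is simple: keep only the most recent few seconds of audio so each transcription pass stays cheap. A minimal sketch in Python; names and sizes are illustrative, not the actual implementation:

import numpy as np

SAMPLE_RATE = 16_000            # a common ASR sample rate (assumption)
WINDOW_SECONDS = 10             # keep only the most recent audio

class RollingBuffer:
    """Fixed-size audio window: old samples fall off as new ones arrive."""

    def __init__(self, seconds: int = WINDOW_SECONDS):
        self.max_samples = seconds * SAMPLE_RATE
        self._buf = np.zeros(0, dtype=np.float32)

    def push(self, chunk: np.ndarray) -> None:
        # Append the new chunk, then keep only the trailing window.
        self._buf = np.concatenate([self._buf, chunk])[-self.max_samples:]

    def window(self) -> np.ndarray:
        return self._buf

Transcribing buffer.window() on a fixed interval keeps latency bounded no matter how long the session runs.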

Live Processing

Analyze intent with models like Qwen3 on the fly as the audio streams.
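
Intent analysis rides on a local LLM. A minimal sketch using the mlx-lm generate API; the prompt, label set, and model choice here are illustrative, not the script's actual implementation (the demo above shows SYSTEM_CHECK as one such label):

from mlx_lm import load, generate

# Hypothetical prompt and labels; the script's actual prompt may differ.
INTENT_PROMPT = (
    "Classify the intent of this utterance as one of "
    "[SYSTEM_CHECK, DICTATION, COMMAND]: {text}\nIntent:"
)

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

def classify_intent(text: str) -> str:
    return generate(model, tokenizer,
                    prompt=INTENT_PROMPT.format(text=text),
                    max_tokens=8).strip()

print(classify_intent("Running locally with no lag."))  # e.g. SYSTEM_CHECK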

Advanced control.

Swap out ASR or LLM models, fine-tune VAD sensitivity for noisy environments, select specific audio hardware, and pipe clean text output directly into other CLI tools.

# Use a specific input device
$ uv run stt.py --list-devices
$ uv run stt.py --device 1

# Switch models (ASR & LLM)
$ uv run stt.py --model mlx-community/Qwen3-ASR-1.7B-8bit
$ uv run stt.py --analyze --llm-model mlx-community/Mistral-7B-Instruct-v0.2-4bit

# Tune latency & VAD
$ uv run stt.py --transcribe-interval 0.2 --vad-silence-ms 300
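
As a mental model, --vad-silence-ms sets how much trailing silence closes an utterance, and --transcribe-interval sets how often the window is re-transcribed. A minimal energy-based sketch of the silence check; the threshold and names are illustrative, not the script's actual VAD:

import numpy as np

SAMPLE_RATE = 16_000
VAD_SILENCE_MS = 300        # mirrors --vad-silence-ms above
RMS_THRESHOLD = 0.01        # energy below this counts as silence; tune per mic

def utterance_ended(audio: np.ndarray) -> bool:
    """True once the trailing VAD_SILENCE_MS of audio is quiet."""
    n_tail = int(SAMPLE_RATE * VAD_SILENCE_MS / 1000)
    if audio.size < n_tail:
        return False        # not enough audio yet to judge
    tail = audio[-n_tail:].astype(np.float32)
    return float(np.sqrt(np.mean(tail ** 2))) < RMS_THRESHOLD

Lower values finalize utterances sooner; higher values tolerate longer pauses mid-sentence.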

# Save transcripts to a file
$ uv run stt.py | tee transcripts.txt
$ uv run stt.py > transcripts.txt
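
Because the transcript is plain, line-oriented text on stdout, any downstream tool can consume it. A minimal sketch of a consumer script (consume.py is a hypothetical name), assuming stt.py emits one finalized utterance per line:

import sys

for line in sys.stdin:
    utterance = line.strip()
    if utterance:
        # Replace this with whatever downstream processing you need.
        print(f"[{len(utterance.split())} words] {utterance}")

$ uv run stt.py | python consume.py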

Models.

Qwen3-ASR

The current default. Supports robust speech recognition across 52 languages and dialects. Built on the Qwen3-Omni foundation, it delivers high accuracy in complex acoustic environments.

Hugging Face →