Models & Engines

Vowen supports multiple transcription engines, both local (offline) and cloud-based. This guide helps you pick the right one.

Quick Recommendations

Use Case	Recommended Model	Why
Quick notes (macOS)	Base.en or Parakeet	Fast, good accuracy, works offline
Quick notes (Windows)	Groq Whisper Turbo	Fast cloud model, free tier
Professional writing	Large v3 Turbo + AI Enhancement	Best accuracy + polished output
Non-English	Large v3 or Groq Large v3	Best multilingual accuracy
Maximum privacy	Any local model	Nothing leaves your device
Real-time preview	Parakeet, Deepgram, Soniox, or Cartesia Ink 2	Shows text as you speak
Meetings (diarization)	Deepgram Nova 3 or AssemblyAI	Identifies who said what

Local Models (Offline)

These run entirely on your machine. No internet required. All local models are downloaded on demand from within the app; nothing is bundled with the installer. The Base model is the default and is offered for download during onboarding.

Whisper Models

Based on OpenAI’s Whisper, supporting 99 languages.

Model	Size	Speed	Accuracy	Best For
Tiny	78 MB	Fastest	Basic	Quick notes, testing
Tiny.en	78 MB	Fastest	Good (English)	Fast English dictation
Base.en	148 MB	Fast	Good	General English use
Base (default)	148 MB	Fast	Good	Multilingual basics
Small	488 MB	Medium	Great	Professional work
Small.en	488 MB	Medium	Great (English)	Detailed English
Medium	1.5 GB	Slow	Excellent	High-quality output
Medium.en	1.5 GB	Slow	Excellent (English)	Long-form English
Large v3	3 GB	Slowest	Best	Maximum accuracy
Large v3 Turbo	1.6 GB	Medium-Fast	Excellent	Best balance

Models with the .en suffix are English-only and slightly more accurate for English than their multilingual counterparts.
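
As a rough illustration of how to read the table above, the sketch below picks the most accurate Whisper model that fits a download-size budget. The function name, the numeric rankings, and the tie-breaking order are assumptions made for this example, not part of Vowen; the sizes and labels are copied from the table, with GB values converted at roughly 1 GB = 1000 MB.

```python
# Sizes, speed, and accuracy labels are copied from the Whisper table above.
# GB values are converted at 1 GB = 1000 MB, which is an approximation.
WHISPER_MODELS = [
    # (name, size_mb, speed, accuracy)
    ("Tiny", 78, "Fastest", "Basic"),
    ("Tiny.en", 78, "Fastest", "Good"),
    ("Base.en", 148, "Fast", "Good"),
    ("Base", 148, "Fast", "Good"),
    ("Small", 488, "Medium", "Great"),
    ("Small.en", 488, "Medium", "Great"),
    ("Medium", 1500, "Slow", "Excellent"),
    ("Medium.en", 1500, "Slow", "Excellent"),
    ("Large v3", 3000, "Slowest", "Best"),
    ("Large v3 Turbo", 1600, "Medium-Fast", "Excellent"),
]

# Hypothetical numeric rankings for the table's labels (higher is better).
ACCURACY_RANK = {"Basic": 0, "Good": 1, "Great": 2, "Excellent": 3, "Best": 4}
SPEED_RANK = {"Slowest": 0, "Slow": 1, "Medium": 2,
              "Medium-Fast": 3, "Fast": 4, "Fastest": 5}


def pick_whisper_model(max_size_mb, english_only=False):
    """Return the name of the most accurate model within the size budget.

    English-only (.en) models are considered only when english_only is True.
    Ties on accuracy go to the .en variant, then to the faster model.
    Returns None if no model fits.
    """
    candidates = [
        m for m in WHISPER_MODELS
        if m[1] <= max_size_mb and (english_only or not m[0].endswith(".en"))
    ]
    if not candidates:
        return None

    def score(model):
        name, _size, speed, accuracy = model
        return (ACCURACY_RANK[accuracy], name.endswith(".en"), SPEED_RANK[speed])

    return max(candidates, key=score)[0]
```

For example, with a 2 GB budget and multilingual use, this sketch prefers Large v3 Turbo over Medium because both are rated Excellent and Turbo is faster.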

Parakeet TDT 0.6B

NVIDIA’s streaming-capable model. Supports 25 European languages with auto-detection.

Property	macOS	Windows
Format	CoreML	ONNX (int8)
Size	~1 GB	~478 MB
Streaming	Yes	Yes
Languages	25	25

Parakeet is excellent for real-time transcription preview and European languages. The first 1-2 transcriptions after launch may be slower as the model loads into memory.

Recent Parakeet improvements:

Voice activity detection (VAD) now drives streaming, for cleaner utterance boundaries and more reliable real-time transcription
Auto-translate to English works with Parakeet (Pro)
On Windows, long recordings are automatically chunked so lengthy audio transcribes reliably
Accidental empty taps (audio under ~0.3 seconds) are discarded silently — they won’t surface an error or leave an entry in your Voice Log
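
The empty-tap filter and long-recording chunking described above can be pictured with a small sketch. This is not Vowen's implementation: the function name and the 30-second chunk length are hypothetical, and a real pipeline would more likely cut at VAD-detected pauses than at fixed offsets. Only the ~0.3 second threshold comes from the text.

```python
import math

MIN_AUDIO_SECONDS = 0.3       # threshold from the text above (approximate)
ASSUMED_CHUNK_SECONDS = 30.0  # hypothetical chunk length, not a Vowen setting


def plan_chunks(duration_s, chunk_s=ASSUMED_CHUNK_SECONDS, min_s=MIN_AUDIO_SECONDS):
    """Return (start, end) windows in seconds for a recording.

    Recordings shorter than min_s are treated as accidental taps and
    produce no windows at all, so nothing is transcribed or logged.
    """
    if duration_s < min_s:
        return []
    count = math.ceil(duration_s / chunk_s)
    # Compute each start from its index to avoid accumulating float error.
    return [(i * chunk_s, min((i + 1) * chunk_s, duration_s)) for i in range(count)]
```

A 70-second recording yields three windows under these assumptions, while a 0.2-second tap yields none.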

Cloud Models

These send audio to a third-party API. They require an internet connection and an API key.

Available Cloud Models

With cloud models, your audio is sent to the provider, processed, and the transcription is returned. Some models stream text back as you speak (Real-time: Yes); others return the full transcript once you stop (Real-time: No).

Model	Provider	Languages	Real-time	Diarization	Free Tier
Whisper Large v3	Groq	99	No	Yes (macOS)¹	Yes (generous)
Whisper Turbo	Groq	99	No	Yes (macOS)¹	Yes (generous)
gpt-4o-transcribe	OpenAI	99	Yes	Yes	Paid
Gemini (Live + Flash)	Google	Many	Yes	Yes	Free API
Nova 2/3	Deepgram	99	Yes	Yes	$200 credit
Scribe v2	ElevenLabs	99	Yes	Yes	Limited
Universal	AssemblyAI	6 streaming / 99 batch	Yes	Yes	$50 credit
Voxtral Mini	Mistral	13	Yes	Yes	Free API
Saaras v3	Sarvam AI	22+	Yes	No	Limited
Soniox (STT v5)	Soniox	60+	Yes	No	Paid
Ink 2	Cartesia	English	Yes	No	Paid
Aurora	XAI	Various	Yes	Yes	Limited
Speechmatics	Speechmatics	39	Yes	Yes	Paid
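
To narrow the table down by capability, a simple filter such as the one below can help. It is only a sketch: the data structure and function name are invented for this example, and the values are transcribed from the Real-time and Diarization columns above. The "macos" marker reflects the footnote that Groq diarization is available on macOS only.

```python
# (name, real-time, diarization) transcribed from the cloud model table.
# Diarization is "yes", "no", or "macos" (available on macOS only).
CLOUD_MODELS = [
    ("Whisper Large v3", False, "macos"),
    ("Whisper Turbo", False, "macos"),
    ("gpt-4o-transcribe", True, "yes"),
    ("Gemini (Live + Flash)", True, "yes"),
    ("Nova 2/3", True, "yes"),
    ("Scribe v2", True, "yes"),
    ("Universal", True, "yes"),
    ("Voxtral Mini", True, "yes"),
    ("Saaras v3", True, "no"),
    ("Soniox (STT v5)", True, "no"),
    ("Ink 2", True, "no"),
    ("Aurora", True, "yes"),
    ("Speechmatics", True, "yes"),
]


def find_cloud_models(needs_realtime=False, needs_diarization=False, platform="macos"):
    """Return the names of cloud models that meet the stated requirements."""
    matches = []
    for name, realtime, diarization in CLOUD_MODELS:
        if needs_realtime and not realtime:
            continue
        if needs_diarization:
            if diarization == "no":
                continue
            if diarization == "macos" and platform != "macos":
                continue
        matches.append(name)
    return matches
```

For instance, asking for diarization on Windows excludes both Groq models, because their diarization is listed as macOS only.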

OpenAI (gpt-4o-transcribe) and Google Gemini are now real-time streaming providers, so they can power live transcription preview and Ask AI live during a meeting, not just file transcription. Gemini’s live sessions are capped at 15 minutes each, while its file transcription handles audio up to ~9.5 hours.
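
The two Gemini limits above lend themselves to a quick back-of-the-envelope check. The helper names below are hypothetical, and the sketch says nothing about how Vowen behaves when a live session reaches its cap; it only does the arithmetic implied by the 15-minute and ~9.5-hour figures.

```python
import math

GEMINI_LIVE_CAP_MINUTES = 15   # per live session, from the text above
GEMINI_FILE_CAP_HOURS = 9.5    # approximate file transcription limit


def live_sessions_needed(meeting_minutes):
    """Number of 15-minute live sessions needed to cover a meeting."""
    if meeting_minutes <= 0:
        return 0
    return math.ceil(meeting_minutes / GEMINI_LIVE_CAP_MINUTES)


def fits_file_transcription(audio_hours):
    """True if a recording is within the approximate file limit."""
    return audio_hours <= GEMINI_FILE_CAP_HOURS
```

A one-hour meeting would span four live sessions under this arithmetic, while the same recording fits comfortably within the file transcription limit.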

AssemblyAI Universal supports 6 languages in real-time streaming (English, Spanish, French, German, Italian, Portuguese) and 99 languages when used for batch transcription of pre-recorded files.

¹ Groq diarization runs on-device through a built-in Pyannote pipeline and is available on macOS only. See Diarization.

Cartesia Ink 2 is a streaming-first model tuned for the lowest word error rate and fastest live preview. It is English-only (more languages coming) and does not support diarization. Custom vocabulary is applied as post-processing rather than natively. It is recommended for real-time dictation.

Soniox runs its v5 models (real-time streaming for live dictation and an async model for file transcription), covering 60+ languages with low latency. Soniox does not provide speaker diarization; use a diarization-capable model (or on-device diarization on macOS) for meetings where you need to identify who said what.

Setting Up Cloud Models

Go to Settings > Models
Select a cloud model from the list
Enter your API key when prompted
The model is ready to use immediately

Pro tip: Groq is the most popular choice among Vowen users. It’s fast, accurate, and has a generous free tier that covers most daily use.

GPU Acceleration (Windows)

If you have an NVIDIA GPU, you can dramatically speed up local model transcription:

Go to Settings > Models
Scroll down to find “GPU Acceleration”
Download the CUDA acceleration module
Restart Vowen (or your system if needed)

With GPU acceleration, even the Large v3 model responds in 1-2 seconds on modern NVIDIA GPUs.

Choosing Between Local and Cloud

Factor	Local	Cloud
Privacy	Data never leaves device	Audio sent to provider
Speed (macOS)	Fast for small/medium models	Always fast
Speed (Windows)	Slow without GPU	Always fast
Accuracy	Good to excellent	Excellent
Internet	Not required	Required
Cost	Free	Free tier or paid API
Languages	99 (Whisper) / 25 (Parakeet)	Varies by provider
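
One way to read the comparison table is as a short decision rule. The sketch below is an interpretation, not a feature of Vowen: the function name, parameters, and return values are all assumptions for illustration. It encodes only what the table states, namely that local models keep data on the device and work offline, and that local transcription on Windows is slow without a GPU.

```python
def suggest_engine(needs_offline, needs_privacy, platform, has_nvidia_gpu=False):
    """Suggest "local", "cloud", or "either" from the comparison table.

    platform is "macos" or "windows" (hypothetical labels for this sketch).
    """
    # Privacy and offline use both rule out sending audio to a provider.
    if needs_offline or needs_privacy:
        return "local"
    # The table lists local speed on Windows as slow without a GPU.
    if platform == "windows" and not has_nvidia_gpu:
        return "cloud"
    # Otherwise both options are reasonable; choose by cost and languages.
    return "either"
```

When the result is "either", the remaining rows of the table (cost and language coverage) are the deciding factors.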
