Quick Recommendations
| Use Case | Recommended Model | Why |
|---|---|---|
| Quick notes (macOS) | Base.en or Parakeet | Fast, good accuracy, works offline |
| Quick notes (Windows) | Groq Whisper Turbo | Fast cloud model, free tier |
| Professional writing | Large v3 Turbo + AI Enhancement | Best accuracy + polished output |
| Non-English | Large v3 or Groq Large v3 | Best multilingual accuracy |
| Maximum privacy | Any local model | Nothing leaves your device |
| Real-time preview | Parakeet, Deepgram, or Soniox | Shows text as you speak |
| Meetings (diarization) | Deepgram Nova 3 or AssemblyAI | Identifies who said what |
Local Models (Offline)
These run entirely on your machine. No internet required. All local models are downloaded on demand from within the app; nothing is bundled with the installer. The Base model is the default and is offered for download during onboarding.
Whisper Models
Based on OpenAI’s Whisper; the multilingual models support 99 languages.
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| Tiny | 78 MB | Fastest | Basic | Quick notes, testing |
| Tiny.en | 78 MB | Fastest | Good (English) | Fast English dictation |
| Base.en | 148 MB | Fast | Good | General English use |
| Base (default) | 148 MB | Fast | Good | Multilingual basics |
| Small | 488 MB | Medium | Great | Professional work |
| Small.en | 488 MB | Medium | Great (English) | Detailed English |
| Medium | 1.5 GB | Slow | Excellent | High-quality output |
| Medium.en | 1.5 GB | Slow | Excellent (English) | Long-form English |
| Large v3 | 3 GB | Slowest | Best | Maximum accuracy |
| Large v3 Turbo | 1.6 GB | Medium-Fast | Excellent | Best balance |
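The following Python sketch illustrates how the Whisper table trades disk space against accuracy. It encodes the rows above and picks the most accurate model that fits a disk budget. Several parts are assumptions made for illustration: the numeric accuracy ranks, the rule that the larger download wins a tie within the same accuracy label, and the function name `pick_whisper_model`. Vowen does not expose such a function. English-only `.en` models are considered only when you dictate exclusively in English.

```python
from typing import NamedTuple, Optional

# Hypothetical numeric ranks for the table's qualitative accuracy labels (higher is better).
ACCURACY_RANK = {"Basic": 0, "Good": 1, "Great": 2, "Excellent": 3, "Best": 4}


class WhisperModel(NamedTuple):
    name: str
    size_mb: int
    accuracy: str
    english_only: bool


# Rows copied from the Whisper table; GB sizes converted at 1 GB = 1000 MB,
# and "Good (English)" style labels mapped to their base label.
WHISPER_MODELS = [
    WhisperModel("Tiny", 78, "Basic", False),
    WhisperModel("Tiny.en", 78, "Good", True),
    WhisperModel("Base.en", 148, "Good", True),
    WhisperModel("Base", 148, "Good", False),
    WhisperModel("Small", 488, "Great", False),
    WhisperModel("Small.en", 488, "Great", True),
    WhisperModel("Medium", 1500, "Excellent", False),
    WhisperModel("Medium.en", 1500, "Excellent", True),
    WhisperModel("Large v3", 3000, "Best", False),
    WhisperModel("Large v3 Turbo", 1600, "Excellent", False),
]


def pick_whisper_model(disk_budget_mb: int, english_only_dictation: bool) -> Optional[str]:
    """Return the most accurate model that fits the budget, or None if none fits."""
    candidates = [
        m for m in WHISPER_MODELS
        # .en models are only candidates if all dictation is in English.
        if m.size_mb <= disk_budget_mb and (english_only_dictation or not m.english_only)
    ]
    if not candidates:
        return None
    # Rank by accuracy label, then assume the larger download is more accurate,
    # then prefer the .en variant when everything else ties.
    best = max(
        candidates,
        key=lambda m: (ACCURACY_RANK[m.accuracy], m.size_mb, m.english_only),
    )
    return best.name
```

Under these assumptions, a 200 MB budget for English-only dictation returns Base.en, and a 2 GB budget returns Large v3 Turbo. Both results match the recommendations at the top of this page.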
Models with the `.en` suffix are English-only and slightly more accurate for English than their multilingual counterparts.
Parakeet TDT 0.6B
NVIDIA’s streaming-capable model. Supports 25 European languages with auto-detection.
| | macOS | Windows |
|---|---|---|
| Format | CoreML | ONNX (int8) |
| Size | ~1 GB | ~478 MB |
| Streaming | Yes | Yes |
| Languages | 25 | 25 |
- Voice activity detection (VAD) now drives streaming, for cleaner utterance boundaries and more reliable real-time transcription
- Auto-translate to English works with Parakeet (Pro)
- On Windows, long recordings are automatically chunked so lengthy audio transcribes reliably
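This page does not say which voice activity detector Vowen uses. To illustrate the general idea of VAD-driven utterance boundaries, the sketch below uses a simple energy threshold. It splits audio into short frames, treats frames whose RMS level exceeds a threshold as speech, and closes an utterance after a long enough run of silent frames. The function name `detect_utterances` and the frame size, threshold, and silence gap defaults are all hypothetical. Production VADs are usually model-based rather than a plain energy check.

```python
import math
from typing import List, Optional, Sequence, Tuple


def detect_utterances(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 0.02,
    min_silence_ms: int = 300,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of speech segments using an energy threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    # Number of consecutive silent frames that ends an utterance.
    silence_frames = max(1, min_silence_ms // frame_ms)
    segments: List[Tuple[int, int]] = []
    seg_start: Optional[int] = None
    last_voiced_end = 0
    silent_run = 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            if seg_start is None:
                seg_start = start  # speech begins
            last_voiced_end = start + len(frame)
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= silence_frames:
                # Pause is long enough: close the utterance at the last voiced frame.
                segments.append((seg_start, last_voiced_end))
                seg_start = None
                silent_run = 0
    if seg_start is not None:
        segments.append((seg_start, last_voiced_end))  # speech ran to the end
    return segments
```

A pause shorter than `min_silence_ms` keeps speech in one utterance, while a longer pause starts a new one. A streaming engine can use that boundary to finalize the text it has shown so far.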
Cloud Models
These send audio to a third-party API and require an internet connection and an API key.
Available Cloud Models
With cloud models, your audio is sent to the provider, processed, and the transcription is returned. Some models stream text back as you speak (Real-time: Yes); others return the full transcript once you stop (Real-time: No).
| Model | Provider | Languages | Real-time | Diarization | Free Tier |
|---|---|---|---|---|---|
| Whisper Large v3 | Groq | 99 | No | Yes (macOS)¹ | Yes (generous) |
| Whisper Turbo | Groq | 99 | No | Yes (macOS)¹ | Yes (generous) |
| gpt-4o-transcribe | OpenAI | 99 | Yes | Yes | Paid |
| Gemini (Live + Flash) | Google | Many | Yes | Yes | Free API |
| Nova 2/3 | Deepgram | 99 | Yes | Yes | $200 credit |
| Scribe v2 | ElevenLabs | 99 | Yes | Yes | Limited |
| Universal | AssemblyAI | 6 streaming / 99 batch | Yes | Yes | $50 credit |
| Voxtral Mini | Mistral | 13 | Yes | Yes | Free API |
| Saaras v3 | Sarvam AI | 22+ | Yes | No | Limited |
| Soniox | Soniox | Various | Yes | Yes | Paid |
| Aurora | xAI | Various | Yes | Yes | Limited |
| Speechmatics | Speechmatics | 39 | Yes | Yes | Paid |
OpenAI (`gpt-4o-transcribe`) and Google Gemini are now real-time streaming providers, so they can power live transcription preview and Ask AI live during a meeting, not just file transcription. Gemini’s live sessions are capped at 15 minutes each, while its file transcription handles audio up to ~9.5 hours.
AssemblyAI Universal supports 6 languages in real-time streaming (English, Spanish, French, German, Italian, Portuguese) and 99 languages when used for batch transcription of pre-recorded files.
¹ Groq diarization runs on-device through a built-in Pyannote pipeline and is available on macOS only. See Diarization.
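The Gemini and AssemblyAI limits above come down to simple arithmetic. In the sketch below, for example, a 40-minute meeting spans three Gemini live sessions (40 / 15, rounded up). The helper names and the ISO 639-1 language codes are hypothetical. This page does not say how Vowen behaves when a live session reaches its 15-minute cap.

```python
import math

# AssemblyAI Universal streams only these six languages (ISO 639-1 codes assumed).
ASSEMBLYAI_STREAMING_LANGUAGES = {"en", "es", "fr", "de", "it", "pt"}
GEMINI_LIVE_SESSION_MINUTES = 15  # cap per Gemini live session
GEMINI_FILE_MAX_HOURS = 9.5       # approximate file transcription limit


def assemblyai_mode(language: str) -> str:
    """Real-time streaming for the six supported languages, batch otherwise."""
    return "streaming" if language in ASSEMBLYAI_STREAMING_LANGUAGES else "batch"


def gemini_live_sessions(meeting_minutes: float) -> int:
    """How many 15-minute Gemini live sessions a meeting of this length spans."""
    return max(1, math.ceil(meeting_minutes / GEMINI_LIVE_SESSION_MINUTES))


def gemini_file_fits(recording_hours: float) -> bool:
    """Whether a recording is within Gemini's ~9.5-hour file transcription limit."""
    return recording_hours <= GEMINI_FILE_MAX_HOURS
```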
Setting Up Cloud Models
- Go to Settings > Models
- Select a cloud model from the list
- Enter your API key when prompted
- The model is ready to use immediately
Pro tip: Groq is the most popular choice among Vowen users. It’s fast, accurate, and has a generous free tier that covers most daily use.
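Vowen makes the provider calls for you once your key is entered. The sketch below only shows what a batch cloud request carries: the audio, the model name, and your API key. It assumes Groq’s OpenAI-compatible transcription endpoint and the model ID `whisper-large-v3-turbo`, and it does not reflect Vowen’s own code. The function name `build_transcription_request` is hypothetical. The function builds the request without sending it.

```python
import urllib.request
import uuid

# Assumption: Groq's OpenAI-compatible transcription endpoint.
GROQ_TRANSCRIPTION_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_transcription_request(
    api_key: str,
    audio: bytes,
    filename: str = "recording.wav",
    model: str = "whisper-large-v3-turbo",
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying the audio and model name."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field naming the model to run.
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'.encode(),
        # Form field carrying the raw audio bytes.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: audio/wav\r\n\r\n"
        ).encode() + audio + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        GROQ_TRANSCRIPTION_URL,
        data=body,
        method="POST",
        headers={
            # The provider API key authorizes the request as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send the audio to the provider. That is the point at which a recording leaves your device.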
GPU Acceleration (Windows)
If you have an NVIDIA GPU, you can dramatically speed up local model transcription:
- Go to Settings > Models
- Scroll down to find “GPU Acceleration”
- Download the CUDA acceleration module
- Restart Vowen (or your system if needed)
Choosing Between Local and Cloud
| Factor | Local | Cloud |
|---|---|---|
| Privacy | Data never leaves device | Audio sent to provider |
| Speed (macOS) | Fast for small/medium models | Always fast |
| Speed (Windows) | Slow without GPU | Always fast |
| Accuracy | Good to excellent | Excellent |
| Internet | Not required | Required |
| Cost | Free | Free tier or paid API |
| Languages | 99 (Whisper) / 25 (Parakeet) | Varies by provider |
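The table can be condensed into a rough decision rule. The sketch below is a hypothetical heuristic, not a Vowen feature, and the `Setup` fields are assumptions. It ignores accuracy, language, and diarization needs, any of which may still tip you toward a cloud model.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    platform: str           # "macos" or "windows"
    has_nvidia_gpu: bool    # with the CUDA acceleration module installed (Windows)
    online: bool
    privacy_required: bool  # audio must never leave the device


def choose_backend(setup: Setup) -> str:
    """Rough local-vs-cloud heuristic distilled from the comparison table."""
    # Privacy requirements or no internet rule out cloud models entirely.
    if setup.privacy_required or not setup.online:
        return "local"
    # On Windows, local models are slow without GPU acceleration.
    if setup.platform == "windows" and not setup.has_nvidia_gpu:
        return "cloud"
    # Otherwise local models are fast enough and free.
    return "local"
```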