Free plan includes 10 manual file transcriptions. Pro unlocks unlimited transcriptions, speaker diarization, parallel queueing, and PDF/TXT export.
Overview
Beyond real-time voice dictation, Vowen can transcribe pre-recorded audio and video files. Use this for interviews, podcasts, recorded meetings, or any media file.
How to Transcribe a File
Open the Transcribe dialog
Click the + Transcribe button at the top right of the Vowen window, or open Transcribe from the sidebar and click the same button there. A dialog opens with the upload options.
Pick a transcription model
Choose any of your configured local or cloud models from the Transcription Model dropdown. This selection is per-file, so you can run a single file through a higher-accuracy model without changing your global default.
Set language and speaker options
Pick a language from the Language dropdown, or leave it on Auto-detect. If your selected model supports speaker diarization, toggle Identify Speakers to label each speaker in the output (Pro feature).
Supported Formats
Audio: mp3, wav, m4a, aac, ogg, flac, wma, opus
Video: mp4, mov, avi, mkv, flv, wmv, webm, mpeg, mpg
For video files, Vowen automatically extracts the audio track before transcription.
Timestamps
For manual transcriptions using Parakeet or Whisper CLI mode, timestamps are included in the output. These appear as time badges marking when each segment was spoken. Most cloud transcription models also produce timestamps; refer to the Models Guide for specifics.
File Size and Duration Limits
Vowen handles large files automatically. Compression and chunking happen behind the scenes, depending on the provider you pick:

| Provider | Threshold | What Vowen does |
|---|---|---|
| Groq Whisper | 24 MB | Compresses WAV to MP3 at 96 kbps, then splits into chunks of about 30 minutes |
| ElevenLabs Scribe v2 | 50 MB | Splits into 20-minute chunks and merges the results |
| Mistral Voxtral | 50 MB or 27 minutes | Splits into 27-minute chunks |
| Sarvam Saaras v3 | 30 seconds (hard provider limit) | Splits into 28-second chunks |
| Deepgram, AssemblyAI, Soniox, Speechmatics, xAI Aurora | No client-side limit | Audio is sent as-is; provider-side limits apply |
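The thresholds and chunk sizes above fit together through some simple arithmetic. The sketch below is an illustration, not Vowen's code. It assumes constant-bitrate MP3, decimal megabytes (1 MB = 1,000,000 bytes), no container overhead, and fixed-length chunks where only the last chunk may be shorter. The function names are hypothetical.

```python
import math


def mp3_size_mb(minutes: float, kbps: int = 96) -> float:
    """Approximate size of a constant-bitrate MP3 in decimal megabytes.

    Ignores container and tag overhead, so real files run slightly larger.
    """
    bits = kbps * 1000 * minutes * 60
    return bits / 8 / 1_000_000


def chunk_count(duration_s: float, chunk_s: float) -> int:
    """Number of fixed-length chunks needed; the last chunk may be shorter."""
    return math.ceil(duration_s / chunk_s)


if __name__ == "__main__":
    # A 30-minute chunk at 96 kbps, compared with Groq's 24 MB threshold.
    print(f"{mp3_size_mb(30):.1f} MB")    # 21.6 MB
    # A 95-minute recording split into 20-minute chunks (ElevenLabs row).
    print(chunk_count(95 * 60, 20 * 60))  # 5
    # A 10-minute clip split into 28-second chunks (Sarvam row).
    print(chunk_count(10 * 60, 28))       # 22
```

At 96 kbps, audio takes roughly 0.72 MB per minute, so about 33 minutes would fit under 24 MB; chunks of about 30 minutes leave some headroom for overhead.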
Regenerating Transcriptions
When you open a completed transcription, the detail page shows a Regenerate Transcript panel in the sidebar with every model you have configured, grouped by Local and Cloud. Local models include Parakeet; cloud options can include Groq, Soniox, Deepgram, Mistral, AssemblyAI, Sarvam AI, ElevenLabs, Speechmatics, xAI, and any other provider you have connected.
- Open the transcription detail page
- Pick any model from the panel
- For models that support speaker diarization, toggle Identify Speakers to label each speaker in the new transcript
- Click Regenerate Transcript
Export Options
From the transcription detail page or the export modal, you can save the transcript in several formats:

| Format | Extension | Plan | Best For |
|---|---|---|---|
| WebVTT | .vtt | Free | Web video captions |
| SubRip | .srt | Free | Standard video editor subtitles |
| JSON | .json | Free | Programmatic post-processing |
| Plain Text | .txt | Pro | Notes, copy-paste, archival |
| PDF | .pdf | Pro | Shareable formatted document |
Running multiple manual transcriptions in parallel (Queue Transcriptions) and toggling Identify Speakers both require Pro.
WebVTT and SubRip exports require a model that produces fine-grained timestamps. Parakeet’s batch output does not include the granularity these subtitle formats need, so VTT and SRT are disabled when Parakeet is the active model. Switch to a different model in the export modal to enable them, or use Plain Text export.
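Subtitle files are built from cues, and every cue carries its own start and end time, which is why these exports depend on the model's timestamps. Below is a minimal sketch of the two cue layouts: SRT numbers each cue and writes times as HH:MM:SS,mmm, while WebVTT begins with a WEBVTT header and writes HH:MM:SS.mmm. The Segment type and function names are hypothetical, and this is not Vowen's exporter.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical segment shape: start/end in seconds plus the spoken text.
    start: float
    end: float
    text: str


def fmt_time(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (sep is ',' for SRT, '.' for VTT)."""
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Numbered cues separated by blank lines, with comma before milliseconds."""
    cues = [
        f"{i}\n{fmt_time(seg.start, ',')} --> {fmt_time(seg.end, ',')}\n{seg.text}\n"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n".join(cues)


def to_vtt(segments: list[Segment]) -> str:
    """WEBVTT header, then cues with a dot before milliseconds."""
    cues = [
        f"{fmt_time(seg.start, '.')} --> {fmt_time(seg.end, '.')}\n{seg.text}\n"
        for seg in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, "Hello there."), Segment(2.5, 61.25, "Welcome back.")]
    print(to_srt(demo))
    print(to_vtt(demo))
```

A model that only reports timing for large batches of speech would have to stretch each cue over a long span, which is why these formats are tied to finer-grained output.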
Editing a Transcript
Open any completed transcription (or meeting note) to edit it segment by segment in the transcript editor:
- Edit text: click into a segment and type. Undo with Cmd/Ctrl+Z and redo with Cmd/Ctrl+Shift+Z
- Split a segment: place your cursor where you want the cut, open the segment’s actions menu (the ⋯ button), and choose Split at caret. The split snaps to the nearest word boundary so it never breaks mid-word, and timestamps are interpolated to the cut point
- Delete a segment: choose Delete segment from the same menu
- Reassign a line to a speaker: on diarized transcripts, the menu also lets you move a single line to a different speaker (or a new one). To fold two speakers together entirely, use Merge Speakers in the Speakers sidebar
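This article does not say how the cut time is computed when a segment is split. One plausible approach, assumed here purely for illustration, is to snap the caret to the nearest space and then interpolate linearly by character position, as if speech were spread evenly across the text. The Segment type and function names are hypothetical, not Vowen's editor code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    # Hypothetical segment shape for illustration.
    start: float  # seconds
    end: float    # seconds
    text: str


def snap_to_word_boundary(text: str, caret: int) -> int:
    """Return the index of the space nearest the caret (ties go left), or -1 if none.

    A leading or trailing space is ignored as a cut point.
    """
    spaces = [i for i, ch in enumerate(text) if ch == " " and 0 < i < len(text) - 1]
    if not spaces:
        return -1
    return min(spaces, key=lambda i: (abs(i - caret), i))


def split_at_caret(seg: Segment, caret: int) -> Optional[tuple[Segment, Segment]]:
    """Split a segment near the caret without breaking a word."""
    cut = snap_to_word_boundary(seg.text, caret)
    if cut == -1:
        return None  # a single word cannot be split without breaking it
    # Assumption: linear interpolation from the cut's position in the text.
    t = seg.start + (seg.end - seg.start) * cut / len(seg.text)
    left = Segment(seg.start, t, seg.text[:cut].rstrip())
    right = Segment(t, seg.end, seg.text[cut:].lstrip())
    return left, right
```

For a 10-second segment "aaaa bbbb cccc ddddd" with the caret inside "bbbb", the cut snaps to the space after it (character 9 of 20), so under this assumption the new boundary lands at 4.5 seconds.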
Finding Text in a Transcript
Press Cmd+F (macOS) or Ctrl+F (Windows) inside a transcript to open the find bar. It runs a case-insensitive search across the whole transcript, highlights every match, and shows a current/total counter. Press Enter for the next match and Shift+Enter for the previous one (navigation wraps around); press Esc to close.
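The find bar's described behavior (case-insensitive matching across every segment, a current/total counter, and wrap-around navigation) can be sketched in a few functions. This treats the transcript as a plain list of segment strings, which is an assumption for illustration; the function names are hypothetical.

```python
import re


def find_matches(segments: list[str], query: str) -> list[tuple[int, int]]:
    """Return (segment_index, char_offset) for every case-insensitive match."""
    if not query:
        return []
    # re.escape treats the query as literal text, not as a regex pattern.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [
        (i, m.start())
        for i, text in enumerate(segments)
        for m in pattern.finditer(text)
    ]


def step(current: int, total: int, backwards: bool = False) -> int:
    """Next match for Enter, previous for Shift+Enter, wrapping at either end."""
    if total == 0:
        return -1
    return (current + (-1 if backwards else 1)) % total


def counter_label(current: int, total: int) -> str:
    """The current/total counter, shown 1-based."""
    return f"{current + 1}/{total}" if total else "0/0"
```

Python's modulo returns a non-negative result for a positive divisor, so stepping back from the first match lands on the last one.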
Playing Back the Audio
When a transcription has a saved audio file, a waveform player appears at the top of the detail page:
- Play/Pause, plus Back 10s and Forward 10s skip buttons
- Scrub by clicking or dragging anywhere on the waveform to seek
- Click a segment’s timestamp to jump the audio to that point and start playing
- As audio plays, the current segment is highlighted and scrolled into view automatically (paused while you’re editing)
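A player that highlights the current segment has to map a playback position to a segment. A minimal sketch, assuming segments are sorted by start time, is a binary search over their start times with Python's bisect module, plus clamping for the 10-second skip buttons. The function names are hypothetical, and this is not Vowen's player.

```python
import bisect


def current_segment(starts: list[float], t: float) -> int:
    """Index of the last segment starting at or before t; -1 if t precedes them all."""
    return bisect.bisect_right(starts, t) - 1


def skip(position: float, delta: float, duration: float) -> float:
    """Apply a Back 10s (delta=-10) or Forward 10s (delta=+10) jump, clamped to the file."""
    return min(max(position + delta, 0.0), duration)
```

With segment starts of 0.0, 4.2, 9.8, and 15.0 seconds, a position of 10.0 seconds maps to index 2, the segment that began at 9.8 seconds. The binary search keeps each lookup cheap even for long transcripts.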