Speaker diarization is a Pro feature, available in Meeting Notes and Manual Transcriptions with supported models.

What is Diarization?

Speaker diarization identifies and labels different speakers in a recording. Instead of a single block of text, your transcript shows which person said each part.

Supported Models

Diarization is available with these cloud transcription models:
Model | Provider
Nova 2 / Nova 3 | Deepgram
Scribe v2 | ElevenLabs
Universal | AssemblyAI
Voxtral Mini | Mistral
STT Realtime | Soniox
Aurora | xAI
Speechmatics | Speechmatics

On-Device Diarization (macOS)

On macOS, diarization also works on-device, without a diarization-capable cloud provider:
Model | Provider
Whisper (all sizes) | Local
Parakeet | Local
Whisper Large v3 / Large v3 Turbo | Groq
For these models, speaker separation runs through a built-in Pyannote speaker pipeline on your machine. When you pick one of them with Identify Speakers on, you can also choose the expected number of speakers to improve accuracy.
On-device diarization (local Whisper/Parakeet and Groq) is macOS only. On Windows, diarization requires one of the diarization-capable cloud models listed under Supported Models. See Transcription Models for the full model list.

Enabling Diarization

  1. Select a model that supports diarization (see the tables above)
  2. When starting Meeting Notes, you’ll see a diarization toggle
  3. Enable it before starting the recording

Mapping Speaker Names

After transcription, speakers are labeled generically (Speaker 1, Speaker 2, etc.). You can map these to real names:
  1. Open the completed meeting note
  2. In the transcript section, look for the speaker mapping panel in the sidebar
  3. Assign names to each speaker label
  4. The transcript updates to show the real names
As you type a name, Vowen suggests names you’ve used before (across your notes and transcriptions) and other speakers in the current note, so recurring attendees are one tap to fill in. Pick a suggestion with the mouse or keyboard (arrows + Enter).
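As a rough illustration of what name mapping does to a transcript, here is a minimal Python sketch. It is not Vowen's implementation: the function names, the (label, text) line format, and the prefix-based matching for suggestions are all assumptions made for the example.

```python
def apply_speaker_names(lines, names):
    """Return transcript lines with generic labels replaced by mapped names.

    `lines` is a list of (speaker_label, text) pairs and `names` maps a
    label such as "Speaker 1" to a real name. Both shapes are hypothetical.
    """
    return [(names.get(label, label), text) for label, text in lines]


def suggest_names(typed, known_names):
    """Return known names that start with the typed text (case-insensitive).

    Prefix matching is an assumption; the real suggestion logic may differ.
    """
    prefix = typed.strip().lower()
    if not prefix:
        return []
    return sorted({name for name in known_names if name.lower().startswith(prefix)})


lines = [("Speaker 1", "Shall we start?"), ("Speaker 2", "Yes, go ahead.")]
names = {"Speaker 1": "Dana"}

# Labels without a mapping (here "Speaker 2") are left unchanged.
renamed = apply_speaker_names(lines, names)
```

In this sketch the original labels stay in `lines`, so a name can be changed later without losing track of which generic speaker it belongs to.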

Merging Speakers

Sometimes one real person gets split across two labels (for example, Speaker 2 and Speaker 4 are the same person). You can collapse them into one:
  1. Open the completed transcript or meeting note
  2. In the Speakers sidebar, click the menu on the speaker you want to fold away (the source)
  3. Choose Merge into and pick the speaker to keep (the target)
  4. Confirm. Every line from the source speaker is reassigned to the target
Merging speakers is destructive and cannot be undone: all lines from the source speaker permanently move to the target. The source speaker’s custom name carries over if the target doesn’t already have one.
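The merge rules described above can be sketched in a few lines of Python. This is only an illustration under assumed data structures (the `Transcript` class and `merge_speakers` function are hypothetical), not how Vowen stores or merges speakers internally.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    # Each line is a (speaker_label, text) pair, e.g. ("Speaker 2", "Hello").
    lines: list[tuple[str, str]]
    # Optional custom names keyed by speaker label.
    names: dict[str, str] = field(default_factory=dict)


def merge_speakers(transcript: Transcript, source: str, target: str) -> None:
    """Fold `source` into `target` in place; there is no undo in this sketch."""
    if source == target:
        raise ValueError("source and target must be different speakers")
    # Reassign every line spoken by the source to the target.
    transcript.lines = [
        (target if label == source else label, text)
        for label, text in transcript.lines
    ]
    # The source's custom name carries over only if the target has none.
    source_name = transcript.names.pop(source, None)
    if source_name is not None and target not in transcript.names:
        transcript.names[target] = source_name


note = Transcript(
    lines=[("Speaker 2", "Morning."), ("Speaker 4", "As I was saying."), ("Speaker 1", "Hi.")],
    names={"Speaker 4": "Priya"},
)
merge_speakers(note, source="Speaker 4", target="Speaker 2")
```

After the call, both of the first two lines belong to Speaker 2, and because Speaker 2 had no custom name, it inherits "Priya" from the merged speaker.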

Tips for Better Speaker Separation

  • Use a good microphone: clear audio helps the model distinguish speakers
  • Avoid talking over each other: overlapping speech is harder to separate
  • Prefer a provider with first-class diarization: vendors that ship it as a core feature (Deepgram, AssemblyAI, Speechmatics) tend to handle hard cases better than ones where it’s bolted on
  • Longer meetings give the model more data to distinguish speakers accurately