Speaker diarization is a Pro feature, available in Meeting Notes and Manual Transcriptions with supported cloud models.
What is Diarization?
Speaker diarization identifies and labels different speakers in a recording. Instead of a single block of text, your transcript shows which person said each part.
Supported Models
Diarization is available with these cloud transcription models:
| Model | Provider |
|---|---|
| Nova 2 / Nova 3 | Deepgram |
| Scribe v2 | ElevenLabs |
| Universal | AssemblyAI |
| Voxtral Mini | Mistral |
| STT Realtime | Soniox |
| Aurora | xAI |
| Speechmatics | Speechmatics |
On-Device Diarization (macOS)
On macOS, diarization also works on-device, without a diarization-capable cloud provider:
| Model | Provider |
|---|---|
| Whisper (all sizes) | Local |
| Parakeet | Local |
| Whisper Large v3 / Large v3 Turbo | Groq |
On-device diarization (local Whisper/Parakeet and Groq) is macOS only. On Windows, diarization requires one of the diarization-capable cloud models listed under Supported Models. See Transcription Models for the full model list.
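The platform rule above can be sketched as a small lookup. This is an illustration only: the function name `supports_diarization`, the platform strings, and the way model names are keyed are assumptions made for the example, not part of the app. The model and provider pairs are copied from the two tables.

```python
# Hypothetical lookup built from the tables above; not the app's actual API.

# Cloud models with diarization (macOS and Windows).
CLOUD_DIARIZATION = {
    ("Nova 2", "Deepgram"),
    ("Nova 3", "Deepgram"),
    ("Scribe v2", "ElevenLabs"),
    ("Universal", "AssemblyAI"),
    ("Voxtral Mini", "Mistral"),
    ("STT Realtime", "Soniox"),
    ("Aurora", "xAI"),
    ("Speechmatics", "Speechmatics"),
}

# Models that rely on on-device diarization (macOS only).
# "Whisper" stands in for all local Whisper sizes.
ON_DEVICE_DIARIZATION = {
    ("Whisper", "Local"),
    ("Parakeet", "Local"),
    ("Whisper Large v3", "Groq"),
    ("Whisper Large v3 Turbo", "Groq"),
}


def supports_diarization(model: str, provider: str, platform: str) -> bool:
    """Return True if the model/provider pair can diarize on the given platform."""
    key = (model, provider)
    if key in CLOUD_DIARIZATION:
        return True
    if key in ON_DEVICE_DIARIZATION:
        return platform.lower() == "macos"
    return False


print(supports_diarization("Nova 3", "Deepgram", "windows"))
print(supports_diarization("Parakeet", "Local", "windows"))
```

Under these assumptions, the first call reports support and the second does not, because Parakeet depends on on-device diarization.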
Enabling Diarization
- Select a cloud model that supports diarization (see the Supported Models table above)
- When starting Meeting Notes, you’ll see a diarization toggle
- Enable it before starting the recording
Mapping Speaker Names
After transcription, speakers are labeled generically (Speaker 1, Speaker 2, etc.). You can map these to real names:
- Open the completed meeting note
- In the transcript section, look for the speaker mapping panel in the sidebar
- Assign names to each speaker label
- The transcript updates to show the real names
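Conceptually, name mapping is a label substitution over the transcript. The sketch below assumes a transcript is a list of segments, each with a speaker label and text. The `Segment` type and `apply_speaker_names` function are hypothetical names used for illustration; the app's internal data model is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript line (hypothetical structure for illustration)."""
    speaker: str
    text: str


def apply_speaker_names(segments: list[Segment], names: dict[str, str]) -> list[Segment]:
    """Return a new transcript with generic labels replaced by real names.

    Labels without an entry in `names` are left unchanged.
    """
    return [Segment(names.get(s.speaker, s.speaker), s.text) for s in segments]


transcript = [
    Segment("Speaker 1", "Shall we start?"),
    Segment("Speaker 2", "Yes, go ahead."),
    Segment("Speaker 3", "One moment."),
]

named = apply_speaker_names(transcript, {"Speaker 1": "Ana", "Speaker 2": "Ben"})
for seg in named:
    print(f"{seg.speaker}: {seg.text}")
```

Here Speaker 3 keeps its generic label because no name was assigned to it.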
Merging Speakers
Sometimes one real person gets split across two labels (for example, Speaker 2 and Speaker 4 are the same person). You can collapse them into one:
- Open the completed transcript or meeting note
- In the Speakers sidebar, click the menu on the speaker you want to fold away
- Choose Merge into and pick the speaker to keep
- Confirm. Every line from the source speaker is reassigned to the target
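The reassignment in the last step can be pictured as follows. This is a minimal sketch under the same assumption of a list of speaker-labeled segments. `Segment` and `merge_speakers` are hypothetical names, not the app's implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript line (hypothetical structure for illustration)."""
    speaker: str
    text: str


def merge_speakers(segments: list[Segment], source: str, target: str) -> list[Segment]:
    """Reassign every line from `source` to `target`, leaving other lines as-is."""
    return [
        Segment(target if s.speaker == source else s.speaker, s.text)
        for s in segments
    ]


transcript = [
    Segment("Speaker 2", "I sent the report yesterday."),
    Segment("Speaker 1", "Thanks, I saw it."),
    Segment("Speaker 4", "There is one figure I want to revisit."),
]

# Fold Speaker 4 into Speaker 2, the speaker to keep.
merged = merge_speakers(transcript, source="Speaker 4", target="Speaker 2")
for seg in merged:
    print(f"{seg.speaker}: {seg.text}")
```

After the merge, only the labels change. The order and text of the lines stay the same, and the source label no longer appears in the transcript.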
Tips for Better Speaker Separation
- Use a good microphone: clear audio helps the model distinguish speakers
- Avoid talking over each other: overlapping speech is harder to separate
- Vendors that ship diarization as a first-class feature (Deepgram, AssemblyAI, Speechmatics) tend to handle hard cases better than ones where it’s bolted on
- Longer meetings give the model more data to distinguish speakers accurately