# Manual Transcription

> Transcribe audio and video files with Vowen.

<div style={{ marginTop: "-2.5rem" }}>
  <Info>Free plan includes 10 manual file transcriptions. <a href="https://vowen.ai" className="font-semibold underline-offset-2" style={{ color: "#8b5cf6", backgroundColor: "rgba(139, 92, 246, 0.12)", border: "1px solid rgba(139, 92, 246, 0.35)", padding: "2px 8px", borderRadius: "6px", fontSize: "0.85em", textDecoration: "none", whiteSpace: "nowrap" }}>Pro</a> unlocks unlimited transcriptions, speaker diarization, parallel queueing, and export (PDF, text, Markdown, subtitles).</Info>
</div>

## Overview

Beyond real-time voice dictation, Vowen can transcribe pre-recorded audio and video files. Use this for interviews, podcasts, recorded meetings, or any media file.

## How to Transcribe a File

<Steps>
  <Step title="Open the Transcribe dialog">
    Click the **+ Transcribe** button at the top right of the Vowen window, or open **Transcribe** from the sidebar and click the same button there. A dialog opens with the upload options.
  </Step>

  <Step title="Pick a transcription model">
    Choose any of your configured local or cloud models from the **Transcription Model** dropdown. This selection is per-file, so you can run a single file through a higher-accuracy model without changing your global default.
  </Step>

  <Step title="Set language and speaker options">
    Pick a language from the **Language** dropdown, or leave it on **Auto-detect**. If your selected model supports speaker diarization, toggle **Identify Speakers** to label each speaker in the output ([Pro feature](/meeting-notes/diarization)). Leave **Add Timestamps** on (the default) to include clickable per-segment timestamps, or turn it off for a clean, timestamp-free transcript. (This toggle is hidden when Identify Speakers is on, since diarized output carries its own segment structure.)
  </Step>

  <Step title="Add your file">
    Drag and drop an audio or video file into the drop zone, or click to browse.
  </Step>

  <Step title="Click Transcribe">
    Vowen handles compression and chunking automatically if needed. The transcription appears with timestamps shown as `[MM:SS]` badges where the model supports them. You can edit, copy, regenerate, or export the result.
  </Step>
</Steps>

## Supported Formats

**Audio:** `mp3`, `wav`, `m4a`, `aac`, `ogg`, `flac`, `wma`, `opus`

**Video:** `mp4`, `mov`, `avi`, `mkv`, `flv`, `wmv`, `webm`, `mpeg`, `mpg`

For video files, Vowen automatically extracts the audio track before transcription.
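
If you collect recordings with a script before adding them to Vowen, you can pre-filter them against the lists above. The sketch below is not part of Vowen: `classify_media` is a hypothetical helper, and matching extensions case-insensitively is an assumption made here for convenience.

```python
from pathlib import Path
from typing import Optional

# Extension lists copied from the Supported Formats section.
AUDIO_EXTENSIONS = {"mp3", "wav", "m4a", "aac", "ogg", "flac", "wma", "opus"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi", "mkv", "flv", "wmv", "webm", "mpeg", "mpg"}


def classify_media(path: str) -> Optional[str]:
    """Return "audio", "video", or None when the extension is not listed."""
    # Assumption: extensions are compared case-insensitively.
    ext = Path(path).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None
```

A file such as `Interview.MP3` is classified as audio, while `notes.docx` returns `None`. The check looks only at the file name, not at the actual container or codec.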

## Timestamps

For manual transcriptions using Parakeet or Whisper CLI mode, timestamps are included in the output. These appear as time badges marking when each segment was spoken. Most cloud transcription models also produce timestamps; refer to the [Models Guide](/transcription/models) for specifics.

## File Size and Duration Limits

Vowen handles large files automatically. Compression and chunking happen behind the scenes based on the provider you pick:

| Provider                                               | Threshold                        | What Vowen does                                                               |
| ------------------------------------------------------ | -------------------------------- | ----------------------------------------------------------------------------- |
| Groq Whisper                                           | 24 MB                            | Compresses WAV to MP3 at 96 kbps, then splits into chunks of about 30 minutes |
| ElevenLabs Scribe v2                                   | 50 MB                            | Splits into 20-minute chunks and merges the results                           |
| Mistral Voxtral                                        | 50 MB or 27 minutes              | Splits into 27-minute chunks                                                  |
| Sarvam Saaras v3                                       | 30 seconds (hard provider limit) | Splits into 28-second chunks                                                  |
| Deepgram, AssemblyAI, Soniox, Speechmatics, xAI Aurora | No client-side limit             | Audio is sent as-is; provider-side limits apply                               |

You never need to split files manually for the first four providers, because Vowen compresses or chunks the audio for you. For the five providers in the last row, Vowen sends the audio unchanged and the provider enforces its own limits at the API level.
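
To make the table concrete, here is a minimal sketch of how chunk boundaries could be planned from those thresholds. It is an illustration only, not Vowen's implementation: the provider keys, `ChunkRule`, and `plan_chunks` are hypothetical names, 1 MB is assumed to be 1024 × 1024 bytes, a file is assumed to be chunked only when it strictly exceeds a threshold, and the Groq compression step is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MB = 1024 * 1024  # assumption: the doc does not say whether MB means 10**6 or 2**20


@dataclass(frozen=True)
class ChunkRule:
    max_bytes: Optional[int]      # size threshold, or None if not size-based
    max_seconds: Optional[float]  # duration threshold, or None if not duration-based
    chunk_seconds: float          # target chunk length once a threshold is exceeded


# Values taken from the table; the keys are hypothetical identifiers.
RULES = {
    "groq-whisper": ChunkRule(24 * MB, None, 30 * 60),
    "elevenlabs-scribe-v2": ChunkRule(50 * MB, None, 20 * 60),
    "mistral-voxtral": ChunkRule(50 * MB, 27 * 60, 27 * 60),
    "sarvam-saaras-v3": ChunkRule(None, 30, 28),
}


def plan_chunks(provider: str, size_bytes: int, duration: float) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for each chunk to transcribe."""
    rule = RULES.get(provider)
    if rule is None:
        return [(0.0, duration)]  # no client-side limit: send the audio as-is
    over_size = rule.max_bytes is not None and size_bytes > rule.max_bytes
    over_time = rule.max_seconds is not None and duration > rule.max_seconds
    if not (over_size or over_time):
        return [(0.0, duration)]
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + rule.chunk_seconds, duration)
        chunks.append((start, end))
        start = end
    return chunks
```

Under these assumptions, a 60-second file sent to Sarvam Saaras v3 is planned as three chunks (0 to 28, 28 to 56, and 56 to 60 seconds), while a file sent to a provider in the last row always yields a single span.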

## Regenerating Transcriptions

When you open a completed transcription, the detail page shows a **Regenerate Transcript** panel in the sidebar with every model you have configured, grouped by **Local** and **Cloud**. Local models include Parakeet; cloud options can include Groq, Soniox, Deepgram, Mistral, AssemblyAI, Sarvam AI, ElevenLabs, Speechmatics, xAI, and any other provider you have connected.

1. Open the transcription detail page
2. Pick any model from the panel
3. For models that support speaker diarization, toggle **Identify Speakers** to label each speaker in the new transcript
4. Click **Regenerate Transcript**

The original transcript is preserved as a version, so regeneration produces a new one without overwriting the first. This is useful for comparing how different models handle the same audio, or for running a higher-accuracy pass after a quick first run.

## Export Options

Open a transcription and click **Export** to save it outside Vowen. The export dialog shows a live preview and lets you pick a format:

| Format     | Extension | Best For                        |
| ---------- | --------- | ------------------------------- |
| PDF        | `.pdf`    | Shareable formatted document    |
| Plain Text | `.txt`    | Notes, copy-paste, archival     |
| Markdown   | `.md`     | Docs, wikis, note apps          |
| SubRip     | `.srt`    | Standard video editor subtitles |
| WebVTT     | `.vtt`    | Web video captions              |

Toggle **Include title & date** to add a header to document exports. Speaker labels and timestamps are included automatically when the transcript has them.

<Note>Exporting requires Pro, as does running multiple manual transcriptions in parallel ([Queue Transcriptions](/pricing)) and toggling Identify Speakers.</Note>

<Note>
  Subtitle formats (SRT and VTT) only appear when the transcript has per-segment timing. Captions are timed straight from those segments — one cue per spoken segment, with speaker names when available. A transcript with no segment timing (for example, some batch output) won't offer subtitles; use PDF, Text, or Markdown instead.
</Note>
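
The "one cue per spoken segment" rule maps directly onto the SubRip layout: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the cue text, and a blank line. The sketch below shows that mapping under stated assumptions. `Segment`, `srt_time`, and `to_srt` are hypothetical names, and prefixing the text with `Speaker: ` is only one possible way to include speaker names; Vowen's exact output may differ.

```python
from typing import Iterable, NamedTuple, Optional


class Segment(NamedTuple):
    start: float                   # seconds from the beginning of the audio
    end: float
    text: str
    speaker: Optional[str] = None  # set only on diarized transcripts


def srt_time(seconds: float) -> str:
    """Format seconds as the SubRip timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Emit one numbered cue per segment, separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        # Assumption: speaker names are rendered as a "Name: " prefix.
        text = f"{seg.speaker}: {seg.text}" if seg.speaker else seg.text
        cues.append(f"{index}\n{srt_time(seg.start)} --> {srt_time(seg.end)}\n{text}\n")
    return "\n".join(cues)
```

WebVTT follows the same cue structure but starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.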

## Renaming a Transcription

A transcription's title defaults to the audio file's name, but you can rename it. Open the transcription and edit the title field at the top of the detail page — changes save automatically and the new title shows up in your list.

## Organizing with Tags

Manual transcriptions have their own tag system (separate from meeting-note tags):

* Add tags to any transcription card with **add tag** — create a new colored tag or reuse an existing one
* Use the **tag filter bar** above the list to filter by tags, with a **Match: Any / All** toggle when two or more are selected, plus sort options (Newest, Oldest, Recently tagged)
* Use the **search box** to find a transcription by name
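
The **Match: Any / All** toggle corresponds to two set operations: intersection for Any and subset for All. The following sketch illustrates that logic only. `matches_tags` is a hypothetical function, and treating an empty selection as "show everything" is an assumption.

```python
from typing import Iterable


def matches_tags(item_tags: Iterable[str], selected: Iterable[str], mode: str = "any") -> bool:
    """Return True when a transcription's tags satisfy the active filter."""
    item_set, selected_set = set(item_tags), set(selected)
    if not selected_set:
        return True  # assumption: no selected tags means no filtering
    if mode == "all":
        return selected_set <= item_set   # every selected tag must be present
    return bool(selected_set & item_set)  # at least one selected tag is present
```

With `interview` and `urgent` selected, a transcription tagged only `interview` passes under Any and is hidden under All.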

## Multi-Select & Bulk Actions

Select multiple transcriptions to act on them together. Hover over a finished card to reveal its checkbox (or use **Select all**); a bulk-action bar appears at the bottom with:

* **Add tags** — apply or remove tags across every selected transcription at once
* **Delete** — remove the selected transcriptions (deletion requires Pro)

Only finished transcriptions can be selected — items still processing don't get a checkbox.

## Editing a Transcript

Open any completed transcription (or meeting note) to edit it segment by segment in the transcript editor:

* **Edit text** — click into a segment and type. Undo and redo with `Cmd/Ctrl+Z` and `Cmd/Ctrl+Shift+Z`
* **Split a segment** — open the segment's actions menu (the `⋯` button), place your cursor where you want the cut, and choose **Split at caret**. The split snaps to the nearest word boundary so it never breaks mid-word, and timestamps are interpolated to the cut point
* **Delete a segment** — choose **Delete segment** from the same menu
* **Reassign a line to a speaker** — on diarized transcripts, the menu also lets you move a single line to a different speaker (or a new one). To fold two speakers together entirely, use [Merge Speakers](/meeting-notes/diarization#merging-speakers) in the Speakers sidebar
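
To show what "snaps to the nearest word boundary" and "timestamps are interpolated" can mean in practice, here is a rough sketch. It assumes word boundaries are whitespace characters and that the cut time is interpolated linearly by character position. Both are assumptions for illustration, `split_segment` is a hypothetical function, and Vowen may use word-level timing or a different rule.

```python
from typing import Tuple

Piece = Tuple[str, float, float]  # (text, start seconds, end seconds)


def split_segment(text: str, start: float, end: float, caret: int) -> Tuple[Piece, Piece]:
    """Split a segment at the whitespace nearest to the caret position."""
    boundaries = [i for i, ch in enumerate(text) if ch.isspace()]
    if not boundaries:
        raise ValueError("segment contains a single word; there is no boundary to split at")
    # Snap: pick the whitespace index closest to the caret (earlier one on a tie).
    cut = min(boundaries, key=lambda i: abs(i - caret))
    # Interpolate: assume time advances evenly across the characters of the segment.
    cut_time = start + (end - start) * (cut / len(text))
    left = (text[:cut].rstrip(), start, cut_time)
    right = (text[cut:].lstrip(), cut_time, end)
    return left, right
```

Placing the caret in the middle of a word moves the cut to the closest gap between words, so neither half starts or ends with a partial word.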

## Finding Text in a Transcript

Press `Cmd+F` (macOS) or `Ctrl+F` (Windows) inside a transcript to open the find bar. It does a case-insensitive search across the whole transcript, highlights every match, and shows a `current/total` counter. Press `Enter` for the next match and `Shift+Enter` for the previous one (navigation wraps around); press `Esc` to close.
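
The find bar's behavior can be modeled with a case-insensitive scan plus modular arithmetic for the wrap-around. This is a sketch of the described behavior rather than Vowen's code: the function names are hypothetical, matches are treated as non-overlapping, and showing `0/0` when nothing matches is an assumption.

```python
import re
from typing import List


def find_matches(transcript: str, query: str) -> List[int]:
    """Return the start offset of every case-insensitive match."""
    if not query:
        return []
    # re.escape makes the query match literally, even if it contains regex symbols.
    pattern = re.compile(re.escape(query), flags=re.IGNORECASE)
    return [m.start() for m in pattern.finditer(transcript)]


def step(current: int, total: int, forward: bool = True) -> int:
    """Move to the next (Enter) or previous (Shift+Enter) match, wrapping around."""
    if total == 0:
        return 0
    return (current + (1 if forward else -1)) % total


def counter_label(current: int, total: int) -> str:
    """Render the current/total counter using 1-based numbering."""
    return f"{current + 1}/{total}" if total else "0/0"
```

Stepping forward from the last match returns to the first one, and stepping back from the first match lands on the last.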

## Playing Back the Audio

When a transcription has a saved audio file, a waveform player appears at the top of the detail page:

* **Play/Pause**, plus **Back 10s** and **Forward 10s** skip buttons
* **Scrub** by clicking or dragging anywhere on the waveform to seek
* Click a segment's timestamp to **jump the audio to that point** and start playing
* As audio plays, the current segment is **highlighted and scrolled into view** automatically (this auto-follow pauses while you're editing)
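
Two pieces of the player logic above are easy to sketch: clamping the 10-second skips to the audio's bounds, and finding which segment contains the playback position. The helpers below are hypothetical and assume segment start times are sorted in ascending order. They illustrate the idea and are not taken from Vowen.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def skip(position: float, delta: float, duration: float) -> float:
    """Apply Back 10s (delta=-10) or Forward 10s (delta=10) without leaving the track."""
    return max(0.0, min(position + delta, duration))


def current_segment_index(starts: Sequence[float], position: float) -> Optional[int]:
    """Return the index of the last segment that starts at or before position."""
    index = bisect_right(starts, position) - 1
    return index if index >= 0 else None
```

A binary search keeps the lookup fast even for long recordings with thousands of segments, which matters because the highlight is recomputed continuously during playback.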

## Chat with the Transcript

Once a transcription is complete, you can ask the AI questions about its contents from the chat panel, for example to pull out action items, summarize a section, or find a quote. See [Chat with Transcriptions](/features/chat).