Voice command flow: speech to AI to action

What is Command Mode?

Command Mode lets you speak instructions to an AI model instead of transcribing your words verbatim. It handles two distinct kinds of work:
  • Text transformations on selected text: summarize, rewrite, translate, fix grammar, adjust tone, draft replies, format as lists, and more.
  • Real actions on files and your system: convert images, audio, video, and config files; merge PDFs; compress images; set timers; extract color palettes from your screen; and open macOS settings panels.
Speak naturally, and Vowen routes your command to the right action. The default shortcut is Alt + Shift on both macOS and Windows. You can change it in Settings > Shortcuts, assign additional shortcuts, or bind Command Mode to a mouse button.

How It Works

  1. Hold the Command Mode shortcut: press and hold Alt + Shift (or your custom shortcut).
  2. Speak your instruction: tell the AI what you want done. Be specific and natural.
  3. Release the shortcut. The AI processes your command, and either the result is pasted into your focused app or an action runs (such as converting a file or merging PDFs).

Context Awareness

Command Mode automatically pulls context from several places so the AI knows what you’re talking about without you having to spell it out.

Selected Text

If you have text selected on screen when you activate Command Mode, that selection is included as context. Words like “this”, “it”, and “the text” automatically refer to the selected text. This is the most common way to give the AI something to operate on. Example workflow:
  1. Select a paragraph of text in any app
  2. Hold Alt + Shift
  3. Say “Make this more concise”
  4. Release. The AI rewrites the selected text and replaces it.

Selected Files

If you have one or more files selected (in Finder on macOS, or in your system's file manager elsewhere), Command Mode passes their filenames to the AI. This is what unlocks the file-action tools below (compress this image, merge these PDFs, convert to WEBP, and so on).

Screen Context

Command Mode can capture a screenshot and send it to the AI for visual context. This requires two things:
  1. Screen Recording permission granted to Vowen (macOS only)
  2. Settings > Recording > Include screenshot toggled on
When both are active, you can ask things like “What’s the error message on screen?” or “Draft a reply to this email.”

Memory

Your saved Memory notes are automatically included as context, so the AI knows your preferences, writing style, and frequently referenced information. For example, if you have a “My Writing Style” note in Memory, say “Rewrite this in my style” and the AI will reference it.

Other automatic context

Command Mode also passes the following to the AI on every call:
  • Vocabulary entries (top 10) from your Dictionary, so the AI recognizes domain-specific terms
  • Active app name and the matching Tone directive (if Tones is enabled), so output adapts to the app you’re working in
  • Language code from your transcription settings, so the AI responds in your language
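The context sources above can be pictured as one payload assembled for each call. The sketch below is hypothetical: the `CommandContext` class, its field names, and the prompt layout are illustrative assumptions, not Vowen's internal format. It only shows how the listed sources (selection, selected files, Memory, the top 10 Dictionary entries, active app and Tone, language code) might be combined into a single prompt.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical container for the automatic context described above.
# Field names are illustrative assumptions, not Vowen's real data model.
@dataclass
class CommandContext:
    instruction: str                        # the spoken command, transcribed
    selected_text: Optional[str] = None     # current selection, if any
    selected_files: list[str] = field(default_factory=list)
    memory_notes: list[str] = field(default_factory=list)
    vocabulary: list[str] = field(default_factory=list)  # assumed already ranked
    app_name: Optional[str] = None
    tone: Optional[str] = None              # only set when Tones is enabled
    language: str = "en"

    def to_prompt(self, max_vocab: int = 10) -> str:
        """Flatten every non-empty context source into one prompt string."""
        parts = [f"Instruction: {self.instruction}"]
        if self.selected_text:
            parts.append(f"Selected text:\n{self.selected_text}")
        if self.selected_files:
            parts.append("Selected files: " + ", ".join(self.selected_files))
        if self.memory_notes:
            parts.append("Memory:\n" + "\n".join(self.memory_notes))
        if self.vocabulary:
            # Mirror the "top 10" limit by keeping only the first max_vocab entries.
            parts.append("Vocabulary: " + ", ".join(self.vocabulary[:max_vocab]))
        if self.app_name:
            parts.append(f"Active app: {self.app_name}")
        if self.tone:
            parts.append(f"Tone: {self.tone}")
        parts.append(f"Respond in language: {self.language}")
        return "\n\n".join(parts)
```

For example, `CommandContext(instruction="Make this more concise", selected_text="...", app_name="Mail").to_prompt()` yields one string in which only the sources that are actually present appear.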

Actions

Beyond text manipulation, Command Mode can trigger real actions on files and the system. Speak any of these naturally and Vowen will execute the matching tool.
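Routing a spoken command to an action is commonly done with LLM tool calling: the model returns a tool name plus arguments, and the app looks up that name and runs the matching handler. The sketch below assumes that pattern. Apart from `translate_text`, which this page mentions, the tool names, the argument shapes, and the `dispatch` helper are hypothetical, and the handlers only return descriptions instead of touching files.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry. Only "translate_text" is named in these docs;
# the other tool names and all argument shapes are illustrative assumptions.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Decorator that registers a handler under a tool name."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("translate_text")
def translate_text(text: str, target_language: str) -> str:
    # No source_language argument: the source language is auto-detected.
    return f"translate {len(text)} chars to {target_language}"


@tool("set_timer")
def set_timer(seconds: int) -> str:
    return f"timer for {seconds}s"


def dispatch(tool_call_json: str) -> str:
    """Run the handler named in a tool call.

    Assumed shape: {"name": "...", "arguments": {...}}.
    """
    call: dict[str, Any] = json.loads(tool_call_json)
    handler = TOOLS.get(call["name"])
    if handler is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return handler(**call.get("arguments", {}))
```

With this design, adding an action means registering one more handler; the routing code does not change. For instance, `dispatch('{"name": "set_timer", "arguments": {"seconds": 600}}')` returns `"timer for 600s"`.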

Image conversion

Convert images between formats with a single voice command. Select one or more images in Finder, speak the target format, and Vowen drops the converted files next to the originals. Supports PNG, JPG, WEBP, AVIF, GIF, TIFF, HEIC, HEIF, and JFIF.
“Convert this image from JPG to PNG”
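Writing the result "next to the original" amounts to keeping the directory and file stem and swapping the extension. Here is a minimal sketch with `pathlib`. It assumes a hypothetical rule of appending a counter when the target name already exists; how Vowen actually handles name collisions is not documented on this page.

```python
from pathlib import Path

# Formats listed in this section, as lowercase extensions without the dot.
SUPPORTED_IMAGE_FORMATS = {"png", "jpg", "webp", "avif", "gif", "tiff", "heic", "heif", "jfif"}


def converted_path(source: Path, target_format: str) -> Path:
    """Return a sibling path for the converted file, e.g. photo.jpg -> photo.png."""
    fmt = target_format.lower().lstrip(".")
    if fmt not in SUPPORTED_IMAGE_FORMATS:
        raise ValueError(f"unsupported format: {target_format}")
    candidate = source.with_suffix(f".{fmt}")
    # Hypothetical collision rule: if photo.png exists, try "photo 2.png", "photo 3.png", ...
    n = 2
    while candidate.exists():
        candidate = source.with_name(f"{source.stem} {n}.{fmt}")
        n += 1
    return candidate
```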

Config file conversion

Switch a config file between JSON, YAML, TOML, and XML with one voice command. Select the source file in Finder, speak the target format, and Vowen writes the converted file alongside the original. Conversion is bidirectional across all four formats.
“Convert this file from JSON to YAML”
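Converting in both directions across four formats is easiest to reason about as "parse into a common tree, then serialize". That design needs four readers and four writers instead of twelve pairwise converters. The sketch below shows only the JSON-to-XML leg, using Python's standard library. The `<root>` element name and the `<item>` convention for list entries are assumptions, because XML has no single standard mapping from JSON. This is not Vowen's actual converter.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any


def build_xml(parent: ET.Element, value: Any) -> None:
    """Recursively attach a parsed JSON value under an XML element."""
    if isinstance(value, dict):
        for key, child in value.items():
            build_xml(ET.SubElement(parent, key), child)
    elif isinstance(value, list):
        # Assumed convention: each list entry becomes an <item> element.
        for child in value:
            build_xml(ET.SubElement(parent, "item"), child)
    elif value is None:
        pass  # JSON null becomes an empty element
    elif isinstance(value, bool):
        parent.text = "true" if value else "false"  # keep JSON's spelling
    else:
        parent.text = str(value)


def json_to_xml(json_text: str, root_tag: str = "root") -> str:
    """Parse JSON into Python objects, then serialize that tree as XML."""
    root = ET.Element(root_tag)
    build_xml(root, json.loads(json_text))
    return ET.tostring(root, encoding="unicode")
```

This sketch assumes every JSON key is a valid XML element name. A key with a space or a leading digit would need escaping or renaming.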

Audio conversion

Convert audio files between formats with a voice command. Vowen supports MP3, WAV, AAC, FLAC, OGG, M4A, and WMA. It can also extract the audio track from any video file if you select one.
“Convert this audio from WAV to MP3”
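For reference, both operations described here (format conversion and pulling the audio track out of a video) can be done with the ffmpeg command-line tool. This illustrates the general technique and says nothing about what Vowen uses internally. The `ffmpeg_audio_args` helper is hypothetical and only builds the argument list. If ffmpeg is installed, you can run that list with `subprocess.run`.

```python
from pathlib import Path

# Video extensions from the Video conversion section below.
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".webm", ".flv", ".wmv"}


def ffmpeg_audio_args(source: Path, target_format: str) -> list[str]:
    """Build an ffmpeg command that writes audio in target_format beside source."""
    out = source.with_suffix("." + target_format.lower())
    args = ["ffmpeg", "-i", str(source)]
    if source.suffix.lower() in VIDEO_EXTS:
        args.append("-vn")  # drop the video stream and keep only the audio
    # ffmpeg infers the output container, and its default codec, from the extension.
    args.append(str(out))
    return args
```

Usage, assuming ffmpeg is on your `PATH`: `subprocess.run(ffmpeg_audio_args(Path("clip.mov"), "mp3"), check=True)`.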

Video conversion

Convert video files between formats with a voice command. Vowen supports MP4, AVI, MOV, MKV, WEBM, FLV, and WMV. Useful for switching macOS screen recordings (.mov) into the universally compatible .mp4.
“Convert this video from MOV to MP4”

Merge PDFs

Combine multiple PDF files into a single document with a voice command. Select two or more PDFs in Finder in the order you want them combined, ask Vowen to merge, and a new PDF appears alongside the originals.
“Merge these PDFs”

Image compression

Shrink image file sizes without changing format. Select an image in Finder, name a target quality (or just ask to compress), and Vowen writes a smaller version next to the original. Supports JPG, PNG, WEBP, AVIF, HEIC, GIF, and TIFF.
“Compress this image to 70 percent”

Timer

macOS only
Set a countdown timer with a voice command. A floating timer window appears on your screen and beeps when the countdown finishes. Durations can be given in seconds, minutes, or hours, up to a maximum of 24 hours.
“Set a 10 minute timer”
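The duration rule above (seconds, minutes, or hours, capped at 24 hours) comes down to unit conversion plus a bounds check. Here is a minimal sketch. It assumes the spoken request has already been turned into a number and a unit. The `timer_seconds` name and the accepted unit spellings are hypothetical.

```python
# Seconds per unit; the accepted spellings are illustrative assumptions.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}
MAX_SECONDS = 24 * 3600  # documented upper limit: 24 hours


def timer_seconds(amount: float, unit: str) -> int:
    """Convert an amount and unit to whole seconds, enforcing the 24-hour cap."""
    key = unit.lower().removesuffix("s")  # "minutes" -> "minute"
    if key not in UNIT_SECONDS:
        raise ValueError(f"unsupported unit: {unit}")
    total = round(amount * UNIT_SECONDS[key])
    if not 0 < total <= MAX_SECONDS:
        raise ValueError("timer must be longer than 0 and at most 24 hours")
    return total
```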

Translation

Translate selected text into any language with a voice command. Highlight the text, speak the target language, and Vowen replaces the selection with the translation. The source language is auto-detected, so you don’t need to specify it.
“Translate this to Spanish”

Open macOS settings

macOS only
Jump straight to a specific macOS System Settings panel with a voice command. Useful for getting to microphone, accessibility, sound, displays, network, and other settings without clicking through menus.
“Open microphone settings”
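macOS exposes many System Settings panels through the `x-apple.systempreferences:` URL scheme, which the `open` command can launch. The sketch below maps a few spoken panel names to commonly used pane identifiers. Apple does not formally document these identifiers, and they can change between macOS releases. This is also not necessarily how Vowen opens panels. The `PANES` table and both helper functions are hypothetical.

```python
import subprocess
import sys

# Commonly used pane identifiers; undocumented by Apple and subject to change.
PANES = {
    "microphone": "com.apple.preference.security?Privacy_Microphone",
    "accessibility": "com.apple.preference.security?Privacy_Accessibility",
    "sound": "com.apple.preference.sound",
    "displays": "com.apple.preference.displays",
    "network": "com.apple.preference.network",
}


def settings_url(panel: str) -> str:
    """Return the x-apple.systempreferences URL for a spoken panel name."""
    pane = PANES.get(panel.strip().lower())
    if pane is None:
        raise KeyError(f"no known pane for {panel!r}")
    return f"x-apple.systempreferences:{pane}"


def open_settings(panel: str) -> None:
    """Open the panel with macOS's `open` command; does nothing on other platforms."""
    if sys.platform == "darwin":
        subprocess.run(["open", settings_url(panel)], check=True)
```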

Color palette

macOS only
Capture the colors on your screen with a voice command. Vowen takes a screenshot, extracts the dominant colors, and shows them as swatches with hex codes in a popup. Useful for sampling palettes from websites, design tools, or any visual reference.
“Get the color palette on the screen”
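One simple way to find "dominant colors" is to quantize each pixel into coarse buckets, count how often each bucket occurs, and report the most common buckets as hex codes. The sketch below assumes the screenshot has already been decoded into a list of `(r, g, b)` tuples; reading the image itself needs an imaging library and is omitted. This page does not describe Vowen's actual extraction method, which may differ (for example, k-means clustering is another common approach).

```python
from collections import Counter


def to_hex(rgb: tuple[int, int, int]) -> str:
    """Format an (r, g, b) tuple as a #RRGGBB hex code."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def dominant_colors(
    pixels: list[tuple[int, int, int]], count: int = 5, step: int = 32
) -> list[str]:
    """Return hex codes for the `count` most common colors after quantization."""
    buckets: Counter = Counter()
    for r, g, b in pixels:
        # Snap each channel to the center of its bucket so near-identical shades merge;
        # min() keeps the value in range when step does not divide 256 evenly.
        key = tuple(min(255, (c // step) * step + step // 2) for c in (r, g, b))
        buckets[key] += 1
    return [to_hex(color) for color, _ in buckets.most_common(count)]
```

A larger `step` merges more shades into each swatch. A smaller one preserves subtle variations but can split one visual color into several entries.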

Text-only commands

For pure text transformations (no files or actions), select text in any app and speak the instruction. The AI rewrites the selection in place.
Summarize
Selected text: Just wrapped up the Q3 review. We’re tracking well on the product launch, engineering fixed the major bugs from last sprint, marketing is preparing the press kit for next month, and support tickets are down 20%. Sarah will lead the launch coordination going forward.
Command: “Summarize this”
Result:
Q3 review highlights:
• Product launch on track
• Engineering bugs resolved
• Marketing preparing press kit
• Support tickets down 20%
• Sarah leads launch coordination
Fix grammar
Selected text: i should of sent this email yesterday but i forgot, your absolutely right we need to fix this asap, me and the team is on it.
Command: “Fix the grammar”
Result: I should have sent this email yesterday but I forgot. You’re absolutely right, we need to fix this ASAP. The team and I are on it.
Make formal
Selected text: hey can u look at the PR I sent? lmk what u think, no rush but ideally before EOD
Command: “Make this more formal”
Result: Hi, could you please review the pull request I sent? Let me know your thoughts. There is no immediate rush, but ideally before end of day.
Format as list
Selected text: for next week we need to confirm the venue book the catering send the invites finalize the agenda and prepare the slides
Command: “Turn these into bullet points”
Result:
For next week:
• Confirm the venue
• Book the catering
• Send the invites
• Finalize the agenda
• Prepare the slides
Draft a reply
Selected text: Hi, I noticed the report you sent was missing the section on customer retention. Could you add it and resend? Thanks, Jamie
Command: “Write a reply to this”
Result:
Hi Jamie,

Apologies for the oversight. I’ll add the customer retention section and send the updated report by end of day.

Thanks for catching that!

Multilingual support

Command Mode works in every language Vowen’s transcription supports. Speak a command in Spanish, Hindi, French, Japanese, or any other supported language, and Command Mode understands the instruction and runs the task in that language. File conversions, timers, translations, formatting, and drafted replies all work regardless of the language you speak. For best results on non-English commands:
  • Pair a multilingual speech model (Whisper Large v3, Groq Whisper, or any of the cloud providers) with a capable multilingual AI model (GPT-4o, Claude Sonnet, Gemini Pro)
  • Smaller open-weight models can struggle to follow instructions in low-resource languages
  • Translation commands (the translate_text tool) auto-detect the source language and work out the target language from your command, whatever language you speak it in

Requirements

  • An AI provider must be configured
  • Use a capable instruction-following model. Recommended: GPT-4o, Claude Sonnet, Gemini Pro, or Llama 3.1 (8B or larger) on Groq if you want a free option
  • Command Mode is enabled by default. Toggle it or rebind its shortcut in Settings > Shortcuts
Smaller models can produce inconsistent results in Command Mode, especially for tool calls and non-English commands. If commands behave unpredictably, switch to a larger model first.

Set up an AI provider

Connect Groq, OpenAI, Anthropic, Gemini, or any of 10+ supported providers.
Running into issues with Command Mode? See AI & API Issues.