SuperWhisper Review

Local-first dictation with cloud AI modes

macOS · Windows · Local + Cloud

Version 1.4.0 · Tested 2026-05-30

SuperWhisper is a voice dictation app for macOS and Windows offering both local (offline) and cloud AI transcription. It is priced at $9/month or $299 for a lifetime license. This independent review covers WER/CER accuracy across 6 test recordings, a privacy analysis, and a UX verdict.

SuperWhisper Verdict

6.6 out of 10

Powerful local engine buried under a broken cloud flagship

SuperWhisper version 1.4.0 scores 6.6/10 overall in VoiceTools independent testing (tested 2026-05-30). Best local model (Whisper Standard) achieves 2.4% aggregate WER across 6 recordings. Best cloud model (Ultra) achieves 2.6% aggregate WER.

Works well for

  • Whisper Standard (hidden): best-in-class local accuracy at 2.4% WER
  • Genuine offline mode — no audio leaves device in local configuration
  • Lifetime license option at $299 — no subscription required

Watch out for

  • S1-Voice (cloud flagship default) shows 5-37% WER across test recordings (22.4% aggregate)
  • Trial is 15 minutes total — blocks even offline models after limit
  • Cloud mode sends app name, clipboard, and focused text to Modal.com beyond audio

Best for

  • Power users willing to dig past default settings and switch to Whisper Standard

Not for

  • Anyone who installs and expects the default model to work well

SuperWhisper Accuracy & Speed

Local models (English)

Parakeet (Default) · 1.1 GB · tested on CPU (Ryzen AI 9 HX · 32 GB RAM)

NVIDIA Parakeet TDT 0.6B, the default local model in SuperWhisper 1.4. Fast and good on clean speech, but it has no inverse text normalization (ITN): numbers and dates come out as spoken words.

Default local model — users see this first

Accuracy: 12.4% WER · 8.1% CER · 22% PER · score 4 / 10
  • WER (Word Error Rate): what % of words the model got wrong. 0% = every word correct.
  • CER (Character Error Rate): same as WER but measured letter-by-letter. Usually lower than WER.
  • PER (Punctuation Error Rate): how accurately the model placed commas, periods, and other punctuation.
Speed: ~3s post-stop latency (2–5s range) · score 8 / 10
  • Post-stop latency: seconds from pressing Stop to the final text appearing in your active app. Average across all test recordings.
Whisper Standard (best accuracy, hidden in UI) · 500 MB · tested on CPU (Ryzen AI 9 HX · 32 GB RAM)

OpenAI Whisper large-v2 running locally. Best accuracy of all tested models but hidden from the main model picker — requires Library search to find.

Hidden — Settings > Library > search "Whisper Standard"

Accuracy: 2.4% WER · 1.6% CER · 18% PER · score 8 / 10
Speed: ~8s post-stop latency (6–12s range) · score 5 / 10
Cloud models

S1-Voice (Default)

SuperWhisper's proprietary cloud model, presented as the headline AI feature. Applies aggressive rewriting that causes large content losses on some recordings.

Default cloud model — users land here without changing settings

Accuracy: 22.4% WER · 14.2% CER · 38% PER · score 3 / 10
Speed: ~2s post-stop latency (1–4s range) · score 9 / 10
Ultra (Best Cloud)

Cloud Whisper-class model offered as "Ultra" tier. More conservative post-processing than S1-Voice — accurate and stable across all recording types.

Better than S1-Voice on every recording — less prominently surfaced in UI

Accuracy: 2.6% WER · 1.8% CER · 20% PER · score 8 / 10
Speed: ~3s post-stop latency (2–5s range) · score 8 / 10
Tested on: Windows 11 26H2 · AMD Ryzen AI 9 HX 370 · 32 GB RAM · NVIDIA RTX 5070 Laptop 8 GB
Identical audio fixtures across 6 recordings
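
For readers who want to sanity-check the WER and CER figures, the sketch below shows one common way to compute both with a Levenshtein edit distance. It is an illustration, not the scoring code used for this review: the function names are made up, the error count is divided by the reference length, and no case or punctuation normalization is applied. The sample pair reuses the "load balancer" → "load balance" slip from the Conference recording.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion (token missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra token in hypothesis)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word errors divided by reference word count (assumes a non-empty reference)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character errors divided by reference length, spaces included."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


ref = "the load balancer routes traffic"
hyp = "the load balance routes traffic"
print(f"WER {wer(ref, hyp):.1%}")  # one wrong word out of five
print(f"CER {cer(ref, hyp):.2%}")  # one wrong character out of 32
```

One wrong word out of five gives a 20% WER, while the same slip costs a single character out of 32, which is why CER usually comes out lower than WER.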
SuperWhisper for Coding & IT

Recommended: Whisper Standard, Ultra
Ultra Cloud
Coding 7.4% 17 err / 230w
Conference 0.5% 1 err / 220w

Coding

  • snake_case identifiers mostly preserved
  • CLI flags correct
  • No hallucinations
  • "Tauri" → "Tory"
  • "tokio::runtime" → "Toko runtime"

Conference

  • Best on accented English speaker — 0.45% WER
  • "Kubernetes", "PostgreSQL", "microservices" exact
  • Zero dropped sentences
  • "load balancer" → "load balance" (once)
Whisper Standard Local
Coding 7.3% 17 err / 233w
Conference 0.9% 2 err / 220w

Coding

  • snake_case identifiers intact
  • CLI flags like "--release" correct
  • No hallucinations
  • "Tauri" → "Tarry"
  • "axum" → "axm"

Conference

  • Strong on accented English
  • "PostgreSQL" and "Kubernetes" exact
  • Only 1 minor substitution across 89s
  • "schema" capitalization flipped once
Parakeet Local
Coding 16.7% 39 err / 234w
Conference 1.8% 4 err / 222w

Coding

  • Handles prose segments cleanly
  • No hallucinations
  • "cargo.toml" → "Cargo .toml"
  • "tokio::spawn" → "tokio spawn"
  • "impl Trait" → "imp Trait"

Conference

  • Clean transcription of accented English speaker
  • Technical terms like "API" and "SDK" correct
  • "Kubernetes" → "Cubernetes"
S1-Voice Cloud
Coding 35.9% 85 err / 232w
Conference 24.6% 56 err / 220w

Coding

  • Faster response than local models (~2s)
  • "impl AsyncRead" → "RUMP_INSTAL"
  • Dropped entire code block (lines 14–17)
  • "CloudFace" hallucinated (not in source)

Conference

  • Fast turnaround even on longer clip
  • "distributed systems" → "destructive systems"
  • Dropped 3 full sentences mid-recording
  • Non-deterministic: 37% WER run 1 vs 16% run 2 — same audio
SuperWhisper for Everyday & Long-form

Recommended: Whisper Standard, Ultra
Whisper Standard Local
Casual 0.0% 0 err / 190w
Long-form 2.3% 15 err / 525w

Casual

  • Perfect: zero word errors
  • Disfluencies handled cleanly

Long-form

  • Best long-form accuracy tested — no drift over 3:42
  • Zero hallucinations across full recording
  • "whisper.cpp" partially garbled near end
  • Occasional sentence-boundary missed
Ultra Cloud
Casual 0.0% 0 err / 190w
Long-form 2.5% 14 err / 521w

Casual

  • Perfect: zero errors on casual speech
  • Natural disfluency handling

Long-form

  • No drift — consistent quality start to finish
  • Zero hallucinations across 3:42
  • App name spacing inconsistent once
  • "whisper.cpp" spacing once
Parakeet Local
Casual 0.0% 0 err / 197w
Long-form 3.6% 19 err / 528w

Casual

  • Perfect: zero word errors on casual speech
  • Disfluencies (um, uh) preserved naturally

Long-form

  • No quality drift over 3+ minutes
  • Consistent pace throughout — no mid-recording degradation
  • "whisper.cpp" split across sentences once
  • Occasional dropped filler mid-paragraph
S1-Voice Cloud
Casual 37.0% 71 err / 192w
Long-form 14.4% 80 err / 520w

Casual

  • Lost 2 of 3 sections — only opening paragraph survived
  • Heavy rewriting distorts meaning of what remains

Long-form

  • Manages to complete the full recording without timeout
  • Large sections rewritten — 14% WER from aggressive post-processing
  • Brand names and app identifiers mangled or dropped in later paragraphs
SuperWhisper for Numbers & Structured Data

Recommended: Whisper Standard, S1-Voice
Whisper Standard Local
Numbers/ITN 2.5% 1 err / 40w

Numbers/ITN

  • Dates and currency nearly exact
  • "$12,400.75" and phone number correct
  • "March 15th, 2026" → "March 15, 2026" (minor format)
S1-Voice Cloud
Numbers/ITN 7.3% 3 err / 41w

Numbers/ITN

  • "$12,400.75" exact
  • Phone number format correct
  • "March 15th, 2026" → "March 15, 2026"
  • "ABC-123456" → "ABC 123456" (hyphen dropped)
Ultra Cloud
Numbers/ITN 10.0% 2 err / 39w

Numbers/ITN

  • Phone number and date format correct
  • "$12,400.75" → "$12400.75" (comma dropped)
  • "Order ID" label partially dropped
Parakeet Local
Numbers/ITN 86.0% 37 err / 43w

Numbers/ITN

  • No ITN — numbers output as spoken words throughout
  • "$12,400.75" → "twelve thousand four hundred dollars and seventy five cents"
  • Phone number and order ID completely garbled
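
To make "no ITN" concrete, here is a toy sketch of what inverse text normalization does to the currency phrase above. Everything in it (the function names, the small vocabulary, the dollars-and-cents pattern) is a hypothetical illustration. It is not how SuperWhisper, Whisper, or Parakeet handle numbers, and real ITN systems cover far more cases such as dates, phone numbers, and ordinals.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(text: str) -> int:
    """Convert spoken English number words to an integer."""
    total = 0    # finished groups (thousands, millions)
    current = 0  # group currently being built (0-999)
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += current * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


def spoken_currency_to_digits(phrase: str) -> str:
    """Turn '<n> dollars and <m> cents' into '$n.mm' with thousands separators."""
    dollars_text, _, cents_text = phrase.partition(" dollars")
    cents_text = cents_text.replace(" cents", "").strip()
    dollars = words_to_int(dollars_text)
    cents = words_to_int(cents_text) if cents_text else 0
    return f"${dollars:,}.{cents:02d}"


spoken = "twelve thousand four hundred dollars and seventy five cents"
print(spoken_currency_to_digits(spoken))  # $12,400.75
```

Scored against a digit-form reference, the spoken output counts as many word errors while the normalized string counts as none, which is how a model with otherwise clean transcription can land at 86% WER on this recording.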
SuperWhisper: Noise Resistance

Recommended: Parakeet, Ultra
Parakeet Local
Noisy Cafe 0.0% 0 err / 195w

Noisy Cafe

  • Noise has zero effect — identical output to clean version
  • Café background at SNR 5 dB not detected
Whisper Standard Local
Noisy Cafe 0.0% 0 err / 190w

Noisy Cafe

  • Noise has zero effect — identical output to clean version
Ultra Cloud
Noisy Cafe 1.1% 2 err / 190w

Noisy Cafe

  • Near-perfect under café noise — only 1 minor substitution
  • "in-between" split once
S1-Voice Cloud
Noisy Cafe 4.8% 7 err / 188w

Noisy Cafe

  • Handles café noise better than casual clean — rewriting helps here
  • "in-between" → "in between"
  • Some filler words not stripped
Results by recording (6 recordings × 4 models)

| Recording | Parakeet (Local, default) | Whisper Standard (Local, best accuracy) | S1-Voice (Cloud, default) | Ultra (Cloud, best cloud) |
|---|---|---|---|---|
| Coding (117s · tech identifiers) | 16.67% | 7.30% | 35.93% | 7.39% |
| Casual (82s · everyday speech) | 0.00% | 0.00% | 36.98% | 0.00% |
| Noisy Cafe (82s · +5dB background) | 0.00% | 0.00% | 4.79% | 1.05% |
| Conference (89s · accented speech) | 1.80% | 0.91% | 24.55% | 0.45% |
| Long-form (3:42 · drift) | 3.60% | 2.29% | 14.45% | 2.50% |
| Numbers/ITN (35s · dates, currency, IDs) | 86.05% | 2.50% | 7.32% | 10.00% |
| Aggregate WER (weighted by word count) | 12.4% | 2.4% | 22.4% | 2.6% |
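
"Weighted by word count" matters because the recordings differ a lot in length. The sketch below contrasts a pooled aggregate (total errors over total words) with a plain average of per-recording WERs. It is one reasonable reading of the weighting, not the review's actual scoring script, and the counts are hypothetical values chosen only to show the effect.

```python
def pooled_wer(results):
    """Word-count-weighted aggregate: total errors divided by total words."""
    total_errors = sum(errors for errors, _ in results)
    total_words = sum(words for _, words in results)
    return total_errors / total_words


def mean_wer(results):
    """Unweighted alternative: average of per-recording WERs."""
    return sum(errors / words for errors, words in results) / len(results)


# Hypothetical (errors, words) pairs: two long clips and one short, error-heavy clip.
recordings = [(10, 200), (0, 200), (30, 50)]

print(f"pooled: {pooled_wer(recordings):.1%}")
print(f"mean:   {mean_wer(recordings):.1%}")
```

With these made-up counts the pooled figure is about 8.9% while the unweighted mean is about 21.7%: a short recording with a very high WER, like Numbers/ITN here, pulls an unweighted average up far more than it pulls a word-weighted one.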

SuperWhisper UX & Integration

Getting started & flow

Onboarding flow

Reached first successful dictation in about a minute — nothing superfluous.

5/5
Hotkey customization

Default shortcut is comfortable and remappable, no system conflicts — but the push-to-talk option does not actually work.

3/5
Error messages

Shows a center-screen message when the trial runs out, but there is no fallback — and settings navigation is scattered across sections.

3/5

Recording experience

Recording overlay UX

Clear recording pill / overlay — recording state is obvious.

5/5
Stop / cancel UX

Easy to cancel a bad dictation; cancel hotkey included.

5/5
Text insertion reliability

Pastes reliably into every app tested.

5/5
Auto-insert vs clipboard

Auto-inserts the text and can restore your previous clipboard afterwards.

5/5

Managing your work

Recording history

Browsable history with search; you can open a recording to see its mode, duration and even the prompt used. No export.

4/5
Mode / model switching

Fast switching by hotkey and from the pill UI.

5/5
Idle resource use

~160 MB RAM · 0.3% CPU at rest (cloud).

2/5

SuperWhisper Features

Text processing

AI post-processing

Cloud AI modes rewrite text — many models, BYOK for several providers. But S1-Voice over-rewrites and drops content.

Custom vocabulary / dictionary

Per-word replacements applied at transcription.

Text snippets / expansion

Bundled into the custom-dictionary feature, not a separate snippets UI — and also doable via LLM post-processing instructions.
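
As a rough picture of what per-word replacements do, the sketch below applies a small replacement dictionary to a finished transcript. It is an assumption-laden illustration: SuperWhisper's real implementation is not documented in this review, and the function name, matching rules (whole words, case-insensitive, longest entry first), and sample entries are hypothetical. The entries mirror two misrecognitions from the Coding recording.

```python
import re


def apply_replacements(text: str, replacements: dict[str, str]) -> str:
    """Replace whole-word (or whole-phrase) matches, ignoring case."""
    if not replacements:
        return text
    # Longest entries first so multi-word phrases win over shorter overlaps.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(key) for key in keys) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {key.lower(): value for key, value in replacements.items()}
    return pattern.sub(lambda match: lookup[match.group(1).lower()], text)


dictionary = {
    "Tory": "Tauri",
    "Toko runtime": "tokio::runtime",
}

print(apply_replacements("We ship with Tory and the Toko runtime.", dictionary))
# We ship with Tauri and the tokio::runtime.
```

The word-boundary checks keep an entry like "Tory" from rewriting the inside of unrelated words such as "history".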

Output & extras

File transcription

Hidden behind the tray icon. Broken on LLM modes (returns a stale buffer); only works on the Voice mode, and the UX is so confusing it barely counts.

Export (txt / srt / json)

No txt / srt / json export, and history cannot be bulk-exported.

Music auto-mute

Pause, lower, or fully mute media while recording.

Voice commands

Local recognition

Offline / local inference

Genuine offline mode — no audio leaves device in local configuration.

Multiple model options

Parakeet, Whisper Standard (hidden), S1-Voice, Ultra — but best local model is buried.

SuperWhisper Privacy

SuperWhisper keeps audio on-device when using local models. Cloud models upload audio to modal.com only after you press Stop.

Audio uploaded in cloud mode only

Endpoints: modal.com, api.superwhisper.com

Audio sent only after you press Stop

Nothing is uploaded until you confirm by pressing Stop. Cancel before then and the audio never leaves.

Sends more than audio

In cloud mode: active app name, focused element text, clipboard contents, computer name, locale, timezone

Opt out of training on your data

Your recordings are not used to train models.

Disable analytics & tracking

You can turn off product analytics and telemetry.

Turn off history storage

You can set the app to never store your transcription history.

From the privacy policy (not scored)

  • Privacy policy guarantees data is never used to train AI models and is not retained on SuperWhisper servers — all storage is local.
  • States it collects no usage data and uses no cookies or tracking technologies.
  • Note: the observed cloud mode still sends app context and clipboard contents to Modal.com, which is more than the policy implies, so cloud users should not assume "local-only".

Pricing

Monthly: $9/mo
Lifetime: $299 one-time
Trial limit: 15 minutes total (all models)
Account required: Optional

Methodology

Accuracy scores use WER (Word Error Rate) computed against multi-reference ground truth with {a|b} alternates for valid transcription variants (e.g. 48% and forty-eight percent are both accepted). Audio delivered via virtual cable from ElevenLabs TTS. Single test session on 2026-05-30.
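
The multi-reference idea can be sketched as follows: expand every {a|b} group in the ground-truth template into plain reference strings, score the transcript against each, and keep the lowest WER. This is a minimal illustration under assumptions, not the actual scoring code. The function names are hypothetical, groups are assumed not to nest, and the template sentence is invented around the 48% example above.

```python
import itertools
import re


def expand_alternates(template: str) -> list[str]:
    """Expand every {a|b} group into the full list of plain reference strings."""
    parts = re.split(r"\{([^{}]*)\}", template)
    # Even indexes hold literal text, odd indexes hold the contents of a {...} group.
    choices = [part.split("|") if i % 2 else [part] for i, part in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def multi_reference_wer(template: str, hypothesis: str) -> float:
    """Lowest WER over all accepted variants of the reference."""
    hyp = hypothesis.split()
    return min(
        word_errors(ref.split(), hyp) / len(ref.split())
        for ref in expand_alternates(template)
    )


template = "growth hit {48%|forty-eight percent} this year"
print(multi_reference_wer(template, "growth hit 48% this year"))                  # 0.0
print(multi_reference_wer(template, "growth hit forty-eight percent this year"))  # 0.0
```

Both spellings score zero errors because each matches one expanded variant exactly. A transcript is only penalized for words that differ from every accepted variant.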

Limitations of this test
  • TTS source, not human voice: real-world WER will likely be higher
  • Single session, no variance measurement across multiple runs
  • Punctuation (PER) not shown in this table — see raw data
  • Numbers WER may be overstated for models that do not apply ITN (converting spoken numbers to digit form)