
Wispr Flow Review

Cloud-only dictation with fast, polished auto-cleanup

macOS · Windows · iOS · Cloud only · Free tier

Version 1.5.433 · Tested 2026-05-27

Wispr Flow is a voice dictation app for macOS, Windows and iOS that uses cloud-based AI transcription. The paid plan costs $15/month, and a free tier is available. This independent review covers WER/CER accuracy across 6 test recordings, a privacy analysis, and a UX verdict.

Wispr Flow Verdict

6.2 out of 10


Fast, clean, effortless — as long as you are online and not asking about privacy

Wispr Flow version 1.5.433 scores 6.2/10 overall in VoiceTools independent testing (tested 2026-05-27). Its only model, Standard, achieves a 3.7% aggregate WER across 6 recordings.

Works well for

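  • Consistently fast (~1.5 s average from Stop to text), even on noisy café audio, with no model choices to make
  • Auto-cleanup genuinely works: capitalisation, punctuation and ITN (inverse text normalisation, e.g. spoken numbers written as digits) come out right without manual editing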
  • Same quality on free and paid — the free tier is not a downgraded model

Watch out for

  • Cloud-only: every recording is uploaded (~22.8 MB per 2 min of audio), so the app is unusable without a good internet connection
  • No export, no built-in translation, and the text-transformation flow is obscure and hard to discover
  • Onboarding is long and its final mic test is broken — you cannot complete it cleanly
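For scale, the ~22.8 MB per 2 minutes noted above implies a sustained upload rate. The sketch below does that arithmetic, assuming decimal megabytes (1 MB = 1,000,000 bytes), since the review does not state its unit convention; the function names are illustrative only.

```python
# Back-of-envelope: what sustained upload does ~22.8 MB per 2 minutes imply?
# Assumption: decimal units (1 MB = 1,000,000 bytes); the review does not specify.

MEGABYTE = 1_000_000  # bytes (decimal, assumed)


def upload_rate(megabytes: float, seconds: float) -> tuple[float, float]:
    """Return (kilobytes per second, megabits per second) for a transfer."""
    bytes_per_second = megabytes * MEGABYTE / seconds
    kb_per_s = bytes_per_second / 1_000
    mbit_per_s = bytes_per_second * 8 / 1_000_000
    return kb_per_s, mbit_per_s


def data_per_hour(megabytes: float, seconds: float) -> float:
    """Scale the measured upload to one hour of continuous dictation, in MB."""
    return megabytes * 3600 / seconds


kb_s, mbit_s = upload_rate(22.8, 120)
print(f"{kb_s:.0f} kB/s ≈ {mbit_s:.2f} Mbit/s")                # 190 kB/s ≈ 1.52 Mbit/s
print(f"{data_per_hour(22.8, 120):.0f} MB per hour of audio")  # 684 MB per hour of audio
```

Under that assumption, an hour of dictation would upload roughly 684 MB.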

Best for

  • People who want polished English dictation out of the box with zero model-picking — there is even a dedicated vibe-coding mode that handles variable names

Not for

  • Privacy-conscious users and anyone who works offline — there is no local mode at all

Wispr Flow Accuracy & Speed

Model: Standard (English, cloud; the only model)

Wispr Flow's single cloud model. Auto-cleanup (disfluency removal, capitalisation, punctuation, ITN) is always on and tuned by a Light/Medium/High slider; it was tested on the default Light setting.

There is no model picker: one cloud model for everyone, free and paid alike.

Accuracy: 7/10
  • WER (Word Error Rate): 3.7%. The share of words the model got wrong; 0% means every word is correct.
  • CER (Character Error Rate): 2.4%. Same as WER but measured character by character; usually lower than WER.
  • PER (Punctuation Error Rate): 24%. Error rate for commas, periods and other punctuation; lower is better.

Speed: 9/10
  • Post-stop latency: ~1.5 s average (1–3 s range). Seconds from pressing Stop to the final text appearing in your active app, averaged across all test recordings.

Tested on: Windows 11 26H2 · AMD Ryzen AI 9 HX 370 · 32 GB RAM. Identical audio fixtures across all 6 recordings.
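WER and CER are both edit distances normalised by reference length: WER = (substitutions + deletions + insertions) / reference words, and CER applies the same formula to characters. Below is a minimal sketch of that standard definition. It compares raw whitespace-separated tokens; the review's own normalisation rules (case, punctuation) are not documented here and may differ.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # cost of turning an empty ref into each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)   # turning ref[:i] into an empty hyp costs i deletions
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref token missing from hyp
                curr[j - 1] + 1,         # insertion: extra token in hyp
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: the same edit distance, computed letter by letter."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


print(wer("update last_seen_at in the table", "update last scene at in the table"))  # 0.6
print(cer("Kubernetes", "Cubernetes"))  # 0.1
```

Under this definition, a single identifier error such as "last_seen_at" → "last scene at" costs three word errors (one substitution, two insertions) but only one character-level slip per letter changed, which is one reason technical dictation tends to show a higher WER than CER.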

Wispr Flow for Coding & IT (recommended model: Standard)

Standard (cloud) WER by recording:
  • Coding: 7.2% (16 errors / 222 words)
  • Conference: 5.1% (11 errors / 215 words)

Coding

  • Auto-cleanup: punctuation and capitalisation correct
  • No hallucinations or dropped segments
  • "last_seen_at" → "last scene at"
  • "Tauri" → "Atari"
  • "whisper.cpp" → "whisper. cpp" (dot split)

Conference

  • Handles accented speaker reliably
  • Zero dropped sentences
  • ITN active: numbers and dates formatted
  • "Kubernetes" → "Cubernetes"
  • "gRPC" → "GRPC" (casing lost)

Wispr Flow for Everyday & Long-form (recommended model: Standard)

Standard (cloud) WER by recording:
  • Casual: 2.7% (5 errors / 186 words)
  • Long-form: 2.9% (17 errors / 518 words)

Casual

  • Auto-cleanup works: punctuation and caps correct
  • Natural reading — removes "um/uh" cleanly
  • "re-time" → "retime" (hyphen dropped)
  • Minor rewording of closing sentence

Long-form

  • No drift over 3:42 — consistent quality throughout
  • Zero hallucinations, zero dropped sections
  • "Wispr Flow" substituted for app name once
  • One sentence boundary slightly shifted

Wispr Flow for Numbers & Structured Data (recommended model: Standard)

Standard (cloud) WER by recording:
  • Numbers/ITN: 0.0% (0 errors / 40 words)

Numbers/ITN

  • Perfect ITN: "$12,400.75", "1-800-555-0123 ext. 479", "ABC-123456" all exact
  • Date "March 15th, 2026 at 3:30 PM" formatted correctly

Wispr Flow: Noise Resistance (recommended model: Standard)

Standard (cloud) WER by recording:
  • Noisy Cafe: 2.7% (5 errors / 186 words)

Noisy Cafe

  • Identical output to the clean Casual recording; the background noise had no effect on this test
  • "re-time" → "retime" (same minor artefact as clean)

Results by recording (6 recordings × 1 model)

WER for Standard (cloud, only model):
  • Coding (117 s · tech identifiers): 7.21%
  • Casual (82 s · everyday speech): 2.69%
  • Noisy Cafe (82 s · +5 dB background): 2.69%
  • Conference (89 s · accented speech): 5.12%
  • Long-form (3:42 · drift): 2.90%
  • Numbers/ITN (35 s · dates, currency, IDs): 0.00%
  • Aggregate WER, weighted by word count: 3.7%
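Weighting by word count means the aggregate pools errors across recordings (total errors divided by total reference words) instead of averaging the per-recording percentages, so longer recordings count for more. A minimal sketch with hypothetical inputs, not this review's data:

```python
def aggregate_wer(results: list[tuple[int, int]]) -> float:
    """Pooled WER from (errors, reference_words) pairs, i.e. weighted by word count."""
    total_errors = sum(errors for errors, _ in results)
    total_words = sum(words for _, words in results)
    return total_errors / total_words


def mean_wer(results: list[tuple[int, int]]) -> float:
    """Unweighted mean of per-recording WERs, shown for comparison."""
    return sum(errors / words for errors, words in results) / len(results)


# Hypothetical inputs: a short, error-heavy clip and a long, clean one.
sample = [(10, 100), (1, 400)]
print(f"weighted: {aggregate_wer(sample):.2%}")  # weighted: 2.20%
print(f"unweighted: {mean_wer(sample):.3%}")     # unweighted: 5.125%
```

The short clip's 10% WER dominates the plain average, while the pooled figure reflects that most of the words came from the clean recording.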

Wispr Flow UX & Integration

Getting started & flow

Onboarding flow

Long onboarding whose final built-in mic test is broken — the result never shows, so you cannot finish it cleanly.

0/5
Hotkey customization

Fully customisable, and mouse buttons can be bound as triggers too.

5/5
Error messages

Error messages only appeared when offline, and the no-internet error state is clear.

5/5

Recording experience

Recording overlay UX

The recording pill is clear and well done.

5/5
Stop / cancel UX

Works, but the stop / cancel buttons are small.

4/5
Text insertion reliability

Auto-insert works in every app tested.

5/5
Auto-insert vs clipboard

Always auto-inserts — no toggle. You can add a hotkey to re-insert the last text, but there is no clipboard mode.

2/5

Managing your work

Recording history

A history list exists on the home screen, but there is no search and no export.

2/5
Mode / model switching

A hotkey cycles modes, but it is never clear which mode is currently active.

2/5
Idle resource use

~450 MB RAM · 0.3% CPU at rest (cloud).

1/5

Wispr Flow Features

Text processing

AI post-processing

Cloud LLM rewrite — but you must select already-typed text and trigger a separate "transformation" hotkey; the flow is hard to discover.

Custom vocabulary / dictionary

Per-word auto-replace before insertion.
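Per-word auto-replace can be pictured as a whole-word substitution pass over the transcript before insertion. The sketch below is a hypothetical illustration, not Wispr Flow's implementation; the dictionary entries are taken from misrecognitions in the accuracy tests, and the function name is made up.

```python
import re

# Hypothetical user dictionary: misrecognised word -> intended word.
# Entries mirror errors seen in this review's tests; the app's real matching rules are unknown.
REPLACEMENTS = {
    "Cubernetes": "Kubernetes",
    "GRPC": "gRPC",
    "Atari": "Tauri",
}


def auto_replace(text: str, replacements: dict[str, str]) -> str:
    """Swap whole words only, so substrings inside longer words are left alone."""
    if not replacements:
        return text
    # Longest keys first, so a shorter key never wins over a longer overlapping one.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: replacements[m.group(1)], text)


print(auto_replace("Deploy it on Cubernetes with GRPC.", REPLACEMENTS))
# Deploy it on Kubernetes with gRPC.
```

Whole-word matching avoids rewriting parts of longer words, but a rule like "Atari" → "Tauri" fires everywhere, including where "Atari" was meant. A strictly per-word rule like this one also cannot repair multi-word splits such as "last_seen_at" → "last scene at".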

Text snippets / expansion

Output & extras

File transcription
Export (txt / srt / json)

No txt / srt / json export, and history cannot be bulk-exported.

Music auto-mute
Voice commands

Local recognition

Offline / local inference

Cloud-only. Nothing works without internet.

Multiple model options

A single cloud model. Nothing to pick — which is also part of the appeal.

Wispr Flow Privacy

Wispr Flow streams audio to inference.wisprflow.com on every recording; the upload begins while you are still speaking, before you press Stop. Beyond audio, it collects transcripts and your edits by default (Privacy Mode requires the paid plan) and sends Sentry crash data and PostHog product analytics.

Audio uploaded on every recording

Endpoints: inference.wisprflow.com, api.wisprflow.ai, sentry.io, posthog

Audio streamed before you press Stop

Recording is streamed to the server while you talk — if you cancel, it has already left your device.

Sends more than audio

By default collects audio, transcripts and your edits; Privacy Mode is locked behind the paid plan. Also sends Sentry crash data and PostHog product analytics.

Opt out of training on your data — paid plan only

You can opt out of training, but the toggle is only available on the paid plan, so free and trial recordings may still be used.

Disable analytics & tracking

Analytics and tracking cannot be fully disabled (e.g. Google Analytics, ad attribution).

Turn off history storage

You can set the app to never store your transcription history.

From the privacy policy (not scored)

  • Data sent to third-party LLM providers is never used to train their services and is deleted after 30 days.
  • Uses cookies and Google Analytics (opt-out available for analytics) and tracks ad attribution; shares data with advertising partners for tailored ads.
  • Optional "Context Awareness" gathers content from your other apps; pseudonymised text/corrections are collected with consent for model improvement.

Pricing

  • Monthly: $15/mo
  • Yearly: $144/yr (works out to $12/mo)
  • Free tier: yes (same model as the paid plan)
  • Account required: yes, mandatory

Methodology

Accuracy scores use WER (Word Error Rate) computed against multi-reference ground truth, where {a|b} alternates mark valid transcription variants (e.g. both "48%" and "forty-eight percent" are accepted). Test audio was generated with ElevenLabs TTS and fed to the app through a virtual audio cable. All results come from a single test session on 2026-05-27.
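One way to score against {a|b} alternates is to expand every group into concrete reference strings and keep the best match. The sketch below assumes that approach and takes the lowest per-variant WER; the review does not say how it chooses among variants, nested braces are not supported, and the function names are illustrative.

```python
import itertools
import re

# Matches one {a|b|...} group; nested braces are not supported.
ALTERNATES = re.compile(r"\{([^{}]*)\}")


def expand_alternates(reference: str) -> list[str]:
    """Expand every {a|b} group into all concrete reference strings."""
    # With one capture group, re.split returns literal, group, literal, ..., literal.
    parts = ALTERNATES.split(reference)
    literals = parts[0::2]
    choices = [group.split("|") for group in parts[1::2]]
    variants = []
    for combo in itertools.product(*choices):
        pieces = [literals[0]]
        for option, literal in zip(combo, literals[1:]):
            pieces.append(option)
            pieces.append(literal)
        variants.append("".join(pieces))
    return variants


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h))
        prev = curr
    return prev[-1]


def multi_ref_wer(reference: str, hypothesis: str) -> float:
    """Lowest WER over all expanded reference variants."""
    hyp_words = hypothesis.split()
    best = float("inf")
    for variant in expand_alternates(reference):
        ref_words = variant.split()
        best = min(best, edit_distance(ref_words, hyp_words) / len(ref_words))
    return best


ref = "growth was {48%|forty-eight percent} this quarter"
print(multi_ref_wer(ref, "growth was 48% this quarter"))                  # 0.0
print(multi_ref_wer(ref, "growth was forty-eight percent this quarter"))  # 0.0
print(multi_ref_wer(ref, "growth was 48 percent this quarter"))           # 1/6 ≈ 0.167
```

Both accepted spellings score 0.0, while the mixed form "48 percent" is scored against whichever variant it is closer to, so digit-versus-word formatting alone is not penalised.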


Limitations of this test
  • TTS source, not human voice — real-world WER will be higher
  • Single session, no variance measurement across multiple runs
  • Punctuation (PER) not shown in this table — see raw data
  • Numbers WER may be overstated for apps that apply ITN (converting spoken to digit form)