
Dictation modes and languages

InkSpoke's defaults are tuned to just work, but a few settings let you shape how it listens: whether you see words appear as you speak, how it handles very soft speech, which language you're dictating in, and how aggressively it cleans up your microphone. This page walks through each one.

The short version

You don't have to touch any of this. Out of the box InkSpoke uses Standard mode, expects English, auto-detects other languages, and auto-tunes your audio. The settings below are for when you want something different.

Two ways to transcribe: Standard vs Live Preview

The Dictation Mode setting decides when InkSpoke turns your speech into text.

| Mode | Best for | How it works |
| --- | --- | --- |
| Standard (default) | Everyday dictation where accuracy matters most | InkSpoke records while you talk and transcribes everything in one pass after you stop. It sees the whole utterance at once, so it makes the highest-accuracy call. |
| Live Preview | Watching your words land in real time | InkSpoke streams a running preview about every 2 seconds as you speak, then confirms each sentence at a natural pause. The listening overlay shows finalized text plus a blinking draft of the chunk still in progress. |

Both modes end the same way — the final, cleaned-up text is injected wherever your cursor is. The difference is whether you get a live "draft" along the way.

Live Preview has two requirements

Live Preview streams speculative previews and confirms them at sentence boundaries detected by InkSpoke's built-in voice-activity model (which ships with the app — no download needed). Two things to know:

  • If that voice-activity model isn't available, InkSpoke quietly falls back to Standard.
  • Live Preview runs only with on-device speech models. If you've switched to a cloud speech provider, InkSpoke also falls back to Standard so it isn't firing off an API request every couple of seconds. See on-device vs. cloud.

| Setting | Default | What it does |
| --- | --- | --- |
| Dictation Mode (DictationMode) | Standard | Standard transcribes once after you stop. LivePreview streams a preview every ~2 s and finalizes at pauses (on-device models only). Found under Settings → Audio. |
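
The fallback rules above can be summarized in a few lines. This is only an illustrative sketch, not InkSpoke's actual code: the function name `effective_mode` and its two boolean parameters are hypothetical, and only the `Standard` and `LivePreview` values come from this page.

```python
from enum import Enum


class DictationMode(Enum):
    STANDARD = "Standard"
    LIVE_PREVIEW = "LivePreview"


def effective_mode(
    requested: DictationMode,
    vad_available: bool,
    on_device_model: bool,
) -> DictationMode:
    """Return the mode that would actually run (hypothetical helper).

    Live Preview is honored only when the voice-activity model is
    available AND an on-device speech model is in use; in every other
    case the result is Standard.
    """
    if (
        requested is DictationMode.LIVE_PREVIEW
        and vad_available
        and on_device_model
    ):
        return DictationMode.LIVE_PREVIEW
    return DictationMode.STANDARD
```

For example, requesting Live Preview while a cloud speech provider is selected would resolve to Standard under this sketch.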

Dictating softly: quiet-speech mode

Working in an open office, a library, or a room with someone asleep? Quiet-speech mode (sometimes called "whisper mode") retunes InkSpoke so it can still hear you when you're barely making a sound.

When it's on, InkSpoke does three things: it lowers the threshold at which it decides you're speaking, boosts the gain on your microphone, and asks the speech model to work a little harder on low-energy audio. The result is far better recognition of soft, close-to-the-mic speech — at the cost of being more sensitive to background noise, which is why it's off by default.

| Setting | Default | What it does |
| --- | --- | --- |
| Quiet-speech mode (WhisperModeEnabled) | Off | Tunes detection, gain, and the speech model for very soft dictation. |

Power users: what quiet-speech mode changes

Turning it on shifts three internal knobs:

  • Voice-activity speech threshold drops from 0.45 to 0.30, so quieter sounds register as speech.
  • Audio gain is boosted to 3.0×.
  • Whisper's beam size increases from 1 to 5 for more careful decoding of low-energy audio.
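
As a rough illustration of those three knobs, the sketch below maps the on/off switch to a set of tuning values. The `ListeningTuning` type and `tuning_for` function are hypothetical names, not part of InkSpoke. The page does not state the gain used when quiet-speech mode is off, so the sketch represents that case as `None` (an assumption meaning "leave gain to auto-calibration").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ListeningTuning:
    vad_speech_threshold: float       # lower means quieter sounds count as speech
    gain_multiplier: Optional[float]  # None: no fixed boost (assumption)
    beam_size: int                    # wider beam means more careful decoding


def tuning_for(whisper_mode_enabled: bool) -> ListeningTuning:
    """Return the tuning values described on this page (hypothetical helper)."""
    if whisper_mode_enabled:
        return ListeningTuning(
            vad_speech_threshold=0.30, gain_multiplier=3.0, beam_size=5
        )
    return ListeningTuning(
        vad_speech_threshold=0.45, gain_multiplier=None, beam_size=1
    )
```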

Dictating in other languages

InkSpoke isn't English-only. You can set a default language, let it auto-detect, or pick a language per dictation right from the overlay.

Set a default and let auto-detect help

Out of the box, InkSpoke expects English and keeps auto-detect on, so it can identify the spoken language automatically when you switch.

| Setting | Default | What it does |
| --- | --- | --- |
| Language (Language) | English (en) | The language InkSpoke assumes when auto-detect is off, or as a starting point. |
| Auto-detect language (AutoDetectLanguage) | On | Lets InkSpoke identify the language you're speaking rather than forcing a fixed one. |

Switch language on the overlay

Every dictation shows a language picker in the listening overlay. It lists Auto followed by your preferred languages, so a mid-session switch is one click:

┌────────────────────────────────────────────────┐
│ ● Listening…                            ⏱ 0:04 │
├────────────────────────────────────────────────┤
│ ▁▃▅▇▅▃▁▂▄▆▄▂                                   │
│                                                │
│ [ Workspace ▾ ]  [ EN ▾ ]             [ Send ] │
│                  ├ Auto                        │
│                  ├ English                     │
│                  ├ Español                     │
│                  └ Français                    │
└────────────────────────────────────────────────┘

You can also cycle languages with a keyboard shortcut while the overlay is open, without reaching for the mouse. A language you choose this way can stick as your new default.

Per-workspace preferred language

Workspaces can carry their own preferred language. When a workspace is smart-matched to the app you're in, InkSpoke can pre-select that workspace's language for you — handy if you always write to one team in Spanish and another in English.
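
To make the interaction between these language sources concrete, here is one possible precedence order. This page does not specify how InkSpoke ranks them, so the ordering (overlay pick, then workspace language, then auto-detect, then the default) is an assumption, and `resolve_language` with all its parameters is a hypothetical helper.

```python
from typing import Optional


def resolve_language(
    default_language: str = "en",
    auto_detect: bool = True,
    overlay_choice: Optional[str] = None,
    workspace_language: Optional[str] = None,
) -> Optional[str]:
    """Return a language code to force, or None to let the model auto-detect.

    Assumed precedence (not confirmed by the documentation):
    overlay pick > workspace preferred language > auto-detect > default.
    """
    if overlay_choice is not None:
        # Picking "Auto" in the overlay hands the decision back to detection.
        return None if overlay_choice == "auto" else overlay_choice
    if workspace_language is not None:
        return workspace_language
    if auto_detect:
        return None
    return default_language
```

With the out-of-the-box settings and no overlay or workspace choice, this sketch returns `None`, meaning the spoken language is detected automatically.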

Getting clean audio automatically

Good transcription starts with a good signal. InkSpoke runs several audio steps for you, most of which need no configuration at all.

Auto-calibrated gain

Quiet or inconsistent microphones are amplified automatically. In the first moments of a session InkSpoke measures your ambient noise and peak level, works out how much to boost you so your speech lands at a healthy level, and applies that consistently for the whole session (so it never ducks your volume during a pause). If calibration finishes partway in, it even re-amplifies the audio it already buffered. There's nothing to set — it just happens.
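
The idea of a single session-wide gain can be sketched as follows. This is a simplified illustration rather than InkSpoke's algorithm: it looks only at the peak level and ignores the ambient-noise measurement, and the `target_peak` and `max_gain` values, the amplify-only rule, and both function names are assumptions. Samples are taken to be floats in the range -1.0 to 1.0.

```python
from typing import List, Sequence


def calibrate_gain(
    samples: Sequence[float],
    target_peak: float = 0.5,  # assumed "healthy" peak level
    max_gain: float = 8.0,     # assumed upper bound on the boost
) -> float:
    """Pick one gain factor from the calibration window (hypothetical helper)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= 0.0:
        return 1.0  # pure silence: nothing to measure, leave audio as-is
    # Amplify only (never below 1.0), and cap the boost at max_gain.
    return min(max(target_peak / peak, 1.0), max_gain)


def apply_gain(samples: Sequence[float], gain: float) -> List[float]:
    """Scale samples by a fixed gain, clipping to the valid range."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


# The same fixed gain is applied to audio buffered before calibration
# finished and to everything recorded afterwards.
buffered = [0.1, -0.05, 0.02]
gain = calibrate_gain(buffered)
reamplified = apply_gain(buffered, gain)
```

Because the gain is computed once and then held, a pause in speech does not change the level of what follows.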

Noise suppression

InkSpoke can strip out fans, keyboard clatter, and ambient chatter while preserving your voice, using a neural noise filter (DeepFilterNet) plus a low-pass filter that also cuts high-frequency hiss.

| Setting | Default | What it does |
| --- | --- | --- |
| Noise suppression (AudioProcessing.DeepFilterEnabled) | On | Neural background-noise removal. Adds about 30 ms of latency and needs the noise-suppression model downloaded; if it isn't present, this step is simply skipped. |
| Suppression strength (AudioProcessing.DeepFilterStrength) | 0.75 | How aggressively noise is removed, on a 0.0 to 1.0 scale. Lower it if you find speech sounding over-processed. |
| Low-pass filter (AudioProcessing.LowpassFilterEnabled) | On | A 7.5 kHz low-pass applied before InkSpoke resamples to 16 kHz; it trims high-frequency hiss and prevents aliasing. |
Note: Noise suppression is optional and depends on the DeepFilterNet model being installed. With it off (or not yet downloaded), auto-gain and the low-pass filter still run.
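
The sketch below shows which audio steps would run for a given combination of settings. The three setting names mirror the table above, but the `AudioProcessing` class, the `active_steps` function, and the step labels are hypothetical. The relative order of gain and noise suppression is an assumption; the page only states that the low-pass filter runs before resampling to 16 kHz.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioProcessing:
    deep_filter_enabled: bool = True
    deep_filter_strength: float = 0.75
    lowpass_filter_enabled: bool = True


def active_steps(
    settings: AudioProcessing, deep_filter_model_installed: bool
) -> List[str]:
    """List the processing steps that would run (hypothetical helper)."""
    steps = ["auto_gain"]  # always on, nothing to configure
    # Noise suppression is skipped if disabled or if its model is missing.
    if settings.deep_filter_enabled and deep_filter_model_installed:
        steps.append("noise_suppression")
    if settings.lowpass_filter_enabled:
        steps.append("lowpass_7500_hz")
    steps.append("resample_16000_hz")
    return steps
```

With default settings but no noise-suppression model downloaded, this returns auto-gain, the low-pass filter, and resampling, which matches the behavior described in the note above.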

Voice-activity detection and silence timeouts

InkSpoke uses voice-activity detection to tell speech from silence: it trims silent gaps before transcription and drives the "speech detected" state you see on the overlay. Two safety timeouts also stop a session automatically so a forgotten recording doesn't run forever.

| Setting | Default | What it does |
| --- | --- | --- |
| Silence timeout (SilenceTimeoutSeconds) | 30 s | Auto-cancels a session after this much continuous silence. Set to 0 to disable. |
| Max recording length (MaxRecordingDurationSeconds) | 300 s (5 min) | Hard cap on a single dictation. Set to 0 for no limit. |
Warning: If you dictate long passages, raise or disable Max recording length. Otherwise InkSpoke stops and processes what it has at the 5-minute mark.
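
The two timeouts can be pictured as a small check run while a session is active. This is a sketch under assumptions: the `session_action` function and its return labels are hypothetical, and checking the recording cap before the silence timeout is an arbitrary choice the page does not specify. The defaults and the "0 disables" rule come from the table above.

```python
from typing import Optional


def session_action(
    silence_seconds: float,
    elapsed_seconds: float,
    silence_timeout: float = 30,  # SilenceTimeoutSeconds default
    max_duration: float = 300,    # MaxRecordingDurationSeconds default
) -> Optional[str]:
    """Decide whether a running session should end (hypothetical helper).

    Returns "stop_and_process" when the recording cap is reached,
    "cancel" after too much continuous silence, or None to keep going.
    A value of 0 disables the corresponding timeout.
    """
    if max_duration > 0 and elapsed_seconds >= max_duration:
        return "stop_and_process"
    if silence_timeout > 0 and silence_seconds >= silence_timeout:
        return "cancel"
    return None
```

Setting both values to 0 means the function never ends a session on its own, which corresponds to disabling both safety timeouts.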

Platform notes

These modes and settings behave the same on Windows, macOS, and Linux. Neural noise suppression relies on a downloadable native component and model, so it's the one feature that may be inactive until that model is installed; everything else works everywhere out of the box.

Next steps