Audio and model settings

This page is the practical reference for two parts of Settings: the Audio tab, where you tune how InkSpoke captures your voice, and the AI Models section, where you choose the models that turn that voice into polished text. For the concepts behind these choices — the two model roles and the three sources — read Models and providers first; this page tells you where each control lives and what its default is.

Both areas open from the left navigation of the Settings window:

  • Audio: Configuration → the Audio tab.
  • AI Models: its own top-level item, split into three tabs: Global Defaults, On-Device, and Providers.

Audio settings

The Audio tab controls the whole capture side of dictation — which mic you use, the sounds you hear, when a recording stops on its own, and how aggressively InkSpoke cleans up the incoming audio.

┌──────────────────────────────────────────────────────────┐
│ Configuration › Audio │
├──────────────────────────────────────────────────────────┤
│ Microphone [ Built-in Microphone ▾ ] ⟳ Refresh │
│ │
│ Sound Effects [ ✓ On ] │
│ ▶ start ▶ stop ▶ cancel (test each) │
│ │
│ Dictation Mode ( ● Standard ◦ Live Preview ) │
│ Silence timeout [ − 30 s + ] │
│ Max recording duration [ − 300 s + ] │
│ │
│ Noise Suppression (DeepFilterNet) [ ✓ On ] │
│ Whisper Mode (quiet speech) [ Off ] │
└──────────────────────────────────────────────────────────┘

Microphone

Pick which input device InkSpoke records from. If you plug in a headset or USB mic after opening Settings, click Refresh to re-scan for it. Your active device is also shown on the listening overlay and can be switched quickly from the system-tray Microphone submenu.

Sound Effects

InkSpoke plays short chimes when a recording starts, stops, and is cancelled, so you get audible confirmation without looking at the overlay. Three test buttons let you preview each cue. Turn the whole set off with a single toggle.

Note

The Sound Effects card only appears on platforms that provide capture sounds. If you don't see it, your build doesn't ship them.

Dictation Mode

This chooses how your speech is transcribed:

  • Standard (default) transcribes everything in one pass after you stop — the most accurate option.
  • Live Preview streams a running transcript into the overlay while you're still talking, finalizing at sentence boundaries.

Live Preview runs on the on-device speech path only. If you have a cloud speech model active, InkSpoke shows a warning and falls back to Standard rather than firing a stream of API calls.

Live Preview needs an on-device speech model

Select a cloud (Platform or BYOK) speech model and Live Preview can't stream — you'll get Standard behavior instead. Keep Whisper (on-device) active to see the live transcript. More detail in Dictation modes and languages.
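
The fallback rule can be summarized in a few lines. This is a minimal sketch, not InkSpoke's actual code: the function name `effective_dictation_mode`, the mode strings, and the `Source` labels are hypothetical, chosen to mirror the three model sources named on this page.

```python
from enum import Enum


class Source(Enum):
    PLATFORM = "platform"    # cloud model hosted by the platform
    ON_DEVICE = "on_device"  # local Whisper model
    BYOK = "byok"            # cloud or local server reached through your own provider


def effective_dictation_mode(requested: str, speech_source: Source) -> tuple[str, bool]:
    """Return (mode actually used, whether a warning should be shown).

    Hypothetical helper: Live Preview streams only on the on-device
    speech path, so any other source falls back to Standard.
    """
    if requested == "live_preview" and speech_source is not Source.ON_DEVICE:
        return "standard", True
    return requested, False
```

With an on-device model the requested mode is returned unchanged; with a Platform or BYOK speech model a Live Preview request resolves to Standard plus a warning flag.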

Recording limits

Two safety timers stop a session that runs away — for example if you walk off mid-dictation:

| Setting | Default | Range | What it does |
|---|---|---|---|
| Silence timeout | 30 s | 0–300 s | Auto-cancels after this many seconds of continuous silence. 0 disables the timer. |
| Max recording duration | 300 s (5 min) | 0–3600 s | Hard cap on a single dictation. 0 means no limit. |
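
How the two timers interact, including the special meaning of 0, can be sketched as follows. The function name `stop_reason` and its return labels are illustrative assumptions, and the order in which the two limits are checked is a guess rather than documented behavior.

```python
from typing import Optional


def stop_reason(elapsed_s: float, silent_s: float,
                silence_timeout_s: int = 30,
                max_duration_s: int = 300) -> Optional[str]:
    """Return why a session should end, or None to keep recording.

    elapsed_s: seconds since the recording started.
    silent_s:  seconds of continuous silence so far.
    A limit of 0 disables the corresponding timer.
    """
    if max_duration_s > 0 and elapsed_s >= max_duration_s:
        return "max_duration"
    if silence_timeout_s > 0 and silent_s >= silence_timeout_s:
        return "silence_timeout"
    return None
```

With the defaults, a session that has been silent for 30 seconds ends with `"silence_timeout"`; setting both limits to 0 means the function never asks for a stop.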

Noise Suppression (DeepFilterNet)

A neural denoiser that strips background noise — fans, keyboard clatter, ambient chatter — while preserving your voice. It's on by default and adds only a few milliseconds of latency. Because it relies on a downloadable model, the toggle is disabled while that model is still downloading.

Whisper Mode

Whisper Mode (off by default) tunes InkSpoke for dictating quietly — an open office, a shared room, a sleeping partner nearby. When it's on, InkSpoke lowers the threshold for detecting speech, boosts input gain, and asks Whisper to search harder, so soft, near-silent speech still transcribes cleanly. Leave it off for normal-volume dictation. It's also toggleable from the system-tray menu.

Audio settings at a glance

| Setting | Default | What it does |
|---|---|---|
| Microphone | System default until you pick one | Chooses the capture device; Refresh re-scans. |
| Sound Effects | On | Start / stop / cancel chimes, with test buttons. |
| Dictation Mode | Standard | Standard (batch) vs Live Preview (streaming, on-device only). |
| Silence timeout | 30 s | Auto-stop after continuous silence (0 = off). |
| Max recording duration | 300 s | Hard cap per dictation (0 = no limit). |
| Noise Suppression (DeepFilterNet) | On | Neural background-noise removal (needs its model downloaded). |
| Whisper Mode | Off | Quiet-speech tuning (lower threshold, higher gain). |

AI Models

The AI Models area is where you choose the two models every dictation flows through — a speech model that hears you and a text model that refines the result — and where you download local models or connect your own providers. It has three tabs.

┌────────────────────────────────────────────────────────────────┐
│ AI Models │
│ [ Global Defaults ] [ On-Device · PRO ] [ Providers · PRO ] │
└────────────────────────────────────────────────────────────────┘

Global Defaults

Your baseline models — used for every dictation unless a workspace pins its own. Two grouped pickers list every available model under Platform, On-Device, and BYOK headings.

┌──────────────────────────────────────────────────────────┐
│ AI Models › Global Defaults │
├──────────────────────────────────────────────────────────┤
│ Speech recognition │
│ [ Whisper Small (On-Device) ▾ ] │
│ │
│ Text processing (refinement) │
│ [ Platform AI (Platform) ▾ ] │
│ Workspace-default refinement [ ✓ On ] │
│ Token limit [ − ···· + ] │
│ │
│ (Master AI Refinement is ON) │
└──────────────────────────────────────────────────────────┘

New installs start on the private-by-default pairing: Whisper Small (on-device, offline, free) for speech and Platform AI (cloud) for refinement during your Pro trial.

The text side adds two controls:

| Control | Default | What it does |
|---|---|---|
| Workspace-default refinement | On | Whether workspaces that don't pin their own text model still get refined by this global default. |
| Token limit | | Caps how long a refined response can be. |

Both sit under the master AI Refinement switch (on Configuration → General, on by default). When that master switch is off, these controls are greyed out with a reminder, and InkSpoke injects your raw transcript verbatim. See How refinement works.
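
One way to read the interplay between the master switch, a workspace pin, and the Workspace-default refinement toggle is the decision order below. This is a sketch under stated assumptions: `resolve_text_model` and its parameters are hypothetical names, and it assumes the master switch also overrides a workspace's pinned model, which the text above implies but does not spell out.

```python
from typing import Optional


def resolve_text_model(master_refinement: bool,
                       workspace_default_refinement: bool,
                       global_text_model: str,
                       workspace_pin: Optional[str] = None) -> Optional[str]:
    """Return the text model to refine with, or None to inject the raw transcript."""
    if not master_refinement:
        return None               # master switch off: transcript passes through verbatim
    if workspace_pin is not None:
        return workspace_pin      # a pinned workspace model overrides the global default
    if workspace_default_refinement:
        return global_text_model  # unpinned workspaces inherit the global default
    return None                   # unpinned and inheritance off: no refinement
```

A return value of `None` stands for "skip refinement and insert the raw transcript".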

On-Device (Pro)

PRO (requires Pro or Perpetual) — download and manage models that run entirely on your machine, so your audio and text never leave the computer. A banner reminds you that Whisper Small (244M) is free; every other on-device model requires Pro or Perpetual.

┌───────────────────────────────────────────────────────────┐
│ AI Models › On-Device [PRO] │
│ Small (244M) is free · other models need Pro │
│ Storage ▓▓▓▓▓▓▓░░░░░░░░ used by downloaded models │
├───────────────────────────────────────────────────────────┤
│ Whisper Small 244M ● in use │
│ SPEED ●●●●○ ACCURACY ●●●○○ many languages │
│ [ Delete ] │
│ Whisper Large V3 Turbo │
│ SPEED ●●●○○ ACCURACY ●●●●● [ Download ] │
│ ────────────────────────────────────────────────── │
│ Use GPU for dictation [ ✓ On ] │
│ Apple Neural Engine (macOS only) [ Download ] │
│ ────────────────────────────────────────────────── │
│ Local text model (GGUF) │
│ Max context size [ − 16384 + ] │
└───────────────────────────────────────────────────────────┘

Speech — Whisper models. InkSpoke ships the Whisper family in several sizes. Each card shows its size chip, an in use pill for the active one, SPEED and ACCURACY dot ratings, its language support, and a Download / Delete button. A storage bar tracks total disk use; the model that's currently in use can't be deleted.

| Whisper size | Tier |
|---|---|
| Tiny | Pro |
| Base | Pro |
| Small (244M) | Free (the default) |
| Medium | Pro |
| Large | Pro |
| Large V3 Turbo | Pro |
| Large V3 Turbo Q5 / Q8 | Pro |

As a rule, smaller models are faster and lighter; larger ones are more accurate. See Models and providers for the full trade-off.
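
The tier table reduces to a simple rule: Small is available to everyone, and every other size needs Pro or Perpetual. A minimal sketch, where `can_download` and the plan label `"Free"` are illustrative assumptions rather than InkSpoke identifiers:

```python
FREE_WHISPER_SIZES = frozenset({"Small"})         # per the table, only Small (244M) is free
PAID_PLANS = frozenset({"Pro", "Perpetual"})      # plans that unlock the other sizes


def can_download(size: str, plan: str) -> bool:
    """True if the given plan may download the given Whisper size."""
    return size in FREE_WHISPER_SIZES or plan in PAID_PLANS
```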

Acceleration. A Use GPU for dictation toggle (on by default) appears only where hardware acceleration is available. On macOS you'll also see an optional Apple Neural Engine accelerator — a small extra download that's used automatically once installed. See Platform notes below.

Text — local model. The On-Device tab also offers a downloadable local text model (GGUF format) so refinement can run fully offline. It has one tunable cloud models don't:

| Setting | Default | Range | What it does |
|---|---|---|---|
| Max context size | 16,384 | 512 – 131,072 | How much text (in tokens) the local model can consider at once. Larger values use more memory. |

Free your memory automatically

On-device models can be RAM-hungry. By default InkSpoke unloads idle models after 10 minutes and reloads them the next time you dictate. This behavior is the Model Memory strategy covered in General and hotkeys.
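
The unload-when-idle pattern looks roughly like this. It is a generic sketch of the technique, not InkSpoke's implementation: the class `IdleModelCache`, its methods, and the injectable `clock` are all hypothetical, and only the 10-minute (600 s) default comes from this page.

```python
import time
from typing import Callable, Optional


class IdleModelCache:
    """Holds one loaded model and drops it after a period of inactivity."""

    def __init__(self, loader: Callable[[], object],
                 idle_timeout_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._model: Optional[object] = None
        self._last_used = 0.0

    def get(self) -> object:
        """Return the model, loading it first if it is not in memory."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model

    def sweep(self) -> bool:
        """Unload the model if it has been idle too long; return True if unloaded."""
        idle = self._clock() - self._last_used
        if self._model is not None and idle >= self._idle_timeout_s:
            self._model = None
            return True
        return False
```

A background timer would call `sweep()` periodically, and each dictation would call `get()`, paying the reload cost only after a long pause.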

Providers (BYOK, Pro)

PRO (requires Pro or Perpetual) — Bring Your Own Key. Connect any OpenAI-compatible endpoint (a commercial API, or a local server like Ollama or LM Studio) with your own key. Each provider you add can expose several models that then appear in the Global Defaults and workspace pickers alongside Platform and On-Device options.

┌───────────────────────────────────────────────────────────┐
│ AI Models › Providers [PRO] [ + Add ] │
├───────────────────────────────────────────────────────────┤
│ My OpenAI api.openai.com/v1 3 models │
│ Local LM Studio localhost:1234/v1 1 model │
│ │
│ ── Editing a provider ────────────────────────────── │
│ Quick Setup [ preset ▾ ] [ Apply ] │
│ Name [ My OpenAI ] │
│ API Base URL [ https://api.openai.com/v1 ] │
│ API Key [ •••••••••••• ] (stored in keychain) │
│ Timeout [ 30 ] s │
│ Models id · type · token limit · temperature │
│ [ + add row ] │
│ [ Test Connection ] [ Cancel ] [ Save ] │
└───────────────────────────────────────────────────────────┘

Add Provider is disabled unless you're on Pro — but any providers you already added stay viewable and deletable even if your plan lapses; you just can't add new ones until you upgrade again. Each provider card shows its name, base URL, and model count.

Editing a provider gives you these fields:

| Field | Notes |
|---|---|
| Quick Setup | Optional preset picker + Apply to pre-fill a common provider. |
| Name | Your label for the provider. |
| API Base URL | The OpenAI-compatible endpoint. Built-in presets already include the /v1 suffix. |
| API Key | Masked in the UI and stored in your OS keychain, never in InkSpoke's settings file. |
| Timeout | 5–120 s. Defaults to 10 s for text models, 30 s for speech. |
| Temperature | Sampling temperature for the provider. |
| Models table | One row per model: id, type (speech or text), token limit (50k–1M, in 5k steps), and temperature. Add or remove rows as needed. |
| Test Connection | Verifies your key and URL work before you save. |
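
The field limits in this table can be expressed as a small validator. This is a sketch only: the `Provider` and `ModelRow` shapes are hypothetical and do not reflect how InkSpoke stores providers. The `models_url` helper builds the model-listing route (`/models` under the base URL) that OpenAI-compatible servers expose; whether Test Connection actually calls that route is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelRow:
    id: str
    type: str          # "speech" or "text"
    token_limit: int   # 50,000 to 1,000,000 in steps of 5,000


@dataclass
class Provider:
    name: str
    base_url: str      # expected to end in /v1 for OpenAI-compatible servers
    timeout_s: int     # 5 to 120 seconds
    models: List[ModelRow] = field(default_factory=list)


def validate(p: Provider) -> List[str]:
    """Return a list of problems; an empty list means the provider looks valid."""
    errors = []
    if not 5 <= p.timeout_s <= 120:
        errors.append("timeout must be 5-120 s")
    for m in p.models:
        if m.type not in ("speech", "text"):
            errors.append(f"{m.id}: type must be speech or text")
        if not 50_000 <= m.token_limit <= 1_000_000 or m.token_limit % 5_000:
            errors.append(f"{m.id}: token limit must be 50k-1M in 5k steps")
    return errors


def models_url(p: Provider) -> str:
    """URL of the model-listing route, tolerating a trailing slash on the base URL."""
    return p.base_url.rstrip("/") + "/models"
```

For a provider whose base URL is `https://api.openai.com/v1`, `models_url` yields `https://api.openai.com/v1/models`.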

Providers and their models are stored in InkSpoke's local database, with keys in the keychain — none of it lands in settings.json. You can manage the same personal keys from the web account.

Keys live in your keychain, not the cloud

API keys you paste here are held in your operating system's secure credential store on this device. Removing a provider deletes its stored key.


Platform notes

On-device speech models use whatever acceleration your machine offers. The Use GPU for dictation toggle and the Apple Neural Engine card appear only where they apply:

| Platform | On-device acceleration |
|---|---|
| Windows | CUDA GPU when present, otherwise CPU. |
| macOS | Metal GPU, plus an optional Apple Neural Engine download used automatically once installed. |
| Linux | CPU only (no GPU toggle shown). |
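
The table maps to a short selection rule. The sketch below is illustrative: `acceleration_backend`, its platform strings, and its return labels are hypothetical, it leaves out the optional Apple Neural Engine add-on, and it assumes that turning off Use GPU for dictation falls back to CPU.

```python
def acceleration_backend(platform: str,
                         has_cuda_gpu: bool = False,
                         use_gpu: bool = True) -> str:
    """Pick the compute backend for on-device speech, following the table above."""
    if platform == "macos":
        return "metal" if use_gpu else "cpu"
    if platform == "windows":
        return "cuda" if (use_gpu and has_cuda_gpu) else "cpu"
    return "cpu"  # Linux: CPU only, so the GPU toggle has no effect
```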

The Sound Effects card is shown only on platforms that ship capture chimes.

Next steps