Models and providers
InkSpoke runs on two kinds of AI model working in sequence: one turns your speech into text, and the other polishes that text before it lands in your app. This page explains those two roles, the three places a model can come from, and how you choose which ones are active, both globally and per workspace.
The two model roles
Every dictation flows through up to two models, in order:
| Role | What it does | Also called |
|---|---|---|
| Speech recognition | Converts your audio into a raw transcript. | ASR, "speech model" |
| Text processing | Refines the transcript — removes filler, fixes grammar, matches the app's tone. | LLM, "text model", refinement |
The speech model always runs. The text model runs only when AI refinement is on (the master switch, AI Refinement, is enabled by default). With refinement off, InkSpoke injects the raw transcript verbatim.
The two roles are configured independently. You can pair an on-device speech model with a cloud text model, or vice versa — whatever fits your privacy and speed needs. See On-device vs. cloud.
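To make the ordering concrete, here is a minimal Python sketch of the flow described above. It is illustrative only and not InkSpoke's actual code: `dictate`, `fake_transcribe`, and `fake_refine` are hypothetical names, and the two stand-in models simply return fixed strings.

```python
from typing import Callable


def dictate(audio: bytes,
            transcribe: Callable[[bytes], str],
            refine: Callable[[str], str],
            refinement_enabled: bool = True) -> str:
    """Run the speech model, then the text model only if refinement is on."""
    transcript = transcribe(audio)   # speech recognition always runs
    if not refinement_enabled:
        return transcript            # raw transcript, injected verbatim
    return refine(transcript)        # text processing polishes the transcript


# Stand-in models for illustration only (hypothetical).
def fake_transcribe(audio: bytes) -> str:
    return "um so the build is broken"


def fake_refine(text: str) -> str:
    return "The build is broken."
```

Because the two callables are passed in separately, either one could be swapped for an on-device or cloud implementation without touching the other, which mirrors how the two roles are configured independently.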
The three sources
Both roles can be filled by a model from one of three sources. In every model picker they're grouped exactly this way:
| Source | What it is | Runs | Tier |
|---|---|---|---|
| Platform | InkSpoke's own hosted models — nothing to download or configure. | Cloud | Pro (included in the Pro trial) |
| On-Device | Models you download and run locally: Whisper for speech, a local GGUF model for text. | On your machine | Small speech model is free; the rest are Pro |
| BYOK | Bring Your Own Key: connect any OpenAI-compatible provider with your own API key. | Wherever that provider runs (its cloud, or a local server such as Ollama) | Pro |
New installs start with a deliberately simple pairing that keeps speech recognition on your machine:
- Speech: Whisper Small (on-device, offline, free).
- Text: Platform AI (cloud) — active during your Pro trial.
You can change either side at any time.
Platform models
Platform models are InkSpoke's built-in, hosted models. There's nothing to set up — pick one and it works. Platform text refinement is part of Pro (and available during the Pro trial); when the trial ends, on-device and BYOK options keep working.
On-Device models
On-device models download to your machine and run entirely offline — your audio and text never leave the computer. They live under AI Models → On-Device (a Pro area), where each model card shows its size, speed/accuracy ratings, language support, and a Download/Delete button. A storage bar tracks total disk use, and the model that's currently in use can't be deleted.
Speech — Whisper sizes. InkSpoke ships the Whisper family in several sizes. Only Small is free; every other size (larger and smaller) requires Pro or Perpetual.
| Whisper size | Tier | Rough trade-off |
|---|---|---|
| Tiny | Pro | Fastest, lowest accuracy. |
| Base | Pro | A step up from Tiny. |
| Small (244M) | Free | The default — a balanced choice for most people. |
| Medium | Pro | More accurate, slower and heavier. |
| Large | Pro | Highest accuracy, most demanding. |
| Large V3 Turbo | Pro | Near-Large accuracy, noticeably faster. |
| Large V3 Turbo Q5 / Q8 | Pro | Quantized Turbo variants that trade a little accuracy for lower memory use. |
As a rule, smaller models are faster and lighter; larger ones are more accurate. The Turbo variants aim for near-Large accuracy at higher speed.
Text — local LLM. The On-Device tab also offers a downloadable local text model (GGUF format) so refinement can run fully offline too. It has one tunable you won't find on cloud models:
| Setting | Default | Range | What it does |
|---|---|---|---|
| Max context size | 16,384 | 512 – 131,072 | How many tokens of text the local model can consider at once. Larger values use more memory. |
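As a rough illustration of the default and range in the table above, the sketch below keeps a requested context size inside the allowed bounds. Whether InkSpoke clamps an out-of-range value or rejects it is not stated here, so the clamping behaviour and the name `clamp_context_size` are assumptions.

```python
from typing import Optional

DEFAULT_CONTEXT = 16_384                  # default from the settings table
MIN_CONTEXT, MAX_CONTEXT = 512, 131_072   # allowed range from the settings table


def clamp_context_size(requested: Optional[int]) -> int:
    """Return a context size within the documented range.

    Assumption: out-of-range values are clamped rather than rejected.
    """
    if requested is None:
        return DEFAULT_CONTEXT
    return max(MIN_CONTEXT, min(MAX_CONTEXT, requested))
```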
On-device models can be memory-hungry. By default InkSpoke unloads idle models after 10 minutes (the Model Memory strategy on Configuration → General), reloading them the next time you dictate. Power users can switch this to Always loaded for zero warm-up, or Manual with an "Unload models now" button.
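The three Model Memory behaviours can be summarised as a small decision function. This is a sketch under stated assumptions, not the app's implementation: the enum member names and `should_unload` are hypothetical, and the default strategy is called `UNLOAD_WHEN_IDLE` here only because its real label is not given on this page.

```python
from enum import Enum

IDLE_TIMEOUT_SECONDS = 10 * 60  # idle models unload after 10 minutes by default


class MemoryStrategy(Enum):
    UNLOAD_WHEN_IDLE = "unload_when_idle"  # hypothetical name for the default
    ALWAYS_LOADED = "always_loaded"
    MANUAL = "manual"


def should_unload(strategy: MemoryStrategy,
                  idle_seconds: float,
                  unload_now_pressed: bool = False) -> bool:
    """Decide whether a loaded model should be released from memory."""
    if strategy is MemoryStrategy.ALWAYS_LOADED:
        return False                   # zero warm-up: never unload
    if strategy is MemoryStrategy.MANUAL:
        return unload_now_pressed      # only the "Unload models now" button unloads
    return idle_seconds >= IDLE_TIMEOUT_SECONDS
```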
BYOK — bring your own key
If you already pay for OpenAI, run a local server such as Ollama or LM Studio, or use any other OpenAI-compatible endpoint, you can connect it directly under AI Models → Providers (a Pro area). Each provider you add can expose several models, which then appear in the pickers alongside the Platform and On-Device options.
Adding a provider takes a few fields:
- Quick Setup preset (optional), which pre-fills the fields for common providers; otherwise, fill them in manually.
- Name and API Base URL.
- API Key — masked in the UI and stored in your OS keychain, never in InkSpoke's settings file.
- Timeout and temperature, plus a models table where you list each model's id, type (speech or text), and token limit.
- A Test Connection button to confirm the key and URL work before you save.
Your keys stay on your device. You can also manage the same personal keys from the web account.
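For readers curious what a provider entry amounts to, the sketch below models the fields listed above and builds the kind of request a connection check could send. It is an assumption-laden illustration: the class and function names, the default timeout and temperature, and the example model id are all hypothetical, and this page does not say which request Test Connection actually makes. The only external facts relied on are that OpenAI-compatible APIs list models at `GET {base URL}/models` and authenticate with a `Bearer` token.

```python
import urllib.request
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelEntry:
    id: str            # model id as the provider names it
    type: str          # "speech" or "text"
    token_limit: int


@dataclass
class Provider:
    name: str
    base_url: str                    # e.g. "https://api.openai.com/v1"
    timeout_seconds: float = 30.0    # hypothetical default
    temperature: float = 0.2         # hypothetical default
    models: List[ModelEntry] = field(default_factory=list)


def build_models_request(provider: Provider, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a model-listing request for a connection check.

    The API key is passed in separately because it is kept out of the
    settings data, matching the keychain note above.
    """
    url = provider.base_url.rstrip("/") + "/models"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Example entry; the model id is a placeholder, not a real recommendation.
example = Provider(
    name="My OpenAI account",
    base_url="https://api.openai.com/v1/",
    models=[ModelEntry(id="example-text-model", type="text", token_limit=2048)],
)
```

Sending the request with `urllib.request.urlopen(request, timeout=provider.timeout_seconds)` would then succeed only if both the URL and the key are valid, which is the property a connection test needs.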
Choosing the active models
Your global choices live under AI Models → Global Defaults. Two pickers — one per role — each list every available model, grouped by source:
┌───────────────────────────────────────────────────────────┐
│ AI Models › Global Defaults                               │
├───────────────────────────────────────────────────────────┤
│ Speech recognition                                        │
│   [ Whisper Small (On-Device) ▾ ]                         │
│   ── Platform ──  ── On-Device ──  ── BYOK ──             │
│                                                           │
│ Text processing (refinement)                              │
│   [ Platform AI (Platform) ▾ ]                            │
│   Workspace-default refinement: [ ✓ On ]                  │
│   Token limit: [ − 2048 + ]                               │
│                                                           │
│ (Master AI Refinement is ON)                              │
└───────────────────────────────────────────────────────────┘
Whatever you pick here is the default for every dictation — unless a workspace overrides it.
The text-processing side has two extra controls. The workspace-default refinement toggle (on by default) decides whether workspaces without their own text model are still refined by this default. The token limit stepper caps how long a refined response can be. If the master AI Refinement switch is off, both controls are greyed out with a reminder.
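The grouping used by each picker can be pictured as a simple bucketing step: take every available model for one role and sort it under its source, in the order Platform, On-Device, BYOK. The sketch below is hypothetical throughout (`group_for_picker`, the dictionary keys, and the example BYOK model name are invented for illustration).

```python
from typing import Dict, List

SOURCE_ORDER = ["Platform", "On-Device", "BYOK"]

# Example catalog; the BYOK entry is a made-up placeholder.
EXAMPLE_MODELS = [
    {"name": "Whisper Small", "role": "speech", "source": "On-Device"},
    {"name": "Platform AI", "role": "text", "source": "Platform"},
    {"name": "my-byok-model", "role": "text", "source": "BYOK"},
]


def group_for_picker(models: List[dict], role: str) -> Dict[str, List[str]]:
    """Group the models for one role ("speech" or "text") by source."""
    groups: Dict[str, List[str]] = {source: [] for source in SOURCE_ORDER}
    for model in models:
        if model["role"] == role:
            groups[model["source"]].append(model["name"])
    return groups
```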
How workspaces override per context
A workspace can pin its own preferred speech model and preferred text model, so dictation into your IDE can use different models than dictation into email — automatically, based on the app you're in.
When it's time to refine, InkSpoke resolves the text model in strict order:
The speech model resolves more simply: a workspace's preferred speech model (if set) wins for that dictation; otherwise the Global Default is used. You can also override the workspace itself — and therefore its models — for a single session from the picker on the listening overlay.
This is why the same words come out differently in different apps: your "Code" workspace might pin an accuracy-first Whisper size and skip refinement, while your "Email" workspace uses a cloud text model tuned for a professional tone. See Smart matching and precedence.
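The resolution rules can be sketched in a few lines of Python. The speech rule follows the description above directly. The text rule is an assumption pieced together from what this page states elsewhere (the master AI Refinement switch, a workspace's preferred text model, and the workspace-default refinement toggle); the real order may include steps not modelled here, such as a workspace that skips refinement or a single-session override. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlobalDefaults:
    speech_model: str
    text_model: str
    master_refinement: bool = True             # master AI Refinement switch
    workspace_default_refinement: bool = True  # toggle under Global Defaults


@dataclass
class Workspace:
    name: str
    preferred_speech_model: Optional[str] = None
    preferred_text_model: Optional[str] = None


def resolve_speech_model(ws: Workspace, g: GlobalDefaults) -> str:
    # A workspace's preferred speech model wins; otherwise the Global Default.
    return ws.preferred_speech_model or g.speech_model


def resolve_text_model(ws: Workspace, g: GlobalDefaults) -> Optional[str]:
    """Return the text model to use, or None for no refinement (assumed order)."""
    if not g.master_refinement:
        return None                    # raw transcript is injected verbatim
    if ws.preferred_text_model:
        return ws.preferred_text_model
    if g.workspace_default_refinement:
        return g.text_model            # fall back to the Global Default
    return None
```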
What needs Pro
Most of the model catalog is a Pro feature. Here's the quick map:
| Capability | Free | Pro / Perpetual |
|---|---|---|
| Whisper Small (on-device speech) | ✅ | ✅ |
| Other on-device Whisper sizes | — | ✅ |
| On-device local text (LLM) model | — | ✅ |
| Platform (cloud) models | Trial | ✅ |
| BYOK providers & keys | — | ✅ |
BYOK providers you already added stay viewable and deletable even if your plan lapses — you just can't add new ones until you're on Pro again.
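The table above can also be read as a lookup. The sketch below encodes it literally, which is an assumption: it treats only Platform models as unlocked by the trial on a Free plan, whereas the app may simply treat a trial user as Pro across the board. The capability keys and `can_use` are hypothetical names.

```python
# Minimum requirement per capability, read literally from the table above.
CAPABILITIES = {
    "whisper_small": "free",
    "other_whisper_sizes": "pro",
    "local_text_model": "pro",
    "platform_models": "trial",   # Free plan: available during the Pro trial
    "byok": "pro",
}


def can_use(capability: str, plan: str, trial_active: bool = False) -> bool:
    """Return True if the plan unlocks the capability (literal table reading)."""
    requirement = CAPABILITIES[capability]
    if plan.lower() in {"pro", "perpetual"}:
        return True                # paid plans unlock every row
    if requirement == "free":
        return True
    return requirement == "trial" and trial_active
```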
Platform notes
On-device speech models can use hardware acceleration where it's available:
| Platform | On-device acceleration |
|---|---|
| Windows | CUDA GPU when present, otherwise CPU. |
| macOS | Metal GPU, plus an optional Apple Neural Engine accelerator (a small extra download that's used automatically once installed). |
| Linux | CPU only. |
The GPU toggle (Use GPU for dictation, on by default) appears only on platforms that support it.
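Put together, the platform table and the GPU toggle suggest a selection rule along these lines. This is a sketch, not the app's logic: `pick_backend` and its return strings are hypothetical, and it assumes that switching the toggle off falls back to CPU, which this page does not spell out.

```python
def pick_backend(os_name: str,
                 use_gpu: bool = True,
                 cuda_available: bool = False,
                 ane_installed: bool = False) -> str:
    """Choose an acceleration backend for on-device speech models."""
    os_name = os_name.lower()
    if os_name == "linux" or not use_gpu:
        return "cpu"     # Linux is CPU only; toggle off is assumed to mean CPU
    if os_name == "windows":
        return "cuda" if cuda_available else "cpu"
    if os_name == "macos":
        # The Neural Engine accelerator is used automatically once installed.
        return "metal+ane" if ane_installed else "metal"
    raise ValueError(f"unknown platform: {os_name}")
```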
Next steps
- Audio and models settings — where the pickers, downloads, and providers live.
- On-device vs. cloud and privacy — decide where your words get processed.
- Create and tune a workspace — pin per-context speech and text models.
- Choosing your models on the web — set active models from your account.