
On-device vs. cloud, and privacy

InkSpoke is offline-first: your words can be transcribed and polished entirely on your own computer, and nothing leaves it unless you choose a cloud model or turn on sync. This page explains what runs where, how you control it, and exactly which data stays private.

Two decisions, made per model

A single dictation goes through two AI steps, and each one is independent:

  1. Speech recognition (ASR) — turning your audio into text. Runs on-device with Whisper.net or Parakeet, or via a cloud provider.
  2. AI refinement (LLM) — cleaning up and reshaping that text. Runs on-device with a local model, or via the built-in InkSpoke Platform cloud model or your own BYOK provider.

You pick a model for each step (in AI Models settings), and the model you pick decides whether that step is local or cloud. So you can, for example, transcribe locally to keep your audio private, and still refine with a fast cloud model.

         Your speech
              │
┌─────────────┴──────────────┐
│ 1. Speech recognition      │
│ Local Whisper / Parakeet   │ → audio stays on your device
│ Cloud provider API         │ → audio uploaded to provider
└─────────────┬──────────────┘
              │ transcribed text
┌─────────────┴──────────────┐
│ 2. AI refinement           │
│ Local on-device LLM        │ → text stays on your device
│ Cloud Platform / BYOK      │ → text sent to provider
└─────────────┬──────────────┘
              │
   Injected at your cursor

What "default" actually means

Out of the box, speech runs on-device (Whisper Small — free and offline) but refinement uses the InkSpoke Platform cloud model during your Pro trial. So by default your audio never leaves your machine, but your transcribed text is sent to be polished. To keep the entire loop local, switch refinement to an on-device model too (see below).

On-device models

Local models download once and then run with no network. The speech side has two engines:

| Engine | What it is | Notes |
| --- | --- | --- |
| Whisper.net | The default local speech recognizer. Default model is Whisper Small (244M). | Small is included free; the other sizes (Tiny, Base, Medium, Large, Large-v3 Turbo variants) are Pro. |
| Parakeet | An alternative ONNX speech engine. | Selectable as a speech model when downloaded. |

On-device refinement uses a local GGUF language model. Local ASR beyond Whisper Small, and local LLMs, live on the AI Models → On-Device tab, which is a Pro feature.

GPU acceleration

On-device speech can use your GPU to run faster. This is controlled by UseGpuForDictation (on by default), and what it does depends on your OS:

| Platform | On-device speech acceleration |
| --- | --- |
| macOS | Metal (GPU) + Apple Neural Engine for Whisper |
| Windows | CUDA (NVIDIA GPUs) |
| Linux | CPU only (no GPU acceleration) |

Power users: Parakeet is CPU-bound today

GPU acceleration currently applies to Whisper. The Parakeet engine runs on CPU on all platforms for now; CUDA (Windows) and CoreML (macOS) acceleration for Parakeet are planned but not yet enabled. If you rely on GPU speed, stay on a Whisper model.
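
The platform and engine rules above can be summarized in a small lookup. This is a minimal sketch, not InkSpoke's actual code: the function name, the Engine enum, and the platform labels are hypothetical, and it assumes that turning UseGpuForDictation off simply means CPU. It also ignores hardware detection (for example, a Windows machine without an NVIDIA GPU).

```python
from enum import Enum


class Engine(Enum):
    WHISPER = "whisper"
    PARAKEET = "parakeet"


def speech_acceleration(platform: str, engine: Engine, use_gpu: bool = True) -> str:
    """Return the acceleration this page describes for on-device speech.

    `platform` is one of "macos", "windows", "linux" (hypothetical labels).
    `use_gpu` mirrors the UseGpuForDictation setting (on by default).
    """
    # Parakeet runs on CPU on every platform for now; with the setting
    # off we assume CPU as well (an assumption, not stated by the page).
    if engine is Engine.PARAKEET or not use_gpu:
        return "CPU"
    # Whisper with the setting on: the backend depends on the OS.
    by_platform = {
        "macos": "Metal + Apple Neural Engine",
        "windows": "CUDA",
        "linux": "CPU",
    }
    return by_platform[platform]
```

For example, `speech_acceleration("windows", Engine.PARAKEET)` returns "CPU" even with the setting on, which is why the advice is to stay on a Whisper model if you rely on GPU speed.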

Cloud models

Choosing a cloud speech or text model routes that step to a provider over the network:

  • InkSpoke Platform — the built-in cloud provider. Refinement through it uses InkSpoke's Responses API; it's the default text model during your Pro trial.
  • BYOK (bring your own key) — add any OpenAI-compatible provider with your own API key on the AI Models → Providers tab (Pro). Your key is stored in your operating system's keychain, never in a settings file. Requests go directly to your provider under your account.

Cloud speech falls back to local

Cloud speech recognition is designed to fail safe. If a cloud upload doesn't succeed, InkSpoke falls back to your local model so you still get a transcript, and the failed upload is queued for retry rather than lost.
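
The fail-safe behavior described above follows a common pattern: try the cloud, and on failure queue the upload and transcribe locally. The sketch below illustrates that pattern only. The function name, the callable signatures, and the in-memory queue are assumptions for illustration, not InkSpoke's implementation.

```python
from collections import deque
from typing import Callable, Deque

# A transcriber takes raw audio bytes and returns text (hypothetical shape).
Transcriber = Callable[[bytes], str]


def transcribe_fail_safe(
    audio: bytes,
    cloud: Transcriber,
    local: Transcriber,
    retry_queue: Deque[bytes],
) -> str:
    """Try the cloud recognizer; on failure, queue the audio and use the local one."""
    try:
        return cloud(audio)
    except Exception:
        # Assumption: any exception counts as a failed upload.
        retry_queue.append(audio)  # keep the upload for a later retry
        return local(audio)        # the user still gets a transcript
```

A caller would pass the active cloud and local recognizers plus a shared queue, and a background task could later drain the queue to retry the uploads.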

Meetings are local for now

Meeting recording transcribes on-device only — cloud transcription for live meetings is coming soon. Cloud transcription is available when you import an audio or video file.

What stays private

InkSpoke keeps your data on your machine by default:

  • Your audio never leaves your device when you use a local speech model. Audio is uploaded only when you choose a cloud speech model.
  • Your history, recordings, and workspaces are stored locally (the History screen is even labelled "Local only") unless you turn on cloud sync.
  • Cloud sync is opt-in and end-to-end encrypted. It's off by default (CloudSyncEnabled = false) and requires you to be signed in. When on, your workspaces and settings are encrypted on your device with a key held in your OS keychain — the servers store ciphertext they can't read.
  • API keys and sync keys live in the OS keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service) — never in the plain-text settings.json.

You also have a Privacy Tier setting (Settings → Configuration → General) that sets your overall posture. It defaults to LocalShield, with HybridIntelligence and PrivacyCloud as the other levels.

Cloud refinement sends text, not just audio

Even with local speech, if your refinement model is cloud-based, your transcribed text is sent to that provider. Sending custom vocabulary to a cloud speech model is gated by a separate opt-in (CustomVocabularyCloud). For a fully private loop, keep both the speech model and the refinement model on-device.

Choosing your setup: privacy vs. accuracy and speed

Mix and match the two steps to land where you want on the privacy/performance trade-off:

| Setup | Speech | Refinement | What leaves your device | Best when |
| --- | --- | --- | --- | --- |
| Fully on-device | Local (Whisper / Parakeet) | Local LLM | Nothing | Privacy is paramount, or you're offline. Quality and speed depend on your hardware and model size. Local LLM needs Pro. |
| Hybrid (audio stays home) | Local | Platform or BYOK cloud | Transcribed text only | You want strong refinement quality but never want to upload audio. This is closest to the default. |
| Fully cloud | Cloud provider | Cloud provider | Audio + text | You're on modest hardware and want the fastest, most accurate results, and you're comfortable using a provider. |

Start local, upgrade selectively

A good rule of thumb: keep speech on-device (it's free and private), and only reach for the cloud on the refinement step where a larger model helps most. You can change either model at any time — nothing is locked in.
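
Because each step is chosen independently, the setup and what leaves your device follow directly from two yes/no choices. The sketch below encodes the table above. The names classify_setup and Setup are hypothetical, and the fourth combination (cloud speech with local refinement) is not named on this page, so its label is an assumption.

```python
from typing import NamedTuple, Tuple


class Setup(NamedTuple):
    name: str
    leaves_device: Tuple[str, ...]


def classify_setup(speech_is_cloud: bool, refinement_is_cloud: bool) -> Setup:
    """Map the two per-step choices to a setup name and the data that leaves."""
    leaves = []
    if speech_is_cloud:
        leaves.append("audio")             # uploaded for recognition
    if refinement_is_cloud:
        leaves.append("transcribed text")  # sent to be polished

    if not leaves:
        name = "Fully on-device"
    elif speech_is_cloud and refinement_is_cloud:
        name = "Fully cloud"
    elif refinement_is_cloud:
        name = "Hybrid (audio stays home)"
    else:
        # Not named in the table above; the label is an assumption.
        name = "Cloud speech, local refinement"
    return Setup(name, tuple(leaves))
```

The out-of-the-box configuration corresponds to `classify_setup(False, True)`: local Whisper Small for speech and the Platform cloud model for refinement, so only transcribed text leaves the device.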

Settings that affect this

| Setting | Default | What it does |
| --- | --- | --- |
| Active speech model | Whisper Small (local) | Picking a cloud speech model switches ASR to cloud (AsrProvider.Mode). |
| UseGpuForDictation | On | GPU acceleration for on-device speech (Metal / CUDA; no effect on Linux). |
| CloudSyncEnabled | Off | Opt-in, end-to-end-encrypted sync of workspaces and settings. |
| PrivacyTier | LocalShield | Your overall privacy posture (LocalShield / HybridIntelligence / PrivacyCloud). |
| CustomVocabularyCloud | | Gate for sending your custom vocabulary to a cloud speech model. |
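
As a rough illustration of how these settings combine, the sketch below lists what would leave the device for a given set of values. It is an assumption-laden example: the flat dictionary layout and the function name are hypothetical, and it treats a missing CustomVocabularyCloud key as off, although this page does not state that setting's default.

```python
from typing import Dict, List


def off_device_flows(settings: Dict[str, bool], speech_is_cloud: bool) -> List[str]:
    """List the data that leaves the device under the given settings."""
    flows = []
    # Cloud sync is opt-in; when on, data is encrypted before upload.
    if settings.get("CloudSyncEnabled", False):
        flows.append("encrypted workspaces and settings")
    if speech_is_cloud:
        flows.append("audio")
        # Custom vocabulary is sent only behind its own opt-in.
        # Assumption: a missing key means the gate is off.
        if settings.get("CustomVocabularyCloud", False):
            flows.append("custom vocabulary")
    return flows
```

With the documented defaults (local speech model, CloudSyncEnabled off), the function returns an empty list for the speech and sync side. Refinement is chosen separately and is not covered by these settings.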

Next steps