Transcribe a file

Already have a recording — an interview, a lecture, a podcast, a voice memo, a downloaded meeting? InkSpoke can transcribe it for you and produce the same speaker-labeled transcript you get from a live meeting, without you having to play it back in real time.

Under the hood, file import runs the exact same long-form engine as meeting recording. The only difference is where the audio comes from: instead of your mic and system audio, it's a file you already have on disk.

When to use it

Use Transcribe a file whenever the audio already exists and you just want the text:

Turn a recorded interview or call into a transcript.
Get captions (.srt / .vtt) for a video.
Transcribe a lecture, webinar, or podcast episode.
Re-process a file with different settings — a different language, model, or speaker count.

If you want InkSpoke to capture a live meeting from your microphone and the other participants' audio, use Record a meeting instead.

Open the dialog

There are two ways in — both open the same Transcribe a file window:

From the tray menu: click the InkSpoke system-tray icon and choose Transcribe a File…
From History: open Settings → History (or the tray History… item) and use the Transcribe a File action there.

There is no dedicated hotkey for file import — it's always started from the tray menu or History.

┌────────────────────────────────────────────────┐
│  Transcribe a file                         ✕   │
├────────────────────────────────────────────────┤
│  File:  [ Choose file… ]   interview.m4a       │
│                                                │
│  [ Workspace ▾ ]              [ Language ▾ ]   │
│                                                │
│  Transcription model                           │
│  [ Whisper Small · on-device             ▾ ]   │
│                                                │
│  [x] Identify speakers (diarization)           │
│  Speakers   [ Auto-detect ▾ ]                  │
│                                                │
│  ▓▓▓▓▓▓▓▓▓░░░░░░░   Transcribing…   42%        │
│                                                │
│                     [ Cancel ]  [ Transcribe ] │
└────────────────────────────────────────────────┘

Step by step

1. Choose a file. Click Choose file… and pick your audio or video file (see supported formats below).
2. Pick a workspace. The workspace you choose applies its vocabulary and dictionary substitutions to the transcript, so domain terms and names come out spelled the way you want. Leave it on your default if you're not sure.
3. Set the language. Choose the spoken language, or leave it on Auto to let the model detect it.
4. Pick a transcription model. Defaults to your active on-device Whisper model. You can switch to a larger on-device model or, if you've configured one, a cloud diarized model (see Local vs. cloud).
5. Choose whether to identify speakers. The Identify speakers (diarization) checkbox is on by default. Leave it on to get a speaker-labeled transcript; turn it off to treat the whole recording as a single "Speaker".
6. (Optional) Set the speaker count. When diarization is on and you're using an on-device model, you can leave Speakers on Auto-detect or pin it to a fixed number from 1 to 8 if you already know how many people are talking.
7. Click Transcribe. A progress bar and status text track the work. Cancel aborts at any time.

When it finishes, the transcript opens automatically in the Transcript Viewer, where you can rename speakers, replay audio (if kept), copy the text, or export it. See Speakers and the transcript viewer.

What happens after you click Transcribe

InkSpoke first probes the file for a decodable audio stream, then uses a bundled FFmpeg to decode it to 16 kHz mono audio, which is the format the transcription engine expects. Because FFmpeg ships with InkSpoke, the common formats below work without you installing anything.
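
For readers who want to reproduce the probe-and-decode step outside InkSpoke, the following is a minimal sketch using the standard FFmpeg and FFprobe command-line tools. The function names are hypothetical, the choice of raw 16-bit PCM output is an assumption, and the exact arguments InkSpoke passes to its bundled FFmpeg are not documented here.

```python
import subprocess
from pathlib import Path


def build_probe_command(source: Path, ffprobe: str = "ffprobe") -> list:
    """Return an FFprobe command that lists the codec of each audio stream."""
    return [
        ffprobe,
        "-v", "error",                         # only print real errors
        "-select_streams", "a",                # audio streams only
        "-show_entries", "stream=codec_name",  # one codec name per stream
        "-of", "csv=p=0",                      # bare values, no section prefix
        str(source),
    ]


def build_decode_command(source: Path, ffmpeg: str = "ffmpeg") -> list:
    """Return an FFmpeg command that writes 16 kHz mono PCM to stdout."""
    return [
        ffmpeg,
        "-nostdin",
        "-v", "error",
        "-i", str(source),
        "-vn",            # ignore any video stream
        "-ac", "1",       # downmix to mono
        "-ar", "16000",   # resample to 16 kHz
        "-f", "s16le",    # raw signed 16-bit little-endian samples (assumed)
        "-",              # write to stdout
    ]


def decode_to_pcm(source: Path) -> bytes:
    """Probe the file, then decode it. Raises if no audio stream is found."""
    probe = subprocess.run(
        build_probe_command(source), capture_output=True, text=True, check=True
    )
    if not probe.stdout.strip():
        raise ValueError(f"{source} has no decodable audio stream")
    decoded = subprocess.run(
        build_decode_command(source), capture_output=True, check=True
    )
    return decoded.stdout
```

Calling `decode_to_pcm(Path("interview.m4a"))` requires `ffmpeg` and `ffprobe` on your PATH; the source file is only read, never modified.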

Supported formats

InkSpoke can read a wide range of containers. For video files, it extracts and transcribes the audio track.

Type	Formats
Audio	`.mp3` · `.wav` · `.m4a` · `.ogg` · `.opus` · `.flac` · `.aac` · `.wma`
Video (audio track)	`.mp4` · `.webm` · `.avi` · `.mkv` · `.mov`
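
If you are batch-sorting files before importing them, a quick extension check against the table above can be sketched as follows. The `classify` helper is hypothetical, and an extension is only a hint: InkSpoke itself probes the file for a decodable audio stream rather than trusting the name.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the supported-formats table.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".opus", ".flac", ".aac", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".webm", ".avi", ".mkv", ".mov"}


def classify(path: str) -> Optional[str]:
    """Return "audio", "video", or None based on the file extension alone."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None
```

For example, `classify("Interview.M4A")` returns `"audio"` and `classify("notes.txt")` returns `None`.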

Note

File import needs the bundled FFmpeg/FFprobe helper, which ships with InkSpoke. If your build can't find it and it isn't on your system PATH, decoding will fail — reinstall to restore the bundled copy.
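
The lookup order described in this note (bundled copy first, then the system PATH) can be sketched like this. The `find_ffmpeg` helper and the `bundled_dir` parameter are hypothetical; where InkSpoke actually stores its bundled helper is not stated here.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(bundled_dir: Path, name: str = "ffmpeg") -> Optional[str]:
    """Prefer a copy inside bundled_dir, then fall back to the system PATH."""
    # Check both the plain name and the Windows-style .exe name.
    for candidate in (bundled_dir / name, bundled_dir / f"{name}.exe"):
        if candidate.is_file():
            return str(candidate)
    # shutil.which returns None when the tool is not on PATH either.
    return shutil.which(name)
```

A `None` result corresponds to the failure case above, where decoding cannot start until the bundled copy is restored.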

The dialog options

Option	Choices	What it does
File	Any supported file	The audio/video to transcribe. Your original file is never modified or moved — InkSpoke just reads it.
Workspace	Any of your workspaces	Applies that workspace's vocabulary and personal-dictionary substitutions to the result.
Language	Auto, or a specific language	The spoken language. Auto lets the model detect it.
Transcription model	On-device Whisper sizes; a cloud diarized option when configured	Which engine transcribes. Defaults to your active on-device Whisper model.
Identify speakers (diarization)	On (default) / Off	Labels who spoke when. Off collapses everything into one "Speaker".
Speakers	Auto-detect, or 1–8	For on-device diarization only: fix the speaker count or let InkSpoke detect it. Hidden when you pick a cloud model (it auto-detects).

These pickers apply to this one import only — nothing here is saved as a default. Global recording defaults live in Settings → Recordings & Meetings.
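
The rules in the table above can be summarized as a small validation sketch. The `ImportOptions` class and its field names are hypothetical and exist only to illustrate how the options constrain each other; InkSpoke does not expose such an API in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImportOptions:
    file: str
    workspace: str = "default"
    language: str = "auto"
    model: str = "whisper-small"
    cloud: bool = False                  # True for a cloud diarized model
    identify_speakers: bool = True       # diarization is on by default
    speaker_count: Optional[int] = None  # None means Auto-detect

    def validate(self) -> None:
        """Raise ValueError if the speaker settings contradict each other."""
        if self.speaker_count is None:
            return  # Auto-detect is always allowed
        if not self.identify_speakers:
            raise ValueError("speaker count needs diarization turned on")
        if self.cloud:
            raise ValueError("cloud models auto-detect the speaker count")
        if not 1 <= self.speaker_count <= 8:
            raise ValueError("speaker count must be between 1 and 8")
```

For instance, `ImportOptions("interview.m4a", speaker_count=3).validate()` passes, while a count of 9 or a fixed count combined with a cloud model raises `ValueError`.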

Local vs. cloud engines

The Transcription model picker decides where the work runs.

	On-device (Whisper)	Cloud (diarized model)
Where it runs	Fully on your machine	Uploaded to a cloud provider
Privacy	Audio never leaves your device	Compressed audio is sent to the provider
Network	Works offline	Requires an internet connection
Speaker count	Auto-detect, or fix it 1–8	Auto-detected by the provider
Size limit	No fixed limit; handles very long files	About 25 MB of compressed audio per file
When it appears	Always (the default)	Only when a diarize-capable cloud model is configured

On-device is the default and needs no setup — it chunks the audio through Whisper (with your workspace vocabulary applied) and labels speakers locally. If you pick a Whisper size you haven't downloaded yet, InkSpoke fetches it on demand before starting.
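
To give a rough picture of what chunking long audio means, here is a sketch that splits 16 kHz audio into fixed windows. The 30-second window matches the input length Whisper models process at a time, but the window size, the lack of overlap, and the `chunk_bounds` helper are assumptions; InkSpoke's actual chunking strategy is not described here.

```python
SAMPLE_RATE = 16_000   # samples per second after decoding
CHUNK_SECONDS = 30     # assumed window length


def chunk_bounds(total_samples: int,
                 chunk_seconds: int = CHUNK_SECONDS,
                 sample_rate: int = SAMPLE_RATE) -> list:
    """Return (start, end) sample indices for consecutive, non-overlapping chunks."""
    step = chunk_seconds * sample_rate
    return [
        (start, min(start + step, total_samples))  # last chunk may be shorter
        for start in range(0, total_samples, step)
    ]
```

A 70-second recording (1,120,000 samples) yields three chunks: two full 30-second windows and a final 10-second remainder.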

Cloud appears in the model list as an option ending in "· auto speakers (cloud)", and only when you've configured a diarize-capable cloud model — either your own key (BYOK) or the InkSpoke Platform. It compresses the mixed audio and sends it to a diarized-transcription model, which returns the transcript, speaker labels, and timestamps with the speaker count detected automatically.

Cloud has a size cap — use local for long files

The cloud engine accepts up to about 25 MB of compressed audio per file. Files whose compressed audio exceeds that size are rejected with a message asking you to use a local model instead. For a multi-hour interview or lecture, choose an on-device Whisper model, which has no such limit.
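
To judge in advance whether a recording is likely to fit, you can estimate the compressed size from its duration. The sketch below assumes a constant 64 kbps upload bitrate and treats the cap as exactly 25,000,000 bytes; both numbers are illustrative assumptions, since the bitrate InkSpoke uses and the precise limit are not stated here.

```python
CLOUD_LIMIT_BYTES = 25 * 1000 * 1000  # "about 25 MB"; exact value is an assumption


def estimated_upload_bytes(duration_seconds: float, bitrate_kbps: int = 64) -> int:
    """Estimate compressed size: seconds * bits per second / 8 bits per byte."""
    return int(duration_seconds * bitrate_kbps * 1000 / 8)


def fits_cloud_limit(duration_seconds: float, bitrate_kbps: int = 64) -> bool:
    """Return True if the estimated upload stays within the assumed cap."""
    return estimated_upload_bytes(duration_seconds, bitrate_kbps) <= CLOUD_LIMIT_BYTES
```

Under these assumptions a 30-minute file comes to about 14.4 MB and fits, a 2-hour file comes to about 57.6 MB and does not, and the break-even point is roughly 52 minutes.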

Speaker labels need the diarization models

On-device speaker identification uses two small ML models (about 46.5 MB total) that download on demand the first time you use diarization. You can also fetch them ahead of time from Settings → Recordings & Meetings → Speakers → Download diarization model. Once downloaded, speaker identification runs fully on-device. If diarization is off, the whole file becomes a single "Speaker".

Re-transcribing a file

Not happy with the result — wrong language detected, too few speakers, or you want to try the cloud engine? Just run the import again: open Transcribe a File…, pick the same file, and choose different settings. Each run produces its own transcript, so you can compare. Imported recordings keep a reference to their original source file, so re-importing is quick.

Platform notes

File import behaves the same on Windows, macOS, and Linux — same dialog, same formats, same FFmpeg decoding. The one practical difference is transcription speed, which depends on hardware acceleration:

Platform	On-device acceleration
macOS	CoreML
Windows	CUDA (NVIDIA GPUs)
Linux	CPU

Cloud transcription speed is the same everywhere since the heavy lifting happens on the provider's servers.

Next steps

Record a meeting — capture a live meeting from your mic plus the other participants' audio.
Speakers and the transcript viewer — rename speakers, replay audio, and export to txt, md, srt, vtt, or json.
Models and providers — download Whisper sizes and set up a cloud transcription model.
On-device vs. cloud and privacy — understand where your audio is processed.
