Record and transcribe a meeting
Goal: capture a real conversation — your voice and the other participants' — and walk away with a clean, speaker-labeled transcript you can rename, replay, and export, all processed on your own device.
Push-to-talk dictation is built for short bursts of your own speech. A meeting recording is the opposite: it grabs both sides of a call at once and turns a long stretch of talking into a labeled transcript. This tutorial takes you through the whole flow, then shows how to do the same thing with an audio or video file you already have.
What you'll learn
- How to start and stop a meeting recording from the system tray
- How InkSpoke captures your microphone plus system audio (the people on the other end)
- How to grant macOS Screen Recording permission so participants are captured
- How to read the live meeting HUD — timer, level meters, and rolling preview
- How to rename speakers in the Transcript Viewer
- How to export the transcript to VTT or Markdown
- How to transcribe an existing file as an alternative to a live recording
Prerequisites
- The InkSpoke desktop app, installed and running. If you haven't set it up yet, start with Install and set up InkSpoke.
- A speech model on device. New installs come with Whisper Small, which is enough to follow along. See Models and providers if you want a larger one.
- macOS only: to capture the other participants, macOS needs Screen Recording permission (covered in Step 1). Without it, meetings record your microphone only. On Windows and Linux, system-audio capture needs no extra permission; if Settings reports Not available on this device, see Troubleshooting.
- Optional but recommended: the on-device diarization model (a one-time ~46.5 MB download) if you want individual per-person speaker splitting rather than just You vs. Participants.
Live meetings run fully on your device — cloud transcription for meetings is coming soon. Everything you record and transcribe here stays local. (Cloud transcription is available for file import — see the last section.)
Time estimate
About 10 minutes to learn the flow — plus however long your actual meeting runs, and a short processing pass at the end.
The flow at a glance
Step 1 — (macOS) grant Screen Recording permission
Skip this step on Windows and Linux. On macOS, capturing the other participants goes through ScreenCaptureKit, which requires Screen Recording permission. Grant it before your first real call so you don't record a meeting mic-only by accident.
- Open Settings → Recordings & Meetings.
- Find the System Audio section. It shows one of:
- green Available — you're all set,
- a Needs permission prompt with a Grant Screen Recording… button (macOS), or
- Not available on this device — meetings will record the microphone only.
- If you see the prompt, click Grant Screen Recording… and approve InkSpoke in macOS System Settings.
┌────────────────────────────────────────────────┐
│ Recordings & Meetings │
├────────────────────────────────────────────────┤
│ System Audio │
│ ⚠ Needs permission │
│ [ Grant Screen Recording… ] │
└────────────────────────────────────────────────┘
On macOS, until Screen Recording permission is granted, a meeting records your microphone only and the HUD shows a warning banner. Your voice is transcribed; the people on the call are not.
Step 2 — (optional) download the diarization model
A two-track meeting already labels your mic as You and the system audio as Participants without any extra download. If you want InkSpoke to split those participants into individual people (Priya, Sam, Alex…), grab the on-device speaker model once:
- In Settings → Recordings & Meetings, open the Speakers section.
- Click Download diarization model and watch the progress bar.
- When it finishes, you'll see "Speaker identification runs fully on-device."
It's a one-time ~46.5 MB download (a pyannote segmentation model plus a voice-embedding model), after which speaker identification runs entirely offline.
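To make the two-track idea concrete, here is a minimal sketch of how segments from two separately captured tracks could be labeled and interleaved. It is an illustration under stated assumptions, not InkSpoke's actual implementation: the `Segment` type and the `merge_tracks` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    text: str
    speaker: str = ""


def merge_tracks(mic, system):
    """Label each track by its source, then interleave segments by start time.

    Hypothetical helper: the mic track becomes "You" and the system-audio
    track becomes "Participants", with no ML model involved.
    """
    labeled = [Segment(s.start, s.end, s.text, "You") for s in mic]
    labeled += [Segment(s.start, s.end, s.text, "Participants") for s in system]
    return sorted(labeled, key=lambda s: s.start)


mic = [
    Segment(0.0, 2.1, "Morning, can everyone hear me okay?"),
    Segment(4.0, 6.5, "Great, let's start with the roadmap."),
]
system = [Segment(2.3, 3.6, "Yep, loud and clear.")]

for seg in merge_tracks(mic, system):
    print(f"{seg.speaker}: {seg.text}")
```

Because the label comes from which track a segment was captured on, this kind of split needs no download. Separating the Participants track into individual people is the part that requires the diarization model.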
Step 3 — start the meeting from the tray
There's no hotkey for meetings — unlike push-to-talk dictation (which uses Alt + Space, or ⌥ + Space on macOS), meetings are driven entirely from the tray menu, so they stay out of your way until you want them.
┌──────────────────────────────────┐
│ InkSpoke │
│ ────────────────────────────── │
│ ▶ Start Meeting Recording │
│ 📄 Transcribe a File… │
│ 🕘 History… │
│ … │
└──────────────────────────────────┘
- Click the InkSpoke tray icon.
- Choose Start Meeting Recording. The coral meeting HUD opens in its Ready phase.
Step 4 — choose your options in the Ready phase
Before anything records, set the options for this session. None of these are saved as defaults — they apply only to the meeting you're about to start.
┌────────────────────────────────────────────────┐
│ ● Meeting Recording ✕ │
├────────────────────────────────────────────────┤
│ │
│ [ Workspace ▾ ] [ Language: Auto ▾ ] │
│ [ Model: Whisper Small (local) ▾ ] │
│ [✓] Identify speakers (diarization) │
│ │
│ [ Cancel ] [ Start Meeting ] │
└────────────────────────────────────────────────┘
| Option | Default | What it does |
|---|---|---|
| Workspace | Your default | Applies a workspace's vocabulary and dictionary substitutions so names and jargon come out right. |
| Language | Your preferred language, or Auto | Sets the spoken language, or lets InkSpoke detect it. |
| Transcription model | Your active local Whisper model | The on-device model used to transcribe. Only local models appear here. |
| Identify speakers (diarization) | On | Labels who spoke when. Turn it off to keep the whole recording as one Speaker. |
When it looks right, click Start Meeting.
By default InkSpoke keeps only the transcript, not the audio — so the Transcript Viewer's Play button won't appear. If you might want to listen back (or re-transcribe later), turn on Keep recorded audio in Settings → Recordings & Meetings → Storage before you record. See Speakers and the transcript viewer.
Step 5 — record and watch both sides come in
The HUD switches to a live view: an elapsed timer, two level meters, and a rolling preview of what's being said.
┌──────────────────────────────────────────────┐
│ ● Meeting Recording ✕ │ ← drag this strip to move the HUD
├──────────────────────────────────────────────┤
│ 2:47 │ ← elapsed timer (m:ss)
│ │
│ You (mic) ▇▇▇▅▃▁ │ ← your microphone level
│ Participants ▇▇▇▇▇▆▄▂ │ ← system-audio level
│ │
│ "…so the rollout is scheduled for Friday │ ← rolling live preview
│ and the docs still need a review pass." │
│ │
│ [ Cancel ] [ Stop ] │
└──────────────────────────────────────────────┘
- Both meters should move. You (mic) tracks your microphone; Participants tracks system audio, meaning whatever your computer is playing, which includes the people on the call. Seeing both bounce confirms you're capturing the whole conversation.
- The rolling preview shows a running snippet so you know it's hearing everyone. When there's nothing yet, it reads an italic Listening….
The rolling text is a quick, non-authoritative peek — it can drop or garble words. The clean, speaker-labeled transcript is built after you press Stop, from the full recording. Don't judge accuracy by the preview.
If you don't see the Participants meter move at all and you expected other voices, jump to Troubleshooting.
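For intuition about what the timer and level meters represent, the sketch below computes a root-mean-square (RMS) level from a block of audio samples and maps it to a bar glyph. This is a generic technique, assumed here for illustration; the source does not say how InkSpoke computes its meters, and `rms_level`, `meter_char`, and `format_elapsed` are hypothetical names.

```python
import math

BARS = "▁▂▃▄▅▆▇"  # glyphs from quietest to loudest


def rms_level(samples):
    """Root-mean-square level of float samples in the range [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def meter_char(level):
    """Map a level in [0, 1] to one bar glyph (values outside are clamped)."""
    level = min(max(level, 0.0), 1.0)
    index = min(int(level * len(BARS)), len(BARS) - 1)
    return BARS[index]


def format_elapsed(seconds):
    """Format elapsed seconds as m:ss, the style shown on the HUD timer."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


silent_track = [0.0] * 480          # e.g. system audio that was never captured
speech_block = [0.5, -0.5] * 240    # a synthetic, fairly loud block

print(format_elapsed(167), meter_char(rms_level(speech_block)))
print(format_elapsed(167), meter_char(rms_level(silent_track)))
```

A track that contains only zeros always yields the lowest glyph, which is the code-level analogue of a Participants meter that never moves.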
Step 6 — stop and let it process
When the call ends, either:
- Click Stop on the HUD, or
- Open the tray menu again — the item now reads Stop Meeting Recording — and click it.
The HUD moves to its Processing phase while InkSpoke transcribes the full recording and separates the speakers:
┌────────────────────────────────────────────────┐
│ ● Meeting Recording │
├────────────────────────────────────────────────┤
│ Transcribing… │
│ │
│ Diarizing speakers and building the │
│ transcript. │
└────────────────────────────────────────────────┘
Stop ends the meeting and builds the transcript. Cancel discards the session without transcribing anything.
Step 7 — rename the speakers
When processing finishes, the Transcript Viewer opens automatically. At the top is an editable speaker roster. Type a real name into any box — for example, rename Participants to Priya — and it rewrites that label across every segment live and saves automatically.
┌────────────────────────────────────────────────────────────┐
│ Weekly sync 0:42:17 · English │
│ [ ▶ Play ] │
├────────────────────────────────────────────────────────────┤
│ Speakers │
│ [ You ] [ Priya ] │
├────────────────────────────────────────────────────────────┤
│ You Morning — can everyone hear me okay? │
│ Priya Yep, loud and clear. │
│ You Great, let's start with the roadmap. │
│ Priya I had one question about the timeline… │
├────────────────────────────────────────────────────────────┤
│ [ Copy ] [ Export VTT ] [ Other formats ▾ ] │
└────────────────────────────────────────────────────────────┘
The Play / Stop button in the header only appears when audio was kept (Step 4's tip). Renaming isn't cosmetic — it persists to the saved recording, so your exports and future re-opens carry the real names.
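Conceptually, a rename replaces one label on every segment that carries it. The following is a minimal sketch of that operation, assuming segments are simple dictionaries with a `speaker` key; the data shape and the `rename_speaker` function are hypothetical, not the app's real storage format.

```python
def rename_speaker(segments, old, new):
    """Return a copy of the segments with every `old` label replaced by `new`.

    Segments with other labels are passed through unchanged.
    """
    return [
        {**seg, "speaker": new} if seg["speaker"] == old else seg
        for seg in segments
    ]


segments = [
    {"speaker": "You", "text": "Morning, can everyone hear me okay?"},
    {"speaker": "Participants", "text": "Yep, loud and clear."},
    {"speaker": "You", "text": "Great, let's start with the roadmap."},
]

renamed = rename_speaker(segments, "Participants", "Priya")
for seg in renamed:
    print(f"{seg['speaker']}: {seg['text']}")
```

The sketch returns a new list rather than mutating the input, which keeps the original labels available if a rename needs to be undone.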
Step 8 — export to VTT or Markdown
From the footer you can copy the transcript or save it to a file:
- Export VTT: the default button. Produces a WebVTT .vtt file (HH:MM:SS.mmm timing), ideal for video captions.
- Other formats ▾: a flyout with Markdown (.md), Plain text (.txt), SubRip (.srt), and JSON (.json).
- Copy: puts a plain-text Speaker: text version on your clipboard.
To get Markdown — a titled document with bold speaker names and clock timestamps, perfect for pasting into notes or a wiki — click Other formats ▾ and choose Markdown (.md). The export filename is derived from the recording's title.
| Format | Extension | Best for |
|---|---|---|
| WebVTT (default) | .vtt | Web video captions |
| Markdown | .md | Notes and docs — title + bold speaker names |
| SubRip | .srt | Subtitle files for video players |
| Plain text | .txt | Quick paste, Speaker: text lines |
| JSON | .json | Feeding another tool — per-segment timing, confidence, speaker origin |
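To show what the timing and speaker formatting in these exports amount to, here is a small sketch that renders segments as WebVTT and as Markdown. The exact layout InkSpoke writes is not documented here, so treat the output shape as an assumption: the `<v Speaker>` voice tag is standard WebVTT, while the Markdown line format and the function names (`timestamp`, `to_vtt`, `to_markdown`) are hypothetical.

```python
def timestamp(seconds, sep="."):
    """Format seconds as HH:MM:SS.mmm (WebVTT) or, with sep=",", SubRip style."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_vtt(segments):
    """Render segments as a WebVTT document with one cue per segment."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{timestamp(seg['start'])} --> {timestamp(seg['end'])}")
        lines.append(f"<v {seg['speaker']}>{seg['text']}")
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


def to_markdown(title, segments):
    """Render a titled document with bold speaker names and clock timestamps."""
    lines = [f"# {title}", ""]
    for seg in segments:
        clock = timestamp(seg["start"])[:8]  # HH:MM:SS, milliseconds dropped
        lines.append(f"**{seg['speaker']}** ({clock}): {seg['text']}")
        lines.append("")
    return "\n".join(lines)


segments = [
    {"start": 0.0, "end": 2.1, "speaker": "You",
     "text": "Morning, can everyone hear me okay?"},
    {"start": 2.3, "end": 3.6, "speaker": "Priya",
     "text": "Yep, loud and clear."},
]

print(to_vtt(segments))
print(to_markdown("Weekly sync", segments))
```

Working in whole milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the formatted timestamps.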
Expected result
You should now have:
- A saved meeting in Settings → History, reopenable any time by clicking its row.
- A speaker-labeled transcript in the Transcript Viewer — labeled You / Participants at minimum, or per-person names if you downloaded the diarization model and renamed speakers.
- A .vtt or .md file exported to disk, carrying your renamed speakers.
- (If you enabled Keep recorded audio) a Play button to replay the meeting.
Everything above was produced on your own device — no audio left your computer.
Troubleshooting
The "Participants" meter never moves; I only captured my own voice.
System audio wasn't captured. On macOS, grant Screen Recording permission (Step 1); until you do, meetings are mic-only and the HUD shows a warning banner. On Windows/Linux, check Settings → Recordings & Meetings → System Audio; if it says Not available on this device, that build or hardware can't capture loopback and the meeting will record the microphone only.
There's no "Play" button in the Transcript Viewer.
The audio wasn't kept. By default InkSpoke stores only the transcript. Turn on Keep recorded audio in Settings → Recordings & Meetings → Storage before your next recording; it can't be enabled retroactively for a meeting you've already recorded.
Everyone shows up as a single "Speaker" (or no per-person split).
Diarization was off, or the diarization model isn't installed. Make sure Identify speakers (diarization) was checked in the Ready phase, and download the on-device speaker model from Settings → Recordings & Meetings → Speakers → Download diarization model. A two-track meeting still gives you You / Participants even without the model; per-person splitting needs it.
The transcript came out in the wrong language, or a meeting failed to process.
There's no one-click "re-transcribe this meeting" button in the current build. If you turned on Keep recorded audio, re-run it through Transcribe a File… (below) with the right language or model selected. This is exactly why keeping audio before an important call is worth it: without a saved file there's nothing to re-run.
Alternative — transcribe a file you already have
Already have the audio — a recorded call, an interview, a downloaded meeting, a voice memo? You don't need to play it back in real time. File import runs the same long-form engine as a live meeting.
┌────────────────────────────────────────────────┐
│ Transcribe a file ✕ │
├────────────────────────────────────────────────┤
│ File: [ Choose file… ] interview.m4a │
│ │
│ [ Workspace ▾ ] [ Language ▾ ] │
│ Transcription model │
│ [ Whisper Small · on-device ▾ ] │
│ [x] Identify speakers (diarization) │
│ Speakers [ Auto-detect ▾ ] │
│ │
│ ▓▓▓▓▓▓▓▓▓░░░░░░░ Transcribing… 42% │
│ [ Cancel ] [ Transcribe ] │
└────────────────────────────────────────────────┘
- From the tray menu, choose Transcribe a File… (or use the Transcribe a File action in Settings → History).
- Click Choose file… and pick your media. Supported types include .mp3, .wav, .m4a, .ogg, .opus, .flac, .aac, .wma and, for video (audio track), .mp4, .webm, .avi, .mkv, .mov.
- Set the workspace, language, and transcription model, and choose whether to identify speakers. With an on-device model you can also set the Speakers count (Auto-detect, or 1–8).
- Click Transcribe. When it finishes, the same Transcript Viewer opens — rename speakers and export exactly as above.
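If you are sorting a folder of recordings before importing them, a quick extension check can tell you which files are worth trying. The sketch below uses only the extensions listed in the steps above; since that list is introduced with "include", it may not be exhaustive, and `media_kind` is a hypothetical helper rather than part of InkSpoke.

```python
from pathlib import Path

# Extensions named in this tutorial; the app may accept others.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".opus", ".flac", ".aac", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".webm", ".avi", ".mkv", ".mov"}


def media_kind(path):
    """Return "audio", "video", or None, judged by file extension only."""
    suffix = Path(path).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"  # only the audio track would be transcribed
    return None


for name in ["interview.m4a", "Call.MOV", "notes.txt"]:
    print(name, "->", media_kind(name))
```

The check is case-insensitive and looks at the name only; it does not verify that the file actually contains decodable audio.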
Unlike live meetings, file import offers a cloud diarized model in the model picker when you've configured one (your own key or the InkSpoke Platform). It auto-detects speaker count but has a ~25 MB compressed-upload cap, so pick an on-device Whisper model for long recordings. See Transcribe a file.
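As a rough way to reason about the upload cap, the sketch below estimates a compressed size from a recording's duration and a bitrate. Everything numeric here beyond "~25 MB" is an assumption: the cap is treated as 25,000,000 bytes, the 32 kbps default is a placeholder rather than the bitrate InkSpoke uses, and both function names are hypothetical.

```python
# "~25 MB" comes from the text above; the exact byte value is an assumption.
UPLOAD_CAP_BYTES = 25 * 1000 * 1000


def estimated_upload_bytes(duration_seconds, bitrate_kbps):
    """Estimate compressed size as duration x bitrate (kilobits per second)."""
    return int(duration_seconds * bitrate_kbps * 1000 / 8)


def fits_cloud_cap(duration_seconds, bitrate_kbps=32):
    """True if the estimate is within the cap. The 32 kbps default is a
    placeholder assumption, not a documented InkSpoke setting."""
    return estimated_upload_bytes(duration_seconds, bitrate_kbps) <= UPLOAD_CAP_BYTES


for hours in (1, 3):
    seconds = hours * 3600
    size_mb = estimated_upload_bytes(seconds, 32) / 1_000_000
    print(f"{hours} h at 32 kbps: {size_mb:.1f} MB, fits: {fits_cloud_cap(seconds)}")
```

Under these assumed numbers a one-hour file fits and a three-hour file does not, which matches the advice to prefer an on-device Whisper model for long recordings.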
Next steps
- Record a meeting — the full reference for the meeting HUD, capture, and settings.
- Speakers and the transcript viewer — how diarization works, plus renaming, playback, and every export format.
- Transcribe a file — the complete file-import walkthrough, including cloud transcription.
- History and diagnostics — find, filter, and reopen every saved meeting and transcript.
- On-device vs. cloud and privacy — where your audio and text are processed, and why on-device is the default.