
Record and transcribe a meeting

Goal: capture a real conversation — your voice and the other participants' — and walk away with a clean, speaker-labeled transcript you can rename, replay, and export, all processed on your own device.

Push-to-talk dictation is built for short bursts of your own speech. A meeting recording is the opposite: it grabs both sides of a call at once and turns a long stretch of talking into a labeled transcript. This tutorial takes you through the whole flow, then shows how to do the same thing with an audio or video file you already have.

What you'll learn

  • How to start and stop a meeting recording from the system tray
  • How InkSpoke captures your microphone plus system audio (the people on the other end)
  • How to grant macOS Screen Recording permission so participants are captured
  • How to read the live meeting HUD — timer, level meters, and rolling preview
  • How to rename speakers in the Transcript Viewer
  • How to export the transcript to VTT or Markdown
  • How to transcribe an existing file as an alternative to a live recording

Prerequisites

  • The InkSpoke desktop app, installed and running. If you haven't set it up yet, start with Install and set up InkSpoke.
  • A speech model on device. New installs come with Whisper Small, which is enough to follow along. See Models and providers if you want a larger one.
  • macOS only: to capture the other participants, macOS needs Screen Recording permission (covered in Step 1). Without it, meetings record your microphone only. On Windows and Linux, system-audio capture works out of the box.
  • Optional but recommended: the on-device diarization model (a one-time ~46.5 MB download) if you want individual per-person speaker splitting rather than just You vs. Participants.

Meetings transcribe on-device only

Live meetings run fully on your device — cloud transcription for meetings is coming soon. Everything you record and transcribe here stays local. (Cloud transcription is available for file import — see the last section.)

Time estimate

About 10 minutes to learn the flow — plus however long your actual meeting runs, and a short processing pass at the end.

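The flow at a glance

Grant permission (macOS only) → download the diarization model (optional) → start from the tray → choose options → record → stop and process → rename speakers → export.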

Step 1 — (macOS) grant Screen Recording permission

Skip this step on Windows and Linux. On macOS, capturing the other participants goes through ScreenCaptureKit, which requires Screen Recording permission. Grant it before your first real call so you don't record a meeting mic-only by accident.

  1. Open Settings → Recordings & Meetings.
  2. Find the System Audio section. It shows one of:
    • green Available — you're all set,
    • a Needs permission prompt with a Grant Screen Recording… button (macOS), or
    • Not available on this device — meetings will record the microphone only.
  3. If you see the prompt, click Grant Screen Recording… and approve InkSpoke in macOS System Settings.
┌────────────────────────────────────────────────┐
│ Recordings & Meetings                          │
├────────────────────────────────────────────────┤
│ System Audio                                   │
│ ⚠ Needs permission                             │
│ [ Grant Screen Recording… ]                    │
└────────────────────────────────────────────────┘

Without this, participants aren't captured

On macOS, until Screen Recording permission is granted, a meeting records your microphone only and the HUD shows a warning banner. Your voice is transcribed; the people on the call are not.

Step 2 — (optional) download the diarization model

A two-track meeting already labels your mic as You and the system audio as Participants without any extra download. If you want InkSpoke to split those participants into individual people (Priya, Sam, Alex…), grab the on-device speaker model once:

  1. In Settings → Recordings & Meetings, open the Speakers section.
  2. Click Download diarization model and watch the progress bar.
  3. When it finishes, you'll see "Speaker identification runs fully on-device."

It's a one-time ~46.5 MB download (a pyannote segmentation model plus a voice-embedding model), after which speaker identification runs entirely offline.
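
To make the two-track idea concrete, here is a minimal Python sketch of how segments from a microphone track and a system-audio track could be merged into one time-ordered transcript labeled You and Participants. It illustrates the concept only and is not InkSpoke's implementation; the `Segment` type and the `label_two_tracks` function are hypothetical names.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One transcribed stretch of speech (hypothetical structure)."""
    start: float  # seconds from the start of the recording
    end: float
    text: str


def label_two_tracks(mic, system):
    """Merge two per-track segment lists into one time-ordered transcript.

    Both inputs must already be sorted by start time. Mic segments are
    labeled "You" and system-audio segments "Participants".
    """
    labeled_mic = [(seg.start, "You", seg) for seg in mic]
    labeled_sys = [(seg.start, "Participants", seg) for seg in system]
    # heapq.merge interleaves the two sorted lists by start time.
    merged = heapq.merge(labeled_mic, labeled_sys, key=lambda item: item[0])
    return [(label, seg) for _, label, seg in merged]


mic = [Segment(0.0, 2.0, "Morning"), Segment(5.0, 6.0, "Great")]
system = [Segment(2.5, 4.0, "Yep, loud and clear.")]
for label, seg in label_two_tracks(mic, system):
    print(f"{label}: {seg.text}")
```

Splitting Participants into individual people is a separate problem that needs a speaker model, which is why that part is an optional download.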

Step 3 — start the meeting from the tray

There's no hotkey for meetings. Unlike push-to-talk dictation (which uses Alt + Space, or the equivalent modifier + Space on macOS), meetings are driven entirely from the tray menu, so they stay out of your way until you want them.

┌──────────────────────────────────┐
│ InkSpoke                         │
│ ──────────────────────────────── │
│ ▶ Start Meeting Recording        │
│ 📄 Transcribe a File…            │
│ 🕘 History…                      │
│ …                                │
└──────────────────────────────────┘

  1. Click the InkSpoke tray icon.
  2. Choose Start Meeting Recording. The coral meeting HUD opens in its Ready phase.

Step 4 — choose your options in the Ready phase

Before anything records, set the options for this session. None of these are saved as defaults — they apply only to the meeting you're about to start.

┌────────────────────────────────────────────────┐
│ ● Meeting Recording                          ✕ │
├────────────────────────────────────────────────┤
│                                                │
│ [ Workspace ▾ ] [ Language: Auto ▾ ]           │
│ [ Model: Whisper Small (local) ▾ ]             │
│ [✓] Identify speakers (diarization)            │
│                                                │
│ [ Cancel ] [ Start Meeting ]                   │
└────────────────────────────────────────────────┘

Picker | Default | What it does
Workspace | Your default | Applies a workspace's vocabulary and dictionary substitutions so names and jargon come out right.
Language | Your preferred language, or Auto | Sets the spoken language, or lets InkSpoke detect it.
Transcription model | Your active local Whisper model | The on-device model used to transcribe. Only local models appear here.
Identify speakers (diarization) | On | Labels who spoke when. Turn it off to keep the whole recording as one Speaker.

When it looks right, click Start Meeting.

Want to replay the audio later? Turn on Keep audio first

By default InkSpoke keeps only the transcript, not the audio — so the Transcript Viewer's Play button won't appear. If you might want to listen back (or re-transcribe later), turn on Keep recorded audio in Settings → Recordings & Meetings → Storage before you record. See Speakers and the transcript viewer.

Step 5 — record and watch both sides come in

The HUD switches to a live view: an elapsed timer, two level meters, and a rolling preview of what's being said.

┌──────────────────────────────────────────────┐
│ ● Meeting Recording                        ✕ │ ← drag this strip to move the HUD
├──────────────────────────────────────────────┤
│ 2:47                                         │ ← elapsed timer (m:ss)
│                                              │
│ You (mic)     ▇▇▇▅▃▁                         │ ← your microphone level
│ Participants  ▇▇▇▇▇▆▄▂                       │ ← system-audio level
│                                              │
│ "…so the rollout is scheduled for Friday     │ ← rolling live preview
│  and the docs still need a review pass."     │
│                                              │
│ [ Cancel ] [ Stop ]                          │
└──────────────────────────────────────────────┘

  • Both meters should move. You (mic) tracks your microphone; Participants (system) tracks the audio your computer is playing, that is, the people on the call. Seeing both bounce confirms you're capturing the whole conversation.
  • The rolling preview shows a running snippet so you know it's hearing everyone. When there's nothing yet, it reads an italic Listening….

The preview isn't the final transcript

The rolling text is a quick, non-authoritative peek — it can drop or garble words. The clean, speaker-labeled transcript is built after you press Stop, from the full recording. Don't judge accuracy by the preview.

If you don't see the Participants meter move at all and you expected other voices, jump to Troubleshooting.

Step 6 — stop and let it process

When the call ends, either:

  • Click Stop on the HUD, or
  • Open the tray menu again — the item now reads Stop Meeting Recording — and click it.

The HUD moves to its Processing phase while InkSpoke transcribes the full recording and separates the speakers:

┌────────────────────────────────────────────────┐
│ ● Meeting Recording                            │
├────────────────────────────────────────────────┤
│ Transcribing…                                  │
│                                                │
│ Diarizing speakers and building the            │
│ transcript.                                    │
└────────────────────────────────────────────────┘

Cancel vs. Stop

Stop ends the meeting and builds the transcript. Cancel discards the session without transcribing anything.

Step 7 — rename the speakers

When processing finishes, the Transcript Viewer opens automatically. At the top is an editable speaker roster. Type a real name into any box — for example, rename Participants to Priya — and it rewrites that label across every segment live and saves automatically.

┌────────────────────────────────────────────────────────────┐
│ Weekly sync 0:42:17 · English                              │
│ [ ▶ Play ]                                                 │
├────────────────────────────────────────────────────────────┤
│ Speakers                                                   │
│ [ You ] [ Priya ]                                          │
├────────────────────────────────────────────────────────────┤
│ You    Morning, can everyone hear me okay?                 │
│ Priya  Yep, loud and clear.                                │
│ You    Great, let's start with the roadmap.                │
│ Priya  I had one question about the timeline…              │
├────────────────────────────────────────────────────────────┤
│ [ Copy ] [ Export VTT ] [ Other formats ▾ ]                │
└────────────────────────────────────────────────────────────┘

The Play / Stop button in the header only appears when audio was kept (Step 4's tip). Renaming isn't cosmetic — it persists to the saved recording, so your exports and future re-opens carry the real names.
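
If it helps to picture what "rewrites that label across every segment" means, here is a small Python sketch. It is an illustration under assumptions, not InkSpoke's code: the segment dictionaries and the `rename_speaker` function are hypothetical.

```python
def rename_speaker(segments, old, new):
    """Return a copy of `segments` with every `old` speaker label set to `new`.

    Each segment is a dict with at least a "speaker" key (hypothetical shape).
    The input list is left untouched.
    """
    new = new.strip()
    if not new:
        raise ValueError("speaker name must not be empty")
    return [
        {**seg, "speaker": new} if seg["speaker"] == old else dict(seg)
        for seg in segments
    ]


segments = [
    {"speaker": "You", "text": "Morning, can everyone hear me okay?"},
    {"speaker": "Participants", "text": "Yep, loud and clear."},
    {"speaker": "Participants", "text": "I had one question about the timeline."},
]
renamed = rename_speaker(segments, "Participants", "Priya")
print([seg["speaker"] for seg in renamed])
```

Because the rename is applied to the stored segments rather than to the display only, anything built from them later (exports, re-opened recordings) sees the new name.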

Step 8 — export to VTT or Markdown

From the footer you can copy the transcript or save it to a file:

  • Export VTT — the default button. Produces a WebVTT .vtt file (HH:MM:SS.mmm timing), ideal for video captions.
  • Other formats ▾ — a flyout with Markdown (.md), Plain text (.txt), SubRip (.srt), and JSON (.json).
  • Copy — puts a plain-text Speaker: text version on your clipboard.
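
For reference, the HH:MM:SS.mmm timing that WebVTT cues use can be produced with a few lines of Python. This is a minimal sketch, not InkSpoke's exporter: the function names and the (start, end, speaker, text) tuple layout are assumptions for illustration. The `<v Speaker>` voice tag is standard WebVTT syntax for speaker labels; whether InkSpoke's own files use it is not confirmed here.

```python
def vtt_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_vtt(segments):
    """Build WebVTT text from (start, end, speaker, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, speaker, text in segments:
        # Cue text must not contain a raw "&" or "<", so escape them.
        safe = text.replace("&", "&amp;").replace("<", "&lt;")
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(f"<v {speaker}>{safe}")
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


print(to_vtt([
    (0.0, 2.5, "You", "Morning, can everyone hear me okay?"),
    (2.5, 4.0, "Priya", "Yep, loud and clear."),
]))
```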

To get Markdown — a titled document with bold speaker names and clock timestamps, perfect for pasting into notes or a wiki — click Other formats ▾ and choose Markdown (.md). The export filename is derived from the recording's title.
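
As a rough picture of that Markdown shape (a title, bold speaker names, clock timestamps), here is a short Python sketch. The exact layout InkSpoke writes may differ; the `clock` and `to_markdown` helpers and the line format are assumptions made for illustration.

```python
def clock(seconds):
    """Format seconds as m:ss, or h:mm:ss once the time passes an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def to_markdown(title, segments):
    """Render (start, speaker, text) tuples as a titled Markdown document."""
    lines = [f"# {title}", ""]
    for start, speaker, text in segments:
        lines.append(f"**{speaker}** ({clock(start)}): {text}")
        lines.append("")  # blank line so each turn is its own paragraph
    return "\n".join(lines)


print(to_markdown("Weekly sync", [
    (0.0, "You", "Morning, can everyone hear me okay?"),
    (3.2, "Priya", "Yep, loud and clear."),
]))
```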

Format | Extension | Best for
WebVTT (default) | .vtt | Web video captions
Markdown | .md | Notes and docs: title + bold speaker names
SubRip | .srt | Subtitle files for video players
Plain text | .txt | Quick paste, Speaker: text lines
JSON | .json | Feeding another tool: per-segment timing, confidence, speaker origin
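
If you plan to feed the JSON export into another tool, a script along these lines can total the talk time per speaker. Treat it as a sketch under assumptions: the field names used here (`segments`, `speaker`, `start`, `end`) are hypothetical, so open one of your own exports and adjust the keys to match what InkSpoke actually writes.

```python
import json
from collections import defaultdict


def talk_time_by_speaker(export_text):
    """Sum seconds spoken per speaker from a JSON export string.

    Assumes (hypothetically) a top-level "segments" list whose items
    carry "speaker", "start", and "end" in seconds.
    """
    data = json.loads(export_text)
    totals = defaultdict(float)
    for seg in data["segments"]:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


sample = json.dumps({
    "segments": [
        {"speaker": "You", "start": 0.0, "end": 2.5},
        {"speaker": "Priya", "start": 2.5, "end": 4.0},
        {"speaker": "You", "start": 4.0, "end": 7.0},
    ]
})
print(talk_time_by_speaker(sample))
```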

Expected result

You should now have:

  • A saved meeting in Settings → History, reopenable any time by clicking its row.
  • A speaker-labeled transcript in the Transcript Viewer — labeled You / Participants at minimum, or per-person names if you downloaded the diarization model and renamed speakers.
  • A .vtt or .md file exported to disk, carrying your renamed speakers.
  • (If you enabled Keep recorded audio) a Play button to replay the meeting.

Everything above was produced on your own device — no audio left your computer.

Troubleshooting

The "Participants" meter never moves, and I only captured my own voice.

System audio wasn't captured. On macOS, grant Screen Recording permission (Step 1); until you do, meetings are mic-only and the HUD shows a warning banner. On Windows/Linux, check Settings → Recordings & Meetings → System Audio; if it says Not available on this device, that build or hardware can't capture loopback and the meeting will record the microphone only.

There's no "Play" button in the Transcript Viewer.

The audio wasn't kept. By default InkSpoke stores only the transcript. Turn on Keep recorded audio in Settings → Recordings & Meetings → Storage before your next recording; it can't be enabled retroactively for a meeting you've already recorded.

Everyone shows up as a single "Speaker" (or no per-person split).

Diarization was off, or the diarization model isn't installed. Make sure Identify speakers (diarization) was checked in the Ready phase, and download the on-device speaker model from Settings → Recordings & Meetings → Speakers → Download diarization model. A two-track meeting still gives you You / Participants even without the model; per-person splitting needs it.

The transcript came out in the wrong language, or a meeting failed to process.

There's no one-click "re-transcribe this meeting" button in the current build. If you turned on Keep recorded audio, re-run the saved audio through Transcribe a File… (below) with the right language or model selected. This is exactly why turning on Keep recorded audio before an important call is worth it: without a saved file there's nothing to re-run.

Alternative — transcribe a file you already have

Already have the audio — a recorded call, an interview, a downloaded meeting, a voice memo? You don't need to play it back in real time. File import runs the same long-form engine as a live meeting.

┌────────────────────────────────────────────────┐
│ Transcribe a file                            ✕ │
├────────────────────────────────────────────────┤
│ File: [ Choose file… ] interview.m4a           │
│                                                │
│ [ Workspace ▾ ] [ Language ▾ ]                 │
│ Transcription model                            │
│ [ Whisper Small · on-device ▾ ]                │
│ [x] Identify speakers (diarization)            │
│ Speakers [ Auto-detect ▾ ]                     │
│                                                │
│ ▓▓▓▓▓▓▓▓▓░░░░░░░ Transcribing… 42%             │
│ [ Cancel ] [ Transcribe ]                      │
└────────────────────────────────────────────────┘

  1. From the tray menu, choose Transcribe a File… (or use the Transcribe a File action in Settings → History).
  2. Click Choose file… and pick your media. Supported types include .mp3, .wav, .m4a, .ogg, .opus, .flac, .aac, .wma and, for video (audio track), .mp4, .webm, .avi, .mkv, .mov.
  3. Set the workspace, language, and transcription model, and choose whether to identify speakers. With an on-device model you can also set the Speakers count (Auto-detect, or 1–8).
  4. Click Transcribe. When it finishes, the same Transcript Viewer opens. Rename speakers and export exactly as above.

File import can use a cloud model

Unlike live meetings, file import offers a cloud diarized model in the model picker when you've configured one (your own key or the InkSpoke Platform). It auto-detects speaker count but has a ~25 MB compressed-upload cap, so pick an on-device Whisper model for long recordings. See Transcribe a file.
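
If you're queuing up several files, a quick pre-check like the Python sketch below can tell you which ones are worth trying with a cloud model. It is only a rough guide built on assumptions: the extension sets copy the list in step 2 (which may not be exhaustive), the cap is treated as 25 × 1024 × 1024 bytes, and the on-disk size stands in for the compressed upload size, which InkSpoke computes itself and which may be smaller. The function names are hypothetical.

```python
from pathlib import Path

# Extensions copied from the supported-types list in this tutorial.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".opus", ".flac", ".aac", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".webm", ".avi", ".mkv", ".mov"}

# Assumption: "~25 MB" read as 25 MiB; the real limit is approximate.
CLOUD_CAP_BYTES = 25 * 1024 * 1024


def media_kind(path):
    """Return "audio", "video", or None based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None


def suggest_model(path, size_bytes):
    """Suggest which kind of model to pick for a file of the given size."""
    if media_kind(path) is None:
        return "unsupported"
    if size_bytes <= CLOUD_CAP_BYTES:
        return "cloud or on-device"
    return "on-device"


print(suggest_model("interview.m4a", 12 * 1024 * 1024))
print(suggest_model("all-hands.mp4", 300 * 1024 * 1024))
```

For a real file, pass `Path(path).stat().st_size` as `size_bytes`. Because the check uses the original file size, it errs on the side of recommending an on-device model.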

Next steps