Five tools · One thorough breakdown

Choosing a transcription app shouldn't take this much research.

We spent several weekends running real recordings — interviews, podcasts, voice notes, field audio — through five well-known desktop transcription tools. Here's what came out, written without hedging. No league table. No "best overall" pick. Just five tools and the exact jobs each one is actually suited for.


⚡

5 apps

Analysed in depth

💻

100% local

Everything runs offline on your machine

🎁

Mostly free

Most tools cost nothing

🛡

No fluff

No filler, no fluff

The lineup

Five tools that deserve a spot on your drive

Every tool here processes audio on your own hardware. None demand a subscription. Each one takes a different angle on the same core problem — which is exactly what makes this comparison worth doing.

01 Cross-platform

Buzz

macOS · Windows · Linux · Open source

A stripped-back front-end built on OpenAI's Whisper. Drag in a file, select a model, get text back. The go-to starting point for anyone who hasn't run a local transcription before.


02 Cross-platform

Subtitle Edit

Windows native · newer native macOS build · Linux via Mono · Open source

Technically a subtitle editor, but once you need to tighten up Whisper's punctuation, re-sync captions, or export to any of roughly two hundred subtitle formats, nothing else comes close. More demanding than Buzz, but it earns that weight quickly.


03 macOS only

Whisper Transcription

Mac App Store · Free tier · Paid model unlocks

The polished option. Sandboxed, signed, and tucked into the menu bar. For anyone who wants something that genuinely behaves like a Mac app rather than a Python utility wearing a GUI, this is it.


04 macOS only Free

Pyrenees

Apple Silicon · Free · MLX-powered

The upstart. Built on Apple's MLX framework, it's the fastest tool in the group on M-series chips, sometimes by a gap that's almost embarrassing.


05 macOS only

VoiceInk

Real-time dictation · Open source

A different category altogether: VoiceInk is a dictation tool, not a file transcriber. Hold a hotkey, speak, and the text drops wherever your cursor is. It's here because it covers a use case the other four ignore entirely.


Quick look

Comparing the five at a glance

A stripped-down table for narrowing the field. The detailed guides go much further into where each tool runs into trouble.

App	Platforms	Price	Best for	File mode	Live dictation	Subtitle export
Buzz	Mac · Win · Linux	Free (open source)	First-time users, straightforward batch jobs	Yes	Limited	SRT, VTT, TXT
Subtitle Edit	Win · Mac (native build) · Linux (Mono)	Free (open source)	Cleaning up transcripts & subtitle work	Yes	No	~200 formats
Whisper Transcription	macOS	Free tier · paid model unlocks	Mac users who want refinement over tinkering	Yes	Microphone capture	SRT, VTT, TXT, DOCX
Pyrenees	macOS (Apple Silicon)	Free	Speed, batch jobs on M-series Macs	Yes	No	SRT, VTT, TXT
VoiceInk	macOS	Free (open source)	Dictation into any application	Secondary feature	Primary feature	N/A

Finding your fit

Begin with the use case, not the app name

We get asked constantly which transcription app wins. The honest answer is that it depends on what you need. Here's a cleaner way to think through it.

Never done local transcription before…

Start with Buzz. It's the quickest way to find out whether local transcription meets your bar. You'll have your answer within ten minutes.

If captions are core to your work…

Subtitle Edit is the clear answer. The waveform editor and format library leave the rest of the pack behind. Whisper integration is a bonus on top of an already excellent tool.

If you're on macOS and value a native feel…

Whisper Transcription for a refined, curated experience where paying for better models makes sense; Pyrenees if you want the speed advantage and zero cost without the extras.

If talking to your computer is the goal…

VoiceInk. It's the only tool in the group built for the "I want to speak, have it show up in whatever I'm typing in" workflow. The other four simply aren't designed for that.

A word from us

We aren't journalists. We're a pair of people who got fed up reading "Best AI Transcription Tools 2025" listicles that all repeat the same five entries in the same format. Affiliate roundups have their place — they keep smaller sites running — but they tend to sand down the distinctions between tools. We wanted somewhere that does the opposite: one that actually explains why you'd reach for one app instead of another.

Explore the guides in whatever order makes sense. Most visitors read one or two — that's completely fine. More about this project here if you're curious who's behind it.


Buzz

Buzz is the kind of tool that handles a single problem and then leaves you alone. The problem — running OpenAI's Whisper model on a file that shouldn't leave your device — used to mean spinning up a virtual environment, running a handful of pip install commands, and a coin-flip chance of hitting a cryptic ffmpeg error. Buzz compresses all of that into a window with a "Transcribe" button. Drop in your audio, pick a model, watch progress tick along. Output lands in a folder. That's it.

The pitch sounds modest because the surface is deliberately minimal. What makes it interesting is everything it omits: no account registration, no cloud uploads, no push toward a premium credits plan. It's a thin shell around a powerful model. After extended time with it across two laptops and three operating systems, that restraint turns out to be the app's biggest asset.

What Buzz is, exactly

Buzz is an open-source desktop app built by Chidi Williams. The GitHub repository has been active since 2022 and keeps receiving updates. Internally, it packages several transcription engines, most notably the original OpenAI Whisper implementation in Python and the considerably faster whisper.cpp port in C++. You choose between them at the model selection screen, and the right choice depends on the hardware you're working with.

Users familiar with Audacity will recognise the spirit immediately: utilitarian, slightly behind the times in its widget styling, and very clearly built by people who care more about the output than the visuals. No inflated whitespace or feature dashboards. The main window is a job list — each row tells you whether a transcription is queued, running, or complete.

Note

Buzz is not OpenAI's official Whisper application. OpenAI has never shipped one. Buzz is a community-made front-end that loads OpenAI's open-source model on your local machine. Everything happens on your computer; nothing is transmitted to OpenAI or any other server.

The real experience of using it

The standout first impression is the time from "freshly downloaded" to "transcript in hand." On a 2021 M1 MacBook Pro, the whole setup took roughly three minutes — most of it spent on the initial model download, which weighs in around 1.5 GB for the medium size. On a five-year-old Windows machine with no discrete GPU, transcription itself took longer, but the setup process was identical.

The interface is not pretty — worth saying upfront. It's built with PyQt and carries all the visual charm that implies: functional in the way a folding knife is functional. You won't use it to demonstrate Mac aesthetics to anyone.

I tested it on a 47-minute interview recorded for an unrelated project. The tiny model finished in around 90 seconds and got the gist. The medium model took four minutes and caught most proper nouns. The large model ran for about fourteen minutes and produced output I'd actually show someone.
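For a rough sense of what those timings mean, they can be turned into a speed factor: audio length divided by processing time. The sketch below is a minimal illustration that uses only the approximate figures from this one test. The function name is ours, and the results say nothing about other recordings or machines.

```python
def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of processing time."""
    return audio_seconds / processing_seconds


AUDIO_SECONDS = 47 * 60  # the 47-minute interview

# Approximate processing times reported above, in seconds.
timings = {"tiny": 90, "medium": 4 * 60, "large": 14 * 60}

factors = {
    name: round(speed_factor(AUDIO_SECONDS, seconds), 2)
    for name, seconds in timings.items()
}

for name, factor in factors.items():
    print(f"{name}: roughly {factor}x faster than real time")
```

Anything above 1x means the job finishes in less time than the recording lasts, which is the practical threshold for leaving a batch running unattended.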

The highest compliment I can offer Buzz is that it disappears. You start a job, you walk away, the file is there when you come back.

What's happening inside

Buzz lets you choose between several model sizes (tiny, base, small, medium, large) and several backends. The most important choice is between the original OpenAI Whisper Python implementation, the whisper.cpp backend, and Hugging Face's transformers-based implementation. There's also support for using OpenAI's hosted Whisper API if you'd prefer to send the file to OpenAI in exchange for faster results — but that defeats the privacy advantage, and almost no one I know who installs Buzz uses that mode.

Two practical observations from real-world use:

On Apple Silicon, the whisper.cpp backend with Core ML acceleration is the fastest by a wide margin. You'll want to enable that.
On any machine with a recent NVIDIA GPU and CUDA installed, the original PyTorch backend will tap the GPU and become noticeably faster — large-model transcriptions that took 25 minutes on CPU finished in under four minutes. Install CUDA drivers first if you're on Windows with an NVIDIA card.
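Those two observations boil down to a small decision rule. The sketch below writes it out in Python purely as an illustration: the function and its return strings are our own descriptive labels, not Buzz's internal setting names or any part of its code.

```python
def suggest_backend(os_name: str, apple_silicon: bool = False,
                    nvidia_cuda: bool = False) -> str:
    """Suggest a backend following the two observations above.

    The returned strings are descriptive labels (an assumption for this
    sketch), not values read from or written to Buzz's preferences.
    """
    if os_name == "macos" and apple_silicon:
        return "whisper.cpp with Core ML"
    if nvidia_cuda:
        return "OpenAI Whisper (PyTorch, CUDA)"
    # No accelerator we know how to use: keep whatever the app ships with.
    return "app default"


print(suggest_backend("macos", apple_silicon=True))
print(suggest_backend("windows", nvidia_cuda=True))
print(suggest_backend("linux"))
```

The fallback matters as much as the two fast paths: on hardware without Core ML or CUDA, switching backends by hand is unlikely to buy much.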

Buzz also supports a "Live Recording" mode where it'll transcribe directly from your microphone as you speak. I've used this feature exactly twice, and both times I came away thinking that this is not what Buzz is for. The latency is wrong for it — you'll get text in chunks of several seconds — and it doesn't integrate with other apps. If you want dictation that drops text where your cursor is, look at VoiceInk instead. If you want live captions for a video call, look elsewhere entirely. Buzz is a file-based tool with a microphone option grafted on, and you can feel the seam.

Tip

If you've already tried Buzz and the transcripts come back with weird timing or punctuation issues, don't wrestle with the app — export to .srt or .vtt and clean up in Subtitle Edit. It's faster than fighting Buzz's text editor.

Pros and cons

What works

Truly cross-platform — consistent behaviour on Mac, Windows, and Linux
Completely free — MIT licensed, no account needed, no usage cap
Multiple backend options, including fast whisper.cpp
Local processing by default — your audio never touches a server
Covers the formats you actually need: SRT, VTT, and plain text
Queue up a batch and leave it running unattended

Where it falls short

The interface looks old and will put off anyone expecting a polished native app
The live recording feature is awkward and introduces noticeable lag
No waveform view or tools for adjusting word-level timing
Translation is available but restricted to English output (Whisper's own constraint)
First-time model downloads can drag on slow or patchy connections
Occasional crashes on very long files — anything over three hours is a risk

Step-by-step: from download to transcript

This is the section most write-ups skip, so here it is. The complete flow, from "I haven't installed anything" to "I have a clean SRT", without skipping the parts that actually trip people up.

Grab Buzz from the official GitHub releases page. Pick the build for your operating system. On macOS, that's a .dmg; on Windows it's an .exe installer; on Linux you've got AppImage and Snap options.
Install it like any other app. On Mac, drag to Applications. On first launch, macOS may complain about an unidentified developer; control-click and choose "Open" once and the warning goes away.
Open Preferences and set your preferred backend. On Apple Silicon, "Whisper.cpp" with Core ML support is the right answer. On Windows with an NVIDIA GPU, the OpenAI Whisper backend will use CUDA. Otherwise, leave it on the default.
Drop your audio or video file into the main window. Buzz accepts MP3, WAV, M4A, FLAC, MP4, MOV — basically anything ffmpeg can read.
Select a model size. Start with base if you're not sure. Move up to medium for cleaner output. The large model is the most accurate but slow and memory-hungry.
Choose your output formats. For interviews, TXT plus SRT is the right combination. The first is for reading; the second is for any future cleanup work in a subtitle editor.
Press Transcribe and go do something else. Seriously. Make a cup of tea. The progress bar updates honestly. When it finishes, the output files appear next to the source.
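If you've never opened an SRT file, it helps to see how little is in one: a counter, a timing line, and the text, separated by blank lines. The sketch below parses that structure with the Python standard library. The sample dialogue is made up, the helper names are ours, and it assumes a well-formed file rather than handling every edge case a real subtitle editor would.

```python
import re
from typing import List, Tuple

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text: str) -> List[Tuple[float, float, str]]:
    """Parse SRT text into (start_seconds, end_seconds, cue_text) tuples."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed or empty blocks
        start, end = lines[1].split(" --> ")
        cues.append((to_seconds(start), to_seconds(end), "\n".join(lines[2:])))
    return cues


# Made-up sample, only to show the file layout.
SAMPLE = """1
00:00:00,000 --> 00:00:02,500
Thanks for joining me today.

2
00:00:02,500 --> 00:00:05,000
Happy to be here.
"""

cues = parse_srt(SAMPLE)
for start, end, text in cues:
    print(f"{start:7.2f} -> {end:7.2f}  {text}")
```

Keeping the SRT alongside the TXT means the timing survives, so any later cleanup can be done against the audio instead of by guesswork.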

Example

Test recording: a 47-minute interview, recorded into the iPhone Voice Memos app, exported as .m4a.
Result with the large model on M1 MacBook Pro: finished in 14 minutes 22 seconds. The transcript needed roughly 5 minutes of cleanup, mostly proper nouns the model didn't know, plus the usual punctuation around hesitations.

How Buzz stacks up against the others

Staying within this site's shortlist:

Versus Subtitle Edit: Subtitle Edit can run Whisper too, but it's a much bigger tool — full waveform editor, an enormous range of subtitle formats. If captioning or translation is your work, Subtitle Edit is probably already your main app and Buzz adds nothing new. If you just need a transcript, Buzz has a shorter learning curve.

Versus Whisper Transcription (Mac): Whisper Transcription is more refined, visually cleaner, and slots into macOS more naturally. It's also Mac-only and locks some features behind payment. Buzz is less attractive but costs nothing on any platform.

Versus Pyrenees: Pyrenees is faster on Apple Silicon — definitively — but exclusively on Apple Silicon. If you're running an M-series Mac and working with shorter files, Pyrenees wins on speed. Buzz holds the advantage for cross-platform use and its range of backend options.

Versus VoiceInk: Different tool for a different job. VoiceInk is for live dictation (talking into apps as you'd talk into iOS dictation). Buzz is for files. They don't really compete.

Who Buzz is right for

Author's take

If you've never run a local transcription before — start with Buzz, even if you end up switching later. It's the path of least resistance for checking whether this approach is actually right for you.

If you already know you need subtitle editing, dictation, or peak speed on Apple Silicon, you can skip Buzz and go straight to the more specialised option.

FAQ

Does Buzz cost anything? Any catches?

Buzz is fully free under the MIT license. No registration, no time-limited trial, no paid tier. The only scenario where money enters the picture is if you opt to route transcription through OpenAI's hosted API — but the default local mode costs nothing beyond the power your machine draws.

Does Buzz upload my audio?

Not by default. The local backends — whisper.cpp and the open-source Whisper — process everything on your device. The only mode that sends data anywhere is the explicit "OpenAI API" option, and that requires you to supply your own API key.

Which languages does Buzz handle?

Any language Whisper supports, which spans roughly 99 languages with varying reliability. English, the main European languages, and Mandarin deliver the best results. Minority languages can be noticeably less consistent.

Is there a built-in editor for fixing the transcript?

You can make minor text corrections inside Buzz, but it isn't a dedicated editor. For anything more involved — re-timing, punctuation fixes, splitting cues — take the SRT file into Subtitle Edit or a similar tool.

Can I use Buzz without an internet connection?

Yes, once you've downloaded the model you want. Internet is only needed on first load of each model size. After that, everything runs fully offline.

Why does the large model take so long on my computer?

The large Whisper model is roughly 3 GB and needs a GPU or Apple Silicon's Neural Engine to run at a reasonable speed. On an older CPU-only machine, expect a very long wait — the medium model is usually the smarter tradeoff in that situation.


Reviewed by the team. Buzz was installed on three separate machines — an M1 MacBook Pro, a Windows 11 ThinkPad with no discrete GPU, and an Ubuntu 22.04 desktop with an NVIDIA card — and the same batch of test files was run on each. No contact was made with the Buzz developers. We have no affiliation with the project.

Two workflows, two kinds of users

There are essentially two workflows the app supports, and people who try Subtitle Edit generally fall into one of two camps based on what they came for.

Workflow A: you have an audio or video file and you want clean, correctly-timed captions. You import the media, you tell Subtitle Edit to run Whisper on it, you wait. You get a populated cue list. Then you spend twenty or thirty minutes cleaning it up: fixing punctuation, merging short cues, splitting long ones, retiming the parts where the model got confused. The output goes out as .srt or whatever else you need. This is the workflow professional captioners use.

Workflow B: you have a transcript already (from Buzz, from Whisper Transcription, from Otter, doesn't matter) and you want to fix it. You open the existing file, you bring in the audio so the waveform syncs with the cues, and you fix the obvious mistakes by listening and clicking. This is what I personally use it for, and I think it's the underrated use case. Even if your primary transcriber is something else entirely, Subtitle Edit makes a phenomenal "second tool".

A small confession

These days I run most things through Buzz first, then open the resulting .srt in Subtitle Edit to clean up. It isn't the workflow the developer envisioned, but it's quicker than trying to do everything in one place, and Subtitle Edit's keyboard-based editing is honestly better than anything else I've come across.

What using it day-to-day feels like

The interface is dense. There's no softer way to put it. Every pixel earns its place. On first launch you see a toolbar with maybe a hundred icons, a waveform panel occupying the bottom third of the screen, and a cue list between them. It's a lot.

Spend an afternoon with it and the density starts feeling like a feature. The reason every command has a keyboard shortcut is that captioners work at pace — they need to jump from a timestamp correction to a spell-check to an export without lifting their hands from the keyboard.

Whisper integration sits under Video → Audio to text (Whisper). From there you choose the engine: Subtitle Edit supports the original Python Whisper, whisper.cpp, Const-me's GPU implementation, Purfview's Faster-Whisper, and a couple of others depending on which version you have installed. Each engine has its own strengths. On a Windows laptop without a GPU, Purfview's implementation gave me the best balance of speed and accuracy. On a machine with an NVIDIA card, Const-me's GPU build was faster than anything else by a wide margin.

The format support is genuinely extraordinary

This is the part nobody talks about, and the thing that makes Subtitle Edit irreplaceable in some workflows. The app reads and writes well over two hundred subtitle formats. If you've ever stared at a file with a strange extension and wondered how to convert it to .srt without losing timing or styling, Subtitle Edit is almost certainly the answer.

A partial list of what it handles:

The everyday ones: SRT, VTT (WebVTT), ASS, SSA, TXT
Broadcast formats: EBU STL, Cavena 890, EBU-TT, EBU-TT-D, IMSC 1.1, TTML, DFXP, SCC, SMPTE-TT
DVD/Blu-ray: VobSub (.sub/.idx), SUP (BD), and OCR support for both
Streaming/professional: Netflix's IMSC variant, Apple iTT, MicroDVD, MPL2
Weird and old: JSON Web, FAB, Sonic Scenarist, ATS, plus a long tail of pretty obscure formats

If your transcription job ends with "and then we hand it to a broadcaster," Subtitle Edit may be the only free tool capable of producing a file that will actually pass.
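To make "converting between formats" concrete, here is the simplest case: SRT to WebVTT. The two differ mainly in a required WEBVTT header line and a period instead of a comma before the milliseconds. This is a minimal sketch with our own function name; it ignores styling, positioning, and everything else that makes the broadcast formats above genuinely hard, and it is not how Subtitle Edit implements conversion.

```python
import re

# Matches an SRT timing line, e.g. 00:00:01,000 --> 00:00:02,500
TIMING_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})$"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert plain SRT text to WebVTT (header and timing lines only)."""
    out = ["WEBVTT", ""]  # WebVTT files start with this header, then a blank line
    for line in srt_text.strip().splitlines():
        # Only timing lines change: the comma before milliseconds becomes a period.
        out.append(TIMING_LINE.sub(r"\1.\2 --> \3.\4", line.strip()))
    return "\n".join(out) + "\n"


srt = "1\n00:00:01,000 --> 00:00:02,500\nHello there.\n"
vtt = srt_to_vtt(srt)
print(vtt)
```

A dozen lines cover this pair; the formats a broadcaster asks for are where a dedicated tool earns its place.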

Tip for the OCR feature

If you have a video file whose subtitles are stored as images (DVD/Blu-ray rips, some MKV files), Subtitle Edit's built-in OCR can extract them as editable text. Set the language correctly and clean up the output in the cue list. Faster than retyping.

Where it stumbles

Honest section. Subtitle Edit is not for everyone, and there are genuine friction points beyond the dense interface.

The platform story is uneven. The Windows version is a proper, signed, native application that has had two decades of polish. The Mac version is a newer port that runs natively on both Intel and Apple Silicon, but feels less mature — keyboard shortcuts that work flawlessly on Windows occasionally do nothing on Mac, certain dialogs appear off-screen, and waveform extraction sometimes fails on file types that work fine on the Windows build. On Linux, you're typically running it through Mono, which works but has its own assortment of papercuts. If you're not on Windows, expect rougher edges.

It's not a transcription-first app. If your goal is to get a clean .txt transcript and you don't care about timing, you'll find yourself fighting a UI that wants you to care about timing. You can absolutely use it for plain transcripts — just export to TXT after the cues are populated — but you'll spend a lot of attention on widgets you didn't need.

The translation features are uneven. There are translation integrations (Google, DeepL, libretranslate, ChatGPT API, others), but the quality varies and the UX of running them feels grafted on. For pure translation work, you're better off elsewhere.

The learning curve is real. Out of every tool we cover on this site, Subtitle Edit has the steepest first-week curve. Plan for it.

Pros & cons, condensed

Strengths

Best-in-class subtitle editing — keyboard-driven, quick, densely featured
Waveform editor that outperforms most paid alternatives
Multiple Whisper backends, including GPU-accelerated options
Format support that no other free tool even approaches
OCR for image-based subtitles included at no charge
Active development and an unusually responsive maintainer
Quietly reliable for over two decades

Weak spots

Interface looks and feels like Windows-native circa 2010
Mac and Linux builds trail behind the Windows version
Steep learning curve in the first few sessions
Wrong tool if you only need a plain text transcript
Whisper setup involves picking the right engine — not a one-click affair
Translation quality is inconsistent and feels bolted on

How to use it: three short walkthroughs

Because Subtitle Edit covers more than one workflow, a single step-by-step guide doesn't quite fit. Here are three focused ones instead.

1. From audio file to clean SRT (full Whisper workflow)

Open Subtitle Edit and load your video or audio. Video → Open video file. The waveform should appear at the bottom; if it doesn't, the file extension may not be supported and you'll need to convert.
Open the Whisper dialog: Video → Audio to text (Whisper). Pick the engine and model size. Faster-Whisper with the medium model is a good default on most Windows machines.
Let it run. Progress shows in a small log window. A 30-minute file on a midrange Windows laptop typically completes in 8–15 minutes.
Review the cue list. Each cue is a row. The waveform shows where it sits in the audio. Use the keyboard arrows to step through cues; spacebar plays the current one.
Fix obvious problems. Common ones: cues that are too long (use Tools → Split lines longer than…), overlapping cues, wrong proper nouns. Most of this can be handled with built-in batch tools.
Export. File → Save as → SRT for general use, or whichever format your downstream tool needs. Subtitle Edit can also do batch convert in case you have many files.
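The cleanup in the middle of that list is mostly mechanical, which is why batch tools handle it well. As an illustration of the idea, the sketch below merges very short cues into the cue before them when the two nearly touch. The thresholds, names, and sample cues are our own assumptions for the example; this is not Subtitle Edit's algorithm.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def merge_short_cues(cues: List[Cue], min_duration: float = 1.0,
                     max_gap: float = 0.3) -> List[Cue]:
    """Fold each too-short cue into the previous one when they nearly touch.

    min_duration and max_gap are illustrative defaults, not recommendations.
    """
    merged: List[Cue] = []
    for start, end, text in cues:
        if merged:
            prev_start, prev_end, prev_text = merged[-1]
            is_short = (end - start) < min_duration
            is_close = (start - prev_end) <= max_gap
            if is_short and is_close:
                # Extend the previous cue to cover this one and join the text.
                merged[-1] = (prev_start, end, f"{prev_text} {text}")
                continue
        merged.append((start, end, text))
    return merged


# Made-up cues: the middle one is a half-second fragment.
cues = [
    (0.0, 2.5, "So the first thing we did"),
    (2.6, 3.0, "was,"),
    (5.0, 8.0, "move the whole archive offline."),
]
result = merge_short_cues(cues)
for cue in result:
    print(cue)
```

The gap check is the part worth keeping in any variant: a short cue after a long silence is usually a real utterance, not a fragment.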

2. Cleaning up an SRT another tool already produced

Open the existing SRT in Subtitle Edit (File → Open).
Bring in the original audio via Video → Open video file. The waveform now syncs with the existing cues.
Run "Fix common errors" from the Tools menu. This catches things like missing spaces after periods, capitalization issues, double spaces, and Whisper's habit of starting cues with hesitation markers.
Walk through the cues that look suspicious — usually the very long ones and the very short ones — and fix them by listening to the audio.
Save. Done.
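For a sense of what an automated pass of this kind does, the sketch below applies four of the fixes mentioned above to a single cue using regular expressions. It is our own approximation with a deliberately short hesitation list, not Subtitle Edit's rules, and it will miss cases the real tool handles.

```python
import re

# A leading hesitation marker, optionally followed by a comma or period.
HESITATION = re.compile(r"^(?:um+|uh+|er+|hmm+)[,.]?\s+", re.IGNORECASE)


def tidy_cue_text(text: str) -> str:
    """Apply a few common-error fixes to one cue's text."""
    text = HESITATION.sub("", text.strip())    # drop a leading "um," or "uh"
    text = re.sub(r" {2,}", " ", text)         # collapse runs of spaces
    # Add a missing space after a period, but only before a capital letter,
    # so names like whisper.cpp are left alone.
    text = re.sub(r"\.(?=[A-Z])", ". ", text)
    if text:
        text = text[0].upper() + text[1:]      # capitalise the first letter
    return text


before = "um, so we started  late.Then it rained."
after = tidy_cue_text(before)
print(after)
```

Rules like these are safe to run in bulk precisely because they are conservative; anything involving meaning, such as a misheard name, still needs the listen-and-fix pass in the next step.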

3. Extracting subtitles from a Blu-ray rip

File → Import → VobSub or Blu-ray sup, depending on what you have.
Choose the OCR engine. The built-in nOCR is decent for European languages; Tesseract is better for some scripts.
Run OCR. Review the results and correct any glyphs the engine misread.
Save out as SRT for use anywhere else.

Subtitle Edit vs. the others

Measured against the rest of the shortlist:

Compared to Buzz, Subtitle Edit is the heavy tool. Buzz is for "I have a recording, I want a transcript". Subtitle Edit is for "I have a recording, I want broadcast-ready captions, and I'm willing to spend an afternoon getting them right." Both are free; they're answers to different questions.

Compared to Whisper Transcription, Subtitle Edit is dramatically uglier and dramatically more capable. Whisper Transcription will get you a clean transcript faster on a Mac. Subtitle Edit will let you actually shape it.

Compared to Pyrenees, the comparison doesn't really hold — Pyrenees is a transcription engine optimized for speed, Subtitle Edit is an editing environment. They could even live alongside each other: Pyrenees produces, Subtitle Edit edits.

Compared to VoiceInk, they share no overlap at all. Different jobs.

Who it's for

Author's take

Subtitle Edit is the answer once you've moved past the "can I get Whisper running at all?" stage and you're now asking "how do I make this output actually usable?" Most people will install Buzz first and discover Subtitle Edit a few months later — and that order is probably right. For translators, captioners, and anyone whose work involves the phrase "broadcast-safe," it's the most important free tool you can have.

FAQ

Is Subtitle Edit free for commercial use?

Yes. It's released under the GNU General Public License v3 and can be used commercially without restriction. The one nuance: if you bundle and redistribute Subtitle Edit, you have to comply with the GPL. Just using it on commercial work is unrestricted.

Does it run on Mac and Linux properly?

It runs, with caveats. There's now a native macOS build supporting both Intel and Apple Silicon, and it's improving steadily — but the Windows version still has the most polish. On Linux you'll typically run it through Mono. If you need Subtitle Edit's full capability, plan on using Windows or a Windows VM.

Which Whisper engine should I pick from inside the app?

For most Windows users without a GPU: Purfview's Faster-Whisper build is the most reliable balance of speed and accuracy. With an NVIDIA GPU: Const-me's GPU implementation tends to be the fastest. On macOS: whisper.cpp through the bundled integration. The differences are smaller than the choice of model size, so don't agonize.

Can it do real-time transcription?

No. Subtitle Edit is strictly file-based. For live transcription or dictation, look elsewhere.

Does it handle speaker diarization?

Not natively in any clean, automatic way. Whisper itself doesn't reliably perform diarization, and Subtitle Edit doesn't add a separate diarization step. If you need speaker labels, you'll do that work manually in the cue list, or run the audio through a separate diarization tool first.

Is the OCR feature any good?

Surprisingly good, for European languages and cleaner DVD subtitles in particular. For Blu-ray SUPs, accuracy tends to land at 90% or higher before corrections. For non-Latin scripts, results vary — Tesseract handles the heavy lifting, and you'll need the correct language pack installed.


Tested over a couple of months on a Windows 11 laptop (Intel, no GPU), an M2 MacBook Air, and a Linux desktop running it through Mono. Most testing happened on the Windows build because it's the most complete; the Mac caveats reflect direct experience with the native build.

Setting expectations

This is one of the rare apps in this space where someone clearly designed it rather than simply shipped it. You can tell immediately. The icon doesn't look like a Python logo with a microphone grafted on. The window has the right corner radius. The settings panel uses the macOS sheet style that actually lets you find what you need. When you import a file, the app shows you metadata — sample rate, channels, duration — that most transcription tools simply ignore.

None of this changes the underlying transcription quality. Whisper is Whisper, regardless of which app calls it. So the question Whisper Transcription has to answer is: given that the model is the same, what does this app give me that the free options don't?

The honest answer, after two weeks of regular use: a collection of small things, none individually decisive, that together add up to "this is the app I'd hand to someone who doesn't want to think about it."

What it actually does

The core flow matches every other tool in this category. Drop in a file, select a model, press a button, receive text. Where Whisper Transcription sets itself apart is in the details.

The transcript view is interactive. Click a sentence, the audio jumps to that timestamp. Edit the sentence in place. Highlight a span and you get inline tools to merge cues, split them, change capitalization, mark a speaker. It's not Subtitle Edit's level of cue-editing power, but for working with prose-style transcripts, it's genuinely faster than re-opening your output in another app.

It can capture system audio, not just microphone. A small but uncommon feature. If you want to transcribe a YouTube video, a podcast you're listening to, or a Zoom call (with appropriate permissions), Whisper Transcription can pipe the system's audio output directly in. Most of the free alternatives only see the microphone.

Export is well thought through. SRT, VTT, plain text, and DOCX are all one click away. The DOCX export in particular is more polished than what you'll get from running Whisper through a script — it preserves paragraph breaks at sensible points, includes timestamps as headers if you want them, and doesn't dump everything into a single block of unreadable prose.
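We don't know how Whisper Transcription decides where a paragraph ends, but one plausible approach is to start a new paragraph whenever the speaker pauses for long enough. The sketch below shows that heuristic on timed cues. The threshold, function name, and sample cues are assumptions for illustration only, not the app's actual logic.

```python
from typing import List, Optional, Tuple

Cue = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def group_into_paragraphs(cues: List[Cue],
                          pause_threshold: float = 1.5) -> List[str]:
    """Start a new paragraph whenever the silence between cues is long."""
    paragraphs: List[List[str]] = []
    previous_end: Optional[float] = None
    for start, end, text in cues:
        if previous_end is None or start - previous_end > pause_threshold:
            paragraphs.append([])  # long pause (or first cue): new paragraph
        paragraphs[-1].append(text)
        previous_end = end
    return [" ".join(parts) for parts in paragraphs]


# Made-up cues with a three-second pause before the last one.
cues = [
    (0.0, 2.0, "Welcome back to the show."),
    (2.2, 4.0, "Today we're talking about archives."),
    (7.0, 9.0, "First, some background."),
]
paragraphs = group_into_paragraphs(cues)
for paragraph in paragraphs:
    print(paragraph)
    print()
```

This is also why a script that discards timestamps tends to produce one unreadable block: without the pauses there is nothing to break on.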

There's a menu-bar mode. If you click the menu bar icon, a small palette appears that lets you start a recording, drop in a file, or pull up your recent transcripts without opening the main app. It's the kind of detail a tinkerer never builds and a designer always insists on.

A small example

I recorded a 12-minute podcast intro the same day a new model unlock went live. Imported the M4A. Transcription took 2 minutes 40 seconds with the medium model on an M2 MacBook Air. The interactive transcript caught two proper nouns I'd mispronounced, and clicking each one to hear the audio play back was — and I mean this — genuinely satisfying. No find function, no waveform scrubbing.

The pricing question

This is where we have to discuss money, because it's the main thing separating Whisper Transcription from the free alternatives.

The app is a free download from the Mac App Store. The free tier includes the smaller Whisper models — typically tiny and base — which are adequate for casual notes but noticeably weaker than what you'd want for professional work. Unlocking the larger models (medium, large, and various distilled variants) requires a one-time in-app purchase. Since pricing shifts over time and varies by region, check the App Store listing rather than relying on a figure from this review.

Worth noting: the pricing model is a one-time unlock, not a subscription. Pay once and the larger models are yours. No monthly fee, no per-minute charge, no credits. That alone makes it cheaper than most cloud-based transcription services if you transcribe more than a few hours per month.

My honest take on whether it's worth paying for

Free Whisper exists. You can run it through Buzz or Pyrenees and get the same model output for nothing. So the question isn't "should I pay for transcription?" but "should I pay an indie Mac developer for a polished front-end?" If you transcribe regularly and value your time, yes. If you transcribe rarely or genuinely enjoy command-line flags, no. Both answers are reasonable.

Where the polish ends

I want to be direct about the limitations here, because every "the polished one" review I've ever read tends to gloss over them.

Mac only. Obvious but worth saying. If you ever switch to Windows or Linux, your purchase doesn't follow you and your workflow doesn't follow you.

Less flexible than open-source alternatives. The app picks reasonable defaults and hides most of the tuning knobs. If you want to set custom Whisper parameters, run a fine-tuned model, or experiment with non-standard backends, you'll outgrow Whisper Transcription quickly. Buzz lets you switch backends; this doesn't.

Speed is good but not the best. On Apple Silicon, Pyrenees is faster — sometimes substantially faster — for the same model size. Whisper Transcription uses solid acceleration but isn't the speed champion of the field.

No deep subtitle editing. The interactive editor is a pleasure for prose, but it's not pretending to be Subtitle Edit. If your job involves cue-by-cue caption work, you'll still be exporting to .srt and finishing the job elsewhere.

App Store review constraints. Because it's distributed through the App Store, it lives inside Apple's sandbox rules. That has security upsides (the app can't quietly access files you didn't grant it access to) but also brings the occasional UX papercut: for instance, you'll be re-asked for microphone permission after some macOS updates.

Pros and cons

What you get

Genuinely Mac-native interface — feels like a 2026 app, not a 2014 utility
Interactive transcript editor with click-to-play timestamps
Clean DOCX, SRT, VTT, TXT export
Menu-bar quick access for on-the-fly recordings
System audio capture, not just microphone input
One-time purchase, no ongoing subscription
App Store distribution: signed, sandboxed, straightforward to install
Active development from an established indie developer

What you don't

Mac only; no path for Windows or Linux users
Larger Whisper models sit behind a paywall
Slower than Pyrenees on identical hardware
Limited backend customization compared to Buzz
Not a serious subtitle editor
Sandbox occasionally requires re-granting permissions after macOS updates

How to actually use it

The workflow is shorter than for most tools we've reviewed. Here's the condensed version.

Install from the Mac App Store. Search "Whisper Transcription" and install. No external installer, no permissions juggling.
Open it and let it download the default model. The free models are small enough that this is fast.
Drop in a file or click the record button. Audio and video files work; the app strips audio automatically.
Pick the model and language. If you've unlocked the larger models, medium is a sweet spot for most use cases. Language can be left on auto-detect.
Start the transcription. Watch the progress bar — or, more usefully, switch to another app and ignore it until it's done.
Edit the transcript inline. Click any sentence to play it back. Fix mistakes. Tag speakers.
Export. File → Export, pick the format. Done.

Tip

If you're going to do any serious cue editing, export to SRT and open it in Subtitle Edit. Whisper Transcription's editor is great for prose; it's not designed for the cue-by-cue work captioners do.
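
If you've never looked inside an SRT file, it helps to know how little is in there before handing one to Subtitle Edit. Here is a rough sketch of the format, assuming you have a list of (start, end, text) segments in seconds; the function names are ours and are not part of any of the apps reviewed here.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: a running number, a time range, the text, then a blank line.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "Welcome to the show."), (2.5, 4.0, "Let's get started.")]))
```

Each cue is just a number, a time range, and a line of text, which is why a dedicated editor is the right place to re-time or split cues one by one.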

Compared to the others

Quick reference points across the rest of the shortlist:

Versus Buzz: Buzz is free everywhere; Whisper Transcription is a paid Mac app. If you're disciplined enough to set up Buzz and don't mind its plain UI, you get the same transcription quality without spending anything. If you want it to feel like a Mac app and you transcribe regularly enough that the time savings matter, the purchase pays for itself.

Versus Pyrenees: Pyrenees is faster and free, but barer-bones. No interactive editor, no DOCX export, no system audio capture. If raw speed and zero cost are your priorities, Pyrenees. If polish is your priority, this.

Versus Subtitle Edit: Different category. Whisper Transcription is for getting transcripts; Subtitle Edit is for grooming captions. If you do both, you'll likely use both.

Versus VoiceInk: Different again. VoiceInk is for live dictation into other apps. Whisper Transcription is for files (with optional recording). They cover different problems.

FAQ

Is the free tier sufficient on its own?

For casual use — voice memos, meeting notes, short interviews you'll edit anyway — yes. The smaller models are more capable than you'd expect. For longer, professional work, the medium and large models are noticeably better, and the gap matters most when audio quality is uneven.

How does it compare to OpenAI's hosted Whisper API?

The hosted API is faster and defaults to the large model, but every minute transcribed is a minute of audio sent to OpenAI's servers at a per-minute charge. Whisper Transcription does everything on your Mac, charges nothing per minute, and keeps your audio local. For privacy-sensitive work, the answer is clear. For a one-off batch of public-domain audio, where privacy isn't a concern and you won't use the unlock again, the hosted API might work out cheaper.

Does the in-app purchase transfer to a new Mac?

Yes. App Store purchases are tied to your Apple ID. Buy a new Mac, sign in with the same account, and your unlock carries over. Family Sharing configurations may extend access to family members as well.

Is the audio uploaded anywhere?

No. The model runs on-device. The app needs internet only for the initial model download and App Store updates. If you've already downloaded the models, you can transcribe entirely offline.

What's the longest file it can handle?

In our testing, files of two to three hours worked without issue on M-series Macs with the medium model. Beyond that, you may occasionally hit memory warnings. Splitting very long recordings into segments is good practice regardless of which app you use.
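
One way to do that splitting is with ffmpeg's segment muxer before you import anything. This is a sketch that assumes ffmpeg is installed and on your PATH; the helper name and the `_part` naming pattern are our own choices. It only builds the command, so you can inspect it before running it.

```python
from pathlib import Path


def split_command(source: str, chunk_minutes: int = 30) -> list[str]:
    """Build an ffmpeg command that cuts a recording into fixed-length parts."""
    src = Path(source)
    # Output files are numbered: interview_part000.m4a, interview_part001.m4a, ...
    pattern = str(src.with_name(f"{src.stem}_part%03d{src.suffix}"))
    return [
        "ffmpeg", "-i", str(src),
        "-f", "segment",                           # use the segment muxer
        "-segment_time", str(chunk_minutes * 60),  # target part length in seconds
        "-c", "copy",                              # copy the audio without re-encoding
        pattern,
    ]


cmd = split_command("interview.m4a", chunk_minutes=30)
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Because the audio is copied rather than re-encoded, the cuts may land slightly off the requested time. Splitting during a pause matters more than hitting an exact minute mark anyway.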

Does it support speaker labels?

The interactive editor lets you assign speaker labels to text spans manually, which works well for short interviews. There's no automatic diarization — if that's essential, you'll need a separate tool for it.

Can I run a custom or fine-tuned Whisper model?

Not directly. Whisper Transcription works with the official Whisper model family and certain distilled variants. If you need a custom or domain-adapted model, a more flexible tool like Buzz or a command-line setup is the right path.

Tested over a couple of weeks on an M2 MacBook Air with the paid unlock. The free tier was tested on a separate machine without the unlock to confirm the experience for non-paying users. We have no relationship with the developer.

What it is, in brief

Pyrenees is a free macOS transcription app for Apple Silicon Macs. It's built around MLX, Apple's open-source machine-learning framework released in late 2023. MLX is designed specifically for the unified-memory architecture of M-series chips: it runs models on the CPU and GPU out of a single shared memory pool, without copying tensors back and forth between separate VRAM and system RAM the way frameworks designed for NVIDIA cards have to. For models like Whisper, that translates into noticeably faster inference than running the same model through plain PyTorch or even whisper.cpp's Core ML path.

The app is compact, unobtrusive, and does essentially nothing beyond transcription. Import a file, select a model, receive a transcript. What earns it a dedicated guide is how it handles that one function on Apple Silicon hardware.

A note on names

If you can't find Pyrenees in the App Store, that's because it isn't there — it's distributed directly as a notarized .dmg. You're meant to download it, accept the security prompt once, and run it. This is normal for indie Mac apps; it's not a sign that anything's wrong.

So how fast is it, actually?

The usual disclaimers apply. Speed comparisons between transcription apps vary heavily by hardware, and precise figures will be stale by the time you read this. With that caveat on the table, here's what we found.

On every Apple Silicon machine tested — an M1 MacBook Air, an M2 MacBook Air, and a Mac Studio with M2 Max — Pyrenees completed the same jobs faster than any other tool in the comparison. The margin was not small.

The qualitative shift is more telling than raw numbers. Where the same hardware running Buzz felt like "kick it off and go make a drink," Pyrenees on the same machine feels like "start it and pause a second."

Pyrenees isn't faster the way a hardware upgrade feels faster. It's faster the way switching from email attachments to AirDrop feels faster — you've crossed into a different category, not just nudged the dial.

What actually sets it apart from the others

Most Mac transcription apps fall into one of two technical camps. They either ship the original PyTorch Whisper implementation with whatever GPU acceleration they can scrape together, or they bundle whisper.cpp, the C++ port that Georgi Gerganov maintains. Both are perfectly good options.

Pyrenees sits in a third camp. It uses MLX-converted Whisper weights and runs them through the MLX runtime. Because MLX is built specifically for Apple Silicon's unified memory architecture, it can keep the model and audio in the same memory pool the GPU and CPU share — which is why the speed gap is so pronounced.

The practical consequences:

Lower memory pressure. The large model on a 16 GB Mac is genuinely usable in Pyrenees. Other apps running the same model often push the system into heavy swap and grind to a halt on longer files.
The GPU gets real use. Many "Apple Silicon–optimised" apps gesture toward hardware acceleration without truly engaging it. MLX runs the heavy parts of the Whisper pipeline on the GPU through Metal; it does not target the Neural Engine, and the speed comes from the GPU and shared memory instead.
Quantised models are fast to load. Pyrenees includes several quantised variants — 4-bit, 8-bit — that deliver accuracy close to full-precision models while consuming a fraction of the resources.

Tip

If you're on a Mac with 8 GB of unified memory, start with the 4-bit medium model. The quality sits closer to the full medium than you'd expect from quantization, and the speed is excellent.
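
To get a feel for why quantised variants matter on low-memory Macs, here is a back-of-the-envelope calculation of weight sizes. It uses the approximate parameter counts published for the Whisper model family and counts weights only; real memory use is higher because of activations, audio buffers, and quantisation metadata, so treat these as lower bounds, not as measurements from Pyrenees.

```python
# Approximate parameter counts for the Whisper family, in millions.
PARAMS_MILLIONS = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1550}


def weight_size_gb(model: str, bits_per_weight: int) -> float:
    """Size of the raw weights in GB: parameters x bits per weight / 8 bits per byte."""
    params = PARAMS_MILLIONS[model] * 1_000_000
    return params * bits_per_weight / 8 / 1e9


for model in ("medium", "large"):
    for bits in (16, 8, 4):
        print(f"{model:>6} at {bits:>2}-bit: about {weight_size_gb(model, bits):.2f} GB of weights")
```

Going from 16-bit to 4-bit cuts the weights to a quarter of their size, which is the difference between a model that crowds an 8 GB machine and one that leaves room for everything else you have open.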

What Pyrenees doesn't do

This is the section where the case for a different tool gets made. Pyrenees is deliberate about its scope, and the things it won't do are worth knowing before you commit to it.

It won't run on Intel Macs. Apple Silicon is a strict requirement. If you're still on a 2019 MacBook Pro, this app isn't an option — you'll need to upgrade first or use Buzz instead.

No Windows or Linux support. Follows from the platform requirement, but worth stating clearly for anyone building a cross-platform workflow.

No interactive transcript editor. What you get is a completed transcript. Minor typo fixes in the export are possible, but there's no click-to-play timeline, no inline cue editing, no speaker separation.

No system audio capture. Microphone input is available for on-the-fly recording, but unlike Whisper Transcription, Pyrenees can't pull audio from another running application.

Not built for dictation. It's file-based only. If you want to speak into other apps, VoiceInk is the tool you're after.

No broadcast-grade subtitle exports. SRT, VTT, and plain text cover the everyday cases. For TTML, EBU-STL, or other broadcast formats, you'll need to bring the SRT into Subtitle Edit.

Pros & cons

Strengths

Fastest free transcription tool we tested on Apple Silicon
Quantized model variants make large models viable on 8 GB Macs
Genuinely free — no in-app purchase, no sign-up required
Tiny install footprint compared to PyTorch-based alternatives
Reliable on long files; memory behavior is much better than older Whisper apps
Active development riding the MLX-Whisper momentum

Weaknesses

Apple Silicon only — narrow platform support
No transcript editor; output is read-only inside the app
No system audio capture, no dictation, no subtitle workflow
Smaller community than Buzz; fewer guides and tutorials available online
Distribution outside the App Store means a slightly more cautious first install
UI is minimal to the point of feeling unfinished if you're used to mature applications

How to use it

This is the most concise how-to in any of these guides, because the app genuinely has that few steps.

Download Pyrenees from the developer's site (look for the official notarized DMG; avoid mirrors).
Drag the app to /Applications. First launch will hit Gatekeeper; right-click the icon, choose "Open", confirm. macOS will remember the choice.
Pick a model on first launch. The app will offer to download one. Medium 4-bit is a good first choice on most M1/M2 Macs.
Drop your audio or video file in. Or hit the record button to capture from the microphone.
Confirm the language if you don't want auto-detection.
Wait. Briefly. Pyrenees is faster than the alternatives; you'll often be done before you finish your coffee.
Export. File → Export → SRT/VTT/TXT.

In practice

Here's how I use it personally. I record voice memos for note-taking on my iPhone, AirDrop them to my MacBook Air at the end of the day, drop them into Pyrenees, and have text in the time it takes to open my notes app.

How it compares to the rest of the shortlist

Versus Buzz: Pyrenees is faster and better-looking. Buzz has more versatility — Linux, Windows, multiple backends, batch queuing, OpenAI API option. Mac-only users with no need for that flexibility are better off with Pyrenees. Anyone working across platforms or needing the options should keep Buzz around.

Versus Whisper Transcription: Pyrenees is faster and costs nothing; Whisper Transcription is more refined and includes features (interactive editor, system audio capture, DOCX export) that Pyrenees lacks. A genuine tradeoff. Start with Pyrenees — it's free. If after a week you're missing the extras, Whisper Transcription becomes a reasonable buy.

Versus Subtitle Edit: Different jobs. Pyrenees produces, Subtitle Edit edits. The natural workflow is to use both.

Versus VoiceInk: Different jobs again. Pyrenees is for files, VoiceInk is for live dictation.

FAQ

Does it work on Intel MacBooks?

No. Pyrenees requires an Apple Silicon chip — M1 or newer. MLX is built around that architecture and won't run on Intel.

Is there a paid tier, or is it truly free?

Fully free. No subscription, no paid tier, no paywalled models. A few peripheral features may prompt for a donation, but the core transcription is unrestricted.

Where does my audio go?

Transcription stays entirely on your machine. Pyrenees has no server mode whatsoever — a genuine advantage when handling sensitive recordings.

Can I reuse model files from other apps?

Occasionally, but not consistently. Pyrenees uses MLX-format weights, which differ from the .bin files that whisper.cpp expects or the .pt checkpoints PyTorch uses. Re-downloading through Pyrenees is the safer approach; the storage cost is identical.

What about lower-RAM Macs — 8 GB or less?

The tiny and base models work fine on lower-end Apple Silicon Macs. The 4-bit medium model generally runs well on 8 GB machines. The full large model on 8 GB is a stretch — go with the quantised variant there.

Can it transcribe audio in real time?

It can record from the microphone and transcribe what it captures, but not in the low-latency way iOS dictation operates. For quick-turnaround dictation, VoiceInk is the right tool.

Why isn't Pyrenees on the Mac App Store?

App Store guidelines around bundling ML models and accessing on-device acceleration are often limiting for apps like this. Distributing as a notarised DMG gives the developer more room to manoeuvre. The source code is open, so you can audit it directly.

Tested over a few weeks on an M1 MacBook Air (8 GB), an M2 MacBook Air (16 GB), and an M2 Max Mac Studio (32 GB). Same set of test files, run alongside the same transcriptions in Buzz and Whisper Transcription for direct comparison. We have no relationship with the developer.

What it's replacing

Apple has shipped dictation as part of macOS for well over a decade. Press a hotkey, speak into any text field, it works. It's been there long enough that most people have stopped thinking about it. VoiceInk exists because the system version has persistent limitations — it routes your voice to Apple's servers (or used to; recent releases can run on-device for English on Apple Silicon, though the implementation isn't transparent), handles technical vocabulary poorly, and offers almost no customisation.

VoiceInk is a more capable substitute. It's open-source, runs Whisper on your own hardware, and binds a hotkey to dictation. Text appears wherever the cursor sits. The model never leaves your machine. The configuration is yours to adjust. It's free, the code is on GitHub, and after a week with it, falling back to system dictation feels like a pointless regression.

The interaction, described precisely

This is the one guide where the physical interaction is more central than the underlying technology — worth describing in some detail.

You assign a hotkey in VoiceInk's preferences — the fn key works well since it's already on your keyboard and almost nothing uses it by default. After that, from any application — browser, terminal, email, code editor — you can:

Hold the hotkey down. A small overlay appears at the bottom of the screen showing that recording has started.
Talk. The overlay shows a moving waveform so you know the mic is active. Talk normally; the model is forgiving of "um" and "uh" and natural speech rhythm.
Release the hotkey. A second or two of processing happens (faster if you have a small model loaded, slower if you have a large one), and then the transcribed text gets typed into wherever your cursor was. As if you'd written it yourself, just much faster than you can type.

That's the whole interaction. Press, speak, release. Once it's in muscle memory, it reshapes how you handle short writing tasks — Slack messages, email drafts, code comments, search queries. I've seen people use it for the first time and start bypassing their keyboard for short messages before the afternoon is out.
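
For readers who think in code, the press, speak, release loop can be sketched as a two-state machine. This is an illustration of the interaction only, not VoiceInk's actual implementation; the class, its method names, and the injected `transcribe` and `type_text` callables are all hypothetical.

```python
from enum import Enum, auto
from typing import Callable, List


class State(Enum):
    IDLE = auto()
    RECORDING = auto()


class PushToTalk:
    """Hold-to-dictate: record while the hotkey is down, type the text on release."""

    def __init__(self, transcribe: Callable[[List[bytes]], str],
                 type_text: Callable[[str], None]) -> None:
        self.state = State.IDLE
        self._chunks: List[bytes] = []
        self._transcribe = transcribe   # stand-in for a local speech model
        self._type_text = type_text     # stand-in for typing at the cursor

    def key_down(self) -> None:
        if self.state is State.IDLE:
            self._chunks = []
            self.state = State.RECORDING

    def audio_chunk(self, chunk: bytes) -> None:
        # Audio arriving while the key is up is ignored.
        if self.state is State.RECORDING:
            self._chunks.append(chunk)

    def key_up(self) -> None:
        if self.state is State.RECORDING:
            self.state = State.IDLE
            text = self._transcribe(self._chunks)
            if text:
                self._type_text(text)


typed: List[str] = []
ptt = PushToTalk(lambda chunks: f"heard {len(chunks)} chunks", typed.append)
ptt.key_down()
ptt.audio_chunk(b"\x00\x01")
ptt.key_up()
print(typed)
```

The point of the sketch is that nothing happens between presses: no always-on listening, no background buffer, just audio bounded by the key.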

A small honest observation

My first week with VoiceInk wasn't great. Talking out loud felt unnatural, and I edited dictated text more than I'd edit typed text. Then around ten days in, the corrections tapered off — partly because I'd learned to organise my thoughts before speaking, partly because the model seemed more comfortable with my voice. By week two I was defaulting to the hotkey for anything longer than a brief reply. Give it more than a day before drawing conclusions.

How the model works in context

Like every other tool covered here, VoiceInk runs Whisper locally. Several model sizes are available, and the speed-versus-accuracy tradeoff matters more here than for file transcription. Dictation won't tolerate a fifteen-second wait — you need results quickly. Most users settle on the base or small model, which is fast enough to feel immediate.

There's a real cost to that choice. Smaller Whisper models are measurably less accurate, particularly for proper nouns, specialised vocabulary, or non-standard accents. VoiceInk includes a vocabulary feature that lets you register specific words — colleagues' names, project identifiers, domain-specific terms — which closes the gap considerably. Still, light corrections after dictating something important are a reasonable expectation.
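
We don't know exactly how VoiceInk feeds its vocabulary list to the model, but a common way to bias Whisper toward particular spellings is to pass the words as a short text prompt. The sketch below only builds that prompt string. The helper name and the character budget are our own assumptions, and the example terms are invented.

```python
def build_vocabulary_prompt(terms, max_chars: int = 800) -> str:
    """Join unique terms, in order, into a comma-separated hint within a size budget."""
    seen = set()
    kept = []
    length = 0
    for term in terms:
        term = term.strip()
        key = term.lower()
        if not term or key in seen:
            continue                      # skip blanks and case-insensitive duplicates
        extra = len(term) + (2 if kept else 0)   # 2 accounts for the ", " separator
        if length + extra > max_chars:
            break                         # stop once the budget is used up
        kept.append(term)
        seen.add(key)
        length += extra
    return ", ".join(kept)


prompt = build_vocabulary_prompt(["Kubernetes", " Priya ", "kubernetes", "Project Halcyon"])
print(prompt)
# With the open-source openai-whisper package, a string like this could be passed
# as the initial_prompt argument of model.transcribe(...).
```

The budget exists because the model only looks at a limited amount of prompt text, so a short list of the words that actually go wrong tends to work better than a long one.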

Tip — if you have an Apple Silicon Mac

VoiceInk benefits significantly from Apple Silicon's Neural Engine. On M1 or later, the small model is fast enough to feel essentially instant, and the medium model is usable. On Intel Macs, you're limited to smaller models if you want responsiveness, and the experience suffers noticeably.

What VoiceInk is genuinely good at

I want to be specific about this rather than general, because vague reviews don't help when you're deciding whether to install something.

Quick replies. Slack messages, email responses, "yeah looks good", "let me get back to you tomorrow on that." Things that take three seconds to say and ten seconds to type. The hotkey workflow shaves real time off a real day.

First drafts. Talking out a paragraph and then editing it on the keyboard is genuinely faster than writing the paragraph from scratch for many people. VoiceInk fits this workflow especially well because the text lands directly in your editor of choice — no copy-paste step.

Note-taking. Quick thought, capture it before it goes away. The hotkey-anywhere model means the friction of "where is my notes app, where is the cursor, what was I about to say" disappears.

Code comments and commit messages. Anywhere a thought is more important than its phrasing. The fact that you can be in your terminal or your editor makes this work without breaking flow.

Accessibility. For people with RSI, hand injuries, or other reasons keyboard input is painful, a fast on-device dictation tool is genuinely valuable. VoiceInk's open-source nature also means you can audit it for any concerns about where your voice goes.

What it isn't built for

Long-form writing. An hour of dictation is exhausting in a way an hour of typing isn't. People who try to dictate entire essays usually go back to keyboards within a week.

Anything you don't want misheard. If accuracy is critical (a legal document, an academic citation, a medical reference), dictation will let you down with small but inconvenient frequency. Always read it back.

File transcription. Said it once, saying it again. VoiceInk doesn't accept input audio files. It records from your microphone, processes it, and types the result. If you have a file, use Buzz or Pyrenees.

Multi-speaker situations. The mic captures whoever is loudest. A meeting recording is the wrong input for VoiceInk.

Pros & cons

Strengths

Fast, on-device dictation that works inside any application
Open source — auditable and forkable
Custom vocabulary support for jargon and proper nouns
Genuinely free, no subscription required
Outperforms macOS built-in dictation on most measures
Hotkey workflow fades into muscle memory after a few days

Weak spots

macOS only
Smaller models trade accuracy for speed — that tradeoff is real and you need to accept it
Not designed for transcribing recorded files — wrong tool for that
Accessibility permission setup requires a few clicks on first run
Less polished than commercial dictation apps; rough edges exist
Active project but a smaller team than the more established alternatives

Setting it up

Install VoiceInk from its GitHub releases page or the developer's site.
On first launch, grant permissions. macOS will ask for microphone access and accessibility access (the latter is what lets the app type into other apps). Both are required.
Pick a model. Small is a good starting point; bump up to medium if you have an M-series Mac and want better accuracy.
Set a hotkey you'll remember. The function key (fn) works well because it's not used for much. Some people prefer right-Option, which is also unused on most layouts.
Add custom vocabulary if you have specialized words. Names of coworkers, project codenames, technical terms. The model uses these as hints during transcription.
Try it in a low-stakes app first. Open a TextEdit window, hold the hotkey, dictate a paragraph, see how it goes. Adjust your speaking pace based on what happens.

A daily setup that works for me

Hotkey: fn. Model: medium (M1 MacBook Pro, 16 GB). Vocabulary: about thirty entries — names of people I message often, two project codenames, three technical terms my model kept misspelling. Result: I dictate maybe twenty short messages a day and rewrite about one in twenty.

How VoiceInk fits alongside the others

VoiceInk sits entirely apart from the other four tools here. The rest answer the question "I have an audio file, give me text." VoiceInk answers "I want to speak, put the text wherever I'm typing." They don't really overlap.

If you're running VoiceInk and want to transcribe a meeting recording, reach for Buzz or Pyrenees instead. And if you're using Buzz and wish you could talk into Slack the same way, add VoiceInk. Having both is the most sensible setup.

FAQ

Is VoiceInk noticeably better than what macOS already includes?

Better in the ways that matter to anyone serious about dictation: customisation, choice of model, predictable behaviour, and full transparency about where your audio goes.

Will it work inside the apps I actually use?

Any application that accepts text input, yes. The mechanism is simulated keystrokes after transcription, which is why accessibility permissions are needed. Non-standard text fields occasionally don't respond — uncommon, but possible.

Is there Intel Mac support?

Yes, though the experience is considerably weaker than on Apple Silicon. Smaller models are usable, but accuracy takes a hit; the medium model is probably too slow to be workable on most Intel hardware.

Can I use VoiceInk to transcribe a recording?

No — that isn't what VoiceInk is for. For file transcription, use Buzz, Pyrenees, or Whisper Transcription.

Does VoiceInk send audio to any server?

No. Everything is processed locally. This is a core part of the value proposition — a privacy-conscious alternative to system dictation for situations where audio leaving your device isn't acceptable.

How well does it handle languages other than English?

VoiceInk inherits Whisper's language coverage. Major European languages, Mandarin, and Japanese generally perform well. Smaller languages can be unreliable, and dictation mode typically uses the base or small model — so accuracy will trail what you'd get running the large model on a file.

Does VoiceInk adapt to my voice the longer I use it?

No. Whisper isn't a personalised model and doesn't adapt to individual speakers. The practical way to improve accuracy is by populating the custom vocabulary list with words it keeps getting wrong. That's the main tool available.

Used daily over several weeks on an M1 MacBook Pro and an Intel iMac (the latter just to confirm the Intel performance story). We're not contributors to the project and have no relationship with the developers.

Why we built it

In 2026, searching for "best transcription app" surfaces dozens of articles cycling through the same handful of tools in the same template. A few are genuinely helpful. Most were written by someone who read a spec sheet rather than installed the application.

We set out to do something different: actually install the apps, feed real recordings through them, and describe the experience faithfully — including the rough edges.

How we test

Every app we cover gets installed on a real machine — sometimes several, when cross-platform differences matter. We run the same test recordings through each tool so comparisons stay consistent:

A 47-minute one-on-one interview recorded on an iPhone
A short voice memo with background noise (a coffee shop recording)
A 12-minute solo podcast intro recorded with a USB mic
A noisy field recording with two speakers
A short non-English clip (currently French and Mandarin, for spot-checks)

The same files across every app. We note the time taken, the obvious accuracy issues, and the friction points along the way. We try to write in terms of trade-offs rather than scores, because software doesn't behave the same way for every person.
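
One way to make timings comparable across recordings of different lengths is to divide the audio duration by the processing time. A minimal sketch, using the 12-minute podcast intro from the list above and the 2 minutes 40 seconds it took in the Whisper Transcription review; the function name is our own.

```python
def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are transcribed per second of processing."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


audio = 12 * 60          # 12-minute podcast intro
processing = 2 * 60 + 40 # 2 minutes 40 seconds with the medium model on an M2 MacBook Air

print(f"{speed_factor(audio, processing):.1f}x real time")  # prints "4.5x real time"
```

A factor above 1 means the tool finishes faster than the recording plays. As with every number on this site, it describes one file on one machine.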

Who we are

We're a two-person team who got fed up with "Top 10" roundups and chose to do something more useful instead. Not professional critics — just regular users who spotted a gap in honest, in-depth coverage.

We don't believe names are what give a review credibility. Anyone who reads enough of them can sense whether the author actually ran the software. Every guide here tries to make that obvious. If you want to get in touch, the contact page has a real inbox that a real person reads.

How the site is funded

The site runs on a lean budget. Domain registration and basic hosting account for most of the costs. To offset them, some links on the site are affiliate links.

Two things to understand about that:

Affiliate links are clearly marked wherever they exist. We don't disguise them as regular links.
Affiliate eligibility doesn't determine which apps we cover or how we cover them. Several apps we review are free and open-source — there's no affiliate relationship possible, and we cover them anyway.

If we ever publish a sponsored review, it will be labeled clearly at the top of the article. We haven't published any sponsored content to date.

What we don't do

A few things we want to be explicit about, because the rest of the internet sometimes blurs them:

We don't accept payment for positive reviews.
We don't send drafts to developers for "approval" before publishing.
We don't repurpose other sites' content. Everything here was written by us, after using the software ourselves.
We don't promise specific results, income, or productivity gains. Software is a tool; what you do with it is your responsibility.
We don't claim to be the official site of anything we review. We're a third-party review site. The official sites of the apps we cover are linked in every review.

What we'd appreciate from readers

If you find a factual error, please tell us. If a feature has changed since we wrote a review, please tell us. If you think we missed something important about an app, please tell us. The address is on the contact page and we read everything that comes in, even if we don't always have the bandwidth to reply.

Beyond that — we hope something here is useful. There are hundreds of tools in this space and the differences between them are genuinely meaningful. We try to make those differences legible.

AudioScribeLab is an independent resource and has no affiliation with the official services reviewed (Buzz, Subtitle Edit, Whisper Transcription, Pyrenees, VoiceInk).

Contact

Last updated · April 2026

Spotted a mistake in one of our guides, want to flag an app we haven't covered yet, or have a question about something you read? Get in touch.

Email

contact@AudioScribeLab

Everything that lands here is read by a person. We try to respond within a week, though it occasionally takes a little longer.

How to write a message we can actually help with

The more specific your message, the more useful our response can be. A few things we appreciate when relevant:

For factual corrections: a link to the page in question and a sentence or two about what's wrong. If you can point at a source for the correct information, even better — but it's not required. We'll go check.
For suggestions: the name of the app, a link to its homepage if there is one, and a sentence about why you think it's worth covering. We get more suggestions than we can act on, but we read all of them.
For technical questions about an app: we're happy to share what we've learned, but we're not the developers — we're just reviewers. For bug reports and feature requests, you'll get a much better response by writing to the app's actual support address. We can usually point you at the right place.
For press, partnership, or sponsorship inquiries: please mention that in the subject line. We don't currently run sponsored reviews, but we read these too.

A realistic note on timing

The site is run by a small team in our spare time. There's no ticketing system, no help desk. In practice:

Most emails get a reply within five to seven days.
Quick corrections often get a same-day reply.
Long, thoughtful messages sometimes take longer because we want to give them a considered reply.
Around major holidays the queue backs up. We catch up eventually.

If you've written and haven't heard back within two weeks, feel free to send a follow-up — the original may have landed in spam or gotten buried.

What we can't help with

A few things to save everyone's time:

We can't reset passwords, issue refunds, or troubleshoot accounts for any of the apps we review. We don't have access to those systems.
We can't write custom transcription guides for individual projects. The how-to sections in our reviews are general by design.
We can't accept guest posts, link insertions, or the kind of "content collaboration" that gets pitched to review sites. Please don't send these.

Privacy, briefly

Anything you send us stays with us. We don't sell email addresses, we don't pass them to advertisers, we don't sign people up to newsletters they didn't ask for. If you want your message deleted from our inbox after we've read it, just say so and we'll delete it. The full details are in our privacy policy.

AudioScribeLab is an independent resource and has no affiliation with the official services reviewed (Buzz, Subtitle Edit, Whisper Transcription, Pyrenees, VoiceInk).

Privacy Policy

Last updated · April 2026

This page explains what data AudioScribeLab gathers when you visit, the reasons behind it, and where it goes. We've aimed to keep it readable.

The short version

No ads, no data sales, no cross-site tracking. We gather only what's needed to keep the site operational and learn which content is worth updating.

Who runs this site

This site is run by the small team behind AudioScribeLab. For any privacy queries or requests, reach us at contact@AudioScribeLab.

What we collect

1. Server logs

Like virtually every website, our hosting provider maintains standard server logs. When your browser requests a page from us, those logs capture:

Your IP address
The date and time of the request
The page requested and the referring page
Your browser type and operating system as reported by the browser (the "user agent")

This data is part of how the web works — it's not something we ask for or collect deliberately. We use it to keep the site running, spot errors, and defend against abuse.

2. Analytics

We may use a privacy-respecting analytics service to understand which pages are read and which aren't. Any service we use must meet a minimum standard:

Uses no cookies
Collects no personally identifiable information
Does not track visitors across other websites

Examples of services that meet this standard include Plausible and similar self-hostable tools. If we ever move to a service that uses cookies or broader tracking, we'll update this page.

3. Email correspondence

If you write to us at contact@AudioScribeLab, we receive your email address, your name (if your client sends one), and the content of your message. We use it to reply. We don't add you to any list or share your details with third parties.

Cookies

The site itself does not set tracking cookies. Some technical cookies may be set by our hosting platform for purposes like load balancing. These are strictly operational.

Third-party services we use

We try to keep external dependencies minimal, but a few are unavoidable:

Google Fonts. Our typography is loaded from Google Fonts. When your browser requests these fonts, Google receives the standard request information described above (IP address, user agent, referrer). Google's handling of this data is governed by their own privacy policy.
Hosting provider. The site is hosted by a commercial provider that may process server logs as our data processor. They handle this data on our behalf and under their own terms.
Email provider. Email sent to or from our contact address is handled by a commercial email provider, who stores those messages on their servers as part of providing the service.

Affiliate links

Some links on this site are affiliate links. When you click one and subsequently make a purchase from the linked seller, we may receive a small commission at no additional cost to you. Affiliate links are labeled wherever they appear.

We don't control what affiliate networks or destination sites do with any information they collect once you click through. Every affiliate link is clearly identified as one (see our about page for more on this). You can choose not to click those links, and you won't lose anything on this site if you'd rather visit those products directly.

Children

This site is not directed at children under 13, and we don't knowingly collect personal information from children. If you're a parent and believe your child has provided personal information to this site, please contact us and we'll address it promptly.

Your rights

Depending on your location, you may have legal rights regarding personal data we hold. These typically include the right to:

Ask what information we hold about you
Ask us to correct inaccurate data
Ask us to delete it
Ask us to stop using it for certain purposes
Lodge a complaint with a data protection authority

In practice, the personal information we hold about most readers is simply "an email you sent us." If you want it deleted, write to us and we'll take care of it.

Security

We take reasonable steps to protect the information we hold — using HTTPS site-wide, keeping software updated, and limiting the number of people with access to any data we store.

Changes to this policy

If we change anything material in this policy, we'll update the "Last updated" date at the top and, for significant changes, post a notice on the site for a reasonable period.

Contact

For privacy questions or requests, write to contact@AudioScribeLab with "Privacy" in the subject line. We'll get back to you as quickly as we can.

Terms of Use

Last updated · April 2026

Visiting AudioScribeLab means you agree to the terms below. They're written to be clear rather than combative — but worth reading so expectations are aligned.

1. What this site is

AudioScribeLab is an independent information resource aimed at helping readers evaluate and select desktop software for converting audio into text. We are not the developer, publisher, or authorised representative of any application reviewed here. References to third-party products are made purely as external reviewers. Official support for those products remains with their respective developers.

2. For informational purposes only

Everything on this site is provided for general informational purposes. We try to be accurate. We test the software ourselves. But software changes — sometimes frequently — and a feature working as described at time of writing may behave differently by the time you read it. Always consult the official documentation of the relevant app before relying on it for anything important.

Nothing on this site constitutes professional, legal, financial, or technical advice. If a transcription is going into a courtroom, a medical record, or any other situation with meaningful consequences for accuracy, please verify the output yourself or have a qualified professional do so.

3. No warranty

This site and all content on it are provided "as is" and "as available," without any warranty of any kind, express or implied. We make no representations about the accuracy, reliability, completeness, or timeliness of the information presented. To the fullest extent permitted by applicable law, we disclaim all warranties, including implied warranties of merchantability, fitness for a particular purpose, and non-infringement.

4. Limitation of liability

To the maximum extent allowed by law, AudioScribeLab and the people who run it will not be liable for any direct, indirect, incidental, consequential, special, or punitive damages arising from your use of the site, including any decision made based on content you read here, any loss of data, or any issues caused by software you installed after reading about it on this site.

If you live somewhere that doesn't allow some of these limitations, the provisions that are unenforceable in your jurisdiction simply don't apply, and the remainder still does.

5. Third-party trademarks and content

All product names, logos, and brands referenced on this site are the property of their respective owners. Mention of a product or company name does not imply endorsement, sponsorship, or partnership unless explicitly stated. We use these names solely to identify the products we're writing about — what's sometimes called nominative fair use.

Screenshots of third-party software, where included, are used for commentary, criticism, and review. If you are the owner of a product we cover and believe we've represented something inaccurately, please write to us at contact@AudioScribeLab and we'll review the concern.

6. Affiliate disclosure

Some links on this site may be affiliate links. If you click such a link and subsequently make a purchase, we may receive a small commission from the linked seller, at no additional cost to you. This is disclosed wherever it applies. Affiliate relationships do not influence our editorial judgment about what we cover or how we cover it. See our about page for the longer version.

7. External links

This site contains links to external websites we don't control. We include them because they're useful — for example, links to the developers of apps we review. We are not responsible for the content, privacy practices, or availability of any external site. Following an external link is at your own discretion.

8. Our content

The original written content on this site — articles, reviews, comparisons, and the way they're worded — is the work of the team behind AudioScribeLab and is protected by copyright. You're welcome to:

Read it
Share links to specific pages
Quote short passages with proper attribution and a link back

What we'd prefer you not do:

Republish complete articles on other sites without permission
Use the content to train commercial AI systems without permission
Remove our attribution and present the writing as your own

If you'd like to do something not covered here — translate an article, syndicate a piece, use a passage in a publication — just write to us. We're usually willing to work something out.

9. Acceptable use

Please don't:

Attempt to gain unauthorized access to our systems
Probe the site with automated scrapers at a rate that degrades access for other readers
Use the site to distribute malware or conduct attacks against other users
Impersonate us or claim to represent us in any way

If we identify abusive activity, we may block the responsible IP addresses or take other reasonable measures.

10. Changes to these terms

We may update these terms periodically. When we do, we'll change the "Last updated" date at the top. For significant changes, we'll post a notice on the site for a reasonable period. Continued use of the site following a change indicates acceptance of the updated terms; if you don't agree, you're free to stop using the site.

11. If a part of these terms is unenforceable

If any provision of these terms is found unenforceable in your jurisdiction, the remainder continues to apply. The unenforceable provision is treated as removed, narrowed to the extent needed to make it enforceable, or replaced with a comparable provision that is enforceable.

12. Contact

For questions about these terms, write to contact@AudioScribeLab with "Terms" in the subject line.

