How to Choose the AI Transcription Tool That’s Right for You

By Audrey Auerbach Nelson, production intern at Podcast Allies


With everything that goes into producing a podcast, providing episode transcripts might have slipped to the bottom of your to-do list. But it’s important to remember that taking the time to offer transcripts pays off for your show in the long run.

Transcripts make your content more widely accessible—to the d/Deaf and hard-of-hearing community, non-native English speakers, and even hearing folks and native speakers who aren’t auditory learners (or who want to remind themselves of details they heard in your episode). Plus, transcripts are SEO-friendly. They help search engines find your episodes, and boost audience growth and engagement.

Transcripts can also be invaluable during the production process itself. As a new production intern for Podcast Allies, I’ve been helping host Elaine Appleton Grant edit raw interviews for upcoming episodes of Sound Judgment. I’m able to efficiently strike paragraphs and highlight others for the final episode script—often without having to listen and rewind. As listeners, you’ll never see these transcripts, but we couldn’t produce Sound Judgment episodes without them.

Regardless of whether your transcripts are audience-facing or internal, you can use AI transcription software to create and manage them. But there are tons of transcription tools on the market, and choosing the option that’s right for you isn’t easy. So as I transcribed Sound Judgment episodes this week, I compared different software options to determine which will save you—and interns like me!—the most hassle.

To keep the test manageable, I tried three tools: Riverside.fm, Otter.ai, and Capsho. In addition to choosing an overall winner, I noted which were best suited to audience-facing transcripts, and which came in handy for the editing process. (Note: For the purposes of this test, I didn’t evaluate transcription tools that are embedded into audio editing software, such as Descript and Hindenburg. Descript forged the way for audio editors to edit sound, and now video, within a transcript itself. Hindenburg pioneered visual-friendly audio editing, and has recently added embedded transcription. Since I’m not yet editing audio, I stuck with simpler transcription tools.)

Which features matter

If you want to produce an audience-facing transcript, your main concerns probably have to do with time and accuracy. AI transcription is still imperfect (sometimes laughably so). Cleaning up rough transcripts yourself—or paying someone else to do it—can pull your resources away from vital production and marketing tasks. So I looked for programs with good word and punctuation accuracy and automatic speaker names. I also tracked the total time I spent fixing transcript mistakes generated by each program I tested. 

On the other hand, if you’re just looking for a messy, mid-production transcript, you might be worried less about word accuracy. Instead you’ll appreciate time stamps, as well as features that support collaborating with team members; searching, highlighting, and commenting on the raw transcript; and playing audio from within the interface.

Riverside.fm

As of this writing in July 2023, Riverside, which is known for its namesake remote recording software, is offering unlimited transcription for free. (Yes, you read that correctly!) You don’t even need to be a Riverside subscriber to take advantage.

I highly recommend Riverside if you’re looking for outward-facing transcripts. It took me only an hour and 15 minutes to clean up a transcript for a 45-minute episode. Riverside is easy to use, allows you to copy and paste formatted transcripts straight from the website, and is quite accurate both word for word and in punctuation placement (especially compared to competitor Otter.ai, which struggles with overall accuracy). However, although Riverside does provide time stamps, it doesn’t generate speaker names or organize paragraphs by speaker. The latter can be a problem; Riverside will often append the beginning of a new speaker’s comment to the previous bite spoken by someone else, and you’ll have to budget a bit of extra time or money to fix that.

I don’t necessarily recommend Riverside for working with transcripts during the editing process. The free transcriber doesn’t allow you to collaborate, or highlight or comment on the text itself. And there’s no embedded audio player, so I had to click back and forth between a podcast player and Riverside’s transcription webpage in order to read and listen at the same time. 

Otter.ai

For transcription during the editing process, Otter is a phenomenal bang-for-your-buck option, thanks to its editing and collaboration features and audio playback. 

First, a couple of drawbacks. Its Basic plan, which is free, offers a relatively limited 300 monthly minutes of transcription at up to 30 minutes per conversation. Crucially, this plan does not have unlimited conversation history, meaning you can only access your past 25 conversations. 

Still, Otter’s Basic plan offers impressive benefits—giving you the ability to annotate and edit text, as well as to assign to-do items to yourself and your team members. (These are collaboration features that Capsho and Riverside’s cheapest plans don’t offer.) You can also view an auto-generated summary with conversation takeaways. This summary was invaluable; I drew on it as I drafted my own outline for a final episode script. 

The Basic plan also allows you to search the transcript using keywords, which helped me quickly orient myself in an otherwise overwhelming mass of text. Plus, Otter’s built-in audio player means that you can listen along as you make edits in real time. This was helpful for me as a new producer: a sentence that looks great on the page doesn’t always sound good enough to keep in an audio story.

Upgrade to the Pro plan (currently $100 a year, or a far steeper $16.99/month), and you’ll gain the ability to search by speaker, plus an additional 900 monthly minutes of transcription (1,200 in total) at up to 90 minutes per conversation. The Pro plan also gives you unlimited conversation history so you won’t lose old transcripts.

Unfortunately, Otter struggles when it comes to producing audience-facing transcripts. I transcribed Sound Judgment Episode 12: The Dinner Sisters using Otter, and found that—among other accuracy errors—“then” became “men,” “lunch” became “once,” and “wall” became “one all.” You could overlook these errors in a rough transcript designed for the middle of the production process, but not for a transcript that will be published as part of your show notes. It took me around two hours to correct those errors and clean up the 52-minute episode. 

Otter’s other audience-facing features are a mixed bag. Unlike Riverside, Otter automatically generates speaker names and attempts to divide paragraphs by who’s speaking. These divisions are more accurate than Riverside’s, but not as good as competitor Capsho’s. On the downside, I found that Otter occasionally left in repeated or filler words that both Capsho and Riverside cut out.

Capsho

Like Riverside, Capsho is an impressively accurate transcriber. It also provides more accurate speaker names and divisions than Otter. Overall, it’s a fantastic fit for audience-facing transcripts. Case in point: I spent only an hour cleaning up a 49-minute episode, the least time I spent on clean-up during this test.
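
If you like to see numbers side by side, the clean-up times reported in this post can be divided by episode length to get a rough "minutes of clean-up per minute of audio" figure. The short Python sketch below is only an illustration. The names in it are made up for this example, Otter’s "around two hours" is assumed to be exactly 120 minutes, and each tool was tested on a different episode, so treat the results as a ballpark comparison rather than a benchmark.

```python
# Clean-up time and episode length in minutes, as reported in this post.
# Assumption: Otter's "around two hours" is treated as exactly 120 minutes.
CLEANUP_TESTS = {
    "Riverside": (75, 45),
    "Otter": (120, 52),
    "Capsho": (60, 49),
}


def cleanup_ratio(cleanup_minutes, episode_minutes):
    """Return minutes of clean-up needed per minute of audio."""
    return cleanup_minutes / episode_minutes


def rank_by_cleanup(tests):
    """Return (tool, ratio) pairs sorted from least to most clean-up."""
    ratios = {
        tool: cleanup_ratio(cleanup, episode)
        for tool, (cleanup, episode) in tests.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for tool, ratio in rank_by_cleanup(CLEANUP_TESTS):
        print(f"{tool}: {ratio:.2f} minutes of clean-up per minute of audio")
```

With the figures above, this works out to roughly 1.22 for Capsho, 1.67 for Riverside, and 2.31 for Otter, which matches the order I experienced while testing.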

Capsho may be the overall winner when it comes to efficiency, but it’s pricey. Although it does offer a 14-day free trial, beyond that, you have to choose between steep monthly price plans. The plans that offer transcription are the Copy Studio ($49/month), the Copy Studio + Creative Studio bundle ($79/month), the Copy Studio with SEO Boost ($99/month), and the Copy Studio with SEO Boost + Creative Studio bundle ($129/month).
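
To put those monthly prices in perspective, it can help to annualize them and set them next to Otter’s Pro plan. The Python sketch below is a rough illustration only. It assumes the July 2023 prices quoted in this post stay the same for twelve months and that Capsho offers no annual discount (I haven’t confirmed either), and the variable and function names are invented for this example.

```python
# Monthly prices in US dollars, as quoted in this post (July 2023).
# Assumption: prices stay fixed for a year and no annual discount applies.
CAPSHO_MONTHLY = {
    "Copy Studio": 49,
    "Copy Studio + Creative Studio": 79,
    "Copy Studio with SEO Boost": 99,
    "Copy Studio with SEO Boost + Creative Studio": 129,
}
OTTER_PRO_MONTHLY = 16.99  # Otter Pro, billed month to month
OTTER_PRO_ANNUAL = 100     # Otter Pro, billed once a year


def annual_cost(monthly_price):
    """Return the cost of twelve months at the given monthly price."""
    return round(monthly_price * 12, 2)


if __name__ == "__main__":
    print(f"Otter Pro (annual billing): ${OTTER_PRO_ANNUAL}")
    print(f"Otter Pro (monthly billing): ${annual_cost(OTTER_PRO_MONTHLY)}")
    for plan, price in CAPSHO_MONTHLY.items():
        print(f"Capsho {plan}: ${annual_cost(price)}")
```

Under these assumptions, a year of Capsho’s cheapest transcription plan comes to $588, compared with $100 for a year of Otter Pro billed annually.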

As far as I can tell, none of these plans offer access to the same kind of team-wide editing features that make Otter such a good fit for the middle of the production process. However, Capsho is designed to solve other productivity problems for podcasters. Namely, the $49/month Copy Studio offers AI-generated descriptions, titles, and social media captions. As you upgrade to each new plan, the AI will do more and more work for you and your podcast, including—but not limited to—writing blog posts, pulling soundbites, and drafting marketing copy.

A thorough look at AI tools for the rest of the production and show notes processes is beyond the scope of this blog post. Just know that while Capsho offers some great benefits as a transcription tool, if you’re on a limited budget and only looking for transcription, you can find it more economically elsewhere. 

Takeaway

After testing all three tools, here’s what I came away with.

If I were able to use multiple tools—one for generating outward-facing transcripts, and one for generating transcripts during the production process—I’d use Riverside for the former task, and Otter for the latter. Meanwhile, if I were only able to use one tool, I’d rely on Otter. Yes, there are storage and accuracy downsides, but overall, it’s a remarkably cheap, user-friendly option, with tons of search, summary, and collaboration features accessible even with the free version. That’s a huge win—especially for beginner producers like me.