How to Best Use CaptureCast Video Transcription

Here at Cattura Video, we’re always looking for new ways to push CaptureCast to new limits, and the latest way we’ve managed to do that is through direct transcription integration. Now, as soon as a capture is complete, you can send it out for either human or machine transcription. But what are the differences between these two options?

Machine transcription is a process in which a machine learning algorithm analyzes the uploaded video and converts any human speech it finds into searchable text within CaptureCast. We use the much-lauded AWS transcription service to do this: we take the audio from your video, upload it to the AWS account of your choice (encrypted along the way), and as soon as AWS has finished turning it into text, the transcript is sent back to CaptureCast, where we make it available for your use immediately.

You can also choose human transcription. Unlike machine transcription, an actual human is involved in this process, which makes it quite a bit more expensive, though it can also produce far more accurate results in some cases. Much as with machine transcription, the audio from your video is encrypted and sent directly to a transcription company called Rev, which is well known for its high-quality transcriptionists. While the resulting text will take longer to come back from Rev, you’ll likely find that it is more accurate for words that are specialized or difficult for a machine to understand.

But what can you do with these transcripts once they’re in CaptureCast? We’ve built a full-fledged video viewing and editing experience that makes use of your transcripts. All you have to do in this mode is select the text you want to include in your final video and hit export. We then mark exactly where in the video that text is spoken and create an end product that contains only the information you need. In addition, every last bit of text is completely searchable, so if you’re struggling to remember where in a video something was said, all you have to do is search for it.
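For readers curious how text selection can drive a video edit, here is a minimal sketch in Python. It is not CaptureCast's actual implementation; it simply assumes each transcript word carries a start and end time in seconds. The `Word` class, the function names, and the `max_gap` parameter are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcript word with its timing (hypothetical structure)."""
    text: str
    start: float  # seconds from the beginning of the video
    end: float


def selection_to_segments(words, selected, max_gap=0.5):
    """Turn selected word indices into merged (start, end) video segments.

    Words whose start is within `max_gap` seconds of the previous
    segment's end are merged into that segment.
    """
    segments = []
    for i in sorted(set(selected)):
        w = words[i]
        if segments and w.start - segments[-1][1] <= max_gap:
            # Close enough to the previous segment: extend it.
            segments[-1] = (segments[-1][0], max(segments[-1][1], w.end))
        else:
            segments.append((w.start, w.end))
    return segments


def find_phrase(words, phrase):
    """Return the start time of every place the phrase is spoken."""
    tokens = phrase.lower().split()
    if not tokens:
        return []
    texts = [w.text.lower() for w in words]
    return [
        words[i].start
        for i in range(len(texts) - len(tokens) + 1)
        if texts[i:i + len(tokens)] == tokens
    ]


words = [
    Word("welcome", 0.0, 0.4),
    Word("to", 0.5, 0.6),
    Word("the", 0.7, 0.8),
    Word("demo", 0.9, 1.3),
    Word("thanks", 5.0, 5.5),
]
segments = selection_to_segments(words, [0, 1, 4])
hits = find_phrase(words, "the demo")
```

In this sketch, selecting the first two words and the last one yields two segments to keep, and searching for "the demo" returns the time at which the phrase begins.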

So, given how you’re going to use the resulting transcript, how do you choose between human and machine transcription? There are a few different factors to consider here:

  • Cost of transcription
  • Accuracy of transcription
  • Time to process transcription

First, let’s talk about cost. To be completely frank, human transcription is quite expensive: up to ten times as much as machine transcription. But as always, the old adage applies. You get what you pay for, and price ties in very heavily with the accuracy you’ll get. Machine transcripts can often be good enough, especially if you plan on using them only as a guide for editing your video, but human transcription is far more accurate, especially when it comes to complex, technical words. Given that price gap of up to ten to one, many users find that AWS machine transcription is more than good enough for their needs, but if you’re looking for something much closer to perfection, human transcription is the method to beat.

There’s also the time it takes to process the transcription. Depending on the length of a capture, human transcription can take up to 24 hours, whereas machine transcription often takes just a few minutes. If the goal is to get something up as soon as possible after the capture is done, a human simply cannot match the speed of machine transcription.

So it’s really a balancing act, and a choice that needs to be based on your specific needs for transcription. Are you looking for something that’s done quickly, is cheap, and can be used within a few minutes, or do you want something that is far more accurate but takes time?
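To make the trade-off concrete, the three factors can be written down as a simple rule of thumb. The Python sketch below is purely illustrative: the function name and parameters are hypothetical, and the two constants are just the upper bounds quoted above (up to 24 hours, up to ten times the cost), not actual pricing or guaranteed turnaround times.

```python
# Upper bounds quoted in this article; real figures will vary.
HUMAN_TURNAROUND_HOURS = 24   # human transcription can take up to 24 hours
HUMAN_COST_FACTOR = 10        # human can cost up to ten times machine


def choose_transcription(needs_specialized_accuracy, hours_available,
                         machine_cost, budget):
    """Suggest "human" or "machine" from accuracy, time, and cost.

    Human transcription is suggested only when accuracy on specialized
    vocabulary matters AND there is enough time and budget for it.
    """
    human_cost = machine_cost * HUMAN_COST_FACTOR
    if (needs_specialized_accuracy
            and hours_available >= HUMAN_TURNAROUND_HOURS
            and budget >= human_cost):
        return "human"
    return "machine"


# A technical lecture with two days to spare and a comfortable budget.
lecture = choose_transcription(True, 48, machine_cost=5.0, budget=100.0)

# The same lecture, but it has to be published within two hours.
rushed = choose_transcription(True, 2, machine_cost=5.0, budget=100.0)
```

Under these assumptions the first case points to human transcription and the second to machine transcription; your own thresholds may well differ.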

We’d be more than happy to help you with your choice. Get in touch with us, and we can perform a needs assessment for you, show you more of how the CaptureCast solution works on a day-to-day basis, and discuss how you can work machine or human transcription into your overall video capture workflow.

