"The human brain has 100 billion neurons, each neuron connected to 10 thousand other neurons. Sitting on your shoulders is the most complicated object in the known universe." – Michio Kaku
The complexity of the human brain is simply stunning and miraculous. Nothing is on par with it. AI-driven machines, technologies, and software only mimic human activities; they cannot replicate human intelligence or relate to human emotions. As a result, AI-driven speech recognition software struggles to understand tone of voice and the emotions behind it, and it often fails to recognize words when there is background noise, multiple speakers, or an unfamiliar accent.
Almost all universities and schools, the events industry, and the entertainment industry moved online during the pandemic. With most schools and colleges shifting to online learning, educational institutions have a responsibility to provide accessible learning materials for students. The most important feature is accurate captions for videos.
Event management companies have also developed many advanced features for their users. One crucial feature is AI-generated captions for their sessions. Captions add value and make content more accessible. They help you get the most from videos and keep you engaged. But auto-generated captions have their limitations.
There are two main categories of captions: human-generated captions and automated captions. Human-generated captions are created firsthand by people. Automated captions are generated by a computer without human intervention and appear on the screen automatically. They use Automatic Speech Recognition (ASR) technology.
Almost all the platforms used today for teleconferencing, telecommuting, distance education, and event management offer automatic video captions and other transcription services. In addition, social media platforms like Facebook and YouTube have options to enable automatic video captions.
Captioned videos are crucial for accommodating deaf and hard-of-hearing individuals. It’s no surprise that closed captions are also used by people without disabilities.
Now the real question is, can you rely on automated captions for your videos, especially to make educational videos accessible?
Let’s answer this with a few stats and figures.
- YouTube captions are about 60-70% accurate. That means roughly one in three words can be wrong.
- Captions generated by Zoom’s ASR technology are around 90% accurate. That’s impressive, but it is still too low for deaf and hard-of-hearing individuals, many of whom turn off a video as soon as they learn it is auto-captioned.
- We collected the captioning requirements of 2,000 clients to see how many needed real-time captioning, automated captioning with clean-up, or automated captions only. Of those clients, 53.9% requested real-time captioning, 34.8% asked for automated captions with clean-up, and the remaining 11.3% wanted automated captions only.
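To put these percentages in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a hypothetical 1,000-word lecture and treats accuracy as the share of words transcribed correctly; the function name and the lecture length are illustrative only and do not come from any captioning tool.

```python
def expected_wrong_words(total_words: int, accuracy_percent: float) -> int:
    """Estimate how many words are wrong at a given word-level accuracy."""
    return round(total_words * (100 - accuracy_percent) / 100)


LECTURE_WORDS = 1000  # hypothetical lecture length, for illustration only

# Accuracy figures quoted in this article.
for label, accuracy in [
    ("YouTube, low end (60%)", 60),
    ("YouTube, high end (70%)", 70),
    ("Zoom (90%)", 90),
    ("Human captioner (99%)", 99),
]:
    wrong = expected_wrong_words(LECTURE_WORDS, accuracy)
    print(f"{label}: about {wrong} wrong words per {LECTURE_WORDS}")

# Survey shares from the list above, converted to client counts.
CLIENTS = 2000
shares = {
    "real-time": 53.9,
    "automated with clean-up": 34.8,
    "automated only": 11.3,
}
counts = {name: round(CLIENTS * pct / 100) for name, pct in shares.items()}
print(counts)
```

Even at 90% accuracy, a viewer of that hypothetical lecture would still hit about 100 wrong words, compared with about 10 at 99%.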
It’s common knowledge that voice recognition software can struggle even in the simplest situations, where there is no background noise, no cross-talk, and only a few simple words to recognize. Expecting this software to understand and differentiate between accents, identify speakers, and know which background noises to exclude is highly unrealistic. Only humans can write high-quality captions, preferably professional, certified captioners with subject knowledge, a good ear, and an expert grasp of punctuation and grammar.
In general, AI fails to generate accurate text for one or more of the reasons already mentioned: background noise, overlapping or multiple speakers, and unfamiliar accents.
The accuracy of such speech-to-text support may not meet your compliance needs, especially when it comes to live lectures, educational videos, meetings, webinars, or even the movies you watch.
Automated captions may help some people, but they will never replace the value of a human touch behind your text. Educational videos demand higher quality. Terms used in medicine and law are complex, and you’ll likely find that automated captions aren’t sufficient on their own. If you’re still considering adding automated captions to your content, keep reading to learn how they compare with human-generated captions.
- Human captioners provide highly accurate captions. Their captions are over 99% accurate because they comprehend the content better than speech-to-text software.
- Captioning is mandatory by law. Websites and online events are considered places of public accommodation under Title III of the Americans with Disabilities Act (ADA). The Web Content Accessibility Guidelines (WCAG) require all time-based media to provide captions, or a transcript in their absence, but they do not say how accurate those captions must be. The FCC, however, has set standards for online videos. The FCC standards call for accurate captioning, although no accuracy percentage is defined, so anything inaccurate is not recommended. These standards also cover punctuation and speaker identification, which automated captioning solutions struggle to provide.
- Fast speech, jargon, dialects, multiple speakers, spelling, and grammar are all handled by human captioners. AI tools fail to recognize them all, which again results in inaccurate captions.
- Automated captions may not be correctly synced with the sound. Captions with too much delay are useless: you miss the content that follows, which breaks your concentration and focus. Live human captioners provide captions with a delay of 2 to 5 seconds, and their captions for pre-recorded videos are over 99% accurate.
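The accuracy percentages quoted in this article are not tied to a stated measurement method. One common way to score captions is word error rate (WER): the word-level edit distance between a reference transcript and the caption, divided by the number of words in the reference. The Python sketch below is a minimal illustration under that assumption; the function names and both sample sentences are made up for the example.

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum word substitutions, insertions, and deletions between two texts."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the caption
                curr[j - 1] + 1,     # extra word in the caption
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word-level accuracy in percent, computed as 100 * (1 - WER), floored at 0."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    wer = word_edit_distance(ref_words, hyp_words) / len(ref_words)
    return max(0.0, 1.0 - wer) * 100


# Made-up example: a medical phrase and a plausible automated mis-hearing.
reference = "the patient shows signs of acute myocardial infarction"
auto_caption = "the patient shows signs of a cute my cardinal infection"
print(f"Word accuracy: {word_accuracy(reference, auto_caption):.1f}%")
```

A score like this counts only words. It does not capture punctuation, speaker labels, or timing, so a caption can score well and still be hard to follow.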
Many variables come into play when captioning videos. Auto captions can be a last-minute fix, but they are not as dependable as human captioners. The most significant issue with automated captions is accuracy: the speech recognition technology that generates them isn’t perfect, so they may contain mistakes or misinterpretations. That becomes a hindrance when learners have to concentrate on understanding both the video and error-prone automatic captions at the same time. The best way to increase accessibility for deaf and hard-of-hearing viewers is through professionally created closed captions that account for context and natural language variation.
At CaptioningStar, we produce high-quality, reliable captions for both your live sessions and pre-recorded videos. We integrate seamlessly with education technology, web-based, and event management platforms such as Blackboard, Moodle, Zoom, Kaltura, Vimeo, Facebook, and YouTube to deliver captions in any format you need. A few of our top-notch clients are Google, New York University, the American Alliance of Museums, EEG Enterprises, and Fox News.
We also offer several post-production services like video editing & trimming, including burnt-in captions, transcriptions, translations & subtitles, voiceover & dubbing, and audio/video description. Hire CaptioningStar for all your accessibility needs. We are the fastest and the best in the industry.