Artificial Intelligence (AI) for voice-over (VO) is a fantastic tool for sharing a message when creating audio for a video or presentation. We see AI, like any technology, as a tool. And, like any tool, it has particular applications, benefits, and drawbacks when used for VO work.
I am going to walk you through one particular experience using AI for VO on a recent CSA project. The client asked CSA to create 65 science videos. After the English videos were created, CSA translated the new video scripts into Spanish for production. AI VO was used for both the English and Spanish versions of the videos, but I am going to focus on my experience with the Spanish AI VO.
AI VO Technology
For this project, we used an AI text-to-speech tool that generated the Spanish VO from our scripts. The tool offered natural-sounding voices from three different speakers. These voices had neutral accents that would be universally understood across Spanish-speaking audiences.
We found that this voice-over tool was user-friendly and quite effective at what we were asking of it. It gave users a quick tutorial covering all the features available for creating voice-overs. When converting a text to speech, the tool took approximately 5 seconds to create or update the voice-over.
Pros and Cons
As with any technology, we did identify some drawbacks and limitations. In the following table, you will find a list of pros and cons of using AI for this video project.
As explained in the table above, voice-overs created by AI had some challenges. As a user, my biggest challenge was figuring out how to resolve the following questions and issues raised by CSA:
- Why does the speaker change the tone and volume when stating a question?
- Why is the speaker creating random pauses in the middle of a text without explanation?
- Why is the voice-over not pronouncing words with accents correctly?
- How can I make the Spanish speaker’s timbre sound similar to the voice used in the English videos?
I tried a few different things to solve the issues listed above. Some attempts were successful, and others caused frustration. CSA understood that AI has some limitations in creating voice-overs, but I tried my best to address their concerns and accommodate their requests. Open communication between the contractor and CSA was crucial to solving these challenges.
Despite the limitations of the technology, it did allow us to create voice-overs for this project and achieve the client’s goal. I suspect that AI VO technologies will continue improving, but I think live voice actors are still best for videos that rely on timbres and tones to convey emotion.
One of the challenges of this project was getting the tool to pronounce words with accents correctly. However, even when those words were mispronounced, the audio produced still makes sense to Spanish speakers.
Overall, I had a great experience working on this project. We used AI as a helpful tool to complete the project successfully, and despite some of its limitations, the client was happy with the end result.
About Julian Holbrook: Julian is a native Spanish speaker originally from Colombia. His family immigrated to the United States when Julian was 13, so he grew up speaking both Spanish, his first language, and English. Fluent in both languages, Julian was frequently asked by his teachers to provide Spanish translations to aid classmates. He continued this in college and worked as a semi-professional translator on several projects. Julian holds a Master’s in Cybersecurity Policy from Boston College and works as a Spanish language subject matter expert for CSA.