What is KUDO’s new conversational AI feature and how does it work?

New: live, continuous conversational AI translation becomes available on KUDO AI. Back-and-forth communication is now language-accessible.

Scrolling through newsfeeds about mind-blowing new technology, it’s tempting to think that we’ve already entered a sci-fi world in which artificial intelligence knows no limits. The reality is that the vast majority of AI creators are not seeking to eradicate the human race. On the contrary, most are building tools that solve a specific human problem – medical, social, environmental, or otherwise – and one whose scale is too complex for humans to tackle alone.

With a mission to create a world in which everyone has the power to understand and be understood in their own language, this is certainly the case for us at KUDO.  

Back in 2017, we started as a platform for booking professional interpreters for multilingual meetings and events. This remains our core product offering today, and despite industry-wide fears of what would happen once those two little vowels entered the language interpretation space, we can confidently say that client bookings for our interpreter marketplace are only growing month on month. In other words, humans aren’t going anywhere.

We knew, however, that democratizing language accessibility throughout every public and private organization was going to require more than just humans. That’s why in January 2023, we launched KUDO AI, a real-time, continuous speech translator powered by AI. The solution aimed to make smaller, one-to-many use cases like live events, Town Hall meetings, and training sessions accessible to all attendees via translated audio and captions at the click of a button.

KUDO AI has come a long way since. Constant optimizations of our AI engine have driven a dramatic increase in quality and decrease in latency (the time between the person speaking and the listener hearing the translated speech). And in our latest release, we moved the solution one step further towards our goal of total language accessibility by adding a feature that allows users to activate conversational, back-and-forth mode through specialized, instant subtitles. Keep reading to find out more.

Introducing conversational AI captions on KUDO AI 

Until now, KUDO AI users have been able to enjoy live-translated audio and captions one-way only. This means that presenters can take it in turns to speak, but interactive conversations are not possible (due to the latency when audio is involved). The most popular use cases are webinars, lectures, town hall meetings, council meetings, training sessions, sales presentations, and church sermons, to name a few. This modality offers the highest-quality translation experience, powered by the latest translation engine and voice models.

Users of KUDO AI now have an additional option to activate enhanced-speed subtitles, allowing participants to hold a back-and-forth conversation in real time.

  • One-way communication (translated audio and captions)
  • Interactive communication (translated captions only)

What sets this new feature apart is speed: instant subtitles appear on screen in each participant’s chosen language, with a maximum delay of 1-2 seconds. To allow for this high-speed translation, only captions are available. The enhanced speed of these captions makes them as close to real-time as possible, meaning users can hold a live conversation in any language without incurring a lag in comprehension. This modality is best suited to interactive online meetings such as group discussions, brainstorming sessions, focus groups, and roundtable discussions.

Captions and/or audio: which mode is right for me?

Live translation can be delivered as captions (subtitles), as voice, or as both. Each form has its advantages and disadvantages. As a rule of thumb, if your meeting or event requires any sort of interactive (back-and-forth) communication, you’ll need a super-fast translation experience. In this case, our captions-only mode is the best solution. It ensures that everyone can follow the translation with minimal delay, although it does mean that they will have to read subtitles throughout rather than having the flexibility of listening to the translated audio.

Conversely, if there is little to no interactive element in your meeting or event – and a delay of a couple of seconds longer does not affect the user experience – we always recommend offering your attendees our advanced translation solution, with both audio and captions. This ensures the highest quality of translation while allowing attendees to choose whether they wish to follow along by reading or listening.
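For readers who like to see a rule of thumb written down, the choice above can be sketched in a few lines of Python. This is only an illustration of the decision described in this article: the names Mode and recommend_mode are hypothetical and are not part of any KUDO product or API.

```python
from enum import Enum


class Mode(Enum):
    """The two KUDO AI modes described in this article (names are illustrative)."""
    AUDIO_AND_CAPTIONS = "translated audio and captions"  # one-way communication
    CAPTIONS_ONLY = "translated captions only"            # interactive communication


def recommend_mode(interactive: bool) -> Mode:
    """Apply the article's rule of thumb.

    Back-and-forth conversation needs the fastest experience, so it gets
    captions only; everything else gets audio plus captions.
    """
    if interactive:
        return Mode.CAPTIONS_ONLY
    return Mode.AUDIO_AND_CAPTIONS


# A roundtable discussion is interactive; a webinar typically is not.
print(recommend_mode(interactive=True).value)
print(recommend_mode(interactive=False).value)
```

In practice the decision may also depend on factors such as audience preference, so treat this as a starting point rather than a fixed rule.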

How is KUDO AI different from ChatGPT or Google Translate?

The words ‘real-time’ and ‘continuous translation’ are the key differentiators here. ChatGPT and Google Translate are great tools for what we call consecutive interpretation. This means that rather than speaking continuously into a microphone while everyone hears the translation as you speak, you speak your sentences one by one and wait for the translation in stages.

Where this could be helpful is if you were traveling and needed to ask directions in another language, or to check in at a foreign hotel. For meetings, events, and webinars, however, these tools simply don’t work.

Why? Picture yourself running an All Hands meeting or giving a keynote presentation and having to speak your sentences one by one. Then consider that your attendees also have to wait a few seconds between each sentence to catch up with the translation in stages. Nobody has time for that, and that’s where KUDO AI comes in.

When to use human interpretation

We’ve said it before, and we’ll say it again: human interpretation and AI speech translation are not directly comparable. In fact, many of our clients alternate between the two depending on the context.

When it comes to informing clients about which language accessibility solution is best for their needs, we take many things into account: language(s) requested, timing and duration of the meeting/event, budget, setting, complexity of the subject matter, and more.

In a nutshell, human interpretation remains the highest quality form of language accessibility. You’re also getting the emotion and nuances of a real human voice, and the expertise of someone who has trained in the terminology and context of specific industries or topics.

The quality of AI speech translation is nevertheless high today – more so than most people expect. You also have the added bonus of it being available on demand for any duration and for an unlimited number of languages at once, at a more cost-effective price.

Talk to our team about which solution is better suited to your communication needs.