No doubt about it, artificial intelligence (AI) has made some aspects of life a lot more convenient. From turn-by-turn driving directions with real-time traffic info to searching the Internet with simple voice commands, AI has made it faster and easier for us to get where we need to go and find the information we need. But when it comes to communication, especially multilingual communication, it is important to understand what AI can and can’t do. Fact is, there’s a lot of misunderstanding out there. So, let’s try to clear things up.
Defining Our Terms
First things first: translation deals with written text, while interpretation deals with spoken or signed language. Getting this distinction right is half the battle.
Next, we need to define what AI does when it comes to translating language. When you think of AI and language, think machine translation (MT). In general terms, machine translation means a computer automatically translating written text from one language into another using algorithms. There are several approaches to doing this, for example, rule-based machine translation (RBMT), statistical machine translation (SMT), and the dominant approach today, neural machine translation (NMT). That’s alphabet soup, to be sure, but for our purposes here, just remember that NMT is what’s used the most today.
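If you’re curious what NMT looks like from a developer’s point of view, here’s a minimal sketch in Python. It uses the open-source Hugging Face transformers library and one of the publicly available Helsinki-NLP MarianMT models; the specific model name and example sentence are just illustrative, and this is a toy example, not how production translation services are built.

```python
# A minimal sketch of neural machine translation (NMT) in practice,
# using the open-source Hugging Face "transformers" library and a
# publicly available English-to-French MarianMT model.
# Requires: pip install transformers sentencepiece torch
from transformers import pipeline

# Load a pretrained NMT model (downloaded on first use).
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

result = translator("The meeting starts at noon.")
print(result[0]["translation_text"])  # e.g. "La réunion commence à midi."
```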
For machine translation to work with spoken language, you have to tack on a couple of other technologies. Automatic speech recognition (ASR) starts the whole process by converting speech to text, which is then fed into a machine translation engine. Then, if you want to listen to the result rather than read it, text-to-speech (TTS), also called speech synthesis, is added at the end of the process. Think Siri, Cortana, Alexa, and Google Assistant. You get the idea.
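To make that three-step chain concrete, here’s a rough sketch in Python. The three stage functions are hypothetical stand-ins for real ASR, MT, and TTS engines (in practice each would call an actual model or cloud service); only the shape of the pipeline is the point.

```python
# A simplified sketch of the speech-to-speech pipeline described above.
# The three stage functions below are hypothetical stubs, not any real
# product's API; swap in an actual ASR, MT, or TTS engine for each.

def recognize_speech(audio: bytes, language: str) -> str:
    """ASR: convert spoken audio into written text (stub)."""
    return "hello, where is the train station?"  # placeholder output

def translate_text(text: str, source: str, target: str) -> str:
    """MT: translate written text between languages (stub)."""
    return "bonjour, où est la gare ?"  # placeholder output

def synthesize_speech(text: str, language: str) -> bytes:
    """TTS: convert written text back into audio (stub)."""
    return text.encode("utf-8")  # placeholder "audio"

def speech_to_speech(audio: bytes, source: str, target: str) -> bytes:
    text = recognize_speech(audio, language=source)                   # step 1: ASR
    translated = translate_text(text, source=source, target=target)   # step 2: MT
    return synthesize_speech(translated, language=target)             # step 3: TTS
```

Notice that each step consumes the previous step’s output, which is exactly why a mistake early in the chain never gets a second look.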
All three of these processes, ASR, MT, and TTS, rely on artificial intelligence to do what they do. But here’s the rub: none of them is even close to perfect. “Neither are humans,” you say. And you’re right. But bear with me; we’ll get to that in a minute. Every error introduced at any point in the speech-to-speech translation process will likely be compounded, or even amplified, by the next step. The computer has no way of knowing whether what it produces makes sense in context; it simply processes the input according to its algorithms and training. It’s up to the reader or listener to decide whether the result makes sense. In the case of translation, that’s a tough call for someone who doesn’t understand the original language in the first place.
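A back-of-the-envelope calculation shows why chained errors matter. The numbers below are invented purely for illustration: even if each stage got things right 95% of the time, and even ignoring amplification (where a garbled transcript makes the translation step worse), the chain as a whole would succeed noticeably less often.

```python
# Illustrative only: the per-stage accuracies below are made up to show
# how errors compound across a three-stage pipeline. Real accuracy varies
# enormously by language pair, domain, and audio quality.
stage_accuracy = {"ASR": 0.95, "MT": 0.95, "TTS": 0.95}

end_to_end = 1.0
for stage, accuracy in stage_accuracy.items():
    end_to_end *= accuracy

print(f"Chance all three stages are right: {end_to_end:.1%}")  # about 85.7%
```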
It’s Called Interpretation for a Reason
Although often mischaracterized as “oral or spoken translation” by the general public (admit it, you’ve made that mistake more than once), simultaneous interpretation is called “interpretation” for good reason. Think of it this way: judges are called upon to “interpret” laws. Musicians often refer to their “interpretation” of a well-known piece of music. And economists are constantly “interpreting” the latest economic data to explain consumer behavior and forecast market activity. Each of these professionals brings acquired skill, an accumulated body of knowledge, wisdom, and, dare I say, humanity to the task.
Language interpreters do the same, because their task is to interpret the meaning and intent of what a speaker has said, not the exact words and grammatical structures per se. This is the core difference between professional simultaneous interpretation and machine translation. Interpreting is meaning-based, while machine translation is based on grammar rules, dictionary definitions, statistical patterns of syntax, and other measurable, quantifiable aspects of language. All these aspects of language matter, and interpreters know them well. But the professionally trained human interpreter’s rendition is built on their “interpretation” of what the original speaker means or wants to communicate. Machine translation algorithms, by contrast, simply process the data they receive to produce a result. The computer has no way of knowing whether it got the meaning right. It cannot interpret meaning, irony, sarcasm, or intonation the way a professionally trained interpreter can.
The progress made in recent years by machine translation researchers is amazing, but the persistent errors in machine translation still range from confusing to comical to downright dangerous. The problem is that the computer doesn’t know it has produced an error, because it doesn’t know anything. If you need to find a bathroom or order a meal at a restaurant, a machine translation app is great. We just don’t recommend AI-driven technology for your important multilingual meetings…yet. But don’t hold your breath either. That’s why at KUDO we work with professional human interpreters.
That’s right, at KUDO we’re not technological naysayers. We keep an eye on technological developments and can say with confidence that speech-to-speech translation technology just isn’t there yet. When your meeting matters, stick with the professionals. You’ll be glad you did.
Would you like to see these professionals in action? Join us for one of our daily demos with live interpretation into multiple languages.