What is Neural Machine Translation?

Neural Machine Translation is a fully-automated translation technology that uses neural networks. NMT provides more accurate translation by taking into account the context in which a word is used, rather than just translating each individual word on its own.

From the earliest written languages to the present day, human translation has always been an important way to connect the world. As we continue to transition more and more of our lives online, translation has become an important way to reach large global audiences who are looking for information on the internet.

For most of history, translation was an entirely manual process that relied on human labor. While human translation continues to be the most reliable way to translate content, it takes longer and tends to be more expensive when you’re doing it for each individual piece of content. Translators can only translate so much content accurately in a given time, which means there are large volumes of content for which translation is hard to justify given the time, cost, and effort involved.

Alternative methods of translation began to appear with the advent of machine translation (MT) in the 1940s and 50s. Machine translation completely changed the way translation could be done, as it brought automation and AI to the translation process.

An Introduction to Machine Translation

At its core, machine translation is fully automated software that translates content from one language to another. A large portion of the world’s content is inaccessible to people who don’t speak its source language, and MT can translate that content faster and into more languages than human translators can. MT systems are most commonly used when there’s a lot of information that needs translation (e.g., hundreds of thousands of words or more). In those situations, traditional human translation wouldn’t be feasible due to the sheer volume of content, so we turn to AI.

There are multiple types of MT systems and different approaches to MT, most notably:

Rule-based MT (RbMT): Algorithms are created based on the grammar, syntax, and semantics of each language. Linguists write large sets of rules for each language pair (e.g., EN-ES, EN-FR). Content is then fed through these algorithms and translated into the target language.

Statistical MT (SMT): This type of MT analyzes reference texts that have already been translated and picks the translations that are statistically most likely to be appropriate. These systems can be trained faster than RbMT and, more importantly, with much less human effort.

Neural MT (NMT): While both statistical and neural MT use huge datasets of translated sentences to teach software to find the best translation, the models themselves are different. Statistical MT translates sentences by breaking them up into phrases, translating the pieces, then trying to stitch those translations back together. Neural MT, on the other hand, uses neural networks to consider whole sentences when predicting translations, which allows it to take into account the context in which each word and phrase is used.
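To make the difference between translating word by word and translating in phrases more concrete, here is a minimal Python sketch. It is only an illustration: the two lookup tables and the function names are invented for this example, and a real statistical MT system learns millions of phrase pairs, each with a probability, from translated text rather than using a hand-written table.

```python
# Toy lookup tables, invented for illustration only.
WORD_TABLE = {"good": "bueno", "morning": "mañana", "my": "mi", "friend": "amigo"}
PHRASE_TABLE = {
    ("good", "morning"): "buenos días",
    ("my", "friend"): "mi amigo",
}


def translate_word_by_word(sentence):
    """Translate each word on its own, ignoring its neighbors."""
    return " ".join(WORD_TABLE.get(w, w) for w in sentence.lower().split())


def translate_by_phrases(sentence, max_len=3):
    """Split the sentence into known phrases, translate each, and stitch them together."""
    words = sentence.lower().split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest phrase first, then fall back to shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_TABLE:
                out.append(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            # No phrase matched here: fall back to a single-word lookup.
            out.append(WORD_TABLE.get(words[i], words[i]))
            i += 1
    return " ".join(out)


print(translate_word_by_word("Good morning my friend"))
print(translate_by_phrases("Good morning my friend"))
```

The word-by-word version renders "good morning" as "bueno mañana", while the phrase version produces "buenos días". Neural MT goes a step further by considering the whole sentence at once instead of fixed chunks.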

What is a Neural Network?

A neural network is a form of machine learning in which a computer learns to perform a task by finding patterns in data from previous examples of that task. The person programming the neural network doesn’t actually define what those patterns should be - the system learns them on its own.

For example, an object recognition system is trained to detect objects in real time. A system designed for a self-driving car needs to understand every object in its view. To train it, the system is fed thousands of pre-labeled images of houses, people, trees, etc. so it can find the patterns that let it recognize those objects in real life.
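The idea of learning from labeled examples, rather than from hand-written rules, can be sketched with a single artificial neuron (a perceptron). This is a deliberately tiny stand-in: real neural networks stack many such units in layers and are trained on far more data, and the names, learning rate, and training data below are assumptions chosen for the example.

```python
# Labeled examples of the logical OR function: (inputs, expected label).
EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def predict(weights, bias, inputs):
    """Fire (return 1) if the weighted sum of the inputs is positive."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train(examples, epochs=10, lr=0.1):
    """Nudge the weights a little every time a prediction is wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            error = label - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


weights, bias = train(EXAMPLES)
for inputs, label in EXAMPLES:
    print(inputs, "->", predict(weights, bias, inputs), "(expected", label, ")")
```

Nobody tells the neuron what OR means. It starts with weights of zero and arrives at a working rule purely by correcting its own mistakes on the examples.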

When it comes to translation, neural networks are unique in their ability to determine the important language characteristics that are relevant to a translation without having a programmer indicate what's important. They’re also able to make predictions in context even in complex situations - like using an entire video and its script together instead of just an individual frame.

On the other hand, in previous rule-based systems, linguists would draft algorithms that defined the rules of translation between two languages. A source text would then be translated from the original language into a target language based on those rules. This not only took a tremendous amount of time, but also prevented the system from learning and adapting as new patterns emerged.

How does NMT Work?

The goal of any neural MT model is to take an input in one language and output it into another language. The first thing to understand is how a language input is transformed into data that can be used by the NMT model.

NMT models typically use what is called the Encoder-Decoder structure. This structure takes the content in its source language, assigns each word a numerical representation, and then produces a translation in the target language from those numerical representations.
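The first step, turning words into numbers, can be sketched in a few lines of Python. This is a simplified illustration with invented function names: production NMT systems usually split text into subword units rather than whole words, and they map each ID to a vector of many numbers rather than using the bare ID.

```python
def build_vocab(sentences):
    """Give every distinct word an integer ID; 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Turn a sentence into a list of word IDs."""
    return [vocab.get(w, vocab["<unk>"]) for w in sentence.lower().split()]


def decode(ids, vocab):
    """Turn a list of word IDs back into text."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)


vocab = build_vocab(["the cat likes to eat pizza"])
ids = encode("the cat likes pizza", vocab)
print(ids)
print(decode(ids, vocab))
```

Words the vocabulary has never seen fall back to the `<unk>` ID, which is one reason real systems prefer subword units: they can represent rare words as combinations of known pieces.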

Figure: the Encoder-Decoder structure. Source: Towards Data Science

In this visual representation of the Encoder-Decoder structure, the source sentence - “the cat likes to eat pizza” - enters the Encoder-Decoder structure on the left. The encoder's job is to transform the individual words into a full-sentence representation. The grey cloud represents the meaning of the entire sentence in its numerical form (rather than its individual words). From there, the model estimates the probability of each candidate translation based on the rest of the words in that sentence. The decoder then steps in to transform that numerical representation back into words - “el gato le gusta comer pizza”.
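The decoding step, picking the most probable next word one word at a time, can be sketched as follows. Everything here is a toy assumption: the scores are invented by hand, and this table looks only at the previous word, whereas a real decoder conditions on the encoded source sentence and on every word it has produced so far.

```python
import math

# Hypothetical scores for each candidate next word, keyed by the previous word.
NEXT_WORD_SCORES = {
    "<start>": {"el": 2.0, "gato": 0.5, "pizza": 0.1},
    "el": {"gato": 2.5, "pizza": 0.3, "<end>": 0.1},
    "gato": {"le": 2.0, "comer": 0.4, "<end>": 0.2},
    "le": {"gusta": 2.2, "comer": 0.3, "<end>": 0.1},
    "gusta": {"comer": 2.1, "pizza": 0.6, "<end>": 0.2},
    "comer": {"pizza": 2.4, "gato": 0.2, "<end>": 0.3},
    "pizza": {"<end>": 2.5, "el": 0.1, "gato": 0.1},
}


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = {word: math.exp(s) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def greedy_decode(max_steps=10):
    """Repeatedly emit the most probable next word until <end> is chosen."""
    words = []
    previous = "<start>"
    for _ in range(max_steps):
        probs = softmax(NEXT_WORD_SCORES[previous])
        best = max(probs, key=probs.get)
        if best == "<end>":
            break
        words.append(best)
        previous = best
    return " ".join(words)


print(greedy_decode())
```

With these invented scores the loop reproduces the sentence from the example above. Always taking the single best word is called greedy decoding; it is the simplest strategy, and systems can also keep several candidate sentences in play at once.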

How is NMT different?

While statistical MT also uses numerical substitution methods for translation, it doesn’t assign relationships between words. An NMT model does: if the data used to train it shows that two words are used in similar ways, the model will assign them numerical values that are closer together.

For example, an NMT model may assign the words “but” and “except” numerical values of 4.12 and 4.16 if the data shows they are used in similar ways, which makes them more likely to be interchanged. NMT models make no independence assumptions - every aspect of a translation takes into account the entire context of the sentence, including the exact words and the order in which they appear. Because of that, NMT systems tend to maintain more fluency than other machine translation models.
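In practice each word is represented by a list of numbers (a vector) rather than a single value, and "closer together" is measured with a similarity score. The sketch below uses cosine similarity on three hand-picked vectors. The vectors are invented for illustration and do not come from any trained model.

```python
import math

# Invented 3-dimensional word vectors; real models learn hundreds of dimensions.
EMBEDDINGS = {
    "but": [0.9, 0.1, 0.3],
    "except": [0.8, 0.2, 0.3],
    "pizza": [0.1, 0.9, 0.7],
}


def cosine_similarity(a, b):
    """Return a score near 1.0 when two vectors point in nearly the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


similar = cosine_similarity(EMBEDDINGS["but"], EMBEDDINGS["except"])
different = cosine_similarity(EMBEDDINGS["but"], EMBEDDINGS["pizza"])
print(f"but vs except: {similar:.2f}")
print(f"but vs pizza:  {different:.2f}")
```

Words that appear in similar contexts end up with a high similarity score, while unrelated words score much lower.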

What is Adaptive Neural MT?

Adaptive Neural MT is an NMT model that quickly adapts to translator feedback as the translators are working. Adaptation means that the system can get very specific to the translator very quickly, making the system feel more intuitive to the translator. There are two common types of adaptation: batch adaptation (training the model with Translation Memories) and online adaptation (training the model with translator feedback as they work).

The benefit of a system like Adaptive NMT is that it gets constant feedback from linguists as they do their translation work, updating quickly and learning on-the-fly. This means you don’t have to rely on retraining the baseline system, which can be expensive and complex. Because of the cost and complexity, retraining often happens far less frequently than it should in practice.
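The workflow difference between batch and online adaptation can be sketched with a toy class. This is not a neural model: `ToyAdaptiveTranslator` and its methods are hypothetical names, and it stores corrections in a lookup table, whereas a real adaptive NMT system updates the model's parameters. The point is only to show when each kind of update happens.

```python
class ToyAdaptiveTranslator:
    """Lookup-table stand-in for an MT engine, used only to show the workflow."""

    def __init__(self, baseline):
        self.baseline = dict(baseline)  # the baseline system, never retrained here
        self.adapted = {}               # what the engine has learned since

    def translate(self, source):
        # Adapted translations take priority over the baseline.
        if source in self.adapted:
            return self.adapted[source]
        return self.baseline.get(source, source)

    def adapt_batch(self, translation_memory):
        """Batch adaptation: load many stored translations in one go."""
        self.adapted.update(translation_memory)

    def adapt_online(self, source, corrected):
        """Online adaptation: learn a single translator correction immediately."""
        self.adapted[source] = corrected


engine = ToyAdaptiveTranslator({"home": "casa"})
print(engine.translate("home"))           # baseline suggestion
engine.adapt_online("home", "inicio")     # translator corrects it for a website menu
print(engine.translate("home"))           # the next suggestion reflects the correction
engine.adapt_batch({"settings": "configuración"})
print(engine.translate("settings"))
```

The baseline table is never touched, which mirrors the benefit described above: the engine improves from feedback without the baseline system being retrained.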

What translation method is right for me?

The right translation method depends on your needs. Certain translation types may be better suited for different pieces of content. For situations where speed and extremely low cost are more important than quality, like user-generated content, raw MT can suffice. For all other situations where quality matters, neural machine translation with a human-in-the-loop workflow is best. After all, human-in-the-loop is the future of MT - it combines the speed and scale of machine translation with the quality of human translators.