Machine Translation: Everything You Need to Know

As companies look for ways to better connect with a growing multilingual customer base around the world, many are turning to machine translation.

What is Machine Translation?

Machine translation is the process of converting text from one language to another using automatic translation software. A machine translator is expected to handle complex expressions and idioms, not just individual words. While the concept seems straightforward, its execution can be daunting due to differences in the syntax, semantics, and grammar of the world’s languages. Whether the translator is a human or a machine, the text needs to be broken down into base elements in order to fully extract the message and accurately restore it in the target language. That’s why it’s critical for a machine translator to capture as many of a language’s nuances as possible, including regional sub-dialects.

The choice of tool also affects the result. Given the same piece of text, two different automated translation tools may produce two different translations, because the parameters and rules governing each machine translator affect its ability to match the original text’s meaning.

The goal of machine translation is to create publishable work without the need for human intervention. Currently, machine translation software is limited and still requires a human translator to supply a baseline of content. However, advancements have allowed machine translation to pull syntax and grammar from a wider base, producing viable translations at unmatched speed.

History of Machine Translation

Automatic translation traces its origins to the work of the ninth-century Arabic cryptographer Al-Kindi. The techniques he developed for systematic language translation are still found in modern-day machine translation. After Al-Kindi, progress in automatic translation continued slowly until the 1930s. One of the field’s most notable patents came from a Soviet scientist, Peter Troyanskii, in 1933. Troyanskii presented his “machine for the selection and printing of words when translating from one language to another” at the Soviet Academy of Sciences. His machine translator consisted of a typewriter, a film camera, and a set of language cards. The translation process required a series of steps:

Step 1: A speaker of the original language organized text cards in a logical order, took a photo, and inputted the text’s morphological characteristics into a typewriter.

Step 2: The machine then created a set of frames, effectively translating the words, with the tape and camera’s film.

Step 3: Finally, an editor fluent in the target language reviewed the translation and ensured it was arranged in an accurate order.

The USSR’s Academy of Sciences dismissed Troyanskii’s invention as useless. Regardless, the scientist continued trying to perfect his machine until he passed away due to illness in 1950. His work went unrecognized until 1956, when his patent was rediscovered.

The next major advancement in machine translation occurred during the Cold War. In 1954, technology giant IBM ran an experiment in which its IBM 701 computer system produced the world’s first automatic translation of Russian text into English. The translation consisted of more than 60 Russian sentences. Upon hearing the news that the United States had developed an automatic translation system, countries across the world began investing in their own machine translators. However, twelve years later, in 1966, the U.S. Automatic Language Processing Advisory Committee (ALPAC) issued a report claiming that machine translation wasn’t worth the hefty investment, as it wasn’t effective enough to offset the cost of development. This report led to a nearly decade-long stagnation in American machine translation research.

In the years that followed, the United States took only minor steps in developing machine translation. Notable examples came from companies like Systran and Logos, which served the U.S. Department of Defense. Canada took a major step forward with its implementation of the METEO System, a machine translator that converted English weather forecasts into French for the province of Quebec. The system was used from 1981 to 2001 and translated nearly 30 million words annually.

Beyond the METEO System, the 1980s saw a surge in the advancement of machine translation. With forerunners such as Japan spearheading the effort, microcomputing allowed small translators to enter the market. Although crude by contemporary standards, they still managed to bridge the divide between speakers of two different languages.

Currently, machine translation is becoming more and more crucial for companies that want to remain relevant in the fast-changing global economy. With potential customers coming from every corner of the world, the need for multilingual websites, videos, and even audio translation is critical.

Types of Machine Translation

Foundationally, machine translation is based on linguistic rules. These rules guide the machine in processing simple word substitutions, which on their own don’t produce a high-quality translation. To make a machine translator more useful, a rules-based method is used to parse the text. The words in each line are interpreted using a vast lexicon that includes morphological, syntactic, and semantic guidelines. With enough information to create a well-rounded set of rules, a machine translator can create a passable translation from the source language to the target language, one whose intent a native speaker of the target language will be able to decipher. However, success is contingent upon having a sufficient quantity of accurate data to create a cohesive translation.

Rule-based Machine Translation (RBMT)

Rule-based machine translation emerged back in the 1970s. Scientists and researchers began developing machine translators using linguistic information about the source and target languages. They accomplished this with multilingual dictionaries, using information about the source language’s semantic, morphological, and syntactic regularities to create a translation. There are three types of RBMT systems.

Direct Machine Translation

This is the most elementary form of machine translation. Using a simple rule structure, direct machine translation breaks the source sentence into words, compares them to its dictionary, then adjusts the output based on morphology and syntax. This method is time-intensive, as it requires rules to be written for every word within the dictionary. While direct machine translation was a great starting point, it has since fallen by the wayside, replaced by more advanced techniques.

Transfer-based Machine Translation

Deviating from the direct machine translation method, the transfer-based method forgoes word-by-word translation and first works out the grammatical structure of the source text. Transfer-based machine translation is broken down into three steps:

1. Analysis: The machine analyzes the source language to identify its grammatical rule set.

2. Transfer: The sentence structure is then converted into a form that’s compatible with the target language.

3. Generation: Once a suitable structure has been determined, the machine produces a translated text.
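
The three steps of the transfer-based method can be illustrated with a toy English-to-Spanish sketch. The lexicon, the part-of-speech tags, and the single reordering rule below are hypothetical and chosen only for illustration; a real system would need far richer rules covering agreement, morphology, and exceptions.

```python
# Toy transfer-based translator (illustrative only).
# LEXICON maps an English word to (part of speech, Spanish word).
# Every entry and the one transfer rule are assumptions of this sketch.
LEXICON = {
    "the": ("DET", "el"),
    "red": ("ADJ", "rojo"),
    "car": ("NOUN", "coche"),
    "new": ("ADJ", "nuevo"),
    "book": ("NOUN", "libro"),
}


def analyze(sentence):
    """Analysis: tag each source word with its part of speech."""
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]


def transfer(tagged):
    """Transfer: English ADJ NOUN order becomes Spanish NOUN ADJ order."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


def generate(tagged):
    """Generation: emit the target-language word for each source word."""
    return " ".join(LEXICON[word][1] for word, _ in tagged)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


print(translate("the red car"))
```

Keeping the three stages as separate functions mirrors the way the method is described: the reordering rule knows nothing about vocabulary, and the lexicon knows nothing about word order.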

The transfer-based method still uses a word substitution format, limiting its scope of use. While it streamlined grammatical rules, it also increased the number of word formulas compared to direct machine translation.

Interlingual Machine Translation

Interlingual machine translation is the method of translating text from the source language into interlingua, an artificial language developed to translate words and meanings from one language to another. The process involves converting the source language into interlingua (an intermediate representation), then converting the interlingua translation into the target language. Interlingua is similar in concept to Esperanto, a third language that acts as a mediator. They differ in that Esperanto was intended to be a universal second language for speech, while interlingua was devised for the machine translator, with technical applications in mind.

This method is sometimes mistaken for a transfer-based machine translation system. However, interlingual machine translation provides a wider range of applications. Because the source text is first converted into interlingua, it can then be rendered into multiple target languages. In comparison, the transfer-based method has defined rules between language pairs, limiting the process to two languages at a time. The major benefit of interlingua is that developers only need to create rules between each language and the interlingua, rather than between every pair of languages. The drawback is that creating an all-encompassing interlingua is extremely challenging.

Pros and Cons of RBMT

While there are certain applications where RBMT is useful, there are many drawbacks inhibiting its widespread adoption. The main benefit of using an RBMT method is that the translations can be reproduced. Because the rules dictating translations account for morphology, syntax, and semantics, even if the translation isn’t clear, it will always come back the same. This allows linguists and programmers to tailor it for specific use cases in which the language is constrained and predictable. For example, weather forecasts or technical manuals could be a good fit for this method.

The main drawback of RBMT is that every language includes subtle expressions, colloquialisms, and dialects. Countless rules and thousands of language-pair dictionaries need to be factored into the application. Rules need to be constructed around a vast lexicon, considering each word’s independent morphological, syntactic, and semantic attributes. Examples include:

English: The English language is filled with irregular verbs and has three main varieties to account for: American English, British English, and Australian English. While these three varieties share a common vocabulary, each has its own list of exceptions.

Greek: Greek has a predominant SVO (subject-verb-object) syntax. However, it freely and frequently rearranges this order and drops the subject when it is implied by the context of a message.

Russian: Russian is a null-subject language, meaning that a complete sentence doesn’t necessarily need to contain a subject.

To build a functional RBMT system, the creator has to carefully consider their development plan. One option is to make a significant up-front investment in the system, allowing it to produce high-quality content at release. A progressive system is another option. It starts out with low-quality translations and becomes more accurate as more rules and dictionaries are added. While users can continually add to dictionaries and create sub-rules for each word, this isn’t a practical method. Accounting for all of these idiosyncrasies, homonyms, and phrases would require a significant investment of time.

Example-based Machine Translation (EBMT)

Example-based machine translation (EBMT) is a method of machine translation that uses side-by-side, phrase-to-phrase parallel texts (a bilingual corpus) as its core framework. Think about the famous Rosetta Stone, an ancient slab containing a decree from King Ptolemy V Epiphanes written in three separate scripts. The Rosetta Stone unlocked the secrets of hieroglyphics after their meaning had been lost for many ages. The hieroglyphics were decoded with the help of the parallel Demotic script and Ancient Greek text on the stone, the latter of which was still understood.

Japan invested heavily in EBMT in the 1980s, as it became a global marketplace for cars and electronics and its economy boomed. While the country’s financial horizons expanded, not many of its citizens spoke English, and the need for machine translation grew. Unfortunately, the existing methods of rule-based translation couldn’t produce adequate results, as the grammatical structures of Japanese and English are substantially different. To address this, in 1984, Makoto Nagao of Kyoto University proposed that a phrase-to-phrase method would produce a better translation than word-for-word translation.

With this method, the more phrases you add to the database, the easier it is for the system to find a close match and a substitute word. For example, if the simple phrase “I want to drink something” has already been converted into the target language, then translating “I want to eat something” doesn’t require the full sentence to be translated word-for-word. Only the substitution word, “eat,” needs to be found in the dictionary. With EBMT, you only need to decipher the differences between phrases, look up the unknown words, and hope an exception doesn’t exist. This method greatly enhanced the accessibility of machine translation, because complex language rules are generally already built into each phrase.

Statistical Machine Translation (SMT)

Around a half-decade after the introduction of EBMT, IBM’s Thomas J. Watson Research Center showcased a machine translation system entirely different from both the RBMT and EBMT systems. The SMT system doesn’t rely on rules or linguistics for its translations. Instead, the system approaches language translation through the analysis of patterns and probability.

The SMT system starts from a language model that calculates the probability of a phrase being used by a native speaker. It then matches two languages that have been split into words, comparing the probability that a specific meaning was intended. For instance, the SMT will calculate the probability that the Greek word “γραφείο (grafeío)” is supposed to be translated into either the English word “office” or “desk.” This methodology is also used for word order. The SMT will assign a higher probability to the phrase “I will try it” than to “It I will try.”

Keep in mind that decisions like using the word “office” when translating “γραφείο” weren’t dictated by specific rules set by a programmer. Translations are based on the context of the sentence. The machine determines that if one form is more commonly used, it’s most likely the correct translation.

The SMT method proved significantly more accurate and less costly than the RBMT and EBMT systems. The system relied upon large amounts of text to produce viable translations, so linguists weren’t required to apply their expertise. The beauty of a statistical machine translation system is that when it’s first created, all translations are given equal weight. As more data is entered into the machine to build patterns and probabilities, the potential translations begin to shift. This still leaves us wondering: how does the machine know when to convert the word “γραφείο” into “desk” instead of “office”? This is where the subdivisions of SMT come in.
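
The shift from equal weights to data-driven probabilities can be sketched with a tiny word-level model. The example below is only an illustration: the three-sentence Greek-English corpus is made up, and the training procedure (expectation-maximization over word co-occurrences, in the spirit of the earliest word-based models) is an assumption of this sketch rather than a description of any production system.

```python
from collections import defaultdict

# Hypothetical toy corpus: (Greek source, English target) sentence pairs.
CORPUS = [
    ("το γραφείο", "the office"),
    ("το βιβλίο", "the book"),
    ("ένα βιβλίο", "a book"),
]


def train_word_model(corpus, iterations=20):
    """Estimate t[source_word][target_word] with expectation-maximization."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    target_vocab = {word for _, tgt in pairs for word in tgt}
    uniform = 1.0 / len(target_vocab)
    # At first, every candidate translation is given equal weight.
    t = defaultdict(lambda: defaultdict(lambda: uniform))

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        for src, tgt in pairs:
            for e in tgt:
                # Split the evidence for target word e among the source
                # words of the same sentence, in proportion to current beliefs.
                norm = sum(t[f][e] for f in src)
                for f in src:
                    share = t[f][e] / norm
                    count[f][e] += share
                    total[f] += share
        # Re-estimate the probabilities from the accumulated counts.
        t = defaultdict(lambda: defaultdict(float))
        for f, row in count.items():
            for e, c in row.items():
                t[f][e] = c / total[f]
    return t


probabilities = train_word_model(CORPUS)
for word, p in sorted(probabilities["γραφείο"].items(), key=lambda kv: -kv[1]):
    print(f"γραφείο -> {word}: {p:.3f}")
```

No rule ever states that “γραφείο” means “office.” The model arrives there because “το” is better explained by “the,” which appears alongside it in two sentences, leaving “office” as the most likely partner for “γραφείο.”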

Word-based SMT

The first statistical machine translation system presented by IBM, called Model 1, split each sentence into words. These words would then be analyzed, counted, and given weight compared to the other words they could be translated into, without accounting for word order. To enhance this system, IBM then developed Model 2. This updated model considered syntax by memorizing where words were placed in a translated sentence. Model 3 further expanded the system by incorporating two additional steps. First, NULL token insertion let the SMT account for words in the output that have no direct counterpart in the source sentence. The second step dictated the choice of the grammatically correct word for each token-word alignment. Model 4 refined how word arrangement was handled. As languages can have varying syntax, especially when it comes to adjective and noun placement, Model 4 adopted a relative order system. While word-based SMT overtook the previous RBMT and EBMT systems, the fact that it would almost always translate “γραφείο” to “office” instead of “desk” meant that a core change was necessary. As such, it was quickly overtaken by the phrase-based method.

Phrase-based SMT

The updated, phrase-based statistical machine translation system has similar characteristics to the word-based translation system. But, while the latter splits sentences into word components before reordering and weighing the values, the phrase-based system’s algorithm works with groups of words. The system is built on contiguous sequences of “n” items from a block of text or speech. In computational linguistics, these blocks of words are called n-grams. The goal of the phrase-based method is to expand the scope of machine translation to incorporate n-grams of varying lengths. However, the machine doesn’t compute n-grams in the same way that we process phrases. Instead of working with linguistic phrases, as we do in normal speech, the machine applies its statistical ranking to whatever word sequences occur in the data, as normal phrases are not always constructed using standard syntax.

With these additions in place, machine translation improved noticeably. This method was rapidly adopted by major tech companies such as Google, Microsoft, and Yandex. For over a decade, phrase-based machine translation was the standard in language translation, making nearly every other method obsolete.

Syntax-based SMT

Another form of SMT was syntax-based, although it failed to gain significant traction. The idea behind a syntax-based system is to combine an RBMT with an algorithm that breaks a sentence down into a syntax tree or parse tree. This method sought to resolve the word alignment issues found in other systems.

Disadvantages of SMT

One of the main disadvantages that you’ll find in any form of SMT is that if you’re attempting to translate text that is different from the core corpora the system is built on, you’ll run into numerous anomalies. The system will also strain as it tries to rationalize idioms and colloquialisms. This approach is especially disadvantageous when it comes to translating obscure or rare languages. An SMT’s inability to successfully translate casual language limits its market reach outside of specific technical fields. And while SMT is far superior to RBMT, errors in an RBMT system could be readily identified and remedied. SMT systems are significantly harder to fix if you detect an error, as the whole system needs to be retrained.

Neural Machine Translation (NMT)

Phrase-based SMT systems reigned supreme until 2016, at which point several companies switched their systems to neural machine translation (NMT). Operationally, NMT isn’t a huge departure from the SMT of yesteryear. The advancement of artificial intelligence and the use of neural network models allow NMT to bypass the need for the proprietary components found in SMT. NMT works by accessing a vast neural network that’s trained to read whole sentences, unlike SMTs, which parsed text into phrases. This allows for a direct, end-to-end pipeline between the source language and the target language.

These systems have progressed to the point that recurrent neural networks (RNNs) are organized into an encoder-decoder architecture, which lets the system map an input sequence of one length to an output sequence of another. This architecture works by encoding the source language into a context vector, a fixed-length representation of the source text. The neural network then uses a decoding system to convert the context vector into the target language. Simply put, the encoding side creates a description of the source text: size, shape, action, and so forth. The decoding side reads the description and translates it into the target language. Because a single fixed-length vector struggles to hold everything in a long passage, many NMT systems have an issue with long sentences or paragraphs. To address this, companies such as Google have developed encoder-decoder RNN architectures with attention. The attention mechanism lets the model focus on the most relevant source words as each part of the output sequence is decoded.

Google isn’t the only company to adopt RNNs to power its technology. Apple uses RNNs as the backbone of Siri’s speech recognition software. This technology is continually expanding. Originally, an RNN was mono-directional, considering only the word before the keyed word. Then it became bi-directional, considering both the preceding and succeeding words. Eventually, NMT overtook the capabilities of phrase-based SMT.
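
The attention idea described above can be sketched in a few lines. This is a generic dot-product attention step written in plain Python, not the architecture of any particular vendor; the vectors are made-up numbers standing in for the encoder’s per-word states and the decoder’s current state.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for stability
    total = sum(exps)
    return [x / total for x in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step."""
    # Score each source position by its similarity to the decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    # The context vector is the weighted average of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical 2-dimensional states for a three-word source sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]

weights, context = attend(decoder_state, encoder_states)
print("attention weights:", [round(w, 3) for w in weights])
print("context vector:", [round(c, 3) for c in context])
```

Instead of squeezing the whole sentence into one fixed-length vector, the decoder builds a fresh context vector at every step, weighted toward the source words that matter most for the word it is about to produce.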

NMT began producing output text that contained fewer than half of the word order mistakes and almost 20% fewer word and grammar errors than SMT translations. NMT is built with machine learning in mind. The more corpora fed into the RNN, the more adaptable it becomes, resulting in fewer mistakes.

One of the main advantages of NMT over SMT systems is that translating between two languages other than English doesn’t require English as an intermediary. With SMT, the source language was typically first converted to English before being translated into the target language. This method led to a loss in quality from the original text to the English translation and additional room for error in the translation from English to the target language.

The NMT system is further enhanced by its crowdsourcing feature. When users interact with Google Translate online, they are given a primary translation with a few other potential translations. As more people choose one translation over the other, the system begins to learn which output is the most accurate. This means that linguists and developers can step back and let the community optimize the NMT.

Disadvantages of NMT

It’s easy to see why NMT has become the gold standard when it comes to casual translation. It’s fast, efficient, and constantly growing in capability. The main issue is its cost. NMTs are incredibly expensive compared to the other machine translation systems. They also require more training than their SMT counterparts, and you’ll still run into issues when dealing with obscure or fabricated words. Apart from these drawbacks, it seems that NMT will continue to lead the industry.

Hybrid Machine Translation

To mitigate some of the more common issues found within any single machine translation method, developers have tried combining certain functions, or entire systems, into hybrid approaches.

Multi-Engine

A multi-engine approach runs two or more machine translation systems in parallel. The target language output is a combination of the systems’ final outputs.

Statistical Rule Generation

The statistical rule generation approach uses accumulated statistical data to generate a set of rules. The core principle behind this approach is to create a linguistic rule structure similar to an RBMT by using a training corpus, as opposed to a team of linguists. The drawback of this system is the same as that of a standard SMT: the quality of the output depends on how similar the input is to the text in the training corpus. While this makes it an excellent choice for a narrow field or scope, it will struggle and falter if applied to different domains.

Multi-Pass

A multi-pass approach is an alternative take on the multi-engine approach. The multi-engine approach runs the source text through parallel machine translators to create a translation, while the multi-pass system translates the source text serially. The source text is first processed by an RBMT system, and the result is handed to an SMT to create the target language output.

Confidence-Based

The confidence-based method approaches translation differently from the other hybrid systems, in that it doesn’t always use multiple machine translators. This type of system normally runs the source text through an NMT, and the output is given a confidence score indicating its probability of being a correct translation. If the confidence score is satisfactory, that output is used. Otherwise, the source text is handed to a separate SMT.
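
The confidence-based routing described above can be sketched as a small wrapper around two engines. Everything here is hypothetical: the function name, the 0.8 threshold, and the two stub engines with their made-up outputs and scores simply stand in for a real NMT system and a real SMT system.

```python
from typing import Callable, Tuple

# An engine is any callable returning (translation, confidence in [0, 1]).
Engine = Callable[[str], Tuple[str, float]]


def translate_with_fallback(text: str, primary: Engine, fallback: Engine,
                            threshold: float = 0.8) -> Tuple[str, str]:
    """Return (translation, engine_used), falling back on low confidence."""
    translation, confidence = primary(text)
    if confidence >= threshold:
        return translation, "primary"
    # The primary output was not trusted, so ask the second engine.
    translation, _ = fallback(text)
    return translation, "fallback"


# Stand-in engines with made-up outputs and scores, for demonstration only.
def stub_nmt(text: str) -> Tuple[str, float]:
    known = {"good morning": ("buenos días", 0.95)}
    return known.get(text, ("???", 0.10))


def stub_smt(text: str) -> Tuple[str, float]:
    known = {"good night": ("buenas noches", 0.70)}
    return known.get(text, ("???", 0.0))


print(translate_with_fallback("good morning", stub_nmt, stub_smt))
print(translate_with_fallback("good night", stub_nmt, stub_smt))
```

The second engine is only invoked when the first one is unsure, which is what distinguishes this approach from multi-engine systems that always run every engine.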

Why Companies Use Machine Translation

Companies these days need to address a global market. They need access to translators that can produce copy in multiple languages, faster and with fewer errors. That’s why they’re turning to machine translation. Through machine translation, companies can localize their e-commerce sites or create content that can reach a world audience. This opens up the market, ensuring that:

- Revenue increases.

- Customers are happier.

- Overhead decreases.

- Go-to-market strategy is implemented faster.

Normally, companies have to choose between quality, efficiency, and price. Luckily, with Lilt, you don’t need to make that sacrifice. If you want to see how your business can perform on the world stage, Lilt’s NMT technology will help you localize your sites faster, better, and at a lower cost. With Lilt, you have access to the world’s best human translators and the top AI-powered neural machine translation system. We want your company to grow without changing the way you do business, so we’ve designed our translation services to integrate effortlessly into your current workflow. Lilt’s translation specialists work with your team to make any necessary adjustments, so you can focus on what you do best. To learn more about how Lilt can supercharge your localization, request a demo today!
