Tackling a Localisation Myth – I – Machine Translation is Upon Us
Let me set the scene. I recently had a three-way conversation with a highly valued, long-established customer and a senior executive from a machine translation technology company. The meeting was called to discuss the possibility of using machine translation to help bring down the costs of localisation (currently a $4m annual spend for my customer). It was prompted by a senior executive at the customer's company, who had heard at a recent conference (as one does!) of the enormous and instant savings that can be made in this area. My customer contact was simply told to “make it so”.
After 45 or so minutes of a concerted and collaborative effort by the MT company representative and myself to convince the customer that her material was neither suitable nor large enough to warrant even the investment of a pilot project (I am not joking here!), she decided to go ahead and spend the $20,000 it would cost to do such a pilot anyway.
Such was the power of the original suggestion that a usually controlled and circumspect professional was willing to disregard the people she knew would know better and reach for the fantasy world of “free” high-quality translation. Wishful thinking in the extreme.
Maybe you have been asked to investigate machine translation as a potential cost-saving vehicle for your company. Or maybe you’ve seen enough of the web-based free apps (here’s one!) that accurately translate “can you direct me to the nearest train station” into almost any language to believe that it’s not a leap from that to the full-blown free translation of your UI, Help and Documentation. Or maybe you’ve just seen enough evolution in technology in the last five years to make you believe that this must be possible soon, if not now.
Whatever your reasoning, here are the “undeniable truths” that you’ll have to get your head round before you can even begin to think about putting any plan into action:
- There are companies far, far bigger than yours that have invested millions of dollars in machine translation over the last ten years and will still admit to being some way from having a viable solution (certainly in a general sense). Microsoft is one (if you’re looking for one!).
- Any machine translation system needs a whole lot of coaching and coaxing to even begin to generate results. By this I mean it needs hundreds of thousands of paired segments (or matches) so it can learn what to do and what not to do. You need to pay for this coaching, most likely by the hour, and you’ll need to do it for every language you want to make savings on.
- A typical “industry standard” minimum source word count needed to justify any savings is 1 million words.
- Some language pairings are more “advanced” than others. This is because some languages (Asian languages like Simplified Chinese, for example) are simpler in construct (lack of gender and cases). German, for example, is much more difficult to program for. On an interesting side note, Arabic to English is a good pairing, following the multi-million investment by the US government as part of its fight against terrorism.
- Once a machine translation system generates a translation, it needs to be post-edited (by a human!). Basically, the task of the post-editor is to edit, modify and/or correct pre-translated text that has been processed by a machine translation system. More on post-editing can be found here if you’re interested. Suffice it to say that the “better” translators tend to avoid post-editing duties, with some preferring to work from the original source rather than trying to decipher the machine output to get a sense of what the text is saying.
- In all likelihood the adoption of any machine translation system will result in restrictions being placed on your source material. Machine translation systems need simple constructs (or at least as simple as possible) to generate results. You need to know that the quality of your source material will be affected accordingly.
- On a related point, the subject matter needs to be “straightforward”. If your material is in any way complex you’re probably wasting your time.
- On a positive note (I need to appear balanced on this subject!), some companies have found that they benefit from machine translation in applications where the material will never, or only very rarely, be seen by customers. Knowledge base systems, for example a company-wide HR database or something similar that is for internal use only, might lend themselves to it.
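To put the points about coaching costs, minimum volumes and post-editing into rough numbers, here is a minimal Python sketch of a break-even calculation. Everything in it is an assumption for illustration only: the function name `break_even_words`, the set-up cost and the per-word rates are hypothetical and do not come from any vendor or from my customer. Amounts are in cents so the arithmetic stays exact.

```python
from typing import Optional


def break_even_words(setup_cost_cents: int,
                     human_rate_cents: int,
                     post_edit_rate_cents: int) -> Optional[float]:
    """Source words needed, per language, before MT plus post-editing
    becomes cheaper than plain human translation.

    setup_cost_cents: hypothetical one-off cost of coaching the system
    human_rate_cents: hypothetical per-word cost of human translation
    post_edit_rate_cents: hypothetical per-word cost of post-editing MT output
    """
    saving_per_word = human_rate_cents - post_edit_rate_cents
    if saving_per_word <= 0:
        # Post-editing costs as much as translating, so the set-up
        # cost is never recovered.
        return None
    return setup_cost_cents / saving_per_word


# Hypothetical figures: $50,000 set-up, 20 cents a word for human
# translation, 15 cents a word for post-editing.
words = break_even_words(5_000_000, 20, 15)
print(f"Break-even volume per language: {words:,.0f} source words")
```

The calculation has to be repeated for every target language, because the set-up cost is paid once per language. A smaller per-word saving, or a pricier coaching phase, pushes the break-even volume up quickly.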
In summary, my advice to anyone who is considering a machine translation trial is to talk to people who have benefited from it. Be suspicious. Assume what you’re being sold is being talked up. Get references, and be sure to check that their material is similar to yours, or at least has a similar application.
And even then you should realise that it’s a long, hard and expensive road before those machines start spitting out your material in multiple languages.
At least that’s how it looks from where I’m sitting…


@Glyn: In all likelihood the adoption of any machine translation system will result in restrictions being placed on your source material. Machine translation systems need simple constructs (or at least as simple as possible) to generate results. You need to know that the quality of your source material will be affected accordingly.
For many types of text, restrictions improve the quality of the text. For example, for technical documentation, complex sentence structures and synonyms are not good.
@Glyn: On a related point, the subject matter needs to be “straightforward”. If your material is in any way complex you’re probably wasting your time.
In what way is a ‘straightforward’ subject different from a complex subject?
The subject matter is separate from the language structure. You can use a simple language structure to explain a complex subject such as nuclear physics.