How do you measure the quality of a translation if you don't understand the translated text?
Kirti Vashee gave his answer in a recent guest post on the Tomedes Blog.
Kirti advocates the use of BLEU. Essentially, the BLEU metric measures how many words in a machine translation overlap with a human reference translation, giving extra credit to sequential words (matching n-grams). BLEU scores a translation on a 0 to 1 scale: the higher the score (i.e., the closer to 1), the more overlap there is with the human reference translation and thus the better the translation is judged to be.
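To make the mechanics concrete, here is a minimal sketch of the idea in Python. This is a toy, single-reference version written for illustration; production scoring tools such as NLTK's `sentence_bleu` or sacreBLEU add smoothing and support multiple references.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Toy single-reference BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # "Modified" precision: each reference n-gram may be matched at most
        # as many times as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        precisions.append(overlap / total if total else 0.0)
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = (1.0 if len(candidate) >= len(reference)
          else math.exp(1 - len(reference) / len(candidate)))
    return bp * geo_mean

reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
print(round(bleu(candidate, reference), 3))  # ≈ 0.537
```

Note how brittle the sentence-level score is in this unsmoothed form: a candidate with no matching 4-gram at all scores exactly zero, no matter how good the shorter matches are. That brittleness is one reason the metric is usually reported over whole test sets rather than single sentences.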
He is not minimizing the challenge of measuring translation quality:
> Anybody who has tried to measure translation quality will understand the difficulty of doing this in a way that has any general credibility. Developers of statistical machine translation systems, in particular, have to grapple with this issue on a constant basis to understand how to evolve the state of the technology.

And while there may be a role for the BLEU metric, there are clearly some shortcomings as well. Kirti's post, a recent academic paper [PDF link], and a newspaper article all highlight serious problems and limitations of the BLEU approach.
What's interesting is that all of these discussions are moving away from the goal of "perfect" machine translation output. Academics, computational linguists, and software engineers are doing a nice job of resetting expectations to "good enough" translation output.
Three more to read:
- What constitutes “good enough” in (machine) translation?
- Why measure translation quality?
- ASTM F 2575-06: A Practical Guide for Achieving Translation Quality