Primer: Translation memory vs. glossary

Like most industries, the translation business is full of jargon, acronyms, and expressions that are weird and confusing to lay people (L10N, anyone?).

But few concepts are as confusing to newcomers as translation memory and glossary. If I had a dime for every time these terms were used incorrectly (usually one used to mean the other), I wouldn't be writing this blog today. So here is a look at some of the differences between translation memories and terminology glossaries.

What is translation memory?
A translation memory (abbreviated as "TM") is a database of translation segment pairs: each pair consists of a source-language segment and its target-language translation. A segment is usually a sentence but can also be a sub-heading, a table cell, or another block of text.

Unlike machine translation, a translation memory is not an automated translation tool: it has no linguistic "knowledge". A TM is made up of previously translated material, so it becomes more useful as more segments are added to it.

How does a TM work?
When linguists translate a document, they work with a TM open. If there is no existing content for the job (for example, for a new client), they open a blank TM. Working in the TM produces what is called an "unclean" or "bilingual" file, which contains both the source text and the translation. Once the translation is complete, the unclean file is "cleaned": the source text (English, for example) is removed, leaving just the translation, and the TM is populated with the pairs of segments. Repeating this with every translated file makes the TM larger and more useful.
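To make the cleaning step concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a bilingual file can be modelled as a list of (source, target) pairs and a TM as a dictionary from source segment to target segment; real CAT tools use their own file formats and databases, and the function name `clean` is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a bilingual ("unclean") file as (source, target) pairs.
Segment = Tuple[str, str]


def clean(bilingual: List[Segment], tm: Dict[str, str]) -> List[str]:
    """Store each segment pair in the TM and return only the translations."""
    target_only: List[str] = []
    for source, target in bilingual:
        tm[source] = target  # populate the TM; a repeated source keeps its latest translation
        target_only.append(target)  # drop the source, keep the translation
    return target_only


tm: Dict[str, str] = {}  # blank TM, e.g. for a new client
bilingual_file = [
    ("Accident Investigations", "Enquêtes sur les accidents"),
    ("Accident Management", "Gestion des accidents"),
]
clean_file = clean(bilingual_file, tm)
print(clean_file)
print(len(tm), "segment pairs stored")
```

Each further file cleaned into the same `tm` dictionary adds more pairs, which is the sense in which the TM grows more useful over time.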

TMs work on a segment-by-segment basis. The TM application looks to see if there is a match between a source segment in the text to be translated and a source segment that was previously stored in the TM database. If there is a match, then it presents this information in a log file (this is called an "analysis"). The number of matches it finds is called "leveraging".

An exact match is called a 100% match. Segments that almost match are called fuzzy matches; they range from below 75% up to 99%. Segments that are brand new are called no matches. If a segment occurs more than once in the file(s) to be translated, its additional occurrences are called repetitions. TM tools can analyze multiple files at once, so leveraging is measured across a whole project rather than just one file.
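The categories above can be illustrated with a rough analysis sketch in Python. The similarity score comes from the standard library's `difflib.SequenceMatcher.ratio()`, which is only a stand-in: commercial TM tools use their own matching algorithms, so their percentages will differ. The function names and the `min_fuzzy` cut-off of 0.5 are hypothetical choices for this example.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List


def best_score(segment: str, tm: Dict[str, str]) -> float:
    """Highest similarity (0.0 to 1.0) between the segment and any TM source segment."""
    return max((SequenceMatcher(None, segment, source).ratio() for source in tm), default=0.0)


def analyze(files: List[List[str]], tm: Dict[str, str], min_fuzzy: float = 0.5) -> Counter:
    """Count segments per match category across every file in a project.

    min_fuzzy is a hypothetical cut-off: anything scoring below it is a no match.
    """
    counts: Counter = Counter()
    seen = set()
    for segments in files:  # one pass over all files = project-wide leveraging
        for segment in segments:
            if segment in seen:  # already occurred earlier in the project
                counts["repetition"] += 1
                continue
            seen.add(segment)
            if segment in tm:  # identical source segment already stored
                counts["100% match"] += 1
            elif best_score(segment, tm) >= min_fuzzy:
                counts["fuzzy match"] += 1
            else:
                counts["no match"] += 1
    return counts


tm = {
    "Accident Investigations": "Enquêtes sur les accidents",
    "Accident Management": "Gestion des accidents",
}
project = [
    ["Accident Management", "Accident Investigation", "Device labeling"],
    ["Accident Management"],  # the second file repeats a segment from the first
]
report = analyze(project, tm)
print(report)
```

Here "Accident Investigation" differs from a stored segment by one letter, so it lands in the fuzzy category, while "Device labeling" has nothing similar in the TM.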

While working on a file, the translator can then accept a translation, replace it with a new one, or edit it to match the source. Some "high fuzzy" matches (i.e., those with a 95-99% match level) will require only a little editing, whereas no matches will require a brand new translation.

Industry standards for charging for TM matches
There are some informal standards on how clients get charged for repeated text, 100% matches and fuzzy matches. Usually, repetitions and 100% matches are charged at a rate of 10% of the full per-word rate. This ensures that a linguist reviews this text (important because even though the leveraged text might not change, the context around it could have changed).

For fuzzy matches the charge to the client depends on the degree of the fuzzy match: high fuzzy matches (95%-99%) are usually charged at a lower percentage than low fuzzy matches (75%-85%). No matches are charged at the full per-word rate.
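The pricing scheme above amounts to a weighted word count. This sketch applies the 10% figure for repetitions and 100% matches and the full rate for no matches, as described above; the fuzzy bands, their percentages, the per-word rate, and the word counts are hypothetical placeholders, since actual figures vary by provider and contract.

```python
# Share of the full per-word rate charged for each match category.
# Repetitions, 100% matches and no matches follow the text above;
# the fuzzy bands and their percentages are HYPOTHETICAL placeholders.
RATE_FACTORS = {
    "repetition": 0.10,
    "100% match": 0.10,
    "fuzzy 95-99%": 0.30,  # hypothetical
    "fuzzy 85-94%": 0.50,  # hypothetical
    "fuzzy 75-84%": 0.70,  # hypothetical
    "no match": 1.00,
}


def weighted_words(word_counts: dict) -> float:
    """Convert raw word counts per category into full-rate-equivalent words."""
    return sum(words * RATE_FACTORS[category] for category, words in word_counts.items())


# Hypothetical analysis result for one project (words per category).
analysis = {"repetition": 200, "100% match": 1000, "fuzzy 95-99%": 300, "no match": 1500}
full_rate = 0.20  # hypothetical per-word rate, in any currency
weighted = weighted_words(analysis)
total = weighted * full_rate
print(f"{weighted:.0f} weighted words, total {total:.2f}")
```

With these assumed figures, 3,000 raw words are billed as the equivalent of 1,710 words at the full rate.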

What is the difference between a TM and a glossary?
While TMs work at the sentence (segment) level, glossaries are lists of individual terms or short expressions. Here is an example of English-French entries:

English | French
Accident Investigations | Enquêtes sur les accidents
Accident Management | Gestion des accidents
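Because a glossary is a list of terms rather than sentences, a simple way to picture it is as a lookup table. The following hedged Python sketch stores the two entries above in a dictionary and finds which of them occur in a segment; the `find_terms` helper and the test sentence are hypothetical, and real terminology tools handle inflections, variants, and metadata that this ignores.

```python
from typing import Dict

GLOSSARY: Dict[str, str] = {
    "Accident Investigations": "Enquêtes sur les accidents",
    "Accident Management": "Gestion des accidents",
}


def find_terms(segment: str, glossary: Dict[str, str]) -> Dict[str, str]:
    """Return the glossary entries whose source term occurs in the segment (case-insensitive)."""
    lowered = segment.lower()
    return {term: target for term, target in glossary.items() if term.lower() in lowered}


hits = find_terms("Accident management procedures are reviewed every year.", GLOSSARY)
print(hits)
```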

A glossary is independent of a TM. Some device and pharmaceutical companies maintain terminology glossaries by product line, department, or division, or company-wide. Some glossaries are short (around 50 entries), but they can run into the hundreds of entries.

One of the big differences is what gets included. TMs usually include anything and everything that has been translated: the more, the better.

Glossaries, on the other hand, should be built selectively. For example, clients with software applications sometimes think that the glossary should contain all software strings, and maybe even error messages, status messages, and the like. Including all of that makes the glossary unwieldy for translators and unmanageable for the client.

Linguists generally use glossaries alongside a TM. For example, if a translator comes across a 75% match but some of the terminology differs, they may find the correct terminology in the glossary. Even without a TM, a glossary can be very useful, especially for technical pieces: if the terminology is already translated, the linguist only has to translate around these terms, which makes the job easier and faster. It is very important to keep the glossary and the TM in concordance with each other (i.e., the TM should not contain terminology that differs from the glossary).
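One way to picture the TM/glossary consistency check is the Python sketch below. It flags TM entries whose source contains a glossary term but whose target does not contain the approved translation. The helper name and the second TM entry (a deliberately inconsistent translation) are hypothetical, and plain substring matching is a simplification that will miss inflected forms.

```python
from typing import Dict, List, Tuple

GLOSSARY = {
    "Accident Investigations": "Enquêtes sur les accidents",
    "Accident Management": "Gestion des accidents",
}

TM = {
    "Accident Management": "Gestion des accidents",
    "Accident Management Plan": "Plan de gestion des incidents",  # hypothetical, inconsistent
}


def inconsistent_entries(tm: Dict[str, str], glossary: Dict[str, str]) -> List[Tuple[str, str]]:
    """List (TM source, glossary term) pairs where the approved translation is missing."""
    problems = []
    for source, target in tm.items():
        for term, approved in glossary.items():
            # Case-insensitive substring test: a simplification, no stemming or word boundaries.
            if term.lower() in source.lower() and approved.lower() not in target.lower():
                problems.append((source, term))
    return problems


problems = inconsistent_entries(TM, GLOSSARY)
print(problems)
```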

Similarities between TMs and glossaries
To be effective, both TMs and glossaries need to be managed. It is not enough to create a TM and assume it will never need to be touched again. Like any database, glossaries and TMs need to be updated and cleaned as new terms and expressions emerge, or as languages themselves change.
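As one illustration of what "cleaning" a TM can involve, this Python sketch lists source segments that are stored with more than one distinct translation, which a terminologist might then review. The entry list, including the outdated second translation, is hypothetical, as is the function name; real TM maintenance covers much more than this single check.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def conflicting_translations(entries: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return source segments stored with more than one distinct translation."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    for source, target in entries:
        targets[source.strip()].add(target.strip())  # ignore stray leading/trailing spaces
    return {source: sorted(found) for source, found in targets.items() if len(found) > 1}


entries = [
    ("Accident Management", "Gestion des accidents"),
    ("Accident Management", "Gestion des incidents"),  # hypothetical outdated entry
    ("Accident Investigations", "Enquêtes sur les accidents"),
]
conflicts = conflicting_translations(entries)
print(conflicts)
```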

Larger drug and device companies have staff dedicated to this. Smaller companies and departments usually include this as part of the solution that they expect from their translation service provider.

ForeignExchange Translations provides specialized medical translation services to the world's leading pharmaceutical and medical device companies.


  1. Hynek said...
    Good post. Some translators still think CAT tools translate individual words.

    But I do not agree with the following statement: Usually, repetitions and 100% matches are charged at a rate of 10% of the full per-word rate. Is your proofreading rate 10% of your translation rate?
    Paulina said...
    That was a nice sum-up! It should be added, though, that thanks to the "concordance" option, a TM can be searched for terms and thus used as a glossary (to some extent). This function displays segments that contain the searched term and translations of those segments.
    ForeignExchange Translations said...
    Paulina: You are right about the concordance search feature. Very handy and sort of a "cross over" feature.

    Hynek: It depends on the language but roughly, yes, proofreading is about 10% of the translation rate.

    100% matches, in particular, present unique challenges:

    They are really what TMs are "all about" and should be of the highest quality. But how do you maintain that quality? Like any database, a TM degrades in quality over time. Who performs maintenance, how, and when?

    You might also want to look at "100% text repetitions: To review or not to review" and "What you need to know about translation memories".
    AtalantasWeb said...
    Excellent post. I'm a newbie and this entry comes in very handy.

