Prepping messy files for TM usage
Written by ForeignExchange Translations on Thursday, December 10, 2009
Translation memories are great - IF the TM databases are clean and IF the source files are well structured. Unfortunately, that is not always the case.
Luckily, Dave Turner has taken it upon himself to remedy this situation. He has developed two terrific tools and made them available free of charge.
The first is his CodeZapper macro, which can be used to remove rogue codes. For example, it moves place markers from the middle of words to the end of the paragraph.
Dave then followed this up with Format Fixer, which:
- deletes leading spaces and tabs inserted typewriter-style to indent text, and sets the equivalent indent,
- deletes excess spaces between words,
- deletes excess paragraph marks and sets the equivalent vertical spacing,
- attempts to correct frequent punctuation errors (for example, a space before a comma or inside a parenthesis),
- tries to fix PDF-converted files (removes hard and soft returns to make text wrap properly),
- adds a space between a number and a letter, as in 20ohm, 10daN -> 20 ohm, 10 daN.
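To make a few of these clean-ups concrete, here is a minimal Python sketch of three of them applied to plain text: collapsing excess spaces, fixing spaces around commas and parentheses, and separating a number from the unit that follows it. This is not Dave Turner's code; the function names and regular expressions are assumptions made for illustration, and the real Format Fixer also handles indents and paragraph spacing that plain text cannot represent.

```python
import re


def collapse_spaces(text):
    """Replace runs of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


def fix_punctuation(text):
    """Remove a space before a comma and just inside parentheses."""
    text = re.sub(r"[ \t]+,", ",", text)    # "word ," -> "word,"
    text = re.sub(r"\([ \t]+", "(", text)   # "( word" -> "(word"
    text = re.sub(r"[ \t]+\)", ")", text)   # "word )" -> "word)"
    return text


def space_number_unit(text):
    """Insert a space between a digit and the letter right after it.

    Naive on purpose: it would also split strings such as "3D" or "1st",
    so a real tool would need a list of exceptions.
    """
    return re.sub(r"(\d)([A-Za-z])", r"\1 \2", text)


def tidy(text):
    """Run the three hypothetical clean-up steps in sequence."""
    return space_number_unit(fix_punctuation(collapse_spaces(text)))


if __name__ == "__main__":
    print(tidy("The load is  10daN ( max ) , at 20ohm"))
```

Running each step as a separate function keeps the rules easy to switch off one at a time, which matters when a rule such as the number/letter split produces false positives in a particular file.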
[Thanks to Kevin Lossner's blog for the tip!]
UPDATE 2010-01-02: Dave Turner published an updated version of CodeZapper that features improvements in the PDFTidy and PDFFix routines. PDFTidy should do a better job of tidying up PDF-converted files before CZ is run. PDFFix should do a better job of eliminating stubborn rogue codes, especially in PDF-converted files.
For more neat TM tools, check out these resources:
- CSVConverter is a free utility that converts glossaries stored in CSV (Comma Separated Values) format to the TMX (Translation Memory eXchange) standard.
- We recently looked at different options for taking the pain out of translating text in images.
- And if you are looking for still more tools, check out the mother lode of free localization tools.
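For readers curious what a CSV-to-TMX conversion like the one mentioned above involves, the following Python sketch shows the general idea. It does not reflect how CSVConverter itself works. It assumes a two-column glossary with no header row (source term, then target term), and the function name and the header attribute values are placeholders chosen for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree writes this namespaced key out as the "xml:lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def csv_to_tmx(csv_text, src_lang, tgt_lang):
    """Convert two-column CSV text (source, target) into a TMX string."""
    root = ET.Element("tmx", {"version": "1.4"})
    # Header values below are illustrative placeholders, not a recommendation.
    ET.SubElement(root, "header", {
        "creationtool": "csv-to-tmx-sketch",
        "creationtoolversion": "0.1",
        "segtype": "phrase",
        "o-tmf": "csv",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")

    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        # One translation unit per glossary entry, one variant per language.
        tu = ET.SubElement(body, "tu")
        for lang, term in ((src_lang, row[0]), (tgt_lang, row[1])):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = term.strip()

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(csv_to_tmx("heart,Herz\nliver,Leber\n", "en", "de"))
```

Building the file with an XML library rather than string concatenation means characters such as & and < in glossary terms are escaped automatically.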
Categories: tools, translation memory