Translating scanned PDFs in dossiers and clinical research studies
Written by ForeignExchange Translations on Saturday, February 21, 2009
A recent entry in the Yndigo blog referred to "dirty" documents, which are common in legal translation. The same holds for materials related to clinical trials and regulatory submissions: editable files often are simply not available, and we have to work with scanned PDFs.
A typical dossier, for instance, may contain thousands of pages that are uneditable, usually because the source documents were created a long time ago and faxed or simply converted to a graphic format. Because time is critical in these projects, retyping this content is not an option.
Where text is reasonably legible and not in table format, OCR tools such as ABBYY FineReader or OmniPage offer good solutions. Realizing that there is an opportunity here, Adobe is paying more and more attention to drug and device companies and offers custom life science solutions. What's particularly interesting is that the recently released Adobe Acrobat 9 offers much-improved OCR features right in Acrobat.
While this is great news for pharmaceutical and device companies (more and more regulatory agencies require submissions in fully electronic formats), it's a mixed bag for translation suppliers. On the one hand, OCR makes translating these documents a lot easier. On the other hand, OCR is far from perfect and often produces odd, hard-to-find typos and errors, which makes the job of editors and proofreaders more demanding.
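One rough aid for proofreaders is to scan OCR output for tokens that mix letters and digits in ways real words rarely do, such as "2O09" (letter O for zero) or "l0st" (zero for letter o). The Python sketch below is a hypothetical helper of our own, not a feature of FineReader, OmniPage, or Acrobat. The function name and pattern are assumptions. It will miss confusions like "rn" read as "m", and it will flag legitimate codes such as "CYP3A4", so treat its output as a checklist rather than a verdict.

```python
import re

# Hypothetical heuristic (our assumption, not from any OCR product):
# a letter wedged between digits ("2O09") or a digit wedged between
# letters ("l0st") often signals an O/0 or l/1 confusion.
# Expect false positives on gene/enzyme names such as "CYP3A4".
SUSPICIOUS = re.compile(r"[A-Za-z]\d[A-Za-z]|\d[A-Za-z]\d")


def flag_ocr_suspects(text: str) -> list[tuple[int, str]]:
    """Return (line_number, token) pairs that a proofreader should check."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            # Drop surrounding punctuation so "l0st," is reported as "l0st".
            word = token.strip(".,;:!?()[]\"'")
            if SUSPICIOUS.search(word):
                hits.append((line_no, word))
    return hits


sample = "Dose: 10 mg daily.\nStudy start: 2O09-01-15.\nPatients were l0st to follow-up."
for line_no, word in flag_ocr_suspects(sample):
    print(line_no, word)
```

Run on the sample, this reports line 2 ("2O09-01-15") and line 3 ("l0st") and leaves ordinary dosages like "10 mg" alone.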
What tricks and tips do you have for dealing with non-editable PDF files?
Did you like this post? Subscribe to Medical Translation Blog via email or RSS.
Categories: pharmaceuticals, tools
1 Comment:
- Anonymous said...
February 24, 2009 11:10 PM
So far I have not come across an OCR tool that produces results usable for translation. Scanned documents are often in source languages other than English, which makes the OCR output even less useful, even for rough word counts. My recommendation is to simply type the translation into a new Word file and pay based on target word count or per page.