How to Translate PDF Files

Wordbee Translator makes it easy to translate searchable PDF documents by providing a PDF to Word converter option. This option removes the need for a third-party tool: it converts PDF documents into Word files (.docx) so that the text can be extracted. For more information, see the Integrated OCR and PDF to Word Converter page in the System features section.

Text within images will not be extracted, as the PDF converter does not perform optical character recognition (OCR).

Editable or searchable PDFs typically contain a large amount of markup and formatting tags. Before translating an editable PDF, perform the following steps in the system to reduce the markup:

Step 1: Create a PDF Format Configuration

The first step is to create a PDF Format Configuration, which reduces the large amount of markup that appears when you translate an editable PDF.

The markup in these documents is often overwhelming, because formatting tags frequently appear between letters and in many other places throughout the document. A PDF Format Configuration removes unnecessary markup and tags, which makes the PDF easier to work on.

1. To create this configuration, click on Settings in the toolbar and then click on PDF in the Document Formats option.

2. Then click on Add New to create a new PDF format configuration. 

3. Enter a Name for the configuration.

4. On the General page, in the Content section, select Extract text from PDF.

5. Click the Reduce markup tab. 

6. Tick the checkboxes next to the following options to enable them: OCR Noise Reduction and Merge Adjacent Markup Elements Into Single Markup.

These options instruct the system to remove as many tags as possible from the PDF. Remember to click on Save in the upper right corner to finish creating the PDF format configuration.
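To picture what merging adjacent markup means in practice, the following Python sketch collapses every run of back-to-back tags in a segment into a single placeholder. It is an illustration only and is not Wordbee's actual implementation: the inline tag syntax (`<1>`, `</1>`, `<2/>`) and the function names `merge_adjacent_tags` and `count_tags` are assumptions made for this example.

```python
import re

# Hypothetical inline-tag syntax used only for this illustration: <1>, </1>, <2/>.
TAG = re.compile(r"</?\d+/?>")
TAG_RUN = re.compile(r"(?:</?\d+/?>)+")


def merge_adjacent_tags(segment: str) -> str:
    """Replace each run of consecutive tags with one renumbered placeholder."""
    counter = 0

    def replace_run(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        return f"<{counter}/>"

    return TAG_RUN.sub(replace_run, segment)


def count_tags(segment: str) -> int:
    """Count the individual tags in a segment."""
    return len(TAG.findall(segment))


# A segment in which formatting tags split a word between letters.
raw = "<1><2>Hel</2></1><3><4>lo</4></3> <5>world</5>"
merged = merge_adjacent_tags(raw)

print(merged)                               # <1/>Hel<2/>lo<3/> <4/>world<5/>
print(count_tags(raw), count_tags(merged))  # 10 5
```

In this sketch the translatable text is unchanged while the number of tags a translator has to handle drops from ten to five.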

To find out more about file format configurations, please see Document Formats in the Administration section.

Step 2: Choose the PDF Configuration

When you mark a document for online translation (Standard and CoDyt projects), you have the option to select a file format configuration. This is where the PDF format configuration created above will be applied in the system.

To configure this, you must first upload at least one document to the project. Then, on the Documents tab of the project (Projects > Select), right-click on the document and do the following:

1. Choose Translate Yes/No in the drop-down menu and select Mark for Online Translation.

2. Check that the Yes, online option is selected in the pop-up window and then click Confirm.

3. Click the File Format menu and choose the PDF format configuration. Then click Confirm again.

Step 3: Work in the Editor

After you complete the configuration steps above, the system removes as many unnecessary tags as possible from the document before you work on it in the editor. A PDF with reduced markup looks similar to the example below: (image placeholder)

