Wordbee Translator makes it easy to translate searchable PDF documents by providing a PDF to Word converter option. This option eliminates the need for a third-party tool: it converts PDF documents into Word files (.docx) so that the text can be extracted. For more information, see the Integrated OCR and PDF to Word Converter page in the System features section.
Text within images will not be extracted, as the PDF converter does not perform optical character recognition (OCR).
Editable/searchable PDFs typically contain a large number of markup and formatting tags. Before translating an editable PDF, perform the following steps in the system to reduce the markup:
Step 1: Create a PDF Format Configuration
The first step is to create a PDF Format Configuration, which removes most of the markup that would otherwise appear when you translate an editable PDF.
The markup in these documents is often overwhelming because formatting tags frequently appear between individual letters and in many other locations throughout the document. A PDF Format Configuration can remove unnecessary markup and tags, making the PDF easier to work on.
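To illustrate why tags between letters are a problem, the following minimal sketch counts and strips inline tags from a single segment. The numbered tag syntax and the function names are assumptions made for this example only. They do not represent Wordbee's actual segment format or the way a PDF Format Configuration is implemented.

```python
import re

# Hypothetical inline-tag syntax (e.g. <1>...</1>), used only for illustration.
# This is not Wordbee's actual markup format.
TAG = re.compile(r"</?\d+>")


def count_tags(segment: str) -> int:
    """Count the inline tags found in a segment."""
    return len(TAG.findall(segment))


def strip_inline_tags(segment: str) -> str:
    """Return the segment text with all inline tags removed."""
    return TAG.sub("", segment)


# A segment as it might look when formatting tags split words apart.
raw = "<1>T</1><2>rans</2><3>lation</3> <4>me</4><5>mory</5>"

print(count_tags(raw))         # 10
print(strip_inline_tags(raw))  # Translation memory
```

Two words carry ten tags here, which shows how quickly markup can outweigh the translatable text in a converted PDF.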
To find out more about file format configurations, please see Document Formats in the Administration section.
Step 2: Choose the PDF Configuration
When you mark a document for online translation (Standard and CoDyt projects), you have the option to select a file format configuration. This is where the PDF format configuration created above will be applied in the system.
To configure this, you must first upload at least one document to the project. Then, on the Documents tab of the project (Projects > Select), right-click the document and do the following:
1. Choose Translate Yes/No in the drop-down menu and select Mark for Online Translation.
2. Check that the Yes, online option is selected in the pop-up window and then click Confirm.
3. Click the File Format Menu and choose the PDF format configuration. Then click Confirm again.
Step 3: Work in the Editor
After you complete the configuration steps above, the system removes unnecessary tags from the document, which makes it easier to work in the editor. A PDF with reduced markup appears similar to the example below:
[Image: editor view of a PDF with reduced markup]