
Document Formats

When you import a file type (e.g. Word, Excel, XML) into Wordbee, the text is processed by a specific set of extraction rules that you can view and configure in Settings > Customization > Translation Settings > Document Formats.

Click Configure to the right of Document Formats to search for a specific file format and its extension.

When you open the configuration of a document format, you will see the Default Configuration. Click on Select to view, edit, or clone the configuration.

Each file format has its own configuration options to ensure that the required information is properly extracted from the source file for translation. These options are grouped into tabs and sections to make creating or modifying a configuration easier, and they vary with the file format you are configuring.

New

We have improved the filters for all document formats. When you open the configuration of a file type, you will see an additional option: File conditions. Here you can specify the conditions under which a given configuration is used. See article:

How to process files with auto-select filters


Default Configuration

Wordbee Translator offers a Default Configuration for every available file format and extension. These default configurations do not work for every situation and are designed to ensure that online translations are successfully completed.

For example, the default configuration for an XLIFF file does the following: 

  • Extracts existing translations

  • Does not show leading and trailing whitespace

  • Does not show preceding and trailing markup

  • Splits segments at XLIFF segmentation boundaries

  • Enables SRX Rules for text segmentation

This configuration is not set up to handle XLIFF files that are HTML or that contain HTML content. To extract an HTML-based XLIFF file in Wordbee Translator, use a different configuration in which the "Content is HTML" option is checked in the XLIFF file format configuration.
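To illustrate why this distinction matters, the following sketch parses a small XLIFF 1.2 unit whose source text carries escaped HTML. It is not Wordbee's extraction code; the sample document and the helper names `source_texts` and `strip_html` are assumptions made for illustration only. Read as plain text, the segment exposes raw markup; treated as HTML, only the translatable words remain.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# XLIFF 1.2 namespace, needed to find <source> elements.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample: the source text is HTML stored as escaped XML text.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="page.html" source-language="en" target-language="fr" datatype="html">
    <body>
      <trans-unit id="1">
        <source>&lt;p&gt;Click &lt;b&gt;Save&lt;/b&gt; to continue.&lt;/p&gt;</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


class _TextOnly(HTMLParser):
    """Collects character data and ignores tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def source_texts(xliff_xml):
    """Return the text of every <source> element, as a plain-text filter sees it."""
    root = ET.fromstring(xliff_xml)
    return [s.text or "" for s in root.iterfind(".//x:source", XLIFF_NS)]


def strip_html(text):
    """Return only the character data of an HTML fragment."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


raw = source_texts(SAMPLE)[0]
print(raw)              # markup is part of the segment
print(strip_html(raw))  # markup is separated from the text
```

A real filter would keep the tags as protected inline markup rather than discard them; the sketch only shows that the two readings of the same file yield different segments.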

The same applies for accomplishing specific extraction or exclusion objectives with Microsoft Word, Microsoft Excel, Code Files, and other types of formats for translation. 


Custom Configurations

File format configurations can be used to accomplish many tasks, such as omitting red text from the translation of a Microsoft Word file or translating an XLIFF HTML file.

Certain needs are simply not covered by the default configurations provided by Wordbee Translator. In these instances, it makes more sense to create a custom file format configuration, as using the default will result in either an error or unwanted results in the completed translation.

With Wordbee Translator, you can do the following and more with custom file format configurations: 

  • Mark an XLIFF file as HTML.

  • Exclude headers and footers from extraction in a Word file.

  • Omit text of certain colors from a Microsoft Word file, Excel file, etc.

  • Exclude specific segments from translation in a Word, Excel, or other file type.

  • Define specific columns or rows to translate in an Excel file.

  • Configure the extraction of embedded files.

  • Change the default character encoding of a code file.

  • Exclude quote strings or additional content from code files.


How to view, modify and test a configuration

See the sections listed below to learn how to create, view, modify and test the settings of document formats.


Supported Formats, Versions, and Extensions

Wordbee Translator allows you to create configurations for the following formats, versions, and file extensions. 

Supported Document Formats

Wordbee Translator supports a wide range of file formats and advanced configuration options.

List of document formats supported by Wordbee Translator

Adobe FrameMaker files (.mif): You can translate files from all FrameMaker versions starting at v8 that use Unicode fonts.

Adobe InCopy files (.icml): Versions from CS4 to the latest are supported.

Adobe InDesign files (.indd, .idml): IDML and INDD CC are supported. A preview function is available.

Adobe Photoshop files (.psd): The configured filter extracts both text and formatting information. Translations are exported as Excel, HTML, or ODS.

ASP.Net files: Localization of web pages (.aspx), controls (.asmx) and resource files (.resx).

Code files (.cs, .inc, .js, .css, .jfs, .cls, .asax, .asa, .c, .cpp, .h): Localization of source code files such as JavaScript, CSS, C, Java, or C# code.

CSV files (.csv): Character-Separated Values. The separator can be a comma, a semicolon, a tab, etc.

DITA files (.dita, .ditamap): Darwin Information Typing Architecture (DITA) is an XML data model for authoring. Besides the DITA XML format we also support DITAmaps.

Email messages (.msg, .eml, .oft)

Image files (for example, .jpg, .jpeg, .png, .bmp, .gif): If one of the Google or Microsoft optical character recognition (OCR) services is enabled, you can extract text from the image files to the Wordbee Translator editor. Alternatively, you can have blank segments created in the editor; then translators can view the source file and enter translations in the blank segments.

INI files (.ini): Software configuration files. INI files are simple text files with a basic structure composed of sections, properties, and values.

iOS files: iOS resource files (.strings).

Java properties (.properties): You can translate localization nodes according to the user's configuration.

JSON files (.json): JSON node localization according to user’s configuration. We support JSON version 1 and 2.

Microsoft Excel (.xls, .xlt, .xlsx, .xlsm, .xltx, .xltm, .xlsb): All versions from 97. The file filter handles embedded graphics and charts. You can even translate embedded Office files. Preview available.

Microsoft Excel Multilingual files: Translate Excel sheets that contain different languages in different columns. For example: column A is English, B is French, C is German, and D contains comments/instructions. Once translated into all target languages, the Excel file is updated while formatting, headers, etc. are preserved. Learn more: How to translate multilingual excel files.

Microsoft Excel Multiple Sheets: A configuration is available that lets you translate only the first sheet of a monolingual Excel file with multiple sheets. No data from the other sheets is extracted or segmented, yet their content is retained in the final file.

Microsoft PowerPoint (.ppt, .pot, .pps, .pptx, .pptm, .potx, .potm, .ppsx, .ppsm): All versions from 97. The filter handles embedded graphics and charts. You can even translate embedded Office files. Preview available.

Microsoft Visio files (.vsd): All versions from 97 to 2010.

Microsoft Word files (.doc, .docx, .dot, .dotx, .docm, .dotm): All versions from 97. The filter handles embedded graphics and charts. You can even translate embedded Office files. Preview available.

Microsoft.Net resources (.resx)

OpenOffice files (.odt, .ods, .odp): You can translate Open Document Text files.

PDF files (.pdf): Wordbee handles editable PDFs. Scanned documents saved as PDFs, where the text is embedded in an image of the document, require OCR and an alternative workflow. Learn more: How to translate PDF files.

Plain text files (.txt, .utxt, .utf8, .text): Any kind of text file, such as .txt or .utf8. The filter is highly customizable with regular expressions, exclusion patterns, and more. The filter can auto-detect encoding.

POT/PO files (.po, .pot): This file type is used by many content management systems such as Drupal. Drupal uses the .po (portable object) and .pot (portable object template) extensions for its translation files. The PO files contain the actual translations, whereas the POT files are the templates for the PO files.

RTF files (.rtf): You can translate classic rich text files.

SRT files (.srt, .stlu): You can translate subtitling .srt files. Both the segments and the time code are imported into the Wordbee CAT tool. Preview function available.

STL subtitle files: Wordbee supports the European Broadcasting Union (EBU) subtitle format, which is widely adopted in the broadcast industry. Learn more: How to prepare your files for online subtitling.

SVG files (.svg): You can translate .svg vector graphics files.

Trados bilingual files (.bak): This is a Trados uncleaned bilingual file format.

Transit language files (.deu, .eng, .fra, .*): These are file types created in the Star Transit translation system.

TTX files (.ttx): TTX node and attribute localization according to user’s configuration.

Web pages (.htm, .html, .xhtml, .htmls, .php, .php2, .php3, .php4, .php5, .php6, .phtml, .csm, .jsp, .ahtm, .ahtml): The HTML filter can extract the alternate image text. The web pages configuration allows users to define translatable attributes and parent tags that should be extracted for translation. In-context preview of HTML files is possible when the projects are sent via the Beebox connector.

WebVTT subtitle files (.vtt): We can process Web Video Text Tracks (WebVTT) files. These files are used for subtitling with the HTML5 <track> element (see Wikipedia). Subtitle IDs and timestamps are extracted and shown to the translator. The default character encoding is UTF-8.

Wordbee Beebox files: Wordbee Beebox is a middleware that translates content from content management systems such as WordPress, Drupal, Sitecore, Adobe CQ5, etc. We have a special configuration to process the files received from Wordbee Beebox. Files have the .wbloc extension and are sent directly from Beebox to this platform if such a link has been established.

Wordbee Flex files: This is a special filter for the Wordbee JAWS product line as well as Wordbee Flex API Integrations.

XLIFF files (.xlf, .sdlxliff, .xliff, .xlif, .txlf): We support XLIFF versions 1 and 2. You can translate bilingual SDLXLIFF files created in SDL Trados Studio.

XML files (.xml): XML node and attribute localization according to user’s configuration.

XSL files (.xsl, .xslt): XSL is a language for expressing style sheets. It describes how to display an XML document of a given type.

YAML files (.yaml, .yml): The acronym stands for “YAML Ain’t Markup Language.” This file type is commonly used for configuration files and in applications where data is stored and transmitted.
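As an illustration of the WebVTT entry in the list above, the following sketch shows how a subtitle ID, the timestamps, and the translatable text can be separated from one another. It is a simplified reading of the WebVTT cue layout, not Wordbee's filter; the sample file and the function name `parse_cues` are assumptions made for illustration.

```python
import re

# Hypothetical sample file: one cue with a named ID, one with a numeric ID.
SAMPLE = """WEBVTT

intro
00:00:01.000 --> 00:00:04.000
Welcome to the tutorial.

2
00:00:04.500 --> 00:00:07.000
Click Save to continue.
"""

# A timing line: optional hours, then mm:ss.ttt on both sides of "-->".
TIMING = re.compile(r"(\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(\d{2,}:)?\d{2}:\d{2}\.\d{3}")


def parse_cues(vtt_text):
    """Split a WebVTT document into cues with id, start, end, and text."""
    cues = []
    blocks = re.split(r"\n\s*\n", vtt_text.strip())
    for block in blocks[1:]:  # the first block is the WEBVTT header
        lines = block.splitlines()
        cue_id = None
        if lines and not TIMING.match(lines[0]):
            cue_id, lines = lines[0], lines[1:]  # optional identifier line
        if not lines or not TIMING.match(lines[0]):
            continue  # not a cue (for example a NOTE or STYLE block)
        start, rest = (part.strip() for part in lines[0].split("-->"))
        end = rest.split()[0]  # drop any cue settings after the end time
        cues.append({"id": cue_id, "start": start, "end": end,
                     "text": "\n".join(lines[1:])})
    return cues


cues = parse_cues(SAMPLE)
for cue in cues:
    print(cue["id"], cue["start"], cue["end"], cue["text"])
```

Only the `text` field would need translating; the ID and timestamps serve as context for the translator and are written back unchanged.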

Couldn’t find the file format you were looking for?

You can translate any document in a human-readable format by creating custom filters that match a specific file extension, file name, or file content. Find out more in How to process files with Auto-Select Filters.
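The idea of matching on extension, file name, or content can be sketched as follows. This is a conceptual model only: in Wordbee the conditions are set in the configuration screens, and the `FilterRule` structure, the rule names, and the `select_config` function below are hypothetical.

```python
import fnmatch
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilterRule:
    """Hypothetical rule: every condition that is set must hold for a match."""
    config_name: str
    extension: Optional[str] = None      # e.g. ".xlf"
    name_pattern: Optional[str] = None   # shell-style wildcard, e.g. "ui_*.json"
    content_regex: Optional[str] = None  # searched anywhere in the file text

    def matches(self, file_name: str, content: str) -> bool:
        if self.extension and not file_name.lower().endswith(self.extension.lower()):
            return False
        if self.name_pattern and not fnmatch.fnmatchcase(file_name, self.name_pattern):
            return False
        if self.content_regex and not re.search(self.content_regex, content):
            return False
        return True


def select_config(rules, file_name, content, default="Default Configuration"):
    """Return the first matching configuration name, or the default."""
    for rule in rules:
        if rule.matches(file_name, content):
            return rule.config_name
    return default


# Hypothetical rule set; the configuration names are made up for this example.
RULES = [
    FilterRule("XLIFF with HTML", extension=".xlf", content_regex=r'datatype="html"'),
    FilterRule("UI strings", name_pattern="ui_*.json"),
]

print(select_config(RULES, "page.xlf", '<file datatype="html">'))
print(select_config(RULES, "page.xlf", '<file datatype="plaintext">'))
print(select_config(RULES, "ui_home.json", "{}"))
```

Rules are checked in order and the first match wins, so more specific rules belong before general ones in this sketch.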
