
DITA Files

File format: DITA

Supported file extensions: .dita, .ditamap

DITA (Darwin Information Typing Architecture) is an XML-based standard for structuring and publishing technical content. When you process DITA files in Wordbee Translator, the platform extracts translatable text while preserving the document structure, so that target-language files can be reconstructed accurately.


How DITA Files Are Processed

Wordbee Translator includes a dedicated DITA parser that understands the structure of DITA documents. When you mark a DITA file for online translation, the parser:

  1. Identifies translatable elements based on the DITA standard.

  2. Extracts text content into segments for translation in the Editor.

  3. Preserves non-translatable structure so the target file can be reconstructed after translation.

You can find the DITA parser configuration under Settings > Customization > Translation Settings > Document Formats.


Excluding Content with translate="no"

The DITA standard defines a translate attribute that content authors use to mark whether an element should be translated. When an element carries translate="no", it signals that the content is not intended for translation: for example, code samples, product identifiers, or legal boilerplate that must remain in the source language.

Wordbee Translator respects this attribute automatically. Content marked with translate="no" is excluded from extraction, and no segments are created for it in the Editor.

Key behavior

| Scenario | Result |
| --- | --- |
| Element with translate="no" | The element's text is not extracted for translation |
| Nested content inside a translate="no" element | All child elements are also excluded, regardless of their own attributes |
| Elements without a translate attribute | Extracted normally, following standard DITA parsing rules |
| Elements with translate="yes" | Extracted normally (this is the default behavior) |

This behavior is always active when using the DITA parser. No additional configuration is required.

Note

The translate="no" exclusion applies to the entire subtree of the marked element. If a parent element such as <section translate="no"> contains paragraphs, lists, or other nested elements, none of that content will be extracted for translation.

Example

In the following DITA source, only the topic title and the first paragraph are extracted for translation. The section marked with translate="no" and all of its contents are skipped:

XML
<concept id="product-overview">
  <title>Product Overview</title>
  <conbody>
    <p>This product helps you manage translations efficiently.</p>
    <section translate="no">
      <title>Internal Reference</title>
      <p>SKU: WBT-2040-EN</p>
    </section>
  </conbody>
</concept>
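
The subtree rule can be illustrated outside the platform with a minimal Python sketch. This is not Wordbee Translator's parser: the function name extract_translatable is hypothetical, and the sketch assumes that all element text without translate="no" (including titles) counts as translatable. It only shows that once an element carries translate="no", nothing beneath it is visited, whatever attributes its children carry.

```python
import xml.etree.ElementTree as ET

DITA_SOURCE = """\
<concept id="product-overview">
  <title>Product Overview</title>
  <conbody>
    <p>This product helps you manage translations efficiently.</p>
    <section translate="no">
      <title>Internal Reference</title>
      <p>SKU: WBT-2040-EN</p>
    </section>
  </conbody>
</concept>
"""


def extract_translatable(element):
    """Hypothetical helper: collect text outside any translate="no" subtree."""
    if element.get("translate") == "no":
        # Skip this element and everything nested inside it,
        # even children that set translate="yes" themselves.
        return []
    segments = []
    text = (element.text or "").strip()
    if text:
        segments.append(text)
    for child in element:
        segments.extend(extract_translatable(child))
        # Text after a child's closing tag belongs to the parent element.
        tail = (child.tail or "").strip()
        if tail:
            segments.append(tail)
    return segments


segments = extract_translatable(ET.fromstring(DITA_SOURCE))
for segment in segments:
    print(segment)
```

Run against the sample above, the sketch prints the topic title and the first paragraph; the section title and the SKU line never appear.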

Whitespace Compression

DITA source files often contain extra whitespace: line breaks, indentation, and consecutive spaces used for readability in the XML source. By default, this whitespace is preserved as-is in extracted segments, which can lead to inconsistent segmentation or unnecessary spaces in the Editor.

You can enable whitespace compression to normalize consecutive whitespace characters into a single space during extraction.

Enabling whitespace compression

To turn on whitespace compression for DITA files:

  1. Go to Settings > Customization > Translation Settings > Document Formats.

  2. Open the DITA parser configuration.

  3. In the Content section, check Compress sequences of whitespaces into a single whitespace (recommended).

  4. Click Save.

Preserved whitespace in code elements

When whitespace compression is enabled, certain DITA elements where whitespace is semantically significant are automatically excluded from compression. The following elements always preserve their original whitespace:

| Element | Purpose |
| --- | --- |
| codeblock | Code listings |
| pre | Preformatted text |
| codeph | Inline code phrases |
| screen | Screen output |
| msgblock | Message blocks |
| lines | Lines of text where line breaks are significant |

Content inside these elements retains its original spacing and line breaks, even when whitespace compression is active for the rest of the document.
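
To make the combined effect concrete, here is a rough Python sketch of whitespace compression that skips the elements listed above. It is an illustration only, not the platform's implementation: the names PRESERVE and compress_whitespace are hypothetical, and treating only spaces, tabs, and line breaks as whitespace is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# Elements whose whitespace is left untouched (taken from the table above).
PRESERVE = {"codeblock", "pre", "codeph", "screen", "msgblock", "lines"}

# Assumption: only XML whitespace characters (space, tab, CR, LF) are compressed.
WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def compress_whitespace(element, preserve=False):
    """Collapse whitespace runs in place, except inside preserved elements."""
    preserve = preserve or element.tag in PRESERVE
    if element.text and not preserve:
        element.text = WHITESPACE_RUN.sub(" ", element.text)
    for child in element:
        compress_whitespace(child, preserve)
        # Text after a child's closing tag belongs to the current element,
        # so the current element's preserve flag decides how it is handled.
        if child.tail and not preserve:
            child.tail = WHITESPACE_RUN.sub(" ", child.tail)


sample = ET.fromstring("<p>Run   the\n    <codeph>ls  -l</codeph>   command.</p>")
compress_whitespace(sample)
result = ET.tostring(sample, encoding="unicode")
print(result)
```

In this sample the paragraph text around the codeph element is reduced to single spaces, while the double space inside codeph is kept as written.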


Testing Your Configuration

To verify that the extraction works as expected:

  1. Open the DITA parser configuration and click Test configuration.

  2. Upload a sample DITA file.

  3. Review the extracted segments to confirm that content marked with translate="no" is not present.

  4. Mark a file for online translation in a project and verify the segments in the Editor.

  5. Generate the target file and confirm that the document structure is preserved, including both translated and non-translatable content.


