
Web Page Configuration Options

Configure how Wordbee Translator extracts and handles content from web page files. These settings control encoding, HTML tag behavior, attribute translation, content exclusion, and more.

The web page configuration applies to the following file extensions: .htm, .html, .xhtml, .htmls, .php, .php2, .php3, .php4, .php5, .php6, .phtml, .csm, .jsp, .ahtm, .ahtml.

To access web page configurations:

  1. Go to Settings > Customization > Document Formats.

  2. Select Web Pages from the format drop-down menu.

  3. Click on a configuration profile to view it, or click Edit to modify it.



General Tab

The General tab controls encoding, HTML code handling, HTML attribute display, content exclusion, and text segmentation.


Encoding

| Setting | Description |
| --- | --- |
| Default encoding | The character encoding used to read the file. Defaults to UTF-8. Other options include Windows, Macintosh, and ASCII encodings. |
| Convert incompatible characters | When enabled, characters not compatible with the target encoding are converted into entity references. |
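As an illustration of what converting incompatible characters into entity references can look like, here is a minimal Python sketch. It relies on the standard `xmlcharrefreplace` error handler, which emits numeric character references. Whether Wordbee Translator writes numeric or named references is not stated here, so the function name and output format are assumptions.

```python
def to_target_encoding(text: str, encoding: str) -> bytes:
    """Encode text; characters the encoding cannot represent
    become numeric character references such as &#8364;."""
    return text.encode(encoding, errors="xmlcharrefreplace")


sample = "Café €5"
print(to_target_encoding(sample, "ascii"))       # b'Caf&#233; &#8364;5'
print(to_target_encoding(sample, "iso-8859-1"))  # b'Caf\xe9 &#8364;5'
```

In the second call only the euro sign is converted, because ISO-8859-1 can represent "é" directly.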

HTML Code

These settings control how the HTML markup is presented to translators.

| Setting | Description |
| --- | --- |
| Hide beginning/ending whitespaces | Hides leading and trailing whitespace characters from the translator view. |
| Compress sequences of whitespaces | Replaces multiple consecutive whitespace characters with a single space. |
| Replace &nbsp; by blanks | Converts non-breaking space entities into regular blank spaces. |
| Show preceding/trailing HTML tags | Displays the HTML tags that surround the translatable text. |
| Entity references display | Controls how HTML entity references are shown to translators. |
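The combined effect of the three whitespace options can be sketched as follows. This is only an illustration: the function name, the parameter names, and the order in which the options are applied are assumptions, not a description of the product's internals.

```python
import re


def normalize_for_display(text: str,
                          hide_edges: bool = True,
                          compress: bool = True,
                          nbsp_to_blank: bool = True) -> str:
    """Apply the three whitespace options to one piece of extracted text."""
    if nbsp_to_blank:
        # "Replace &nbsp; by blanks"
        text = text.replace("&nbsp;", " ")
    if compress:
        # "Compress sequences of whitespaces"
        text = re.sub(r"[ \t\r\n]+", " ", text)
    if hide_edges:
        # "Hide beginning/ending whitespaces"
        text = text.strip()
    return text


raw = "  Hello&nbsp;&nbsp;world \n\t again  "
print(normalize_for_display(raw))                       # Hello world again
print(normalize_for_display(raw, nbsp_to_blank=False))  # Hello&nbsp;&nbsp;world again
```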

HTML Attributes

These settings control how the content of HTML attributes is displayed to translators when attributes are marked as translatable.

| Setting | Description |
| --- | --- |
| Show beginning/end whitespaces | Shows leading and trailing whitespace in attribute values. |
| Compress sequences of whitespaces | Replaces multiple consecutive whitespace characters with a single space in attribute values. |
| Entity references display | Controls how entity references inside attribute values are shown to translators. |

Exclude Content

Use this section to exclude specific content from translation. Enter text segments or regular expressions. When a match is found, you can mark the segment as:

  • Not translatable — the segment is hidden from translators.

  • Translatable — the segment is shown for translation.

  • Potentially not translatable — the segment is shown but flagged for review.
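The following Python sketch illustrates how a list of exclusion rules might classify a segment. The class and function names are hypothetical. It assumes that the first matching rule wins, that a pattern must match the whole segment, and that unmatched segments stay translatable. None of these behaviors is confirmed by this page.

```python
import re
from dataclasses import dataclass


@dataclass
class ExclusionRule:
    pattern: str    # plain text or regular expression
    is_regex: bool
    status: str     # status assigned when the rule matches


def classify(segment: str, rules: list[ExclusionRule],
             default: str = "translatable") -> str:
    """Return the status of the first rule that matches the segment."""
    for rule in rules:
        if rule.is_regex:
            matched = re.fullmatch(rule.pattern, segment) is not None
        else:
            matched = segment == rule.pattern
        if matched:
            return rule.status
    return default


rules = [
    ExclusionRule(r"[A-Z]{2,}-\d+", True, "not translatable"),  # codes such as AB-1234
    ExclusionRule(r"https?://\S+", True, "potentially not translatable"),
    ExclusionRule("Lorem ipsum", False, "not translatable"),
]

print(classify("AB-1234", rules))              # not translatable
print(classify("https://example.com", rules))  # potentially not translatable
print(classify("Welcome", rules))              # translatable
```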

Text Segmentation

| Setting | Description |
| --- | --- |
| Enable SRX rules | When enabled, text is segmented using SRX rules. |
| Split text at line breaks | When enabled, a new segment starts at each line break. |


Server and Client Side Code Tab

The Server and Client Side Code tab controls how the system handles code sections (JavaScript, PHP, and other server-side code) embedded in web pages.


Extract Quoted Strings

Web pages often contain JavaScript or server-side code (such as PHP) with quoted strings that may need translation. Enable this option to extract those strings automatically.

| Setting | Description |
| --- | --- |
| Extract quoted strings | When enabled, quoted strings inside code sections are extracted for translation. |
| Compress sequences of whitespaces | Replaces multiple whitespace characters with a single space inside extracted strings. |
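To make the idea of extracting quoted strings concrete, here is a rough Python sketch that pulls single-quoted and double-quoted strings out of a script fragment. It is a regex approximation, not a real JavaScript or PHP parser (an apostrophe inside a code comment would confuse it), and the function name is hypothetical.

```python
import re

QUOTED = re.compile(
    r'"((?:\\.|[^"\\])*)"'   # double-quoted string, allowing backslash escapes
    r"|'((?:\\.|[^'\\])*)'"  # single-quoted string, allowing backslash escapes
)


def extract_quoted_strings(code: str, compress_whitespace: bool = False) -> list[str]:
    """Collect the contents of quoted strings found in a code section."""
    strings = []
    for match in QUOTED.finditer(code):
        value = match.group(1) if match.group(1) is not None else match.group(2)
        if compress_whitespace:
            value = re.sub(r"\s+", " ", value)
        if value.strip():  # skip empty strings
            strings.append(value)
    return strings


script = """
var title = "Welcome   back";
alert('Your session has expired');
"""
print(extract_quoted_strings(script, compress_whitespace=True))
# ['Welcome back', 'Your session has expired']
```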

Exclude Quoted Strings

Use this section to prevent specific quoted strings from being extracted. Enter text segments or regular expressions. When a match is found, the segment can be marked as translatable or not translatable.

Include or Exclude Additional Content

Use regular expressions to extract text inside code sections that goes beyond quoted strings. The expressions can capture any content.

Note

The regex must contain capture groups named pattern1, pattern2, etc. For example: @(?<pattern1>.*?)@ extracts any text delimited by @.
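The named-group convention can be tried out with a short Python sketch. Python spells named groups as `(?P<pattern1>...)` rather than `(?<pattern1>...)`, so the expression is adapted accordingly. The helper name is hypothetical.

```python
import re


def extract_named_patterns(code: str, regex: str) -> list[str]:
    """Return the text captured by groups named pattern1, pattern2, ..."""
    compiled = re.compile(regex)
    names = sorted(
        (name for name in compiled.groupindex if re.fullmatch(r"pattern\d+", name)),
        key=lambda name: int(name[len("pattern"):]),
    )
    found = []
    for match in compiled.finditer(code):
        for name in names:
            if match.group(name):
                found.append(match.group(name))
    return found


code = "label = @Save changes@; hint = @Press Enter to confirm@;"
print(extract_named_patterns(code, r"@(?P<pattern1>.*?)@"))
# ['Save changes', 'Press Enter to confirm']
```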


HTML Tags and Attributes Tab

The HTML Tags and Attributes tab controls which HTML attributes are extracted for translation, which tags are treated as inline (non-breaking), and which tags preserve whitespace.


Translatable Attributes

This grid defines which HTML attribute values are extracted for translation. By default, common attributes such as alt, title, placeholder, content, and value are pre-configured.

Each row in the grid specifies a rule with the following columns:

| Column | Description |
| --- | --- |
| Attribute | The name of the HTML attribute (for example, content, alt, title). |
| Value | An optional filter for the attribute's own value. Leave empty to match all values of the attribute; the grid then displays (any). When a value is specified, the rule applies only when the attribute's value equals it exactly. |
| Parent tag | An optional filter for the parent HTML tag. For example, setting this to meta restricts the rule to attributes within <meta> tags only. |
| Advanced condition | An optional condition based on a sibling attribute of the same tag. For example, you can require that the sibling attribute name has the value description for the rule to apply. |
| Translate | Set to Yes to extract the attribute value for translation, or No to exclude it. |
| Use regex | When enabled, all text fields in the row (attribute name, value, parent tag, and condition) are interpreted as regular expressions instead of exact matches. |

To add a translatable attribute rule:

  1. Click Edit in the upper right corner.

  2. Enter the Attribute name (for example, content).

  3. Optionally enter a Value to filter by (for example, HELP).

  4. Optionally enter a Parent tag (for example, meta).

  5. Set Translate to Yes or No.

  6. Click Save to apply the configuration.


Filtering by Attribute Value

The Value column allows you to target specific attribute values instead of applying a rule to every instance of an attribute. This is useful when your HTML contains the same attribute name with different values that require different handling.

How value matching works:

  • No value specified (empty): The rule applies to all instances of the attribute, regardless of its value. This is the default behavior.

  • Value specified: The rule applies only when the attribute’s value matches the specified text exactly. Matching is case-sensitive.

  • Regex enabled: When Use regex is enabled for the row, the value is treated as a regular expression pattern.

Example: Translating only specific meta tag content

Given the following HTML:

HTML
<meta name="description" content="About us">
<meta name="keywords" content="HELP">

To translate only the content attribute of the description meta tag and exclude keywords:

| Attribute | Value | Parent tag | Translate |
| --- | --- | --- | --- |
| content | (empty) | meta | Yes |
| content | HELP | meta | No |

The first row marks all content attributes within <meta> tags as translatable. The second row overrides this for the specific value HELP, excluding it from translation. The result: About us is extracted for translation, while HELP is not.

Value-specific rules take precedence

When both a general rule (no value filter) and a value-specific rule exist for the same attribute, the value-specific rule always wins, regardless of row order in the grid.
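The precedence rule can be sketched in Python as follows. The class and function names are hypothetical. The sketch also assumes that attributes with no matching rule are not translated and that regex fields are matched with an unanchored search. Neither assumption is confirmed by this page.

```python
import re
from dataclasses import dataclass


@dataclass
class AttributeRule:
    attribute: str
    translate: bool
    value: str = ""       # empty means (any)
    parent_tag: str = ""  # empty means any parent tag
    use_regex: bool = False


def _matches(expected: str, actual: str, use_regex: bool) -> bool:
    if not expected:
        return True
    if use_regex:
        return re.search(expected, actual) is not None
    return expected == actual  # exact, case-sensitive


def should_translate(rules: list[AttributeRule], tag: str,
                     attribute: str, value: str) -> bool:
    applicable = [
        rule for rule in rules
        if _matches(rule.attribute, attribute, rule.use_regex)
        and _matches(rule.parent_tag, tag, rule.use_regex)
        and _matches(rule.value, value, rule.use_regex)
    ]
    if not applicable:
        return False
    # Value-specific rules win over general ones, whatever the row order.
    specific = [rule for rule in applicable if rule.value]
    return (specific or applicable)[0].translate


rules = [
    AttributeRule("content", translate=True, parent_tag="meta"),
    AttributeRule("content", translate=False, value="HELP", parent_tag="meta"),
]
print(should_translate(rules, "meta", "content", "About us"))  # True
print(should_translate(rules, "meta", "content", "HELP"))      # False
```

Reversing the order of the two rules gives the same results, which is the behavior the precedence rule describes.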

Example: Using regex to match a pattern

To translate only title attributes whose values start with translate:

| Attribute | Value | Parent tag | Use regex | Translate |
| --- | --- | --- | --- | --- |
| title | `^translate.*` | (empty) | Yes | Yes |

This matches <p title="translate-me"> but not <p title="do-not-translate">.
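A quick way to check such a pattern before saving it is to test it against sample values. The Python sketch below assumes the value is matched with an unanchored search, which is why the leading ^ matters: without it, do-not-translate would match too.

```python
import re

pattern = re.compile(r"^translate.*")


def value_matches(value: str) -> bool:
    """True when the attribute value satisfies the rule's regex."""
    return pattern.search(value) is not None


print(value_matches("translate-me"))      # True
print(value_matches("do-not-translate"))  # False

# Without the ^ anchor, the second value would also match:
print(re.search(r"translate.*", "do-not-translate") is not None)  # True
```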

Non-Breaking Tags

Non-breaking (inline) tags appear within translatable text rather than splitting it into separate segments. These are typically links, images, or text formatting elements.

The following tags are pre-configured as non-breaking: a, acronym, b, big, blink, br, cite, code, dfn, em, font, i, iframe, img, kbd, s, small, span, strike, strong, sub, sup, tt, u, var, ruby, rt, rc, rp, rbc, rtc, asp:label.

You can add additional non-breaking tags if needed for your content. Tag names are case-insensitive.
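The effect of non-breaking tags on segmentation can be illustrated with a simplified Python sketch: tags in the non-breaking list stay inside the segment, while any other tag ends it. The regex-based tag scanner and the function name are assumptions made for illustration. A real parser also has to handle comments, scripts, and malformed markup.

```python
import re

# Pre-configured non-breaking tags listed above.
NON_BREAKING = {
    "a", "acronym", "b", "big", "blink", "br", "cite", "code", "dfn", "em",
    "font", "i", "iframe", "img", "kbd", "s", "small", "span", "strike",
    "strong", "sub", "sup", "tt", "u", "var", "ruby", "rt", "rc", "rp",
    "rbc", "rtc", "asp:label",
}

TAG = re.compile(r"</?([A-Za-z][\w:-]*)[^>]*>")


def split_segments(html: str) -> list[str]:
    """Split markup into segments, keeping non-breaking tags inline."""
    segments, current, position = [], [], 0
    for match in TAG.finditer(html):
        current.append(html[position:match.start()])
        if match.group(1).lower() in NON_BREAKING:  # tag names are case-insensitive
            current.append(match.group(0))
        else:
            segments.append("".join(current))
            current = []
        position = match.end()
    current.append(html[position:])
    segments.append("".join(current))
    return [segment.strip() for segment in segments if segment.strip()]


page = '<p>Click <a href="#">here</a> to <B>start</B>.</p><p>Done</p>'
print(split_segments(page))
# ['Click <a href="#">here</a> to <B>start</B>.', 'Done']
```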

Whitespace Preserving Tags

Whitespace is generally collapsed in HTML. Tags listed in this section are exceptions: whitespace inside them is preserved during parsing.

The following tags are pre-configured: pre, script, style.

This section is read-only and cannot be modified.


CMS Specific Settings Tab

The CMS Specific Settings tab controls how the parser handles custom markup used by content management systems such as WordPress or Drupal.


Many CMS platforms use "shortcodes" — special markup enclosed in square brackets — within HTML content. For example: [image title="This is a text"]. Shortcodes are markup and do not need translation.

| Setting | Description |
| --- | --- |
| Content between double brackets is considered markup | When enabled, text enclosed in square brackets (shortcodes) is treated as non-translatable markup. |

Tip

If certain shortcode attributes need translation (for example, the title attribute in [image title="..."]), add those attribute names in the Translatable Attributes grid on the HTML Tags and Attributes tab.
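As a rough illustration of how shortcodes and their translatable attributes interact, the Python sketch below treats each shortcode as markup and pulls out only the values of attributes named in a translatable set. The regular expressions and the function name are assumptions. They cover simple shortcodes with double-quoted attribute values only.

```python
import re

SHORTCODE = re.compile(r"\[(/?)([A-Za-z][\w-]*)([^\]]*)\]")
ATTRIBUTE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def translatable_shortcode_text(html: str, translatable_attributes: set[str]) -> list[str]:
    """Return attribute values inside shortcodes that should be translated."""
    texts = []
    for shortcode in SHORTCODE.finditer(html):
        for name, value in ATTRIBUTE.findall(shortcode.group(3)):
            if name in translatable_attributes and value:
                texts.append(value)
    return texts


html = '<p>Intro</p>[image title="This is a text" id="42"][/image]'
print(translatable_shortcode_text(html, {"title"}))  # ['This is a text']
print(translatable_shortcode_text(html, set()))      # []
```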


Post-processing Tab

The Post-processing tab defines regex-based find and replace rules that are applied to the translated output file during reconstruction. Use these rules to adjust markup, inject attributes, or rewrite CSS for specific target languages (for example, to add dir="rtl" and lang="ar" to HTML output when translating into Arabic).

Rules run every time the translated file is generated: both when previewing a download and when creating a delivery. They are applied sequentially, in the order listed.

The Post-processing tab showing five example RTL rules targeting Arabic, Hebrew, Farsi, and Urdu output.

Post-processing Rules

Each row in the grid defines one rule with the following columns:

| Column | Description |
| --- | --- |
| On | Enables or disables the rule. Set to Yes to apply the rule, or No to skip it without deleting it. |
| Language pattern | A regular expression matched against the target language code. Leave empty to apply the rule to all target languages. |
| Search regex | The regular expression pattern to find in the output text. Use capturing groups (parentheses) to reference parts of the match in the replacement. |
| Replacement | The text that replaces each match. Reference capture groups from the search pattern with $1, $2, and so on. |

For example, the language pattern ^(ar|he|fa|ur) applies the rule only when the target is Arabic, Hebrew, Farsi, or Urdu.

To add a post-processing rule:

  1. Click Edit in the upper right corner.

  2. Add a new row to the Post-processing rules grid.

  3. Set On to Yes.

  4. Optionally enter a Language pattern to limit the rule to specific target languages.

  5. Enter the Search regex to match text in the output file.

  6. Enter the Replacement text.

  7. Click Save to apply the configuration.

When Rules Are Applied

Rules run on the fully reconstructed output file, so they can target any part of the document — including CSS declarations inside <style> blocks, inline attributes, or text content. They are applied both when generating a preview download and when creating a delivery.

Note

Post-processing runs after translation and reconstruction. It does not affect the content presented to translators in the Editor, only the final output file.

Example: Right-to-Left (RTL) Output for Arabic and Hebrew

When an HTML file is translated from a left-to-right source language (such as English) into a right-to-left language, the output often needs two kinds of adjustment:

  1. The <html> tag should include dir="rtl" and a matching lang attribute.

  2. Explicit direction: ltr and text-align: left CSS declarations should be flipped to their RTL equivalents.

The ruleset below is a starting point for Arabic, Hebrew, Farsi, and Urdu output. It is not a ready-to-use solution: each target document has its own markup and CSS structure, and the rules may need to be adapted, extended, or removed depending on the template you are translating.

All five rules have On set to Yes and use the language pattern ^(ar|he|fa|ur).

| Search regex | Replacement |
| --- | --- |
| `(<html\b)(?![^>]*\sdir\s*=)` | `$1 dir="rtl"` |
| `(<html\b[^>]*)\slang\s*=\s*["']*[^"']*["']` | `$1 lang="ar"` |
| `(<html\b)(?![^>]*\slang\s*=)` | `$1 lang="ar"` |
| `direction\s*:\s*ltr` | `direction:rtl` |
| `text-align\s*:\s*left\s*;` | `text-align:right;` |

Warning

Always verify the generated output against your actual source files before using post-processing rules in production. Regex replacements run against the entire document, so overly broad patterns can produce unintended changes. Treat the ruleset above as an example to adapt, not as a finished configuration.


