Whitespaces and Symbols

This page provides detailed information on the Whitespaces and Symbols section in the Microsoft Word format configuration.

  1. Do not show leading and trailing whitespaces:

  • Any leading or trailing blank in a segment will be hidden in the editor (and inserted back in the translated file).

  • This option is very useful since leading or trailing whitespaces do not add value for the translator.

  • For Asian languages, whitespaces may or may not be inserted back into the translation; see the respective options.
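The hide-and-restore behaviour described above can be pictured with a small Python sketch. It is only an illustration, not the filter's actual implementation; the function names split_edges and restore_edges are hypothetical.

```python
import re
from typing import Tuple


def split_edges(segment: str) -> Tuple[str, str, str]:
    """Split a segment into (leading whitespace, core text, trailing whitespace)."""
    # \s also matches tabs, newlines and non-breaking spaces in Python 3.
    match = re.fullmatch(r"(\s*)(.*?)(\s*)", segment, flags=re.DOTALL)
    return match.group(1), match.group(2), match.group(3)


def restore_edges(leading: str, translation: str, trailing: str) -> str:
    """Put the hidden whitespace back around the translated text."""
    return f"{leading}{translation}{trailing}"


leading, core, trailing = split_edges("  Hello world\t")
print(repr(core))  # 'Hello world' is what the translator would see
print(repr(restore_edges(leading, "Hallo Welt", trailing)))  # '  Hallo Welt\t'
```

The translator works only on the core text, while the leading and trailing whitespace is kept aside and reattached when the translated file is produced.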

  2. Convert sequences of multiple whitespaces into markup:

  • Collapsing whitespaces means any sequence of whitespaces is converted to a single blank. For example: a tab + 1 blank + 1 non-breaking space + 1 newline (4 characters in total) will be transformed into 1 single blank.

  • This option is inherited from the HTML filters, where such collapsing makes sense: a browser renders any sequence of whitespaces the same as a single blank, so the extra characters are redundant. In MS Office it has fewer use cases.
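As a rough illustration of the collapsing described above, the following Python sketch replaces every run of whitespace characters with one blank. The function name collapse_whitespace is hypothetical, and the sketch only shows the collapsing step, not how the filter stores the original characters.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every sequence of whitespace characters with a single blank."""
    # In Python 3, \s covers tabs, newlines and non-breaking spaces as well.
    return re.sub(r"\s+", " ", text)


# A tab + 1 blank + 1 non-breaking space + 1 newline between the two words.
sample = "Hello\t \u00a0\nworld"
print(collapse_whitespace(sample))  # Hello world
```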

The two remaining options are closely tied to how Microsoft Office decomposes text into so-called “runs”. For example, a text like “Hello world ???” is just a single contiguous run. But “Hello world ###” is actually three runs, because its portions of text are formatted differently. A hyperlink within a text is also considered a separate run, as is an image embedded in a sentence.

  3. Do not show leading and trailing characters that are neither letters nor digits:

This option removes leading or trailing runs that do not contain any letters or digits:

  • “Hello world ###” ==> “Hello world” (the ###-run is trimmed and hidden from the translator)
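The trimming can be sketched in Python as follows. This is a simplified model, not the filter's code: runs are represented as plain strings, the split of “Hello world ###” into the three runs shown is an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple


def has_letter_or_digit(run: str) -> bool:
    """True if the run contains at least one letter or digit."""
    return any(ch.isalnum() for ch in run)


def trim_symbol_runs(runs: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split runs into (hidden leading, visible, hidden trailing) runs."""
    start = 0
    while start < len(runs) and not has_letter_or_digit(runs[start]):
        start += 1
    end = len(runs)
    while end > start and not has_letter_or_digit(runs[end - 1]):
        end -= 1
    return runs[:start], runs[start:end], runs[end:]


# Assumed run boundaries for "Hello world ###" (three differently formatted runs).
runs = ["Hello ", "world", " ###"]
hidden_before, visible, hidden_after = trim_symbol_runs(runs)
print("".join(visible))  # Hello world
print(hidden_after)      # [' ###']
```

Only runs at the very start or end of the segment are hidden; a symbol-only run in the middle of the text stays visible in this sketch.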

  4. Convert words containing neither letters nor digits into markup:

This option collapses runs to self-closing markup if they do not contain any letter or digit:

  • “Hello ### world ###” ==> “Hello <f1/> world <f2/>”
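A minimal Python sketch of this conversion is shown below. It again models runs as plain strings with assumed boundaries, and the function name runs_to_markup and the placeholder dictionary are hypothetical; the real filter may number or store its markup differently.

```python
from typing import Dict, List, Tuple


def runs_to_markup(runs: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace runs without letters or digits by self-closing placeholders."""
    parts: List[str] = []
    placeholders: Dict[str, str] = {}
    for run in runs:
        if any(ch.isalnum() for ch in run):
            parts.append(run)  # regular text stays translatable
        else:
            tag = f"<f{len(placeholders) + 1}/>"
            placeholders[tag] = run  # remembered so it can be restored later
            parts.append(tag)
    return "".join(parts), placeholders


# Assumed run boundaries for "Hello ### world ###".
text, placeholders = runs_to_markup(["Hello ", "###", " world ", "###"])
print(text)          # Hello <f1/> world <f2/>
print(placeholders)  # {'<f1/>': '###', '<f2/>': '###'}
```

Unlike option 3, the symbol-only runs are not hidden; they remain in the segment as markup that the translator can move but not edit.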

Why would you tick option 3?

  • Leading or trailing runs that consist of special characters and are also formatted differently from the main body of text are likely control characters or codes that do not require translation. If a document only contains regular text, ticking or unticking the option will not make much difference. There is no risk in ticking; you can only benefit.

Why would you tick option 4?

  • It should generally not be ticked unless the document is very specific and contains control character sequences that are designated by specific formatting. This option was added for particular use cases.

