The following file extensions are supported when setting up file format configurations for Code Files: .cs, .inc, .js, .css, .jfs, .cls, .asax, .asa, .c, .cpp, .h.
To learn more about working with file format configurations, please see the following pages:
- View file format configurations
- Modify file format configurations
- Create file format configurations
- Test and validate file format configurations
Default Code File Configuration
Every file format has a default configuration to ensure that a file can be translated; however, it does not handle every complex property that a source file may contain. The default configuration for Code Files does the following:
- Uses UTF-8 as the default character encoding.
- Compresses sequences of whitespace into a single space.
- Uses //notrans and //beginnotrans ... //endnotrans to delimit non-translatable code.
- Defines two regular expressions for text extraction: one for translatable content and one for content that should not be translated.
- Uses SRX Rules for text segmentation.
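The whitespace and non-translatable marker behavior listed above can be pictured with a minimal Python sketch. This is not Wordbee Translator's implementation: the function names are hypothetical, and the sketch assumes that //notrans protects the line it appears on while //beginnotrans ... //endnotrans protects every line between the two markers.

```python
import re


def compress_whitespace(text):
    """Collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", text)


def translatable_lines(source):
    """Return the lines that are not protected by a notrans marker.

    Assumed semantics (not confirmed by the documentation):
    - //notrans protects the line it appears on
    - //beginnotrans ... //endnotrans protects the enclosed lines
    """
    kept = []
    in_block = False
    for line in source.splitlines():
        if "//beginnotrans" in line:
            in_block = True
            continue
        if "//endnotrans" in line:
            in_block = False
            continue
        if in_block or "//notrans" in line:
            continue
        kept.append(line)
    return kept
```

In this sketch, a line such as `b = "id-42" //notrans` is skipped entirely, and so is everything between a //beginnotrans line and the following //endnotrans line.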
Custom Code File Configurations
If you are performing a code file translation, a custom file format configuration might be necessary to achieve the right results in your target file. Outside of the default configuration selections, Wordbee Translator offers many additional choices for configuring:
- Character Encoding
- Translation of HTML Content
- Quoted String Extraction
- Quoted String Exclusion
- Including or Excluding Additional Content
- Text Segmentation Rules
- Omitting Words, Terms, or Segments from the Translation
Code File Configuration Options
When setting up a file format configuration for Code Files, there are many options to choose from to ensure extraction is successful.
The General Tab contains options for configuring the type of encoding, extracting HTML content, extracting quoted strings, excluding quoted strings, including or excluding additional content, and text segmentation. Each option is described in its own section below.
- Encoding - The default encoding selection for web pages is UTF-8; however, the encoding option may be used to select a different type of encoding such as Windows, Macintosh, ASCII, etc. An option is also provided for checking the encoding of a file to learn what encoding type is used.
- HTML Content - This option should be enabled if the code file contains HTML content. By default, the option is disabled to ensure a successful online translation; however, if HTML content is present you will need to enable this option and choose an HTML configuration to use for the translation. Within this configuration section, you may also elect to split text at HTML breaking tags.
- Extract Quoted Strings - Code files typically contain translatable text in quoted strings. By default, all quoted strings are extracted, but you can add regular expressions within the "Exclude Quoted Strings" section of the configuration to exclude certain strings from the translation. The options in this section are specifically for handling the extraction of quoted strings.
- Exclude Quoted Strings - This configuration section may be used to define patterns to find and exclude specific quoted strings from translation. The system checks each pattern and if one matches the text, it is not extracted for translation. Several pre-defined patterns exist within this section and are configured to not be translatable. They may be modified, marked as translatable or removed within this configuration section. You may also add new patterns to handle more specific translation needs.
- Include or Exclude Additional Content - With this option you can do two things: 1) extract translatable content from any location in the file (not just quoted strings) and 2) completely prevent sections or areas of the file from being translated. The expressions MUST contain capture groups named "pattern1", "pattern2", etc. The text matched by these capture groups is extracted for translation. Example: @(?<pattern1>.*?)@ will extract any text delimited by "@".
- Text Segmentation - This configuration section may be used to enable/disable SRX rules for text segmentation or to elect to always split text at line breaks.
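To make the quoted string and additional content options above more concrete, here is a minimal Python sketch of the same ideas. It is an illustration only: the exclusion patterns and function names are hypothetical and are not the pre-defined patterns shipped with the product. Note also that Python spells named groups as (?P<pattern1>...), whereas the example in the text uses the (?<pattern1>...) form.

```python
import re

# Matches a double-quoted string, allowing backslash-escaped characters.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

# Hypothetical exclusion patterns: a string matching any of them is skipped.
EXCLUDE = [
    re.compile(r"^\s*$"),               # empty or whitespace-only
    re.compile(r"^[\w.-]+\.\w{1,4}$"),  # looks like a file name
    re.compile(r"^#[0-9A-Fa-f]{6}$"),   # hex colour code
]

# The "@...@" example from the text, adapted to Python's named-group syntax.
ADDITIONAL = re.compile(r"@(?P<pattern1>.*?)@")


def extract_quoted(code):
    """Return quoted strings that no exclusion pattern matches."""
    found = [m.group(1) for m in QUOTED.finditer(code)]
    return [s for s in found if not any(p.search(s) for p in EXCLUDE)]


def extract_additional(code):
    """Return the text captured by groups named pattern1, pattern2, ..."""
    texts = []
    for match in ADDITIONAL.finditer(code):
        for name, value in match.groupdict().items():
            if name.startswith("pattern") and value is not None:
                texts.append(value)
    return texts
```

For an input line such as `title = "Welcome back"; icon = "logo.png"; // @Shown in the header@`, the sketch keeps "Welcome back", drops "logo.png" because it matches the file-name exclusion, and picks up "Shown in the header" through the named capture group.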
Do Not Translate Tab
The Do Not Translate Tab may be used to define specific words, terms, or text segments to be excluded from the translation.
- Words or Terms - This feature lets you exclude single words, terms, or portions of a segment from translation. Text captured by a regular expression is converted to markup and thus protected from modification. To obfuscate the original text, type a description. Otherwise, the original content is shown when the translator hovers over the markup.
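The idea of converting captured text to markup can be sketched as follows. This is a rough Python illustration under stated assumptions: the {1}, {2} placeholder format and the protect function are hypothetical, and the markup Wordbee Translator actually produces may look different.

```python
import re


def protect(segment, pattern, description=None):
    """Replace each regex match with a numbered placeholder.

    Returns the protected segment and a list of
    (placeholder, original_text, tooltip) tuples. The tooltip is the
    description when one is given, otherwise the original text.
    """
    protected = []

    def to_markup(match):
        tag = "{%d}" % (len(protected) + 1)
        tooltip = description if description else match.group(0)
        protected.append((tag, match.group(0), tooltip))
        return tag

    return re.sub(pattern, to_markup, segment), protected
```

Calling `protect("Key: ABC-123", r"[A-Z]{3}-\d{3}", "license key")` replaces the key with a placeholder and records "license key" as the tooltip, which mirrors how a description hides the original content from the translator.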