When setting up a file format configuration for a CSV (Comma-Separated Values) file, there are many options to choose from to ensure the translation is successful. This page explains the most common options for CSV files.
File format configurations for CSV files support the following file extension: .csv.
To learn more about working with file format configurations, please see the following pages:
- View file format configurations
- Modify file format configurations
- Create file format configurations
- Test and validate file format configurations
General Tab
The General Tab contains options for choosing what type of content will be translated and which portions of the CSV file will be extracted. For example, a custom configuration is helpful if you want to extract only certain columns or rows from monolingual content, or if the file contains multilingual content.
- Mono or Multilingual Content - The default CSV file format configuration is set up to translate monolingual content. This option may be used to create a custom configuration with specific extraction rules for monolingual content or to translate multilingual files. A custom configuration is necessary to translate multilingual CSV files.
- Encoding - The default CSV file format configuration uses UTF-8 file encoding; however, this portion of the configuration may be altered to use a different type of encoding (ISO, Windows, Macintosh, ASCII, etc.). Additionally, if you do not know the encoding used for the file, an option is provided to check it for proper configuration.
- Columns - These options may be used to configure the column separator (tab, comma, semicolon) and to configure specific columns to extract from the CSV file.
- Rows - This option may be used to configure which row the translation will start on.
- HTML Content - This option should be enabled if the CSV contains HTML content. Unless you have created a custom HTML configuration, Wordbee Translator will use the default configuration to complete the translation.
- Text Segmentation - Enable or disable SRX rules for text segmentation and choose whether to split text at line breaks. By default, the system segments the document by cell; if there are three sentences within a cell, they are treated as one segment. If SRX rules are enabled for text segmentation, the sentences are segmented at each punctuation mark or line break (as defined by the SRX rules). A default set of SRX rules is defined for CSV files; however, these may be customized if needed. Additionally, the default or a customized set of SRX rules might have splitting text at line breaks disabled. In this instance, you can enable "Always Split Text at Line Breaks" to ensure that text is still split at line breaks.
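To make the effect of the Encoding, Columns, Rows, and line-break options more concrete, the following Python sketch mimics them on a small sample. It is only an illustration and is not how Wordbee Translator is implemented: the function names `extract_cells` and `split_at_line_breaks`, their parameters, and the sample data are all hypothetical, and SRX rules are not modeled.

```python
import csv
import io


def extract_cells(raw_bytes, encoding="utf-8", delimiter=",", first_row=1, columns=None):
    """Return (row_number, column_number, text) for every cell to extract.

    Hypothetical helper: first_row and columns are 1-based,
    and columns=None means "extract all columns".
    """
    text = raw_bytes.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    cells = []
    for row_number, row in enumerate(reader, start=1):
        if row_number < first_row:
            continue  # rows before the configured start row are skipped
        for column_number, value in enumerate(row, start=1):
            if columns is None or column_number in columns:
                cells.append((row_number, column_number, value))
    return cells


def split_at_line_breaks(cell_text):
    """Split one cell into segments at line breaks, dropping blank lines."""
    return [line for line in cell_text.splitlines() if line.strip()]


# Semicolon-separated sample with a header row and a two-line cell.
sample = 'id;text\n1;"First line.\nSecond line."\n2;Hello\n'.encode("utf-8")

# Start at row 2 (skip the header) and extract only column 2.
cells = extract_cells(sample, delimiter=";", first_row=2, columns={2})

# Segmenting by cell keeps the two-line cell as one segment;
# splitting at line breaks turns it into two segments.
by_cell = [text for _, _, text in cells]
by_line = [seg for text in by_cell for seg in split_at_line_breaks(text)]
```

In this sketch, `by_cell` holds two segments (one per extracted cell), while `by_line` holds three because the first cell contains a line break.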
Do Not Translate Tab
The Do Not Translate Tab contains options for configuring what will not be extracted for translation within the source file.
- Segments - Enter certain texts or regular expressions for Wordbee Translator to locate and exclude from the translation. Any text that does not match entered texts or patterns is automatically considered by the system to be translatable.
Regular expressions may be entered in the system to protect entire segments or just terms within the file. These segments or terms will not be extracted for translation or taken into account during the word count step. A good example is entering terms or regular expressions to protect brand names or confidential content such as software codes.
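As a rough illustration of the difference between protecting an entire segment and protecting a term inside a segment, here is a minimal Python sketch. The patterns, the brand name "Acme", and the helper names are hypothetical, and the regular expression syntax accepted by Wordbee Translator may differ from Python's.

```python
import re

# Hypothetical patterns, chosen only for illustration.
DO_NOT_TRANSLATE = [
    re.compile(r"[A-Z]{2}-\d{4}"),  # a product code such as "AB-1234"
    re.compile(r"\bAcme\b"),        # a brand name
]


def is_protected_segment(segment):
    """True if the whole segment (ignoring outer whitespace) matches a pattern."""
    return any(p.fullmatch(segment.strip()) for p in DO_NOT_TRANSLATE)


def protected_terms(segment):
    """Return protected terms found inside a segment, in order of appearance."""
    found = []
    for pattern in DO_NOT_TRANSLATE:
        found.extend((m.start(), m.group()) for m in pattern.finditer(segment))
    return [term for _, term in sorted(found)]
```

With these assumed patterns, `is_protected_segment("AB-1234")` is true because the cell contains nothing but the code, whereas "Acme part AB-1234" is still a translatable segment in which `protected_terms` finds the two terms to leave untouched.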
Whitespaces & Symbols Tab
The Whitespaces & Symbols Tab may be used to hide whitespaces or symbols that exist within the source file.
- Do Not Show Leading and Trailing Whitespaces - If enabled, leading and trailing whitespaces within the source file will not be shown in the translation.
- Do Not Show Texts Containing Neither Letters nor Digits - If enabled, text (symbols) that does not contain any letters or digits will be hidden from view in the translation.
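The two options above can be pictured with the following Python sketch. It assumes, purely for illustration, that hiding whitespace means trimming both ends of a segment and that "letters or digits" corresponds to Python's `str.isalnum`; the function names are hypothetical and the product's actual rules may differ.

```python
def visible_text(segment, hide_outer_whitespace=True):
    """Return the segment as displayed, optionally without leading/trailing whitespace."""
    return segment.strip() if hide_outer_whitespace else segment


def has_letters_or_digits(segment):
    """True if the segment contains at least one letter or digit."""
    return any(ch.isalnum() for ch in segment)


segments = ["  Hello world  ", "---", "42", " * "]

# With both options enabled, symbol-only segments are hidden
# and the remaining segments are shown without outer whitespace.
shown = [visible_text(s) for s in segments if has_letters_or_digits(s)]
```

Here `shown` contains only "Hello world" and "42"; the segments "---" and " * " are hidden because they contain neither letters nor digits.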