
API - Send text extraction rules

This method is useful only in special or advanced cases.

This page does not apply to pass-through projects, where text extraction takes place in Wordbee Translator rather than in Beebox.


Depending on the file format (Word, HTML, XML), Beebox uses rules to extract the translatable content. These rules are configured on the project settings page.

In some scenarios, you may need to customize the rules for individual source files. This page describes how to do that.

URL

(PUT) /api/files/file?token=&locale=&folder=&filename={source file name}.beebox.filter

Add the XML-formatted rules to the request body. Note that the rules file is named after the source file, with .beebox.filter appended.

The API method, parameters, and result are identical to those for sending source files.
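
As a rough illustration of the call above, the following Python sketch builds the PUT request without sending it. The host name, token, locale value, and the Content-Type header are assumptions made for the example and are not taken from the Beebox documentation; only the path and the query parameter names (token, locale, folder, filename) come from this page.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_rules_request(base_url, token, locale, folder, source_filename, rules_xml):
    """Build (but do not send) the PUT request that uploads a rules file."""
    query = urlencode({
        "token": token,
        "locale": locale,
        "folder": folder,
        # The rules file is the source file name plus ".beebox.filter".
        "filename": source_filename + ".beebox.filter",
    })
    return Request(
        url=f"{base_url.rstrip('/')}/api/files/file?{query}",
        data=rules_xml.encode("utf-8"),  # request body: the XML rules as UTF-8
        method="PUT",
        # Assumption: this header is not specified on this page.
        headers={"Content-Type": "application/xml; charset=utf-8"},
    )


request = build_rules_request(
    base_url="https://beebox.example.com",  # hypothetical host
    token="YOUR-TOKEN",                     # hypothetical session token
    locale="en",                            # hypothetical locale value
    folder="folder",
    source_filename="document-200.htm",
    rules_xml='<?xml version="1.0" encoding="utf-8"?><ParserConfiguration/>',
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; the sketch stops short of that so it can be inspected without a running Beebox.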


Rules file name and location

The rules are XML formatted. Suppose the source file is folder\document-200.htm. In that case, the rules file must be named folder\document-200.htm.beebox.filter.
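
The naming rule can be sketched in Python as follows. The helper name rules_location is hypothetical, and returning an empty folder for a file at the top level is an assumption; this page does not say how the root folder is expressed.

```python
from pathlib import PureWindowsPath

RULES_SUFFIX = ".beebox.filter"


def rules_location(source_path):
    """Split a source path into (folder, rules file name)."""
    path = PureWindowsPath(source_path)  # the example path uses backslashes
    # Assumption: a file without a parent folder maps to an empty folder value.
    folder = "" if str(path.parent) == "." else str(path.parent)
    return folder, path.name + RULES_SUFFIX


print(rules_location(r"folder\document-200.htm"))
# ('folder', 'document-200.htm.beebox.filter')
```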

Rules file content

The .beebox.filter file must contain a valid configuration. A typical rule set for HTML pages starts like this:

CODE
<?xml version="1.0" encoding="utf-8"?>
<ParserConfiguration xmlns="http://www.wordbee.com/config">
  <Name>Web pages - Example</Name>
  <Description>Excludes URLs from translation</Description>
  <ParserDomain>HTML</ParserDomain>
  <EParser>1</EParser>
  <SegmentationRulesEnabled>true</SegmentationRulesEnabled>
  <SegmentationSplitAtNewlines>false</SegmentationSplitAtNewlines>
  <SegmentationSplitAtInlineTags>true</SegmentationSplitAtInlineTags>
  <VersionPretranslation>CompareTexts</VersionPretranslation>
  <CompactingOption xmlns="">0</CompactingOption>
  <HtmlConfiguration xmlns="http://www.wordbee.com/config/html">
    <IncludeSpaces>false</IncludeSpaces>
    <CompressSpaces>true</CompressSpaces>
    <ConvertEntities>AllToCharacter</ConvertEntities>
...


Note that the root node of the file is <ParserConfiguration>. When you download rules from Beebox or Wordbee Translator, you need to remove the <ParserConfigurations> (plural) parent node.
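
Removing the plural parent node can be done by hand, or with a small script such as the Python sketch below. It assumes the downloaded file has a <ParserConfigurations> root that directly contains one or more <ParserConfiguration> elements, and it keeps only the first one; the exact layout of a downloaded file is not described on this page, so treat this as a starting point. The function name is hypothetical. Namespace declarations found on the parent are copied to the child so they are not lost when the parent is dropped.

```python
from xml.dom import minidom


def unwrap_parser_configuration(xml_bytes):
    """Return a rules document whose root is <ParserConfiguration>, as UTF-8 bytes."""
    root = minidom.parseString(xml_bytes).documentElement
    if root.localName == "ParserConfiguration":
        config = root  # already in the expected shape
    elif root.localName == "ParserConfigurations":
        children = [
            node for node in root.childNodes
            if node.nodeType == node.ELEMENT_NODE
            and node.localName == "ParserConfiguration"
        ]
        if not children:
            raise ValueError("no <ParserConfiguration> element found")
        config = children[0]  # assumption: the first configuration is the wanted one
        # Carry over xmlns declarations that were made on the removed parent.
        for name, value in root.attributes.items():
            is_ns_decl = name == "xmlns" or name.startswith("xmlns:")
            if is_ns_decl and not config.hasAttribute(name):
                config.setAttribute(name, value)
    else:
        raise ValueError("unexpected root element: " + root.tagName)
    text = '<?xml version="1.0" encoding="utf-8"?>\n' + config.toxml()
    return text.encode("utf-8")
```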


Read more about rules


API call sequence

Always send the files to Beebox in this order:

  1. The instructions file (.beebox), if any
  2. The rules (.beebox.filter), if any
  3. The translated file, if any
  4. The source file

It is important to send the source file last. Otherwise, an automatic Beebox operation might detect and process the source file before you have sent your rules.
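
A client that uploads several files can enforce this order in one place. The Python sketch below is one way to do so; the role names (instructions, rules, translation, source) and the function name are labels invented for this example, not Beebox terms.

```python
# Upload order from the list above; the source file always goes last.
UPLOAD_ORDER = ("instructions", "rules", "translation", "source")


def ordered_uploads(files):
    """Return (role, file name) pairs in the order they should be sent.

    `files` maps a role from UPLOAD_ORDER to a file name; optional roles
    may simply be left out.
    """
    unknown = set(files) - set(UPLOAD_ORDER)
    if unknown:
        raise ValueError("unknown roles: " + ", ".join(sorted(unknown)))
    return [(role, files[role]) for role in UPLOAD_ORDER if role in files]


queue = ordered_uploads({
    "source": "document-200.htm",
    "rules": "document-200.htm.beebox.filter",
})
for role, name in queue:
    print(role, name)
```

Here the rules file is listed before the source file regardless of the order in which the dictionary was filled.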

Content encoding and code pages

Serialize XML to UTF-8 when adding to the HTTP request body.
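
A minimal Python sketch of that step follows. The extra check that the XML declaration does not name a different encoding is an addition for safety, not a documented Beebox requirement, and the function name is hypothetical.

```python
import re

# Matches the encoding named in a leading XML declaration, if there is one.
_DECLARED_ENCODING = re.compile(r'^\s*<\?xml[^>]*\bencoding=["\']([^"\']+)["\']')


def to_utf8_body(xml_text):
    """Encode XML text as UTF-8 bytes for use as the HTTP request body."""
    match = _DECLARED_ENCODING.match(xml_text)
    if match and match.group(1).lower() not in ("utf-8", "utf8"):
        # The declaration would contradict the bytes that are actually sent.
        raise ValueError("declared encoding is " + match.group(1) + ", expected utf-8")
    return xml_text.encode("utf-8")


body = to_utf8_body('<?xml version="1.0" encoding="utf-8"?><Name>Café</Name>')
print(len(body))
```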



