Terminology Extraction

Let us now take a look at the Terminology Extraction module with some basic scenarios.

When to use the terminology extraction tool?

Extract terminology from your working files. Step-by-step guide

Cases:
- Prepare your project before translation starts. Extract terminology from existing files that have been handed over for translation. These files can be monolingual or multilingual.
- Get the most out of your translation. Run the terminology extractor after the project is completed to document the most prominent terms.

As a project manager, you can use term extraction to detect term candidates in your files and build a list of the terms you consider to be the terminology for your project. The whole process can be customized to your needs, so follow the steps below and adapt them as required.

1. Run the extraction

Run the extraction as explained in the getting started guide. The process is summarized in the video below.

2. Narrow down the candidates

Once the results are ready in your collection, narrow down the list of candidates by using the filters.

  • Use the preview and the statistics to get a rough idea of where to define your threshold and decide what you will focus on.

  • Remove filtered terms if you need to get rid of some of the raw data.
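The idea behind choosing a threshold can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how the tool works internally: the `Candidate` class, the `frequency` statistic, and the sample values are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    term: str
    frequency: int  # hypothetical statistic: occurrences in the files


def summarize(candidates):
    """Return (lowest, middle, highest) frequency as a rough preview."""
    freqs = sorted(c.frequency for c in candidates)
    return freqs[0], freqs[len(freqs) // 2], freqs[-1]


def apply_threshold(candidates, min_frequency):
    """Keep only the candidates that meet the chosen threshold."""
    return [c for c in candidates if c.frequency >= min_frequency]


# Made-up sample data, for illustration only.
candidates = [
    Candidate("translation memory", 42),
    Candidate("term candidate", 17),
    Candidate("the file", 3),
    Candidate("click here", 1),
]

low, middle, high = summarize(candidates)
kept = apply_threshold(candidates, min_frequency=10)
print([c.term for c in kept])
```

Looking at the spread of values first, then cutting, mirrors the two bullets above: preview and statistics to pick the threshold, then removal of whatever falls below it.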

3. Prepare and validate the terminology offline

Export* your collection into Excel for further analysis and clean-up. Validate the results of your work with your stakeholders.
Use the option “Sort by lemma” if you want to show term candidates that relate to the same base word or phrase next to each other. In English, for example, run, runs and running are forms of the same lexeme, and run is the lemma.

*If you export the collection as TBX, you can import the clean results as a terminology database directly into the related section of the system. See the Learn more section for more details.
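To see what sorting by lemma achieves, here is a minimal Python sketch. The (term, lemma) pairs are hypothetical; in practice the lemma is supplied by the extraction, and this code does not reproduce the tool's own sorting.

```python
from itertools import groupby

# Hypothetical (term, lemma) pairs, for illustration only.
rows = [
    ("running", "run"),
    ("files", "file"),
    ("run", "run"),
    ("file", "file"),
    ("runs", "run"),
]


def sort_by_lemma(rows):
    """Order rows by lemma first, then by term, ignoring case."""
    return sorted(rows, key=lambda row: (row[1].lower(), row[0].lower()))


def group_by_lemma(rows):
    """Map each lemma to the list of term forms that share it."""
    ordered = sort_by_lemma(rows)
    return {
        lemma: [term for term, _ in group]
        for lemma, group in groupby(ordered, key=lambda row: row[1].lower())
    }


grouped = group_by_lemma(rows)
print(grouped)
```

Once the rows are ordered this way, run, runs and running sit together, which makes it easier to decide which form to keep during clean-up.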

4. Work offline and import back your work into the collection

During the clean-up, you may want to document several related terms together, especially if they refer to the same concept. Use Excel as your working template:

  • Term columns are “Source” and “Target”

  • Lemma columns show the base form of the term. They can also be used to group terms under the same concept.
    For example:

    • World Health Organization

    • WHO
      are both terms that refer to the UN agency that promotes health.  

  • Example columns show a sample text with the term expression

Here is a sample Excel file with two related terms.

Import your file back with only the terms that are worth documenting. The terms you keep in your collection will be the ones used to create a database once you decide that the work for each language pair is done.

5. Create a new termbase

Focus on one of the languages of your collection and create a database with your work. This operation can be done:

  • for the source language on its own

  • for each target language.

As a result, you can create monolingual or bilingual terminology databases with your terms and their contextual examples for later reference. These are stored in standard TBX fields.
Use the option “Aggregate terms with identical lemma” if you want to create a concept-oriented database, in which terms with the same lemma are part of a single concept.

In our example above, the related terms World Health Organization and WHO will be included in the same concept.
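The effect of aggregating by lemma can be sketched in Python as follows. This is an assumption-laden illustration, not the system's export: the row layout is hypothetical, and the element names (`termEntry`, `langSet`, `tig`, `term`) follow the older TBX-Basic style in a simplified form, so the actual TBX produced by the tool may look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows; the lemma is used here to index terms into a concept.
rows = [
    {"term": "World Health Organization", "lemma": "World Health Organization"},
    {"term": "WHO", "lemma": "World Health Organization"},
    {"term": "term candidate", "lemma": "term candidate"},
]


def aggregate_by_lemma(rows):
    """Group terms that share a lemma, keeping first-seen order."""
    concepts = {}
    for row in rows:
        concepts.setdefault(row["lemma"], []).append(row["term"])
    return concepts


def to_tbx_like(concepts, lang="en"):
    """Build a simplified, TBX-like tree: one entry per concept."""
    body = ET.Element("body")
    for index, terms in enumerate(concepts.values(), start=1):
        entry = ET.SubElement(body, "termEntry", {"id": f"c{index}"})
        lang_set = ET.SubElement(entry, "langSet", {"xml:lang": lang})
        for term in terms:
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return body


concepts = aggregate_by_lemma(rows)
body = to_tbx_like(concepts)
print(ET.tostring(body, encoding="unicode"))
```

Without aggregation, each row would become its own entry; with it, World Health Organization and WHO end up as two terms inside one concept entry.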

6. Attach it to your project(s)

Add the new termbase to your project so it can be referenced during the translation process. Activate the QA flag in the project if you want inconsistencies to be flagged when QA checks are run.

Use the option “Copy terms to another collection” to merge the terminology work done in one or several collections. The data for that language will be aggregated into another existing collection or you can create a new one as you go.


When to create a new term collection?

Create the right terminology framework for your project. Step-by-step guide

Cases:
- You already have good terminology in one language and you want to translate that into other languages.
- You will start all terminology work from scratch.

This case is slightly different from the one above. Teams can decide to start terminology work from scratch, with no term extraction required. Use a new term collection to get a basic Excel template for documenting the work to be done in each of the required languages. This is what the process looks like:

Access the term extraction module in Resources > Term extraction:

  1. Create a new collection by clicking add new in the Term extraction list. Enter at least two languages that you will be documenting.

  2. Use the language selector to focus on a given language.

    1. Get your template for offline work by clicking Export excel.
      Depending on the language you chose, this template can be monolingual (source language) or bilingual (target language).

  3. Import the Excel back once you are done with the work for that language.
    Repeat this process with any of the languages you wish to include in your terminology database.

  4. Use the filters and preview options available to see how the work is progressing.

  5. Create a database with the languages of your collection. This operation can be done:

    1. for the source language on its own

    2. for each target language.
      As a result, you can create monolingual or bilingual terminology databases with your terms and their contextual examples for later reference. These are stored in standard TBX fields.

  6. Add the new termbase to your project so it can be referenced during the translation process. Activate the QA flag in the project if you want inconsistencies to be flagged when QA checks are run.

