New sample: Projects, Jobs, Resources
Creates a new random sample of segments from a given scope, such as a project, a job or a resource. Use cases:
- Evaluate quality of translations done in a specific project
- Evaluate quality of a specific job
- Evaluate quality of a translation memory
- Evaluate quality of any work done in any project over a period of time
URL
(POST) /resources/segments/sampling/new
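As a minimal sketch, the call could be prepared in Python as shown here. The endpoint path is the one given in this section. The base URL and the `auth_headers` argument are placeholders (assumptions), because this page does not describe the host name or the authentication scheme. The request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical base URL: replace it with the API root of your own platform.
BASE_URL = "https://example.invalid/api"


def build_sample_request(payload, auth_headers=None):
    """Build (but do not send) the POST request that creates a new sample."""
    headers = {"Content-Type": "application/json"}
    # Authentication headers are an assumption; pass whatever your platform requires.
    headers.update(auth_headers or {})
    return urllib.request.Request(
        url=BASE_URL + "/resources/segments/sampling/new",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_sample_request({
    "type": "Scope",
    "scope": {"type": "DocumentSet", "dsid": 1863},
    "src": "de",
    "trg": "en",
})
```

To actually send it, pass `request` to `urllib.request.urlopen` and decode the response body as JSON.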
PARAMETERS
The parameters are a JSON object included in the request body:
type | Value must be: Scope | Mandatory, string |
scope | The scope object. The sample will be taken from the segments within this scope (project, job, resource...). Use these methods to find or enumerate: resources, jobs and projects. You will need the respective IDs to create your scope. | Mandatory, object |
layout | Optionally specify the segments' fields to include in the results. This is done using a layout JSON object. If not specified, the system will include the columns for source text, translation, comments and translation revisions. | Optional, object |
src | The source locale (language code). | Mandatory, string |
trg | The target locale (language code). | Mandatory, string |
size | The expected sample size. Default is 10. This must be a value between 1 and 50. | Optional, int? |
persist | Optional boolean. Default is false. Only set to true if required. If true, then the results are temporarily saved and assigned a token (see sampletoken in results). | Optional, bool? |
includeresults | Optional boolean. Default is true. If true then the returned JSON includes the result node. Otherwise only the summary statistics are returned. If you further process results using the sampletoken you may not need the results with this call. | Optional, bool? |
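The parameters above can be assembled with a small helper such as the following sketch. The function name `build_sample_payload` is hypothetical. The defaults and the size limits mirror the table, and the scope object is the `DocumentSet` scope used in the example on this page.

```python
def build_sample_payload(scope, src, trg, size=10, persist=False, includeresults=True):
    """Assemble the JSON body for a new sample (helper name is hypothetical)."""
    # The table above states that size must be between 1 and 50.
    if not 1 <= size <= 50:
        raise ValueError("size must be between 1 and 50")
    return {
        "type": "Scope",        # fixed value required by the API
        "scope": scope,         # scope object: project, job, resource...
        "src": src,             # source locale
        "trg": trg,             # target locale
        "size": size,
        "persist": persist,     # True only if a sampletoken is needed later
        "includeresults": includeresults,
    }


payload = build_sample_payload({"type": "DocumentSet", "dsid": 1863}, "de", "en", size=1)
```

Validating `size` on the client side avoids a round trip for a request that the documented limits already rule out.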
You can further fine tune the sample with these additional parameters:
Filter options | ||
editorInitial | Optional filter on the initial translation done. Values are:
| Optional, string? |
editorCurrent | Optional filter on the current translation. Values are:
| Optional, string? |
dteditfrom | Optional filter on the date of last translation edit. If set, the sample will include translations edited at or after this date only. | Optional, datetime? |
Scoring options | ||
boostWordsMin boostWordsMax | These options let you express a preferred word count for the sampled segments. Segments whose word count falls within the range are sampled with a higher probability than segments with fewer or more words. The word count refers to the source text, not the translated text.
Explanation: If min is 10 and max is 15, the system samples more segments with 10 to 15 words than other segments. Mathematically, the probability decreases below min and above max following a Gaussian curve. It drops below 0.2 at a certain distance from the limits (between 3 words and twice the range width). | Optional, int? |
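To illustrate the word count boost, here is a rough sketch of a Gaussian falloff of this kind. It is not the system's actual formula. The function name `boost_weight`, the weight of 1.0 inside the range, and the default falloff distance (the larger of 3 words and twice the range width) are all assumptions chosen to match the description above.

```python
import math


def boost_weight(words, boost_min, boost_max, falloff=None):
    """Illustrative relative weight for a segment with `words` source words."""
    if falloff is None:
        # Assumed distance at which the weight reaches 0.2; the real value
        # is internal to the system.
        falloff = max(3, 2 * (boost_max - boost_min))
    if boost_min <= words <= boost_max:
        return 1.0  # inside the preferred range: full weight (assumption)
    # Distance in words to the nearest limit of the range.
    distance = boost_min - words if words < boost_min else words - boost_max
    # Choose sigma so that the Gaussian equals 0.2 at exactly `falloff` words.
    sigma = falloff / math.sqrt(2 * math.log(5))
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


weights = {n: round(boost_weight(n, 10, 15), 3) for n in (5, 10, 15, 20, 25)}
```

With min 10 and max 15, a segment of 25 words sits 10 words above the range, which is the assumed falloff distance, so its weight is approximately 0.2.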
RESULTS
A JSON with these properties:
samples | An array of samples. The present method produces a single sample, so there is always exactly 1 element in the array. See table below for properties. | object[] |
sampletoken | If persist was set to true, this field contains a token. The token is required to push the sample into a QA evaluation workflow (see related API methods). | string? |
Each samples array element has these properties:
segments | Total segments in sample. Note that this number is lower than the requested sample size if there is not enough data or the filters are too restrictive. | int |
words | Total source text words in sample. | int |
src | The source language of the sample | string |
trg | The target language of the sample | string |
result | Contains all the segments in the sample, information on the resources to which the segments belong, and worker names. | object |
result.rows | The list of segments. Includes main segment properties as well as the data columns specified in the layout parameter. The format is explained further down in this page. | object[] |
result.docs | A dictionary with all documents that appear in the results. This lets you show document names and more information per segment (see the did property of a segment). The format is explained further down in this page. | object |
result.users | A dictionary with all users/persons that are referenced by the segments included with the results. A segment references the persons that have last changed a text, a status, a bookmark etc. The format is explained further down in this page. | object |
columns | An array with the columns in the result.rows property. Each array element describes one column, see here: Spreadsheet Column (Object) | object[] |
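The following sketch shows one way to join each row with its document and users. It assumes that the keys of `result.docs` and `result.users` are the numeric ID prefixed with an underscore (for example `_7439`), as the example response on this page suggests. The function name `describe_rows` is hypothetical.

```python
def describe_rows(sample):
    """Attach the document name and editor names to every row of one sample."""
    result = sample["result"]
    docs = result.get("docs", {})
    users = result.get("users", {})
    described = []
    for row in result.get("rows", []):
        # Assumed key convention: "_" + document ID.
        doc = docs.get("_{}".format(row["did"]), {})
        editors = set()
        for cell in row.get("cols", {}).values():
            # Only text cells carry a "txt" object with the last editor (usid).
            usid = (cell.get("txt") or {}).get("usid")
            if usid is not None:
                user = users.get("_{}".format(usid), {})
                editors.add(user.get("nm", str(usid)))
        described.append({
            "sid": row["sid"],
            "document": doc.get("name"),
            "editors": sorted(editors),
        })
    return described
```

Falling back to the raw ID when a user is missing from the dictionary keeps the output usable even if the response is incomplete.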
ACCESS RIGHTS
The user must be authorized to access the scope.
EXAMPLE
Request a sample of 1 random segment from a specific resource (such as project memory, translation memory or termbase) for German to English. To sample a project or job, use a different scope: Scope (Object).
We do not specify other optional parameters such as the layout. If the latter is not set, the system returns by default the columns for source text, translation, comments and translation revisions.
POST /resources/segments/sampling/new
BODY:
{
  "type": "Scope",
  "scope": {
    "type": "DocumentSet",
    "dsid": 1863
  },
  "src": "de",
  "trg": "en",
  "size": 1
}
The result is:
{
"samples": [
{
"segments": 1,
"words": 5,
"src": "de",
"trg": "en",
"result": {
"rows": [
{
"no": "1",
"sid": 4840837,
"did": 7439,
"dsid": 1863,
"cty": 1,
"sdid": null,
"bsid": 1,
"bssid": 0,
"edit": true,
"tags": null,
"tmx": [],
"ctx": "p",
"ctx_edit": true,
"chmin": null,
"chmax": null,
"ch_edit": true,
"lbls": [],
"lbls_edit": true,
"cfs": [],
"cfs_edit": true,
"cols": {
"_0": {
"column": 0,
"txt": {
"val": "Hallo Welt, wie bist Du",
"st": 0,
"bk": 0,
"tsk": null,
"loc": "de",
"cmc": 0,
"ed": 0,
"usid": null,
"usdt": "2018-07-11T08:15:44.7995228Z",
"hh": false,
"sim": 0,
"err": null,
"lck": false,
"lck_edit": true,
"hn": 856510075,
"hp": 0,
"cfs": [],
"cfs_edit": true,
"lbls": [],
"lbls_edit": true,
"usfid": null,
"usfdt": null,
"tmx": []
},
"txt_edit": true
},
"_1": {
"column": 1,
"txt": {
"val": "Hello world how are you",
"st": 0,
"bk": 0,
"tsk": null,
"loc": "en",
"cmc": 0,
"ed": 1,
"usid": 187,
"usdt": "2018-07-11T08:19:35.9095277Z",
"hh": false,
"sim": 0,
"err": null,
"lck": false,
"lck_edit": true,
"hn": null,
"hp": null,
"cfs": [],
"cfs_edit": true,
"lbls": [],
"lbls_edit": true,
"usfid": null,
"usfdt": null,
"tmx": []
},
"txt_edit": true
},
"_2": {
"column": 2,
"revs": [
{
"ty": "text",
"current": true,
"val": "t1",
"tsk": null,
"ed": 1,
"dt": "2018-07-11T08:19:35.9095277Z",
"uid": 187,
"loc": "en",
"mk": null
}
],
"revs_edit": false
},
"_3": {
"column": 3,
"cms": [],
"cm_edit": false
},
"_4": {
"column": 4,
"cms": [],
"cm_edit": false
}
}
}
],
"docs": {
"_7439": {
"did": 7439,
"dsid": 1863,
"name": "sample.html",
"pmax": null,
"pmin": null,
"ptype": 1,
"pdomain": "HTML",
"previewapp": null,
"previewurl": null,
"edit": true,
"ctags": [
"[b]",
"[/b]",
"[i]",
"[/i]",
"[u]",
"[/u]",
"[s]",
"[/s]",
"[sup]",
"[/sup]",
"[sub]",
"[/sub]",
"[nbsp]/"
],
"sub": []
}
},
"users": {
"_187": {
"id": 187,
"nm": "Böhmig Stephan",
"cid": 1,
"cnm": "Pons"
}
}
},
"columns": [
{
"index": 0,
"fkey": "1~de~0",
"fkeyLayout": "1~de~0",
"ftype": 1,
"fqualifier": 0,
"name": "Allemand",
"loc": "de",
"loc_rtl": false,
"loc_cmplx": false,
"loc_ea": false
},
{
"index": 1,
"fkey": "1~en~0",
"fkeyLayout": "1~en~0",
"ftype": 1,
"fqualifier": 0,
"name": "Anglais",
"loc": "en",
"loc_rtl": false,
"loc_cmplx": false,
"loc_ea": false
},
{
"index": 2,
"fkey": "12~en~0",
"fkeyLayout": "12~en~0",
"ftype": 12,
"fqualifier": 0,
"name": "Revisions - Anglais",
"loc": "en",
"loc_rtl": false,
"loc_cmplx": false,
"loc_ea": false
},
{
"index": 3,
"fkey": "9~de~0",
"fkeyLayout": "9~de~0",
"ftype": 9,
"fqualifier": 0,
"name": "Comments - Allemand",
"loc": "de",
"loc_rtl": false,
"loc_cmplx": false,
"loc_ea": false
},
{
"index": 4,
"fkey": "9~en~0",
"fkeyLayout": "9~en~0",
"ftype": 9,
"fqualifier": 0,
"name": "Comments - Anglais",
"loc": "en",
"loc_rtl": false,
"loc_cmplx": false,
"loc_ea": false
}
]
}
]
}
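As a sketch of how the response above might be consumed, the following helper extracts the source and target text of each sampled segment. It relies only on fields visible in the example (`txt.loc` and `txt.val` inside each entry of `cols`). The function name `text_pairs` is hypothetical, and the `response` literal is a trimmed copy of the example result.

```python
def text_pairs(sample):
    """Return (source text, target text) for every row of one sample."""
    src, trg = sample["src"], sample["trg"]
    pairs = []
    for row in sample["result"]["rows"]:
        by_locale = {}
        for cell in row["cols"].values():
            txt = cell.get("txt")  # revision and comment columns have no "txt"
            if txt is not None:
                by_locale[txt["loc"]] = txt["val"]
        pairs.append((by_locale.get(src), by_locale.get(trg)))
    return pairs


# Trimmed copy of the example response, keeping only the fields used here.
response = {
    "samples": [{
        "segments": 1, "words": 5, "src": "de", "trg": "en",
        "result": {"rows": [{
            "sid": 4840837,
            "cols": {
                "_0": {"column": 0, "txt": {"val": "Hallo Welt, wie bist Du", "loc": "de"}},
                "_1": {"column": 1, "txt": {"val": "Hello world how are you", "loc": "en"}},
                "_2": {"column": 2, "revs": []},
            },
        }]},
    }]
}

pairs = text_pairs(response["samples"][0])
# pairs == [("Hallo Welt, wie bist Du", "Hello world how are you")]
```

If a custom layout includes several text columns for the same locale, the last one wins in this sketch; a stricter version could select the column through the `columns` array instead.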