Document Translation
To translate a document with the Lilt API, follow these steps:
- Upload a file to a project (API upload currently supports only XLIFF, HTML, XLSX, and CSV files)
    # Python; Note: This is just a stub.
    # lilt_api_key and lilt_api_url are assumed to be defined elsewhere.
    import json
    import requests

    def upload_document(fileName, projid):
        payload = {"key": lilt_api_key}
        jsonData = {"name": fileName, "project_id": projid}
        headers = {
            "LILT-API": json.dumps(jsonData),
            "Content-Type": "application/octet-stream"
        }
        # Read in binary mode so the body matches the octet-stream content type
        with open(fileName, 'rb') as fp:
            rawData = fp.read()
        res = requests.post(lilt_api_url + "/documents/files", params=payload,
                            data=rawData, headers=headers, verify=False)
        return res.json()["id"]
- Retrieve segment source and ids
    # Python; Note: This is just a stub
    def get_document(docid):
        segments = []
        payload = {"key": lilt_api_key, "id": docid}
        res = requests.get(lilt_api_url + "/documents", params=payload, verify=False)
        for seg in res.json()["segments"]:
            segments.append(seg)
        return segments
- Translate each source segment
    # Python; Note: This is just a stub
    def translate(source, memid):
        payload = {"key": lilt_api_key, "memory_id": memid,
                   "source": source, "tm_matches": "false"}
        res = requests.get(lilt_api_url + "/translate", params=payload, verify=False)
        # The first entry of the response is used as the translation
        return res.json()[0]
- Update each segment with the translation
    # Python; Note: This is just a stub
    def update_segment(segid, target):
        payload = {"key": lilt_api_key}
        jsondata = {"id": segid, "target": target}
        res = requests.put(lilt_api_url + "/segments", params=payload,
                           data=jsondata, verify=False)
- Download the file
    # Python; Note: This is just a stub
    def download_document(fileName, docid):
        payload = {"key": lilt_api_key, "id": docid}
        res = requests.get(lilt_api_url + "/documents/files", params=payload, verify=False)
        # Write the raw response bytes to disk
        with open(fileName, 'wb') as fp:
            fp.write(res.content)
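The five steps can be chained into a single routine. The following is a minimal sketch, not part of the Lilt API: translate_document is a hypothetical helper, each step is passed in as a callable so the flow can be tried without network access, and the segment keys "id" and "source" are assumptions about the shape of the objects returned by the retrieval step.

```python
def translate_document(in_path, out_path, project_id, memory_id,
                       upload, get_segments, translate, update, download):
    """Run the five steps in order; each step is supplied as a callable.

    Hypothetical helper for illustration. The "id" and "source" keys are
    assumed field names for the retrieved segments.
    """
    # Step 1: upload the file and keep the document id
    doc_id = upload(in_path, project_id)
    # Step 2: retrieve the segments (source text and ids)
    segments = get_segments(doc_id)
    for seg in segments:
        # Step 3: translate the source text of this segment
        target = translate(seg["source"], memory_id)
        # Step 4: write the translation back to the segment
        update(seg["id"], target)
    # Step 5: download the translated file
    download(out_path, doc_id)
    # Return how many segments were processed
    return len(segments)
```

With the stubs from the steps above, the call would pass upload_document, get_document, translate, update_segment, and download_document as the five callables, in that order.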
To adapt your machine translation engine and update the TM in your Memory, you should also add the new segments, with their final, correct translations (after verification or processing by a translator or bilingual reviewer), to the Memory.
Segmentation
Lilt performs sentence segmentation on source segments.
To bypass the segmentation, add a <seg-source> element, which indicates a segmented source, and corresponding <mrk> markers inside the segment to specify the segment boundaries.
Note that you have to include both <source> and <seg-source>, even though the latter ultimately overrides the former.
For example:
<trans-unit>
<source>Segment this source. Any content here will be imported as multiple segments in Lilt. Try it out!</source>
<seg-source><mrk mtype="seg">Do not segment this source. Any content here will be imported as one segment in Lilt. Try it out!</mrk></seg-source>
<target />
</trans-unit>
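If the XLIFF is generated programmatically, a unit of this shape can be built with Python's standard xml.etree.ElementTree module. This is a sketch under assumptions: make_trans_unit is a hypothetical helper, and the id and mid attributes come from common XLIFF 1.2 usage rather than from the example above, so whether Lilt requires them is not confirmed here.

```python
import xml.etree.ElementTree as ET


def make_trans_unit(unit_id, segments):
    """Build a <trans-unit> whose <seg-source> fixes the segment boundaries.

    Hypothetical helper: `segments` is a list of strings, one per desired
    segment. The id and mid attributes are assumptions, not Lilt requirements.
    """
    unit = ET.Element("trans-unit", {"id": str(unit_id)})
    # <source> carries the full, unsegmented text
    source = ET.SubElement(unit, "source")
    source.text = " ".join(segments)
    # <seg-source> repeats the text with one <mrk mtype="seg"> per segment
    seg_source = ET.SubElement(unit, "seg-source")
    for index, text in enumerate(segments, start=1):
        mrk = ET.SubElement(seg_source, "mrk", {"mtype": "seg", "mid": str(index)})
        mrk.text = text
        if index < len(segments):
            # Keep the separating space between consecutive markers
            mrk.tail = " "
    # Empty <target> to be filled in by translation
    ET.SubElement(unit, "target")
    return unit


if __name__ == "__main__":
    unit = make_trans_unit(1, ["Do not segment this source. Try it out!"])
    print(ET.tostring(unit, encoding="unicode"))
```

Passing a single string yields one segment regardless of how many sentences it contains; passing several strings sets the boundaries between them explicitly.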
To learn more, see the full API reference.