Writable translation memories and one of their implementations

Writable translation memories

XLIFF-based CAT tools usually have a notion of translation memory provider plugin, and most of them offer a checkbox to decide whether you want to write to (sometimes called "update") a given provider. Later, when you translate in the editor, each segment you validate is sent to all providers whose checkbox is on. What a provider does with the segment depends on its implementation. Providers can declare themselves read-only, in which case the checkbox is grayed out.
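
The dispatch described above can be sketched in a few lines. This is only an illustration of the idea, not the API of any real CAT tool: the Provider class, its fields and the validate_segment function are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Hypothetical translation memory provider (not a real CAT tool API)."""
    name: str
    read_only: bool = False        # a read-only provider has its checkbox grayed out
    update_checked: bool = False   # state of the "update" checkbox
    stored: list = field(default_factory=list)

    def write(self, source: str, target: str) -> None:
        # What happens here depends entirely on the implementation;
        # this sketch simply keeps the pair in memory.
        self.stored.append((source, target))


def validate_segment(providers, source, target):
    """Send a validated segment to every provider that accepts updates.

    Returns the names of the providers that received the segment.
    """
    written = []
    for provider in providers:
        if provider.update_checked and not provider.read_only:
            provider.write(source, target)
            written.append(provider.name)
    return written
```

A provider that is read-only, or whose checkbox is off, is simply skipped; the editor does not need to know what each provider does with the segment.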

The Cyclotis client has been implemented in Trados as a writable translation memory provider, so it seemed natural to do the same for OmegaT... except that in OmegaT, the notion of translation memory plugin does not exist yet. That is why OmegaT is not yet compatible with Cyclotis: you have to use either DGT-OmegaT or a patched OmegaT.

As usual, when we propose a new API in OmegaT, we provide at least two implementations. Cyclotis is one of them. The second is included directly in the patches and does not need a server; this is the one we discuss here.

Text memory translation provider

To illustrate this new possibility for OmegaT to have real-time writable translation memories, and to make it possible to test them without setting up a server, we included in our patches a translation provider which reads and writes the tab-separated values (TSV) text files used by the commercial CAT tool Wordfast and, later, by its open source clone Anaphraseus. Both tools have the particularity of being integrated as macros in a word processor (Microsoft Word for Wordfast, LibreOffice for Anaphraseus) and of saving their translation memories in a text file which is updated in real time: each time you validate a segment, it is saved in the document itself and, at the same time, in the TSV file.

To do this with OmegaT (well, with one of the patched versions mentioned earlier), create a file in the "tm" folder of your project. The file can have any name with a ".properties" extension and should contain something like this:

class=org.omegat.core.matching.external.TextFileMemory
file=/path/to/file.txt
# Optional : what will appear in matches pane
name=Example memory

If the TSV file already exists, it will be used as long as it is in the correct TSV format. If it does not exist, it will be created when you load the project.

Warning: do not put the TSV file in the tm folder of your project! Since OmegaT updates the file on each segment, it would reload it again and again at each change. Only the properties file must be in the tm folder; the TSV file must be somewhere OmegaT can find it and has the right to write. It can be inside the project (in the root, in the omegat folder, wherever you want) but not in the tm folder.
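
If you generate the properties file from a script, a small check like the following can catch this mistake before the project is loaded. It is a minimal sketch and not part of the patches: the function name is made up, and it only assumes that the project's translation memory folder is called "tm" directly under the project root.

```python
from pathlib import Path


def tsv_location_is_safe(project_root: str, tsv_file: str) -> bool:
    """Return False if tsv_file lies inside <project_root>/tm, at any depth."""
    tm_dir = Path(project_root, "tm").resolve()
    tsv = Path(tsv_file).resolve()
    # Path.parents lists every ancestor directory of the file.
    return tm_dir not in tsv.parents
```

For example, a TSV file placed in the omegat folder of the project passes the check, while one placed in tm or in any of its subfolders does not.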

Addition, February 2023: keep old versions of a segment

Even if you have no interest in Wordfast or Anaphraseus, we realized that this plugin has another potential use.

Unlike with project_save.tmx, when you save a segment to a TSV file, it is added at the end of the file. This is an advantage of the TSV format over TMX: since the file does not need to remain well-formed XML, you can add lines at the end without corrupting it, which is essential when you want to update in real time. To do the same with a TMX you would have to rewrite the whole file on each update (that is probably one of the reasons why project_save.tmx is only saved every 3 minutes).
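
To make the append-only idea concrete, here is a minimal sketch. It is not the code of the plugin, and the three-column layout (timestamp, source, target) is a simplification chosen for the example, not the actual Wordfast column layout; the function name and the timestamp format are assumptions as well.

```python
from datetime import datetime


def append_segment(tsv_path, source, target, now=None):
    """Append one validated segment as a new line; existing lines are untouched."""
    stamp = (now or datetime.now()).strftime("%Y%m%d~%H%M%S")
    # Tabs and line breaks inside the text would break the line-based layout,
    # so this sketch replaces them with spaces.
    clean = [s.replace("\t", " ").replace("\n", " ") for s in (source, target)]
    # Mode "a" writes at the end of the file and creates it if it is missing.
    with open(tsv_path, "a", encoding="utf-8") as f:
        f.write("\t".join([stamp, *clean]) + "\n")
```

Each call costs one short write, whatever the size of the memory, which is what makes updating on every validated segment affordable.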

This also means that if you modify the same segment several times, each new value is added to the file, while in project_save.tmx it would be replaced. Today I saw a discussion on OmegaT's user list where someone asked whether it would be possible to save older versions of a segment. I immediately thought that I had already tried to do something like this in Cyclotis... and now I realize that the TSV file is also an answer to this problem, without requiring a server.
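
Since every version stays in the file, retrieving the history of a segment is just a matter of filtering lines. The sketch below assumes a simplified, hypothetical layout of three tab-separated columns (timestamp, source, target), not the real Wordfast columns, and the function name is made up for the example.

```python
def segment_history(lines, source):
    """Return every target saved for `source`, oldest first."""
    versions = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip anything that does not match the assumed three-column layout.
        if len(fields) == 3 and fields[1] == source:
            versions.append(fields[2])
    return versions
```

You can pass it an open file object directly, since iterating over a text file yields its lines; the last element of the result is the most recent translation.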

That said, it is only a partial answer to the problem. This TSV format is that of another CAT tool, and for this reason it never saves the context (previous/next segment, key, path...). So in the best case it will act as if all translations were default ones. It is certainly possible to create a similar format which saves the context, but I leave that for the future. Simply keep it in mind during your tests, and if you think it is useful, we can talk about it later.