Frequently asked questions
- General questions
- Cyclotis
- Other tools
- Why are other tools written in the Ruby language?
- Are these tools really ready for use?
- Why do you separate short and long term databases?
- Why do you separate local and distant databases?
- What is the difference between Exilis and OmegaT Lucene indexes?
- Can Exilis or Elefas store glossaries instead of translation memories?
General questions
What is it?
Silvestris is a project of free/libre/open source libraries and web applications related to computer-assisted translation (CAT) and translation memories (TM). It is not a full CAT interface but a set of libraries and applications which can be used together or independently. These tools can also be used together with some well-known CAT tool interfaces.
Is it really free/libre/open source?
We want to use the European Union Public License (EUPL) as much as possible, currently version 1.1 (read also our comments about EUPL 1.2). Unfortunately, the EUPL has some incompatibilities with other licenses, and of course with those of commercial software, which we do not want to exclude from the available clients. For this reason some of our projects use other licenses and may be closed source because they are linked to non-public APIs. But they remain available free of charge.
You may consider that our project is not totally free of charge since there is a paid hosting service. But this service is absolutely not mandatory: if you can install a server yourself, you do not need to use it. If, on the contrary, it is a real help to you, don't forget that all of this has a cost and that this service is our only means of financing it.
What do Silvestris, Cyclotis and the other project names mean?
Silvestris, in Latin, is an adjective related to wood (silva). It can also mean wild or savage. Originally the plan was to create a free web CAT tool, which could be understood as a wild CAT, so we made the reference to silvestris felis, the wild version of your familiar cat. When we finally decided to go step by step, creating several smaller tools, we kept the word Silvestris for the global project; for the subprojects, we use Latin names from the official international animal taxonomy. For example, Cyclotis refers to the African forest elephant, which is not the best-known species of elephant but seemed to reflect correctly the spirit of a short-term translation memory.
How can I contact you to participate?
First, you can open an account on this web site and participate in the forum. We are well aware that this was not possible until very recently: the web site received a lot of spammer accounts (more than 100 a day!), so we had to temporarily switch off registration until we could find a solution. This temporary situation ended up lasting a long time. We have now installed an anti-spam plugin for Drupal, which reduces the problem but does not completely remove it. As a consequence we have adopted the following rule: please register only when you plan to use the forum, because if you register and post nothing, your account will be included in the list of spamming accounts, which are regularly deleted.
For technical contributions, you may also register on the forge, which has never been closed for registration (it is not affected by the spamming problem). However, this forge does not always send e-mail notifications, so if you receive no answer, consider using the forum as well.
Is there a link with the European Commission, and with DGT-OmegaT?
The two projects are maintained by the same person, but not under the same conditions. DGT-OmegaT is a project from the European Commission, containing what they decide to put in, while Silvestris is a personal project, designed at home. The fact that Silvestris uses the EUPL license has nothing to do with the link between the two projects: this is only the license I prefer.
Another link between the two projects is that what they call TeamBase is nothing more than a customized version of Silvestris Cyclotis. Except for the name, most of what you can read on the DGT-OmegaT web site about TeamBase is perfectly valid for Cyclotis. So, instead of repeating the same things in two locations, we invite you to read both web sites. Also, DGT-OmegaT already includes the Cyclotis plugin, while OmegaT would need some adaptations (but the code for them is already available). It is probably a good idea to use DGT-OmegaT to test Cyclotis, before we open an RFE for OmegaT to have the patches included in the original version as well.
Cyclotis
Why not keep Cyclotis memories once the project is finished?
The Cyclotis database is optimized to reply fast, for reading as well as for writing, but on the condition that the table contents remain small. In our experience over years of use, performance starts to decrease seriously after around 10 000 segments and is quite unacceptable after around 40 000. If you don't delete your memories, you will certainly reuse them in other projects, they will continue to grow, and as a result you will later complain about performance. For long-term memories, we would still use PostgreSQL but with a different table schema organisation. And it would be better to use it in batch mode, that is, to send and receive a lot of segments in one step rather than trying to answer in real time.
Why not index Cyclotis memories?
Using the EXPLAIN command from PostgreSQL, we discovered that indexes were used for SELECT queries only if there were more than approximately 10 000 segments. However, they are maintained at all times, meaning that UPDATE or INSERT queries become slower while SELECT queries are not faster. That's one of the reasons why we think that Cyclotis should remain a short-term database (see previous answer): for a long-term database we would use different optimizations, and indexing is one of them, which is only efficient on big databases (in the case of PostgreSQL).
Another reason why it is difficult to use indexes for Cyclotis is that it makes heavy use of table inheritance. This kind of inheritance does not apply to indexes: you must put indexes on the child tables that really contain data, and if a table has a lot of children, the server cannot do better than exploring each table and possibly the index dedicated to this table. For this reason too, indexes are globally inefficient for Cyclotis, while for Elefas, which uses different schemas, things are really different.
Is it possible to use a server other than PostgreSQL?
Currently no, or it would need a lot of development.
Fuzzy matches inside Cyclotis make use of the PostgreSQL module "pg_trgm", for which I do not know any equivalent in MySQL or other open source servers. If you want to port this service to another server, either you find an equivalent or you will have to limit its power, for example by using exact searches or retrieving all results. Glossary search uses the module tsearch2 and is not portable for the same reason.
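To give an idea of what a port would have to reproduce, here is a minimal Python sketch of trigram similarity in the spirit of pg_trgm. It is an approximation written for illustration, not the code used by Cyclotis: the function names are hypothetical, and the 0.3 threshold is only the default documented for pg_trgm, not necessarily the value Cyclotis uses.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm: lowercase, split into alphanumeric words,
    pad each word with two leading spaces and one trailing space."""
    grams: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Number of shared trigrams divided by the number of distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def fuzzy_matches(query: str, segments: list[str], threshold: float = 0.3):
    """Return (score, segment) pairs at or above the threshold, best first.
    The threshold value is an assumption for this sketch."""
    scored = [(similarity(query, s), s) for s in segments]
    return sorted((p for p in scored if p[0] >= threshold), reverse=True)


memory = ["Open the files", "Close the window"]
matches = fuzzy_matches("Open the file", memory)
for score, segment in matches:
    print(f"{score:.2f}  {segment}")
```

A server without an equivalent of this operator (and of the index that supports it) would have to compute such scores on the client side after retrieving all candidate segments, which is the loss of power mentioned above.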
We did successfully run an OmegaT project mode using other databases, since this mode uses only exact search, which is standard SQL. So we know that, at least for this service, it is possible. But we do not provide a plugin for it, because it is only valid for this mode and we may add other optimisations in the future which could make portability impossible.
Can Cyclotis be integrated with other CAT tools (instead of only OmegaT and Trados)?
Maybe: probably yes for some tools and no for others.
Most computer-assisted translation tools are proprietary and don't offer a plugin API, especially for translation memories or direct access to the project or editor (while they usually do for machine translation engines). This is probably because most of them also promote their own server solution for translation memory. However, if you think that you are able to implement such a plugin for your favorite CAT tool, please tell us and we will be glad to give you all the necessary information about Cyclotis (like our server protocol) to make it possible. You can find here a short documentation but it is far from being complete.
Other tools
Why are other tools written in the Ruby language?
Ruby seems to be the most portable programming language in existence: not only is its original implementation ported to a lot of platforms, but it also has alternative implementations for the Java Virtual Machine (JRuby), for the .NET platform (IronRuby), for JavaScript (Opal)...
The purpose of these libraries is to be used in the most common CAT tools, and most CAT GUIs are written either in Java or in .NET: the idea is that this would make it possible to write plugins which use the Ruby classes rather than rewriting them in the target programming language. It may be slower, but it will ensure perfect compatibility between all platforms, even if they change over time.
Are these tools really ready for use?
Cyclotis has been used by my clients for more than five years; it is clearly mature even if it can still be improved.
As for the other tools, only Culter is used enough to be considered mature. Note that this is valid for the command line and the libraries, not for the GUI, which is currently in beta (using it to test segmentation rules seems to work, but it is not yet a complete editor). Other tools should be considered experimental for the moment: don't hesitate to report any problem, but please do not consider them ready for use in production.
Why do you separate short and long term databases?
When you create a database schema, you generally add indexes: this sounds perfectly natural. But in the case of Cyclotis, to our great surprise, we discovered that most often the SELECT queries did not use the indexes (they do not appear in the EXPLAIN output), even if they exist, until the database grows beyond a certain size. So, in this case, indexes have a cost during UPDATE without any gain during SELECT.
Most translation projects stay below the limit just mentioned. So we consider that a database which is created for one translation and destroyed immediately afterwards has no reason to be indexed. Unless we finally receive an explanation of why the indexes were not used...
Note: the possibility to index a Cyclotis memory is available, but not enabled by default. On the SaaS server, the rule is that memories are not indexed at creation, but they are automatically indexed when you renew a memory (because when you do so, we expect that the memory is now big enough to benefit from the indexes).
Another difference is that Cyclotis makes use of GiST indexes, while Elefas uses GIN indexes. According to the PostgreSQL documentation, GiST indexes are better for dynamic data while GIN indexes are recommended for more static data, and this is precisely the biggest difference between Cyclotis and Elefas!
Why do you separate local and distant databases?
Even with indexes, a client-server database can perform poorly if hundreds of users are connected at the same time. That is the reason why Elefas is mostly batch-oriented and has poor support for real-time searches.
Lucene indexes are accessed as files. You can put them on a shared drive (Windows share or NFS, for example) and then, even if hundreds of users are using them, performance will not decrease, because browsing the index is done by the user's computer, not by the computer hosting the files.
However, Lucene indexes also have limitations. First of all, they have poor support for deletions: deleted segments are simply marked as such, and must be eliminated programmatically during a search. More importantly, you can have hundreds of reading users, but writing implies a complete lock of the file. You cannot have multiple processes writing to the same file, even if they are run by different computers through a shared drive.
Even worse, every lock of a Lucene index has a cost: locking it once to insert 100 segments is better than locking and unlocking it for each segment individually. That is the reason why such indexes would not be suitable for a service like Cyclotis: they cannot be efficiently updated by two people at the same time. You could possibly write a server and put it in between... but then it would have no added value compared to the current Cyclotis.
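The cost of locking per segment can be pictured with a small Python sketch. It does not use Lucene at all: LockedIndex is a hypothetical toy class with a single exclusive write lock, and it only counts how many lock/unlock cycles each strategy needs for the same 100 segments.

```python
import threading


class LockedIndex:
    """Toy stand-in for a file-based index with one exclusive write lock.
    This is an illustration, not the Lucene API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.segments: list[str] = []
        self.lock_count = 0  # number of times the write lock was acquired

    def add_batch(self, batch: list[str]) -> None:
        # Every call pays for one lock/unlock cycle, whatever the batch size.
        with self._lock:
            self.lock_count += 1
            self.segments.extend(batch)


segments = [f"segment {i}" for i in range(100)]

# Real-time style: one write (and one lock) per segment.
one_by_one = LockedIndex()
for segment in segments:
    one_by_one.add_batch([segment])

# Batch style: all segments written under a single lock.
batched = LockedIndex()
batched.add_batch(segments)

print(one_by_one.lock_count, batched.lock_count)
```

Both indexes end up with the same contents, but the first strategy acquires the lock once per segment. With a real index on a shared drive, each of those cycles also blocks every other writer.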
For all these reasons, we settled on a compromise and advise the following: use Cyclotis for ongoing work, store the results in Elefas when it is finished, and when you want to re-use the data, extract them to TMX or to Exilis. Direct exchange between Elefas and Exilis is not yet ready, but is planned for the future.
What is the difference between Exilis and OmegaT Lucene indexes?
Both are based on Apache Lucene, but with a different organization of the data. By way of comparison, if they were SQL databases, they would use the same engine but have distinct schemas.
OmegaT indexes are currently strictly bilingual: the source is indexed, the target is stored. This presupposes that the source and target languages are those of the project, but the languages are not stored explicitly. Exilis indexes are TMX-based and can support multiple languages.
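The difference between the two organizations can be sketched with two small Python data classes. The class and field names are hypothetical and chosen for illustration only; they are not the actual field names used by OmegaT or Exilis.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BilingualEntry:
    """Bilingual layout: languages are implied by the project, not stored."""
    source: str  # searchable text
    target: str  # only stored, returned with a match


@dataclass
class MultilingualEntry:
    """TMX-like layout: one entry holds any number of language variants."""
    variants: dict[str, str] = field(default_factory=dict)  # language code -> text

    def pair(self, src_lang: str, tgt_lang: str) -> Optional[BilingualEntry]:
        """Project the entry onto one language pair, if both variants exist."""
        if src_lang in self.variants and tgt_lang in self.variants:
            return BilingualEntry(self.variants[src_lang], self.variants[tgt_lang])
        return None


entry = MultilingualEntry({
    "en": "Open the file",
    "fr": "Ouvrir le fichier",
    "de": "Datei öffnen",
})
print(entry.pair("en", "fr"))
print(entry.pair("en", "it"))
```

A multilingual entry can always be reduced to a bilingual one for a given pair, while the reverse is not possible without knowing the languages, which the bilingual layout does not record.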
OmegaT indexes use the library which is included in OmegaT: currently most versions (from 3.6 to at least 5.5) are based on Lucene 5.2, so a project indexed by OmegaT will be usable only by a tool based on Apache Lucene 5 or later. Exilis was started a long time before; for that reason it may offer multiple releases based on various versions of Apache Lucene. Keep this in mind when you choose which version of Exilis you want to use.
Currently OmegaT does not support Exilis, but we plan to upgrade the code so that it accepts any Lucene schema, provided that the configuration file contains the names of the internal indexes, meaning that Exilis will become only a special case of Lucene indexes.
Can Exilis or Elefas store glossaries instead of translation memories?
In theory this would be perfectly possible. It would only require a different kind of tokenization before indexing and searching. This is not yet implemented, but is being considered for the future.