Culter is a library that implements several segmentation algorithms:
- A very simple segmenter that splits on the usual sentence-ending punctuation (., ! or ?) but supports no rules
- An implementation of the SRX format
- The Culter Segmentation Compatible Format
- The Culter Segmentation Extended Format
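To illustrate what the rule-less segmenter in the first item does, here is a minimal sketch in plain Ruby. It is not Culter's API: the method name `simple_segment` is hypothetical, and the assumption that a break happens only when the punctuation is followed by whitespace is ours.

```ruby
# Hypothetical helper for illustration only; not part of the Culter library.
# Splits a paragraph after '.', '!' or '?' when followed by whitespace.
def simple_segment(paragraph)
  # The lookbehind keeps the punctuation with the preceding segment;
  # the whitespace between segments is discarded.
  paragraph.strip.split(/(?<=[.!?])\s+/)
end

p simple_segment("Hello world. How are you? Fine!")
# → ["Hello world.", "How are you?", "Fine!"]

# Without rules, abbreviations are split too:
p simple_segment("Dr. Smith arrived.")
# → ["Dr.", "Smith arrived."]
```

The second call shows why rule-based formats such as SRX exist: an exception rule is needed to avoid breaking after "Dr.".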
The package contains the libraries, written in Ruby, which you can use in your own projects under the terms of the EUPL. It also includes some small programs:
- culter : a small script that acts as a filter between standard input and standard output, treating each line as a paragraph;
this script has a verbose mode that reports which rule is applied at each position, which helps explain why an SRX or CS[CE] file gives unexpected results;
- culter-conv : a script that converts between SRX and CSCX. As in the library, SRX 1 and 2 are supported, and the script can also "uncascade" a file, which is useful for checking whether the rules are applied in the order you expect.
- ensis : a GUI for testing segmentation rules.
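The filter behaviour described for the culter script (one paragraph per input line, segments written to the output) can be sketched as follows. This is not the actual script: `filter_paragraphs` is a hypothetical name, the punctuation-based split stands in for a real Culter segmenter, and writing one segment per output line is an assumption.

```ruby
require "stringio"

# Hypothetical sketch of a line-oriented filter; not the real culter script.
def filter_paragraphs(input, output)
  input.each_line do |line|
    paragraph = line.chomp
    next if paragraph.empty?
    # Placeholder segmenter: split after '.', '!' or '?' followed by whitespace.
    segments = paragraph.split(/(?<=[.!?])\s+/)
    # Assumption: each segment is written on its own line.
    segments.each { |segment| output.puts(segment) }
  end
end

# Demonstration with in-memory streams instead of the real stdin/stdout.
demo_out = StringIO.new
filter_paragraphs(StringIO.new("One. Two!\nThree?\n"), demo_out)
print demo_out.string
# prints:
# One.
# Two!
# Three?
```

To use it as a real filter, one would call `filter_paragraphs($stdin, $stdout)`.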
About the license: The Ruby code is under EUPL 1.1, like most of our programs. The schemas are under the Creative Commons Attribution-NoDerivatives license: feel free to make your own implementation of our formats, but if you plan to improve the schemas, please discuss it with us first.