Culter Segmentation Compatible Format

The Culter Segmentation Compatible Format (CSC) is a format for specifying segmentation rules.

"Compatible" means that a CSC file is fully convertible to SRX (Segmentation Rules eXchange) format, because it uses the same segmentation algorithm: you can use culter-conv to produce an SRX file from a CSC file. Features that cannot be converted to SRX are moved to another format, named CSE. Compared to SRX, CSC offers the following advantages:
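Since CSC shares the SRX algorithm, a minimal Python sketch of SRX-style rule application may help. It assumes the usual SRX model: each rule has a "before break" and an "after break" regular expression plus a break/no-break flag, rules are tried in order at every position, and the first matching rule decides. Python's `re` module stands in for the ICU-style regular expressions that SRX specifies, so some patterns may behave differently; this is an illustration, not culter's implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One SRX-style rule: break (or do not break) between two regex contexts."""
    is_break: bool
    before: str  # must match the text ending exactly at the candidate position
    after: str   # must match the text starting exactly at the candidate position

    def matches(self, text: str, pos: int) -> bool:
        # \Z anchors the "before" pattern to the candidate position itself.
        before_ok = re.search("(?:" + self.before + r")\Z", text[:pos]) is not None
        after_ok = re.match("(?:" + self.after + ")", text[pos:]) is not None
        return before_ok and after_ok


def segment(text: str, rules: list) -> list:
    """Split text; at each position the first matching rule decides."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for rule in rules:
            if rule.matches(text, pos):
                if rule.is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule wins, whether it breaks or not
    segments.append(text[start:])
    return segments


rules = [
    Rule(False, r"\bDr\.", r"\s"),  # exception: no break after "Dr."
    Rule(True, r"[.?!]+", r"\s"),   # general rule: break after final punctuation
]
print(segment("I met Dr. Smith. He was late.", rules))
```

Rule order carries the logic here: the no-break exception must precede the general break rule, otherwise the break rule would match first at "Dr." and split the sentence.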

  • Rule templates, which avoid writing many similar rules that differ in only one part;
  • Lists (such as abbreviations) can be kept in a separate plain-text file;
  • Rules can be split across more than one file:
    • lists (abbreviations, ordinal followers, etc.) can be read from a text file, with the CSC file containing only a reference to it;
    • a CSC file can import another CSC file and then add or replace rules.
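The template, list-file and import features can be pictured with a small Python sketch that expands them into plain SRX-style rules. The actual CSC syntax is not shown here: the `{item}` placeholder, the rule names and the override semantics (a local rule replaces an imported rule of the same name; other local rules are placed first so they take precedence under first-match-wins) are hypothetical assumptions chosen for illustration, not the CSC specification.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str       # hypothetical identifier, used here only for overriding
    is_break: bool
    before: str
    after: str


def load_list(text: str) -> list:
    """Read a plain-text list: one entry per line, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def expand_template(name, is_break, before_template, after, items):
    """Produce one rule per list entry by filling a hypothetical {item} slot.

    Entries are regex-escaped so that "e.g." matches literally; str.replace
    is used instead of str.format so regex braces like {2} stay intact.
    """
    return [
        Rule(f"{name}:{item}", is_break,
             before_template.replace("{item}", re.escape(item)), after)
        for item in items
    ]


def import_rules(base, local):
    """Hypothetical import: same-name local rules replace imported ones in
    place; remaining local rules are prepended so they are tried first."""
    replacements = {r.name: r for r in local}
    base_names = {r.name for r in base}
    added = [r for r in local if r.name not in base_names]
    return added + [replacements.get(r.name, r) for r in base]


# Contents of a hypothetical abbreviation list file.
abbreviations = load_list("Dr.\nMr.\n\ne.g.\n")
base = expand_template("abbrev", False, r"\b{item}", r"\s", abbreviations)
base.append(Rule("sentence-end", True, r"[.?!]+", r"\s"))

merged = import_rules(base, [
    Rule("sentence-end", True, r"[.?!]+", r"\s+[A-Z]"),  # replaces imported rule
    Rule("abbrev:etc.", False, r"\betc\.", r"\s"),       # new local exception
])
print([r.name for r in merged])
```

Expanding one rule per entry mirrors what a hand-written SRX file would contain, which keeps the output directly convertible; a single alternation rule would be an equally valid design.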

The format has two fully equivalent representations: CSCX, based on XML, and CSCY, based on YAML.

A sample file and a schema are included in the package. The program appears stable; however, the format should not be considered fully specified, as new features may be added later.

About the license: the Ruby code is under EUPL 1.1, like most of our programs. The schemas are under the Creative Commons Attribution-NoDerivatives license: feel free to write your own implementation of our formats, but if you plan to improve the schemas, please discuss it with us first.