    This perl script creates database or tables for Cyclotis database.

INSTALLATION REQUIREMENTS
    This is not a cgi/module : it must be run in command line (or integrated
    in another tool, such as the Web creation HTML interface and web
    services). You can use it in any machine with Perl, DBI and DBD::Pg
    installed, even if the PostgreSQL server is installed elsewhere (but in
    this case you must check that the PostgreSQL server gives creation
    rights to users connected via network sockets).

USAGE
    The general format for the command line is "perl dbcreator.pl [command]
    [database-name] [other parameters]"

  MODES
    First, you must decide in which mode the database will work.

    The mode decides which field Postgres may add to every line in order to
    make the distinction between tables.

    To do this, the best solution is to define CYCLOTIS_MODE environment
    variable. You can also give the mode as first parameter of dbcreator.pl,
    before command name. However, do not forget that this must be decided
    before you create the first table, and using dbcreator.pl on an already
    created database with a different mode may cause errors.

    Existing modes:

    by_void
     Tables contain no discriminant at all, so takes less space on disk.
     This mode is ideal for direct access with your favorite plugin (since
     in this mode each connected client always access to the same table)
     unless you make use of inheritance and want to know, when you retreive
     a result from a parent table, from which child table it comes from.

    by_id::noRules
     Tables contain an integer as discriminant. All lines contain this
     value, so it takes more space on disk.

     This mode is ideal if you make direct connections but with use of
     inheritance : when you do a query on a parent table, data coming from
     child tables are identified by their ID and joining with meta_info you
     can retreive all info such as child table name.

    by_id::withRules
     Tables contain an integer as discriminant. All lines contain this
     value, so it takes more space on disk.

     This mode is ideal if you use the REST server, with a lot of tables:
     thanks to the id, the server can minimize the number of different SQL
     queries it has in RAM. Inheritance with personal tables is supported
     but implies lot of work at database side, so it is not ideal. Note that
     you must use Postgresql 9.2 or later to use this.

    by_path:noLangs
     Tables contain a string as discriminant. All lines contain this value,
     so it takes more space on disk.

     This mode is ideal if you use the REST server, with a frequent use of
     personal table inheritance: when querying a parent table, the path is
     used to make the distinction between entries coming from different
     children, even if you call it from the REST service.

     Path contains only qualified table names.

    by_path:withLangs
     Tables contain a string as discriminant. All lines contain this value,
     so it takes more space on disk.

     Compared to "by_path::noLangs", it adds also languages in the path.
     Useful only if you use inheritance by language (for example table for
     en-fr inherits from en-all)

    by_code
     Tables contain a 64-bit integer as discriminant. You can see it as an
     intermediate between "by_id::withRules" (which takes less space but has
     poor support for inheritance of personal tables) and "by_path" (which
     supports inheritance of personal tables but takes lot of space in the
     disk)

     Note: by_code is an abstract mode, you have to write a subclass
     depending on the tree you want to create. The two following modes are
     samples of by_code subclasses.

    by_code::theme
     In this mode, 3 themes are defined : agriculture, transport and
     technology. Each memory inherits from one of these themes. Each theme
     can contain up to 1 billion memories.

     You can change the list of themes by looking in the code and changing
     @THEMES array. But do it *before* any table creation.

    by_code::lang
     In this mode, the tree is built upon source and target language : any
     new table inherits from MEM_%SRC_%TRA where %SRC and %TRA are replaced
     by source and target language.

     Limited to 5 languages and 1 billion tables per language pair, but you
     can easily change this inside the code (remember that this mode is only
     an example).

     Note : this mode will die if you did not specify source or target
     language.

  Database name
    By default the name you type is considered as the catalog name, and we
    suppose that server is on localhost, port 5432, connecting as current
    user without password.

    If it is no true, you can use the following format:

    *
     a host, separated by '/' (for example, h/base means 'dbname = base',
     host = 'h')

    *
     a port, separated by ':'

     For example:

     ':5433/base' means 'dbname = 'base', port=5433;

     'h:5433/base' means 'dbname = base', host = 'h', port=5433

    *
     a user, separated by @

     For example

     me@base means 'user = me, dbname = base'

     me@h/base means 'user = me, dbname = base, host = h'

    *
     user password, separated from user by :

     me:123@base means 'user = me, password = 123, dbname = base'

    *
     owner of object (database catalog, schema, table) after creation

     me:123#you@base means 'user = me, password = 123, owner = you dbname =
     base'

     me#you@base means 'user = me, owner = you, dbname = base'

     (note: in this case, user and password are used for connection; you do
     not need to know owner's password, if you have creation rights)

  COMMANDS
   perl dbcreator.pl help [command]
    Display help for given command using pod2txt

   perl dbcreator.pl create-db [database name]
    Creates an empty database with only procedures loaded

   perl dbcreator.pl init-db [database name]
    Creates minimal contents for a database. Identical to create-db except
    that the database catalog must already exist and be empty

   perl dbcreator.pl create-schema [database] [schema-name]
    Creates a Postgresql schema (i.e. namespace for tables)

    Since any DBI pool (dancer's, Apache's, etc.) use one connection per
    database, even in the same server, it is more efficient to have only one
    database with all tables accessed by the same server via HTTP (the
    situation is different if you use direct Postgresql access).

    On the other hand, all are accessed by the same Postgresql user, so it
    makes impossible to define privileges so they must be defined at HTTP
    level.

   perl dbcreator.pl create-table [database] [table] [parents]
    Creates the table, and its parents if not yet created.

    [database] uses the same format as previous command.

    Values for parent tables (you can inherit from two tables, separated by
    ','):

    MEM
     (standard, without context or modification data) : by default.

    MEMUP
     (memory with modification author/date) : may be associated to a rule to
     keep one version of a segment

    MEMX
     (memory with properties and note): stores same content as a TMX

    MEMID
     (memory with context): stores an information to distinguish segments
     Context is not an identifier: the identifier is (src,context)

    GLOS
     (with context and note, but no author/date): contains a glossary, for
     terminology

    or
     any already created table

    In case you use MEMX or MEMID, there is also the possibility to
    configure type for properties or context column. The default behaviour
    is to inherit from standard tables, so to keep their type if they
    already exist. If the tables do not exist, the script will use the value
    of environment variables CYCLOTIS_PROPS_TYPE or CYCLOTIS_CONTEXT_TYPE,
    or *text* if the variables are not defined. Note that once the tables
    exist, environment variables are never used again.

    You have also the possibility to add one of the two above parameters :

    props=(json|hstore|text)
     Use another type for properties. Possible values are:

     text (default)
       Properties are stored in a TEXT column. Format is similar to URL.
       This format preserves order and supports multiple properties.

     hstore
       Uses Postgresql's type HSTORE. Loads the extension in the database,
       it not yet existing. This binary format has better performances than
       pure text, but does not preserve order, and does support only one
       value per property.

     json
       Uses Postgresql's tpe JSON. Requires Postgresql 9.2 or more. This
       format preserves order and supports multiple properties (this is
       non-strict JSON value). It is a text format, but more readable than
       the first type.

    tool_info=(json|hstore|text|null)
     Adds a column which will contain data specific to CAT clients. Format
     is the same as for properties, except that default value is null,
     meaning that we do not want to create this column at all.

    context=(text|char\d|varchar\d|int)
     Type for the context field.

     For context, you can use "char" or "varchar", with a number of
     characters. Be sure your client uses the correct size.

     You can also use an int, in which case the context will be reduced to
     its Java hash code.

     Structured types (json or hstore) are not usable because Postgresql
     does not implement "=" operator for them.

    In this case, if the standard tables already exist, and if almost one
    column type does not match parameters, new tables with names suffixed by
    the first letter of type names will be created.

    You can also specify source and target languages, which will be stored
    in meta_info table.

   perl dbcreator.pl create-rule [database] [table] [keyfield] [action] [props|note=...]+
    Creates a rule associated to the given table, saying in which situation
    we consider a segment as duplicate which should be treated as an update
    (instead of an insertion).

    [database] uses the same format as previous command. Table name can
    contain a schema prefix, else we use 'public'.

    Possible keyfields:

    src
     keeps only one version for each source text (i.e. if anybody inserts a
     segment with same source, it will replace the previous one)

    author
     keeps only one version for each source+author pair (i.e. if the same
     author inserts a segment with same source, it will replace the previous
     one)

    context
     keeps only one version for each source+context pair (i.e. if anybody
     inserts a segment with same source and same context, it will replace
     the previous one)

    Possible actions:

    replace-only
     A new line with same value for key fields will replace the old one.
     Deletions are physically applied. Default value.

    replace-null
     A new line with same value for key fields will replace the old one.
     Deletions are equivalent to setting translation to null.

     This option is useful for applications which use 'all' query to build a
     cache (like OmegaT in Project mode) as it enables to retreive deleted
     segments as well.

    version
     Keep both versions of the segment (old and new one), with changedate as
     a discriminant. Deletions are equivalent to affect a changedate without
     creating a new version.

    Next parameters indicate what to do with properties or notes in case the
    rule implies to update a line instead of insertion.

    Default is to replace the properties/notes as well, so what the user
    will read is the very last inserted properties and notes. Alternatives
    include:

    concat
     Only if the column is of type TEXT or VARCHAR, the new value is the
     concatenation of old and new value.

    merge
     Only if the value is of type HSTORE, the new value is a key-per-key
     replacement. Values which are not in the new entry are kept as is.

    keep
     Do not replace the value, keep the old one.

   perl dbcreator.pl create-function [database] [function-name]
    Creates one of the standard Cyclotis SQL functions. As they are not
    necessary all the time they are not created by default.

    Values for parent table:

    tokenize
     cut string by words, and transform them to stems (without grammatical
     suffixes)

    score
     alternative score based on levenshtein distance

   perl dbcreator.pl create-index [database] [table] [idxname] [lang]
    Creates an index for the given table

    Loads GIST extension if necessary

    Values for idxname:

    fuzzy
     trigram GIST index in the source

    concSrc
     stemming index of the source

    concTra
     stemming index of the translation

    operations indicate if you want to index not the field itself but a
    calculated value. Possibilities include (concatenable using ',')

    untag
     Removes tags in text; next parameter can indicate which regular
     expression you consider as a tag; defaults to <.+?>

    tokenize
     Postgresql will replace string by succession of stems (terms without
     grammatical suffixes)

    lang parameter used only for concordance indexes: it contains language
    name in english; by default the value is 'simple'.

   perl dbcreator.pl create-dict [database] [langName] [dict-file]? [affix-file]? [stop-words]?
    Loads an ispell dictionnary

    The files must already be in tsearch_data directory from Postgres.

   perl dbcreator.pl create-text-search-config [database] [langName] [origin] [token=dictionnaries]+
    Create a text search configuration, usable in concordance queries

   perl dbcreator.pl exec-script [database] [script file]
    The file is a succession of commands, similar to the previous ones,
    except that we do not repeat database name. create-db is not scriptable.

    Sample script:

        create-schema mySchema
        create-table mySchema.table1 memx
        create-rule mySchema.table1 author replace-only
        create-index mySchema.table1 date
        create-index mySchema.table1 concTra fr
        create-table mySchema.table2 memup

License
    Copyright 2013-2015 Silvestris Project (http://www.silvestris-lab.org/)

    Licensed under the EUPL, Version 1.1 or – as soon they will be approved
    by the European Commission - subsequent versions of the EUPL (the
    "Licence"); You may not use this work except in compliance with the
    Licence. You may obtain a copy of the Licence at:
    http://ec.europa.eu/idabc/eupl

    Unless required by applicable law or agreed to in writing, software
    distributed under the Licence is distributed on an "AS IS" basis,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the Licence for the specific language governing permissions and
    limitations under the Licence.

