More information on Transformer Language

The transformer language allows for complex data transformations to be defined in the mapping files by looking up values in the row and applying mathematical operations to them.

Primitive Types

The transformer language supports the following primitive types:

  • booleans - Either True or False, which resolve to a true or false result respectively.

  • null - Null is used to specify that a result has no value but has been set by the transformation (this is treated as None by the Python code in the background).

  • integers - Any number without a decimal point. It can be positive or negative. Valid integers include 123 and -45.

  • floats - Any number with a decimal point or written in e-notation. It can be positive or negative. Valid floats include 12.3, -45.6 and 7.8e9.

  • strings - Any characters within single quotes ('). If a single quote character is needed in the string it must be escaped using a backtick (`). A backtick that is needed in the string must itself be escaped with a backtick. For example, the string must escape ' and ` would be represented as 'must escape `' and ``'. Valid strings include 'foo bar', '' and 'baz`'s bar'.

  • regular expressions - Regular expressions are strings that are preceded by re (or ire for case-insensitive regular expressions). For example, re'.*' will create a regular expression that matches any sequence of characters. The regex syntax follows the same rules as Python regular expressions, which are described in the documentation for Python's re module.
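
The backtick escaping rule for strings can be illustrated with a short Python sketch. The helper name decode_string_literal is hypothetical and is not part of the transformer language. The sketch also assumes that a backtick may only be followed by a single quote or another backtick, which the description above does not state explicitly. The two compiled patterns at the end show rough Python counterparts of the literals re'.*' and ire'foo'.

```python
import re


def decode_string_literal(literal: str) -> str:
    """Decode a single-quoted literal that uses backtick escapes.

    Hypothetical helper: an illustration of the escaping rule only.
    """
    if len(literal) < 2 or not (literal.startswith("'") and literal.endswith("'")):
        raise ValueError("a string literal must be wrapped in single quotes")
    body = literal[1:-1]
    decoded = []
    i = 0
    while i < len(body):
        char = body[i]
        if char == "`":
            # Assumption: a backtick may only be followed by ' or `.
            if i + 1 >= len(body) or body[i + 1] not in "'`":
                raise ValueError("a backtick must be followed by ' or `")
            decoded.append(body[i + 1])
            i += 2
        elif char == "'":
            raise ValueError("an unescaped single quote ends the string early")
        else:
            decoded.append(char)
            i += 1
    return "".join(decoded)


# Rough Python counterparts of the literals re'.*' and ire'foo'.
match_anything = re.compile(".*")
ignore_case_foo = re.compile("foo", re.IGNORECASE)

print(decode_string_literal("'baz`'s bar'"))             # baz's bar
print(decode_string_literal("'must escape `' and ``'"))  # must escape ' and `
```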

Column Lookup

To act on columns you must look up their value from the row. This can be done in 2 ways:

  1. If the name of the column only contains alphanumeric characters and underscores (_) then simply using the name of the column will get the lookup value. The name must start with an uppercase or lowercase letter. Valid column names include firstCol and column_2. column 3 would be invalid as it contains a space and 4thColumn would be invalid as it doesn’t start with a letter.

  2. If your column's name is not a valid name then you can use the lookup function, which allows you to look up any valid string. For example, lookup('column 3') and lookup('4thColumn') are both valid lookups.
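
As a rough sketch of the two rules above, the Python helper below decides whether a column name can be written directly or needs the lookup function. The name column_reference and the ASCII-only reading of "alphanumeric" are assumptions made for illustration. The helper also escapes quotes and backticks using the string rules described under Primitive Types.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
BARE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def column_reference(name: str) -> str:
    """Return transformer-language text that looks up the given column.

    Hypothetical helper, not part of the transformer language itself.
    """
    if BARE_NAME.fullmatch(name):
        # Rule 1: a valid name can be used directly.
        return name
    # Rule 2: anything else goes through lookup('...').
    # Escape backticks first so the escapes added for quotes are not doubled.
    escaped = name.replace("`", "``").replace("'", "`'")
    return f"lookup('{escaped}')"


print(column_reference("firstCol"))   # firstCol
print(column_reference("column 3"))   # lookup('column 3')
print(column_reference("4thColumn"))  # lookup('4thColumn')
```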

Operations

The transformer language supports combining columns with other columns or primitive values by using mathematical operators (+-*/). Operations are performed in the expected order: multiplication and division are applied before addition and subtraction unless operations are surrounded by brackets. For example, 1 + foo * 2 will multiply column foo by 2 then add 1 to the result, whereas (1 + foo) * 2 will add 1 to column foo and then multiply the result by 2.
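
The same precedence can be seen in plain Python, which applies these four operators in the same order. In this sketch the row is modelled as a dictionary and the value 5 for column foo is made up for illustration.

```python
# A row modelled as a dictionary; the value of foo is illustrative only.
row = {"foo": 5}

# 1 + foo * 2: the multiplication is applied first.
without_brackets = 1 + row["foo"] * 2

# (1 + foo) * 2: the brackets force the addition to happen first.
with_brackets = (1 + row["foo"]) * 2

print(without_brackets)  # 11
print(with_brackets)     # 12
```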

Groupings

A grouping takes a group of values and performs a check against them. Each expression is made from the grouping type, a list and a boolean check, e.g. <group> [1, 2, 3] is 4.

  • all - Checks whether all values pass the check. For example, all [a, b, c] is 4 will return True if columns a, b and c are all 4.

  • any - Checks whether any value passes the check. For example, any [a, b, c] is 4 will return True if at least one of columns a, b or c is 4.
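
The two groupings behave much like Python's built-in all and any functions. The sketch below models the row as a dictionary with made-up values and treats the is check as an equality comparison, which is an assumption about how the check is evaluated.

```python
# A row modelled as a dictionary; the values are illustrative only.
row = {"a": 4, "b": 4, "c": 7}
values = [row["a"], row["b"], row["c"]]

# all [a, b, c] is 4: every value must pass the check.
all_are_4 = all(value == 4 for value in values)

# any [a, b, c] is 4: at least one value must pass the check.
any_is_4 = any(value == 4 for value in values)

print(all_are_4)  # False, because c is 7
print(any_is_4)   # True, because a and b are 4
```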

String Manipulations

A set of functions is supplied for manipulating strings:

  • join(separator, string1, string2, strings...) - joins all the provided strings, separated by the separator string. If any non-strings are passed into the function they will be converted into strings first.

  • replace(target, pattern, repl, [pattern, repl]...) - replaces pattern in target with repl. Multiple pattern and repl pairs can be supplied and each will be applied in the order given. pattern can be either a regular expression or a string. If it’s a regular expression, groups can be used and they will be available for use in repl.

  • match(target, pattern) - checks if the target string matches the pattern. If they match the result is True, otherwise False. pattern may be either a string or a regular expression. If a regular expression is used it must match the full string. If a string is used it is the same as value is <pattern>.

  • search(target, pattern) - checks if the pattern is in the target. If it is, the result is True, otherwise False. pattern may be either a string or a regular expression.
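
The behaviour described for these four functions can be approximated in Python as follows. This is a minimal sketch of the semantics above, not the actual implementation. It assumes that regular expressions arrive as compiled re.Pattern objects and that group references in repl use Python's \1 syntax.

```python
import re


def join(separator, *values):
    # Non-string values are converted to strings before joining.
    return str(separator).join(str(value) for value in values)


def replace(target, *pairs):
    # pairs is pattern, repl, pattern, repl, ... applied in the order given.
    if len(pairs) % 2 != 0:
        raise ValueError("every pattern needs a matching repl")
    result = str(target)
    for pattern, repl in zip(pairs[0::2], pairs[1::2]):
        if isinstance(pattern, re.Pattern):
            result = pattern.sub(repl, result)  # groups are usable in repl
        else:
            result = result.replace(pattern, repl)
    return result


def match(target, pattern):
    if isinstance(pattern, re.Pattern):
        # A regular expression must match the full string.
        return pattern.fullmatch(target) is not None
    return target == pattern


def search(target, pattern):
    if isinstance(pattern, re.Pattern):
        return pattern.search(target) is not None
    return pattern in target


print(join("-", "a", 1, 2.5))                     # a-1-2.5
print(replace("foo bar", "foo", "baz", "bar", "qux"))  # baz qux
print(match("abc123", re.compile(r"[a-z]+")))     # False
print(search("abc123", re.compile(r"\d+")))       # True
```

The difference between match and search shows in the last two calls: the pattern [a-z]+ is found inside abc123 but does not cover the whole string, so match returns False where search would return True.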