ODTF¶
Data transformation tool in ODS-tools
On this page:¶
Introduction¶
The transform command can be used to convert data from one format to another (e.g., from the AIR Cede format to OED). It will convert from a specific version of the source format to a specific version of the target format, for example from AIR Cede 10.0.0 to OED 3.0.2.
As of ODS Tools 3.2.3, we support conversions between AIR Cede and OED formats.
Inputs¶
For a transformation to run, the necessary input files should be present. The input files you will need to run a transformation are:
Configuration file
See, e.g., this example configuration file:
transformations:
loc: # Transformation name
input_format:
name: Cede_Location
version: 10.0.0
output_format:
name: OED_Location
version: 3.0.2
runner:
batch_size: 150000 # Number of rows to process in a single batch
extractor:
options:
path: ./cede_location_1000.csv # Path to the input file
quoting: minimal
loader:
options:
path: ./oed_location_1000.csv # Path to the output file
quoting: minimal
The configuration file contains a list of transformations to run (currently loc for location and acc for account data). Each transformation includes name and version of the input and output formats, the (optional) batch size, and the paths to the input (extractor) and output (loader) files.
To connect to a database, the extractor should include the database connection details. For example, see this example configuration file:
transformations:
loc: # Transformation name
input_format:
name: Cede_Location
version: 10.0.0
output_format:
name: OED_Location
version: 3.0.2
runner:
batch_size: 150000 # Number of rows to process in a single batch
extractor:
type: mssql # other options are 'postgres' and 'sqlite'. Assumes a file if not specified
options:
host: localhost
database: AIRExposure_CEDE
port: 1433
user: user
password: password
sql_statement: ./sql/cede_location.sql # Path to the SQL file
loader:
options:
path: ./oed_location_1000.csv # Path to the output file
quoting: minimal
Input data
The input data should be in the format that you want to transform from. For example, if you want to transform data from AIR Cede to OED, the input data should be in the AIR Cede format. File types supported: .csv
Database connections supported: mssql postgres sqlite
SQL statement
If the input data is in a database, the extractor should include the path to an SQL file containing the query to extract (and, if necessary, rename) the data. For example, see this example SQL file.
Mapping file
A mapping file is a file in .yaml format that describes how to run a conversion between the source and target formats and vice versa. Multiple mapping files can be used together to define a mapping between a source and destination format that do not appear in the same mapping file. I.e., A mapping file for model A to B and and a mapping file for Model B to C, can be used to transform data directly from A to C.
Transformations can copy one field into another, substitute field values using a replace function, or include conditional transformation using a where clause. For columns that can contain multiple values (the LocPerils column in AIR Cede which could contain, for example “CF, CH, EQ”), the replace_multiple allows to input a separator used in the cell to split the values. Only the transformations involving columns present in the input file will be run.
For example, see the Cede-OED mapping file
Simplified transformation¶
A simplified version can be used to transform data without the need for a configuration file. The simplified version can be run directly from the command line.
For example, to transform data from AIR Cede to OED, the following command can be used:
ods_tools transform --format air-oed --input-file input.csv --output-file output.csv
The command will run the transformation using the default configuration for the specified formats (to select a specific version, a specific batch size for large transformations, or a database connection a configuration file will be necessary). Currently supported formats are: air-oed, oed-air.
Usage¶
Command line usage..
ods_tools transform [-h] –config-file CONFIG_FILE [-v LOGGING_LEVEL] [–nocheck NOCHECK]
Transform data format to/from OED.
- options:
- -h, --help
show this help message and exit
- --config-file CONFIG_FILE
Path to the config file
- -v LOGGING_LEVEL, --logging-level LOGGING_LEVEL
logging level (debug:10, info:20, warning:30, error:40, critical:50)
- --nocheck NOCHECK
if True, OED file will not be checked after transformation
Validation¶
Validation is performed after a conversion to make sure that the output file is valid with respect to specific rules. A data validation file contains comparisons of various metrics in both the input file and output file. For example, the sum of Total Insured Value grouped by Occupancy Type and Currency. The fields and operations are defined by the user in the validation definition files. The validation definition file is in .yaml format. See, for example, this example validation file.
Note
The ODTF and the transform command are adapted from the Open Data Transformation Framework.