converter.runner.pandas

Module Contents

Classes

PandasGroupWrapper

Base class for the pandas implementations of the any and all groups

PandasAnyWrapper

Pandas-specific implementation of the any expression

PandasAllWrapper

Pandas-specific implementation of the all expression

StrReplace

StrMatch

StrSearch

StrJoin

ConversionError

PandasRunner

Default implementation for a pandas-like runner

Functions

get_logger()

logical_and_transformer(row, lhs, rhs)

logical_or_transformer(row, lhs, rhs)

logical_not_transformer(row, value)

in_transformer(row, lhs, rhs)

not_in_transformer(row, lhs, rhs)

type_converter(to_type, nullable, null_values)

converter.runner.pandas.get_logger()
class converter.runner.pandas.PandasGroupWrapper(values)

Bases: converter.transformers.transform.GroupWrapper

Base class for the pandas implementations of the any and all groups

in_operator(self, x, y)

Checks whether the left-hand side of the operator is contained in the right-hand side.

Parameters
  • x – The left-hand side of the operator (lhs)

  • y – The right-hand side of the operator (rhs)

Returns

True if x is in y, False otherwise

not_in_operator(self, x, y)

Checks whether the left-hand side of the operator is not contained in the right-hand side.

Parameters
  • x – The left-hand side of the operator (lhs)

  • y – The right-hand side of the operator (rhs)

Returns

True if x is not in y, False otherwise
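
A minimal sketch of how a membership check like the two operators above could behave on pandas data. The free functions below are illustrative stand-ins, not the library's implementation, and the assumption that a Series is checked element-wise is ours.

```python
import pandas as pd


def in_operator(x, y):
    """Membership test; element-wise when x is a Series (assumed behaviour)."""
    if isinstance(x, pd.Series):
        return x.isin(list(y))
    return x in y


def not_in_operator(x, y):
    """Negation of in_operator, element-wise for a Series."""
    result = in_operator(x, y)
    if isinstance(result, pd.Series):
        return ~result
    return not result


colours = pd.Series(["red", "green", "blue"])
in_mask = in_operator(colours, ["red", "blue"])
not_in_mask = not_in_operator(colours, ["red", "blue"])
```

Here in_mask is a boolean Series marking the rows whose value appears in the list, and not_in_mask is its inverse.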

class converter.runner.pandas.PandasAnyWrapper(values)

Bases: PandasGroupWrapper

Pandas-specific implementation of the any expression

check_fn(self, values)

Checks the results of the operator by reducing the results in the values list to a single value.

Parameters

values – The results from the operator comparison

Returns

The reduced result

class converter.runner.pandas.PandasAllWrapper(values)

Bases: PandasGroupWrapper

Pandas-specific implementation of the all expression

check_fn(self, values)

Checks the results of the operator by reducing the results in the values list to a single value.

Parameters

values – The results from the operator comparison

Returns

The reduced result
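
The reduction described for check_fn can be sketched as follows for the any and all cases. This assumes each result is a boolean Series and that the reduction is an element-wise OR (any) or AND (all). The function names any_check and all_check are hypothetical.

```python
from functools import reduce
import operator

import pandas as pd


def any_check(values):
    """Element-wise OR across all results (assumed 'any' reduction)."""
    return reduce(operator.or_, values)


def all_check(values):
    """Element-wise AND across all results (assumed 'all' reduction)."""
    return reduce(operator.and_, values)


checks = [
    pd.Series([True, False, False]),
    pd.Series([False, False, True]),
]
any_result = any_check(checks)
all_result = all_check(checks)
```

Because operator.or_ and operator.and_ also work on plain booleans, the same reduction applies to scalar results.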

converter.runner.pandas.logical_and_transformer(row, lhs, rhs)
converter.runner.pandas.logical_or_transformer(row, lhs, rhs)
converter.runner.pandas.logical_not_transformer(row, value)
converter.runner.pandas.in_transformer(row, lhs, rhs)
converter.runner.pandas.not_in_transformer(row, lhs, rhs)
class converter.runner.pandas.StrReplace(series_type)
__call__(self, row: converter.transformers.transform.RowType, target, *pattern_repl)
class converter.runner.pandas.StrMatch(series_type)
__call__(self, row: converter.transformers.transform.RowType, target, pattern: re.Pattern)
class converter.runner.pandas.StrSearch(series_type)
__call__(self, row: converter.transformers.transform.RowType, target, pattern: re.Pattern)
class converter.runner.pandas.StrJoin(series_type)
to_str(self, obj)
concat(self, left, right)
join(self, left, join, right)
__call__(self, row: converter.transformers.transform.RowType, join, *elements)
class converter.runner.pandas.ConversionError(value=None, reason=None)
converter.runner.pandas.type_converter(to_type, nullable, null_values)
class converter.runner.pandas.PandasRunner(config: converter.config.Config, **options)

Bases: converter.runner.base.BaseRunner

Default implementation for a pandas-like runner

row_value_conversions
dataframe_type
series_type
coerce_row_types(self, row, conversions: converter.mapping.base.ColumnConversions)

Changes the data type of each input column. If a cast fails, a warning is written to the log and the row is ignored.

Parameters
  • row – The input row.

  • conversions – The set of conversions to run

Returns

The updated input row if there are no errors, None if any updates fail.
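
The convert-or-skip behaviour described above can be sketched like this. For brevity the sketch uses a plain dict as the row and a hypothetical dict of column name to conversion callable in place of ColumnConversions; the real types may differ.

```python
import logging

logger = logging.getLogger(__name__)


def coerce_row_types(row, conversions):
    """Return a converted copy of row, or None if any conversion fails."""
    try:
        converted = {col: conv(row[col]) for col, conv in conversions.items()}
    except (TypeError, ValueError) as exc:
        # A failed cast is logged and the whole row is dropped.
        logger.warning("Cannot convert row %r: %s", row, exc)
        return None
    return {**row, **converted}


good = coerce_row_types({"a": "1", "b": "x"}, {"a": int})
bad = coerce_row_types({"a": "oops", "b": "x"}, {"a": int})
```

Callers would then filter out the None results before continuing with the transformation.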

create_series(self, index, value)
get_dataframe(self, extractor: converter.connector.base.BaseConnector) → pandas.DataFrame

Builds a dataframe from the extractor's data

Parameters

extractor – The extractor providing the input data

Returns

The created dataframe
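
As a rough illustration, building a dataframe from extracted records might look like the sketch below. It assumes the extractor's data is an iterable of dicts. How BaseConnector actually exposes its data is not shown here, so the function takes the rows directly.

```python
import pandas as pd


def get_dataframe(rows):
    """Build a DataFrame from an iterable of dict records (assumed input shape)."""
    return pd.DataFrame(list(rows))


records = iter([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
df = get_dataframe(records)
```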

combine_column(self, row, current_column_value: Union[pandas.Series, converter.types.notset.NotSetType], entry: converter.mapping.base.TransformationEntry)

Combines the current column value with the result of the transformation. If the current value is NotSet, the result of the current transformation is calculated and used as the column value.

Parameters
  • row – The row loaded from the extractor

  • current_column_value – Series representing the current transformed value

  • entry – The transformation to apply

Returns

The combined column value
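
The NotSet handling above can be sketched as follows. The combination rule is not specified in this reference, so the sketch assumes that existing values take precedence and the new transformation only fills missing entries (via Series.combine_first). NotSetType here is a local stand-in for converter.types.notset.NotSetType.

```python
import pandas as pd


class NotSetType:
    """Local stand-in sentinel type for an unset column value."""


NotSet = NotSetType()


def combine_column(current_column_value, new_value):
    """Use new_value outright if nothing is set, else fill gaps (assumed rule)."""
    if isinstance(current_column_value, NotSetType):
        return new_value
    return current_column_value.combine_first(new_value)


first = combine_column(NotSet, pd.Series([1.0, 2.0]))
merged = combine_column(pd.Series([1.0, None]), pd.Series([9.0, 2.0]))
```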

assign(self, input_row: pandas.DataFrame, output_row: Union[pandas.DataFrame, converter.types.notset.NotSetType], **assignments)

Helper function for assigning a series to a dataframe. Some pandas implementations are less efficient when starting from an empty dataframe, so None may be passed as output_row, in which case the initial dataframe is created from the first assigned series.

Parameters
  • input_row – The row loaded from the extractor

  • output_row – The dataframe to assign to, or None

  • assignments – The assignments to apply to the dataframe

Returns

The updated dataframe
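
A minimal sketch of this pattern, assuming each assignment is a pandas Series keyed by column name. The input_row argument is omitted for brevity, and this is an illustration rather than the runner's actual code.

```python
import pandas as pd


def assign(output_row, **assignments):
    """Assign series to output_row, creating it from the first series if None."""
    for name, series in assignments.items():
        if output_row is None:
            # Avoid starting from an empty dataframe.
            output_row = series.to_frame(name=name)
        else:
            output_row = output_row.assign(**{name: series})
    return output_row


result = assign(None, a=pd.Series([1, 2]), b=pd.Series([3, 4]))
```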

apply_transformation_entry(self, input_df: pandas.DataFrame, entry: converter.mapping.base.TransformationEntry) → Union[pandas.Series, converter.types.notset.NotSetType]

Applies a single transformation to the dataset, returning the result as a series.

Parameters
  • input_df – The dataframe loaded from the extractor

  • entry – The transformation to apply

Returns

The transformation result

transform(self, extractor: converter.connector.base.BaseConnector, mapping: converter.mapping.base.BaseMapping) → Iterable[Dict[str, Any]]

Performs the transformation

Parameters
  • extractor – The data connection to extract data from

  • mapping – Mapping object describing the transformations to apply

Returns

An iterable containing the transformed data
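
To illustrate the return type, Iterable[Dict[str, Any]], one plausible way to turn a transformed dataframe into such an iterable is shown below. The helper name iter_records is hypothetical, and the runner may produce its output differently.

```python
import pandas as pd


def iter_records(df):
    """Yield one dict per dataframe row, keyed by column name."""
    yield from df.to_dict(orient="records")


transformed = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
rows = list(iter_records(transformed))
```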