oasislmf.lookup.builtin¶

Module for the built-in Lookup Class

In the future we may want to improve the management of the files used to generate the keys. Tutorial for pandas and parquet: https://towardsdatascience.com/a-gentle-introduction-to-apache-arrow-with-apache-spark-and-pandas-bb19ffe0ddae

Attributes¶

`BallTree`
`gdal`
`OPT_INSTALL_MESSAGE`
`X_COORDINATE`
`W_E_PIXEL_RESOLUTION`
`ROW_ROTATION`
`Y_COORDINATE`
`COLUMN_ROTATION`
`N_S_PIXEL_RESOLUTION`
`key_columns`

Classes¶

`PerilCoveredDeterministicLookup`	Basic abstract class for KeyLookup
`Lookup`	Built-in Lookup class that implements the OasisLookupInterface

Functions¶

`get_nearest`(src_points, candidates[, k_neighbors])	Find nearest neighbors for all source points from a set of candidate points
`nearest_neighbor`(left_gdf, right_gdf[, return_dist])	For each point in left_gdf, find closest point in right GeoDataFrame and return them.
`jit_gda_loc_to_val`(ds_array, inv_gt, x_array, y_array, ...)
`z_index`(x, y)	Returns the Z-order index of cell (x,y) in a grid
`undo_z_index`(z)	Returns the (x,y) coordinates of a z-order index in a grid
`z_index_to_normal`(index, size_across)	Converts from z-indexing to linear ordering
`normal_to_z_index`(index, size_across)	Converts from linear ordering to z-indexing
`create_lat_lon_id_functions`(lat_min, lat_max, lon_min, ...)	Returns a function to give grid co-ordinates of a location
`jit_geo_grid_lookup`(lat, lon, lat_min, lat_max, ...)	Returns an array of area peril IDs for all lats given
`get_step`(grid)	Returns the grid size using the max and min longitude and latitude and the arc size

Module Contents¶

oasislmf.lookup.builtin.BallTree = None[source]¶

oasislmf.lookup.builtin.gdal = None[source]¶

oasislmf.lookup.builtin.OPT_INSTALL_MESSAGE = "install oasislmf with extra packages by running 'pip install oasislmf[extra]'"[source]¶

oasislmf.lookup.builtin.get_nearest(src_points, candidates, k_neighbors=1)[source]¶: Find nearest neighbors for all source points from a set of candidate points

oasislmf.lookup.builtin.nearest_neighbor(left_gdf, right_gdf, return_dist=False)[source]¶

For each point in left_gdf, find the closest point in the right GeoDataFrame (right_gdf) and return those closest points.

NOTICE: Assumes that the input Points are in WGS84 projection (lat/lon).
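The idea behind a nearest-neighbour search on lat/lon points can be sketched without any geospatial library. The block below is a brute-force stand-in that uses the haversine distance; it is not the BallTree-based code of this module, and the names `haversine_km` and `nearest_candidate` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used to scale the angular distance


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_candidate(src_points, candidates):
    """For each (lat, lon) source point, return (candidate index, distance in km)."""
    results = []
    for lat, lon in src_points:
        distances = [haversine_km(lat, lon, c_lat, c_lon) for c_lat, c_lon in candidates]
        best = min(range(len(candidates)), key=distances.__getitem__)
        results.append((best, distances[best]))
    return results


# Hypothetical data: two candidate points and two source points.
candidates = [(51.5, -0.1), (48.85, 2.35)]
matches = nearest_candidate([(50.0, 0.0), (49.0, 2.0)], candidates)
```

A brute-force scan costs one distance evaluation per (source, candidate) pair, which is why a tree structure is preferable for large candidate sets.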

oasislmf.lookup.builtin.X_COORDINATE = 0[source]¶

oasislmf.lookup.builtin.W_E_PIXEL_RESOLUTION = 1[source]¶

oasislmf.lookup.builtin.ROW_ROTATION = 2[source]¶

oasislmf.lookup.builtin.Y_COORDINATE = 3[source]¶

oasislmf.lookup.builtin.COLUMN_ROTATION = 4[source]¶

oasislmf.lookup.builtin.N_S_PIXEL_RESOLUTION = 5[source]¶
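The six constants above match the positions of the coefficients in a GDAL-style affine geotransform. As a minimal sketch of how such a tuple maps a pixel (column, row) to map coordinates, assuming the standard GDAL convention; the function name `pixel_to_coords` and the example raster are hypothetical:

```python
# Index names mirror the module constants listed above.
X_COORDINATE = 0
W_E_PIXEL_RESOLUTION = 1
ROW_ROTATION = 2
Y_COORDINATE = 3
COLUMN_ROTATION = 4
N_S_PIXEL_RESOLUTION = 5


def pixel_to_coords(gt, col, row):
    """Apply an affine geotransform to a (col, row) pixel position."""
    x = gt[X_COORDINATE] + col * gt[W_E_PIXEL_RESOLUTION] + row * gt[ROW_ROTATION]
    y = gt[Y_COORDINATE] + col * gt[COLUMN_ROTATION] + row * gt[N_S_PIXEL_RESOLUTION]
    return x, y


# Hypothetical north-up raster: top-left corner at (-10, 60), 0.5 degree pixels.
example_gt = (-10.0, 0.5, 0.0, 60.0, 0.0, -0.5)
top_left = pixel_to_coords(example_gt, 0, 0)
two_right_four_down = pixel_to_coords(example_gt, 2, 4)
```

For a north-up raster both rotation terms are zero and the N-S resolution is negative, because row numbers increase southwards.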

oasislmf.lookup.builtin.jit_gda_loc_to_val(ds_array, inv_gt, x_array, y_array, useful_array_idx, defaults, res)[source]¶

oasislmf.lookup.builtin.z_index(x, y)[source]¶: Returns the Z-order index of cell (x,y) in a grid

oasislmf.lookup.builtin.undo_z_index(z)[source]¶: Returns the (x,y) coordinates of a z-order index in a grid
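Z-order (Morton) indexing interleaves the bits of the two cell coordinates. The sketch below shows one common convention, with x in the even bit positions and y in the odd ones; the bit order used by `z_index` itself is not stated here, so treat this as an assumption. The names `interleave_encode` and `interleave_decode` are hypothetical.

```python
def interleave_encode(x, y, bits=16):
    """Interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def interleave_decode(z, bits=16):
    """Inverse of interleave_encode: recover (x, y) from the interleaved index."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


# The first four cells of the curve form a "Z": (0,0), (1,0), (0,1), (1,1).
first_cells = [interleave_decode(z) for z in range(4)]
```

Cells that are close on the curve tend to be close in the grid, which is the usual reason for choosing this ordering.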

oasislmf.lookup.builtin.z_index_to_normal(index, size_across)[source]¶: Converts from z-indexing to linear ordering

oasislmf.lookup.builtin.normal_to_z_index(index, size_across)[source]¶: Converts from linear ordering to z-indexing

oasislmf.lookup.builtin.create_lat_lon_id_functions(lat_min, lat_max, lon_min, lon_max, arc_size, lat_reverse, lon_reverse)[source]¶: Returns a function to give grid co-ordinates of a location

oasislmf.lookup.builtin.jit_geo_grid_lookup(lat, lon, lat_min, lat_max, lon_min, lon_max, compute_id, lat_id, lon_id)[source]¶: Returns an array of area peril IDs for all lats given

oasislmf.lookup.builtin.get_step(grid)[source]¶: Returns the grid size using the max and min longitude and latitude and the arc size

oasislmf.lookup.builtin.key_columns = ['loc_id', 'peril_id', 'coverage_type', 'area_peril_id', 'vulnerability_id', 'status', 'message'][source]¶

class oasislmf.lookup.builtin.PerilCoveredDeterministicLookup(config, config_dir=None, user_data_dir=None, output_dir=None)[source]¶

Bases: oasislmf.lookup.base.AbstractBasicKeyLookup

Basic abstract class for KeyLookup

multiproc_enabled = False[source]¶

process_locations(locations)[source]¶: Process location rows - passed in as a pandas dataframe. Results can be list, tuple, generator or a pandas dataframe.

class oasislmf.lookup.builtin.Lookup(config, config_dir=None, user_data_dir=None, output_dir=None)[source]¶

Bases: oasislmf.lookup.base.AbstractBasicKeyLookup, oasislmf.lookup.base.MultiprocLookupMixin

Built-in Lookup class that implements the OasisLookupInterface. The aim of this class is to provide a data-driven lookup capability that is both flexible and efficient.

It provides several generic function factories that can be defined in the config under the "step_definition" key, for example:

    "step_definition": {
        "split_loc_perils_covered": {
            "type": "split_loc_perils_covered",
            "columns": ["locperilscovered"],
            "parameters": {
                "model_perils_covered": ["WTC", "WSS"]
            }
        },
        "vulnerability": {
            "type": "merge",
            "columns": ["peril_id", "coverage_type", "occupancycode"],
            "parameters": {
                "file_path": "%%KEYS_DATA_PATH%%/vulnerability_dict.csv",
                "id_columns": ["vulnerability_id"]
            }
        }
    }

mapper key: called the step_name. It is added to the lookup object as a method once the function has been built. It can take any value, but make sure it doesn't collide with an already existing method.

type: defines the function factory to call. For type <fct_type>, the function factory called is build_<fct_type> (ex: "type": "merge" => build_merge).

columns: the columns required to apply the step. These are quite important, as any column (except 'loc_id') from the original Locations Dataframe that is not in any step will be dropped to reduce memory consumption.

parameters: the parameters passed to the function factory.

Once all the functions have been defined, the order in which they must be applied is defined in the config under the "strategy" key, for example:

    "strategy": ["split_loc_perils_covered", "vulnerability"]

It is possible to subclass Lookup in order to create a custom step, or a function factory for a custom step.

For a custom step, add your function definition to the "mapper" with no parameters:

    "my_custom_step": {
        "type": "custom_type",
        "columns": […],
    }

Simply add it to your "strategy":

    "strategy": ["split_loc_perils_covered", "vulnerability", "my_custom_step"]

and code the function in your subclass:

    class MyLookup(Lookup):
        @staticmethod
        def my_custom_step(locations):
            <do something on locations>
            return modified_locations

For a function factory, add your function definition to the "step_definition" with the required parameters:

    "my_custom_step": {
        "type": "custom_type",
        "columns": […],
        "parameters": {
            "param1": "value1"
        }
    }

Add your step to "strategy":

    "strategy": ["split_loc_perils_covered", "vulnerability", "my_custom_step"]

and code the function factory in your subclass:

    class MyLookup(Lookup):
        def build_custom_type(self, param1):
            def fct(locations):
                <do something on locations that depend on param1>
                return modified_locations
            return fct
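The step mechanism described in this docstring can be illustrated with a small stand-alone sketch. `MiniLookup` below is a hypothetical, heavily simplified stand-in, not the oasislmf class: it assumes locations are a list of dicts rather than a DataFrame and ignores the "columns" key and child steps.

```python
class MiniLookup:
    """Simplified stand-in for the step machinery; not the oasislmf Lookup class."""

    def __init__(self, config):
        self.config = config

    def set_step_function(self, step_name, step_config):
        # Reuse the step if it already exists as a method or attribute.
        if hasattr(self, step_name):
            return getattr(self, step_name)
        # Otherwise call the factory named build_<type> with the step parameters.
        factory = getattr(self, "build_" + step_config["type"])
        step = factory(**step_config.get("parameters", {}))
        setattr(self, step_name, step)
        return step

    def process_locations(self, locations):
        # Apply the steps in the order given by "strategy".
        for step_name in self.config["strategy"]:
            step_config = self.config["step_definition"][step_name]
            locations = self.set_step_function(step_name, step_config)(locations)
        return locations


class MyLookup(MiniLookup):
    @staticmethod
    def my_custom_step(locations):
        # Custom step: defined directly as a method, so no factory is called.
        return [dict(loc, flagged=True) for loc in locations]

    def build_custom_type(self, param1):
        # Function factory: returns a step whose behaviour depends on param1.
        def fct(locations):
            return [dict(loc, tag=param1) for loc in locations]
        return fct


config = {
    "step_definition": {
        "my_custom_step": {"type": "custom_type", "columns": []},
        "tagging": {"type": "custom_type", "columns": [], "parameters": {"param1": "value1"}},
    },
    "strategy": ["tagging", "my_custom_step"],
}
result = MyLookup(config).process_locations([{"loc_id": 1}])
```

Here "tagging" is built through `build_custom_type`, while "my_custom_step" is picked up directly because a method with that name already exists.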

interface_version = '1'[source]¶

set_step_function(step_name, step_config, function_being_set=None)[source]¶

Set the step as a function of the lookup object if it's not already done and return it. If the step is composed of several child steps, it will set the child steps recursively.

Args:

step_name (str): name of the strategy for this step
step_config (dict): config of the strategy for this step
function_being_set (set, None): set of all the strategies that are parents of this step

Returns:

function: function corresponding to this step

process_locations(locations)[source]¶: Process location rows - passed in as a pandas dataframe. Results can be list, tuple, generator or a pandas dataframe.

to_abs_filepath(filepath)[source]¶

Replace the placeholder r'%%(.+?)%%' (ex: %%KEYS_DATA_PATH%%) with the path set in self.config.

Args:

filepath (str): filepath with potentially a placeholder

Returns:

str: filepath where placeholders are replaced by their actual value.
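The placeholder substitution can be sketched with the standard `re` module. This is a minimal illustration, not the method's actual code: the function name `resolve_placeholders`, the plain dict of paths keyed by the exact placeholder name, and the choice to leave unknown placeholders untouched are all assumptions.

```python
import re

# Same pattern as in the docstring: a non-greedy name between double percent signs.
PLACEHOLDER = re.compile(r"%%(.+?)%%")


def resolve_placeholders(filepath, paths):
    """Replace each %%NAME%% found in filepath with paths[NAME], if it is known."""
    def substitute(match):
        name = match.group(1)
        # Assumption: unknown placeholders are left as they are.
        return str(paths.get(name, match.group(0)))

    return PLACEHOLDER.sub(substitute, filepath)


# Hypothetical path mapping for the example.
resolved = resolve_placeholders(
    "%%KEYS_DATA_PATH%%/vulnerability_dict.csv",
    {"KEYS_DATA_PATH": "/data/keys"},
)
```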

static set_id_columns(df, id_columns)[source]¶: In Dataframes, only float columns can have nan values. So after a left join, for example, any nan values will change the type of the original column to float. This function replaces the nan values with OASIS_UNKNOWN_ID and resets the column type to int.

build_interval_to_index(value_column_name, sorted_array, index_column_name=None, side='left')[source]¶

Map a value column to an index according to its position in the intervals defined by sorted_array. nan values are kept as nan.

Args:

value_column_name: name of the column to map
sorted_array: sorted values that define the intervals to map to
index_column_name: name of the output column
side: defines which index is returned (left or right) in case of equality with one of the interval boundaries

Returns:

function: the mapping function
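The effect of the side argument is easiest to see on a value that equals a boundary. The sketch below uses the standard `bisect` module and assumes the mapping behaves like a sorted-insertion-point search; the function name `interval_index` is hypothetical.

```python
import bisect
import math


def interval_index(value, sorted_array, side="left"):
    """Return the insertion index of value in sorted_array; nan stays nan."""
    if math.isnan(value):
        return value
    if side == "left":
        return bisect.bisect_left(sorted_array, value)
    return bisect.bisect_right(sorted_array, value)


boundaries = [0, 10, 20]  # hypothetical interval boundaries
inside = interval_index(5, boundaries)                      # strictly between 0 and 10
on_boundary_left = interval_index(10, boundaries, "left")   # equality, left side
on_boundary_right = interval_index(10, boundaries, "right") # equality, right side
missing = interval_index(float("nan"), boundaries)
```

Values strictly inside an interval get the same index for both sides; only values equal to a boundary differ.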

static build_combine(id_columns, strategy, logical_type='or')[source]¶

Build a function that combines several strategies that try to achieve the same purpose by different means into one. For example, finding the correct area_peril_id for a location with one method using (latitude, longitude) and one using postcode. Each strategy is applied sequentially on the locations that still have OASIS_UNKNOWN_ID in their id_columns after the preceding strategy.

'or' example (note: "id_columns" is a list):

    "vulnerability": {
        "type": "combine",
        "parameters": {
            "id_columns": ["vulnerability_id"],
            "strategy": ["vuln_cov_Building_Content", "vuln_cov_car"],
            "logical_type": "or"
        }
    }

'and' example (note: "id_columns" is a list of lists):

    "vuln_cov_car": {
        "type": "combine",
        "columns": ["autocode"],
        "parameters": {
            "id_columns": [["vuln_id_car"], ["vulnerability_id"]],
            "strategy": ["vulnerability_car", "coverage_type_car"],
            "logical_type": "and"
        }
    }

Args:

id_columns (list): columns that will be checked to determine if a strategy has succeeded
strategy (list): list of strategies to apply
logical_type: if 'or', apply the next strategy only on invalid id_columns; if 'and', apply the next strategy only on valid id_columns. For 'and', id_columns needs to be a list of lists of columns, where each sublist is checked sequentially.

Returns:

function: function combining all strategies
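The 'or' behaviour can be sketched in plain Python. This is an illustration under simplifying assumptions, not the method's code: locations are a list of dicts, each strategy mutates the rows it is given, and the unknown id is taken to be -1. The names `combine_or`, `by_lat_lon` and `by_postcode` are hypothetical.

```python
OASIS_UNKNOWN_ID = -1  # assumed sentinel value for this sketch


def combine_or(id_columns, strategies):
    """Build a function applying each strategy only to rows that are still unresolved."""
    def fct(locations):
        for strategy in strategies:
            pending = [
                loc for loc in locations
                if any(loc.get(col, OASIS_UNKNOWN_ID) == OASIS_UNKNOWN_ID for col in id_columns)
            ]
            if not pending:
                break
            strategy(pending)  # fills id columns in place where it can
        return locations
    return fct


def by_lat_lon(locs):
    for loc in locs:
        if loc.get("latitude") is not None:
            loc["area_peril_id"] = 100


def by_postcode(locs):
    for loc in locs:
        if loc.get("postcode") == "AB1":
            loc["area_peril_id"] = 200


locations = [
    {"loc_id": 1, "latitude": 51.5, "postcode": "AB1"},
    {"loc_id": 2, "latitude": None, "postcode": "AB1"},
    {"loc_id": 3, "latitude": None, "postcode": "ZZ9"},
]
combined = combine_or(["area_peril_id"], [by_lat_lon, by_postcode])
ids = [loc.get("area_peril_id", OASIS_UNKNOWN_ID) for loc in combined(locations)]
```

Location 1 is resolved by the first strategy and is never passed to the second; location 3 stays unknown because neither strategy matches.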

static build_split_loc_perils_covered(model_perils_covered=None)[source]¶

Split the value of LocPerilsCovered into multiple lines, taking peril groups into account, and drop all lines that are not in the list model_perils_covered.

Useful inspirational code: https://stackoverflow.com/questions/17116814/pandas-how-do-i-split-text-in-a-column-into-multiple-rows
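The splitting step can be pictured as follows. This sketch makes several assumptions that are not confirmed by the docstring: perils are separated by ";", locations are a list of dicts, and the group expansion table `PERIL_GROUPS` is purely illustrative. The name `split_perils` is hypothetical.

```python
# Illustrative group expansion only; real peril groups come from the OED specification.
PERIL_GROUPS = {"WW1": ["WTC", "WEC", "WSS"]}


def split_perils(locations, model_perils_covered=None):
    """Emit one row per covered peril, expanding groups and filtering on the model perils."""
    rows = []
    for loc in locations:
        seen = set()  # avoid duplicates when a group and one of its members are both listed
        for code in loc["locperilscovered"].split(";"):
            for peril in PERIL_GROUPS.get(code, [code]):
                if peril in seen:
                    continue
                seen.add(peril)
                if model_perils_covered is None or peril in model_perils_covered:
                    rows.append(dict(loc, peril_id=peril))
    return rows


locations = [
    {"loc_id": 1, "locperilscovered": "WW1"},
    {"loc_id": 2, "locperilscovered": "WTC;QEQ"},
]
rows = split_perils(locations, model_perils_covered=["WTC", "WSS"])
pairs = [(row["loc_id"], row["peril_id"]) for row in rows]
```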

static build_prepare(**kwargs)[source]¶: Prepare the dataframe by setting default, min and max values and type. Several simple DataFrame preparations are supported:

default: create the column if missing and replace the nan values with the default value
max: truncate the values in a column to the specified max
min: truncate the values in a column to the specified min
type: convert the type of the column to the specified numpy dtype

Note that we use the string representation of numpy dtypes available at https://numpy.org/doc/stable/reference/arrays.dtypes.html#arrays-dtypes-constructing

build_rtree(file_path, file_type, id_columns, area_peril_read_params=None, nearest_neighbor_min_distance=-1)[source]¶

Function factory to associate locations to area_peril based on the rtree method.

!!! Please note that this method is quite time-consuming (especially if you use the nearest point option). If your area_perils are squares, you should use the area_peril function fixed_size_geo_grid instead. !!!

file_path: the path to the file containing the area_peril_dictionary. This file must be a geopandas Dataframe with a valid geometry. An example of how to create such a dataframe is available in PiWind. If you are new to geo data (in Python) and want to learn more, you may have a look at this excellent course: https://automating-gis-processes.github.io/site/index.html

file_type: can be any format readable by geopandas ('file', 'parquet', …). See: https://geopandas.readthedocs.io/en/latest/docs/reference/io.html. You may have to install additional libraries such as pyarrow for parquet.

id_columns: columns to transform to an 'id_column' (type int32 with nan replaced by -1)

nearest_neighbor_min_distance: option to compute the nearest point if the intersection method fails. We use: https://automating-gis-processes.github.io/site/notebooks/L3/nearest-neighbor-faster.html but alternatives can be found here: https://gis.stackexchange.com/questions/222315/geopandas-find-nearest-point-in-other-dataframe

static build_fixed_size_geo_grid_multi_peril(perils_dict)[source]¶

Create multiple grids of varying resolution, one per peril, and associate an id to each square of the grid using the fixed_size_geo_grid method.

Parameters¶

perils_dict: dict: Dictionary with peril_id as key and fixed_size_geo_grid parameter dict as value. i.e {‘peril_id’ : {fixed_size_geo_grid parameters}}

static build_fixed_size_geo_grid(lat_min, lat_max, lon_min, lon_max, arc_size, lat_reverse=False, lon_reverse=False, lon_first=False)[source]¶: Associate an id to each square of the grid defined by the limits of lat and lon. The reverse options change the ordering of the ids from (min to max) to (max to min).
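One way to number the squares of a regular lat/lon grid is sketched below. The exact id formula of the built-in function is not given in this page, so the row-major numbering starting at 1, the handling of lon_first and the -1 result for points outside the grid are assumptions; `make_grid_id` is a hypothetical name.

```python
import math


def make_grid_id(lat_min, lat_max, lon_min, lon_max, arc_size,
                 lat_reverse=False, lon_reverse=False, lon_first=False):
    """Return a function mapping (lat, lon) to a 1-based square id, or -1 outside the grid."""
    n_lat = math.ceil((lat_max - lat_min) / arc_size)  # number of squares along latitude
    n_lon = math.ceil((lon_max - lon_min) / arc_size)  # number of squares along longitude

    def grid_id(lat, lon):
        if not (lat_min <= lat < lat_max and lon_min <= lon < lon_max):
            return -1
        lat_idx = int((lat - lat_min) // arc_size)
        lon_idx = int((lon - lon_min) // arc_size)
        if lat_reverse:
            lat_idx = n_lat - 1 - lat_idx  # count from lat_max downwards
        if lon_reverse:
            lon_idx = n_lon - 1 - lon_idx  # count from lon_max downwards
        if lon_first:
            return lon_idx * n_lat + lat_idx + 1
        return lat_idx * n_lon + lon_idx + 1

    return grid_id


# Hypothetical 4 x 6 grid of 0.5 degree squares.
grid = make_grid_id(0.0, 2.0, 0.0, 3.0, 0.5)
reversed_grid = make_grid_id(0.0, 2.0, 0.0, 3.0, 0.5, lat_reverse=True)
```

Because the id is computed arithmetically, no spatial index or file lookup is needed, which is why a fixed-size grid is much cheaper than a geometry intersection.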

static build_fixed_size_z_index_geo_grid_multi_peril(perils_dict)[source]¶

Create multiple grids of varying resolution, one per peril, and associate an id to each square of the grid using the fixed_size_z_index_geo_grid method.

Parameters¶

perils_dict: dict: Dictionary with peril_id as key and fixed_size_geo_grid parameter dict as value. i.e {‘peril_id’ : {fixed_size_geo_grid parameters}}

static build_fixed_size_z_index_geo_grid(lat_min, lat_max, lon_min, lon_max, arc_size, lat_reverse=False, lon_reverse=False, lon_first=False)[source]¶: Associate an id to each square of the grid, with ids defined by z-order indexing. The reverse options change the ordering of the ids from (min to max) to (max to min).

build_geotiff(file_path, band_info)[source]¶

Args:

file_path: path to the geotiff file
band_info: a dict where keys are the assigned column names and values are dicts with:
id: the id of the band in the tiff file
default: the value for locations outside of the range

Returns:

function: function to assign the band value to each corresponding lat lon

build_merge(file_path, id_columns=[], **kwargs)[source]¶

This method merges the locations Dataframe with the Dataframe present in file_path. All non-matching columns present in id_columns are set to -1.

This is an efficient way to map a combination of columns that have a finite scope to an id.

static build_simple_pivot(pivots, remove_pivoted_col=True)[source]¶

Pivot columns of the locations dataframe into multiple rows. Each pivot in the pivot list may define:

"on": to rename a column into a new one
"new_cols": to create a new column with a certain value

ex:

    "pivots": [
        {"on": {"vuln_str": "vulnerability_id"},
         "new_cols": {"coverage_type": 1}},
        {"on": {"vuln_con": "vulnerability_id"},
         "new_cols": {"coverage_type": 3}},
    ],

    loc_id  vuln_str  vuln_con
    1       3         2
    2       18        4

=>

    loc_id  vuln_str  vuln_con  vulnerability_id  coverage_type
    1       3         2         3                 1
    2       18        4         18                1
    1       3         2         2                 3
    2       18        4         4                 3
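The pivot example can be reproduced with a short stand-alone sketch. It assumes locations are a list of dicts instead of a DataFrame; `simple_pivot` is a hypothetical name and the code is an illustration of the behaviour described here, not the method's implementation.

```python
def simple_pivot(rows, pivots, remove_pivoted_col=True):
    """Emit one copy of every row per pivot, adding the renamed and the new columns."""
    out = []
    for pivot in pivots:
        for row in rows:
            new_row = dict(row)
            for src, dst in pivot.get("on", {}).items():
                new_row[dst] = row[src]  # value of the source column under the new name
            new_row.update(pivot.get("new_cols", {}))
            out.append(new_row)
    if remove_pivoted_col:
        pivoted = {src for pivot in pivots for src in pivot.get("on", {})}
        out = [{k: v for k, v in row.items() if k not in pivoted} for row in out]
    return out


pivots = [
    {"on": {"vuln_str": "vulnerability_id"}, "new_cols": {"coverage_type": 1}},
    {"on": {"vuln_con": "vulnerability_id"}, "new_cols": {"coverage_type": 3}},
]
locations = [
    {"loc_id": 1, "vuln_str": 3, "vuln_con": 2},
    {"loc_id": 2, "vuln_str": 18, "vuln_con": 4},
]
# Keep the pivoted columns so the result can be compared with the table in the docstring.
pivoted_rows = simple_pivot(locations, pivots, remove_pivoted_col=False)
table = [tuple(row.values()) for row in pivoted_rows]
```

With remove_pivoted_col left at True in this sketch, the vuln_str and vuln_con columns would be dropped from the result.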

static build_model_data(columns)[source]¶: Serialises specified columns from the OED file into a model_data dict

static build_dynamic_model_adjustment(intensity_adjustment_col, return_period_col)[source]¶: Converts specified columns from the OED file into intensity adjustments and return period protection.