oasislmf.lookup.builtin

Module for the built-in Lookup Class

In the future, we may want to improve the management of the files used to generate the keys. For pandas and Parquet, see this tutorial: https://towardsdatascience.com/a-gentle-introduction-to-apache-arrow-with-apache-spark-and-pandas-bb19ffe0ddae

Module Contents

Classes

PerilCoveredDeterministicLookup

Basic abstract class for KeyLookup

Lookup

Built-in Lookup class that implements the OasisLookupInterface

Functions

get_nearest(src_points, candidates[, k_neighbors])

Find nearest neighbors for all source points from a set of candidate points

nearest_neighbor(left_gdf, right_gdf[, return_dist])

For each point in left_gdf, find closest point in right GeoDataFrame and return them.

Attributes

oasislmf.lookup.builtin.BallTree
oasislmf.lookup.builtin.OPT_INSTALL_MESSAGE = "install oasislmf with extra packages by running 'pip install oasislmf[extra]'"
oasislmf.lookup.builtin.get_nearest(src_points, candidates, k_neighbors=1)

Find nearest neighbors for all source points from a set of candidate points

oasislmf.lookup.builtin.nearest_neighbor(left_gdf, right_gdf, return_dist=False)

For each point in left_gdf, find closest point in right GeoDataFrame and return them.

NOTICE: Assumes that the input Points are in WGS84 projection (lat/lon).
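The BallTree attribute and the nearest-neighbor course linked under build_rtree suggest that these helpers query a BallTree (presumably scikit-learn's, installed with the extra packages) using great-circle distance on WGS84 lat/lon. As a dependency-free illustration of the same idea, here is a brute-force sketch that computes haversine distances from each source point to every candidate and keeps the closest. The names nearest_points and haversine_m are hypothetical, and the mean Earth radius is an assumed constant; a BallTree returns the same neighbors while avoiding the full n*m scan.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards asin against rounding slightly above 1.0
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def nearest_points(src_points, candidates, return_dist=False):
    """For each (lat, lon) source point, return the index of the closest candidate.

    Brute-force scan over all candidates; a BallTree avoids this full scan.
    """
    results = []
    for lat, lon in src_points:
        dists = [haversine_m(lat, lon, c_lat, c_lon) for c_lat, c_lon in candidates]
        best = min(range(len(dists)), key=dists.__getitem__)
        results.append((best, dists[best]) if return_dist else best)
    return results


src = [(51.5074, -0.1278), (48.8566, 2.3522)]           # London, Paris
cands = [(48.85, 2.35), (51.50, -0.12), (40.71, -74.00)]
print(nearest_points(src, cands))  # [1, 0]
```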

oasislmf.lookup.builtin.key_columns = ['loc_id', 'peril_id', 'coverage_type', 'area_peril_id', 'vulnerability_id', 'status', 'message']
class oasislmf.lookup.builtin.PerilCoveredDeterministicLookup(config, config_dir=None, user_data_dir=None, output_dir=None)

Bases: oasislmf.lookup.base.AbstractBasicKeyLookup

Basic abstract class for KeyLookup

multiproc_enabled = False
process_locations(locations)

Process location rows, passed in as a pandas DataFrame. Results can be a list, tuple, generator or pandas DataFrame.

class oasislmf.lookup.builtin.Lookup(config, config_dir=None, user_data_dir=None, output_dir=None)

Bases: oasislmf.lookup.base.AbstractBasicKeyLookup, oasislmf.lookup.base.MultiprocLookupMixin

Built-in Lookup class that implements the OasisLookupInterface. The aim of this class is to provide a data-driven lookup capability that is both flexible and efficient.

It provides several generic function factories that can be defined in the config under the "step_definition" key, for example:

    "step_definition": {
        "split_loc_perils_covered": {
            "type": "split_loc_perils_covered",
            "columns": ["locperilscovered"],
            "parameters": {
                "model_perils_covered": ["WTC", "WSS"]
            }
        },
        "vulnerability": {
            "type": "merge",
            "columns": ["peril_id", "coverage_type", "occupancycode"],
            "parameters": {
                "file_path": "%%KEYS_DATA_PATH%%/vulnerability_dict.csv",
                "id_columns": ["vulnerability_id"]
            }
        }
    }

mapper key: called the step_name. Once the function has been built, it is added as a method of the lookup object. It can take any value, but make sure it doesn't collide with an already existing method.

type: defines the function factory to call. For type <fct_type>, the function factory called is build_<fct_type> (ex: "type": "merge" => build_merge).

columns: the columns required to apply the step. These are important, as any column from the original locations DataFrame (except 'loc_id') that is not used in any step is dropped to reduce memory consumption.

parameters: the parameters passed to the function factory.

Once all the functions have been defined, the order in which they must be applied is defined in the config under the "strategy" key, for example:

    "strategy": ["split_loc_perils_covered", "vulnerability"]
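To illustrate how "step_definition" and "strategy" fit together, here is a minimal, hypothetical MiniLookup working on plain lists of dicts instead of a DataFrame: each step's "type" selects a build_<type> factory, the factory is called with the step's "parameters", the resulting function is attached under the step name, and process_locations applies the steps in strategy order. build_add_constant is an invented factory for the example, and the column-dropping behaviour described above is left out.

```python
class MiniLookup:
    """Hypothetical, stripped-down stand-in for the config-driven Lookup."""

    def __init__(self, config):
        self.config = config
        for step_name, step_config in config["step_definition"].items():
            # an existing method with the step's name (e.g. a custom step coded in a
            # subclass) is used as-is; otherwise "type" selects build_<type>
            if not hasattr(self, step_name):
                factory = getattr(self, "build_" + step_config["type"])
                setattr(self, step_name, factory(**step_config.get("parameters", {})))

    @staticmethod
    def build_add_constant(column, value):
        """Hypothetical factory: the built step adds a constant column to every row."""
        def fct(locations):
            return [{**row, column: value} for row in locations]
        return fct

    def process_locations(self, locations):
        # apply the steps in the order given by "strategy"
        for step_name in self.config["strategy"]:
            locations = getattr(self, step_name)(locations)
        return locations


config = {
    "step_definition": {
        "peril": {"type": "add_constant", "columns": [],
                  "parameters": {"column": "peril_id", "value": "WTC"}},
        "coverage": {"type": "add_constant", "columns": [],
                     "parameters": {"column": "coverage_type", "value": 1}},
    },
    "strategy": ["peril", "coverage"],
}
print(MiniLookup(config).process_locations([{"loc_id": 1}]))
# [{'loc_id': 1, 'peril_id': 'WTC', 'coverage_type': 1}]
```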

You can subclass Lookup to create your own custom step, or a function factory for custom steps.

For a custom step, add your function definition to the "step_definition" with no parameters:

    "my_custom_step": {
        "type": "custom_type",
        "columns": [...]
    }

then add it to your strategy:

    "strategy": ["split_loc_perils_covered", "vulnerability", "my_custom_step"]

and code the function in your subclass:

    class MyLookup(Lookup):
        @staticmethod
        def my_custom_step(locations):
            # <do something on locations>
            return locations  # return the modified locations

For a function factory, add your function definition to the "step_definition" with the required parameters:

    "my_custom_step": {
        "type": "custom_type",
        "columns": [...],
        "parameters": {
            "param1": "value1"
        }
    }

add your step to the strategy:

    "strategy": ["split_loc_perils_covered", "vulnerability", "my_custom_step"]

and code the function factory in your subclass:

    class MyLookup(Lookup):
        def build_custom_type(self, param1):
            def fct(locations):
                # <do something on locations that depends on param1>
                return locations  # return the modified locations
            return fct

interface_version = '1'
set_step_function(step_name, step_config, function_being_set=None)

Set the step as a function of the lookup object if that is not already done, and return it. If the step is composed of several child steps, the child steps are set recursively.

Args:

    step_name (str): name of the strategy for this step
    step_config (dict): config of the strategy for this step
    function_being_set (set, None): set of all the strategies that are parents of this step

Returns:

    function: function corresponding to this step
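A minimal sketch of the recursion, assuming a hypothetical layout in which a composite step lists its child step names under "children" and a leaf step carries its function under "fct" (neither key comes from the oasislmf config). The frozenset of names currently being built plays the role of function_being_set: meeting one of them again means the definition is cyclic.

```python
def set_step(step_name, step_definition, built, being_set=frozenset()):
    """Build step_name (and its children, recursively) once; reject cyclic definitions."""
    if step_name in built:                 # already built: reuse it
        return built[step_name]
    if step_name in being_set:             # one of our parents is still being built
        raise ValueError(f"cyclic step definition involving {step_name!r}")
    step = step_definition[step_name]
    if "children" in step:                 # composite step: build the children first
        parents = being_set | {step_name}
        child_fcts = [set_step(child, step_definition, built, parents)
                      for child in step["children"]]

        def fct(locations):
            for child_fct in child_fcts:   # apply the children in order
                locations = child_fct(locations)
            return locations
    else:                                  # leaf step: the function is given directly
        fct = step["fct"]
    built[step_name] = fct
    return fct


steps = {
    "add_one": {"fct": lambda x: x + 1},
    "times_ten": {"fct": lambda x: x * 10},
    "both": {"children": ["add_one", "times_ten"]},
}
print(set_step("both", steps, {})(1))  # 20
```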

process_locations(locations)

Process location rows, passed in as a pandas DataFrame. Results can be a list, tuple, generator or pandas DataFrame.

to_abs_filepath(filepath)

Replace placeholders matching r'%%(.+?)%%' (ex: %%KEYS_DATA_PATH%%) with the path set in self.config.

Args:

    filepath (str): filepath potentially containing a placeholder

Returns:

    str: filepath where the placeholders are replaced by their actual values.
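The placeholder substitution can be sketched with re.sub and the same pattern. Here the hypothetical config dict is assumed to map placeholder names directly to paths, which may differ from how the real method reads self.config.

```python
import re

PLACEHOLDER = re.compile(r"%%(.+?)%%")


def to_abs_filepath(filepath, config):
    """Replace every %%NAME%% in filepath with config[NAME] (hypothetical config layout)."""
    return PLACEHOLDER.sub(lambda match: config[match.group(1)], filepath)


config = {"KEYS_DATA_PATH": "/data/keys"}
print(to_abs_filepath("%%KEYS_DATA_PATH%%/vulnerability_dict.csv", config))
# /data/keys/vulnerability_dict.csv
```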

static set_id_columns(df, id_columns)

In DataFrames, integer columns cannot hold NaN values, so if, for example, a left join introduces NaN values, the type of the original column changes to float. This function replaces the NaN values with OASIS_UNKNOWN_ID and resets the column type to int.
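In pandas, the same fix is usually written as df[col].fillna(OASIS_UNKNOWN_ID).astype(int). The plain-Python sketch below shows the logic on a list of dicts, assuming OASIS_UNKNOWN_ID is -1 (the value this page gives for unmatched ids).

```python
import math

OASIS_UNKNOWN_ID = -1  # value this page gives for unmatched ids


def set_id_columns(rows, id_columns):
    """Replace missing ids (None or NaN) with OASIS_UNKNOWN_ID and cast the others to int."""
    for row in rows:
        for col in id_columns:
            value = row.get(col)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                row[col] = OASIS_UNKNOWN_ID
            else:
                row[col] = int(value)
    return rows


rows = [{"loc_id": 1, "vulnerability_id": 12.0},
        {"loc_id": 2, "vulnerability_id": float("nan")}]
print(set_id_columns(rows, ["vulnerability_id"]))
# [{'loc_id': 1, 'vulnerability_id': 12}, {'loc_id': 2, 'vulnerability_id': -1}]
```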

static build_combine(id_columns, strategy)

Build a function that combines into one several strategies that try to achieve the same purpose by different means. For example, finding the correct area_peril_id for a location with one method using (latitude, longitude) and another using the postcode. Each strategy is applied sequentially to the locations that still have OASIS_UNKNOWN_ID in their id_columns after the preceding strategy.

Args:

    id_columns (list): columns checked to determine whether a strategy has succeeded
    strategy (list): list of strategies to apply

Returns:

    function: function combining all strategies
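A plain-Python sketch of the combination logic, with lists of dicts in place of DataFrames: each strategy only receives the rows that the previous strategies left with OASIS_UNKNOWN_ID (assumed to be -1). The two toy strategies, by_latlon and by_postcode, and their ids are hypothetical.

```python
OASIS_UNKNOWN_ID = -1  # assumed value of the unknown-id marker


def build_combine(id_columns, strategy):
    """Chain strategies: each one only sees the rows the previous ones failed to resolve."""
    def is_resolved(row):
        return all(row.get(col, OASIS_UNKNOWN_ID) != OASIS_UNKNOWN_ID for col in id_columns)

    def fct(locations):
        resolved, pending = [], list(locations)
        for step in strategy:
            if not pending:
                break
            result = step(pending)
            resolved += [row for row in result if is_resolved(row)]
            pending = [row for row in result if not is_resolved(row)]
        return resolved + pending  # unresolved rows keep OASIS_UNKNOWN_ID
    return fct


def by_latlon(rows):  # hypothetical: any row with a latitude falls in area 7
    return [{**r, "area_peril_id": 7 if "latitude" in r else OASIS_UNKNOWN_ID} for r in rows]


def by_postcode(rows):  # hypothetical postcode -> area_peril_id table
    return [{**r, "area_peril_id": {"AB1": 3}.get(r.get("postcode"), OASIS_UNKNOWN_ID)}
            for r in rows]


combined = build_combine(["area_peril_id"], [by_latlon, by_postcode])
locs = [{"loc_id": 1, "latitude": 51.5}, {"loc_id": 2, "postcode": "AB1"}, {"loc_id": 3}]
print([row["area_peril_id"] for row in combined(locs)])  # [7, 3, -1]
```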

static build_split_loc_perils_covered(model_perils_covered=None)

Split the value of LocPerilsCovered into multiple lines, taking peril groups into account, and drop all lines whose peril is not in the list model_perils_covered.

Useful inspirational code: https://stackoverflow.com/questions/17116814/pandas-how-do-i-split-text-in-a-column-into-multiple-rows
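A sketch of the splitting logic on a list of dicts, assuming OED-style codes separated by ";" and a hypothetical peril-group table (GRP1 is invented; real group definitions come from OED). Treating model_perils_covered=None as "keep every peril" is also an assumption.

```python
GROUPS = {"GRP1": ["WTC", "WSS"]}  # hypothetical peril group -> member perils


def build_split_loc_perils_covered(model_perils_covered=None, peril_groups=GROUPS):
    """Return a step that emits one row per covered peril, expanding peril groups."""
    def fct(locations):
        out = []
        for row in locations:
            perils = []
            for code in row["locperilscovered"].split(";"):
                code = code.strip()
                for peril in peril_groups.get(code, [code] if code else []):
                    if peril not in perils:          # de-duplicate, keep first-seen order
                        perils.append(peril)
            out += [{**row, "peril_id": peril} for peril in perils
                    if model_perils_covered is None or peril in model_perils_covered]
        return out
    return fct


split = build_split_loc_perils_covered(["WTC", "WSS"])
rows = [{"loc_id": 1, "locperilscovered": "GRP1;QEQ"}]
print([r["peril_id"] for r in split(rows)])  # ['WTC', 'WSS']
```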

static build_prepare(**kwargs)

Prepare the DataFrame by setting default, min and max values and type. Supports several simple DataFrame preparations:

    default: create the column if missing and replace the NaN values with the default value
    max: clip the values in a column to the specified max
    min: clip the values in a column to the specified min
    type: convert the type of the column to the specified numpy dtype

Note that we use the string representation of numpy dtypes available at https://numpy.org/doc/stable/reference/arrays.dtypes.html#arrays-dtypes-constructing
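A plain-Python sketch of the four preparations, assuming kwargs maps each column name to a dict with optional "default", "min", "max" and "type" keys. DTYPE_CASTS is a hypothetical stand-in for numpy dtype strings.

```python
import math

# Hypothetical stand-in for numpy dtype strings such as "int32" or "float64"
DTYPE_CASTS = {"int32": int, "int64": int, "float32": float, "float64": float, "str": str}


def _is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_prepare(**kwargs):
    """kwargs maps a column name to {"default", "min", "max", "type"} options (assumed layout)."""
    def fct(locations):
        for row in locations:
            for col, spec in kwargs.items():
                value = row.get(col)              # None if the column is absent
                if _is_missing(value):
                    if "default" not in spec:
                        continue                  # no default: leave the value missing
                    value = spec["default"]
                if "min" in spec:
                    value = max(value, spec["min"])
                if "max" in spec:
                    value = min(value, spec["max"])
                if "type" in spec:
                    value = DTYPE_CASTS[spec["type"]](value)
                row[col] = value
        return locations                          # rows are updated in place
    return fct


prepare = build_prepare(numberofstoreys={"default": 1, "min": 1, "max": 50, "type": "int32"})
print(prepare([{"loc_id": 1, "numberofstoreys": 75.0}, {"loc_id": 2}]))
# [{'loc_id': 1, 'numberofstoreys': 50}, {'loc_id': 2, 'numberofstoreys': 1}]
```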

build_rtree(file_path, file_type, id_columns, area_peril_read_params=None, nearest_neighbor_min_distance=-1)

Function factory to associate locations with an area_peril based on the rtree method.

!!! Please note that this method is quite time-consuming (especially if you use the nearest point option). If your peril areas are squares, you should use the area_peril function fixed_size_geo_grid instead !!!

file_path: the path to the file containing the area_peril_dictionary.

    This file must be a geopandas DataFrame with a valid geometry. An example of how to create such a DataFrame is available in PiWind. If you are new to geo data (in Python) and want to learn more, have a look at this excellent course: https://automating-gis-processes.github.io/site/index.html

file_type: can be any format readable by geopandas ('file', 'parquet', ...).

    You may have to install an additional library, such as pyarrow for parquet. See: https://geopandas.readthedocs.io/en/latest/docs/reference/io.html

id_columns: columns to transform into an 'id_column' (type int32 with NaN replaced by -1).

nearest_neighbor_min_distance: option to compute the nearest point if the intersection method fails.

    We use https://automating-gis-processes.github.io/site/notebooks/L3/nearest-neighbor-faster.html but alternatives can be found here: https://gis.stackexchange.com/questions/222315/geopandas-find-nearest-point-in-other-dataframe
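The core of the rtree method is an intersection test between location points and area_peril polygons, accelerated by a bounding-box index. The dependency-free sketch below uses a linear scan over bounding boxes in place of an R-tree and a ray-casting point-in-polygon test in place of geopandas; the polygons and ids are made up, and the nearest-neighbor fallback is left out.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: an odd number of edge crossings to the right of (x, y) means inside."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):                      # edge straddles the horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_index(area_perils):
    """Precompute a bounding box per polygon; area_perils maps id -> [(lon, lat), ...]."""
    index = []
    for ap_id, polygon in area_perils.items():
        xs, ys = zip(*polygon)
        index.append((ap_id, (min(xs), min(ys), max(xs), max(ys)), polygon))
    return index


def lookup_area_peril(lon, lat, index, unknown_id=-1):
    for ap_id, (xmin, ymin, xmax, ymax), polygon in index:
        # cheap bounding-box filter first, exact polygon test second
        if xmin <= lon <= xmax and ymin <= lat <= ymax and point_in_polygon(lon, lat, polygon):
            return ap_id
    return unknown_id


index = build_index({1: [(0, 0), (1, 0), (1, 1), (0, 1)],
                     2: [(1, 0), (2, 0), (2, 1), (1, 1)]})
print([lookup_area_peril(lon, lat, index) for lon, lat in [(0.5, 0.5), (1.5, 0.5), (5, 5)]])
# [1, 2, -1]
```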

static build_fixed_size_geo_grid(lat_min, lat_max, lon_min, lon_max, arc_size, lat_reverse=False, lon_reverse=False)

Associate an id with each square of the grid defined by the lat and lon limits. lat_reverse and lon_reverse change the ordering of the ids from (min to max) to (max to min).
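A sketch of one plausible numbering: 1-based, row-major, latitude rows first. The exact formula used by oasislmf is not given here, so treat the ids as an assumption; the rounding before ceil only absorbs floating-point noise when counting the cells.

```python
import math


def build_fixed_size_geo_grid(lat_min, lat_max, lon_min, lon_max, arc_size,
                              lat_reverse=False, lon_reverse=False, unknown_id=-1):
    """Return a function mapping (lat, lon) to a grid-cell id (assumed numbering)."""
    # round() absorbs floating-point noise (e.g. 3.0000000000000004) before ceil
    n_lat = math.ceil(round((lat_max - lat_min) / arc_size, 9))
    n_lon = math.ceil(round((lon_max - lon_min) / arc_size, 9))

    def area_peril_id(lat, lon):
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            return unknown_id
        i_lat = min(int((lat - lat_min) / arc_size), n_lat - 1)  # max edge joins last cell
        i_lon = min(int((lon - lon_min) / arc_size), n_lon - 1)
        if lat_reverse:
            i_lat = n_lat - 1 - i_lat
        if lon_reverse:
            i_lon = n_lon - 1 - i_lon
        return i_lat * n_lon + i_lon + 1  # 1-based, row-major
    return area_peril_id


grid = build_fixed_size_geo_grid(0, 2, 0, 3, 1)
print(grid(0.5, 0.5), grid(1.5, 2.5), grid(5, 5))  # 1 6 -1
```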

build_merge(file_path, id_columns=[], **kwargs)

This method merges the locations DataFrame with the DataFrame present in file_path. For locations with no match, the columns in id_columns are set to -1.

This is an efficient way to map a combination of columns that have a finite scope to an id.
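In pandas this corresponds to a left merge (DataFrame.merge with how="left") followed by set_id_columns. The plain-Python sketch below assumes the join keys are all the non-id columns of the mapping file, represented by the hypothetical mapping_rows list, and that the mapping is non-empty.

```python
OASIS_UNKNOWN_ID = -1  # value set on non-matching id columns


def build_merge(mapping_rows, id_columns):
    """Left-join locations onto mapping_rows; unmatched id_columns get OASIS_UNKNOWN_ID.

    mapping_rows stands in for the file at file_path and must be non-empty; the join
    keys are assumed to be all of its columns that are not id_columns.
    """
    key_cols = [col for col in mapping_rows[0] if col not in id_columns]
    table = {tuple(row[c] for c in key_cols): {c: row[c] for c in id_columns}
             for row in mapping_rows}
    unknown = {c: OASIS_UNKNOWN_ID for c in id_columns}

    def fct(locations):
        return [{**loc, **table.get(tuple(loc.get(c) for c in key_cols), unknown)}
                for loc in locations]
    return fct


vulnerability_dict = [
    {"peril_id": "WTC", "coverage_type": 1, "occupancycode": 1050, "vulnerability_id": 8},
    {"peril_id": "WTC", "coverage_type": 3, "occupancycode": 1050, "vulnerability_id": 9},
]
merge = build_merge(vulnerability_dict, ["vulnerability_id"])
locs = [{"loc_id": 1, "peril_id": "WTC", "coverage_type": 1, "occupancycode": 1050},
        {"loc_id": 2, "peril_id": "WSS", "coverage_type": 1, "occupancycode": 1050}]
print([row["vulnerability_id"] for row in merge(locs)])  # [8, -1]
```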

static build_simple_pivot(pivots, remove_pivoted_col=True)

Pivot columns of the locations DataFrame into multiple rows. Each pivot in the pivots list may define:

    "on": to rename a column into a new one
    "new_cols": to create a new column with a given value

Example:

    "pivots": [{"on": {"vuln_str": "vulnerability_id"},
                "new_cols": {"coverage_type": 1}},
               {"on": {"vuln_con": "vulnerability_id"},
                "new_cols": {"coverage_type": 3}}]

    loc_id  vuln_str  vuln_con
    1       3         2
    2       18        4

    =>

    loc_id  vuln_str  vuln_con  vulnerability_id  coverage_type
    1       3         2         3                 1
    2       18        4         18                1
    1       3         2         2                 3
    2       18        4         4                 3
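A plain-Python sketch of the pivot on a list of dicts. It reproduces the example table when called with remove_pivoted_col=False, since that table keeps vuln_str and vuln_con; with the default True, the source columns named in "on" are dropped from the output, which is our reading of the parameter rather than confirmed behaviour.

```python
def build_simple_pivot(pivots, remove_pivoted_col=True):
    """Return a step emitting one copy of every location per pivot."""
    pivoted_cols = {src for pivot in pivots for src in pivot.get("on", {})}

    def fct(locations):
        out = []
        for pivot in pivots:                        # one block of rows per pivot
            renames = pivot.get("on", {})
            for row in locations:
                new_row = dict(row)
                for src, dst in renames.items():
                    new_row[dst] = row[src]         # copy the source column under its new name
                new_row.update(pivot.get("new_cols", {}))
                if remove_pivoted_col:
                    for col in pivoted_cols - set(renames.values()):
                        new_row.pop(col, None)
                out.append(new_row)
        return out
    return fct


pivots = [{"on": {"vuln_str": "vulnerability_id"}, "new_cols": {"coverage_type": 1}},
          {"on": {"vuln_con": "vulnerability_id"}, "new_cols": {"coverage_type": 3}}]
locs = [{"loc_id": 1, "vuln_str": 3, "vuln_con": 2},
        {"loc_id": 2, "vuln_str": 18, "vuln_con": 4}]
for row in build_simple_pivot(pivots, remove_pivoted_col=False)(locs):
    print(row)  # prints the four rows of the table above, in the same order
```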