oasislmf.pytools.aal.manager
============================

.. py:module:: oasislmf.pytools.aal.manager


Attributes
----------

.. autoapisummary::

   oasislmf.pytools.aal.manager.logger
   oasislmf.pytools.aal.manager.OASIS_AAL_MEMORY
   oasislmf.pytools.aal.manager.AAL_output
   oasislmf.pytools.aal.manager.ALCT_output


Functions
---------

.. autoapisummary::

   oasislmf.pytools.aal.manager.process_bin_file
   oasislmf.pytools.aal.manager.sort_and_save_chunk
   oasislmf.pytools.aal.manager.merge_sorted_chunks
   oasislmf.pytools.aal.manager.get_summaries_data
   oasislmf.pytools.aal.manager.summary_index
   oasislmf.pytools.aal.manager.read_input_files
   oasislmf.pytools.aal.manager.get_num_subsets
   oasislmf.pytools.aal.manager.get_weighted_means
   oasislmf.pytools.aal.manager.do_calc_end
   oasislmf.pytools.aal.manager.read_losses
   oasislmf.pytools.aal.manager.skip_losses
   oasislmf.pytools.aal.manager.run_aal
   oasislmf.pytools.aal.manager.calculate_mean_stddev
   oasislmf.pytools.aal.manager.get_aal_data
   oasislmf.pytools.aal.manager.get_aal_data_meanonly
   oasislmf.pytools.aal.manager.calculate_confidence_interval
   oasislmf.pytools.aal.manager.get_alct_data
   oasislmf.pytools.aal.manager.run
   oasislmf.pytools.aal.manager.main


Module Contents
---------------

.. py:data:: logger

.. py:data:: OASIS_AAL_MEMORY

.. py:data:: AAL_output

.. py:data:: ALCT_output
.. py:function:: process_bin_file(fbin, offset, occ_map, unique_event_ids, event_id_counts, summaries_data, summaries_idx, file_index, sample_size)

   Reads summary.bin file event_ids and summary_ids to populate summaries_data

   Args:
       fbin (np.memmap): summary binary memmap
       offset (int): file offset to read from
       occ_map (ndarray[occ_map_dtype]): numpy map of event_id, period_no, occ_date_id from the occurrence file
       unique_event_ids (ndarray[np.int32]): List of unique event_ids
       event_id_counts (ndarray[np.int32]): List of the counts of occurrences for each unique event_id in occ_map
       summaries_data (ndarray[_SUMMARIES_DTYPE]): Indexed summary data (summaries.idx data)
       summaries_idx (int): current index reached in summaries_data
       file_index (int): Summary bin file index
       sample_size (int): Sample size

   Returns:
       summaries_idx (int): current index reached in summaries_data
       resize_flag (bool): flag to indicate whether to resize summaries_data when full
       offset (int): file offset to read from


.. py:function:: sort_and_save_chunk(summaries_data, temp_file_path)

   Sort a chunk of summaries data and save it to a temporary file.

   Args:
       summaries_data (ndarray[_SUMMARIES_DTYPE]): Indexed summary data
       temp_file_path (str | os.PathLike): Path to temporary file


.. py:function:: merge_sorted_chunks(memmaps)

   Merge sorted chunks using a k-way merge algorithm and yield the next smallest row

   Args:
       memmaps (List[np.memmap]): List of temporary file memmaps

   Yields:
       smallest_row (ndarray[_SUMMARIES_DTYPE]): yields the next smallest row from the sorted summaries partial files
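The k-way merge used by ``merge_sorted_chunks`` can be illustrated with a minimal sketch. This is not the module's implementation: plain ``(summary_id, period_no, loss)`` tuples stand in for ``_SUMMARIES_DTYPE`` memmap records, and the standard-library ``heapq.merge`` stands in for the merge over temporary files:

```python
import heapq

def merge_sorted_chunks_sketch(sorted_chunks):
    """Lazily yield the next smallest row across all sorted chunks.

    heapq.merge keeps a heap with one head element per chunk, the same
    idea a k-way merge applies to sorted temporary memmap files.
    """
    yield from heapq.merge(*sorted_chunks)

# Hypothetical pre-sorted chunks, as sort_and_save_chunk would leave them.
chunk_a = [(1, 1, 10.0), (2, 1, 5.0)]
chunk_b = [(1, 2, 7.0), (3, 1, 2.0)]
merged = list(merge_sorted_chunks_sketch([chunk_a, chunk_b]))
```

Because the merge holds only one head row per chunk in memory at a time, arbitrarily many spilled chunks can be combined without loading any of them fully.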
.. py:function:: get_summaries_data(path, files_handles, occ_map, unique_event_ids, event_id_counts, sample_size, aal_max_memory)

   Gets the indexed summaries data, ordered with a k-way merge if there is not enough memory

   Args:
       path (os.PathLike): Path to the workspace folder containing summary binaries
       files_handles (List[np.memmap]): List of memmaps for summary files data
       occ_map (ndarray[occ_map_dtype]): numpy map of event_id, period_no, occ_date_id from the occurrence file
       unique_event_ids (ndarray[np.int32]): List of unique event_ids
       event_id_counts (ndarray[np.int32]): List of the counts of occurrences for each unique event_id in occ_map
       sample_size (int): Sample size
       aal_max_memory (float): OASIS_AAL_MEMORY value (has to be passed in, as numba won't update it from the environment variable)

   Returns:
       memmaps (List[np.memmap]): List of temporary file memmaps
       max_summary_id (int): Max summary ID


.. py:function:: summary_index(path, occ_map, unique_event_ids, event_id_counts, stack)

   Index the summary binary outputs

   Args:
       path (os.PathLike): Path to the workspace folder containing summary binaries
       occ_map (ndarray[occ_map_dtype]): numpy map of event_id, period_no, occ_date_id from the occurrence file
       unique_event_ids (ndarray[np.int32]): List of unique event_ids
       event_id_counts (ndarray[np.int32]): List of the counts of occurrences for each unique event_id in occ_map
       stack (ExitStack): Exit stack

   Returns:
       files_handles (List[np.memmap]): List of memmaps for summary files data
       sample_size (int): Sample size
       max_summary_id (int): Max summary ID
       memmaps (List[np.memmap]): List of temporary file memmaps


.. py:function:: read_input_files(run_dir)

   Reads all input files and returns a dict of relevant data

   Args:
       run_dir (str | os.PathLike): Path to directory containing required files structure

   Returns:
       file_data (Dict[str, Any]): A dict of relevant data extracted from files
.. py:function:: get_num_subsets(alct, sample_size, max_summary_id)

   Gets the number of subsets required to generate the Sample AAL np map for subset sizes up to sample_size

   Example: sample_size[10], max_summary_id[2] generates the following ndarray::

       [   # mean, mean_squared, mean_period
           [0, 0, 0],  # subset_size = 1,  summary_id = 1
           [0, 0, 0],  # subset_size = 1,  summary_id = 2
           [0, 0, 0],  # subset_size = 2,  summary_id = 1
           [0, 0, 0],  # subset_size = 2,  summary_id = 2
           [0, 0, 0],  # subset_size = 4,  summary_id = 1
           [0, 0, 0],  # subset_size = 4,  summary_id = 2
           [0, 0, 0],  # subset_size = 10, summary_id = 1, subset_size = sample_size
           [0, 0, 0],  # subset_size = 10, summary_id = 2, subset_size = sample_size
       ]

   subset_size is implicit from the position in the array, grouped by max_summary_id:
   the first two rows have subset_size 2^0 = 1, the next two rows have subset_size 2^1 = 2,
   the next two rows have subset_size 2^2 = 4, and the last two rows have
   subset_size = sample_size = 10. No group is generated with subset_size 8, as doubling
   it would exceed sample_size. Therefore this function returns 4, and the sample aal
   array has 4 * 2 rows.

   Args:
       alct (bool): Boolean for ALCT output
       sample_size (int): Sample size
       max_summary_id (int): Max summary ID

   Returns:
       num_subsets (int): Number of subsets


.. py:function:: get_weighted_means(vec_sample_sum_loss, weighting, sidx, end_sidx)

   Get sum of weighted mean and weighted mean_squared

   Args:
       vec_sample_sum_loss (ndarray[_AAL_REC_DTYPE]): Vector for sample sum losses
       weighting (float): Weighting value
       sidx (int): start index
       end_sidx (int): end index

   Returns:
       weighted_mean (float): Sum weighted mean
       weighted_mean_squared (float): Sum weighted mean squared
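The subset-counting rule described for ``get_num_subsets`` can be sketched as follows. This is an illustrative reconstruction from the docstring's example only (the ``alct`` and ``max_summary_id`` arguments are omitted), not the actual implementation:

```python
def get_num_subsets_sketch(sample_size):
    """Count subset sizes 1, 2, 4, ... while doubling still fits within
    sample_size, then add one final subset of size sample_size itself."""
    num_subsets = 0
    subset_size = 1
    # A power-of-two subset is only generated while doubling it stays
    # within sample_size, which is why subset_size 8 is skipped when
    # sample_size is 10.
    while subset_size * 2 <= sample_size:
        num_subsets += 1
        subset_size *= 2
    return num_subsets + 1  # the final subset uses sample_size itself
```

For the docstring's example of ``sample_size = 10`` this yields subset sizes 1, 2, 4, and 10, i.e. 4 subsets.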
.. py:function:: do_calc_end(period_no, no_of_periods, period_weights, sample_size, curr_summary_id, max_summary_id, vec_analytical_aal, vecs_sample_aal, vec_used_summary_id, vec_sample_sum_loss)

   Updates Analytical and Sample AAL vectors from sample sum losses

   Args:
       period_no (int): Period Number
       no_of_periods (int): Number of periods
       period_weights (ndarray[period_weights_dtype]): Period Weights
       sample_size (int): Sample Size
       curr_summary_id (int): Current summary_id
       max_summary_id (int): Max summary_id
       vec_analytical_aal (ndarray[_AAL_REC_DTYPE]): Vector for Analytical AAL
       vecs_sample_aal (ndarray[_AAL_REC_PERIODS_DTYPE]): Vector for Sample AAL
       vec_used_summary_id (ndarray[bool]): vector to store if summary_id is used
       vec_sample_sum_loss (ndarray[_AAL_REC_DTYPE]): Vector for sample sum losses


.. py:function:: read_losses(summary_fin, cursor, vec_sample_sum_loss)

   Read losses from summary_fin starting at cursor, and populate vec_sample_sum_loss

   Args:
       summary_fin (np.memmap): summary file memmap
       cursor (int): data offset for reading binary files
       vec_sample_sum_loss (ndarray[_AAL_REC_DTYPE]): Vector for sample sum losses

   Returns:
       cursor (int): data offset for reading binary files


.. py:function:: skip_losses(summary_fin, cursor)

   Skip through losses in summary_fin starting at cursor

   Args:
       summary_fin (np.memmap): summary file memmap
       cursor (int): data offset for reading binary files

   Returns:
       cursor (int): data offset for reading binary files
.. py:function:: run_aal(memmaps, no_of_periods, period_weights, sample_size, max_summary_id, files_handles, vec_analytical_aal, vecs_sample_aal, vec_used_summary_id)

   Run AAL calculation loop to populate vec data

   Args:
       memmaps (List[np.memmap]): List of temporary file memmaps
       no_of_periods (int): Number of periods
       period_weights (ndarray[period_weights_dtype]): Period Weights
       sample_size (int): Sample Size
       max_summary_id (int): Max summary_id
       files_handles (List[np.memmap]): List of memmaps for summary files data
       vec_analytical_aal (ndarray[_AAL_REC_DTYPE]): Vector for Analytical AAL
       vecs_sample_aal (ndarray[_AAL_REC_PERIODS_DTYPE]): Vector for Sample AAL
       vec_used_summary_id (ndarray[bool]): vector to store if summary_id is used


.. py:function:: calculate_mean_stddev(observable_sum, observable_squared_sum, number_of_observations)

   Compute the mean and standard deviation from the sum and squared sum of an observable

   Args:
       observable_sum (ndarray[oasis_float]): Observable sum
       observable_squared_sum (ndarray[oasis_float]): Observable squared sum
       number_of_observations (int | ndarray[int]): number of observations

   Returns:
       mean (ndarray[oasis_float]): Mean
       std (ndarray[oasis_float]): Standard Deviation


.. py:function:: get_aal_data(vec_analytical_aal, vecs_sample_aal, vec_used_summary_id, sample_size, no_of_periods)

   Generate AAL csv data

   Args:
       vec_analytical_aal (ndarray[_AAL_REC_DTYPE]): Vector for Analytical AAL
       vecs_sample_aal (ndarray[_AAL_REC_PERIODS_DTYPE]): Vector for Sample AAL
       vec_used_summary_id (ndarray[bool]): vector to store if summary_id is used
       sample_size (int): Sample Size
       no_of_periods (int): Number of periods

   Returns:
       aal_data (List[Tuple]): AAL csv data
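Recovering a mean and standard deviation from running sums, as ``calculate_mean_stddev`` does, follows a standard identity. The sketch below assumes scalar inputs and sample variance with Bessel's correction (``n - 1``); the actual implementation operates on arrays and may differ in these details:

```python
import math

def calculate_mean_stddev_sketch(observable_sum, observable_squared_sum, n):
    """Recover mean and standard deviation from sum(x) and sum(x^2).

    Uses Var = (sum(x^2) - n * mean^2) / (n - 1); this is an assumed
    form, not necessarily the module's exact formula.
    """
    mean = observable_sum / n
    variance = (observable_squared_sum - n * mean * mean) / (n - 1)
    # Clamp tiny negative values caused by floating-point cancellation.
    return mean, math.sqrt(max(variance, 0.0))
```

Accumulating only the sum and squared sum lets a single pass over the losses produce both statistics at the end.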
.. py:function:: get_aal_data_meanonly(vec_analytical_aal, vecs_sample_aal, vec_used_summary_id, sample_size, no_of_periods)

   Generate AAL csv data (mean only)

   Args:
       vec_analytical_aal (ndarray[_AAL_REC_DTYPE]): Vector for Analytical AAL
       vecs_sample_aal (ndarray[_AAL_REC_PERIODS_DTYPE]): Vector for Sample AAL
       vec_used_summary_id (ndarray[bool]): vector to store if summary_id is used
       sample_size (int): Sample Size
       no_of_periods (int): Number of periods

   Returns:
       aal_data (List[Tuple]): AAL csv data


.. py:function:: calculate_confidence_interval(std_err, confidence_level)

   Calculate the confidence interval based on standard error and confidence level.

   Args:
       std_err (float): The standard error.
       confidence_level (float): The confidence level (e.g., 0.95 for 95%).

   Returns:
       confidence_interval (float): The confidence interval.


.. py:function:: get_alct_data(vecs_sample_aal, max_summary_id, sample_size, no_of_periods, confidence)

   Generate ALCT csv data

   Args:
       vecs_sample_aal (ndarray[_AAL_REC_PERIODS_DTYPE]): Vector for Sample AAL
       max_summary_id (int): Max summary_id
       sample_size (int): Sample Size
       no_of_periods (int): Number of periods
       confidence (float): Confidence level between 0 and 1, default 0.95

   Returns:
       alct_data (List[List]): ALCT csv data


.. py:function:: run(run_dir, subfolder, aal_output_file=None, alct_output_file=None, meanonly=False, noheader=False, confidence=0.95)

   Runs AAL calculations

   Args:
       run_dir (str | os.PathLike): Path to directory containing required files structure
       subfolder (str): Workspace subfolder inside /work/
       aal_output_file (str, optional): Path to AAL output file. Defaults to None
       alct_output_file (str, optional): Path to ALCT output file. Defaults to None
       meanonly (bool): Boolean value to output AAL with mean only
       noheader (bool): Boolean value to skip header in output file
       confidence (float): Confidence level between 0 and 1, default 0.95
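Under a normal approximation, a two-sided confidence interval half-width is the standard error scaled by the matching quantile of the standard normal distribution. The sketch below illustrates that calculation with the standard library; it is an assumed form and may differ from the module's exact method:

```python
import statistics

def calculate_confidence_interval_sketch(std_err, confidence_level):
    """Half-width of a two-sided normal confidence interval.

    z is the (1 + confidence_level) / 2 quantile of the standard
    normal, e.g. z ~= 1.96 for a 95% confidence level.
    """
    z = statistics.NormalDist().inv_cdf((1 + confidence_level) / 2)
    return z * std_err
```

With ``confidence_level = 0.95`` and a standard error of 1.0 this returns roughly 1.96, the familiar 95% z-value.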
.. py:function:: main(run_dir='.', subfolder=None, aal=None, alct=None, meanonly=False, noheader=False, confidence=0.95, **kwargs)