Monte Carlo

class MonteCarlo

Class that allows the statistical comparison of several models on the same dataset

Example:

from cvnn.montecarlo import MonteCarlo
# Assume you already have complex data 'x' with its labels 'y'... and 3 Cvnn models.

montecarlo = MonteCarlo()
montecarlo.add_model(model1)
montecarlo.add_model(model2)
montecarlo.add_model(model3)

montecarlo.run(x, y)
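For completeness, the data assumed above could be mocked with random complex arrays (a sketch; the shapes, the np.complex64 dtype and the four classes are illustrative assumptions):

import numpy as np

x = (np.random.rand(1000, 128) + 1j * np.random.rand(1000, 128)).astype(np.complex64)  # complex input data
y = np.random.randint(0, 4, size=1000)  # integer labels for 4 hypothetical classes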

A file ./log/monte_carlo_summary.xlsx is generated on the first call to the run method. All following calls append a row to that same file with each run's information, keeping track of the results and their configuration.
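Once a few runs have accumulated, that summary file can be inspected, for example with pandas (a minimal sketch; pandas with an xlsx engine such as openpyxl is assumed to be installed):

import pandas as pd

summary = pd.read_excel('./log/monte_carlo_summary.xlsx')  # one row per call to run
print(summary.tail())  # show the most recent runs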

This code will also generate files under ./log/montecarlo/date/of/run/ (a sketch of how to load them follows the list):

  • run_data.csv: Full performance information for every iteration of each model at each epoch
  • <model.name>_statistical_result.csv: Statistical results over all iterations of each model per epoch (mean, median, std, etc.)
  • models_details.json: A full detailed description of each model to be trained
  • (Optional) run_summary.txt: User-friendly summary of the run models and data
  • (Optional) plot/ folder with the corresponding plots generated by MonteCarloAnalyzer.do_all()
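Since the run method returns the full path to the generated run_data.csv (see Returns below), the per-epoch results can be loaded and aggregated with pandas (a minimal sketch; the column names 'epoch', 'network' and 'accuracy' are assumptions for illustration, check the header of the generated CSV for the actual names):

import pandas as pd

run_data_path = montecarlo.run(x, y)  # returns the full path to run_data.csv
run_data = pd.read_csv(run_data_path)
last_epoch = run_data[run_data['epoch'] == run_data['epoch'].max()]  # assumed column name
print(last_epoch.groupby('network')['accuracy'].agg(['mean', 'median', 'std']))  # assumed column names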

Note

To see how to create the optional outputs, refer to Output files.

add_model(self, model: keras.Model)

Adds a keras.Model to the list of models to be compared.
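For example, a complex-valued model can be built and registered as follows (a sketch assuming the ComplexInput and ComplexDense layers and the 'cart_relu' / 'convert_to_real_with_abs' activations provided by cvnn; the layer sizes and the loss are illustrative):

import tensorflow as tf
import cvnn.layers as complex_layers

model = tf.keras.models.Sequential()
model.add(complex_layers.ComplexInput(input_shape=(128,)))  # complex-valued input
model.add(complex_layers.ComplexDense(64, activation='cart_relu'))
model.add(complex_layers.ComplexDense(4, activation='convert_to_real_with_abs'))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
montecarlo.add_model(model)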

run(self, x, y, data_summary='', real_cast_modes=None, validation_split=0.2, validation_data=None, test_data=None, iterations=100, epochs=10, batch_size=100, shuffle=False, display_freq=1)

This function compares all the models added with the add_model method. It trains them on the dataset (x, y) as follows (a usage sketch follows the list):

  1. It runs a Monte Carlo simulation, training each added model for several iterations.
  2. An Excel file is created in the ./logs/ folder the first time run is called. All following runs append a row to that file with the run information, keeping track of the results and their configuration.
  3. Several files are saved into ./logs/montecarlo/<year>/<month>/<day>/run_<time>/:
    1. run_data.csv: Full performance information for every iteration of each model at each epoch
    2. <model.name>_statistical_result.csv: Statistical results over all iterations of each model per epoch (mean, median, std, etc.)
    3. models_details.json: A full detailed description of each model to be trained
    4. (Optional) run_summary.txt: User-friendly summary of the run models and data
    5. (Optional) plot/ folder with the corresponding plots generated by MonteCarloAnalyzer.do_all()
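Putting it all together, a typical call could look like this (a sketch; the train/validation/test arrays are hypothetical and assumed to be already split):

run_data_path = montecarlo.run(
    x_train, y_train,
    data_summary='my_complex_dataset',  # free-text tag stored in the summary file
    validation_data=(x_val, y_val),     # takes precedence over validation_split
    test_data=(x_test, y_test),         # also generates test_results.csv
    iterations=30,
    epochs=50,
    batch_size=100,
)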
Parameters:
  • x

    Input data in complex form. It can be:

    • A Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs).
    • A TensorFlow tensor, or a list of tensors (in case the model has multiple inputs).
    • A tf.data dataset. Should return a tuple (inputs, targets). Preferred data type (less overhead).
  • y – Labels/Target data. Like the input data x, it could be either Numpy array(s) or TensorFlow tensor(s). If x is a dataset then y will be ignored.
  • data_summary – (String) Dataset name to keep track of it
  • real_cast_modes

    mode parameter passed to cvnn.utils.transform_to_real, used when the model to train is real-valued (see the sketch after this parameter list). One of the following:

    • String with a mode listed in cvnn.utils.transform_to_real, to be used by all the real-valued models to cast complex data to real.
    • List or Tuple of strings of the same size as self.models: the mode used to cast complex data to real for each model, i.e. real_cast_modes[i] indicates how to cast the data for self.models[i] (ignored when the model is complex).
  • validation_split – Float between 0 and 1. Fraction of the input data to be used as the validation set (the rest will be used as the training set). Default: 0.2. This input is ignored if validation_data is given.
  • validation_data – A tuple (x_val, y_val) of Numpy arrays or tensors. Preferred data type (less overhead). Data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. This parameter takes precedence over validation_split.
  • test_data – (Optional) A tuple (x_test, y_test) of Numpy arrays or tensors. Data on which to evaluate the loss and any model metrics once each model's training has finished. The model will not be trained on this data. If test_data is not None (default: None), a file called test_results.csv will be generated with the statistical results on the test data.
  • iterations – Number of iterations to be done for each model
  • epochs – Number of epochs for each iteration
  • batch_size – Batch size at each iteration
  • display_freq – Integer (Default 1). Frequency, in epochs, with which information is saved and a checkpoint is run.
  • shuffle – (Boolean) Whether to shuffle the training data before each epoch.
  • early_stop – (Boolean) Default: False. Whether to implement early stopping during training.
  • same_weights – (Boolean) Default: False. If True, the same initial weights will be used at each iteration.
  • verbose

    Verbosity mode. One of the following:

    • 0 or ‘silent’: No output at all
    • 1 or False: Progress bar per iteration
    • 2 or True or ‘debug’: Progress bar per epoch
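As an illustration of real_cast_modes (described above), the cast mode can be shared by every real-valued model or given per model (a sketch; the mode names 'real_imag' and 'amplitude_phase' are assumptions, check cvnn.utils.transform_to_real for the supported values):

# One mode shared by all the real-valued models:
montecarlo.run(x, y, real_cast_modes='real_imag')

# One mode per model in self.models; the entry of a complex-valued model is ignored:
montecarlo.run(x, y, real_cast_modes=['real_imag', 'amplitude_phase', 'real_imag'])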
Returns:

(string) Full path to the generated run_data.csv file. It can be used by cvnn.data_analysis.SeveralMonteCarloComparison to compare several runs.
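For example, the paths returned by two runs could be handed to that comparison tool (a sketch; the constructor arguments shown here, a label, the x tick values and the list of paths, are assumptions based on the class description, so check cvnn.data_analysis for the exact signature):

from cvnn.data_analysis import SeveralMonteCarloComparison

path_10_epochs = montecarlo.run(x, y, epochs=10)
path_50_epochs = montecarlo.run(x, y, epochs=50)

comparison = SeveralMonteCarloComparison(
    'epochs',                                # label of the compared variable (assumed)
    x=['10', '50'],                          # one tick per compared run (assumed)
    paths=[path_10_epochs, path_50_epochs],  # run_data.csv paths returned by run
)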