Simulator

DPGANSimulator

class zoo.chronos.simulator.doppelganger_simulator.DPGANSimulator(L_max, sample_len, feature_dim, num_real_attribute, discriminator_num_layers=5, discriminator_num_units=200, attr_discriminator_num_layers=5, attr_discriminator_num_units=200, attribute_num_units=100, attribute_num_layers=3, feature_num_units=100, feature_num_layers=1, attribute_input_noise_dim=5, addi_attribute_input_noise_dim=5, d_gp_coe=10, attr_d_gp_coe=10, g_attr_d_coe=1, d_lr=0.001, attr_d_lr=0.001, g_lr=0.001, g_rounds=1, d_rounds=1, seed=0, num_threads=None, ckpt_dir='.', checkpoint_every_n_epoch=0)[source]

Bases: object

Doppelganger Simulator for time series generation. The code and algorithm are adapted from https://github.com/fjxmlzn/DoppelGANger.

Initialize a doppelganger simulator.

Parameters
  • L_max – the maximum length of your feature.

  • sample_len – the sample length used to control the LSTM length; it should be a divisor of L_max.

  • feature_dim – dimension of the feature.

  • num_real_attribute – the length of your attribute, which should be equal to the len(data_attribute).

  • discriminator_num_layers – MLP layer num for discriminator.

  • discriminator_num_units – MLP hidden unit for discriminator.

  • attr_discriminator_num_layers – MLP layer num for attr discriminator.

  • attr_discriminator_num_units – MLP hidden unit for attr discriminator.

  • attribute_num_units – MLP hidden unit for attr generator/addi attr generator.

  • attribute_num_layers – MLP layer num for attr generator/addi attr generator.

  • feature_num_units – LSTM hidden unit for feature generator.

  • feature_num_layers – LSTM layer num for feature generator.

  • attribute_input_noise_dim – noise data dim for attr generator.

  • addi_attribute_input_noise_dim – noise data dim for addi attr generator.

  • d_gp_coe – gradient penalty ratio for d loss.

  • attr_d_gp_coe – gradient penalty ratio for attr d loss.

  • g_attr_d_coe – ratio between feature loss and attr loss for g loss.

  • d_lr – learning rate for discriminator.

  • attr_d_lr – learning rate for attr discriminator.

  • g_lr – learning rate for generators.

  • g_rounds – number of generator updates per training step.

  • d_rounds – number of discriminator updates per training step.

  • seed – random seed.

  • num_threads – num of threads to be used for training.

  • ckpt_dir – The checkpoint location, defaults to the working dir.

  • checkpoint_every_n_epoch – checkpoint every n epoch, defaults to 0 for no checkpoints.
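The requirement that sample_len divides L_max can be checked before constructing the simulator. The following is a minimal sketch in plain Python; the helper names `valid_sample_lens` and `num_chunks` are hypothetical and are not part of the Chronos API.

```python
def valid_sample_lens(L_max):
    """List every sample_len that divides L_max evenly."""
    return [d for d in range(1, L_max + 1) if L_max % d == 0]


def num_chunks(L_max, sample_len):
    """Return L_max // sample_len, or raise if sample_len is not a divisor of L_max."""
    if sample_len <= 0 or L_max % sample_len != 0:
        raise ValueError(
            "sample_len=%d is not a divisor of L_max=%d" % (sample_len, L_max)
        )
    return L_max // sample_len


print(valid_sample_lens(24))  # [1, 2, 3, 4, 6, 8, 12, 24]
print(num_chunks(24, 6))      # 4
```

Running a check like this up front surfaces an invalid sample_len as a clear error instead of a shape mismatch during training.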

fit(data_feature, data_attribute, data_gen_flag, feature_outputs, attribute_outputs, epoch=1, batch_size=32)[source]

Fit on the training data (typically the private data).

Parameters
  • data_feature – Training features, in numpy float32 array format. The size is [(number of training samples) x (maximum length) x (total dimension of features)]. Categorical features are stored by one-hot encoding; for example, if a categorical feature has 3 possibilities, then it takes one of the values [1., 0., 0.], [0., 1., 0.], and [0., 0., 1.]. Each continuous feature should be normalized to [0, 1] or [-1, 1]. The array is padded with zeros after the time series ends.

  • data_attribute – Training attributes, in numpy float32 array format. The size is [(number of training samples) x (total dimension of attributes)]. Categorical attributes are stored by one-hot encoding; for example, if a categorical attribute has 3 possibilities, then it takes one of the values [1., 0., 0.], [0., 1., 0.], and [0., 0., 1.]. Each continuous attribute should be normalized to [0, 1] or [-1, 1].

  • data_gen_flag – Flags indicating the activation of features, in numpy float32 array format. The size is [(number of training samples) x (maximum length)]. 1 means the time series is active at this time step, 0 means the time series is inactive at this time step.

  • feature_outputs – A list of Output objects describing the metadata of data_feature.

  • attribute_outputs – A list of Output objects describing the metadata of data_attribute.

  • epoch – number of training epochs.

  • batch_size – training batch size.
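The array layouts described above can be illustrated with a small NumPy sketch. It packs variable-length series into a zero-padded data_feature array, builds the matching data_gen_flag, and one-hot encodes a categorical attribute. The helpers `pad_series` and `one_hot` and the toy data are hypothetical and only illustrate the expected shapes; building the Output metadata lists is not covered here.

```python
import numpy as np


def pad_series(series_list, L_max):
    """Pack series of shape [length_i, feature_dim] into zero-padded float32 arrays.

    Returns (data_feature, data_gen_flag) with shapes
    [num_samples, L_max, feature_dim] and [num_samples, L_max].
    """
    feature_dim = series_list[0].shape[1]
    data_feature = np.zeros((len(series_list), L_max, feature_dim), dtype=np.float32)
    data_gen_flag = np.zeros((len(series_list), L_max), dtype=np.float32)
    for i, series in enumerate(series_list):
        length = series.shape[0]
        if length > L_max:
            raise ValueError("series %d is longer than L_max" % i)
        data_feature[i, :length, :] = series  # real values, the rest stays zero
        data_gen_flag[i, :length] = 1.0       # 1 while the series is active
    return data_feature, data_gen_flag


def one_hot(indices, num_classes):
    """One-hot encode integer class indices as float32 rows."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(indices)]


# Toy data: two series with one continuous feature already scaled to [0, 1].
series_list = [
    np.array([[0.1], [0.5], [0.9]]),  # length 3
    np.array([[0.2], [0.4]]),         # length 2
]
data_feature, data_gen_flag = pad_series(series_list, L_max=4)
data_attribute = one_hot([0, 2], num_classes=3)  # one categorical attribute

print(data_feature.shape)    # (2, 4, 1)
print(data_gen_flag.shape)   # (2, 4)
print(data_attribute.shape)  # (2, 3)
```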

generate(sample_num=1, batch_size=32)[source]

Generate synthetic data with a distribution similar to that of the training data.

Parameters
  • sample_num – How many samples to be generated.

  • batch_size – batch size to generate.

save(path_dir)[source]

Save the simulator.

Parameters
  • path_dir – directory to save the simulator to.

load(path_dir, model_version='doppelganger.ckpt')[source]

Load the simulator.

Parameters
  • path_dir – directory the simulator was saved to.

  • model_version – model version (filename) you would like to load.
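The methods above can be combined into a train, generate, save, and load workflow. The sketch below uses only the signatures documented in this section. The wrapper functions `train_generate_save` and `restore` are hypothetical, taking feature_dim from the last axis of data_feature is an assumption, and the structure of the value returned by generate is not documented here, so it is passed through unchanged.

```python
def train_generate_save(data_feature, data_attribute, data_gen_flag,
                        feature_outputs, attribute_outputs,
                        num_real_attribute, sample_len, path_dir):
    """Train a DPGANSimulator, draw synthetic samples, and save the model."""
    # Imported lazily so that defining this helper does not require the library.
    from zoo.chronos.simulator.doppelganger_simulator import DPGANSimulator

    simulator = DPGANSimulator(
        L_max=data_feature.shape[1],        # maximum series length
        sample_len=sample_len,              # must divide L_max
        feature_dim=data_feature.shape[2],  # assumption: last axis of data_feature
        num_real_attribute=num_real_attribute,
    )
    simulator.fit(data_feature, data_attribute, data_gen_flag,
                  feature_outputs, attribute_outputs,
                  epoch=1, batch_size=32)
    # The return structure of generate() is not described in this section.
    synthetic = simulator.generate(sample_num=10, batch_size=32)
    simulator.save(path_dir)
    return simulator, synthetic


def restore(simulator, path_dir):
    """Load saved weights into an already constructed simulator."""
    simulator.load(path_dir, model_version='doppelganger.ckpt')
    return simulator
```

Since load is an instance method, `restore` expects a simulator built with the same constructor arguments as the one that was saved.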