Utilities
config
GeneralConfig
Configuration for general settings.
Attributes:
Name | Type | Description |
---|---|---|
seed | int | The random seed value. |
eval_batch_size | int | The batch size for evaluation. |
logging | LoggingConfig | The logging configuration. |
device | str | The device to run the computations on (e.g., 'cpu', 'cuda'). |
optuna_db | Optional[str] | Optional database string for Optuna. |
Source code in vambn/utils/config.py
LoggingConfig
Configuration class for logging settings.
Attributes:
Name | Type | Description |
---|---|---|
level | int | The logging level. |
mlflow | MlflowConfig | The MLflow configuration. |
Source code in vambn/utils/config.py
MlflowConfig
Configuration class for MLflow settings.
Attributes:
Name | Type | Description |
---|---|---|
use | bool | Whether to use MLflow for logging. |
tracking_uri | str | The URI of the MLflow tracking server. |
experiment_name | str | The name of the MLflow experiment. |
Source code in vambn/utils/config.py
OptimizationConfig
Configuration class for optimization settings.
Attributes:
Name | Type | Description |
---|---|---|
max_epochs | int | The maximum number of epochs. |
folds | int | The number of folds for cross-validation. |
n_modular_trials | int | The number of trials for modular models. |
n_traditional_trials | int | The number of trials for traditional models. |
s_dim_lower | int | The lower bound of the s dimension. |
s_dim_upper | int | The upper bound of the s dimension. |
s_dim_step | int | The step size for the s dimension. |
fixed_s_dim | bool | Whether the s dimension is fixed. |
y_dim_lower | int | The lower bound of the y dimension. |
y_dim_upper | int | The upper bound of the y dimension. |
y_dim_step | int | The step size for the y dimension. |
fixed_y_dim | bool | Whether the y dimension is fixed. |
latent_dim_lower | int | The lower bound of the latent dimension. |
latent_dim_upper | int | The upper bound of the latent dimension. |
latent_dim_step | int | The step size for the latent dimension. |
batch_size_lower_n | int | The lower bound of the batch size. |
batch_size_upper_n | int | The upper bound of the batch size. |
learning_rate_lower | float | The lower bound of the learning rate. |
learning_rate_upper | float | The upper bound of the learning rate. |
fixed_learning_rate | bool | Whether the learning rate is fixed. |
lstm_layers_lower | int | The lower bound of the LSTM layers. |
lstm_layers_upper | int | The upper bound of the LSTM layers. |
lstm_layers_step | int | The step size for the LSTM layers. |
use_relative_correlation_error_for_optimization | bool | Whether to use relative correlation error for optimization. |
use_auc_for_optimization | bool | Whether to use AUC for optimization. |
Source code in vambn/utils/config.py
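The `*_lower`, `*_upper`, and `*_step` attributes describe integer search ranges. As a minimal sketch, assuming the upper bound is inclusive (this page does not say whether it is), the candidate values for one dimension could be enumerated as shown below. The helper `candidate_values` is hypothetical and not part of vambn, and the bounds are made up.

```python
from typing import List


def candidate_values(lower: int, upper: int, step: int) -> List[int]:
    """Enumerate lower, lower + step, ... without exceeding upper.

    Assumption: the upper bound is inclusive.
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return list(range(lower, upper + 1, step))


# Example with made-up bounds for a latent dimension.
print(candidate_values(lower=2, upper=10, step=4))  # [2, 6, 10]
```

When a `fixed_*` flag is set, a search would presumably skip this enumeration and use a single value; that behavior is also an assumption.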
PipelineConfig
Configuration for the pipeline settings.
Attributes:
Name | Type | Description |
---|---|---|
general | GeneralConfig | General configuration settings. |
optimization | OptimizationConfig | Optimization configuration settings. |
training | TrainingConfig | Training configuration settings. |
Source code in vambn/utils/config.py
TrainingConfig
Configuration for training settings.
Attributes:
Name | Type | Description |
---|---|---|
use_imputation_layer | bool | Whether to use an imputation layer. |
use_mtl | bool | Whether to use multi-task learning. |
with_gan | bool | Whether to use a GAN. |
Source code in vambn/utils/config.py
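The configuration classes nest: `PipelineConfig` holds a `GeneralConfig`, which holds a `LoggingConfig`, which in turn holds an `MlflowConfig`. The sketch below mirrors a subset of the documented attributes with plain dataclasses to illustrate that nesting. These are stand-in definitions, not the classes in `vambn/utils/config.py` (their base class and constructors are not shown on this page), and every value is made up.

```python
import logging
from dataclasses import dataclass
from typing import Optional


# Stand-in dataclasses that reuse the documented attribute names; the real
# classes live in vambn/utils/config.py and may be built differently.
@dataclass
class MlflowConfig:
    use: bool
    tracking_uri: str
    experiment_name: str


@dataclass
class LoggingConfig:
    level: int
    mlflow: MlflowConfig


@dataclass
class GeneralConfig:
    seed: int
    eval_batch_size: int
    logging: LoggingConfig
    device: str
    optuna_db: Optional[str] = None


@dataclass
class TrainingConfig:
    use_imputation_layer: bool
    use_mtl: bool
    with_gan: bool


@dataclass
class PipelineConfig:
    general: GeneralConfig
    training: TrainingConfig
    # The documented `optimization: OptimizationConfig` field is left out for brevity.


config = PipelineConfig(
    general=GeneralConfig(
        seed=42,
        eval_batch_size=64,
        logging=LoggingConfig(
            level=logging.INFO,
            mlflow=MlflowConfig(
                use=False,
                tracking_uri="http://localhost:5000",  # hypothetical URI
                experiment_name="demo",
            ),
        ),
        device="cpu",
    ),
    training=TrainingConfig(use_imputation_layer=False, use_mtl=True, with_gan=False),
)

print(config.general.logging.mlflow.use)  # False
```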
exceptions
InvalidSamples
Bases: Exception
Exception type for invalid samples in the HIVAE.
Source code in vambn/utils/exceptions.py
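This page does not say where the library raises `InvalidSamples`. As an illustration only, the sketch below defines a stand-in exception with the same name and a hypothetical check, `check_samples`, that rejects non-finite values, then shows how a caller might catch it.

```python
import math
from typing import Sequence


class InvalidSamples(Exception):
    """Stand-in for vambn.utils.exceptions.InvalidSamples."""


def check_samples(samples: Sequence[float]) -> None:
    """Hypothetical validation step: reject samples containing NaN or infinity."""
    if any(not math.isfinite(value) for value in samples):
        raise InvalidSamples("samples contain non-finite values")


try:
    check_samples([0.1, float("nan"), 0.3])
except InvalidSamples as err:
    # A caller can handle this case separately from other errors.
    print(f"skipping batch: {err}")
```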
helpers
AggregatedMetric
Class to aggregate and compute average of float metrics.
Source code in vambn/utils/helpers.py
__add__(new_value)
Adds a new value to the metric list.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
new_value | float | The new float value to add. | required |
Source code in vambn/utils/helpers.py
__call__()
Computes the average of the aggregated values.
Returns:
Type | Description |
---|---|
float | The average value of the aggregated metrics. |
Source code in vambn/utils/helpers.py
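The `__add__` / `__call__` pairing lets a metric be accumulated with `+` and read back by calling the object. The sketch below reproduces that call pattern with a stand-in class, `RunningAverage`, which is not the library implementation. It assumes `__add__` returns the aggregator itself (documented above only for `AggregatedTorchMetric`) and that averaging an empty aggregator is an error.

```python
from typing import List


class RunningAverage:
    """Stand-in with the same call pattern as the documented AggregatedMetric."""

    def __init__(self) -> None:
        self.values: List[float] = []

    def __add__(self, new_value: float) -> "RunningAverage":
        # Assumption: return self so that `metric += x` keeps the aggregator.
        self.values.append(new_value)
        return self

    def __call__(self) -> float:
        # Assumption: an empty aggregator raises instead of returning a default.
        if not self.values:
            raise ValueError("no values have been added yet")
        return sum(self.values) / len(self.values)


loss = RunningAverage()
for batch_loss in (0.5, 0.25, 0.75):
    loss += batch_loss  # falls back to __add__ because __iadd__ is not defined

print(loss())  # 0.5
```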
AggregatedTorchMetric
Class to aggregate and compute average of torch.Tensor metrics.
Source code in vambn/utils/helpers.py
__add__(new_value)
Adds a new tensor value to the metric list.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
new_value | Tensor | The new tensor value to add. | required |
Returns:
Name | Type | Description |
---|---|---|
self | AggregatedTorchMetric | The updated AggregatedTorchMetric object. |
Source code in vambn/utils/helpers.py
__call__()
Computes the average of the aggregated tensor values.
Returns:
Type | Description |
---|---|
Tensor | The average tensor value of the aggregated metrics. |
Source code in vambn/utils/helpers.py
NaNHandlingStrategy
Bases: Enum
Enumeration of strategies for handling NaN values.
Source code in vambn/utils/helpers.py
column_is_categorical(col)
Determines if a column is categorical.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col | Series | The column to check. | required |
Returns:
Type | Description |
---|---|
bool | True if the column is categorical, False otherwise. |
Source code in vambn/utils/helpers.py
delete_directory(dir_path)
Deletes a directory and all its contents.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dir_path | Path | Path to the directory to delete. | required |
Source code in vambn/utils/helpers.py
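Recursive deletion of this kind can be done with the standard library. The following is a minimal sketch, not the code in `vambn/utils/helpers.py`; the name `delete_directory_sketch` is hypothetical, and ignoring a missing directory is an assumption about the desired behavior.

```python
import shutil
import tempfile
from pathlib import Path


def delete_directory_sketch(dir_path: Path) -> None:
    """Remove dir_path and everything below it; do nothing if it is absent."""
    # Assumption: a missing directory is ignored rather than treated as an error.
    if dir_path.is_dir():
        shutil.rmtree(dir_path)


# Usage on a throwaway directory.
scratch = Path(tempfile.mkdtemp())
(scratch / "nested").mkdir()
(scratch / "nested" / "file.txt").write_text("temporary")
delete_directory_sketch(scratch)
print(scratch.exists())  # False
```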
encode_numerical_columns(patient_data)
Encodes non-numeric columns of a DataFrame with categorical values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
patient_data | DataFrame | The DataFrame with patient data. | required |
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame with non-numeric columns encoded as categorical values. |
Source code in vambn/utils/helpers.py
get_normalized_vector_distance(vec1, vec2)
Computes the normalized distance between two vectors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vec1 | ndarray | The first vector. | required |
vec2 | ndarray | The second vector. | required |
Returns:
Type | Description |
---|---|
float | The normalized distance between the two vectors. |
Source code in vambn/utils/helpers.py
get_vector_to_mixed_matrix_distance(vec, matrix)
Computes the distance from a vector to a mixed-type matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vec | ndarray | The vector to compare. | required |
matrix | ndarray | The matrix to compare against. | required |
Returns:
Type | Description |
---|---|
ndarray | An array of distances. |
Source code in vambn/utils/helpers.py
handle_nan_values(real, virtual, strategy=NaNHandlingStrategy.sample_random)
Handles NaN values in two dataframes according to the specified strategy.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
real | DataFrame | The real dataframe. | required |
virtual | DataFrame | The virtual dataframe. | required |
strategy | NaNHandlingStrategy | The strategy to use for handling NaN values. | sample_random |
Returns:
Type | Description |
---|---|
tuple[DataFrame, DataFrame] | A tuple containing the processed real and virtual dataframes. |
Source code in vambn/utils/helpers.py
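The page names the default strategy `sample_random` but does not describe what it does. One plausible reading, shown here purely as an assumption, is that each NaN is replaced by a value drawn at random from the observed entries of the same column. The helpers `fill_nan_by_sampling` and `handle_nan_sketch` are hypothetical, and the data is made up.

```python
from typing import Tuple

import numpy as np
import pandas as pd


def fill_nan_by_sampling(df: pd.DataFrame, rng: np.random.Generator) -> pd.DataFrame:
    """Replace each NaN with a value drawn at random from the same column."""
    out = df.copy()  # leave the caller's dataframe untouched
    for col in out.columns:
        missing = out[col].isna()
        observed = out.loc[~missing, col].to_numpy()
        if missing.any() and observed.size > 0:
            out.loc[missing, col] = rng.choice(observed, size=int(missing.sum()))
    return out


def handle_nan_sketch(
    real: pd.DataFrame, virtual: pd.DataFrame, seed: int = 0
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Assumed behavior only: fill both dataframes independently."""
    rng = np.random.default_rng(seed)
    return fill_nan_by_sampling(real, rng), fill_nan_by_sampling(virtual, rng)


real = pd.DataFrame({"age": [61.0, np.nan, 70.0], "score": [1.0, 2.0, np.nan]})
virtual = pd.DataFrame({"age": [58.0, 66.0, np.nan], "score": [np.nan, 3.0, 2.0]})
real_filled, virtual_filled = handle_nan_sketch(real, virtual)
print(real_filled.isna().sum().sum(), virtual_filled.isna().sum().sum())  # 0 0
```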
logging
setup_logging(level, log_file=None)
Sets up logging to stdout or to a specified file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
level | int | The logging level. | required |
log_file | Optional[Path] | The file where logs should be saved. Defaults to None. | None |
Source code in vambn/utils/logging.py
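The stdout-or-file choice can be sketched with the standard `logging` module. This is not the code in `vambn/utils/logging.py`: the function name, the logger name `vambn_sketch`, the message format, and returning the logger are all assumptions made for illustration.

```python
import logging
import sys
from pathlib import Path
from typing import Optional


def setup_logging_sketch(level: int, log_file: Optional[Path] = None) -> logging.Logger:
    """Send records to log_file if given, otherwise to stdout."""
    if log_file is not None:
        handler: logging.Handler = logging.FileHandler(log_file)
    else:
        handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )

    # A named logger keeps the sketch from altering the root logger.
    logger = logging.getLogger("vambn_sketch")
    logger.handlers.clear()  # avoid duplicate output on repeated calls
    logger.setLevel(level)
    logger.addHandler(handler)
    logger.propagate = False
    return logger


logger = setup_logging_sketch(logging.INFO)
logger.info("logging to stdout")
```

Passing a `Path` as `log_file` switches the same call to a file handler.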
syndat_conversion
main(original_input, decoded_input, synthetic_input, original_output, decoded_output, synthetic_output)
Processes and pivots the original, decoded, and synthetic CSV files, then saves the results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
original_input | Path | Path to the original input CSV file. | required |
decoded_input | Path | Path to the decoded input CSV file. | required |
synthetic_input | Path | Path to the synthetic input CSV file. | required |
original_output | Path | Path where the processed original output CSV file will be saved. | required |
decoded_output | Path | Path where the processed decoded output CSV file will be saved. | required |
synthetic_output | Path | Path where the processed synthetic output CSV file will be saved. | required |
Source code in vambn/utils/syndat_conversion.py
trial_counter
main(db_url, study_name)
Gets the number of completed or pruned trials and returns it to bash.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db_url | str | URL to the Optuna database. | required |
study_name | str | Name of the study. | required |
Source code in vambn/utils/trial_counter.py