Configuration Documentation¶
This document provides detailed information on the configuration options available in the provided configuration file. The configuration is divided into several sections: snakemake
, general
, optimization
, and training
. Each section and its respective options are described below.
Snakemake Configuration¶
The snakemake
section contains settings related to the Snakemake workflow management system.
Options¶
- use_slurm:
false
- Type: Boolean
- Description: Flag to indicate if SLURM is available (e.g., on Loewenburg).
- with_gan:
false
- Type: Boolean
- Description: Flag to indicate if a GAN variant should be used in addition to the normal training.
- with_mtl:
false
- Type: Boolean
- Description: Flag to indicate if multitask learning variants should be used in addition to the normal training.
- output_dir:
"reports"
- Type: String
- Description: Output directory for all generated output files.
- bn: Bayesian network configuration; typically does not need to be changed.
- refactor:
true
- Type: Boolean
- cv_runs:
5
- Type: Integer
- cv_restart:
5
- Type: Integer
- fit:
"mle-cg"
- Type: String
- maxp:
5
- Type: Integer
- loss:
null
- Type: Null
- score:
"bic-cg"
- Type: String
- folds:
3
- Type: Integer
- n_bootstrap:
500
- Type: Integer
- seed:
42
- Type: Integer
- refactor:
- excluded_datasets: List of datasets to be excluded from the pipeline.
- Type: List of Strings
- Description: List of datasets to be excluded from the pipeline. The datasets are defined by their folder name in the
data/raw
directory. - Example:
["texas"]
- exclusive_dataset:
null
- Type: Null or String
- Description: Define this if you only want to run the pipeline for a single dataset for e.g. testing purposes.
- cluster_modules:
- R:
null
- Type: Null or String
- Description: Optional cluster module for R (e.g.,
"R/4.0.3"
). Not required if R is available on the system or in a Conda environment.
- R:
- r_env:
"/path/to/R.yaml"
- Type: String
- Description: Path to the R environment file. The file is used to setup an conda environment for R. If this is used snakemake needs to be run with the
--use-conda
flag.
General Configuration¶
The general
section contains general settings for the application.
Options¶
- seed:
42
- Type: Integer
- Description: Seed for reproducibility.
- eval_batch_size:
64
- Type: Integer
- Description: Batch size for evaluation. Does not affect training and is only restricted by the available memory.
- device:
"cpu"
- Type: String
- Description: Device to use for training. Use
"cuda"
for GPU training.
- optuna_db:
"postgresql://localhost/optuna"
- Type: String or Null
- Description: Database connection for Optuna. If not available, set to
null
to use SQLite databases. PostgreSQL is recommended since all results would be stored in a single database.
- logging:
- level:
20
- Type: Integer
- Description: Logging level. Default is
20
(INFO). Other options are10
(DEBUG),30
(WARNING),40
(ERROR), and50
(CRITICAL).
- mlflow:
- use:
false
- Type: Boolean
- Description: Flag to indicate if MLflow should be used for logging.
- tracking_uri:
"http://localhost:5000"
- Type: String
- Description: URI for the MLflow tracking server. Can be a local or remote server.
- experiment_name:
"VAMBN2"
- Type: String
- Description: Name of the MLflow experiment.
- use:
- level:
Optimization Configuration¶
The optimization
section contains settings related to the optimization process.
Options¶
- folds:
3
- Type: Integer
- n_traditional_trials:
20
- Type: Integer
- n_modular_trials:
20
- Type: Integer
- s_dim_lower:
1
- Type: Integer
- s_dim_upper:
5
- Type: Integer
- s_dim_step:
1
- Type: Integer
- fixed_s_dim:
false
- Type: Boolean
- y_dim_lower:
1
- Type: Integer
- y_dim_upper:
5
- Type: Integer
- y_dim_step:
1
- Type: Integer
- fixed_y_dim:
false
- Type: Boolean
- latent_dim_lower:
1
- Type: Integer
- latent_dim_upper:
5
- Type: Integer
- latent_dim_step:
1
- Type: Integer
- batch_size_lower_n:
4
- Type: Integer
- batch_size_upper_n:
8
- Type: Integer
- max_epochs:
2500
- Type: Integer
- Description: Maximum number of epochs. Currently, early stopping is used.
- learning_rate_lower:
0.0001
- Type: Float
- learning_rate_upper:
0.1
- Type: Float
- fixed_learning_rate:
true
- Type: Boolean
- lstm_layers_lower:
1
- Type: Integer
- lstm_layers_upper:
4
- Type: Integer
- lstm_layers_step:
1
- Type: Integer
- use_relative_correlation_error_for_optimization:
false
- Type: Boolean
- Description: Flag to indicate if the relative correlation error should be used as optuna metric.
- use_auc_for_optimization:
false
- Type: Boolean
- Description: Flag to indicate if the Area under the ROC curve should be used as optuna metric.
Training Configuration¶
The training
section contains settings related to the training process.
Options¶
- use_imputation_layer:
true
- Type: Boolean
- Description: Flag to indicate if the imputation layer should be used.