How-to: Run GARAGE
A recipe for running the full GARAGE pipeline on any dataset.
Goal
Generate synthetic scRNA-seq data using the GARAGE two-stage pipeline (GAT subsampling → GAN generation).
Prerequisites
Installation completed.
At least one dataset configured in config.py → DATASET_CONFIG.
Steps
1. Basic invocation
python -m data_generation.garage --dataset muraro
Replace muraro with yan, pollen, or cbmc to use another built-in dataset, or with the name of a dataset you have registered yourself.
2. Customise hyper-parameters
Edit config.py → GARAGE_DEFAULTS before running:
GARAGE_DEFAULTS = {
    "gan_total_iters": 30001,   # Train longer
    "leakage_fraction": 0.3,    # More GAT seeds
    "g_lr": 0.0001,             # Slower generator
    "d_lr": 0.0002,             # Slower discriminator
    # ... other params unchanged
}
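Before pasting new values into config.py, it can help to sanity-check them. The sketch below is only an illustration: `validate_overrides` is a hypothetical helper, not part of GARAGE, and the accepted ranges (positive iteration count, leakage fraction in (0, 1], positive learning rates) are assumptions inferred from the parameter names rather than documented constraints.

```python
from __future__ import annotations


def validate_overrides(params: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found.

    The ranges checked here are assumptions, not documented GARAGE limits.
    """
    problems = []

    iters = params.get("gan_total_iters")
    if iters is not None and (not isinstance(iters, int) or iters <= 0):
        problems.append("gan_total_iters should be a positive integer")

    frac = params.get("leakage_fraction")
    if frac is not None and not (0.0 < frac <= 1.0):
        problems.append("leakage_fraction should lie in (0, 1]")

    for key in ("g_lr", "d_lr"):
        lr = params.get(key)
        if lr is not None and lr <= 0:
            problems.append(f"{key} should be positive")

    return problems


# The values from the step above.
overrides = {
    "gan_total_iters": 30001,
    "leakage_fraction": 0.3,
    "g_lr": 0.0001,
    "d_lr": 0.0002,
}
print(validate_overrides(overrides))
```

Keys that are absent from the dict are skipped, so the same check works for a partial set of overrides.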
3. Monitor training
Console output shows loss every 1000 iterations:
iter 0: D_loss=1.231, G_loss=0.823
iter 1000: D_loss=0.752, G_loss=1.412
iter 5000: D_loss=0.611, G_loss=1.658
...
G_loss increasing slightly is expected early on as the generator learns.
D_loss oscillating (> 1.0 swings) indicates training instability: reduce the learning rates.
G_loss diverging (→ ∞): reduce d_lr or increase nd_steps.
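The rules of thumb above can be checked automatically against captured console output. This is a minimal sketch that assumes the log lines look exactly like the sample in this step. `parse_losses` and `diagnose` are hypothetical helpers, and reading a "swing" as the absolute change between two consecutive logged D_loss values is an interpretation, not something GARAGE defines.

```python
import math
import re

# Matches lines such as "iter 1000: D_loss=0.752, G_loss=1.412".
LOSS_LINE = re.compile(r"iter\s+(\d+):\s*D_loss=([^,\s]+),\s*G_loss=(\S+)")


def parse_losses(log_text):
    """Extract (iteration, D_loss, G_loss) tuples from console output."""
    rows = []
    for match in LOSS_LINE.finditer(log_text):
        iteration, d_loss, g_loss = match.groups()
        rows.append((int(iteration), float(d_loss), float(g_loss)))
    return rows


def diagnose(rows, swing=1.0):
    """Return warning strings for large D_loss swings and non-finite G_loss."""
    warnings = []
    # Compare each logged D_loss with the previous logged value.
    for (_, d_prev, _), (iteration, d_cur, _) in zip(rows, rows[1:]):
        if abs(d_cur - d_prev) > swing:
            warnings.append(f"iter {iteration}: D_loss changed by more than {swing}")
    # Treat inf or nan in the log as divergence.
    for iteration, _, g_loss in rows:
        if not math.isfinite(g_loss):
            warnings.append(f"iter {iteration}: G_loss is not finite")
    return warnings


sample = """\
iter 0: D_loss=1.231, G_loss=0.823
iter 1000: D_loss=0.752, G_loss=1.412
iter 5000: D_loss=0.611, G_loss=1.658
"""
rows = parse_losses(sample)
print(rows)
print(diagnose(rows))
```

For the sample log no warnings are produced, since consecutive D_loss values differ by less than 1.0 and every G_loss is finite.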
4. Running on multiple datasets
for d in yan pollen cbmc muraro; do
    python -m data_generation.garage --dataset "$d"
done
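The same loop can be driven from Python when a summary of failed runs is useful. This is a sketch, not part of GARAGE: `build_command` and `run_all` are hypothetical helpers, and the only thing taken from this guide is the command line itself. It assumes a non-zero exit code signals a failed run.

```python
import subprocess
import sys

DATASETS = ["yan", "pollen", "cbmc", "muraro"]


def build_command(dataset):
    """Command line for one run, mirroring the invocation in step 1."""
    return [sys.executable, "-m", "data_generation.garage", "--dataset", dataset]


def run_all(datasets, runner=subprocess.run):
    """Run each dataset in turn and return the names whose run exited non-zero.

    `runner` is injectable so the loop can be exercised without launching
    the real pipeline.
    """
    failed = []
    for name in datasets:
        result = runner(build_command(name))
        if result.returncode != 0:
            failed.append(name)
    return failed


# Usage (launches the real pipeline once per dataset):
# failures = run_all(DATASETS)
# print("Failed datasets:", failures)
```

Like the shell loop, this continues with the next dataset after a failure; it just records which ones failed.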
Output
Generated data is saved to data/gen_data/<dataset>_data_mixdata_iter3_top_426.csv.
Training losses are saved to results/losses_<dataset>.csv.
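To check which outputs already exist for a dataset, the two path patterns above can be expanded programmatically. A minimal sketch: `expected_outputs` and `report` are hypothetical helpers, and the `iter3_top_426` suffix is copied verbatim from the pattern above on the assumption that it does not change with your configuration.

```python
from pathlib import Path


def expected_outputs(dataset, root="."):
    """Build the two output paths described above for one dataset."""
    root = Path(root)
    return {
        "generated": root / "data" / "gen_data"
        / f"{dataset}_data_mixdata_iter3_top_426.csv",
        "losses": root / "results" / f"losses_{dataset}.csv",
    }


def report(dataset, root="."):
    """Map each output kind to whether its file currently exists."""
    return {
        kind: path.exists()
        for kind, path in expected_outputs(dataset, root).items()
    }


for kind, path in expected_outputs("muraro").items():
    print(kind, path)
```

Pass the repository root as `root` if the script is run from a different working directory.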
Troubleshooting
| Symptom | Fix |
|---|---|
| CUDA OOM | Reduce batch size/leakage fraction, or set |
| Fast GAT convergence (loss = 0 after 100 iters) | GAT may be overfitting; reduce |
| Slow GAN convergence | Increase |
| “File exists” error on output | Delete the existing |