How-to: Biological Validation
A recipe for verifying that GARAGE preserves biological signal for rare cell types.
Goal
Test whether cells that receive high GAT attention (the HIGH group in the metrics below) are enriched for rare cell types and express known marker genes.
Prerequisites
GARAGE run completed on CBMC (or a dataset with defined marker genes).
biological_analysis/biological_validation.py (ships with the repo).
Steps
1. Run the main validation
python biological_analysis/biological_validation.py
2. Run marker gene clustering
python biological_analysis/marker_gene_clustering.py
3. Examine the output files
# Check attention weights
head -20 results/attention_weights.csv
# Check marker gene expression
column -s, -t results/biological_validation.csv | head
# Check per-cell-type positive rates
column -s, -t results/positive_rate_per_celltype.csv | head
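If `column` is unavailable, or you would rather inspect the files from Python, the same check can be sketched with the standard library. This is a minimal sketch: `preview_csv` is a hypothetical helper, not part of the repo, and it makes no assumption about column names beyond each file having a header row.

```python
import csv
from pathlib import Path


def preview_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) of a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file is empty
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


# File names are taken from the shell commands above.
for name in (
    "attention_weights.csv",
    "biological_validation.csv",
    "positive_rate_per_celltype.csv",
):
    path = Path("results") / name
    if not path.exists():
        print(f"{path}: not found (has the validation been run?)")
        continue
    header, rows = preview_csv(path)
    print(f"{path}: {len(header)} columns: {header}")
    for row in rows:
        print("   ", row)
```

A missing file is reported rather than raised, so the loop also serves as a quick check that steps 1 and 2 produced all three outputs.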
4. Key metrics to verify
From the console output or CSV files:
| Metric | Good sign | Warning sign |
|---|---|---|
| Rare-type enrichment (HIGH vs. LOW) | observed/expected > 2.0, p < 0.01 | observed/expected ≈ 1.0 |
| Marker gene log₂FC | > 1.0 (2× higher in HIGH cells) | < 0.5 |
| Wilcoxon p-value | < 0.01 | > 0.05 (not significant) |
| Positive rate (cell-type level) | HIGH > ALL | HIGH ≈ ALL |
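
To make the thresholds concrete, here is a minimal standard-library sketch of how three of these metrics could be computed by hand. It is not the code in `biological_validation.py`: the function names are hypothetical, the choice of a hypergeometric upper-tail test for enrichment is an assumption (the script may use a different test), and the pseudocount in the fold change is illustrative. The Wilcoxon test is omitted here; `scipy.stats.ranksums` is one readily available implementation.

```python
import math
from statistics import mean


def enrichment(n_total, n_rare, n_high, n_rare_high):
    """Observed/expected ratio of rare cells in the HIGH group, with a
    hypergeometric upper-tail p-value (assumed test, see lead-in)."""
    expected = n_high * n_rare / n_total
    ratio = n_rare_high / expected
    # P(X >= n_rare_high) when n_high cells are drawn without replacement
    # from n_total cells, n_rare of which are rare.
    upper = min(n_rare, n_high)
    numerator = sum(
        math.comb(n_rare, k) * math.comb(n_total - n_rare, n_high - k)
        for k in range(n_rare_high, upper + 1)
    )
    p_value = numerator / math.comb(n_total, n_high)
    return ratio, p_value


def log2_fold_change(high, low, pseudocount=1e-9):
    """log2 of mean marker expression in HIGH cells over LOW cells."""
    return math.log2((mean(high) + pseudocount) / (mean(low) + pseudocount))


def positive_rate(values, threshold=0.0):
    """Fraction of cells whose marker expression exceeds the threshold."""
    return sum(v > threshold for v in values) / len(values)


# Toy numbers for illustration only, not results from any dataset:
# 1000 cells, 50 rare, 100 in the HIGH group, 15 of those rare.
ratio, p_value = enrichment(1000, 50, 100, 15)
print(f"observed/expected = {ratio:.1f}, p = {p_value:.2g}")

high_expr = [4.0, 3.5, 0.0, 5.0]
all_expr = high_expr + [0.0, 0.0, 1.0, 0.0]
print(f"positive rate HIGH = {positive_rate(high_expr):.2f}, "
      f"ALL = {positive_rate(all_expr):.2f}")
```

In the toy example the HIGH group is expected to contain 5 rare cells by chance and contains 15, so the ratio is 3.0, which would fall in the "good sign" column; a ratio near 1.0 means the HIGH group contains about as many rare cells as random sampling would.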