Estimating effect sizes with a DeepDive Model#

This tutorial demonstrates how to estimate effect sizes with a DeepDive model on single-cell ATAC-seq data.


Imports#

[1]:
import scanpy as sc
import DeepDive
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
[2]:
from utils import reads_to_fragments

1. Load and preprocess the dataset#

We start with an AnnData object containing single-cell chromatin accessibility profiles. Here we use the liver sciATAC-seq3 dataset (sciatac3_liver_10k.h5ad), subset from the full dataset (dataset and preprocessing better described in the training.ipynb notebook).

[3]:
adata = sc.read_h5ad('data/sciatac3_liver_10k.h5ad')

# Filter genes detected in fewer than 1% of cells
min_cells = int(adata.shape[0] * 0.01)
sc.pp.filter_genes(adata, min_cells=min_cells)

# Ensure unique cell IDs
adata.obs_names_make_unique()

# Convert raw reads into fragments
reads_to_fragments(adata)
adata.X = adata.layers['fragments']

2. Define model and training parameters#

We specify model hyperparameters and training settings.

[4]:
n_decoders = 5
model_params = {
    'n_epochs_pretrain_ae' : 200*n_decoders,
    'n_decoders' : n_decoders,
}
train_params = {
    'max_epoch' : 300*n_decoders,
    'batch_size' : 1024,
    'shuffle' : True
}
[5]:
discrete_covriate_keys = ['sample_name', 'sex', 'batch', 'cell_type']
continuous_covriate_keys = ['day_of_pregnancy']

3. Train DeepDive#

We initialize and train the model using the training dataset.

[6]:
model = DeepDive.DeepDive(adata = adata,
                        discrete_covariate_names = discrete_covriate_keys,
                        continuous_covariate_names = continuous_covriate_keys,
                        **model_params
                       )
[7]:
model.train_model(adata, None,
                  **train_params)
Epoch Train [1500 / 1500]: 100%|██████████| 10/10 [00:00<00:00, 10.10it/s, ETA=01d:00h:21:m41s|01d:00h:21:m41s, kl_loss=1.68, recon_loss=3.15e+3]

4. Compare groups#

The DeepDive.compare_groups() function identifies accessible regions that differ between two groups independent of disentangled covariates.

In this example, we compare Hepatoblasts and Erythroblasts to identify differential chromatin accessibility patterns.

[8]:
da = DeepDive.compare_groups(model, adata, 'cell_type', groupA = 'Hepatoblasts', groupB = 'Erythroblasts')
[9]:
da.head()
[9]:
Feature groupA groupB Difference Zscore Pvalue FDR
0 chr1:565561-565842 0.808578 -1.587507 2.396085 1.704382 0.006697 0.055971
1 chr1:569259-569557 2.841677 1.127797 1.713880 1.223938 0.128054 0.535595
2 chr1:569659-569951 1.822991 -0.457494 2.280485 1.626589 0.011054 0.082341
3 chr1:713647-714744 0.733735 0.902667 -0.168932 -0.094762 0.999939 1.000000
4 chr1:762075-763248 -0.183651 -0.042263 -0.141388 -0.077697 0.999797 1.000000

This returns a DataFrame containing:

  • Difference: Effect size between the two groups

  • FDR: False discovery rate (multiple testing–corrected p-value)

We can visualize the differential results as a volcano plot.

[10]:
plt.subplots(figsize=(2,2))
sns.scatterplot(
    x=da.Difference,
    y=-np.log10(da.FDR),
    hue=da.FDR < 0.05,
    edgecolor='k',
    linewidth=0.5,
    s=10
)
plt.legend([])
plt.ylabel('-log10 FDR')
plt.xlabel('Effect size')
[10]:
Text(0.5, 0, 'Effect size')
../_images/tutorials_Effect_size_estimation_16_1.png

5. Compare groups with covariate constraints#

DeepDive can also control for confounding factors by fixing certain covariates. For example, to compare Hepatoblasts vs Erythroblasts while holding sex = female constant:

[11]:
background = {'sex':'female'}
da = DeepDive.compare_groups(model, adata, 'cell_type', groupA = 'Hepatoblasts', groupB = 'Erythroblasts', background = background)

This performs the same comparison as above but within the context of a fixed background covariate — allowing interpretation of group differences dependent the sex.

Summary#

  • DeepDive.compare_groups() quantifies differential effects between categorical groups.

  • Supports covariate-controlled comparisons via the background argument.

  • Returns a feature-level DataFrame with effect sizes and FDR-adjusted p-values.

[ ]: