Spatial deconvolution with linear scalability for atlas-scale data.
FlashDeconv estimates cell type proportions from spatial transcriptomics data (Visium, Visium HD, Stereo-seq). It is designed for large-scale analyses where computational efficiency is essential, while maintaining attention to low-abundance cell populations through leverage-score-based feature weighting.
Paper: Yang, C., Zhang, X. & Chen, J. FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution via structure-preserving sketching. bioRxiv (2025). DOI: 10.64898/2025.12.22.696108
pip install flashdeconvFor development or additional I/O support, see Installation Options.
import scanpy as sc
import flashdeconv as fd
# Load data
adata_st = sc.read_h5ad("spatial.h5ad")
adata_ref = sc.read_h5ad("reference.h5ad")
# Deconvolve
fd.tl.deconvolve(adata_st, adata_ref, cell_type_key="cell_type")
# Results stored in adata_st.obsm["flashdeconv"]
sc.pl.spatial(adata_st, color="flashdeconv_dominant")FlashDeconv is also available as a tool in ChatSpatial, an MCP server for spatial transcriptomics — run deconvolution through natural language from any compatible client.
Spatial deconvolution methods offer different trade-offs. Probabilistic approaches like Cell2Location and RCTD provide rigorous uncertainty quantification; methods like CARD incorporate spatial structure through dense kernel matrices. FlashDeconv takes a complementary approach, prioritizing computational efficiency for million-scale datasets.
-
Linear complexity — O(N) time and memory through randomized sketching and sparse graph regularization.
-
Leverage-based feature weighting — Variance-based selection (PCA, HVG) can underweight markers of low-abundance populations. We use leverage scores from the reference SVD to identify genes that define distinct transcriptomic directions, regardless of expression magnitude.
-
Sparse spatial regularization — Graph Laplacian smoothing with O(N) complexity, avoiding the O(N²) cost of dense kernel methods.
| Spots | Time | Memory |
|---|---|---|
| 10,000 | < 1 sec | < 1 GB |
| 100,000 | ~4 sec | ~2 GB |
| 1,000,000 | ~3 min | ~21 GB |
Benchmarked on MacBook Pro M2 Max (32GB unified memory), CPU-only.
On the Spotless benchmark:
| Metric | FlashDeconv | RCTD | Cell2Location |
|---|---|---|---|
| Pearson (56 datasets) | 0.944 | 0.905 | 0.895 |
Performance varies by tissue type and experimental conditions. We recommend evaluating on data similar to your use case.
FlashDeconv solves a graph-regularized non-negative least squares problem:
minimize ½‖Y - βX‖²_F + ½λ·Tr(βᵀLβ) + ρ‖β‖₁, subject to β ≥ 0
where Y is spatial expression, X is reference signatures, L is the graph Laplacian, and β represents cell type abundances.
Pipeline:
- Select informative genes (HVG ∪ markers) and compute leverage scores
- Compress gene space via CountSketch with uniform hashing + leverage-weighted amplitudes (G → 512 dimensions)
- Construct sparse k-NN spatial graph
- Solve via block coordinate descent with spatial smoothing
fd.tl.deconvolve(
adata_st, # Spatial AnnData
adata_ref, # Reference AnnData
cell_type_key="cell_type", # Column in adata_ref.obs
key_added="flashdeconv", # Key for results
)from flashdeconv import FlashDeconv
model = FlashDeconv(
sketch_dim=512,
lambda_spatial="auto",
n_hvg=2000,
k_neighbors=6,
random_state=0,
)
proportions = model.fit_transform(Y, X, coords)| Parameter | Default | Description |
|---|---|---|
sketch_dim |
512 | Sketch dimension |
lambda_spatial |
"auto" | Spatial regularization (auto-tuned) |
rho_sparsity |
0.01 | L1 sparsity penalty (dimensionless fraction) |
n_hvg |
2000 | Highly variable genes |
n_markers_per_type |
50 | Marker genes per cell type |
spatial_method |
"knn" | Graph method: "knn", "radius", or "grid" |
k_neighbors |
6 | Spatial graph neighbors (for "knn") |
radius |
None | Neighbor radius (required for "radius") |
preprocess |
"log_cpm" | Normalization: "log_cpm", "pearson", or "raw" |
random_state |
0 | Random seed for reproducibility |
| Attribute | Description |
|---|---|
proportions_ |
Cell type proportions (N × K), sum to 1 |
beta_ |
Raw abundances (N × K) |
info_ |
Convergence statistics |
Main class for spatial deconvolution.
from flashdeconv import FlashDeconv
model = FlashDeconv(sketch_dim=512, lambda_spatial="auto", ...)| Parameter | Type | Default | Description |
|---|---|---|---|
sketch_dim |
int |
512 | Dimension of the randomized sketch space. |
lambda_spatial |
float or "auto" |
"auto" |
Spatial regularization strength. "auto" tunes based on data scale. |
rho_sparsity |
float |
0.01 | L1 sparsity penalty (dimensionless fraction, internally scaled). |
n_hvg |
int |
2000 | Number of highly variable genes to select. |
n_markers_per_type |
int |
50 | Number of marker genes per cell type. |
spatial_method |
str |
"knn" |
Graph construction: "knn", "radius", or "grid". |
k_neighbors |
int |
6 | Number of neighbors for KNN graph. |
radius |
float or None |
None |
Radius for radius-based graph (required when spatial_method="radius"). |
max_iter |
int |
100 | Maximum BCD solver iterations. |
tol |
float |
1e-4 | Convergence tolerance (relative change in beta). |
preprocess |
str |
"log_cpm" |
Preprocessing: "log_cpm", "pearson", or "raw". |
random_state |
int or None |
0 | Random seed for reproducibility. |
verbose |
bool |
False |
Whether to print progress. |
fit(Y, X, coords, cell_type_names=None)
Fit the deconvolution model.
| Parameter | Type | Description |
|---|---|---|
Y |
ndarray or sparse (N, G) | Spatial transcriptomics count matrix. |
X |
ndarray (K, G) | Reference cell type signature matrix. |
coords |
ndarray (N, 2) or (N, 3) | Spatial coordinates. |
cell_type_names |
ndarray (K,), optional | Cell type names. |
Returns self.
fit_transform(Y, X, coords, **kwargs)
Fit and return cell type proportions. Same parameters as fit(). Returns ndarray of shape (N, K).
get_cell_type_proportions() — Return normalized proportions (N, K).
get_abundances() — Return raw (unnormalized) abundances (N, K).
get_dominant_cell_type() — Return index of dominant cell type per spot (N,).
summary() — Return dict with model parameters and fit statistics.
compute_uncertainty(alpha=0.05)
Analytical uncertainty via Hessian-diagonal Laplace approximation. Returns dict with keys: entropy, residual_ss, residual_norm, var_prop, ci_lower, ci_upper, ci_half_width, cv, detection_confident, mean_ci_width.
bootstrap_uncertainty(n_bootstrap=100, max_iter_boot=20, seed=42, verbose=False)
Poisson parametric bootstrap for empirical confidence intervals. Returns dict with keys: boot_mean, boot_std, boot_ci_lower, boot_ci_upper, boot_cv, n_bootstrap.
| Attribute | Type | Description |
|---|---|---|
proportions_ |
ndarray (N, K) | Cell type proportions (sum to 1 per spot). |
beta_ |
ndarray (N, K) | Raw (unnormalized) cell type abundances. |
gene_idx_ |
ndarray | Indices of genes used for deconvolution. |
lambda_used_ |
float | Actual lambda value used (relevant when lambda_spatial="auto"). |
info_ |
dict | Optimization info: converged, n_iterations, final_objective. |
Scanpy-style entry point. Runs deconvolution and stores results in adata_st.
fd.tl.deconvolve(
adata_st, adata_ref,
cell_type_key="cell_type",
*,
sketch_dim=512, lambda_spatial="auto", rho_sparsity=0.01,
n_hvg=2000, n_markers_per_type=50,
spatial_method="knn", k_neighbors=6, radius=None,
preprocess="log_cpm",
layer_st=None, layer_ref=None,
spatial_key="spatial", key_added="flashdeconv",
random_state=0, copy=False,
)| Parameter | Type | Default | Description |
|---|---|---|---|
adata_st |
AnnData | — | Spatial transcriptomics data with coordinates in .obsm[spatial_key]. |
adata_ref |
AnnData | — | Single-cell reference with cell type labels in .obs[cell_type_key]. |
cell_type_key |
str |
"cell_type" |
Column in adata_ref.obs for cell type annotations. |
layer_st |
str or None |
None |
Layer in adata_st to use. Uses .X if None. |
layer_ref |
str or None |
None |
Layer in adata_ref to use. Uses .X if None. |
spatial_key |
str |
"spatial" |
Key in adata_st.obsm for spatial coordinates. |
key_added |
str |
"flashdeconv" |
Key for storing results. |
copy |
bool |
False |
If True, return a copy instead of modifying in-place. |
All other parameters (sketch_dim, lambda_spatial, etc.) are forwarded to FlashDeconv — see constructor parameters.
Stores in adata_st:
.obsm[key_added]— DataFrame of cell type proportions (N x K).obs[f"{key_added}_dominant"]— Dominant cell type per spot (Categorical).uns[f"{key_added}_params"]— Parameters used for deconvolution
I/O utilities for loading data from AnnData objects.
load_spatial_data(adata, layer=None, coord_key="spatial")
Extract count matrix, coordinates, and gene names from a spatial AnnData object. Looks for coordinates in adata.obsm[coord_key], then adata.obsm["X_spatial"], then adata.obs[["x", "y"]].
| Parameter | Type | Default | Description |
|---|---|---|---|
adata |
AnnData | — | Spatial transcriptomics AnnData. |
layer |
str or None |
None |
Layer to use for counts. Uses .X if None. |
coord_key |
str |
"spatial" |
Key in adata.obsm for coordinates. |
Returns (Y, coords, gene_names).
load_reference(adata_ref, cell_type_key="cell_type", layer=None, method="mean")
Aggregate single-cell reference into cell type signatures.
| Parameter | Type | Default | Description |
|---|---|---|---|
adata_ref |
AnnData | — | Single-cell reference AnnData. |
cell_type_key |
str |
"cell_type" |
Column in adata_ref.obs for cell type labels. |
layer |
str or None |
None |
Layer to use. Uses .X if None. |
method |
str |
"mean" |
Aggregation method: "mean" or "sum". |
Returns (X, cell_type_names, gene_names).
align_genes(Y, X, genes_spatial, genes_ref)
Intersect and align genes between spatial and reference data. Returns (Y_aligned, X_aligned, common_genes).
prepare_data(adata_st, adata_ref, cell_type_key="cell_type", spatial_coord_key="spatial", layer_st=None, layer_ref=None)
Convenience wrapper combining load_spatial_data, load_reference, and align_genes. Returns (Y, X, coords, cell_type_names, gene_names).
result_to_anndata(beta, adata, cell_type_names=None, key_added="flashdeconv")
Store deconvolution results in AnnData. Adds .obsm[key_added] (DataFrame) and .obs[f"{key_added}_dominant"] (Categorical).
Graph construction and evaluation metrics.
build_knn_graph(coords, k=6, include_self=False)
Build k-nearest neighbor spatial graph from coordinates.
| Parameter | Type | Default | Description |
|---|---|---|---|
coords |
ndarray (N, 2) or (N, 3) | — | Spatial coordinates. |
k |
int |
6 | Number of nearest neighbors. |
include_self |
bool |
False |
Whether to include self-loops. |
Returns scipy.sparse.csr_matrix (N, N) binary adjacency matrix.
build_radius_graph(coords, radius, include_self=False)
Build radius-based neighbor graph. Parameters same as build_knn_graph except radius: float replaces k.
coords_to_adjacency(coords, method="knn", k=6, radius=None)
Convert coordinates to adjacency matrix. Dispatches to build_knn_graph, build_radius_graph, or grid-based construction depending on method.
All evaluation functions take pred and true as ndarray of shape (N, K).
compute_rmse(pred, true, per_cell_type=False) — Root mean squared error. Returns float or ndarray (K,) if per_cell_type=True.
compute_mae(pred, true, per_cell_type=False) — Mean absolute error. Returns float or ndarray (K,).
compute_correlation(pred, true, method="pearson", per_cell_type=False) — Pearson or Spearman correlation. Returns float or ndarray (K,).
compute_jsd(pred, true, epsilon=1e-10) — Jensen-Shannon divergence per spot. Returns ndarray (N,).
evaluate_deconvolution(pred, true, cell_type_names=None) — Comprehensive evaluation returning a dict with overall metrics (RMSE, MAE, Pearson, Spearman, mean JSD) and per_cell_type breakdown.
- Spatial data: AnnData, NumPy array (N × G), or SciPy sparse matrix
- Reference: AnnData (aggregated by cell type) or NumPy array (K × G)
- Coordinates: Extracted from
adata.obsm["spatial"]or NumPy array (N × 2)
Deconvolution accuracy depends on reference quality:
| Requirement | Guideline |
|---|---|
| Cells per type | ≥ 500 recommended |
| Marker fold-change | ≥ 5× for distinguishability |
| Signature correlation | < 0.95 between types |
| No Unknown cells | Filter before deconvolution |
Critical: Always remove cells labeled "Unknown", "Unassigned", or similar. These cells act as universal signatures that absorb proportions from specific types—a fundamental property of regression-based deconvolution, not a FlashDeconv limitation.
See Reference Data Guide for details.
# Standard
pip install flashdeconv
# With AnnData support
pip install flashdeconv[io]
# Development
git clone https://github.com/cafferychen777/flashdeconv.git
cd flashdeconv && pip install -e ".[dev]"Requirements: Python ≥ 3.9, numpy, scipy, numba. Optional: scanpy, anndata.
If you use FlashDeconv in your research, please cite:
Yang, C., Zhang, X. & Chen, J. FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution via structure-preserving sketching. bioRxiv (2025). DOI: 10.64898/2025.12.22.696108
@article{yang2025flashdeconv,
title={FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution
via structure-preserving sketching},
author={Yang, Chen and Zhang, Xianyang and Chen, Jun},
journal={bioRxiv},
year={2025},
doi={10.64898/2025.12.22.696108}
}- Paper reproducibility code
- Reference data guide — Building quality reference signatures
- Stereo-seq guide — Platform-specific considerations
- GitHub Issues
- BSD-3-Clause License
We thank the developers of Spotless, Cell2Location, RCTD, CARD, and other deconvolution methods whose work contributed to this field.
