
前面我们介绍了visium空转反卷积常用方法RCTD,这里介绍另外一种基于R的反卷积方法-SPOTlight,也是论文中的常客。
SPOTlight是一款基于scRNA参考的反卷积工具,解析空转spots细胞组成。该工具最初是为10xVisium空间转录组技术开发的,但也适用于所有会返回细胞混合信号的技术平台。SPOTlight基于NMFreg模型,通过学习每种细胞类型的主题特征,并确定哪种细胞类型组合最适合对目标spot进行去卷积。以下是其工作流程。

Github:https://github.com/MarcElosua/SPOTlight?tab=readme-ov-file
Paper:Elosua-Bayes M, Nieto P, Mereu E, Gut I, Heyn H (2021): SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res 49(9):e50. doi:10.1093/nar/gkab043.
安装包:我使用github安装最新版的时候,要求R>=4.5.0。可以升级R或者使用BiocManager安装之前版本。我这里演示使用的是最新版本!
#install.packages("devtools")
#library(devtools)
#install_github("https://github.com/MarcElosua/SPOTlight")
#或者:
#install.packages("BiocManager")
#BiocManager::install("SPOTlight")
library(ggplot2)
library(SPOTlight)
library(SingleCellExperiment)
library(SpatialExperiment)
library(scater)
library(scran)
packageVersion("SPOTlight")## [1] '1.13.2'加载spatial & scRNA数据,还是之前的子宫内膜数据。
setwd('./')
#scRNA
load('./sce_edm_sub.RData')
library(Seurat)
DimPlot(sce_edm_sub,label = T)+NoLegend()
#STRNA
load('./merged_STvisium.RData')
SpatialDimPlot(merged,ncol = 2)
因为scRNA数据来源于不同时期的子宫内膜,有着不同分组(CTRL,RIF),而且空转visium也是两种分组的数据,为了反卷积结果更准确,减少批次效应的影响,所以这里的演示将两组数据分开进行(需要注意,在演示RCTD的时候没有分开,效果不佳)。分别提取scRNA的RIF组以及正常组作为参考数据集。
library(dplyr)
scRNA_CTR <- sce_edm_sub[, sce_edm_sub$orig.ident %in% c("LH11","LH3","LH5","LH7","LH9")]
scRNA_RIF <- sce_edm_sub[, sce_edm_sub$orig.ident %in% c("RIF1","RIF10","RIF2","RIF3","RIF4","RIF5","RIF6","RIF7","RIF8","RIF9")]这一步是准备数据用于后续的反卷积分析,如果数据集(scRNA)非常大,作者建议对数据下采样(downsample )进行模型训练,包括细胞数量和基因数量的下采样。保留每个cluster中具有代表性的细胞数量,以及最具生物学意义的基因。原文中作者展示将每种细胞类型的细胞数量下采样至约100个,并不会影响模型性能。对于每种细胞类型包含超过100个细胞,仅能带来微小的性能提升,却会显著增加计算时间和资源消耗。此外,将基因集限制为每种细胞类型的marker gene,以及最多3000个高变异基因,可以进一步优化模型性能并节省计算资源。
所以首先第一步,需要计算marker gene。如果提供的数据是seurat obj,可以使用FindAllMarkers计算marker gene,但是要选择具有代表性的基因。计算marker基因的时候,可以先排除线粒体、核糖体等基因。这里我们演示官网提供的更通用的方式。seurat obj要转为SingleCellExperiment object。
scRNA_CTR_SCE <- as.SingleCellExperiment(scRNA_CTR)#seurat转为SingleCellExperimentgenes <- !grepl(pattern = "^RP[L|S]|MT", x = rownames(scRNA_CTR_SCE))#挑选非核糖体,线粒体基因
colLabels(scRNA_CTR_SCE) <- colData(scRNA_CTR_SCE)$manual_cellAnno#将celltype注释设置为SCE的colLabels
mgs <- scoreMarkers(scRNA_CTR_SCE, subset.row = genes)# Compute marker genes#筛选每种celltype marker gene
mgs_fil <- lapply(names(mgs), function(i) {
x <- mgs[[i]]
# Filter and keep relevant marker genes, those with AUC > 0.8
x <- x[x$mean.AUC > 0.8, ]
# Sort the genes from highest to lowest weight
x <- x[order(x$mean.AUC, decreasing = TRUE), ]
# Add gene and cluster id to the dataframe
x$gene <- rownames(x)
x$cluster <- i
data.frame(x)
})
mgs_df <- do.call(rbind, mgs_fil)
head(mgs_df)## self.average other.average self.detected other.detected mean.logFC.cohen
## COL3A1 4.075212 0.14169668 0.9979046 0.15368741 7.405822
## COL1A1 3.549336 0.07342670 0.9962927 0.08755446 7.377472
## COL6A1 1.987737 0.03920294 0.9854932 0.05037966 4.674203
## PDGFRA 1.931132 0.01285208 0.9813024 0.01820181 4.034837
## RORB 2.582072 0.02777240 0.9819471 0.03285990 3.789204
## DCN 2.325909 0.02079197 0.9804965 0.02410455 4.742480
## min.logFC.cohen median.logFC.cohen max.logFC.cohen rank.logFC.cohen
## COL3A1 5.975909 7.633704 8.460611 1
## COL1A1 6.911168 7.379465 7.919610 1
## COL6A1 3.853348 4.806405 5.024184 4
## PDGFRA 3.894631 4.036985 4.124132 5
## RORB 3.550067 3.826762 3.886360 7
## DCN 4.498384 4.766425 4.910453 3
## mean.AUC min.AUC median.AUC max.AUC rank.AUC mean.logFC.detected
## COL3A1 0.9985645 0.9973781 0.9986886 0.9988953 1 2.921145
## COL1A1 0.9977911 0.9970438 0.9978777 0.9980847 2 3.707111
## COL6A1 0.9898699 0.9814609 0.9914031 0.9927466 4 5.028842
## PDGFRA 0.9892501 0.9851022 0.9896019 0.9906512 6 6.033951
## RORB 0.9890848 0.9845755 0.9900949 0.9908750 5 5.154691
## DCN 0.9888443 0.9841881 0.9892254 0.9902482 4 5.241212
## min.logFC.detected median.logFC.detected max.logFC.detected
## COL3A1 1.491836 2.543605 4.524720
## COL1A1 2.627083 3.292976 5.259360
## COL6A1 2.126444 4.883105 8.413607
## PDGFRA 3.342890 5.996942 8.407477
## RORB 3.311984 4.668452 6.823460
## DCN 4.243140 5.004194 6.713357
## rank.logFC.detected gene cluster
## COL3A1 1127 COL3A1 Stromal
## COL1A1 685 COL1A1 Stromal
## COL6A1 1 COL6A1 Stromal
## PDGFRA 4 PDGFRA Stromal
## RORB 101 RORB Stromal
## DCN 2 DCN Stromaltable(mgs_df$cluster)##
## Bcell Ciliated endothelial Mast Myeloid Neutrophil
## 39 803 156 117 85 60
## NK Stromal Tcell unCiliated
## 117 284 29 345#计算高变基因
dec <- modelGeneVar(scRNA_CTR_SCE, subset.row = genes)
hvg <- getTopHVGs(dec, n = 3000)# Get the top 3000 genes.# split cell indices by identity
idx <- split(seq(ncol(scRNA_CTR_SCE)), scRNA_CTR_SCE$manual_cellAnno)
#
n_cells <- 100
cs_keep <- lapply(idx, function(i) {
n <- length(i)
if (n < n_cells)
n_cells <- n
sample(i, n_cells)
})
scRNA_CTR_SCE <- scRNA_CTR_SCE[, unlist(cs_keep)]反卷积进行了两步,一个是训练模型-NMF分解,一步是反卷积spots细胞组成!
#我们的数据是多样本合并数据,这里仅演示一个样本
CTR3 <- subset(merged,orig.ident=='CTR3')
CTR3_counts <- GetAssayData(CTR3,assay = 'RNA',layer = 'counts')CTR_res <- SPOTlight(
x = scRNA_CTR_SCE, #scRNA数据,可以是matrix,也可以是SingleCellExperiment
y = CTR3_counts,#ST数据,可以是matrix,也可以是SpatialExperiment
groups = as.character(scRNA_CTR_SCE$manual_cellAnno), #celltype字符
mgs = mgs_df,#marker gene dataframe,需要包含group labels and the weight等列
hvg = hvg,#高变基因
weight_id = "mean.AUC",#mgs_df mean.AUC 列
group_id = "cluster",#mgs_df celltype cluster列
gene_id = "gene")#mgs_df marker gene列## Scaling count matrix## Seeding NMF model...## Training NMF model...##
## iter | tol
## ---------------
## 1 | 4.24e-01
## 2 | 1.46e-02
## 3 | 3.90e-03
## 4 | 1.39e-03
## 5 | 5.34e-04
## 6 | 2.36e-04
## 7 | 1.10e-04
## 8 | 5.50e-05
## 9 | 2.89e-05
## 10 | 1.59e-05
## 11 | 9.08e-06## Time for training: 0min## Removing genes in mixture matrix that are all 0s## Keep intersection of genes between W and mixture matrix## Deconvoluting mixture data...前面的反卷积是一步法,也可以将SPOTlight分两步运行,这样可以获得训练好的模型。拥有训练好的模型后,你可以将其用于其他也想用同一参考进行去卷积的数据集。这样就可以跳过训练步骤,而训练步骤是最耗时且计算量最大的部分。
mod_ls <- trainNMF(
x = sce,
groups = sce$free_annotation,
mgs = mgs_df,
hvg = hvg,
weight_id = "mean.AUC",
group_id = "cluster",
gene_id = "gene")
# Run deconvolution
res <- runDeconvolution(
x = spe,
mod = mod_ls[["mod"]],
ref = mod_ls[["topic"]])评估每个主题特征对每种细胞身份的特异性。理想情况下,每种细胞身份都会对应一个独特的主题概况。
plotTopicProfiles(
x = CTR_res$NMF,
y = scRNA_CTR_SCE$manual_cellAnno,
facet = FALSE,
min_prop = 0.01,
ncol = 1) +theme(aspect.ratio = 1)
查看来自每种celltype的cellid是否都具有一致的topic profile,这意味着SPOTlight为同一细胞类型的所有细胞学习到了稳定且一致的特征签名。在这里例子中,meyloid效果明显很差,主要是没有对其进行细分。
plotTopicProfiles(
x = CTR_res$NMF,
y = scRNA_CTR_SCE$manual_cellAnno,
facet = TRUE,
min_prop = 0.01,
ncol = 3)
查看预测结果celltype相关性。理想的效果肯定是每种celltype其自身相关性较高,与其他celltype显著区分,比如这里的B、T等,这些celltype特征明显,能很好的区分出来,像其他细胞就有较大概率的混合了。
plotCorrelationMatrix(CTR_res$mat,insig = 'pch',outline.color='black')
饼图展示spots中每种celltype的比例:
#提取反卷积细胞比例矩阵,可以做一下筛洗,比如小于某个阈值的设置为0
mat = CTR_res$mat
ct <- colnames(mat)
mat[mat < 0.1] <- 0
#定义celltype颜色
paletteMartin <- c("#FFD92F","#4DAF4A","#FCCDE5","#D9D9D9","#377EB8","#7FC97F","#BEAED4","#FDC086","#FFFF99","#386CB0","#F0027F","#BF5B17","#666666","#1B9E77","#D95F02","#7570B3","#E7298A","#66A61E","#E6AB02","#A6761D")
pal <- colorRampPalette(paletteMartin)(length(ct))
names(pal) <- ct
#plot
plotSpatialScatterpie(
x = GetTissueCoordinates(CTR3)[,1:2], #spots坐标文件,两列的矩阵或者datafram,也可以是SpatialExperiment,因为是seurat obj,就没必要麻烦转化了,直接提取坐标就可以了。
y = mat,#反卷积结果矩阵
cell_types = colnames(mat),
img = FALSE,
scatterpie_alpha = 1,
pie_scale = 0.4) +
scale_fill_manual(
values = pal,
breaks = names(pal))
将结果添加到seurat obj:需要注意,行名barcode要和原来的metadat一致。
# add results back to Seurat object
CTR3 <- AddMetaData(CTR3, metadata = mat)SpatialFeaturePlot(CTR3,
features = colnames(mat),
ncol = 3,
images = 'CTR3',
stroke=0.2,
pt.size.factor=2)&
theme_bw()&
theme(axis.text = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
就这个结果与之前RCTD对比了一下,还是比较相似的!END!