forked from satijalab/seurat
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathseurat5_bpcells_interaction_vignette.Rmd
212 lines (167 loc) · 8.48 KB
/
seurat5_bpcells_interaction_vignette.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
---
title: "Using BPCells with Seurat Objects"
output:
html_document:
theme: united
df_print: kable
pdf_document: default
date: 'Compiled: `r format(Sys.Date(), "%B %d, %Y")`'
---
***
```{r setup, include=FALSE}
all_times <- list() # store the time for each chunk
knitr::knit_hooks$set(time_it = local({
now <- NULL
function(before, options) {
if (before) {
now <<- Sys.time()
} else {
res <- difftime(Sys.time(), now, units = "secs")
all_times[[options$label]] <<- res
}
}
}))
knitr::opts_chunk$set(
tidy = 'styler',
warning = FALSE,
error = TRUE,
message = FALSE,
fig.width = 8,
time_it = TRUE,
cache = TRUE
)
```
BPCells is an [R package](https://github.com/bnprks/BPCells) that allows for computationally efficient single-cell analysis. It utilizes bit-packing compression to store counts matrices on disk and C++ code to cache operations.
We leverage the high performance capabilities of BPCells to work with Seurat objects in memory while accessing the counts on disk. In this vignette, we show how to use BPCells to load data, work with a Seurat objects in a more memory-efficient way, and write out Seurat objects with BPCells matrices.
We will show the methods for interacting with both a single dataset in one file or multiple datasets across multiple files using BPCells. BPCells allows us to easily analyze these large datasets in memory, and we encourage users to check out some of our other vignettes [here]() and [here]() to see further applications.
```{r, eval=FALSE}
devtools::install_github("bnprks/BPCells")
```
```{r install, message = FALSE, warning = FALSE}
library(BPCells)
library(Seurat)
library(SeuratObject)
library(SeuratDisk)
library(Azimuth)
```
We use BPCells functionality to both load in our data and write the counts layers to bitpacked compressed binary files on disk to improve computation speeds. BPCells has multiple functions for reading in files.
# Load Data
## Load Data from one h5 file
In this section, we will load a dataset of mouse brain cells freely available from 10x Genomics. This includes 1.3 Million single cells that were sequenced on the Illumina NovaSeq 6000. The raw data can be found [here](https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons?).
To read in the file, we will use open_matrix_10x_hdf5, a BPCells function written to read in feature matrices from 10x. We then write a matrix directory, load the matrix, and create a Seurat object.
```{r}
brain.data <- open_matrix_10x_hdf5(
path = "/brahms/hartmana/vignette_data/1M_neurons_filtered_gene_bc_matrices_h5.h5")
# Write the matrix to a directory
write_matrix_dir(
mat = brain.data,
dir = '/brahms/hartmana/vignette_data/bpcells/brain_counts')
# Now that we have the matrix on disk, we can load it
brain.mat <- open_matrix_dir(dir = "/brahms/hartmana/vignette_data/bpcells/brain_counts")
brain.mat <- Azimuth:::ConvertEnsembleToSymbol(mat = brain.mat, species = "mouse")
# Create Seurat Object
brain <- CreateSeuratObject(counts = brain.mat)
```
<details>
<summary>**What if I already have a Seurat Object?**</summary>
You can use BPCells to convert the matrices in your already created Seurat objects to on-disk matrices. Note, that this is only possible for V5 assays. As an example, if you'd like to convert the counts matrix of your RNA assay to a BPCells matrix, you can use the following:
```{r, message=FALSE, warning=FALSE, eval=FALSE}
obj <- readRDS("/path/to/reference.rds")
# Write the counts layer to a directory
write_matrix_dir(mat = obj[["RNA"]]$counts, dir = '/brahms/hartmana/vignette_data/bpcells/brain_counts')
counts.mat <- open_matrix_dir(dir = "/brahms/hartmana/vignette_data/bpcells/brain_counts")
obj[["RNA"]]$counts <- counts.mat
```
</details>
### Example Analsyis
Once this conversion is done, you can perform typical Seurat functions on the object. For example, we can normalize data and visualize features by automatically accessing the on-disk counts.
```{r}
VlnPlot(brain, features = c("Sox10", "Slc17a7", "Aif1"), ncol = 3, layer = "counts", alpha = 0.1)
# We then normalize and visualize again
brain <- NormalizeData(brain, normalization.method = "LogNormalize")
VlnPlot(brain, features = c("Sox10", "Slc17a7", "Aif1"), ncol = 3, layer = "data", alpha = 0.1)
```
### Saving Seurat objects with on-disk layers
If you save your object and load it in in the future, Seurat will access the on-disk matrices by their path, which is stored in the assay level data. To make it easy to ensure these are saved in the same place, we provide new functionality to the SaveSeuratRds function. In this function, you specify your filename. The pointer to the path in the Seurat object will change to the current directory.
This also makes it easy to share your Seurat objects with BPCells matrices by sharing a folder that contains both the object and the BPCells directory.
```{r}
SaveSeuratRds(
object = brain,
file = "obj.Rds")
```
If needed, a layer with an on-disk matrix can be converted to an in-memory matrix using the `as()` function. For the purposes of this demo, we'll subset the object so that it takes up less space in memory.
```{r}
brain <- subset(brain, downsample = 1000)
brain[["RNA"]]$counts <- as(object = brain[["RNA"]]$counts, Class = "dgCMatrix")
```
## Load data from multiple h5ad files
You can also download data from multiple matrices. In this section, we create a Seurat object using multiple peripheral blood mononuclear cell (PBMC) samples that are freely available for downlaod from CZI [here](https://cellxgene.cziscience.com/collections). We download data from [Ahern et al. (2022) Nature](https://cellxgene.cziscience.com/collections/8f126edf-5405-4731-8374-b5ce11f53e82), [Jin et al. (2021) Science](https://cellxgene.cziscience.com/collections/b9fc3d70-5a72-4479-a046-c2cc1ab19efc), and [Yoshida et al. (2022) Nature](https://cellxgene.cziscience.com/collections/03f821b4-87be-4ff4-b65a-b5fc00061da7). We use the BPCells function to read h5ad files.
```{r, warning=FALSE}
file.dir <- "/brahms/hartmana/vignette_data/h5ad_files/"
files.set <- c("ahern_pbmc.h5ad", "jin_pbmc.h5ad", "yoshida_pbmc.h5ad")
# Loop through h5ad files and output BPCells matrices on-disk
data.list <- c()
metadata.list <- c()
for (i in 1:length(files.set)) {
path <- paste0(file.dir, files.set[i])
data <- open_matrix_anndata_hdf5(path)
write_matrix_dir(
mat = data,
dir = paste0(gsub(".h5ad", "", path), "_BP"),
overwrite = TRUE
)
# Load in BP matrices
mat <- open_matrix_dir(dir = paste0(gsub(".h5ad", "", path), "_BP"))
mat <- Azimuth:::ConvertEnsembleToSymbol(mat = mat, species = "human")
# Get metadata
metadata.list[[i]] <- LoadH5ADobs(path = path)
data.list[[i]] <- mat
}
# Name layers
names(data.list) <- c("ahern", "jin", "yoshida")
# Add Metadata
for (i in 1:length(metadata.list)){
metadata.list[[i]]$publication <- names(data.list)[i]
}
metadata.list <- lapply(metadata.list, function(x) {
x <- x[, c("publication", "sex", "cell_type", "donor_id", "disease")]
return(x)
})
metadata <- Reduce(rbind, metadata.list)
```
When we create the Seurat object with the list of matrices from each publication, we can then see that multiple counts layers exist that represent each dataset. This object contains over a million cells, yet only takes up minimal space in memory!
```{r}
merged.object <- CreateSeuratObject(counts = data.list, meta.data = metadata)
merged.object
```
```{r save_merged, eval=FALSE}
SaveSeuratRds(
object = merged.object,
file = "obj.Rds")
```
## Parse Biosciences
Here, we show how to load a 1 million cell data set from Parse Biosciences and create a Seurat Object. The data is available for download [here](https://support.parsebiosciences.com/hc/en-us/articles/7704577188500-How-to-analyze-a-1-million-cell-data-set-using-Scanpy-and-Harmony)
```{r}
parse.data <- open_matrix_anndata_hdf5(
"/brahms/hartmana/vignette_data/h5ad_files/ParseBio_PBMC.h5ad")
```
```{r, eval=FALSE}
write_matrix_dir(mat = parse.data, dir = "/brahms/hartmana/vignette_data/bpcells/parse_1m_pbmc")
```
```{r}
parse.mat <- open_matrix_dir(dir = "/brahms/hartmana/vignette_data/bpcells/parse_1m_pbmc")
metadata <- readRDS("/brahms/hartmana/vignette_data/ParseBio_PBMC_meta.rds")
metadata$disease <- sapply(strsplit(x = metadata$sample, split = "_"), "[", 1)
parse.object <- CreateSeuratObject(counts = parse.mat, meta.data = metadata)
```
```{r save_parse, eval=FALSE}
SaveSeuratRds(
object = parse.object,
file = "obj.Rds")
```
<details>
<summary>**Session Info**</summary>
```{r}
sessionInfo()
```
</details>