Multimodal configuration
After running the main conversion pipeline you can populate the required YAML parameters file to run the multimodal integration pipeline.
Parameters file
The parameters file looks like this:
outdir: "/path/to/output/"
url: http://localhost:3000/
project: my_project
title: "My Project"
data:
-
dataset: scrnaseq
obs_type: cell
anndata: /path/to/main/output/scrnaseq-anndata.zarr
offset: 0
is_spatial: false
vitessce_options:
spatial:
xy: obsm/spatial
mappings:
obsm/X_umap: [0,1]
matrix: X
-
dataset: visium
obs_type: spot
anndata: /path/to/main/output/visium-anndata.zarr
offset: 1000000
is_spatial: true
raw_image: /path/to/main/output/visium-raw.zarr
label_image: /path/to/main/output/visium-label.zarr
vitessce_options:
spatial:
xy: obsm/spatial
matrix: X
In contrast to the main conversion pipeline’s parameters file, this file includes a single project to which multiple datasets belong.
Each dataset
block defines the name of the dataset and paths to the converted data and image files (if any).
Each dataset
also requires a set of vitessce_options
that specify the location of certain data (spatial coordinates, embeddings, expression matrix, etc.) within the AnnData object that is processed/generated.
This follows the same structure as in the main pipeline’s vitessce_options.
Additionally, each dataset
requires:
obs_type
, a string indicating the type of observation of the dataset. For example, “cell” or “spot”.offset
, an integer offset to add to the dataset’s ID’s so they don’t clash with the other datasets.is_spatial
, a boolean indicating whether the dataset contains spatial information and has associated image files (raw and/or label images)
Given that raw images are only read but not modified the pipeline does not generate new output from them.
In order for the output directory (defined by outdir
) to contain all necessary files that need to be served for the web application to consume,
by default, the pipeline copies the raw images to the output directory (unless a file with the same name already exists in the output directory).
This process can take a long time depending on the size of the image.
You may want to manually copy or move the image or serve it from its own directory separate from the rest of the output.
The default copying can be disabled by setting copy_raw: false
as a project-wide parameter (at the same level as outdir
, project
, etc).
For example,
outdir: "/path/to/output/"
url: http://localhost:3000/
project: my_project
title: "My Project"
copy_raw: false
With additional features
Using the above example parameters file to run the multimodal integration pipeline will run the reindexing and intersection steps. To perform the concatenation of additional features (like celltypes) to visualise them as continuous values, some extra parameters need to be added.
As a project-wide parameter (at the same level as outdir
, project
, etc.):
extend_feature_name
, the name of the additional feature. For example, “celltype”
And at a dataset
level:
extend_feature
, the location of the additional feature information. This can be either the path to a cell2location output file, or the location within the AnnData object where the feature is stored as a categorical withinobs
. For example,/path/to/c2l.h5ad
containing predicted continuous values, orobs/celltype
containing categoricals.
The full parameters file will then look like this
outdir: "/path/to/output/"
url: http://localhost:3000/
project: my_project
title: "My Project"
extend_feature_name: celltype
data:
-
dataset: scrnaseq
obs_type: cell
anndata: /path/to/main/output/scrnaseq-anndata.zarr
extend_feature: obs/celltype
offset: 0
is_spatial: false
vitessce_options:
spatial:
xy: obsm/spatial
mappings:
obsm/X_umap: [0,1]
matrix: X
-
dataset: visium
obs_type: spot
anndata: /path/to/main/output/visium-anndata.zarr
extend_feature: /path/to/c2l.h5ad
offset: 1000000
is_spatial: true
raw_image: /path/to/main/output/visium-raw.zarr
label_image: /path/to/main/output/visium-label.zarr
vitessce_options:
spatial:
xy: obsm/spatial
matrix: X
With these parameters the multimodal integration pipeline will concatenate the expression matrix with the additional feature values so both can be queried and visualised across datasets within the same portal.
In the case of providing a cell2location output file, you can further configure extend_feature
with arguments for how the file should be processed.
Instead of only setting the path to the file you would need to define extend_feature
as a map containing path
and optional args
.
extend_feature_name: celltype
data:
-
dataset: visium
obs_type: spot
anndata: /path/to/main/output/visium-anndata.zarr
extend_feature:
path: /path/to/c2l.h5ad
args:
sample: ["library_id", "sample_1"] # tuple containing the obs column name and value to filter the object. By default the object is not filtered.
q: "q05_cell_abundance_w_sf" # matrix in obsm to use. Defaults to "q05_cell_abundance_w_sf".
sort_index: "index_column" # column in the AnnData object that contains an index that matches the index in cell2location.
sort: True # can be set to False to skip ordering the cell2location matrix but observations might not match in order between files. Defaults to True.
For example, sample
can be used when a cell2location output file contains predictions for multiple samples.
Setting sample
to filter the output file enables the pipeline to obtain the appropriate prediction matrix for the data being processed,
without having to split the cell2location output file for each sample. Otherwise, if a file with multiple sample prediction is input
it will not match the number of observations of the AnnData object and the process will throw an error.
q
can be set to use a different prediction matrix from the cell2location output file.
It defaults to "q05_cell_abundance_w_sf"
sort
and sort_index
can be used to define how a cell2location output file matches the AnnData object.
By default the pipeline will try to ensure the order of observations between the prediction matrix and AnnData object match
so values are correctly concatenated.
The pipeline will attempt to order the prediction matrix given the index of the AnnData object
(or the original index if the main pipeline re-indexed it).
However you can override the observations column of the AnnData object that contains the index that the prediction matrix should match using sort_index
.
sort
can be set to False
to disable any re-ordering. If disabled, the prediction matrix would be concatenated as-is into the AnnData object
without checking if observations’ IDs match.