Quick Start

This guide briefly walks through the basics of using BALSAM and DeltaTopic.

Data Preparation

Read your data files, for example h5ad files, using scanpy:

import scanpy as sc
adata = sc.read(filename_spliced)
adata_unspliced = sc.read(filename_unspliced)

Or build the AnnData object from NumPy arrays:

from scipy.sparse import csr_matrix
import anndata as ad
adata = ad.AnnData(csr_matrix(X_spliced))
adata.layers["counts"] = adata.X.copy()
adata.obsm["unspliced_expression"] = csr_matrix(X_unspliced)
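Before wrapping arrays in AnnData, it can help to confirm that the two matrices are compatible. The following is a minimal sketch, assuming both inputs are dense cells x genes arrays of non-negative counts in the same cell and gene order. The helper `to_count_csr` and the synthetic arrays are hypothetical and are not part of DeltaTopic.

```python
import numpy as np
from scipy.sparse import csr_matrix

def to_count_csr(X, name):
    """Validate a dense cells x genes count matrix and return it as CSR.

    Hypothetical helper for illustration; not part of DeltaTopic.
    """
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError(f"{name} must be 2-D (cells x genes), got shape {X.shape}")
    if (X < 0).any():
        raise ValueError(f"{name} contains negative values; counts are expected")
    return csr_matrix(X)

# Synthetic stand-ins for X_spliced and X_unspliced (5 cells, 4 genes).
rng = np.random.default_rng(0)
X_spliced = rng.poisson(2.0, size=(5, 4))
X_unspliced = rng.poisson(0.5, size=(5, 4))

spliced = to_count_csr(X_spliced, "X_spliced")
unspliced = to_count_csr(X_unspliced, "X_unspliced")

# Both matrices must describe the same cells and genes.
if spliced.shape != unspliced.shape:
    raise ValueError(f"shape mismatch: {spliced.shape} vs {unspliced.shape}")
```

A shape check cannot detect rows or columns that are present but in a different order, so the ordering still has to be guaranteed by how the arrays were produced.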

If you read the data with scanpy, register the spliced and unspliced counts:

adata.layers["counts"] = adata.X.copy()
adata.obsm["unspliced_expression"] = adata_unspliced.X.copy()
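Copying the unspliced matrix into `adata.obsm` only makes sense if both objects list the same cells and genes in the same order. A minimal sketch of such a check follows, using pandas only. The helper `check_alignment` and the example names are hypothetical; in practice you would pass `adata.obs_names` and `adata_unspliced.obs_names` (and the corresponding `var_names`).

```python
import pandas as pd

def check_alignment(names_a, names_b, axis_label):
    """Raise if two name sequences differ in content or order.

    Hypothetical helper for illustration; not part of DeltaTopic.
    """
    a, b = pd.Index(names_a), pd.Index(names_b)
    if not a.equals(b):
        n_missing = len(a.difference(b))
        raise ValueError(
            f"{axis_label} are not aligned between spliced and unspliced data "
            f"({n_missing} spliced {axis_label} absent from unspliced; "
            f"0 means the same names appear in a different order)"
        )

# Hypothetical barcodes and gene names standing in for obs_names / var_names.
spliced_cells = ["AAAC", "AAAG", "AACT"]
unspliced_cells = ["AAAC", "AAAG", "AACT"]
spliced_genes = ["Actb", "Gapdh"]
unspliced_genes = ["Actb", "Gapdh"]

check_alignment(spliced_cells, unspliced_cells, "cells")
check_alignment(spliced_genes, unspliced_genes, "genes")
```

If the check fails, reorder or subset one object to match the other before registering the counts.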

Set up the AnnData object:

from DeltaTopic.nn.util import setup_anndata
setup_anndata(adata, layer="counts", unspliced_obsm_key = "unspliced_expression")

Note

If you are training BALSAM only, you can skip reading and registering the unspliced counts.

Training

Import the model and train:

from DeltaTopic.nn.modelhub import DeltaTopic
model = DeltaTopic(adata, n_latent = 32)
model.train(400)

Save model states and output the latent space:

import pandas as pd
model.save(SavePATH)  # e.g. "./saved_model/"
model.get_parameters(save_dir=SavePATH)  # spike and slab parameters
topics_np = model.get_latent_representation() # latent topic proportions
pd.DataFrame(topics_np).to_csv(SaveFILENAME)
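A CSV written from a bare array has only integer row and column labels. The sketch below shows one way to keep cell and topic labels in the file, assuming the rows of the latent representation follow the order of `adata.obs_names` (an assumption worth verifying for your setup). The array, cell names, column naming scheme, and output path are all hypothetical stand-ins.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Synthetic stand-in for model.get_latent_representation(): 4 cells x 3 topics.
topics_np = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.1, 0.3],
])
cell_names = ["cell_0", "cell_1", "cell_2", "cell_3"]  # stand-in for adata.obs_names

topics_df = pd.DataFrame(
    topics_np,
    index=pd.Index(cell_names, name="cell"),
    columns=[f"topic_{k}" for k in range(topics_np.shape[1])],
)

# Write to a temporary location; replace with your own output path.
out_path = os.path.join(tempfile.mkdtemp(), "topics.csv")
topics_df.to_csv(out_path)

# Reading back with index_col restores the cell labels.
reloaded = pd.read_csv(out_path, index_col="cell")
```

Labelled output makes it easier to join the topic proportions with cell metadata in downstream tools.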

Analysis

Finally, run your preferred analysis on the latent space and topic loadings. For examples of the analyses used in the paper, please refer to the Rmd files in the project repository.
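As a simple starting point in Python, each cell can be assigned to the topic with the largest proportion. This is only an illustrative sketch on synthetic data, not the analysis used in the paper; the table and its labels are hypothetical.

```python
import pandas as pd

# Synthetic topic proportions: 4 cells x 3 topics (hypothetical values).
topics_df = pd.DataFrame(
    [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.3, 0.3, 0.4],
     [0.6, 0.1, 0.3]],
    index=["cell_0", "cell_1", "cell_2", "cell_3"],
    columns=["topic_0", "topic_1", "topic_2"],
)

# Column label of the largest proportion in each row.
dominant_topic = topics_df.idxmax(axis=1)

# Number of cells assigned to each topic.
cells_per_topic = dominant_topic.value_counts().sort_index()
```

A hard assignment like this discards the mixed membership that topic proportions encode, so it is best treated as a quick summary rather than a replacement for analysing the full proportions.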