Model

Model is a generic class for training and inference that builds the model from base_components and TrainingPlan.

class DeltaTopic.nn.modelhub.BALSAM(adata_seq: anndata.AnnData, n_latent: int = 32, **model_kwargs)

Bases: BaseModelClass

Bayesian Latent topic analysis with Sparse Association Matrix (BALSAM).

Parameters:
  • adata_seq – AnnData object that has been registered via setup_anndata().

  • n_latent – Dimensionality of the latent space.

  • **model_kwargs – Keyword arguments passed to BALSAM_module.

Examples

>>> import anndata
>>> import DeltaTopic.nn.modelhub
>>> import DeltaTopic.nn.util
>>> adata = anndata.read_h5ad(path_to_anndata)
>>> DeltaTopic.nn.util.setup_anndata(adata)
>>> model = DeltaTopic.nn.modelhub.BALSAM(adata)
>>> model.train(100)

get_latent_representation(adata: anndata.AnnData | None = None, deterministic: bool = True, output_softmax_z: bool = True, batch_size: int = 128)

Return the latent space (topic proportions).

Parameters:
  • adata – AnnData object registered with setup_anndata().

  • deterministic – If True, use the mean of the encoder instead of a stochastic sample.

  • output_softmax_z – If True, output probabilities (the softmax of z); otherwise output z (unnormalized).

  • batch_size – Minibatch size for data loading into model.
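The deterministic and output_softmax_z flags can be illustrated with a minimal, framework-free sketch. It assumes, without confirmation from this page, that the encoder returns a Gaussian mean and log-variance per topic for each cell; latent_for_cell and softmax are hypothetical helpers, not DeltaTopic functions.

```python
import math
import random
from typing import List, Optional


def softmax(z: List[float]) -> List[float]:
    """Map unnormalized scores z to probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def latent_for_cell(
    mean: List[float],
    log_var: List[float],
    deterministic: bool = True,
    output_softmax_z: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Hypothetical helper: turn (assumed Gaussian) encoder outputs for one cell into a latent vector."""
    if deterministic:
        # Use the encoder mean directly, no sampling.
        z = list(mean)
    else:
        # One reparameterized draw: z = mean + std * eps, with eps ~ N(0, 1).
        if rng is None:
            rng = random.Random()
        z = [mu + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for mu, lv in zip(mean, log_var)]
    # Either topic proportions (softmax of z) or the raw, unnormalized z.
    return softmax(z) if output_softmax_z else z


mean = [2.0, 0.0, 0.0]
log_var = [0.0, 0.0, 0.0]
theta = latent_for_cell(mean, log_var)
print([round(p, 3) for p in theta])
```

With output_softmax_z=True every row sums to 1, which is what makes the output readable as topic proportions.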

get_parameters(save_dir=None, overwrite=False)

Save the spike-and-slab parameters to the specified directory.

Parameters:
  • save_dir – Directory to save the parameters.

  • overwrite – If True, overwrite the existing files.
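As background on what spike-and-slab parameters describe: each weight is exactly zero (the spike) with probability 1 - π and drawn from a continuous slab distribution with probability π, so its expected value is π times the slab mean. The sketch below computes that expectation for a toy topic-by-gene matrix. The names slab_mean and spike_logit, and the assumption that inclusion probabilities are stored as logits, are hypothetical and may not match the files get_parameters writes.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def expected_spike_slab(slab_mean: Matrix, spike_logit: Matrix) -> Matrix:
    """E[w] = pi * slab_mean, where pi = sigmoid(spike_logit) is the inclusion probability."""
    return [
        [sigmoid(logit) * mu for mu, logit in zip(mu_row, logit_row)]
        for mu_row, logit_row in zip(slab_mean, spike_logit)
    ]


# Two topics x three genes (toy numbers, not real model output).
slab_mean = [[1.5, -0.8, 2.0], [0.3, 1.0, -1.2]]
spike_logit = [[0.0, -10.0, 10.0], [2.0, 0.0, -10.0]]
weights = expected_spike_slab(slab_mean, spike_logit)
```

Entries with a strongly negative logit collapse toward zero, which is how a spike-and-slab prior yields a sparse association matrix.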

save(dir_path: str, overwrite: bool = False, save_anndata: bool = False, **anndata_write_kwargs)

Save model parameters to the specified directory.

Parameters:
  • dir_path – Path to a directory.

  • overwrite – Whether to overwrite existing data. If False and a directory already exists at dir_path, an error is raised.

  • save_anndata – If True, also save the AnnData object.

  • anndata_write_kwargs – Keyword arguments passed to the AnnData write function.
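The overwrite rule above (fail if dir_path already exists unless overwrite=True) can be sketched with the standard library. prepare_save_dir is a hypothetical helper, and raising ValueError is an assumption; the real save method may check this differently.

```python
import os
import tempfile


def prepare_save_dir(dir_path: str, overwrite: bool = False) -> str:
    """Hypothetical helper mirroring the documented overwrite behaviour of save()."""
    if os.path.exists(dir_path) and not overwrite:
        # Exception type is an assumption made for this sketch.
        raise ValueError(f"{dir_path} already exists; pass overwrite=True to replace it.")
    # Create the directory if needed; exist_ok covers the overwrite=True case.
    os.makedirs(dir_path, exist_ok=True)
    return dir_path


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "balsam_model")
    prepare_save_dir(target)                  # first save: directory is created
    try:
        prepare_save_dir(target)              # second save without overwrite fails
    except ValueError as err:
        print("refused:", err)
    prepare_save_dir(target, overwrite=True)  # explicit overwrite is allowed
```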

train(max_epochs: int | None = 1000, lr: float = 0.001, use_gpu: str | int | bool | None = None, train_size: float = 0.9, validation_size: float | None = None, batch_size: int = 128, n_steps_kl_warmup: int | None = None, n_epochs_kl_warmup: int | None = None, plan_kwargs: dict | None = None, **kwargs)

Trains the model using amortized variational inference.

Parameters:
  • max_epochs – Number of passes through the dataset.

  • lr – Learning rate for optimization.

  • use_gpu – Use default GPU if available (if None or True), or index of GPU to use (if int), or name of GPU (if str, e.g., ‘cuda:0’), or use CPU (if False).

  • train_size – Fraction of cells used for training, in the range [0.0, 1.0].

  • validation_size – Fraction of cells used for validation. If None, defaults to 1 - train_size. If train_size + validation_size < 1, the remaining cells form a test set.

  • batch_size – Minibatch size to use during training.

  • n_steps_kl_warmup – Number of training steps (minibatches) to scale weight on KL divergences from 0 to 1. Only activated when n_epochs_kl_warmup is set to None. If None, defaults to floor(0.75 * adata.n_obs).

  • n_epochs_kl_warmup – Number of epochs to scale weight on KL divergences from 0 to 1. Overrides n_steps_kl_warmup when both are not None.
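The train_size and validation_size rules translate into cell counts roughly as follows. This is a sketch under an assumption: the rounding (ceiling for the training set, floor for an explicit validation fraction) is not documented on this page, and split_sizes is a hypothetical helper.

```python
import math
from typing import Optional, Tuple


def split_sizes(
    n_obs: int, train_size: float = 0.9, validation_size: Optional[float] = None
) -> Tuple[int, int, int]:
    """Hypothetical helper: number of (train, validation, test) cells."""
    if not 0.0 <= train_size <= 1.0:
        raise ValueError("train_size must be in [0.0, 1.0]")
    n_train = math.ceil(train_size * n_obs)
    if validation_size is None:
        # Default: every cell not used for training goes to validation.
        n_val = n_obs - n_train
    else:
        if train_size + validation_size > 1.0:
            raise ValueError("train_size + validation_size must not exceed 1")
        n_val = math.floor(validation_size * n_obs)
    # Whatever is left over forms the test set.
    n_test = n_obs - n_train - n_val
    return n_train, n_val, n_test


print(split_sizes(1000))                                      # (900, 100, 0)
print(split_sizes(1000, train_size=0.8, validation_size=0.1))  # (800, 100, 100)
```

Computing the default validation count as n_obs - n_train, rather than from 1 - train_size, avoids losing a cell to floating-point error (1 - 0.9 is slightly below 0.1).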

class DeltaTopic.nn.modelhub.DeltaTopic(adata_seq: anndata.AnnData, n_latent: int = 32, **model_kwargs)

Bases: BaseModelClass

Dynamically-Encoded Latent Transcriptomic pattern Analysis by Topic modelling (DeltaTopic).

Parameters:
  • adata_seq – AnnData object that has been registered via setup_anndata().

  • n_latent – Dimensionality of the latent space.

  • **model_kwargs – Keyword arguments passed to DeltaTopic_module.

Examples

>>> import anndata
>>> import scanpy as sc
>>> import DeltaTopic.nn.modelhub
>>> import DeltaTopic.nn.util
>>> adata = anndata.read_h5ad(path_to_anndata_spliced)
>>> X_unspliced = sc.read(path_to_anndata_unspliced)
>>> adata.obsm["unspliced_expression"] = X_unspliced.X.copy()
>>> DeltaTopic.nn.util.setup_anndata(adata, layer="counts", unspliced_obsm_key="unspliced_expression")
>>> model = DeltaTopic.nn.modelhub.DeltaTopic(adata)
>>> model.train(100)
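The example stores the unspliced counts in adata.obsm, whose rows are matched to cells by position, not by name. If the spliced and unspliced files list cells in different orders, the unspliced rows should be reordered first. A minimal sketch with plain Python lists, assuming both files contain the same cells and the same genes in the same order; align_rows is a hypothetical helper. With AnnData objects the same idea is usually expressed by name-based indexing, for example X_unspliced[adata.obs_names, adata.var_names].

```python
from typing import Dict, List, Sequence

Row = List[float]


def align_rows(
    target_names: Sequence[str], source_names: Sequence[str], source_rows: Sequence[Row]
) -> List[Row]:
    """Hypothetical helper: reorder source_rows so that row i belongs to target_names[i]."""
    if len(source_names) != len(source_rows):
        raise ValueError("one row per source name is required")
    index: Dict[str, int] = {name: i for i, name in enumerate(source_names)}
    missing = [name for name in target_names if name not in index]
    if missing:
        raise KeyError(f"cells missing from the unspliced data: {missing[:5]}")
    return [list(source_rows[index[name]]) for name in target_names]


spliced_cells = ["cellA", "cellB", "cellC"]
unspliced_cells = ["cellC", "cellA", "cellB"]
unspliced_counts = [[0.0, 3.0], [1.0, 0.0], [2.0, 2.0]]
aligned = align_rows(spliced_cells, unspliced_cells, unspliced_counts)
```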
get_latent_representation(adata: anndata.AnnData | None = None, deterministic: bool = True, output_softmax_z: bool = True, batch_size: int = 128)

Return the latent space (topic proportions) for the spliced and unspliced data.

Parameters:
  • adata – AnnData object registered with setup_anndata(), with the unspliced counts stored in .obsm.

  • deterministic – If True, use the mean of the encoder instead of a stochastic sample.

  • output_softmax_z – If True, output probabilities (the softmax of z); otherwise output z.

  • batch_size – Minibatch size for data loading into model.

get_parameters(save_dir=None, overwrite=False)

Save the spike and slab parameters to the specified directory.

Parameters:
  • save_dir – Directory to save the parameters.

  • overwrite – If true, overwrite the existing parameters.

get_reconstruction_error(adata: anndata.AnnData | None = None, batch_size: int | None = 128)

Return the reconstruction error for the data.

Parameters:
  • adata – AnnData object with equivalent structure to initial AnnData. If None, defaults to the AnnData object used to initialize the model.

  • batch_size – Minibatch size for data loading into model.
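This page does not state which likelihood get_reconstruction_error uses. Purely to illustrate how such a value can be accumulated over minibatches of batch_size cells, the sketch below assumes a Poisson negative log-likelihood averaged per cell; both the likelihood and the helper names are assumptions, not the model's actual computation.

```python
import math
from typing import Iterator, List, Sequence

Row = Sequence[float]


def poisson_nll(counts: Row, rates: Row) -> float:
    """Negative Poisson log-likelihood of one cell's counts given predicted rates (> 0)."""
    return sum(r - x * math.log(r) + math.lgamma(x + 1.0) for x, r in zip(counts, rates))


def minibatches(n: int, batch_size: int) -> Iterator[range]:
    """Yield index ranges of at most batch_size cells."""
    for start in range(0, n, batch_size):
        yield range(start, min(start + batch_size, n))


def reconstruction_error(counts: List[Row], rates: List[Row], batch_size: int = 128) -> float:
    """Hypothetical helper: mean per-cell NLL, accumulated minibatch by minibatch."""
    total = 0.0
    for batch in minibatches(len(counts), batch_size):
        total += sum(poisson_nll(counts[i], rates[i]) for i in batch)
    return total / len(counts)


counts = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
rates = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
err_small = reconstruction_error(counts, rates, batch_size=2)
err_large = reconstruction_error(counts, rates, batch_size=128)
```

In this sketch batch_size only changes how much data is processed at once, not the result, which is the usual role of a data-loading minibatch size.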

classmethod load(dir_path: str, adata_seq: anndata.AnnData | None = None, use_gpu: str | int | bool | None = None)

Instantiate a model from the saved output.

Parameters:
  • dir_path – Path to saved outputs.

  • adata_seq – AnnData organized in the same way as the data used to train the model.

  • use_gpu – Load the model on the default GPU if available (if None or True), or index of GPU to use (if int), or name of GPU (if str), or use CPU (if False).

Returns:

Model with loaded state dictionaries.
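The use_gpu convention described for load (and for train) can be summarized as a small resolution function. This sketch encodes only the documented rules; resolve_device is hypothetical, the cuda_available flag stands in for a real availability check, and mapping "default GPU" to the device string "cuda" is an assumption.

```python
from typing import Optional, Union


def resolve_device(use_gpu: Optional[Union[str, int, bool]], cuda_available: bool) -> str:
    """Hypothetical helper mapping a use_gpu value to a device string."""
    # Handle None/True/False before int: in Python, True and False are also ints.
    if use_gpu is None or use_gpu is True:
        # "Default GPU if available", otherwise fall back to CPU.
        return "cuda" if cuda_available else "cpu"
    if use_gpu is False:
        return "cpu"
    if isinstance(use_gpu, int):
        return f"cuda:{use_gpu}"  # GPU index
    if isinstance(use_gpu, str):
        return use_gpu  # already a device name such as "cuda:1"
    raise TypeError(f"unsupported use_gpu value: {use_gpu!r}")
```

Checking the booleans first matters because isinstance(True, int) is True, so a naive int check would turn use_gpu=True into "cuda:1".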

save(dir_path: str, overwrite: bool = False, save_anndata: bool = False, **anndata_write_kwargs)

Save the state of the model.

Neither the trainer optimizer state nor the trainer history is saved.

Parameters:
  • dir_path – Path to a directory.

  • overwrite – Whether to overwrite existing data. If False and a directory already exists at dir_path, an error is raised.

  • save_anndata – If True, also save the AnnData object.

  • anndata_write_kwargs – Keyword arguments passed to the AnnData write function.

train(max_epochs: int | None = 1000, lr: float = 0.001, use_gpu: str | int | bool | None = None, train_size: float = 0.9, validation_size: float | None = None, batch_size: int = 128, n_steps_kl_warmup: int | None = None, n_epochs_kl_warmup: int | None = None, plan_kwargs: dict | None = None, **kwargs)

Trains the model using amortized variational inference.

Parameters:
  • max_epochs – Number of passes through the dataset.

  • lr – Learning rate for optimization.

  • use_gpu – Use default GPU if available (if None or True), or index of GPU to use (if int), or name of GPU (if str, e.g., ‘cuda:0’), or use CPU (if False).

  • train_size – Fraction of cells used for training, in the range [0.0, 1.0].

  • validation_size – Fraction of cells used for validation. If None, defaults to 1 - train_size. If train_size + validation_size < 1, the remaining cells form a test set.

  • batch_size – Minibatch size to use during training.

  • n_steps_kl_warmup – Number of training steps (minibatches) to scale weight on KL divergences from 0 to 1. Only activated when n_epochs_kl_warmup is set to None. If None, defaults to floor(0.75 * adata.n_obs).

  • n_epochs_kl_warmup – Number of epochs to scale weight on KL divergences from 0 to 1. Overrides n_steps_kl_warmup when both are not None.
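The precedence between the two KL warm-up settings can be sketched as a function returning the KL weight at a given point in training. The linear 0-to-1 ramp is an assumption (this page only says the weight is scaled from 0 to 1), and kl_weight is a hypothetical helper; the default step count follows the documented floor(0.75 * adata.n_obs).

```python
import math
from typing import Optional


def kl_weight(
    epoch: int,
    global_step: int,
    n_obs: int,
    n_epochs_kl_warmup: Optional[int] = None,
    n_steps_kl_warmup: Optional[int] = None,
) -> float:
    """Hypothetical helper: KL weight ramped linearly from 0 to 1 (linear ramp is an assumption)."""
    if n_epochs_kl_warmup is not None:
        # Epoch-based warm-up overrides the step-based one when both are set.
        if n_epochs_kl_warmup <= 0:
            return 1.0
        return min(1.0, epoch / n_epochs_kl_warmup)
    if n_steps_kl_warmup is None:
        # Documented default: floor(0.75 * adata.n_obs) minibatch steps.
        n_steps_kl_warmup = math.floor(0.75 * n_obs)
    if n_steps_kl_warmup <= 0:
        return 1.0
    return min(1.0, global_step / n_steps_kl_warmup)
```

For a 1,000-cell dataset with neither option set, the default warm-up is 750 minibatch steps, so the KL term reaches half weight at step 375.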