Model
Model is a generic class for training and inference. It builds a model from base_components and a TrainingPlan.
- class DeltaTopic.nn.modelhub.BALSAM(adata_seq: anndata.AnnData, n_latent: int = 32, **model_kwargs)
Bases: BaseModelClass
Bayesian Latent topic analysis with Sparse Association Matrix (BALSAM).
- Parameters:
adata – AnnData object that has been registered via setup_anndata().
n_latent – Dimensionality of the latent space.
**model_kwargs – Keyword args for BALSAM_module.
Examples
>>> import anndata
>>> adata = anndata.read_h5ad(path_to_anndata)
>>> DeltaTopic.nn.util.setup_anndata(adata)
>>> model = DeltaTopic.nn.modelhub.BALSAM(adata)
>>> model.train(100)
- get_latent_representation(adata: anndata.AnnData | None = None, deterministic: bool = True, output_softmax_z: bool = True, batch_size: int = 128)
Return the latent space (topic proportions).
- Parameters:
adata – AnnData object registered with setup_anndata().
deterministic – If true, use the mean of the encoder instead of a stochastic sample.
output_softmax_z – If true, output probabilities; otherwise output z (unnormalized probabilities).
batch_size – Minibatch size for data loading into model.
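To illustrate what output_softmax_z changes, the sketch below applies a softmax to a vector of unnormalized latent values so that the entries become topic proportions summing to 1. This is a plain-Python illustration rather than the library's code; the helper name softmax and the sample values are assumptions made for the example.

```python
import math


def softmax(z):
    """Map unnormalized values z to probabilities that sum to 1."""
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical unnormalized latent values for a single cell.
z = [2.0, 0.5, -1.0]
topic_proportions = softmax(z)
```

The ordering of the entries is preserved: the largest value of z receives the largest topic proportion.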
- get_parameters(save_dir=None, overwrite=False)
Save the spike and slab parameters to the specified directory.
- Parameters:
save_dir – Save directory.
overwrite – If true, overwrite the existing files.
- save(dir_path: str, overwrite: bool = False, save_anndata: bool = False, **anndata_write_kwargs)
Save model parameters to the specified directory.
- Parameters:
dir_path – Path to a directory.
overwrite – Whether to overwrite existing data. If False and a directory already exists at dir_path, an error is raised.
save_anndata – If True, also save the AnnData object.
anndata_write_kwargs – Keyword args for the anndata write function.
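The overwrite behaviour described above can be sketched with the standard library alone. The helper prepare_save_dir is hypothetical and only mirrors the documented rule (refuse an existing directory unless overwrite is True); the exception type ValueError is an assumption, since the documentation does not say which error is raised.

```python
import os
import tempfile


def prepare_save_dir(dir_path, overwrite=False):
    """Create dir_path; refuse to reuse an existing directory unless overwrite is True."""
    if os.path.exists(dir_path) and not overwrite:
        # The real error type is not documented; ValueError is an assumption.
        raise ValueError(f"{dir_path} already exists; pass overwrite=True to reuse it.")
    os.makedirs(dir_path, exist_ok=True)
    return dir_path


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model_out")  # hypothetical output directory
    prepare_save_dir(target)  # first call creates the directory
    try:
        prepare_save_dir(target)  # second call without overwrite is rejected
        second_call_failed = False
    except ValueError:
        second_call_failed = True
    prepare_save_dir(target, overwrite=True)  # accepted when overwrite=True
```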
- train(max_epochs: int | None = 1000, lr: float = 0.001, use_gpu: str | int | bool | None = None, train_size: float = 0.9, validation_size: float | None = None, batch_size: int = 128, n_steps_kl_warmup: int | None = None, n_epochs_kl_warmup: int | None = None, plan_kwargs: dict | None = None, **kwargs)
Trains the model using amortized variational inference.
- Parameters:
max_epochs – Number of passes through the dataset.
lr – Learning rate for optimization.
use_gpu – Use default GPU if available (if None or True), or index of GPU to use (if int), or name of GPU (if str, e.g., 'cuda:0'), or use CPU (if False).
train_size – Size of training set in the range [0.0, 1.0].
validation_size – Size of the validation set. If None, defaults to 1 - train_size. If train_size + validation_size < 1, the remaining cells belong to a test set.
batch_size – Minibatch size to use during training.
n_steps_kl_warmup – Number of training steps (minibatches) to scale weight on KL divergences from 0 to 1. Only activated when n_epochs_kl_warmup is set to None. If None, defaults to floor(0.75 * adata.n_obs).
n_epochs_kl_warmup – Number of epochs to scale weight on KL divergences from 0 to 1. Overrides n_steps_kl_warmup when both are not None.
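How train_size and validation_size partition the cells can be sketched as follows. The helper split_sizes and the rounding choices (ceiling for the training set, floor for the validation set) are assumptions for illustration; the documentation only states the default for validation_size and that leftover cells form a test set.

```python
import math


def split_sizes(n_cells, train_size=0.9, validation_size=None):
    """Return (n_train, n_val, n_test) for the documented split rules."""
    n_train = math.ceil(train_size * n_cells)
    if validation_size is None:
        # Default: everything not used for training goes to validation.
        n_val = n_cells - n_train
    else:
        n_val = math.floor(validation_size * n_cells)
    # Cells left over after training and validation form the test set.
    n_test = n_cells - n_train - n_val
    return n_train, n_val, n_test


default_split = split_sizes(1000, train_size=0.75)
three_way_split = split_sizes(1000, train_size=0.5, validation_size=0.25)
```

With validation_size left at None, the test set is empty; a test set appears only when train_size + validation_size < 1.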
- class DeltaTopic.nn.modelhub.DeltaTopic(adata_seq: anndata.AnnData, n_latent: int = 32, **model_kwargs)
Bases: BaseModelClass
Dynamically-Encoded Latent Transcriptomic pattern Analysis by Topic modelling (DeltaTopic).
- Parameters:
adata – AnnData object that has been registered via setup_anndata().
n_latent – Dimensionality of the latent space.
**model_kwargs – Keyword args for DeltaTopic_module.
Examples
>>> import anndata
>>> import scanpy as sc
>>> adata = anndata.read_h5ad(path_to_anndata_spliced)
>>> X_unspliced = sc.read(path_to_anndata_unspliced)
>>> adata.obsm["unspliced_expression"] = X_unspliced.X.copy()
>>> DeltaTopic.nn.util.setup_anndata(adata, layer="counts", unspliced_obsm_key="unspliced_expression")
>>> model = DeltaTopic.nn.modelhub.DeltaTopic(adata)
>>> model.train(100)
- get_latent_representation(adata: anndata.AnnData | None = None, deterministic: bool = True, output_softmax_z: bool = True, batch_size: int = 128)
Return the latent space (topic proportions) for spliced and unspliced.
- Parameters:
adata – List of adata_spliced and adata_unspliced.
deterministic – If true, use the mean of the encoder instead of a stochastic sample.
output_softmax_z – If true, output probabilities; otherwise output z (unnormalized probabilities).
batch_size – Minibatch size for data loading into model.
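The difference between deterministic=True and a stochastic sample can be sketched as below. The sketch assumes a Gaussian encoder that outputs a mean and a standard deviation per latent dimension; that distributional form, the helper latent_value, and the sample numbers are assumptions, not details taken from the library.

```python
import random


def latent_value(mean, std, deterministic=True, rng=None):
    """Return the encoder mean, or one random draw around it."""
    if deterministic:
        return list(mean)
    if rng is None:
        rng = random.Random()
    # Reparameterized draw: mean + std * standard normal noise (assumed Gaussian encoder).
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


mean = [0.2, -1.0, 0.7]  # hypothetical encoder means
std = [0.1, 0.3, 0.2]    # hypothetical encoder standard deviations

z_mean = latent_value(mean, std, deterministic=True)
z_sample = latent_value(mean, std, deterministic=False, rng=random.Random(0))
```

The deterministic output is reproducible across calls, which is usually what is wanted for downstream analysis; the sampled output varies from call to call unless the random generator is seeded.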
- get_parameters(save_dir=None, overwrite=False)
Save the spike and slab parameters to the specified directory.
- Parameters:
save_dir – Directory to save the parameters.
overwrite – If true, overwrite the existing parameters.
- get_reconstruction_error(adata: anndata.AnnData | None = None, batch_size: int | None = 128)
Return the reconstruction error for the data.
- Parameters:
adata – AnnData object with equivalent structure to initial AnnData. If None, defaults to the AnnData object used to initialize the model.
batch_size – Minibatch size for data loading into model.
- classmethod load(dir_path: str, adata_seq: anndata.AnnData | None = None, use_gpu: str | int | bool | None = None)
Instantiate a model from the saved output.
- Parameters:
adata_seq – AnnData organized in the same way as data used to train model.
dir_path – Path to saved outputs.
use_gpu – Load model on default GPU if available (if None or True), or index of GPU to use (if int), or name of GPU (if str), or use CPU (if False).
- Return type:
Model with loaded state dictionaries.
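The accepted values of use_gpu can be summarized in a small sketch. The helper resolve_device is hypothetical and is not part of DeltaTopic; it assumes that the default GPU is named "cuda:0" and that None or True falls back to the CPU when no GPU is available.

```python
def resolve_device(use_gpu=None, gpu_available=False):
    """Translate the documented use_gpu values into a device name (illustrative only)."""
    if use_gpu is False:
        return "cpu"
    if use_gpu is None or use_gpu is True:
        # "Default GPU if available"; the name "cuda:0" is an assumption.
        return "cuda:0" if gpu_available else "cpu"
    if isinstance(use_gpu, int):
        return f"cuda:{use_gpu}"  # GPU selected by index
    if isinstance(use_gpu, str):
        return use_gpu  # GPU selected by name
    raise TypeError("use_gpu must be a str, int, bool, or None")


devices = {
    "none_with_gpu": resolve_device(None, gpu_available=True),
    "none_without_gpu": resolve_device(None, gpu_available=False),
    "index": resolve_device(1, gpu_available=True),
    "name": resolve_device("cuda:0", gpu_available=True),
    "cpu": resolve_device(False, gpu_available=True),
}
```

The boolean cases are checked before the integer case because bool is a subclass of int in Python.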
- save(dir_path: str, overwrite: bool = False, save_anndata: bool = False, **anndata_write_kwargs)
Save the state of the model.
Neither the trainer optimizer state nor the trainer history are saved.
- Parameters:
dir_path – Path to a directory.
overwrite – Whether to overwrite existing data. If False and a directory already exists at dir_path, an error is raised.
save_anndata – If True, also save the AnnData object.
anndata_write_kwargs – Keyword args for the anndata write function.
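A save followed by load gives a simple round trip. The sketch below uses only the save and load signatures documented here; the wrapper save_and_reload is a hypothetical convenience function, and it expects a trained model and the AnnData object used to train it.

```python
def save_and_reload(model, adata, dir_path):
    """Save a model to dir_path and return a fresh instance loaded from it."""
    # overwrite=True replaces any earlier output at dir_path.
    model.save(dir_path, overwrite=True)
    # load is a classmethod, so call it on the model's own class.
    return type(model).load(dir_path, adata_seq=adata)
```

Because neither the trainer optimizer state nor the trainer history is saved, the reloaded model is suited to inference and inspection rather than to resuming training exactly where it stopped.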
- train(max_epochs: int | None = 1000, lr: float = 0.001, use_gpu: str | int | bool | None = None, train_size: float = 0.9, validation_size: float | None = None, batch_size: int = 128, n_steps_kl_warmup: int | None = None, n_epochs_kl_warmup: int | None = None, plan_kwargs: dict | None = None, **kwargs)
Trains the model using amortized variational inference.
- Parameters:
max_epochs – Number of passes through the dataset.
lr – Learning rate for optimization.
use_gpu – Use default GPU if available (if None or True), or index of GPU to use (if int), or name of GPU (if str, e.g., 'cuda:0'), or use CPU (if False).
train_size – Size of training set in the range [0.0, 1.0].
validation_size – Size of the validation set. If None, defaults to 1 - train_size. If train_size + validation_size < 1, the remaining cells belong to a test set.
batch_size – Minibatch size to use during training.
n_steps_kl_warmup – Number of training steps (minibatches) to scale weight on KL divergences from 0 to 1. Only activated when n_epochs_kl_warmup is set to None. If None, defaults to floor(0.75 * adata.n_obs).
n_epochs_kl_warmup – Number of epochs to scale weight on KL divergences from 0 to 1. Overrides n_steps_kl_warmup when both are not None.
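The interaction between n_epochs_kl_warmup and n_steps_kl_warmup can be sketched as a weight schedule. The helper kl_weight is hypothetical, and the linear ramp is an assumption: the documentation says only that the weight is scaled from 0 to 1. The precedence of the epoch-based setting and the default of floor(0.75 * adata.n_obs) steps follow the parameter descriptions above.

```python
import math


def kl_weight(epoch, step, n_obs, n_epochs_kl_warmup=None, n_steps_kl_warmup=None):
    """Return the KL weight in [0, 1] at a given epoch and training step."""
    if n_epochs_kl_warmup is not None:
        # Epoch-based warmup takes precedence when it is set.
        return min(1.0, epoch / max(1, n_epochs_kl_warmup))
    if n_steps_kl_warmup is None:
        # Documented default when neither setting is given.
        n_steps_kl_warmup = math.floor(0.75 * n_obs)
    return min(1.0, step / max(1, n_steps_kl_warmup))


# Both settings given: the epoch-based one wins, so the step count is ignored.
w_epoch = kl_weight(epoch=5, step=9999, n_obs=1000,
                    n_epochs_kl_warmup=10, n_steps_kl_warmup=100)
# Neither setting given: warm up over floor(0.75 * 1000) = 750 steps.
w_default = kl_weight(epoch=0, step=375, n_obs=1000)
```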