brainbeacon.pipeline.cell_label_transfer.run_label_transfer_pipeline

brainbeacon.pipeline.cell_label_transfer.run_label_transfer_pipeline(encoder_adata_list, source_adata_list, target_adata_list, bb_ckpt_path, cellplm_ckpt_path, output_dir, output_prefix, config_update=None, n_hvg=1000, cd_weight=0.02, use_hvg=True, use_batch=True, use_spatial=True, weight_mode='expression', force_tokenize=True, do_fit=True, fit_epochs=500, shuffle_each_epoch=True, slice_sample=False, enc_mod='flowformer', save_model=True, save_model_path=None, do_train_encoder=True, num_global_epochs=100, per_dataset_epochs=50, label_key='cell_label', device=None)

Run label transfer with an encoder-training stage.

This pipeline provides an end-to-end workflow in two stages:

1) Train an encoder on encoder_adata_list (when do_train_encoder=True).
2) Run supervised label transfer from source_adata_list to target_adata_list.

Notes

The function starts from the default training configuration in brainbeacon.configs.config_train and applies runtime overrides passed through config_update, which is the recommended way to change settings. To avoid side effects across runs, the global config should be copied before the overrides are applied.
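The copy-before-update pattern mentioned in this note can be sketched as follows. This is a minimal illustration and not the library's internal code: DEFAULT_TRAIN_CONFIG, its keys, and build_run_config are hypothetical stand-ins, since the actual contents of brainbeacon.configs.config_train are not shown here.

```python
import copy

# Hypothetical stand-in for the default config in brainbeacon.configs.config_train;
# the real object's name and keys are not documented in this section.
DEFAULT_TRAIN_CONFIG = {
    "lr": 1e-3,
    "optimizer": {"name": "adam", "weight_decay": 0.0},
}


def build_run_config(default_config, config_update=None):
    """Return a per-run config without mutating the shared default."""
    # A deep copy also protects nested dicts from later in-place edits.
    run_config = copy.deepcopy(default_config)
    if config_update:
        # Shallow, top-level override: a nested dict in config_update
        # replaces the whole nested entry rather than merging into it.
        run_config.update(config_update)
    return run_config


run_config = build_run_config(DEFAULT_TRAIN_CONFIG, {"lr": 5e-4})
```

Because the default is deep-copied, a second run that starts from DEFAULT_TRAIN_CONFIG sees the original values regardless of what the first run changed.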

Parameters:

encoder_adata_list (list[dict]) – Dataset specifications used for encoder training (when do_train_encoder=True).
source_adata_list (list[dict]) – Source dataset specifications that include cell labels (supervision for transfer).
target_adata_list (list[dict]) – Target dataset specifications to predict labels for.
bb_ckpt_path (str) – Path to BrainBeacon pretrained checkpoint.
cellplm_ckpt_path (str) – Path to the initial CellPLM/CellFormer checkpoint. If do_train_encoder=False, this checkpoint is used directly.
output_dir (str) – Output directory for intermediate files and results.
output_prefix (str) – Prefix used to name output files.
config_update (dict, optional) – Overrides applied on top of the default training configuration.
n_hvg (int, default 1000) – Number of HVGs to use when use_hvg=True.
cd_weight (float, default 0.02) – Cell-density token weight used by expression-weighted pooling.
use_hvg (bool, default True) – Whether to use HVG selection in tokenization/training steps.
use_batch (bool, default True) – Whether to enable batch-related options in annotation.
use_spatial (bool, default True) – Whether to enable spatial-related options in annotation.
weight_mode (str, default "expression") – Pooling mode for embedding aggregation.
force_tokenize (bool, default True) – Whether to force regeneration of intermediate tokenization outputs (project-specific).
do_fit (bool, default True) – Whether to fit/fine-tune the annotation model.
fit_epochs (int, default 500) – Number of epochs for fitting when do_fit=True.
shuffle_each_epoch (bool, default True) – Whether to shuffle samples each epoch during encoder training.
slice_sample (bool, default False) – If True, select one slice for training (project-specific behavior).
enc_mod (str, default "flowformer") – Encoder module variant.
save_model (bool, default True) – Whether to save the fitted model checkpoint.
save_model_path (str, optional) – Path to save the model checkpoint. If None, a default path is used.
do_train_encoder (bool, default True) – If True, train an encoder using encoder_adata_list before label transfer.
num_global_epochs (int, default 100) – Number of global epochs for multi-dataset encoder training.
per_dataset_epochs (int, default 50) – Number of epochs per dataset in multi-dataset encoder training.
label_key (str, default "cell_label") – Key in adata.obs used as the supervision label.
device (torch.device or str, optional) – Device to run on. If None, uses CUDA if available, else CPU.
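The fallback described for device (CUDA if available, otherwise CPU) can be sketched like this. The helper resolve_device and its cuda_available argument are hypothetical and exist only for illustration; the one library call used, torch.cuda.is_available(), is part of PyTorch. How the pipeline itself resolves the device is not shown in this reference.

```python
def resolve_device(device=None, cuda_available=None):
    """Pick a device the way the `device` parameter is documented to behave.

    `cuda_available` is an illustrative override so the logic can be
    exercised on machines without PyTorch or without a GPU.
    """
    if device is not None:
        # An explicit choice (str or torch.device) is passed through unchanged.
        return device
    if cuda_available is None:
        try:
            import torch  # optional here; only needed to query CUDA
            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False
    return "cuda" if cuda_available else "cpu"


chosen = resolve_device("cpu")
```

Passing device explicitly is the simplest way to keep runs reproducible across machines with different hardware.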

Returns:

target_adata (anndata.AnnData or dict) – Predicted target AnnData (or a dict of targets) returned by the internal annotation routine.
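A call might be assembled as in the sketch below, which reuses a pretrained encoder by setting do_train_encoder=False. Several things here are assumptions: the keys inside each dataset specification dict are not documented in this section, so "path" is a placeholder; all file paths are placeholders; and as_target_dict and run are hypothetical helpers added so that downstream code can treat the documented return value (AnnData or dict) uniformly.

```python
# Placeholder dataset specs: the expected dict keys are not documented here,
# so "path" is a hypothetical key used only for illustration.
encoder_specs = [{"path": "encoder_slice.h5ad"}]
source_specs = [{"path": "source_labeled.h5ad"}]
target_specs = [{"path": "target_unlabeled.h5ad"}]

pipeline_kwargs = dict(
    encoder_adata_list=encoder_specs,
    source_adata_list=source_specs,
    target_adata_list=target_specs,
    bb_ckpt_path="path/to/brainbeacon.ckpt",   # placeholder path
    cellplm_ckpt_path="path/to/cellplm.ckpt",  # placeholder path
    output_dir="outputs",
    output_prefix="demo",
    do_train_encoder=False,  # use cellplm_ckpt_path directly, as documented
    label_key="cell_label",
    device=None,             # documented fallback: CUDA if available, else CPU
)


def as_target_dict(result, default_name="target"):
    """Normalize the documented return type (AnnData or dict) to a dict."""
    if isinstance(result, dict):
        return result
    return {default_name: result}


def run(kwargs):
    # Imported lazily so the sketch can be read without brainbeacon installed.
    from brainbeacon.pipeline.cell_label_transfer import (
        run_label_transfer_pipeline,
    )
    return as_target_dict(run_label_transfer_pipeline(**kwargs))
```

Calling run(pipeline_kwargs) would then yield a dict of target objects whether the pipeline returned a single AnnData or several.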
