Interactive and resumable training¶
Most of the time, you will be training models through the GUI or using the sleap-nn train CLI.
If you'd like to customize the training process, however, you can use sleap-nn's low-level training functionality interactively. This lets you write scripts that train models according to your own workflow, for example to resume training of an already trained model, or to train with transfer learning, where a pretrained model is used to initialize the weights of the new model.
In this notebook we will explore how to set up a training job and train a model for multiple rounds without the GUI or CLI.
1. Setup¶
Run this cell first to install sleap-nn. If you get a dependency error in subsequent cells, just click Runtime → Restart runtime to reload the packages.
Don't forget to set Runtime → Change runtime type → GPU as the accelerator.
!pip install -qqq "sleap-nn[torch-cpu]"
# if you have GPU (in colab, enable GPU runtime)
# !pip install -qqq "sleap-nn[torch-cuda-128]"
Import sleap-nn to make sure it installed correctly and print out the version:
import sleap_nn
sleap_nn.__version__
'0.0.1'
2. Setup training data¶
Here we will download an existing training dataset package. This is an .slp file that contains both the labeled poses, as well as the image data for labeled frames.
If running on Google Colab, you'll want to replace this with mounting your Google Drive folder containing your own data, or if running locally, simply change the path to your labels below in TRAINING_SLP_FILE.
!curl -L --output labels.pkg.slp https://storage.googleapis.com/sleap-data/datasets/wt_gold.13pt/tracking_split2/train.pkg.slp
!ls -lah
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 619M 100 619M 0 0 52.1M 0 0:00:11 0:00:11 --:--:-- 57.3M
total 1283584
drwxr-xr-x@ 14 divyasesh staff 448B Sep 24 19:47 .
drwxr-xr-x@ 15 divyasesh staff 480B Sep 24 17:10 ..
-rw-r--r--@ 1 divyasesh staff 713K Sep 22 10:30 Analysis_examples.ipynb
-rw-r--r--@ 1 divyasesh staff 462K Sep 22 12:05 Data_structures.ipynb
-rw-r--r--@ 1 divyasesh staff 175K Sep 22 12:07 Interactive_and_realtime_inference.ipynb
-rw-r--r--@ 1 divyasesh staff 62K Sep 24 19:47 Interactive_and_resumable_training.ipynb
-rw-r--r--@ 1 divyasesh staff 159K Sep 23 16:54 Model_evaluation.ipynb
-rw-r--r--@ 1 divyasesh staff 127K Sep 22 12:06 Post_inference_tracking.ipynb
-rw-r--r--@ 1 divyasesh staff 471K Sep 19 19:36 SLEAP_Tutorial_at_Cosyne_2024_Using_exported_data.ipynb
-rw-r--r--@ 1 divyasesh staff 95K Sep 24 00:13 Training_and_inference_on_an_example_dataset.ipynb
-rw-r--r--@ 1 divyasesh staff 12K Sep 23 12:19 Training_and_inference_using_Google_Drive.ipynb
-rw-r--r--@ 1 divyasesh staff 619M Sep 24 19:47 labels.pkg.slp
-rw-r--r--@ 1 divyasesh staff 3.4K Sep 22 11:01 notebooks-overview.md
-rw-r--r--@ 1 divyasesh staff 16K Sep 19 19:36 sleap_io_idtracker_IDs.ipynb
TRAINING_SLP_FILE = "labels.pkg.slp"
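Before moving on, it can help to sanity-check that the labels file actually exists at the path you set. This is plain-Python convenience code, not part of sleap-nn:

```python
from pathlib import Path

TRAINING_SLP_FILE = "labels.pkg.slp"  # as defined above

labels_path = Path(TRAINING_SLP_FILE)
if labels_path.exists():
    size_mb = labels_path.stat().st_size / 1e6
    print(f"Found {labels_path} ({size_mb:.1f} MB)")
else:
    print(f"{labels_path} not found; check the download or the path above.")
```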
3. Setup training job¶
A sleap-nn TrainingJobConfig is a structure that contains all of the hyperparameters needed to train a SLEAP model. This is typically saved out to initial_config.yaml and training_config.yaml in the model folder so that training runs can be reproduced if needed, as well as to store metadata necessary for inference.
Normally, these are generated interactively by the GUI, or manually by editing an existing YAML file in a text editor. Here, we will define a configuration entirely in Python.
from sleap_nn.config.training_job_config import TrainingJobConfig, verify_training_cfg
from sleap_nn.config.model_config import UNetConfig, CenteredInstanceConfMapsConfig, CenteredInstanceConfig
from sleap_nn.config.data_config import AugmentationConfig, GeometricConfig
# Initialize the default training job configuration.
cfg = TrainingJobConfig()
# Update path to training data we just downloaded.
cfg.data_config.train_labels_path = [TRAINING_SLP_FILE]
cfg.data_config.validation_fraction = 0.1
# This configures the neural network backbone and the model type:
cfg.model_config.backbone_config.unet = UNetConfig(
filters=16,
output_stride=4
)
centered_head = CenteredInstanceConfig(confmaps=CenteredInstanceConfMapsConfig(anchor_part="thorax", sigma=1.5, output_stride=4))
cfg.model_config.head_configs.centered_instance = centered_head
# Preprocessing and training parameters.
cfg.data_config.augmentation_config = AugmentationConfig(geometric=GeometricConfig(affine_p=1.0))
cfg.trainer_config.max_epochs = 5 # This is the maximum number of training rounds.
cfg.trainer_config.train_data_loader.batch_size = 4
cfg.trainer_config.val_data_loader.batch_size = 4
# Setup how we want to save the trained model.
cfg.trainer_config.save_ckpt = True
cfg.trainer_config.run_name = "baseline_model.topdown_centered_instance"
# verify config structure
cfg = verify_training_cfg(cfg)
cfg
{'data_config': {'train_labels_path': ['labels.pkg.slp'], 'val_labels_path': None, 'validation_fraction': 0.1, 'test_file_path': None, 'provider': 'LabelsReader', 'user_instances_only': True, 'data_pipeline_fw': 'torch_dataset', 'cache_img_path': None, 'use_existing_imgs': False, 'delete_cache_imgs_after_training': True, 'preprocessing': {'ensure_rgb': False, 'ensure_grayscale': False, 'max_height': None, 'max_width': None, 'scale': 1.0, 'crop_size': None, 'min_crop_size': 100}, 'use_augmentations_train': False, 'augmentation_config': {'intensity': None, 'geometric': {'rotation_min': -15.0, 'rotation_max': 15.0, 'scale_min': 0.9, 'scale_max': 1.1, 'translate_width': 0.0, 'translate_height': 0.0, 'affine_p': 1.0, 'erase_scale_min': 0.0001, 'erase_scale_max': 0.01, 'erase_ratio_min': 1.0, 'erase_ratio_max': 1.0, 'erase_p': 0.0, 'mixup_lambda_min': 0.01, 'mixup_lambda_max': 0.05, 'mixup_p': 0.0}}, 'skeletons': None}, 'model_config': {'init_weights': 'default', 'pretrained_backbone_weights': None, 'pretrained_head_weights': None, 'backbone_config': {'unet': {'in_channels': 1, 'kernel_size': 3, 'filters': 16, 'filters_rate': 1.5, 'max_stride': 16, 'stem_stride': None, 'middle_block': True, 'up_interpolate': True, 'stacks': 1, 'convs_per_block': 2, 'output_stride': 4}, 'convnext': None, 'swint': None}, 'head_configs': {'single_instance': None, 'centroid': None, 'centered_instance': {'confmaps': {'part_names': None, 'anchor_part': 'thorax', 'sigma': 1.5, 'output_stride': 4, 'loss_weight': 1.0}}, 'bottomup': None, 'multi_class_bottomup': None, 'multi_class_topdown': None}, 'total_params': None}, 'trainer_config': {'train_data_loader': {'batch_size': 4, 'shuffle': False, 'num_workers': 0}, 'val_data_loader': {'batch_size': 4, 'shuffle': False, 'num_workers': 0}, 'model_ckpt': {'save_top_k': 1, 'save_last': None}, 'trainer_devices': None, 'trainer_device_indices': None, 'trainer_accelerator': 'auto', 'profiler': None, 'trainer_strategy': 'auto', 'enable_progress_bar': True, 
'min_train_steps_per_epoch': 200, 'train_steps_per_epoch': None, 'visualize_preds_during_training': False, 'keep_viz': False, 'max_epochs': 5, 'seed': None, 'use_wandb': False, 'save_ckpt': True, 'ckpt_dir': '.', 'run_name': 'baseline_model.topdown_centered_instance', 'resume_ckpt_path': None, 'wandb': {'entity': None, 'project': None, 'name': None, 'save_viz_imgs_wandb': False, 'api_key': None, 'wandb_mode': None, 'prv_runid': None, 'group': None, 'current_run_id': None}, 'optimizer_name': 'Adam', 'optimizer': {'lr': 0.001, 'amsgrad': False}, 'lr_scheduler': None, 'early_stopping': {'min_delta': 0.0, 'patience': 1, 'stop_training_on_plateau': False}, 'online_hard_keypoint_mining': {'online_mining': False, 'hard_to_easy_ratio': 2.0, 'min_hard_keypoints': 2, 'max_hard_keypoints': None, 'loss_scale': 5.0}, 'zmq': {'controller_port': None, 'controller_polling_timeout': 10, 'publish_port': None}}, 'name': '', 'description': '', 'sleap_nn_version': '0.0.1', 'filename': ''}
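As a quick sanity check on the `output_stride` setting: confidence maps are produced at 1/`output_stride` of the input resolution, so a 144×144 crop with stride 4 yields 36×36 maps. The helper below is just illustrative arithmetic, not a sleap-nn function:

```python
def confmap_shape(image_hw, output_stride):
    """Spatial shape of the confidence maps for a given output stride."""
    h, w = image_hw
    return (h // output_stride, w // output_stride)

print(confmap_shape((144, 144), 4))  # → (36, 36)
```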
Existing configs can also be loaded from a .yaml file with:
from omegaconf import OmegaConf
cfg = OmegaConf.load("training_config.yaml")
4. Training¶
Next we will create a SLEAP trainer from the configuration we just specified. This handles all the nitty-gritty mechanics necessary to set up training in the backend.
from sleap_nn.training.model_trainer import ModelTrainer
trainer = ModelTrainer.get_model_trainer_from_config(cfg)
2025-09-24 20:02:09 | INFO | sleap_nn.training.model_trainer:_setup_train_val_labels:216 | Creating train-val split... 2025-09-24 20:02:09 | INFO | sleap_nn.training.model_trainer:_setup_train_val_labels:261 | # Train Labeled frames: 1440 2025-09-24 20:02:09 | INFO | sleap_nn.training.model_trainer:_setup_train_val_labels:262 | # Val Labeled frames: 160 2025-09-24 20:02:09 | INFO | sleap_nn.training.model_trainer:setup_config:512 | Setting up config...
Great, now we're ready to do the first round of training. This is when the model will actually start to improve over time:
trainer.train()
GPU available: True (mps), used: True TPU available: False, using: 0 TPU cores HPU available: False, using: 0 HPUs /Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/torch/utils/data/dataloader.py:684: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, then device pinned memory won't be used. warnings.warn(warn_msg) /Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:751: Checkpoint directory /Users/divyasesh/Desktop/talmolab/sleap-core-docs/docs/notebooks/baseline_model.topdown_centered_instance exists and is not empty. | Name | Type | Params | Mode ----------------------------------------------------------------------- 0 | model | Model | 306 K | train 1 | instance_peaks_inf_layer | FindInstancePeaks | 0 | train ----------------------------------------------------------------------- 306 K Trainable params 0 Non-trainable params 306 K Total params 1.226 Total estimated model params size (MB) 66 Modules in train mode 0 Modules in eval mode
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:train:849 | Setting up for training...
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:_setup_model_ckpt_dir:575 | Setting up model ckpt dir: `baseline_model.topdown_centered_instance`...
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:train:868 | Setting up Trainer...
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:_setup_loggers_callbacks:647 | Setting up callbacks and loggers...
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:train:897 | Trainer devices: auto
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:train:950 | Training on 1 device(s)
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:train:951 | Training on mps:0 accelerator
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:train:955 | Setting up lightning module for centered_instance model...
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:train:959 | Backbone model: UNet(
(encoders): ModuleList(
(0): Encoder(
(encoder_stack): ModuleList(
(0): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc0_conv0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc0_act0_relu): ReLU()
(stack0_enc0_conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc0_act1_relu): ReLU()
)
)
(1): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc1_pool): MaxPool2dWithSamePadding(kernel_size=2, stride=2, padding=same, dilation=1, ceil_mode=False)
(stack0_enc1_conv0): Conv2d(16, 24, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc1_act0_relu): ReLU()
(stack0_enc1_conv1): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc1_act1_relu): ReLU()
)
)
(2): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc2_pool): MaxPool2dWithSamePadding(kernel_size=2, stride=2, padding=same, dilation=1, ceil_mode=False)
(stack0_enc2_conv0): Conv2d(24, 36, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc2_act0_relu): ReLU()
(stack0_enc2_conv1): Conv2d(36, 36, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc2_act1_relu): ReLU()
)
)
(3): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc3_pool): MaxPool2dWithSamePadding(kernel_size=2, stride=2, padding=same, dilation=1, ceil_mode=False)
(stack0_enc3_conv0): Conv2d(36, 54, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc3_act0_relu): ReLU()
(stack0_enc3_conv1): Conv2d(54, 54, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc3_act1_relu): ReLU()
)
)
(4): Sequential(
(stack0_enc4_last_pool): MaxPool2dWithSamePadding(kernel_size=2, stride=2, padding=same, dilation=1, ceil_mode=False)
)
)
)
)
(decoders): ModuleList(
(0): Decoder(
(decoder_stack): ModuleList(
(0): SimpleUpsamplingBlock(
(blocks): Sequential(
(stack0_dec0_s16_to_s8_interp_bilinear): Upsample(scale_factor=2.0, mode='bilinear')
(stack0_dec0_s16_to_s8_refine_conv0): Conv2d(135, 54, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_dec0_s16_to_s8_refine_conv0_act_relu): ReLU()
(stack0_dec0_s16_to_s8_refine_conv1): Conv2d(54, 54, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_dec0_s16_to_s8_refine_conv1_act_relu): ReLU()
)
)
(1): SimpleUpsamplingBlock(
(blocks): Sequential(
(stack0_dec1_s8_to_s4_interp_bilinear): Upsample(scale_factor=2.0, mode='bilinear')
(stack0_dec1_s8_to_s4_refine_conv0): Conv2d(90, 36, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_dec1_s8_to_s4_refine_conv0_act_relu): ReLU()
(stack0_dec1_s8_to_s4_refine_conv1): Conv2d(36, 36, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_dec1_s8_to_s4_refine_conv1_act_relu): ReLU()
)
)
)
)
)
(middle_blocks): ModuleList(
(0): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc5_middle_expand_conv0): Conv2d(54, 81, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc5_middle_expand_act0_relu): ReLU()
)
)
(1): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc6_middle_contract_conv0): Conv2d(81, 81, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc6_middle_contract_act0_relu): ReLU()
)
)
)
)
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:train:960 | Head model: ModuleList(
(0): Sequential(
(CenteredInstanceConfmapsHead): Sequential(
(0): Conv2d(36, 13, kernel_size=(1, 1), stride=(1, 1), padding=same)
(1): Identity()
)
)
)
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:train:962 | Total model parameters: 306444
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:train:967 | Input image shape: torch.Size([1, 1, 144, 144])
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:train:1021 | Finished trainer set up. [0.1s]
2025-09-24 20:02:18 | INFO | sleap_nn.training.model_trainer:train:1024 | Starting training loop...
Sanity Checking: | | 0/? [00:00<?, ?it/s]
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('learning_rate', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('val_loss', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('val_time', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
Training: | | 0/? [00:00<?, ?it/s]
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('head', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('thorax', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('abdomen', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('wingL', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('wingR', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('forelegL4', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('forelegR4', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('midlegL4', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('midlegR4', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('hindlegL4', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('hindlegR4', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('eyeL', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('eyeR', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('train_loss', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
Validation: | | 0/? [00:00<?, ?it/s]
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('train_time', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
Validation: | | 0/? [00:00<?, ?it/s]
Validation: | | 0/? [00:00<?, ?it/s]
Validation: | | 0/? [00:00<?, ?it/s]
Validation: | | 0/? [00:00<?, ?it/s]
`Trainer.fit` stopped: `max_epochs=5` reached.
2025-09-24 20:06:45 | INFO | sleap_nn.training.model_trainer:train:1037 | Finished training loop. [4.5 min]
5. Continuing training¶
We can continue training by setting resume_ckpt_path to the previous checkpoint, optionally with a different number of epochs:
from pathlib import Path
cfg.trainer_config.max_epochs = 10  # previous epochs + additional epochs
cfg.trainer_config.resume_ckpt_path = Path(cfg.trainer_config.ckpt_dir) / f"{cfg.trainer_config.run_name}" / "best.ckpt"
trainer = ModelTrainer.get_model_trainer_from_config(cfg)
print(len(trainer.train_labels))
trainer.train()
2025-09-24 20:06:45 | INFO | sleap_nn.training.model_trainer:_setup_train_val_labels:216 | Creating train-val split... 2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:_setup_train_val_labels:261 | # Train Labeled frames: 1440 2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:_setup_train_val_labels:262 | # Val Labeled frames: 160 2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:setup_config:512 | Setting up config... 2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:_setup_ckpt_path:380 | Checkpoint path already exists: baseline_model.topdown_centered_instance... adding suffix to prevent overwriting. 1 2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:train:849 | Setting up for training... 2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:_setup_model_ckpt_dir:575 | Setting up model ckpt dir: `baseline_model.topdown_centered_instance-1`...
GPU available: True (mps), used: True TPU available: False, using: 0 TPU cores HPU available: False, using: 0 HPUs /Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:751: Checkpoint directory /Users/divyasesh/Desktop/talmolab/sleap-core-docs/docs/notebooks/baseline_model.topdown_centered_instance-1 exists and is not empty. Restoring states from the checkpoint path at baseline_model.topdown_centered_instance/best.ckpt /Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:445: The dirpath has changed from '/Users/divyasesh/Desktop/talmolab/sleap-core-docs/docs/notebooks/baseline_model.topdown_centered_instance' to '/Users/divyasesh/Desktop/talmolab/sleap-core-docs/docs/notebooks/baseline_model.topdown_centered_instance-1', therefore `best_model_score`, `kth_best_model_path`, `kth_value`, `last_model_path` and `best_k_models` won't be reloaded. Only `best_model_path` will be reloaded. | Name | Type | Params | Mode ----------------------------------------------------------------------- 0 | model | Model | 306 K | train 1 | instance_peaks_inf_layer | FindInstancePeaks | 0 | train ----------------------------------------------------------------------- 306 K Trainable params 0 Non-trainable params 306 K Total params 1.226 Total estimated model params size (MB) 66 Modules in train mode 0 Modules in eval mode Restored all states from the checkpoint at baseline_model.topdown_centered_instance/best.ckpt
2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:train:868 | Setting up Trainer...
2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:_setup_loggers_callbacks:647 | Setting up callbacks and loggers...
2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:train:897 | Trainer devices: auto
2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:train:950 | Training on 1 device(s)
2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:train:951 | Training on mps:0 accelerator
2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:train:955 | Setting up lightning module for centered_instance model...
2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:train:959 | Backbone model: UNet(
(encoders): ModuleList(
(0): Encoder(
(encoder_stack): ModuleList(
(0): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc0_conv0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc0_act0_relu): ReLU()
(stack0_enc0_conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc0_act1_relu): ReLU()
)
)
(1): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc1_pool): MaxPool2dWithSamePadding(kernel_size=2, stride=2, padding=same, dilation=1, ceil_mode=False)
(stack0_enc1_conv0): Conv2d(16, 24, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc1_act0_relu): ReLU()
(stack0_enc1_conv1): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc1_act1_relu): ReLU()
)
)
(2): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc2_pool): MaxPool2dWithSamePadding(kernel_size=2, stride=2, padding=same, dilation=1, ceil_mode=False)
(stack0_enc2_conv0): Conv2d(24, 36, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc2_act0_relu): ReLU()
(stack0_enc2_conv1): Conv2d(36, 36, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc2_act1_relu): ReLU()
)
)
(3): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc3_pool): MaxPool2dWithSamePadding(kernel_size=2, stride=2, padding=same, dilation=1, ceil_mode=False)
(stack0_enc3_conv0): Conv2d(36, 54, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc3_act0_relu): ReLU()
(stack0_enc3_conv1): Conv2d(54, 54, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc3_act1_relu): ReLU()
)
)
(4): Sequential(
(stack0_enc4_last_pool): MaxPool2dWithSamePadding(kernel_size=2, stride=2, padding=same, dilation=1, ceil_mode=False)
)
)
)
)
(decoders): ModuleList(
(0): Decoder(
(decoder_stack): ModuleList(
(0): SimpleUpsamplingBlock(
(blocks): Sequential(
(stack0_dec0_s16_to_s8_interp_bilinear): Upsample(scale_factor=2.0, mode='bilinear')
(stack0_dec0_s16_to_s8_refine_conv0): Conv2d(135, 54, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_dec0_s16_to_s8_refine_conv0_act_relu): ReLU()
(stack0_dec0_s16_to_s8_refine_conv1): Conv2d(54, 54, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_dec0_s16_to_s8_refine_conv1_act_relu): ReLU()
)
)
(1): SimpleUpsamplingBlock(
(blocks): Sequential(
(stack0_dec1_s8_to_s4_interp_bilinear): Upsample(scale_factor=2.0, mode='bilinear')
(stack0_dec1_s8_to_s4_refine_conv0): Conv2d(90, 36, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_dec1_s8_to_s4_refine_conv0_act_relu): ReLU()
(stack0_dec1_s8_to_s4_refine_conv1): Conv2d(36, 36, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_dec1_s8_to_s4_refine_conv1_act_relu): ReLU()
)
)
)
)
)
(middle_blocks): ModuleList(
(0): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc5_middle_expand_conv0): Conv2d(54, 81, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc5_middle_expand_act0_relu): ReLU()
)
)
(1): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc6_middle_contract_conv0): Conv2d(81, 81, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc6_middle_contract_act0_relu): ReLU()
)
)
)
)
2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:train:960 | Head model: ModuleList(
(0): Sequential(
(CenteredInstanceConfmapsHead): Sequential(
(0): Conv2d(36, 13, kernel_size=(1, 1), stride=(1, 1), padding=same)
(1): Identity()
)
)
)
2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:train:962 | Total model parameters: 306444
2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:train:967 | Input image shape: torch.Size([1, 1, 144, 144])
2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:train:1021 | Finished trainer set up. [0.3s]
2025-09-24 20:06:46 | INFO | sleap_nn.training.model_trainer:train:1024 | Starting training loop...
Sanity Checking: | | 0/? [00:00<?, ?it/s]
Training: | | 0/? [00:00<?, ?it/s]
Validation: | | 0/? [00:00<?, ?it/s]
Validation: | | 0/? [00:00<?, ?it/s]
Validation: | | 0/? [00:00<?, ?it/s]
Validation: | | 0/? [00:00<?, ?it/s]
Validation: | | 0/? [00:00<?, ?it/s]
`Trainer.fit` stopped: `max_epochs=10` reached.
2025-09-24 20:11:35 | INFO | sleap_nn.training.model_trainer:train:1037 | Finished training loop. [4.8 min]
As you can see, training picks up from where it left off in the previous run.
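The difference between resuming a run and initializing from pretrained weights can be sketched with a toy checkpoint dict. This is a hypothetical structure for illustration only, not the real Lightning checkpoint format:

```python
# Toy checkpoint as a plain dict (hypothetical; NOT the real checkpoint format).
checkpoint = {
    "epoch": 5,
    "model_weights": {"conv0": [0.1, 0.2]},
    "optimizer_state": {"lr": 1e-3},
}

def resume(ckpt):
    # Resuming (resume_ckpt_path): restore everything and keep counting epochs.
    return ckpt["model_weights"], ckpt["optimizer_state"], ckpt["epoch"]

def init_from_pretrained(ckpt):
    # Transfer learning (pretrained_*_weights): keep only the weights;
    # the optimizer state and epoch counter start fresh.
    return ckpt["model_weights"], None, 0

weights, opt_state, start_epoch = resume(checkpoint)
print(start_epoch)  # → 5
weights, opt_state, start_epoch = init_from_pretrained(checkpoint)
print(start_epoch)  # → 0
```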
Often, though, you'll want to continue training because you're starting from an already trained model rather than resuming the exact same run.
In this case, all you need to do is create a new trainer from the existing model configuration and point the pretrained weight paths at the previous checkpoint, as below:
cfg.trainer_config.resume_ckpt_path = None
cfg.trainer_config.max_epochs = 5
cfg.model_config.pretrained_backbone_weights = "baseline_model.topdown_centered_instance-1/best.ckpt"
cfg.model_config.pretrained_head_weights = "baseline_model.topdown_centered_instance-1/best.ckpt"
# Create and initialize the trainer.
trainer = ModelTrainer.get_model_trainer_from_config(cfg)
trainer.train()
# This won't resume the previous run; it loads the weights from the previous
# model and starts a fresh training run from there.
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:_setup_train_val_labels:216 | Creating train-val split... 2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:_setup_train_val_labels:261 | # Train Labeled frames: 1440 2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:_setup_train_val_labels:262 | # Val Labeled frames: 160 2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:setup_config:512 | Setting up config...
GPU available: True (mps), used: True TPU available: False, using: 0 TPU cores HPU available: False, using: 0 HPUs /Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/torch/utils/data/dataloader.py:684: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, then device pinned memory won't be used. warnings.warn(warn_msg) /Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:751: Checkpoint directory /Users/divyasesh/Desktop/talmolab/sleap-core-docs/docs/notebooks/baseline_model.topdown_centered_instance-2 exists and is not empty. | Name | Type | Params | Mode ----------------------------------------------------------------------- 0 | model | Model | 306 K | train 1 | instance_peaks_inf_layer | FindInstancePeaks | 0 | train ----------------------------------------------------------------------- 306 K Trainable params 0 Non-trainable params 306 K Total params 1.226 Total estimated model params size (MB) 66 Modules in train mode 0 Modules in eval mode
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:_setup_ckpt_path:380 | Checkpoint path already exists: baseline_model.topdown_centered_instance... adding suffix to prevent overwriting.
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:train:849 | Setting up for training...
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:_setup_model_ckpt_dir:575 | Setting up model ckpt dir: `baseline_model.topdown_centered_instance-2`...
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:train:868 | Setting up Trainer...
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:_setup_loggers_callbacks:647 | Setting up callbacks and loggers...
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:train:897 | Trainer devices: auto
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:train:950 | Training on 1 device(s)
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:train:951 | Training on mps:0 accelerator
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:train:955 | Setting up lightning module for centered_instance model...
2025-09-24 20:17:58 | INFO | sleap_nn.training.lightning_modules:__init__:197 | Loading backbone weights from `baseline_model.topdown_centered_instance-1/best.ckpt` ...
2025-09-24 20:17:58 | INFO | sleap_nn.training.lightning_modules:__init__:226 | Loading head weights from `baseline_model.topdown_centered_instance-1/best.ckpt` ...
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:train:959 | Backbone model: UNet(
(encoders): ModuleList(
(0): Encoder(
(encoder_stack): ModuleList(
(0): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc0_conv0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc0_act0_relu): ReLU()
(stack0_enc0_conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc0_act1_relu): ReLU()
)
)
(1): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc1_pool): MaxPool2dWithSamePadding(kernel_size=2, stride=2, padding=same, dilation=1, ceil_mode=False)
(stack0_enc1_conv0): Conv2d(16, 24, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc1_act0_relu): ReLU()
(stack0_enc1_conv1): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc1_act1_relu): ReLU()
)
)
(2): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc2_pool): MaxPool2dWithSamePadding(kernel_size=2, stride=2, padding=same, dilation=1, ceil_mode=False)
(stack0_enc2_conv0): Conv2d(24, 36, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc2_act0_relu): ReLU()
(stack0_enc2_conv1): Conv2d(36, 36, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc2_act1_relu): ReLU()
)
)
(3): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc3_pool): MaxPool2dWithSamePadding(kernel_size=2, stride=2, padding=same, dilation=1, ceil_mode=False)
(stack0_enc3_conv0): Conv2d(36, 54, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc3_act0_relu): ReLU()
(stack0_enc3_conv1): Conv2d(54, 54, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc3_act1_relu): ReLU()
)
)
(4): Sequential(
(stack0_enc4_last_pool): MaxPool2dWithSamePadding(kernel_size=2, stride=2, padding=same, dilation=1, ceil_mode=False)
)
)
)
)
(decoders): ModuleList(
(0): Decoder(
(decoder_stack): ModuleList(
(0): SimpleUpsamplingBlock(
(blocks): Sequential(
(stack0_dec0_s16_to_s8_interp_bilinear): Upsample(scale_factor=2.0, mode='bilinear')
(stack0_dec0_s16_to_s8_refine_conv0): Conv2d(135, 54, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_dec0_s16_to_s8_refine_conv0_act_relu): ReLU()
(stack0_dec0_s16_to_s8_refine_conv1): Conv2d(54, 54, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_dec0_s16_to_s8_refine_conv1_act_relu): ReLU()
)
)
(1): SimpleUpsamplingBlock(
(blocks): Sequential(
(stack0_dec1_s8_to_s4_interp_bilinear): Upsample(scale_factor=2.0, mode='bilinear')
(stack0_dec1_s8_to_s4_refine_conv0): Conv2d(90, 36, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_dec1_s8_to_s4_refine_conv0_act_relu): ReLU()
(stack0_dec1_s8_to_s4_refine_conv1): Conv2d(36, 36, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_dec1_s8_to_s4_refine_conv1_act_relu): ReLU()
)
)
)
)
)
(middle_blocks): ModuleList(
(0): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc5_middle_expand_conv0): Conv2d(54, 81, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc5_middle_expand_act0_relu): ReLU()
)
)
(1): SimpleConvBlock(
(blocks): Sequential(
(stack0_enc6_middle_contract_conv0): Conv2d(81, 81, kernel_size=(3, 3), stride=(1, 1), padding=same)
(stack0_enc6_middle_contract_act0_relu): ReLU()
)
)
)
)
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:train:960 | Head model: ModuleList(
(0): Sequential(
(CenteredInstanceConfmapsHead): Sequential(
(0): Conv2d(36, 13, kernel_size=(1, 1), stride=(1, 1), padding=same)
(1): Identity()
)
)
)
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:train:962 | Total model parameters: 306444
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:train:967 | Input image shape: torch.Size([1, 1, 144, 144])
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:train:1021 | Finished trainer set up. [0.2s]
2025-09-24 20:17:58 | INFO | sleap_nn.training.model_trainer:train:1024 | Starting training loop...
Sanity Checking: | | 0/? [00:00<?, ?it/s]
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('learning_rate', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('val_loss', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('val_time', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
Training: | | 0/? [00:00<?, ?it/s]
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('head', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('thorax', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('abdomen', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('wingL', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('wingR', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('forelegL4', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('forelegR4', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('midlegL4', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('midlegR4', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('hindlegL4', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('hindlegR4', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('eyeL', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('eyeR', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('train_loss', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
Validation: | | 0/? [00:00<?, ?it/s]
/Users/divyasesh/Desktop/talmolab/sleap-core-docs/.venv/lib/python3.13/site-packages/lightning/pytorch/core/module.py:520: You called `self.log('train_time', ..., logger=True)` but have no logger configured. You can enable one by doing `Trainer(logger=ALogger(...))`
`Trainer.fit` stopped: `max_epochs=5` reached.
2025-09-24 20:22:14 | INFO | sleap_nn.training.model_trainer:train:1037 | Finished training loop. [4.3 min]
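The architecture printout above fully determines the reported parameter count. As a quick sanity check (plain Python, no sleap-nn required), summing weights and biases over every `Conv2d` listed reproduces the logged total exactly; note that pooling, upsampling, `ReLU`, and `Identity` layers contribute no parameters:

```python
# Verify "Total model parameters: 306444" from the printed architecture.
# A Conv2d(in, out, k x k) with bias contributes out * in * k * k weights + out biases.

def conv_params(in_ch, out_ch, k):
    """Parameter count of a single Conv2d layer with bias."""
    return out_ch * in_ch * k * k + out_ch

# (in_channels, out_channels, kernel_size) for every conv in the printout
convs = [
    # encoder blocks
    (1, 16, 3), (16, 16, 3),
    (16, 24, 3), (24, 24, 3),
    (24, 36, 3), (36, 36, 3),
    (36, 54, 3), (54, 54, 3),
    # middle blocks
    (54, 81, 3), (81, 81, 3),
    # decoder blocks (inputs include skip-connection channels, e.g. 81 + 54 = 135)
    (135, 54, 3), (54, 54, 3),
    (90, 36, 3), (36, 36, 3),
    # CenteredInstanceConfmapsHead: 13 nodes, 1x1 conv
    (36, 13, 1),
]

total = sum(conv_params(*c) for c in convs)
print(total)  # 306444, matching the log
```

The decoder input widths also explain themselves this way: each upsampling block concatenates the upsampled feature map with the matching encoder skip connection (81 + 54 = 135, 54 + 36 = 90).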
The resulting model can be used as usual for inference on new data.