## Overview
The native LoRA training system consists of four nodes:

| Node | Category | Purpose |
|---|---|---|
| Train LoRA | training | Trains a LoRA model from latents and conditioning |
| Load LoRA Model | loaders | Applies trained LoRA weights to a model |
| Save LoRA Weights | loaders | Exports LoRA weights to a safetensors file |
| Plot Loss Graph | training | Visualizes training loss over time |
## Requirements
- A GPU with sufficient VRAM (training typically requires more memory than inference)
- Latent images (encoded from your training dataset)
- Text conditioning (captions for your training images)
## Basic training workflow
### Prepare your dataset
Encode your training images to latents using a VAE Encode node. Create text conditioning for each image using CLIP Text Encode.
### Configure the Train LoRA node
Connect your model, latents, and conditioning to the Train LoRA node. Set the training parameters:
- batch_size: Number of samples per training step (default: 1)
- steps: Total training iterations (default: 16)
- learning_rate: How quickly the model adapts (default: 0.0005)
- rank: LoRA rank - higher values capture more detail but use more memory (default: 8)
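To build intuition for the rank setting, a back-of-envelope calculation (plain Python, not ComfyUI code; the 4096-wide layer is a hypothetical example) shows how the trainable parameter count grows with rank:

```python
# For one linear layer of shape (d_out, d_in), LoRA adds two low-rank
# factors A (d_out x r) and B (r x d_in), so the added trainable
# parameter count scales linearly with rank r.
def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to a single (d_out, d_in) layer."""
    return d_out * rank + rank * d_in

full = 4096 * 4096  # fully fine-tuning one hypothetical 4096x4096 layer
for r in (8, 16, 64):
    added = lora_params(4096, 4096, r)
    print(f"rank {r}: {added:,} added params ({added / full:.2%} of full layer)")
```

Doubling the rank doubles the added parameters, which is why higher ranks capture more detail but also cost more memory.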
### Run training
Execute the workflow. The node will output:
- lora: The trained LoRA weights
- loss_map: Training loss history
- steps: Total steps completed
## Train LoRA node
The main training node that creates LoRA weights from your dataset.

### Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | MODEL | - | Base model to train the LoRA on |
| latents | LATENT | - | Encoded training images |
| positive | CONDITIONING | - | Text conditioning for training |
| batch_size | INT | 1 | Samples per step (1-10000) |
| grad_accumulation_steps | INT | 1 | Gradient accumulation steps (1-1024) |
| steps | INT | 16 | Training iterations (1-100000) |
| learning_rate | FLOAT | 0.0005 | Learning rate (0.0000001-1.0) |
| rank | INT | 8 | LoRA rank (1-128) |
| optimizer | COMBO | AdamW | Optimizer: AdamW, Adam, SGD, RMSprop |
| loss_function | COMBO | MSE | Loss function: MSE, L1, Huber, SmoothL1 |
| seed | INT | 0 | Random seed for reproducibility |
| training_dtype | COMBO | bf16 | Training precision: bf16, fp32 |
| lora_dtype | COMBO | bf16 | LoRA weight precision: bf16, fp32 |
| algorithm | COMBO | lora | Training algorithm (lora, lokr, oft, etc.) |
| gradient_checkpointing | BOOLEAN | true | Reduces VRAM usage during training |
| checkpoint_depth | INT | 1 | Depth level for gradient checkpointing (1-5) |
| offloading | BOOLEAN | false | Offload model to RAM (requires bypass mode) |
| existing_lora | COMBO | [None] | Continue training from an existing LoRA |
| bucket_mode | BOOLEAN | false | Enable resolution bucketing for multi-resolution datasets |
| bypass_mode | BOOLEAN | false | Apply adapters via hooks instead of weight modification |
### Outputs
| Output | Type | Description |
|---|---|---|
| lora | LORA_MODEL | Trained LoRA weights |
| loss_map | LOSS_MAP | Training loss history |
| steps | INT | Total training steps completed |
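The batch_size and grad_accumulation_steps inputs combine: gradients are averaged over grad_accumulation_steps micro-batches before each optimizer update, so the effective batch size is their product. A minimal sketch in plain Python (a one-parameter least-squares model, not ComfyUI's training loop) shows why the two are equivalent:

```python
# Hypothetical one-parameter model y = w * x with squared-error loss.
def grad(w, x, y):
    """d/dw of the per-sample loss (w*x - y)**2."""
    return 2 * (w * x - y) * x

samples = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
w0, lr = 0.5, 0.01

# Accumulate over 4 micro-batches of size 1, then take one step:
acc = 0.0
for x, y in samples:
    acc += grad(w0, x, y)
w_accumulated = w0 - lr * acc / len(samples)

# One step on a single batch of 4 with mean loss gives the same update:
mean_grad = sum(grad(w0, x, y) for x, y in samples) / len(samples)
w_batched = w0 - lr * mean_grad

assert abs(w_accumulated - w_batched) < 1e-12
```

This is why raising grad_accumulation_steps is a common way to get large-batch behavior without the VRAM cost of a large batch_size.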
## Load LoRA Model node
Applies trained LoRA weights to a diffusion model. Use this instead of the standard Load LoRA node when working with LoRA weights directly from the Train LoRA node.

### Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | MODEL | - | Base diffusion model |
| lora | LORA_MODEL | - | Trained LoRA weights |
| strength_model | FLOAT | 1.0 | LoRA strength (-100 to 100) |
| bypass | BOOLEAN | false | Apply LoRA without modifying base weights |
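Conceptually, strength_model scales the LoRA delta before it is combined with the base weights. A toy sketch with 2x2 nested-list matrices (illustrative only; real models apply this per layer to large tensors, often with an additional alpha/rank scale):

```python
# Toy illustration of strength scaling: output = W + strength * (A @ B).
def apply_lora(W, A, B, strength):
    """Return W + strength * (A @ B) for nested-list matrices."""
    rows, cols, r = len(W), len(W[0]), len(B)
    out = [row[:] for row in W]
    for i in range(rows):
        for j in range(cols):
            delta = sum(A[i][k] * B[k][j] for k in range(r))
            out[i][j] += strength * delta
    return out

W = [[1.0, 0.0], [0.0, 1.0]]  # base weights
A = [[1.0], [2.0]]            # shape (2, rank=1)
B = [[0.5, 0.25]]             # shape (rank=1, 2)
print(apply_lora(W, A, B, strength=1.0))  # [[1.5, 0.25], [1.0, 1.5]]
```

With strength 0.0 the base weights are returned unchanged, and negative strengths subtract the learned delta.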
### Output
| Output | Type | Description |
|---|---|---|
| model | MODEL | Model with LoRA applied |
## Save LoRA Weights node
Exports trained LoRA weights to a safetensors file in your output folder.

### Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| lora | LORA_MODEL | - | Trained LoRA weights to save |
| prefix | STRING | loras/ComfyUI_trained_lora | Output filename prefix |
| steps | INT | (optional) | Training steps for filename |
Saved files are named {prefix}_{steps}_steps_{counter}.safetensors and placed in your ComfyUI/output/loras/ folder.
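The naming pattern can be sketched as follows (plain Python; the 5-digit counter padding is an assumption, and in practice the counter is managed by the save logic itself, so check your actual filenames):

```python
# Sketch of the filename pattern described above. The zero-padded
# counter width is an assumption for illustration.
def lora_filename(prefix: str, steps: int, counter: int) -> str:
    return f"{prefix}_{steps}_steps_{counter:05d}.safetensors"

print(lora_filename("loras/ComfyUI_trained_lora", 200, 1))
# loras/ComfyUI_trained_lora_200_steps_00001.safetensors
```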
## Plot Loss Graph node
Visualizes training progress by plotting loss values over training steps.

### Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| loss | LOSS_MAP | - | Loss history from Train LoRA |
| filename_prefix | STRING | loss_graph | Output filename prefix |
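Per-step losses are noisy, so the raw curve can be hard to judge at a glance. If you export the loss values as a list of floats (the values below are made up for illustration), a simple moving average makes the trend easier to read:

```python
# Trailing moving average: each point averages up to `window` recent
# losses, smoothing per-step noise without shifting the curve length.
def moving_average(values, window=10):
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        chunk = values[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

noisy = [0.9, 0.7, 0.8, 0.5, 0.6, 0.4, 0.45, 0.3]  # made-up loss history
print(moving_average(noisy, window=4))
```

A smoothed curve that has plateaued is a common signal to stop training or lower the learning rate.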
## Training tips
### VRAM optimization
- Enable gradient_checkpointing to significantly reduce VRAM usage (enabled by default)
- Use bypass_mode when working with quantized models (FP8)
- Enable offloading to move the model to RAM during training (requires bypass_mode)
- Lower the batch_size if you encounter out-of-memory errors
### Dataset preparation
- Use consistent image dimensions when possible, or enable bucket_mode for multi-resolution training
- Match the number of conditioning inputs to the number of latent images
- Quality matters more than quantity—start with 10-20 high-quality images
### Training parameters
- rank: Start with 8-16 for most use cases. Higher ranks (32-64) capture more detail but may overfit
- steps: Start with 100-500 steps and monitor the loss graph
- learning_rate: The default 0.0005 works well for most cases. Lower values (0.0001) for more stable training
### Continuing training
Select an existing LoRA from the existing_lora dropdown to continue training from a previously saved checkpoint. The total step count will accumulate.

## Supported algorithms
The algorithm parameter supports multiple weight adapter types:

- lora: Standard Low-Rank Adaptation (recommended)
- lokr: LoCon with Kronecker product decomposition
- oft: Orthogonal Fine-Tuning
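The practical difference between these algorithms lies mostly in how the weight delta is factorized. As a rough back-of-envelope illustration (not the exact implementation), a Kronecker factorization like the one lokr uses can represent a large delta with far fewer parameters than storing it densely:

```python
# If delta = kron(C, D) with C of shape (m, n) and D of shape (p, q),
# only the entries of C and D need to be stored and trained.
def kron_params(m, n, p, q):
    """Parameters to store the two Kronecker factors C and D."""
    return m * n + p * q

full = (64 * 64) * (64 * 64)        # dense 4096 x 4096 delta
lokr = kron_params(64, 64, 64, 64)  # two 64 x 64 factors
print(full, lokr)                   # the factored form is vastly smaller
```

The trade-off is expressiveness: a Kronecker-structured delta can represent fewer weight changes than an unconstrained low-rank one of the same size.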
## Example: Single-subject LoRA
A minimal workflow for training a LoRA on a specific subject:

1. Load your training images with Load Image
2. Encode images to latents with VAE Encode
3. Create captions with CLIP Text Encode (e.g., “a photo of [subject]”)
4. Connect to Train LoRA with:
   - steps: 200
   - rank: 16
   - learning_rate: 0.0001
5. Save with Save LoRA Weights
6. Test with Load LoRA Model connected to your inference workflow
For training on multiple images with different captions, connect multiple conditioning inputs to match your latent batch size.
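That pairing rule can be sanity-checked with a few lines of plain Python (the helper and names are illustrative, not part of any node):

```python
# Illustrative check: training expects one caption per latent image.
def pair_dataset(latents, captions):
    if len(latents) != len(captions):
        raise ValueError(f"{len(latents)} latents but {len(captions)} captions")
    return list(zip(latents, captions))

pairs = pair_dataset(["lat_0", "lat_1"],
                     ["a photo of [subject]", "[subject] outdoors"])
print(len(pairs))  # 2
```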