Steps

  • Upscale marseys with swin2sr (Colab) and place the output in the upscaled directory.
  • Run preprocess_training_data.py to create the training sets and metadata.jsonl files.
  • Run the training script (see Dreambooth or Normal fine-tuning).
  • Modify upload_to_huggingface.py and run it.
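
preprocess_training_data.py itself is not reproduced in this readme. As a rough sketch of what its metadata step might look like, the snippet below writes a metadata.jsonl in the layout read by the Hugging Face datasets imagefolder loader: one JSON object per line, with a file_name key and a caption column. The helper name write_metadata, the "text" column name, and the use of the file stem as the caption are assumptions for illustration, not taken from the real script.

```python
import json
from pathlib import Path


def write_metadata(image_dir, caption_column="text"):
    """Write metadata.jsonl next to the PNGs in image_dir (hypothetical helper)."""
    image_dir = Path(image_dir)
    out_path = image_dir / "metadata.jsonl"
    images = sorted(image_dir.glob("*.png"))
    with out_path.open("w", encoding="utf-8") as f:
        for image in images:
            # Assumption: the caption is just the file stem; the real script may differ.
            row = {"file_name": image.name, caption_column: image.stem}
            f.write(json.dumps(row) + "\n")
    # Number of images that received a metadata row.
    return len(images)
```

Calling write_metadata("training-colors") would then produce training-colors/metadata.jsonl with one row per PNG in that directory.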

Environment setup

sudo apt -y install fonts-dejavu-core ttf-mscorefonts-installer ffmpeg libsm6 libxext6 fonts-roboto build-essential

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh

conda install pytorch torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
conda install xformers -c xformers/label/dev
pip install --upgrade git+https://github.com/huggingface/diffusers.git transformers accelerate scipy datasets

Dreambooth

Use train_dreambooth.py, which has been modified to use each image's filename as its caption (metadata.jsonl isn't used).

export MODEL_NAME="stabilityai/stable-diffusion-2"
export INSTANCE_DIR="./training-colors"
export OUTPUT_DIR="./dreambooth"

accelerate launch train_dreambooth.py \
  --mixed_precision="fp16" \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="N/A" \
  --resolution=768 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=1 \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=1000
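
The modified train_dreambooth.py is not shown here, so the exact filename-to-caption rule is unknown. A minimal sketch of one way to do it, assuming filenames separate words with underscores or hyphens (the helper name caption_from_filename and the example filename are hypothetical):

```python
from pathlib import Path


def caption_from_filename(path):
    """Derive a caption from an image path (assumed naming scheme, not the repo's)."""
    # Drop the directory and the extension, keeping only the base name.
    stem = Path(path).stem
    # Treat underscores and hyphens as word separators and collapse repeats.
    words = stem.replace("_", " ").replace("-", " ").split()
    return " ".join(words)
```

Under this rule, an image named marsey_happy-cat.png would be captioned "marsey happy cat", which would also explain why --instance_prompt can be left as a placeholder.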

Normal fine-tuning

Use the train_text_to_image.py training script from the diffusers examples.

export MODEL_NAME="stabilityai/stable-diffusion-2"
export DATASET_NAME="training-colors"

accelerate launch --mixed_precision="fp16"  train_text_to_image.py \
 --pretrained_model_name_or_path=$MODEL_NAME \
 --dataset_name=$DATASET_NAME \
 --use_ema \
 --resolution=768 \
 --center_crop \
 --random_flip \
 --train_batch_size=9 \
 --gradient_accumulation_steps=1 \
 --max_train_steps=10000 \
 --learning_rate=1.5e-06 \
 --max_grad_norm=1 \
 --lr_scheduler="constant" \
 --lr_warmup_steps=0 \
 --output_dir="marsey"

Notes

  • If training on 512x512, change 768 to 512 in the commands above and in preprocess_training_data.py.
  • Colored backgrounds worked best for direct fine-tuning and white backgrounds worked best for Dreambooth.
  • Batch sizes are configured for an A6000 with 48GB VRAM and would need to be lowered if running on a smaller GPU.
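
When lowering the batch size for a smaller GPU, one common approach is to raise --gradient_accumulation_steps so the effective batch size (train_batch_size times gradient_accumulation_steps) stays close to the original. The sketch below estimates both values; the helper name scale_batch and the assumption that the usable batch size shrinks linearly with VRAM are illustrative only, since model weights and optimizer state take a fixed share of memory and the real limit has to be found by trial.

```python
import math


def scale_batch(reference_batch, reference_vram_gb, target_vram_gb):
    """Return (train_batch_size, gradient_accumulation_steps) for a smaller GPU."""
    # Assumption: usable batch size shrinks roughly in proportion to VRAM.
    batch = max(1, int(reference_batch * target_vram_gb / reference_vram_gb))
    # Accumulate gradients so batch * steps covers at least the reference batch.
    steps = math.ceil(reference_batch / batch)
    return batch, steps


# Dreambooth command above uses batch size 4 on 48 GB; estimate for a 24 GB card.
print(scale_batch(4, 48, 24))  # (2, 2)
# Fine-tuning command above uses batch size 9 on 48 GB.
print(scale_batch(9, 48, 24))  # (4, 3)
```

Treat the result as a starting point: if the run still hits an out-of-memory error, halve train_batch_size again and double the accumulation steps.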