## Steps

* Upscale `marseys` with swin2sr ([colab](https://github.com/mv-lab/swin2sr#demos)) and place the output in `upscaled`.
* Run `preprocess_training_data.py` to create the training sets and `metadata.jsonl` files.
* Run the training script.
* Modify `upload_to_huggingface.py` and run it.

## Environment setup

```sh
sudo apt -y install fonts-dejavu-core ttf-mscorefonts-installer ffmpeg libsm6 libxext6 fonts-roboto build-essential
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh
conda install pytorch torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
conda install xformers -c xformers/label/dev
pip install --upgrade git+https://github.com/huggingface/diffusers.git transformers accelerate scipy datasets
```

## Dreambooth

Use `train_dreambooth.py`, which has been modified to use the filename as a caption (`metadata.jsonl` isn't used).

```sh
export MODEL_NAME="stabilityai/stable-diffusion-2"
export INSTANCE_DIR="./training-colors"
export OUTPUT_DIR="./dreambooth"

accelerate launch train_dreambooth.py \
  --mixed_precision="fp16" \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="N/A" \
  --resolution=768 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=1 \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=1000
```

## Normal fine-tuning

Use the [training script in diffusers](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py).

```sh
# Hugging Face model ID; the organization name is "stabilityai" (no hyphen)
export MODEL_NAME="stabilityai/stable-diffusion-2"
export DATASET_NAME="training-colors"

accelerate launch --mixed_precision="fp16" train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --use_ema \
  --resolution=768 \
  --center_crop \
  --random_flip \
  --train_batch_size=9 \
  --gradient_accumulation_steps=1 \
  --max_train_steps=10000 \
  --learning_rate=1.5e-06 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="marsey"
```

## Notes

* If training on 512x512, change 768 -> 512 in the commands above and in `preprocess_training_data.py`.
* Colored backgrounds worked best for direct fine-tuning, and white backgrounds worked best for Dreambooth.
* Batch sizes are configured for an A6000 with 48GB VRAM and would need to be lowered if running on a smaller GPU.
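
For the last note, one common way to lower the per-step batch size on a smaller GPU is to raise `--gradient_accumulation_steps` so the effective batch size stays close to the original. The sketch below is only an illustration of that arithmetic: the helper names `accumulation_steps` and `scaled_flags` are hypothetical and not part of this repository, and keeping the effective batch size constant is an assumption here, not something that was tested for these models.

```python
import math


def accumulation_steps(target_batch: int, per_step_batch: int) -> int:
    """Smallest number of accumulation steps whose effective batch
    size (per_step_batch * steps) reaches at least target_batch."""
    if target_batch < 1 or per_step_batch < 1:
        raise ValueError("batch sizes must be positive integers")
    return math.ceil(target_batch / per_step_batch)


def scaled_flags(target_batch: int, per_step_batch: int) -> list[str]:
    """Build the two batch-related flags for the training commands.

    Hypothetical helper: it only formats flags that the commands in
    this README already use.
    """
    steps = accumulation_steps(target_batch, per_step_batch)
    return [
        f"--train_batch_size={per_step_batch}",
        f"--gradient_accumulation_steps={steps}",
    ]


if __name__ == "__main__":
    # The fine-tuning command uses a batch size of 9; suppose only 3 fit in memory.
    print(" ".join(scaled_flags(9, 3)))
```

When the per-step batch size does not divide the original evenly (for example 9 split into steps of 4), the helper rounds up, so the effective batch size becomes slightly larger than the original. The learning rate may need retuning in that case.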