## Steps

* Upscale the images in `marseys` with swin2sr ([colab](https://github.com/mv-lab/swin2sr#demos)) and place the output in `upscaled`.
* Run `preprocess_training_data.py` to create the training sets and their `metadata.jsonl` files.
* Run one of the training scripts (Dreambooth or normal fine-tuning, described below).
* Modify `upload_to_huggingface.py` and run it.
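
Before running `preprocess_training_data.py`, it can help to confirm that every image in `marseys` has an upscaled counterpart in `upscaled`. The sketch below assumes the upscaler keeps the original filenames (adjust it if yours renames outputs). `check_upscaled` is a hypothetical helper, not part of this repo.

```sh
# Hypothetical helper: list files in SRC_DIR that have no same-named file in DST_DIR.
# Assumes the upscaler keeps the original filenames.
# Exits non-zero if anything is missing.
check_upscaled() {
  src_dir=$1
  dst_dir=$2
  missing=0
  for f in "$src_dir"/*; do
    [ -f "$f" ] || continue            # skip subdirectories and an unmatched glob
    name=$(basename "$f")
    if [ ! -f "$dst_dir/$name" ]; then
      echo "missing: $name"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, `check_upscaled marseys upscaled` prints one `missing: <name>` line for each image that still needs upscaling.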


## Environment setup
```sh
sudo apt -y install fonts-dejavu-core ttf-mscorefonts-installer ffmpeg libsm6 libxext6 fonts-roboto build-essential

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh

conda install pytorch torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
conda install xformers -c xformers/label/dev
pip install --upgrade git+https://github.com/huggingface/diffusers.git transformers accelerate scipy datasets
```
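
After installation, a quick check that the required tools are on `PATH` can catch a broken setup before a long training run. `require_cmds` is a hypothetical helper written for this sketch; it only uses the POSIX `command -v` builtin.

```sh
# Hypothetical helper: print every command that is not on PATH.
# Returns non-zero if at least one is missing.
require_cmds() {
  status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not found: $cmd"
      status=1
    fi
  done
  return "$status"
}
```

For the setup above, this might be run as `require_cmds conda ffmpeg accelerate`, followed by `python -c "import torch; print(torch.cuda.is_available())"`, which should print `True` when PyTorch can see the GPU.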


## Dreambooth

Use `train_dreambooth.py`, which has been modified to use each image's filename as its caption; `metadata.jsonl` is not used, and `--instance_prompt` is set to a placeholder value.

```sh
export MODEL_NAME="stabilityai/stable-diffusion-2"
export INSTANCE_DIR="./training-colors"
export OUTPUT_DIR="./dreambooth"
  
accelerate launch train_dreambooth.py \
  --mixed_precision="fp16" \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="N/A" \
  --resolution=768 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=1 \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=1000
```
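
The exact filename-to-caption rule in the modified `train_dreambooth.py` isn't shown in this README. Assuming it drops the extension and treats underscores and hyphens as spaces (a guess; check the script), the hypothetical helpers below preview the captions Dreambooth would see for each file in `$INSTANCE_DIR`.

```sh
# Hypothetical approximation of the filename-as-caption rule:
# drop the directory and the last extension, then turn '_' and '-' into spaces.
caption_from_filename() {
  name=$(basename "$1")
  name=${name%.*}
  printf '%s\n' "$name" | sed 's/[_-]/ /g'
}

# Print "file -> caption" for every regular file in a directory.
preview_captions() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    printf '%s -> %s\n' "$(basename "$f")" "$(caption_from_filename "$f")"
  done
}
```

Running `preview_captions "$INSTANCE_DIR"` before training makes it easy to spot filenames that would produce poor captions.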
 
 
## Normal fine-tuning

Use the [training script in diffusers](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py).

```sh
export MODEL_NAME="stabilityai/stable-diffusion-2"
export DATASET_NAME="training-colors"

accelerate launch --mixed_precision="fp16" train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --use_ema \
  --resolution=768 \
  --center_crop \
  --random_flip \
  --train_batch_size=9 \
  --gradient_accumulation_steps=1 \
  --max_train_steps=10000 \
  --learning_rate=1.5e-06 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="marsey"
```
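
`train_text_to_image.py` loads data with 🤗 Datasets and, by default, reads images from the `image` column and captions from the `text` column (see `--image_column` and `--caption_column`). For a local image folder, the `imagefolder` loader takes extra columns from a `metadata.jsonl` whose lines each contain a `file_name` key. `preprocess_training_data.py` already writes these files; the sketch below is only a hypothetical stand-in showing the expected shape. Its filename-derived caption is an assumption, and it escapes only backslashes and double quotes.

```sh
# Hypothetical stand-in for the metadata step: write DIR/metadata.jsonl with one
# {"file_name": ..., "text": ...} object per PNG, captioned from its filename.
# Assumes simple filenames (no newlines or control characters).
write_metadata() {
  dir=$1
  out="$dir/metadata.jsonl"
  : > "$out"                                   # start from an empty file
  for f in "$dir"/*.png; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    caption=$(printf '%s\n' "${name%.*}" | sed 's/[_-]/ /g')
    # Escape backslashes and double quotes for JSON.
    esc_name=$(printf '%s\n' "$name" | sed 's/[\\"]/\\&/g')
    esc_caption=$(printf '%s\n' "$caption" | sed 's/[\\"]/\\&/g')
    printf '{"file_name": "%s", "text": "%s"}\n' "$esc_name" "$esc_caption" >> "$out"
  done
}
```

For a hypothetical `marsey_happy.png`, this writes the line `{"file_name": "marsey_happy.png", "text": "marsey happy"}`.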


## Notes
* If training at 512x512, change `--resolution=768` to `--resolution=512` in the commands above, and change 768 to 512 in `preprocess_training_data.py`.
* Colored backgrounds worked best for normal fine-tuning, and white backgrounds worked best for Dreambooth.
* Batch sizes are configured for an A6000 with 48 GB of VRAM; lower them when running on a GPU with less memory.
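
When lowering `--train_batch_size` for a smaller GPU, raising `--gradient_accumulation_steps` keeps the effective batch size (per-device batch × accumulation steps × number of processes) roughly where it was, so the learning rates above are less likely to need retuning. The hypothetical helper below picks the smallest number of accumulation steps that reaches a target effective batch size on a single GPU.

```sh
# Hypothetical helper: smallest accumulation step count such that
# per_device_batch * steps >= target_batch (single process).
accum_steps() {
  target=$1
  per_device=$2
  echo $(( (target + per_device - 1) / per_device ))   # integer ceiling division
}
```

For example, to approximate the normal fine-tuning batch of 9 with 3 images per step, pass `--train_batch_size=3 --gradient_accumulation_steps="$(accum_steps 9 3)"`, which gives 3 steps (effective batch 9). With 4 images per step, `accum_steps 9 4` also gives 3, for an effective batch of 12, slightly larger than the original.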

_Last updated: 2022-12-08 04:26:34 +00:00_