## Steps

* Upscale `marseys` with swin2sr ([colab](https://github.com/mv-lab/swin2sr#demos)) and place the output in `upscaled`.
* Run `preprocess_training_data.py` to create the training sets and their `metadata.jsonl` files.
* Run one of the training scripts described below (Dreambooth or normal fine-tuning).
* Modify `upload_to_huggingface.py` and run it.

## Environment setup

```sh
# System fonts, ffmpeg, shared libraries, and build tools
sudo apt -y install fonts-dejavu-core ttf-mscorefonts-installer ffmpeg libsm6 libxext6 fonts-roboto build-essential

# Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh

# PyTorch with CUDA 11.6, xformers, and the Hugging Face training stack (diffusers from git)
conda install pytorch torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
conda install xformers -c xformers/label/dev
pip install --upgrade git+https://github.com/huggingface/diffusers.git transformers accelerate scipy datasets
```
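
Before starting a long training run, it can help to confirm that the tools installed above are actually on `PATH`. The following is a minimal sketch and not part of this repo; the `check_tools` helper name is made up for illustration, and the list of tools is an assumption about what the setup commands provide.

```sh
#!/bin/sh
# Hypothetical helper (not part of this repo): report which commands are on PATH.
# Returns 0 if every tool was found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools the setup commands above are expected to provide (assumed list).
check_tools ffmpeg conda python pip accelerate \
    || echo "some tools are missing; re-check the setup steps"
```

If `conda` shows up as missing right after installing Miniconda, opening a new shell is usually enough, since the installer only updates the shell profile.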

## Dreambooth

Use `train_dreambooth.py`, which has been modified to use each image's filename as its caption, so `metadata.jsonl` is not used.

```sh
export MODEL_NAME="stabilityai/stable-diffusion-2"
export INSTANCE_DIR="./training-colors"
export OUTPUT_DIR="./dreambooth"

accelerate launch train_dreambooth.py \
  --mixed_precision="fp16" \
  --pretrained_model_name_or_path="$MODEL_NAME" \
  --train_text_encoder \
  --instance_data_dir="$INSTANCE_DIR" \
  --output_dir="$OUTPUT_DIR" \
  --instance_prompt="N/A" \
  --resolution=768 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=1 \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=1000
```
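
The exact filename-to-caption rule lives in the modified `train_dreambooth.py` and is not reproduced here. As a rough sketch, assuming the caption is simply the filename without its directory and extension, and assuming the training images are PNG files, the captions can be previewed before training. `caption_from_filename` is a hypothetical helper name.

```sh
#!/bin/sh
# Assumption: caption = filename without directory and extension.
# The real rule is whatever the modified train_dreambooth.py implements.
caption_from_filename() {
    name=$(basename "$1")
    printf '%s\n' "${name%.*}"
}

# Preview the caption for every PNG in the instance directory (PNG is assumed).
INSTANCE_DIR="${INSTANCE_DIR:-./training-colors}"
for f in "$INSTANCE_DIR"/*.png; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    printf '%s -> %s\n' "$f" "$(caption_from_filename "$f")"
done
```

Scanning this list for empty or misleading captions is cheaper than discovering them after a 1000-step run.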

## Normal fine-tuning

Use the `train_text_to_image.py` [training script in diffusers](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py).

```sh
export MODEL_NAME="stabilityai/stable-diffusion-2"
export DATASET_NAME="training-colors"

accelerate launch --mixed_precision="fp16" train_text_to_image.py \
  --pretrained_model_name_or_path="$MODEL_NAME" \
  --dataset_name="$DATASET_NAME" \
  --use_ema \
  --resolution=768 \
  --center_crop \
  --random_flip \
  --train_batch_size=9 \
  --gradient_accumulation_steps=1 \
  --max_train_steps=10000 \
  --learning_rate=1.5e-06 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="marsey"
```
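
For a local image folder, captions typically reach `train_text_to_image.py` through a `metadata.jsonl` file with one JSON object per image: a `file_name` field plus a caption field (the script's default caption column is `text`). In this repo that file is produced by `preprocess_training_data.py`. The sketch below is not that script, only an illustration of the layout. It assumes PNG images, captions equal to the filename stem, and filenames without quotes or backslashes; `write_metadata` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: write DIR/metadata.jsonl with one entry per PNG in DIR.
# Assumes filenames contain no characters that need JSON escaping.
write_metadata() {
    dir=$1
    : > "$dir/metadata.jsonl"          # start from an empty file
    for f in "$dir"/*.png; do
        [ -e "$f" ] || continue        # skip when the glob matches nothing
        name=$(basename "$f")
        printf '{"file_name": "%s", "text": "%s"}\n' "$name" "${name%.*}" \
            >> "$dir/metadata.jsonl"
    done
}

# Demo on a throwaway directory so real training data is not touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/marsey_a.png" "$demo_dir/marsey_b.png"
write_metadata "$demo_dir"
cat "$demo_dir/metadata.jsonl"
```

Each output line pairs an image with its caption, which is the shape the image-folder loader in `datasets` expects.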

## Notes

* If training at 512x512, change 768 to 512 in the commands above and in `preprocess_training_data.py`.
* Colored backgrounds worked best for direct fine-tuning, and white backgrounds worked best for Dreambooth.
* Batch sizes are configured for an A6000 with 48GB VRAM and would need to be lowered if running on a smaller GPU.
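
The resolution and batch-size notes can be made concrete with a little arithmetic. The effective batch size per GPU is `train_batch_size` times `gradient_accumulation_steps`, so on a smaller GPU the per-step batch can be lowered and the accumulation steps raised to roughly compensate. A sketch follows; `training_flags` and its arguments are illustrative names, not something the training scripts define.

```sh
#!/bin/sh
# Hypothetical helper: print the flags for a given resolution, a target
# effective batch size, and the per-step batch size that fits in VRAM.
training_flags() {
    resolution=$1
    target_batch=$2
    per_step_batch=$3
    # Round up so the effective batch size is never below the target.
    grad_accum=$(( (target_batch + per_step_batch - 1) / per_step_batch ))
    echo "--resolution=$resolution --train_batch_size=$per_step_batch --gradient_accumulation_steps=$grad_accum"
}

# Dreambooth settings above (batch size 4), squeezed into a batch of 1 per step.
training_flags 768 4 1
# Fine-tuning settings above (batch size 9) at 512x512 with a batch of 3 per step.
training_flags 512 9 3
```

The printed flags can be pasted into the `accelerate launch` commands in place of the corresponding values.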