Running Stable Diffusion 2.0 (Windows local environment)

Overview

This article runs Stable Diffusion 2.0 on Windows to generate images.

For environment setup (installation), see the following article.

Installing Stable Diffusion 2.0 (Windows local environment)
Overview: Stability-AI/stablediffusion: High-Resolution Image Synthesis with Latent Diffusion Models. On Windows, Stable Diffusion…

I tested with a ZOTAC GAMING GeForce RTX 2060 Twin Fan graphics card (6 GB of memory). (Note: at the time of writing, I have not been able to generate an image with Stable Diffusion 2.0 on this card.)

Stable Diffusion 2.0

git clone https://huggingface.co/stabilityai/stable-diffusion-2
python scripts/txt2img.py --prompt "a girl eating curry with rice" --n_iter 1 --n_samples 1 --ckpt ./stable-diffusion-2/768-v-ema.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768

The script started, but it failed with CUDA out of memory and no image was generated.

D:\home\wurly\project_image_ai\stable_d2\stablediffusion>python scripts/txt2img.py --prompt "a girl eating curry with rice" --n_iter 1 --n_samples 1 --ckpt ./stable-diffusion-2/768-v-ema.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
Global seed set to 42
Loading model from ./stable-diffusion-2/768-v-ema.ckpt
Global Step: 140000
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
LatentDiffusion: Running in v-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
(略)
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
Sampling:   0%|                                                                                                                                                                                                  | 0/1 [00:00<?, ?it/s]Data shape for DDIM sampling is (1, 4, 96, 96), eta 0.0                                                                                                                                                           | 0/1 [00:00<?, ?it/s]
Running DDIM Sampling with 50 timesteps
DDIM Sampler:   0%|                                                                                                                                                                                             | 0/50 [00:07<?, ?it/s]
data:   0%|                                                                                                                                                                                                      | 0/1 [00:14<?, ?it/s]
Sampling:   0%|                                                                                                                                                                                                  | 0/1 [00:14<?, ?it/s]
Traceback (most recent call last):
  File "D:\home\wurly\project_image_ai\stable_d2\stablediffusion\scripts\txt2img.py", line 289, in <module>
    main(opt)
  File "D:\home\wurly\project_image_ai\stable_d2\stablediffusion\scripts\txt2img.py", line 248, in main
    samples, _ = sampler.sample(S=opt.steps,
  File "C:\usr\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "d:\home\wurly\project_image_ai\stable_d2\stablediffusion\ldm\models\diffusion\ddim.py", line 103, in sample
    samples, intermediates = self.ddim_sampling(conditioning, size,
  File "C:\usr\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "d:\home\wurly\project_image_ai\stable_d2\stablediffusion\ldm\models\diffusion\ddim.py", line 163, in ddim_sampling
    outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
  File "C:\usr\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "d:\home\wurly\project_image_ai\stable_d2\stablediffusion\ldm\models\diffusion\ddim.py", line 211, in p_sample_ddim
    model_uncond, model_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
  File "d:\home\wurly\project_image_ai\stable_d2\stablediffusion\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "C:\usr\Python310\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\home\wurly\project_image_ai\stable_d2\stablediffusion\ldm\models\diffusion\ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "C:\usr\Python310\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\home\wurly\project_image_ai\stable_d2\stablediffusion\ldm\modules\diffusionmodules\openaimodel.py", line 776, in forward
    h = module(h, emb, context)
  File "C:\usr\Python310\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\home\wurly\project_image_ai\stable_d2\stablediffusion\ldm\modules\diffusionmodules\openaimodel.py", line 84, in forward
    x = layer(x, context)
  File "C:\usr\Python310\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\home\wurly\project_image_ai\stable_d2\stablediffusion\ldm\modules\attention.py", line 334, in forward
    x = block(x, context=context[i])
  File "C:\usr\Python310\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\home\wurly\project_image_ai\stable_d2\stablediffusion\ldm\modules\attention.py", line 269, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "d:\home\wurly\project_image_ai\stable_d2\stablediffusion\ldm\modules\diffusionmodules\util.py", line 114, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "d:\home\wurly\project_image_ai\stable_d2\stablediffusion\ldm\modules\diffusionmodules\util.py", line 129, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "d:\home\wurly\project_image_ai\stable_d2\stablediffusion\ldm\modules\attention.py", line 272, in _forward
    x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
  File "C:\usr\Python310\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\home\wurly\project_image_ai\stable_d2\stablediffusion\ldm\modules\attention.py", line 220, in forward
    v = self.to_v(context)
  File "C:\usr\Python310\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\usr\Python310\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 5.23 GiB already allocated; 0 bytes free; 5.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
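
The last line of the traceback already contains the numbers needed to read the failure. The following is a minimal sketch, not part of the original experiment, that pulls those figures out of the message with regular expressions and compares reserved and allocated memory. The patterns are written against the wording of this particular message and are an assumption; other PyTorch versions may phrase it differently. The names `parse_oom` and `cached_but_unused` are hypothetical.

```python
import re

# The final line of the traceback above, shortened to the part with numbers.
OOM_MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB "
    "(GPU 0; 6.00 GiB total capacity; 5.23 GiB already allocated; "
    "0 bytes free; 5.28 GiB reserved in total by PyTorch)"
)

# Multipliers to convert the units that appear in the message into MiB.
UNIT_TO_MIB = {"bytes": 1 / (1024 * 1024), "KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

# One pattern per figure; each captures a number and its unit.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (bytes|KiB|MiB|GiB)",
    "total": r"([\d.]+) (bytes|KiB|MiB|GiB) total capacity",
    "allocated": r"([\d.]+) (bytes|KiB|MiB|GiB) already allocated",
    "free": r"([\d.]+) (bytes|KiB|MiB|GiB) free",
    "reserved": r"([\d.]+) (bytes|KiB|MiB|GiB) reserved",
}


def parse_oom(message: str) -> dict:
    """Return the memory figures found in the message, all in MiB."""
    figures = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match:
            value, unit = match.groups()
            figures[name] = float(value) * UNIT_TO_MIB[unit]
    return figures


figures = parse_oom(OOM_MESSAGE)
# Memory PyTorch holds in its cache but has not handed out to tensors.
cached_but_unused = figures["reserved"] - figures["allocated"]
for name, mib in figures.items():
    print(f"{name:>9}: {mib:8.2f} MiB")
print(f"reserved - allocated: {cached_but_unused:.2f} MiB")
```

The message itself says that max_split_size_mb is worth trying when reserved memory is much larger than allocated memory. Here the two values are close (5.28 GiB versus 5.23 GiB), which suggests that fragmentation is probably not the main problem in this run and that the 6 GiB card is simply full.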

Resolutions of 512×512, 384×384, 256×256, and 128×128 all failed as well.

python scripts/txt2img.py --prompt "a girl eating curry with rice" --n_iter 1 --n_samples 1 --ckpt ./stable-diffusion-2/768-v-ema.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 512 --W 512
python scripts/txt2img.py --prompt "a girl eating curry with rice" --n_iter 1 --n_samples 1 --ckpt ./stable-diffusion-2/768-v-ema.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 384 --W 384
python scripts/txt2img.py --prompt "a girl eating curry with rice" --n_iter 1 --n_samples 1 --ckpt ./stable-diffusion-2/768-v-ema.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 256 --W 256
python scripts/txt2img.py --prompt "a girl eating curry with rice" --n_iter 1 --n_samples 1 --ckpt ./stable-diffusion-2/768-v-ema.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 128 --W 128
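
To get a feel for how much lowering --H and --W shrinks the data being sampled, the sketch below computes the latent shape for each resolution tried. It assumes 4 latent channels and an 8x downsampling factor. The article does not state either value; they are chosen because they reproduce the shape reported in the log for 768×768, "Data shape for DDIM sampling is (1, 4, 96, 96)". The function names are hypothetical.

```python
# Assumptions (not stated in the article): 4 latent channels and an 8x
# downsampling factor, chosen because they reproduce the logged shape
# (1, 4, 96, 96) for a 768x768 request.
LATENT_CHANNELS = 4
DOWNSAMPLE_FACTOR = 8


def latent_shape(height: int, width: int, batch: int = 1) -> tuple:
    """Shape of the latent tensor the sampler works on for one request."""
    return (batch, LATENT_CHANNELS, height // DOWNSAMPLE_FACTOR, width // DOWNSAMPLE_FACTOR)


def latent_positions(height: int, width: int) -> int:
    """Number of spatial positions in the latent (per channel, per sample)."""
    _, _, h, w = latent_shape(height, width)
    return h * w


base = latent_positions(768, 768)
for size in (768, 512, 384, 256, 128):
    shape = latent_shape(size, size)
    share = latent_positions(size, size) / base
    print(f"{size}x{size}: latent {shape}, {share:.1%} of the 768x768 latent")
```

Under these assumptions the 128×128 latent has only a small fraction of the positions of the 768×768 one. Since even that run failed, it seems likely that most of the memory goes to something that does not shrink with resolution, such as the model weights. This is an inference from the numbers, not a measurement.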

I also tried specifying the 512×512 base model with --ckpt, but that failed as well.

wget https://huggingface.co/stabilityai/stable-diffusion-2-base/resolve/main/512-base-ema.ckpt
python scripts/txt2img.py --prompt "a girl eating curry with rice" --n_iter 1 --n_samples 1 --ckpt 512-base-ema.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 128 --W 128

Stable Diffusion 2.1

While rereading Stability-AI/stablediffusion: High-Resolution Image Synthesis with Latent Diffusion Models, I noticed the following in the Version 2.1 entry (December 7, 2022) of the News section:

To enable fp16 (which can cause numerical instabilities with the vanilla attention module on the v2.1 model) , run your script with ATTN_PRECISION=fp16 python <thescript.py>

With Stable Diffusion 1.4, enabling fp16 had made image generation possible even with 6 GB of graphics memory. So I tried Stable Diffusion 2.1 with fp16 enabled to see whether that would avoid the CUDA out of memory error.
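
The arithmetic behind this hope can be sketched as follows. The parameter count comes from the log line "DiffusionWrapper has 865.91 M params."; the bytes per parameter (4 for fp32, 2 for fp16) are standard sizes for those formats. The estimate covers only the parameters of the diffusion model. The text encoder, the autoencoder, activations, and the CUDA context are not included, so the real footprint is larger. The function name is hypothetical.

```python
# Parameter count reported in the log: "DiffusionWrapper has 865.91 M params."
UNET_PARAMS = 865.91e6

# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

GIB = 1024 ** 3


def weight_memory_gib(param_count: float, precision: str) -> float:
    """Memory needed just to hold the parameters, in GiB."""
    return param_count * BYTES_PER_PARAM[precision] / GIB


for precision in ("fp32", "fp16"):
    gib = weight_memory_gib(UNET_PARAMS, precision)
    print(f"{precision}: about {gib:.2f} GiB for the diffusion model weights")
```

This only illustrates why half precision is attractive on a 6 GB card. It does not show that setting ATTN_PRECISION changes how the weights themselves are stored; the quoted note mentions only the attention module.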

git clone https://huggingface.co/stabilityai/stable-diffusion-2-1
set ATTN_PRECISION=fp16
python scripts/txt2img.py --prompt "a girl eating curry with rice" --n_iter 1 --n_samples 1 --ckpt ./stable-diffusion-2-1/768-v-ema.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 384 --W 384
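
The README writes the setting in POSIX shell form (ATTN_PRECISION=fp16 python <thescript.py>), which the Windows command prompt does not accept, hence the separate set line above. As an alternative that behaves the same on either platform, the following sketch passes the variable only to the child process through Python's subprocess module. The helper names `build_env` and `run_txt2img` are hypothetical, and the arguments are copied from the command above.

```python
import os
import subprocess
import sys

# Arguments copied from the command above.
ARGS = [
    "scripts/txt2img.py",
    "--prompt", "a girl eating curry with rice",
    "--n_iter", "1",
    "--n_samples", "1",
    "--ckpt", "./stable-diffusion-2-1/768-v-ema.ckpt",
    "--config", "configs/stable-diffusion/v2-inference-v.yaml",
    "--H", "384",
    "--W", "384",
]


def build_env(attn_precision: str = "fp16") -> dict:
    """Copy the current environment and add ATTN_PRECISION to the copy only."""
    env = os.environ.copy()
    env["ATTN_PRECISION"] = attn_precision
    return env


def run_txt2img() -> int:
    """Run txt2img.py with the current interpreter and return its exit code."""
    completed = subprocess.run([sys.executable, *ARGS], env=build_env())
    return completed.returncode


# Show what would be executed; nothing is launched until run_txt2img() is called.
print("command:", [sys.executable, *ARGS])
print("ATTN_PRECISION for the child process:", build_env()["ATTN_PRECISION"])
```

Calling run_txt2img() from the root of the stablediffusion repository would start the same run as the two lines above, without leaving the variable set in the surrounding shell session.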

I tried the commands above, but image generation still did not succeed.

Next, I would like to install a different graphics card with more memory and try again.

References

[Stable Diffusion Web UI] What to do when RuntimeError: CUDA out of memory. occurs
A summary of what to do when RuntimeError: CUDA out of memory. occurs while using Stable Diffusion Web UI.

This appears to be written for Stable Diffusion 1.0.

Stable Diffusion runtime error - how to fix CUDA out of memory error
If the Stable Diffusion runtime error is preventing you from making art, here is what you need to do. Try these tips and CUDA out of memory error will be a thin...

This also appears to be written for Stable Diffusion 1.0.

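How to install Stable Diffusion 2.0 locally, and a comparison with 1.5
1. How to install Stable Diffusion 2.0 locally, and a comparison with 1.5: summary ・Version 2.0 of Stable Diffusion, which became famous as an image-generation AI, has been released ・Not yet usable even on paid web services; a rushed…

It says, "Even for a single image, python was occupying 11751MiB of GPU memory, so it is only just running on 12GB," so 6 GB looks too small.

Running Stable Diffusion 2.0 on a PC - Qiita
Introduction: I rebuilt my home server and wanted to try machine learning. I tried Stable Diffusion 2.0, so I am writing it up. Environment: Intel Core i3 12100F, 32GB DDR4…

This was a useful reference for things such as how to obtain the 512×512 model.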
