
Mountain Air, Multipass

by GymDreams
Using multiple checkpoint models in the same Stable Diffusion render.

I’ve been experimenting with a technique that was previously only available in ComfyUI but is now implemented in Automatic1111 v1.6.0. I don’t know if there’s an established name for it; I simply refer to it as Multipass, borrowing a 3D rendering term for applying different render layers as passes to create a final image.

Virile Motion into Virile Reality. Using multiple checkpoint models in the same Stable Diffusion render.

The technique is simple: you start a txt2img render with one checkpoint model, then switch to another checkpoint model for the second pass. Previously, you had to render the image with the first model, then put it into img2img and denoise over it with the second. Now you can do this in a single go.

For this image, I started with Virile Motion, a checkpoint that creates a look similar to Pixar animated films, with rich colors, smooth contours, and everything you’ve come to love about these films. Then, in the hires fix pass, I used Virile Reality Beta 2. As a photorealistic model, it finishes the image with a photographic look, replacing the cartoonish face rendered by Virile Motion in the first pass.
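For anyone driving Automatic1111 through its API rather than the UI, the two-pass setup above can be sketched as a txt2img request body. This is a minimal sketch, not my exact settings: the prompt, step count, upscale factor, and denoising strength are placeholder assumptions, and build_multipass_payload is a hypothetical helper name. The field names are the ones I understand the v1.6.0 API to use, so check them against your own install's /docs page.

```python
import json


def build_multipass_payload(prompt, first_pass_checkpoint, second_pass_checkpoint,
                            steps=30, hr_scale=2.0, denoising_strength=0.5):
    """Build a txt2img request body that switches checkpoints for hires fix.

    steps, hr_scale and denoising_strength are placeholder values (assumptions);
    the right numbers depend heavily on the two models.
    """
    return {
        "prompt": prompt,
        "sampler_name": "Euler a",
        "steps": steps,
        # First pass: the checkpoint that builds structure and composition.
        "override_settings": {"sd_model_checkpoint": first_pass_checkpoint},
        # Second pass: hires fix, rendered with a different checkpoint.
        "enable_hr": True,
        "hr_checkpoint_name": second_pass_checkpoint,
        "hr_upscaler": "8x_NMKD-Superscale_150000_G",
        "hr_scale": hr_scale,
        "denoising_strength": denoising_strength,
    }


payload = build_multipass_payload(
    prompt="placeholder prompt",  # hypothetical, not the prompt used for this image
    first_pass_checkpoint="virileMotion_v1",
    second_pass_checkpoint="virileReality_v30BETA2",
)
print(json.dumps(payload, indent=2))
```

If the web UI is started with the --api flag, a body like this would be POSTed to /sdapi/v1/txt2img. The sketch only builds and prints the body, so it runs without a server.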

The result is a highly photorealistic image that still looks like it came out of a Pixar film. I haven’t posted many images made this way, though I have already used the technique for clients, mostly on commissioned work.

Why? Many clients ask for a photographic finish, “as real as possible”, yet the subject matter, composition, and lighting setup they ask for are rarely seen in real life, so photorealistic models lack the training data for them. By starting with checkpoint models trained mostly on artwork rather than photographs, I can create the structure and composition they’re looking for, then finish it with something that looks as if it was photographed.

This workflow requires a lot of steps to render, which is the main reason I don’t normally use it for my own work: it would unnecessarily lengthen the rendering time.

The specific parameters depend heavily on the models, and they’re hard to summarize. I’ll try to write a tutorial on this in the future, though I’m not sure when I’ll have time.

Tech

  • Stable Diffusion txt2img. Euler a.
  • virileMotion_v1 (e13908ef81)
  • virileReality_v30BETA2 (16966c5826)
  • vae-ft-mse-840000-ema-pruned.vae (235745af8d)
  • 8x_NMKD-Superscale_150000_G

Notes + FAQ

There are some specific things I want to point out about this workflow.

  • ADetailer. By default, ADetailer uses the checkpoint from the first pass, so you need to set its checkpoint to match your second pass. Otherwise, the face will be rendered with the cartoonish look from the first pass.
  • Checkpoint in memory. By default, Automatic1111 keeps a single model in video memory. While loading a model into memory is quick, the logs show that a lot happens when you switch models, including re-calculating all the associated weights. If each checkpoint replaces the other, those calculations have to happen on every pass, which drastically increases the render time. So you should set the number of checkpoints kept in memory to at least 2, or more if you’re using a third checkpoint for ADetailer.
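The checkpoint-in-memory note comes down to counting the distinct checkpoints a render touches. Here is a rough sketch of that count, wrapped into a settings body. The helper names are hypothetical, and sd_checkpoints_limit is my assumption for the key behind the "Maximum number of checkpoints loaded at the same time" setting in v1.6.0, so confirm it against what GET /sdapi/v1/options returns on your install.

```python
def checkpoints_to_keep(first_pass, second_pass, adetailer=None):
    """Count the distinct checkpoints one render uses.

    ADetailer only adds to the count when it uses a checkpoint that differs
    from both passes.
    """
    used = {first_pass, second_pass}
    if adetailer is not None:
        used.add(adetailer)
    return len(used)


def build_options_payload(first_pass, second_pass, adetailer=None):
    # "sd_checkpoints_limit" is an assumed key name; verify it before relying on it.
    return {"sd_checkpoints_limit": checkpoints_to_keep(first_pass, second_pass, adetailer)}


# ADetailer set to the same checkpoint as the second pass: two models to keep.
options = build_options_payload(
    "virileMotion_v1", "virileReality_v30BETA2", adetailer="virileReality_v30BETA2"
)
print(options)
```

With the web UI's API enabled, a body like this would be POSTed to /sdapi/v1/options. The same value can be set by hand in the Settings tab.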