Stable Diffusion has had a great impact on image generation, supporting both unconditional and conditional generation. Conditional generation in particular empowers users to create specific kinds of "art", revealing significant business potential yet to be explored. Following the work of Stable Diffusion, this repo presents a method to generate realistic texture defects from only a few defect samples; in our extreme case, only one sample is available.
(Code will be uploaded later)
From left to right: defect sample, defect mask, non-defect image, generated defect image
Inspired by the text2img and inpainting applications of Stable Diffusion, my proposed method works as follows:
- Mask out the defect area and concatenate the defect mask to the input, following the inpainting training schedule
- Use the CLIP image encoder to extract features from the defect area
- Project the defect features to the proper dimension and inject them into the cross-attention layers of the Latent Diffusion Model
- Freeze the Encoder & Decoder (I use AutoencoderKL)
- Train the Stable Diffusion model
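The steps above can be sketched in PyTorch. This is a minimal illustration, not the repo's implementation: `DefectFeatureProjector` and `build_inpainting_input` are hypothetical helper names, and the dimensions assume CLIP ViT-L/14 image features (1024-d) and a Stable Diffusion v1 cross-attention dimension of 768, with a 9-channel inpainting UNet input as in the SD inpainting checkpoints.

```python
import torch
import torch.nn as nn

# Assumed dimensions: CLIP ViT-L/14 vision hidden size is 1024;
# SD v1 cross-attention expects 768-d conditioning tokens.
CLIP_IMG_DIM, CROSS_ATTN_DIM = 1024, 768

class DefectFeatureProjector(nn.Module):
    """Hypothetical module: projects CLIP image-patch features of the
    defect crop to the dimension expected by the UNet cross-attention."""
    def __init__(self, in_dim=CLIP_IMG_DIM, out_dim=CROSS_ATTN_DIM):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)
        self.norm = nn.LayerNorm(out_dim)

    def forward(self, clip_features):  # (B, num_tokens, in_dim)
        return self.norm(self.proj(clip_features))

def build_inpainting_input(noisy_latent, masked_latent, mask):
    """Inpainting-style UNet input: concatenate the noisy latent, the
    latent of the image with the defect area masked out, and the
    (downsampled) defect mask along the channel dimension."""
    return torch.cat([noisy_latent, masked_latent, mask], dim=1)

# Toy shapes: 4-channel latents at 64x64 plus a 1-channel mask.
noisy = torch.randn(1, 4, 64, 64)
masked = torch.randn(1, 4, 64, 64)
mask = torch.ones(1, 1, 64, 64)
unet_in = build_inpainting_input(noisy, masked, mask)
print(unet_in.shape)  # torch.Size([1, 9, 64, 64])

# 257 tokens = 1 CLS + 256 patches for ViT-L/14 at 224x224 input.
proj = DefectFeatureProjector()
cond = proj(torch.randn(1, 257, CLIP_IMG_DIM))
print(cond.shape)  # torch.Size([1, 257, 768])
```

During training, `unet_in` would be fed to the UNet (with the VAE encoder/decoder frozen) and `cond` passed as the `encoder_hidden_states` for cross-attention, replacing the usual CLIP text embeddings.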