Random Conditioning with Distillation for Data-Efficient Diffusion Model Compression

CVPR 2025
Dohyun Kim1*, Sehwan Park1*, Geonhee Han1, Seung Wook Kim2, Paul Hongsuck Seo1
1Korea University, 2NVIDIA
* Equal contribution

Abstract

Diffusion models generate high-quality images through progressive denoising but are computationally intensive due to large model sizes and repeated sampling. Knowledge distillation—transferring knowledge from a complex teacher to a simpler student model—has been widely studied in recognition tasks, particularly for transferring concepts unseen during student training. However, its application to diffusion models remains underexplored, especially in enabling student models to generate concepts not covered by the training images. In this work, we propose Random Conditioning, a novel approach that pairs noised images with randomly selected text conditions to enable efficient, image-free knowledge distillation. By leveraging this technique, we show that the student can generate concepts unseen in the training images. When applied to conditional diffusion model distillation, our method allows the student to explore the condition space without generating condition-specific images, resulting in notable improvements in both generation quality and efficiency. This promotes resource-efficient deployment of generative diffusion models, broadening their accessibility for both research and real-world applications.

Method

The overall process of Random Conditioning. Given a large set of text prompts, we generate images with the teacher model for only a small subset, significantly reducing the image generation cost. These images are then used to create noised inputs for knowledge distillation. During training, we pair each noised image with either its original text prompt or one randomly selected from the full set. This allows the student model to receive supervision across a much broader condition space than the one covered by the available image-text pairs. In doing so, Random Conditioning enables image-free, data-efficient distillation of conditional diffusion models, allowing the student to generalize beyond the limited set of generated images.
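To make the training loop concrete, the sketch below shows one possible form of a distillation step with random conditioning in PyTorch. It is a minimal illustration under stated assumptions, not the paper's exact implementation: `teacher`, `student`, `text_encoder`, and `scheduler` are stand-ins for the actual models and noise scheduler, and `rand_cond_prob`, the probability of swapping in a random prompt, is a hypothetical hyperparameter.

# Minimal PyTorch-style sketch of one random-conditioning distillation step.
# `teacher`, `student`, `text_encoder`, and `scheduler` are stand-ins for the
# actual models; `rand_cond_prob` is a hypothetical hyperparameter.
import random
import torch
import torch.nn.functional as F

def distill_step(student, teacher, text_encoder, scheduler,
                 images, prompts, prompt_pool, optimizer,
                 rand_cond_prob=0.5):
    """One step: noise teacher-generated images, optionally swap in a random
    prompt from the full pool, and match the student to the teacher output."""
    device = images.device
    bsz = images.shape[0]

    # Sample diffusion timesteps and noise the images (standard forward process).
    t = torch.randint(0, scheduler.config.num_train_timesteps, (bsz,), device=device)
    noise = torch.randn_like(images)
    noisy = scheduler.add_noise(images, noise, t)

    # Random conditioning: with probability `rand_cond_prob`, replace the paired
    # prompt with one drawn uniformly from the full prompt set.
    conds = [p if random.random() > rand_cond_prob else random.choice(prompt_pool)
             for p in prompts]
    cond_emb = text_encoder(conds)  # assumed to return text embeddings

    # Teacher provides the target prediction for the (noisy image, condition) pair.
    with torch.no_grad():
        target = teacher(noisy, t, cond_emb)

    pred = student(noisy, t, cond_emb)
    loss = F.mse_loss(pred, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch the prompt swap is a simple per-sample Bernoulli draw; how often, and for which timesteps, the random prompt is used is a design choice rather than something fixed by the description above.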

Observations and Motivation

Conditioning behavior across timesteps. The figure above shows how conditioning affects the denoising process across timesteps: at each timestep, results are generated from the noised input image in the leftmost column, conditioned on the text in the rightmost column. We observe that when $t$ is small, the generated image reflects the original input, while at large $t$ it follows the conditioning text. This supports our random conditioning strategy, in which the condition does not need to align with the noised input $\mathbf{x}_t$.
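This observation can be reproduced in spirit with an off-the-shelf image-to-image pipeline: push the input image partway into the noise schedule and denoise it under a text condition that does not match it. The snippet below is a hedged illustration using Hugging Face diffusers, where `strength` controls how deep into the schedule (how large a $t$) the input is noised; the model id, input file, and prompt are placeholders, not values from the paper.

# Illustration of the timestep observation with diffusers img2img:
# noise the input to different depths and denoise with a mismatched prompt.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("input.png").resize((512, 512))  # placeholder input
mismatched_prompt = "a photo of a cat"  # condition unrelated to the input image

# Low strength ~ small t: the output stays close to the input image.
# High strength ~ large t: the output follows the text condition instead.
for strength in (0.2, 0.5, 0.8):
    out = pipe(prompt=mismatched_prompt, image=init_image, strength=strength).images[0]
    out.save(f"denoised_strength_{strength}.png")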

Quantitative Results

Knowledge transfer of unseen concepts. Models are trained without animal images. “Rand Cond” denotes random conditioning, and “Additional Texts” indicates extra text-only data. The gray row shows teacher performance. Random conditioning enables the student to handle unseen concepts.

Impact of random conditioning. “Rand Cond” indicates whether random conditioning is applied, and “Real image” shows whether real images were used during training. The gray row shows teacher performance. Random conditioning leads to better performance of the student model.

Qualitative Results

Qualitative results on unseen concepts. Images generated by models distilled without any animal images, using animal-related captions from DiffusionDB.

Qualitative comparison with baseline models. Images generated by our model and baseline models, using captions from DiffusionDB as prompts.

BibTeX

@inproceedings{kim2025randcond,
  title={Random Conditioning with Distillation for Data-Efficient Diffusion Model Compression},
  author={Kim, Dohyun and Park, Sehwan and Han, Geonhee and Kim, Seung Wook and Seo, Paul Hongsuck},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}