Welcome to TorchDiff

TorchDiff is an open-source PyTorch library for diffusion models like DDPM, DDIM, SDE, and LDM, with unCLIP and Kandinsky under development. Designed for researchers, it enables image synthesis and text-to-image generation.

Dive into TorchDiff’s Resources

Learn how to build and experiment with diffusion models using our comprehensive API documentation. Visit the GitHub repository to access source code, submit contributions, or connect with the TorchDiff community.

Models

Explore our collection of diffusion models.

TorchDiff is a comprehensive library offering a suite of advanced diffusion models for generative tasks, particularly image synthesis. Built on PyTorch, it provides modular implementations of foundational and cutting-edge models, enabling both unconditional and conditional generation with text prompts. From classic denoising approaches to efficient latent-space techniques, TorchDiff empowers researchers and developers to explore and innovate in generative AI with flexible, high-performance tools.

DDPM (completed)

Denoising Diffusion Probabilistic Models (DDPM), pioneered by Ho et al. in 2020, form the cornerstone of diffusion-based generative modeling. This approach generates high-fidelity images by iteratively reversing a noise-adding process. DDPM employs a Markov chain to gradually corrupt data in the forward process and learns to reconstruct it in the reverse process. TorchDiff's implementation supports both unconditional image generation and text-conditioned synthesis, offering modular components for customizable training and sampling.
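
As a hedged illustration of the idea in plain PyTorch (not TorchDiff's actual API; `alpha_bar`, `q_sample`, and `eps_model` are names invented here), the closed-form forward corruption and the noise-prediction objective look like this:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # linear schedule from Ho et al. (2020)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)     # cumulative product: alpha-bar_t

def q_sample(x0, t, noise):
    """Closed-form forward step: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * noise."""
    ab = alpha_bar[t].view(-1, 1, 1, 1)      # broadcast over (C, H, W)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

# The reverse process is learned by training a network eps_model(x_t, t)
# to recover `noise` with a simple MSE loss.
```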

DDIM (completed)

Denoising Diffusion Implicit Models (DDIM), introduced by Song et al., enhance DDPM by accelerating the image generation process. By using a non-Markovian process, DDIM reduces the number of denoising steps while preserving image quality. This makes it ideal for efficient inference. TorchDiff's DDIM implementation integrates seamlessly with the same noise prediction architecture as DDPM, supporting both text-guided and unconditional generation with optimized performance.
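
For intuition, a single deterministic DDIM step (eta = 0) can be sketched as follows; `alpha_bar` is the cumulative schedule from the DDPM sketch above and `eps` is the network's noise prediction. This is an illustrative sketch, not TorchDiff's sampler interface:

```python
import torch

def ddim_step(x_t, eps, t, t_prev, alpha_bar):
    """Jump from timestep t directly to an earlier t_prev along the implicit path."""
    ab_t = alpha_bar[t]
    ab_prev = alpha_bar[t_prev]
    # Estimate the clean image from the current sample and the predicted noise.
    x0_pred = (x_t - (1.0 - ab_t).sqrt() * eps) / ab_t.sqrt()
    # Deterministic (eta = 0) update: re-noise x0_pred to the target noise level.
    return ab_prev.sqrt() * x0_pred + (1.0 - ab_prev).sqrt() * eps
```

Because t_prev can skip many intermediate timesteps, a few dozen such steps often suffice where DDPM needs hundreds.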

SDE (completed)

Score-based generative modeling through stochastic differential equations (SDE), developed by Song et al., provides a robust framework that casts diffusion as a continuous-time process. This approach allows flexible noise manipulation and generation. TorchDiff includes four SDE variants: Variance Exploding (VE), Variance Preserving (VP), sub-Variance Preserving (sub-VP), and a deterministic probability-flow ODE solver, enabling both conditional and unconditional generation with adaptable noise schedules.
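
To make the continuous-time view concrete, here is a minimal Euler-Maruyama step of the reverse-time VP SDE, dx = [-1/2 b(t) x - b(t) * score] dt + sqrt(b(t)) dW, from Song et al. (2021); `score_model` and `beta` are assumed placeholders, not TorchDiff names:

```python
import torch

def reverse_vp_step(x, t, dt, score_model, beta):
    """One Euler-Maruyama step of the reverse VP SDE, moving t -> t - dt (dt > 0)."""
    b = beta(t)                                    # beta(t) of the VP noise schedule
    score = score_model(x, t)                      # estimate of grad_x log p_t(x)
    x_mean = x + (0.5 * b * x + b * score) * dt    # reverse-time drift update
    return x_mean + (b * dt) ** 0.5 * torch.randn_like(x)
```

Dropping the noise term and halving the score contribution gives the corresponding deterministic probability-flow ODE step.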

LDM (completed)

Latent Diffusion Models (LDM), proposed by Rombach et al., revolutionize diffusion by operating in a compressed latent space, significantly reducing computational demands. Using a variational autoencoder (VAE), LDM encodes images into compact representations and applies diffusion processes there, decoding results back to pixel space. TorchDiff's LDM implementation supports DDPM, DDIM, or SDE as backbones, with training enhanced by perceptual and adversarial losses for superior image quality.
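
Schematically, sampling runs entirely on latents and only touches pixel space at the end; `vae` and `denoise_loop` below are hypothetical placeholders rather than TorchDiff's API:

```python
import torch

@torch.no_grad()
def ldm_generate(vae, denoise_loop, text_emb, latent_shape=(1, 4, 32, 32)):
    z = torch.randn(latent_shape)            # start from noise in latent space
    z = denoise_loop(z, cond=text_emb)       # any backbone: DDPM/DDIM/SDE sampler
    return vae.decode(z)                     # decode latents back to pixel space
```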

unCLIP (under development)

unCLIP, a key component of OpenAI’s DALL-E 2, is a text-to-image diffusion model leveraging CLIP embeddings for robust text-image alignment, as detailed in "Hierarchical Text-Conditional Image Generation with CLIP Latents" by Ramesh et al. (2022). It operates in two stages: a prior generates CLIP image embeddings from text prompts, and a diffusion-based decoder transforms these into high-quality images. TorchDiff’s upcoming unCLIP implementation will enable text-guided image synthesis in pixel space, offering flexibility for creative applications.
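
The two-stage structure can be sketched as follows; every name here (`clip_text_encoder`, `prior`, `decoder`) is a hypothetical placeholder, since the TorchDiff module is still under development:

```python
import torch

@torch.no_grad()
def unclip_generate(clip_text_encoder, prior, decoder, prompt_tokens):
    text_emb = clip_text_encoder(prompt_tokens)  # CLIP text embedding
    img_emb = prior.sample(text_emb)             # stage 1: text emb -> CLIP image emb
    return decoder.sample(img_emb, text_emb)     # stage 2: diffusion decoder -> pixels
```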

Kandinsky (under development)

Kandinsky, developed by Sber AI, is a text-to-image diffusion model combining latent diffusion with an image prior, as described in "Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion" by Razzhigaev et al. (2023). The model uses a prior to map text embeddings to image embeddings, followed by latent diffusion for efficient, high-quality image generation. TorchDiff's forthcoming Kandinsky implementation will support text-guided generation with enhanced text understanding, leveraging its modular architecture for seamless integration.

Examples

Learn how to train and sample from diffusion models using TorchDiff.

For each diffusion model in TorchDiff, we provide a Jupyter Notebook demonstrating data preparation, training, and sampling. Notebooks are available for DDPM, DDIM, SDE, and LDM.
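
As a taste of what the notebooks cover, here is a generic noise-prediction training loop in plain PyTorch; `eps_model` and `alpha_bar` are stand-ins, and the notebooks use TorchDiff's own classes, whose names may differ:

```python
import torch
import torch.nn.functional as F

def train_epoch(eps_model, loader, optimizer, alpha_bar, T=1000, device="cpu"):
    alpha_bar = alpha_bar.to(device)
    for x0, _ in loader:                     # e.g. an MNIST DataLoader
        x0 = x0.to(device)
        t = torch.randint(0, T, (x0.size(0),), device=device)
        noise = torch.randn_like(x0)
        ab = alpha_bar[t].view(-1, 1, 1, 1)
        x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise  # forward noising
        loss = F.mse_loss(eps_model(x_t, t), noise)       # noise-prediction loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```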

Journal

Articles on TorchDiff

Loghman Samani

Latent Diffusion Models in TorchDiff

Latent Diffusion Models (LDMs), introduced by Rombach et al. (2022), boost efficiency by performing diffusion in a compressed latent space using a variational autoencoder (VAE). This article explores LDMs’ two-stage process—perceptual and semantic compression—and their implementation in TorchDiff for scalable image synthesis with text prompts, featuring practical training and sampling examples. Read More

Loghman Samani

Score-Based Modeling with SDEs in TorchDiff

This article dives into score-based generative modeling using stochastic differential equations (SDEs), covering Variance Exploding, Variance Preserving, and sub-VP variants, plus deterministic ODEs. It explains their connection to diffusion models and demonstrates their implementation in TorchDiff with MNIST-based examples for training and sampling. Read More

Loghman Samani

TorchDiff: A Library for Diffusion Models

TorchDiff is an open-source PyTorch library for diffusion models like DDPM, DDIM, SDE, and LDM. This article outlines their theoretical foundations, focusing on DDPM, and showcases their implementation with practical examples. It invites contributions to enhance the library’s development. Read More

About TorchDiff

A Python library for diffusion models

TorchDiff is an open-source PyTorch library designed to empower researchers and developers with state-of-the-art diffusion models for generative tasks. Built for flexibility and ease of use, it supports Denoising Diffusion Probabilistic Models (DDPM), Denoising Diffusion Implicit Models (DDIM), Score-Based Generative Models (SDE), and Latent Diffusion Models (LDM), with unCLIP and Kandinsky under active development. TorchDiff enables efficient image synthesis, text-to-image generation, and applications in fields like biology and medicine, such as protein structure prediction. With comprehensive documentation and practical examples, it's a gateway to advancing generative AI research and innovation.

About the Developer

Loghman Samani

TorchDiff Developer

As a computational biologist passionate about structural biology, I, Loghman Samani, began exploring diffusion models in 2025 to unlock their potential in predicting protein structures and designing molecules for biology and medicine. Starting with a GitHub repository to study and implement original papers, I created TorchDiff, an open-source PyTorch library featuring DDPM, DDIM, SDE, and LDM, with unCLIP and Kandinsky in development. My goal is to make these powerful models accessible, fostering innovation at the intersection of AI and life sciences.

Contact

Join the TorchDiff community

TorchDiff is an open-source PyTorch library for diffusion models, built to advance generative AI research. Whether you're a developer, researcher, or enthusiast, we welcome your contributions to enhance models like DDPM, DDIM, SDE, LDM, and upcoming unCLIP and Kandinsky. Have ideas or feedback? Connect with me, Loghman Samani, on social media to collaborate and shape the future of TorchDiff!

Contribute to TorchDiff