Recent advances in diffusion-based generative models have shown considerable promise for zero-shot image-to-image translation and editing. Most of these approaches work by combining or replacing network-specific features used to generate new images with features taken from the inversion of a guide image. Methods of this type are considered the current state of the art among training-free approaches, but they have notable limitations: they tend to be costly in runtime and memory, and they often depend on deterministic sampling, which limits variation in the generated results. We propose Filter-Guided Diffusion (FGD), an alternative approach that leverages fast filtering operations during the diffusion process to give finer control over the strength and frequencies of guidance, and that works with non-deterministic samplers to produce greater variety. Because it is efficient, FGD can be sampled over multiple seeds and hyperparameters in less time than a single run of other state-of-the-art methods, producing results that are superior according to structural and semantic metrics. We conduct extensive quantitative and qualitative experiments to evaluate the performance of FGD on translation tasks and also demonstrate its potential for localized editing when used with masks.
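The abstract does not spell out the filtering update, so the following is only a toy illustration of how a filter can separate guidance *strength* from guidance *frequency*. It assumes a linear low-pass filter (a Gaussian blur here, standing in for whatever filter FGD actually uses) and a hypothetical update that moves the sample's low-frequency band toward the guide's by a strength δ. The function names are illustrative and are not the paper's API.

```python
import numpy as np


def gaussian_kernel1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def low_pass(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of an (H, W) array, edge-padded to keep its shape."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    # 'valid' convolution along each axis trims the padding back off.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def filter_guidance_step(x, guide, sigma, delta):
    """Hypothetical update (not FGD's published rule): move x's low-frequency
    band toward the guide's by a strength delta; sigma sets the band."""
    return x + delta * (low_pass(guide, sigma) - low_pass(x, sigma))


rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))      # stand-in for a noisy sample
guide = rng.standard_normal((32, 32))  # stand-in for the guide image
y = filter_guidance_step(x, guide, sigma=2.0, delta=1.0)
```

Because the blur is linear, δ = 1 in this toy leaves exactly the sample's high-frequency residual plus the guide's low-frequency band, while the blur width decides which frequencies count as "structure" to be guided.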
Paintings
"a painting of a cat in a red hat"
σ_spatial=3, σ_value=0.3, t_end=15, δ=1.4, normalization: off
"a portrait of a dog"
σ_spatial=2, σ_value=1, t_end=15, δ=1.4, normalization: off
"a painting of a dog in a wig"
σ_spatial=3, σ_value=0.3, t_end=15, δ=1.0, normalization: off
"a painting of a bear playing the electric guitar"
σ_spatial=2, σ_value=1, t_end=15, δ=1.2, normalization: off
"a watercolor of cats at the beach"
σ_spatial=2, σ_value=1, t_end=15, δ=1.2, normalization: off
"a painting of a cat pouring milk"
σ_spatial=2, σ_value=1, t_end=15, δ=1.2, normalization: off
"a painting of a dog in a blue headband"
σ_spatial=3, σ_value=0.3, t_end=15, δ=1.0, normalization: off
"a painting of a cat in a black dress"
σ_spatial=3, σ_value=0.3, t_end=15, δ=1.0, normalization: off
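Each example lists σ_spatial and σ_value. Assuming these are the spatial and value (range) standard deviations of a joint bilateral filter, in which the edge-stopping weights come from a guide image, such a filter can be sketched in a few lines of NumPy. This is a brute-force, single-channel sketch; the function name and the 3σ window truncation are illustrative choices rather than details taken from the paper.

```python
import numpy as np

# Assumption: sigma_spatial / sigma_value mirror the gallery's σ_spatial / σ_value,
# read as the spatial and value (range) widths of a joint bilateral filter.
def joint_bilateral_filter(src, guide, sigma_spatial, sigma_value):
    """Smooth `src` while preserving edges present in `guide` (same (H, W) shape)."""
    radius = max(1, int(np.ceil(3 * sigma_spatial)))  # window truncated at 3 sigma
    h, w = src.shape
    src_p = np.pad(src, radius, mode="edge")
    guide_p = np.pad(guide, radius, mode="edge")
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbour values at offset (dy, dx), aligned with the output grid.
            rows = slice(radius + dy, radius + dy + h)
            cols = slice(radius + dx, radius + dx + w)
            w_spatial = np.exp(-(dx * dx + dy * dy) / (2 * sigma_spatial**2))
            # Value weight: how similar the neighbour's guide value is to the centre's.
            w_value = np.exp(-((guide_p[rows, cols] - guide) ** 2) / (2 * sigma_value**2))
            weight = w_spatial * w_value
            num += weight * src_p[rows, cols]
            den += weight
    return num / den  # den >= 1 because the centre tap has weight 1


step = np.zeros((16, 16))
step[:, 8:] = 1.0
noisy = step + 0.1 * np.random.default_rng(0).standard_normal(step.shape)
smoothed = joint_bilateral_filter(noisy, step, sigma_spatial=2, sigma_value=0.3)
```

In a bilateral-type filter, a larger σ_value lets the average reach across stronger edges in the guide (approaching a plain Gaussian blur), while a smaller σ_value keeps those edges intact; σ_spatial sets how far the average reaches.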
Pareidolia
"a photo of a cow"
σ_spatial=2, σ_value=1, t_end=15, δ=1.4, normalization: on
"a photo of a bowl of fruit"
σ_spatial=3, σ_value=1, t_end=15, δ=1.9, normalization: on
"a photo of an elephant"
σ_spatial=2, σ_value=1, t_end=15, δ=1.6, normalization: on
"a photo of a turtle"
σ_spatial=2, σ_value=1, t_end=15, δ=1.9, normalization: on
"a photo of a boat"
σ_spatial=3, σ_value=1, t_end=15, δ=1.4, normalization: on
"a photo of a duck"
σ_spatial=2, σ_value=1, t_end=15, δ=1.6, normalization: on
Food
"a photo of a pizza"
σ_spatial=3, σ_value=0.3, t_end=15, δ=1.6, normalization: on
"a photo of a cake"
σ_spatial=2, σ_value=1, t_end=15, δ=1.9, normalization: on
"a photo of steak"
σ_spatial=3, σ_value=0.3, t_end=15, δ=1.6, normalization: on
"a photo of steak"
σ_spatial=3, σ_value=0.3, t_end=15, δ=1.6, normalization: on
"a photo of meatballs"
σ_spatial=2, σ_value=1, t_end=15, δ=1.6, normalization: on
"a photo of apartment buildings"
σ_spatial=2, σ_value=1, t_end=15, δ=1.4, normalization: on
Landscapes
"a photo of a desert"
σ_spatial=2, σ_value=1, t_end=15, δ=1.4, normalization: on
"a photo of a mountain"
σ_spatial=2, σ_value=1, t_end=15, δ=1.6, normalization: on
"a photo of a sea"
σ_spatial=2, σ_value=1, t_end=15, δ=1.6, normalization: on
"a photo of a mountain"
σ_spatial=2, σ_value=1, t_end=15, δ=1.9, normalization: on
"a photo of an island"
σ_spatial=2, σ_value=1, t_end=15, δ=1.4, normalization: on
"a photo of a desert"
σ_spatial=2, σ_value=1, t_end=15, δ=1.9, normalization: on
Other Images
"a photo of a bear"
σ_spatial=3, σ_value=1, t_end=15, δ=1.4, normalization: on
"a photo of a modern bedroom"
σ_spatial=2, σ_value=1, t_end=15, δ=1.6, normalization: on
"a statue of a golden fish"
σ_spatial=2, σ_value=1, t_end=15, δ=1.4, normalization: on
"a photo of a dog in the snow"
σ_spatial=3, σ_value=0.3, t_end=15, δ=1.6, normalization: on
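The abstract notes that FGD is cheap enough to sample over multiple seeds and hyperparameters. The settings listed above draw on only a few values (σ_spatial ∈ {2, 3}, σ_value ∈ {0.3, 1}, δ between 1.0 and 1.9, t_end = 15, normalization on or off), so such a sweep can be enumerated as a small grid. This sketch only builds the configurations; the seed list and the dictionary keys are hypothetical, and the call that would run FGD on each configuration is not shown.

```python
from itertools import product

# Values taken from the example settings above; SEEDS is a hypothetical choice.
SIGMA_SPATIAL = [2, 3]
SIGMA_VALUE = [0.3, 1.0]
DELTA = [1.0, 1.2, 1.4, 1.6, 1.9]
NORMALIZATION = [True, False]
T_END = 15
SEEDS = [0, 1, 2]


def sweep_configs(seeds=SEEDS):
    """Yield one settings dict per combination of seed and hyperparameters."""
    for seed, s_sp, s_val, delta, norm in product(
        seeds, SIGMA_SPATIAL, SIGMA_VALUE, DELTA, NORMALIZATION
    ):
        # Keys are illustrative names, not a documented FGD interface.
        yield {
            "seed": seed,
            "sigma_spatial": s_sp,
            "sigma_value": s_val,
            "t_end": T_END,
            "delta": delta,
            "normalization": norm,
        }


configs = list(sweep_configs())
print(len(configs))  # 3 seeds x 2 x 2 x 5 x 2 = 120
```

Each configuration is independent, so the grid can be split across devices or trimmed, for example by fixing normalization per image category as the examples above do (off for paintings, on for the other categories).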
We show a few videos in which we sweep the filter strength δ from 0 to 1.6.
"a painting of a dog in a red hat"
σ_spatial=3, σ_value=0.3, t_end=15, δ=1.4, normalization: off
"a painting of a cat in a red hat"
σ_spatial=3, σ_value=0.3, t_end=15, δ=1.4, normalization: off
"a photo of a dog"
σ_spatial=2, σ_value=1, t_end=15, δ=1.4, normalization: off
"a photo of a rabbit"
σ_spatial=2, σ_value=1, t_end=15, δ=1.4, normalization: off
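The sweep videos vary δ over a fixed range. One way to set up such a sweep, assuming a stochastic sampler that draws its noise from a NumPy generator, is to give every frame a freshly seeded generator so that only δ differs between frames. The function name, the frame count, and the generator-based sampler interface are all illustrative assumptions; the sampler itself is not shown.

```python
import numpy as np


def sweep_frames(n_frames: int, delta_max: float = 1.6, seed: int = 0):
    """Pair each frame's filter strength with a freshly seeded generator.

    Re-seeding per frame (instead of sharing one generator) means a stochastic
    sampler draws the same noise in every frame, so only delta changes.
    """
    deltas = np.linspace(0.0, delta_max, n_frames)
    return [(float(d), np.random.default_rng(seed)) for d in deltas]


# 33 frames give delta = 0.00, 0.05, ..., 1.60 (a spacing of 1.6 / 32).
frames = sweep_frames(33)
```

Sharing a single generator across frames would instead advance its state from one frame to the next, so consecutive frames would differ in their noise as well as in δ.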