首页
网站开发
桌面应用
管理软件
微信开发
App开发
嵌入式软件
工具软件
数据采集与分析
其他
首页
>
> 详细
UMEECS542代做、代写Java/c++编程语言
项目预算:
开发周期:
发布时间:
要求地区:
UMEECS542: AdvancedTopicsinComputerVision Homework#2: DenoisingDiffusiononTwo-PixelImages
Due: 14October202411:59pm
The field of image synthesis has evolved significantly in recent years. From auto-regressive models and Variational Autoencoders (VAEs) to Generative Adversarial Networks (GANs), we have now entered a new era of diffusion models. A key advantage of diffusion models over other generative approaches is their ability to avoid mode collapse, allowing them to produce a diverse range of images. Given the high dimensionality of real images, it is impractical to sample and observe all possible modes directly. Our objective is to study denoising diffusion on two-pixel images to better understand how modes are generated and to visualize the dynamics and distribution within a 2D space.
1 Introduction
Diffusion models operate through a two-step process (Fig. 1): forward and reverse diffusion.
Figure 1: Diffusion models have a forward process to successively add noise to a clear image x0 and a backward process to successively denoise an almost pure noise image xT [2].
During the forward diffusion process, noise εt is incrementally added to the original data at time step t, over more time steps degrading it to a point where it resembles pure Gaussian noise. Let εt represent standard Gaussian noise, we can parameterize the forward process as xt ∼ N (xt|√1 − βt xt−1, βt I):
q(xt|xt−1) = 1 − βt xt−1 + βt εt−1 (1) 0<βt <1. (2)
Integrating all the steps together, we can model the forward process in a single step:
√√
xt= α ̄txo+ 1−α ̄tε (3)
αt =1−βt (4) α ̄ t = α 1 × α 2 × · · · × α t (5)
As t → ∞, xt is equivalent to an isotropic Gaussian distribution. We schedule β1 < β2 < ... < βT , as larger update steps are more appropriate when the image contains significant noise.
1
The reverse diffusion process, in contrast, involves the model learning to reconstruct the original data from a noisy version. This requires training a neural network to iteratively remove the noise, thereby recovering the original data. By mastering this denoising process, the model can generate new data samples that closely resemble the training data.
We model each step of the reverse process as a Gaussian distribution
pθ(xt−1|xt) = N (xt−1|μθ(xt, t), Σθ(xt, t)) . (6)
It is noteworthy that when conditioned on x0, the reverse conditional probability is tractable:
q(x |x,x )=Nx |μ,βˆI, (7)
t−1 t 0 t−1 t t
where, using the Bayes’ rule and skipping many steps (See [8] for reader-friendly derivations), we have:
1 1−αt
μt=√α xt−√1−α ̄εt . (8)
tt
We follow VAE[3] to optimize the negative log-likelihood with its variational lower bound with respect to μt and μθ(xt,t) (See [6] for derivations). We obtain the following objective function:
L=Et∼[1,T],x0,ε∥εt −εθ(xt,t)∥2. (9) The diffusion model εθ actually predicts the noise added to x0 from xt at timestep t.
a) many-pixel images b) two-pixel images
Figure 2: The distribution of images becomes difficult to estimate and distorted to visualize for many- pixel images, but simple to collect and straightforward to visualize for two-pixel images. The former requires dimensionality reduction by embedding values of many pixels into, e.g., 3 dimensions, whereas the latter can be directly plotted in 2D, one dimension for each of the two pixels. Illustrated is a Gaussian mixture with two density peaks, at [-0.35, 0.65] and [0.75, -0.45] with sigma 0.1 and weights [0.35, 0.65] respectively. In our two-pixel world, about twice as many images have a lighter pixel on the right.
In this homework, we study denoising diffusion on two-pixel images, where we can fully visualize the diffusion dynamics over learned image distributions in 2D (Fig. 2). Sec. 2 describes our model step by step, and the code you need to write to finish the model. Sec. 3 describes the starter code. Sec. 4 lists what results and answers you need to submit.
2
2 Denoising Diffusion Probabilistic Models (DDPM) on 2-Pixel Images
Diffusion models not only generate realistic images but also capture the underlying distribution of the training data. However, this probability density distributions (PDF) can be hard to collect for many- pixel images and their visualization highly distorted, but simple and direct for two-pixel images (Fig. 2). Consider an image with only two pixels, left and right pixels. Our two-pixel world contains two kinds of images: the left pixel lighter than the right pixel, or vice versa. The entire image distribution can be modeled by a Gaussian mixture with two peaks in 2D, each dimension corresponding to a pixel.
Let us develop DDPM [2] for our special two-pixel image collection.
2.1 Diffusion Step and Class Embedding
We use a Gaussian Fourier feature embedding for diffusion step t:
xemb = sin2πw0x,cos2πw0x,...,sin2πwnx,cos2πwnx, wi ∼ N(0,1), i = 1,...,n. (10)
For the class embedding, we simply need some linear layers to project the one-hot encoding of the class labels to a latent space. You do not need to do anything for this part. This part is provided in the code.
2.2 Conditional UNet
We use a UNet (Fig. 3) that takes as input both the time step t and the noised image xt, along with class label y if it is provided, and outputs the predicted noise. The network consists of only two blocks for the encoding or decoding pathway. To incorporate the step into the UNet features, we apply a dense
Figure 3: Sampe condition UNet architecture. Please note how the diffusion step and the class/text conditional embeddings are fused with the conv blocks of the image feature maps. For simplicity, we will not add the attention module for our 2-pixel use case.
3
linear layer to transform the step embedding to match the image feature dimension. A similar embedding approach can be used for class label conditioning. The detailed architecture is as follows.
1. Encoding block 1: Conv1D with kernel size 2 + Dense + GroupNorm with 4 groups
2. Encoding block 2: Conv1D with kernel size 1 + Dense + GroupNorm with 32 groups
3. Decoding block 1: ConvTranspose1d with kernel size 1 + Dense + GroupNorm with 4 groups 4. Decoding block 2: ConvTranspose1d with kernel size 1
We use SiLU [1] as our activation function. When adding class conditioning, we handle it similarly to the diffusion step.
Your to-do: Finish the model architecture and forward function in ddpm.py 2.3 Beta Scheduling and Variance Estimation
We adopt the sinusoidal beta scheduling [4] for better results then the original DDPM [2]:
α ̄t = f(t) (11)
f (0)
t/T+s π
f(t)=cos 1+s ·2 . (12) However, we follow the simpler posterior variance estimation [2] without using [4]’s learnt posterior
variance method for estimating Σθ(xt,t).
For simplicity, we declare some global variables that can be handy during sampling and training. Here is
the definition of these global variables in the code.
1. betas: βt
2. alphas: αt = 1 − βt
3. alphas cumprod: α ̄t = Πt0αi ̃ 1−α ̄t−1
4. posterior variance: Σθ(xt, t) = σt = βt = 1−α ̄t βt
Your to-do: Code up all these variables in utils.py. Feel free to add more variables you need. 2.4 Training with and without Guidance
For each DDPM iteration, we randomly select the diffusion step t and add random noise ε to the original image x0 using the β we defined for the forward process to get a noisy image xt. Then we pass the xt and t to our model to output estimated noise εθ, and calculate the loss between the actual noise ε and estimated noise εθ. We can choose different loss, from L1, L2, Huber, etc.
To sample images, we simply follow the reverse process as described in [2]:
11−αt
xt−1=√α xt−√1−α ̄εθ(xt,t) +σtz, wherez∼N(0,I)ift > 1else0. (13)
tt
We consider both classifier and classifier-free guidance. Classifier guidance requires training a separate classifier and use the classifier to provide the gradient to guide the generation of diffusion models. On the other hand, classifier-free guidance is much simpler in that it does not need to train a separate model.
To sample from p(x|y), we need an estimation of ∇xt log p(xt|y). Using Bayes’ rule, we have:
∇xt log p(xt|y) = ∇xt log p(y|xt) + ∇xt log p(xt) − ∇xt log p(y) (14)
= ∇xt log p(y|xt) + ∇xt log p(xt), (15) 4
Figure 4: Sample trajectories for the same start point (a 2-pixel image) with different guidance. Setting y = 0 generates a diffusion trajectory towards images of type 1 where the left pixel is darker than the right pixel, whereas setting y = 1 leads to a diffusion trajectory towards images of type 2 where the left pixel is lighter than the right pixel.
where ∇xt logp(y|xt) is the classifier gradient and ∇xt logp(xt) the model likelihood (also called score function [7]). For classifier guidance, we could train a classifier fφ for different steps of noisy images and estimate p(y|xt) using fφ(y|xt).
Classifier-free guidance in DDPM is a technique used to generate more controlled and realistic samples without the need for an explicit classifier. It enhances the flexibility and quality of the generated images by conditioning the diffusion model on auxiliary information, such as class labels, while allowing the model to work both conditionally and unconditionally.
For classifier-free guidance, we make a small modification by parameterizing the model with an additional input y, resulting in εθ(xt,t,y). This allows the model to represent ∇xt logp(xt|y). For non-conditional generation, we simply set y = ∅. We have:
∇xt log p(y|xt) = ∇xt log p(xt|y) − ∇xt log p(xt) (16) Recall the relationship between score functions and DDPM models, we have:
ε ̄θ(xt, t, y) = εθ(xt, t, y) + w (εθ(xt, t, y) − εθ(xt, t, ∅)) (17) = (w + 1) · εθ(xt, t, y) − w · εθ(xt, t, ∅), (18)
where w controls the strength of the conditional influence; w > 0 increases the strength of the guidance, pushing the generated samples more toward the desired class or conditional distribution.
During training, we randomly drop the class label to train the unconditional model. We replace the orig- inal εθ(xt, t) with the new (w + 1)εθ(xt, t, y) − wεθ(xt, t, ∅) to sample with specific class labels (Fig.4). Classifier-free guidance involves generating a mix of the model’s predictions with and without condition- ing to produce samples with stronger or weaker guidance.
Your to-do: Finish up all the training and sampling functions in utils.py for classifier-free guidance. 5
3 Starter Code
1. gmm.py defines the Gaussian Mixture model for the groundtruth 2-pixel image distribution. Your training set will be sampled from this distribution. You can leave this file untouched.
2. ddpm.py defines the model itself. You will need to follow the guideline to build your model there.
3. utils.py defines all the other utility functions, including beta scheduling and training loop module.
4. train.py defines the main loop for training.
4 Problem Set
1. (40 points) Finish the starter code following the above guidelines. Further changes are also welcome! Please make sure your training and visualization results are reproducible. In your report, state any changes that you make, any obstacles you encounter during coding and training.
2. (20 points) Visualize a particular diffusion trajectory overlaid on the estimated image distribution pθ (xt |t) at time-step t = 10, 20, 30, 40, 50, given max time-step T = 50. We estimate the PDF by sampling a large number of starting points and see where they end up with at time t, using either 2D histogram binning or Gaussian kernel density estimation methods. Fig. 5 plots the de-noising trajectory for a specific starting point overlaid on the ground-truth and estimated PDF.
Visualize such a sample trajectory overlaid on 5 estimated PDF’s at t = 10, 20, 30, 40, 50 respectively and over the ground-truth PDF. Briefly describe what you observe.
Figure 5: Sample de-noising trajectory overlaid on the estimated PDF for different steps.
3. (20 points) Train multiple models with different maximum timesteps T = 5, 10, 25, 50. Sample and de- noise 5000 random noises. Visualize and describe how the de-noised results differ from each other. Simply do a scatter plot to see how the final distribution of the 5000 de-noised samples is compared with the groundtruth distribution for each T . Note that there are many existing ways [5, 9] to make smaller timesteps work well even for realistic images. 1 plot with 5 subplots is expected here.
4. (20 points) Visualize different trajectories from the same starting noise xT that lead to different modes with different guidance. Describe what you find. 1 plot as illustrated by Fig. 4 is expected here.
5. Bonus point (30 points): Extend this model to MNIST images. Actions: Add more conv blocks for encoding/decoding; add residual layers and attention in each block; increase the max timestep to 200 or more. Four blocks for each pathway should be enough for MNIST. Show 64 generated images with any random digits you want to guide (see Figure 6). Show one trajectory of the generation from noise to a clear digit. Answer the question: Throughout the generation, is this shape of the digit generated part by part, or all at once.
6
Figure 6: Sample MNIST images generated by denoising diffusion with classifier-free guidance. The tensor() below is the random digits (class labels) input to the sampling steps.
7
5 Submission Instructions
1. This assignment is to be completed individually.
2. Submissions should be made through Gradescope and Canvas. Please upload:
(a) A PDF file of the graph and explanation: This file should be submitted on gradescope. Include your name, student ID, and the date of submission at the top of the first page. Write each problem on a different page.
(b) A folder containing all code files: This folder will be submitted under the folder of your uniq- name on our class server. Please leave all your visualization codes inside as well, so that we can reproduce your results if we find any graphs strange.
(c) If you believe there may be an error in your code, please provide a written statement in the pdf describing what you think may be wrong and how it affected your results. If necessary, provide pseudocode and/or expected results for any functions you were unable to write.
3. You may refactor the code as desired, including adding new files. However, if you make substantial changes, please leave detailed comments and reasonable file names. You are not required to create separate files for every model training/testing: commenting out parts of the code for different runs like in the scaffold is all right (just add some explanation).
References
[1] Stefan Elfwing, Eiji Uchibe, and Kenji Doya. “Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning”. In: CoRR abs/1702.03118 (2017). arXiv: 1702. 03118. URL: http://arxiv.org/abs/1702.03118.
[2] Jonathan Ho, Ajay Jain, and Pieter Abbeel. “Denoising Diffusion Probabilistic Models”. In: arXiv preprint arxiv:2006.11239 (2020).
[3] Diederik P Kingma and Max Welling. Auto-Encoding Variational Bayes. 2022. arXiv: 1312.6114 [stat.ML]. URL: https://arxiv.org/abs/1312.6114.
[4] Alex Nichol and Prafulla Dhariwal. “Improved Denoising Diffusion Probabilistic Models”. In: CoRR abs/2102.09672 (2021). arXiv: 2102.09672. URL: https://arxiv.org/abs/2102.09672.
[5] Tim Salimans and Jonathan Ho. Progressive Distillation for Fast Sampling of Diffusion Models. 2022. arXiv: 2202.00512 [cs.LG]. URL: https://arxiv.org/abs/2202.00512.
[6] Jascha Sohl-Dickstein et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. 2015. arXiv: 1503.03585 [cs.LG]. URL: https://arxiv.org/abs/1503.03585.
[7] Yang Song and Stefano Ermon. “Generative Modeling by Estimating Gradients of the Data Distribu- tion”. In: CoRR abs/1907.05600 (2019). arXiv: 1907.05600. URL: http://arxiv.org/abs/1907. 05600.
[8] Lilian Weng. “What are diffusion models?” In: lilianweng.github.io (July 2021). URL: https : / / lilianweng.github.io/posts/2021-07-11-diffusion-models/.
[9] Qinsheng Zhang and Yongxin Chen. Fast Sampling of Diffusion Models with Exponential Integrator. 2023. arXiv: 2204.13902 [cs.LG]. URL: https://arxiv.org/abs/2204.13902.
8
软件开发、广告设计客服
QQ:99515681
邮箱:99515681@qq.com
工作时间:8:00-23:00
微信:codinghelp
热点项目
更多
代写data driven business mod...
2024-11-12
代做acct1101mno introduction...
2024-11-12
代做can207 continuous and di...
2024-11-12
代做dsci 510: principles of ...
2024-11-12
代写25705 financial modellin...
2024-11-12
代做ccc8013 the process of s...
2024-11-12
代做intro to image understan...
2024-11-12
代写eco380: markets, competi...
2024-11-12
代写ems726u/p - engineering ...
2024-11-12
代写cive5975/cw1/2024 founda...
2024-11-12
代做csci235 – database syst...
2024-11-12
代做ban 5013 analytics softw...
2024-11-12
代写cs 17700 — lab 06 fall ...
2024-11-12
热点标签
mktg2509
csci 2600
38170
lng302
csse3010
phas3226
77938
arch1162
engn4536/engn6536
acx5903
comp151101
phl245
cse12
comp9312
stat3016/6016
phas0038
comp2140
6qqmb312
xjco3011
rest0005
ematm0051
5qqmn219
lubs5062m
eee8155
cege0100
eap033
artd1109
mat246
etc3430
ecmm462
mis102
inft6800
ddes9903
comp6521
comp9517
comp3331/9331
comp4337
comp6008
comp9414
bu.231.790.81
man00150m
csb352h
math1041
eengm4100
isys1002
08
6057cem
mktg3504
mthm036
mtrx1701
mth3241
eeee3086
cmp-7038b
cmp-7000a
ints4010
econ2151
infs5710
fins5516
fin3309
fins5510
gsoe9340
math2007
math2036
soee5010
mark3088
infs3605
elec9714
comp2271
ma214
comp2211
infs3604
600426
sit254
acct3091
bbt405
msin0116
com107/com113
mark5826
sit120
comp9021
eco2101
eeen40700
cs253
ece3114
ecmm447
chns3000
math377
itd102
comp9444
comp(2041|9044)
econ0060
econ7230
mgt001371
ecs-323
cs6250
mgdi60012
mdia2012
comm221001
comm5000
ma1008
engl642
econ241
com333
math367
mis201
nbs-7041x
meek16104
econ2003
comm1190
mbas902
comp-1027
dpst1091
comp7315
eppd1033
m06
ee3025
msci231
bb113/bbs1063
fc709
comp3425
comp9417
econ42915
cb9101
math1102e
chme0017
fc307
mkt60104
5522usst
litr1-uc6201.200
ee1102
cosc2803
math39512
omp9727
int2067/int5051
bsb151
mgt253
fc021
babs2202
mis2002s
phya21
18-213
cege0012
mdia1002
math38032
mech5125
07
cisc102
mgx3110
cs240
11175
fin3020s
eco3420
ictten622
comp9727
cpt111
de114102d
mgm320h5s
bafi1019
math21112
efim20036
mn-3503
fins5568
110.807
bcpm000028
info6030
bma0092
bcpm0054
math20212
ce335
cs365
cenv6141
ftec5580
math2010
ec3450
comm1170
ecmt1010
csci-ua.0480-003
econ12-200
ib3960
ectb60h3f
cs247—assignment
tk3163
ics3u
ib3j80
comp20008
comp9334
eppd1063
acct2343
cct109
isys1055/3412
math350-real
math2014
eec180
stat141b
econ2101
msinm014/msing014/msing014b
fit2004
comp643
bu1002
cm2030
联系我们
- QQ: 9951568
© 2021
www.rj363.com
软件定制开发网!