WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Categories: cs.CV, cs.AI
Release Date: November 26, 2024
Authors: Zongjian Li¹, Bin Lin¹, Yang Ye¹, Liuhan Chen¹, Xinhua Cheng¹, Shenghai Yuan¹, Li Yuan²
Affiliations: ¹Peking University; ²Peng Cheng Laboratory

| Method | TCPR | Chn | WebVid-10M PSNR | WebVid-10M SSIM | WebVid-10M LPIPS | WebVid-10M FVD | Panda-70M PSNR | Panda-70M SSIM | Panda-70M LPIPS | Panda-70M FVD |
|---|---|---|---|---|---|---|---|---|---|---|
| SD-VAE [27] | | 4 | 30.19 | 0.8377 | 0.0568 | 284.90 | 30.46 | 0.8896 | 0.0395 | 182.99 |
| SVD-VAE [4] | | 4 | 31.18 | 0.8689 | 0.0546 | 188.74 | 31.04 | 0.9059 | 0.0379 | 137.67 |
| CV-VAE [44] | | 4 | 30.76 | 0.8566 | 0.0803 | 369.23 | 30.18 | 0.8796 | 0.0672 | 296.28 |
| OD-VAE [6] | | 4 | 30.69 | 0.8635 | 0.0553 | 255.92 | 30.31 | 0.8935 | 0.0439 | 191.23 |
| Open-Sora VAE [45] | | 4 | 31.14 | 0.8572 | 0.1001 | 475.23 | 31.37 | 0.8973 | 0.0662 | 298.47 |
| Allegro [46] | | 4 | 32.18 | 0.8963 | 0.0524 | 209.68 | 31.70 | 0.9158 | 0.0421 | 172.72 |
| WF-VAE-S (Ours) | | 4 | 31.39 | 0.8737 | 0.0517 | 188.04 | 31.27 | 0.9025 | 0.0420 | 146.91 |
| WF-VAE-L (Ours) | | 4 | 32.32 | 0.8920 | 0.0513 | 186.00 | 32.10 | 0.9142 | 0.0411 | 146.24 |
| CogVideoX-VAE [39] | | 16 | 35.72 | 0.9434 | 0.0277 | 59.83 | 35.79 | 0.9527 | 0.0198 | 43.23 |
| WF-VAE-L (Ours) | | 16 | 35.76 | 0.9430 | 0.0230 | 54.36 | 35.87 | 0.9538 | 0.0175 | 39.40 |