notesum.ai
Published on November 26
Accelerating Vision Diffusion Transformers with Skip Branches
cs.CV
Release date: November 26, 2024
Authors: Guanjie Chen¹, Xinyu Zhao², Yucheng Zhou³, Tianlong Chen², Cheng Yu⁴
Affiliations: ¹Shanghai Jiao Tong University; ²The University of North Carolina at Chapel Hill; ³SKL-IOTSC, CIS, University of Macau; ⁴The Chinese University of Hong Kong
![Uncaptioned image](https://arxiv.org/html/2411.17616v1/x1.png)
| Method | UCF101 FVD (↓) | UCF101 FID (↓) | FFS FVD (↓) | FFS FID (↓) | Sky FVD (↓) | Sky FID (↓) | Taichi FVD (↓) | Taichi FID (↓) | FLOPs (T) | Latency (s) | Speedup |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Latte | 155.22 | 22.97 | 28.88 | 5.36 | 49.46 | 11.51 | 166.84 | 11.57 | 278.63 | 9.90 | 1.00 |
| Δ-DiT | 161.62 | 25.33 | 25.80 | 4.46 | 51.70 | 11.67 | 188.39 | 12.09 | 226.10 | 8.09 | 1.22 |
| FORA | 160.52 | 23.52 | 27.23 | 4.64 | 52.90 | 11.96 | 198.56 | 13.68 | 240.26 | 9.00 | 1.10 |
| PAB23 | 213.50 | 30.96 | 58.15 | 5.94 | 96.97 | 16.38 | 274.90 | 16.05 | 233.87 | 7.63 | 1.30 |
| PAB35 | 1176.57 | 93.30 | 863.18 | 128.34 | 573.72 | 55.66 | 828.40 | 42.96 | 222.90 | 7.14 | 1.39 |
| Skip-Cache | | | | | | | | | | | |
| Skip-DiT | 141.30 | 23.78 | 20.62 | 4.32 | 49.21 | 11.92 | 163.03 | 13.55 | 290.05 | 10.02 | 1.00 |
| | 141.42 | 21.46 | 23.55 | 4.49 | 51.13 | 12.66 | 167.54 | 13.89 (+0.34) | 180.68 | 6.40 | 1.56 |
| | 137.98 | 19.93 | 26.76 | 4.75 | 54.17 | 13.11 | 179.43 | 14.53 | 145.87 | 5.24 | 1.91 |
| | 143.00 | 19.03 | 30.19 | 5.18 | 57.36 | 13.77 | 188.44 | 14.38 | 125.99 | 4.57 | 2.19 |
| | 145.39 | 18.72 | 35.52 | 5.86 | 62.92 | 14.18 | 209.38 | 15.20 | 121.02 | 4.35 | 2.30 |
| | 151.77 | 18.78 | 42.41 | 6.42 | 68.96 | 15.16 | 208.04 | 15.78 | 111.07 | 4.12 | 2.43 |
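The Speedup column appears to be each row's latency divided into the latency of an uncached baseline: Latte (9.90 s) for the prior caching methods, and Skip-DiT (10.02 s) for the unlabeled Skip-Cache rows beneath it. This reading is an assumption inferred from the numbers, not something the table states. The minimal Python sketch below checks that assumption against the reported values. The names `speedup`, `check`, `baseline_rows`, and `skip_cache_rows` are hypothetical helpers for illustration only.

```python
# Latencies (s) and reported speedups copied from the table above.
# Assumption (inferred, not stated in the source): speedup = baseline latency / method latency,
# where the baseline is Latte for prior methods and Skip-DiT for the Skip-Cache rows.

LATTE_LATENCY = 9.90
SKIP_DIT_LATENCY = 10.02

# method -> (latency in seconds, reported speedup)
baseline_rows = {
    "Δ-DiT": (8.09, 1.22),
    "FORA": (9.00, 1.10),
    "PAB23": (7.63, 1.30),
    "PAB35": (7.14, 1.39),
}

# Unlabeled Skip-Cache rows, in table order: (latency in seconds, reported speedup)
skip_cache_rows = [
    (6.40, 1.56),
    (5.24, 1.91),
    (4.57, 2.19),
    (4.35, 2.30),
    (4.12, 2.43),
]


def speedup(baseline_latency: float, latency: float) -> float:
    """Wall-clock speedup of a method relative to an uncached baseline."""
    return baseline_latency / latency


def check(baseline_latency, rows, tol=0.01):
    """Return (computed, reported, matches) for each row; matches is True if within tol."""
    results = []
    for latency, reported in rows:
        computed = speedup(baseline_latency, latency)
        results.append((round(computed, 3), reported, abs(computed - reported) <= tol))
    return results


if __name__ == "__main__":
    prior = check(LATTE_LATENCY, baseline_rows.values())
    ours = check(SKIP_DIT_LATENCY, skip_cache_rows)
    print(all(ok for *_, ok in prior), all(ok for *_, ok in ours))  # prints: True True
```

Under this assumption every row matches to within rounding of the two-decimal latencies. Using Latte's 9.90 s as the baseline for the Skip-Cache rows instead does not match: for example, 9.90 / 5.24 ≈ 1.89, whereas the table reports 1.91.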