Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
cs.CV
Release Date: December 3, 2024
Authors: Yu Yuan¹, Xijun Wang, Yichen Sheng, Prateek Chennuri, Xingguang Zhang, Stanley Chan
Affiliation: ¹Purdue University
![[Uncaptioned image]](https://arxiv.org/html/2412.02168v1/x1.png)
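| Methods | Bokeh Rendering: Accuracy (CorrCoef) | Bokeh Rendering: Consistency (LPIPS) | Bokeh Rendering: Quality (CLIP) | Focal Length: Accuracy (CorrCoef) | Focal Length: Consistency (LPIPS) | Focal Length: Quality (CLIP) | Shutter Speed: Accuracy (CorrCoef) | Shutter Speed: Consistency (LPIPS) | Shutter Speed: Quality (CLIP) | Color Temperature: Accuracy (CorrCoef) | Color Temperature: Consistency (LPIPS) | Color Temperature: Quality (CLIP) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |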
| Reference | 1.0000 | 0.0527 | 0.3974 | 1.0000 | 0.4709 | 0.3853 | 1.0000 | 0.0511 | 0.3783 | 1.0000 | 0.0398 | 0.4053 |
| SD3 [4] | 0.2492 | 0.7253 | 0.3278 | 0.2356 | 0.7108 | 0.3097 | 0.2731 | 0.6937 | 0.3169 | 0.2312 | 0.6891 | 0.3276 |
| FLUX [3] | 0.2006 | 0.6770 | 0.3257 | 0.2003 | 0.6461 | 0.3086 | 0.2398 | 0.6403 | 0.3192 | 0.2363 | 0.6155 | 0.3207 |
| AnimateDiff [20] (w/o FT) | 0.2960 | 0.1005 | 0.2753 | 0.2613 | 0.1208 | 0.2532 | 0.1843 | 0.1002 | 0.2631 | 0.1834 | 0.0805 | 0.2659 |
| AnimateDiff [20] (w/ FT) | 0.3714 | 0.0255 | 0.2984 | 0.2597 | 0.2288 | 0.2739 | 0.2198 | 0.0948 | 0.2936 | 0.2897 | 0.0205 | 0.2839 |
| CameraCtrl [22] (w/o FT) | 0.3303 | 0.1447 | 0.2804 | 0.2913 | 0.1144 | 0.2644 | 0.1896 | 0.0986 | 0.2912 | 0.1773 | 0.0935 | 0.2753 |
| CameraCtrl [22] (w/ FT) | 0.6025 | 0.1158 | 0.3017 | 0.8671 | 0.4606 | 0.2865 | 0.7526 | 0.0775 | 0.2981 | 0.5812 | 0.0651 | 0.2885 |
| Ours | 0.8626 | 0.0788 | 0.3007 | 0.9695 | 0.4647 | 0.2871 | 0.9264 | 0.0695 | 0.3015 | 0.8970 | 0.0499 | 0.2910 |
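
The table reports accuracy as "CorrCoef" without defining it. As a rough illustration only, the sketch below assumes it denotes the Pearson correlation coefficient between a per-image measurement taken on a generated sequence and the same measurement on a reference sequence. The function name, the choice of measurement, and the sample values are all hypothetical and are not taken from the paper.

```python
import math


def pearson_corrcoef(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Convention chosen for this sketch: a constant sequence scores 0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-image measurements (for example, mean brightness
# across five camera settings); these numbers are made up.
reference = [0.20, 0.35, 0.50, 0.65, 0.80]
generated = [0.22, 0.30, 0.55, 0.60, 0.85]
score = pearson_corrcoef(reference, generated)
print(f"CorrCoef (sketch): {score:.4f}")
```

Under this assumption, a score near 1 means the generated sequence follows the same trend as the reference as the camera setting changes, which is consistent with the Reference row scoring 1.0000 in every accuracy column.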