Published on December 10
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
cs.CV
Release Date: December 10, 2024
Authors: Jianzong Wu¹, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong
Aff.: ¹Peking University

| Method | FID | CLIP | DINO-I | DINO-C | F1 score |
|---|---|---|---|---|---|
| AR-LDM* [25] | 0.409 | 0.257 | 0.548 | 0.507 | 0.004 |
| StoryGen* [21] | 0.411 | 0.219 | 0.536 | 0.488 | 0.012 |
| SEED-Story* [45] | 0.411 | 0.169 | 0.416 | 0.405 | 0.006 |
| StoryDiffusion* [52] | 0.409 | 0.244 | 0.461 | 0.362 | 0.002 |
| MS-Diffusion† [38] | 0.408 | 0.229 | 0.610 | 0.641 | 0.720 |
| DiffSensei | 0.407 | 0.235 | 0.618 | 0.651 | 0.727 |