Text-to-Image Synthesis: A Decade Survey
cs.CV
Released Date: November 25, 2024

| Research Area | Model |
|---|---|
| Sec.4.1: Alignment and Human Feedback | RAHF[134], DiffAgent[135], HPS[136], ImageReward[137], Multi-dimensional Human Preference (MHP)[138], MPS[138] |
| Sec.4.2: Personalized Generation | Textual Inversion[139], PIA[140], Prompt-Free Diffusion[141], ADI[142], HyperDreamBooth[143], DreamBooth[43], Personalized Residuals[144], PALP[145], PhotoMaker[146], DreamTurner[147], ELITE[148], Tailored Visions[149], RealCustom[150], SAG[151], CoRe[152], Imagine Yourself[153], FastComposer[154], InstantID[155], Specialist Diffusion[103], ControlStyle[156], UniPortrait[157], TIGC[158], DETEX[159], FlashFace[160], IDAdapter[161], SSR-Encoder[162] |
| Sec.4.3: Controllable T2I Generation | ControlNet[13], T2I-Adapter[163], Custom Diffusion[106], SID[164], JeDi[165], ViCo[166], DreamMatcher[167], RealCustom[150], StyleGAN Meets Stable Diffusion[168], Uni-ControlNet[169], BLIP-Diffusion[170], LayerDiffusion[171], ReCo[172], SpaText[173], CoDi[174], StyleTokenizer[175], P+[176], IP-Adapter[177], ResAdapter[178], DetDiffusion[179], CAN[180], SceneDiffusion[181], Zero-Painter[182], FreeControl[183], PCDMs[184], ControlNet++[185], ControlNeXt[186], Composer[154], MultiDiffusion[187] |
| Sec.4.4: T2I Style Transfer | Styleformer[188], InST[189], ArtAdapter[190], InstantBooth[191], OSASIS[192], DEADiff[193], Uncovering the Disentanglement Capability[194], InstantStyle[195], StyleAligned[196], FreeCustom[197] |
| Sec.4.5: Text-guided Image Generation | DisenDiff[198], Predicated Diffusion[199], LEDITS++[200], PH2P[201], GSN[202], NMG[203], Imagic[204], SINE[205], AdapEdit[206], FISEdit[207], BARET[208], D-Edit[209], PromptCharm[210], RAPHAEL[211], SUR-Adapter[212], Prompt Diffusion[213], SDG[214], InfEdit[215], FPE[216], FoI[217], DiffEditor[218], Prompt Augmentation[219], TurboEdit[220], PPE[221], DE-Net[222] |
| Sec.4.6: Performance and Effectiveness | Low-Rank Adaptation (LoRA)[223], ECLIPSE[224], InstaFlow[225], SVDiff[107], LinFusion[226], Speculative Jacobi Decoding[99], Token-Critic[227], Null-text Inversion[228], DiffFit[229], Progressive Distillation[230], YOSO[231], SwiftBrush[232], StreamDiffusion[233], TiNO-Edit[234], On the Scalability of Diffusion-based T2I Generation[235], Muse[236], UFC-BERT[237], MarkovGen[238] |
| Sec.4.7: Reward Mechanism | Prompt AutoEditing (PAE)[239], AIG[240], IterComp[241], Powerful and Flexible[242], Optimizing Prompts for T2I Generation[243], ImageReward[137], SPIN-Diffusion[244] |
| Sec.4.8: Safety Issues | Self-Discovering[245], POSI[246], Detection-based Method[247], SteerDiff[248], MetaCloak[249], OpenBias[250], SPM[251], Guidance-based Methods[246, 33, 252], Fine-tuning-based Method[253], HFI[254], RECE[255], RIATIG[256] |
| Sec.4.9: Copyright Protection Measures | AdvDM[257], Mist[258], Anti-DreamBooth[259], InMark[260], VA3[261], SimAC[262] |
| Sec.4.10: Consistency Between Text and Image Content | TokenCompose[263], Attention Refocusing[264], DPT[265], MIGC[266], INITNO[267], DiffusionCLIP[268], SDEdit[269], MasaCtrl[270], RIE[271], SceneTextGen[272], AnyText[273], SceneGenie[274], ZestGuide[275], Adma-GAN[276], AtHom[277], TL-Norm[278], GSN[202], DiffPNG[279], Compose and Conquer[280], StoryDiffusion[281], SynGen[282], ParaDiffusion[283], Token Merging (ToMe)[284], Training-Free Structured Diffusion Guidance[285], StyleT2I[286], Bounded Attention[287], DAC[288], Attention Map Control[289], DDS[290], CDS[291], StableDrag[292], FreeDrag[293], RegionDrag[294] |
| Sec.4.11: Spatial Consistency Between Text and Image | SPRIGHT[295], CoMat[296], CLIP[68], MULAN[297], TCTIG[298], PLACE[299], Backward Guidance[300], SSMG[301] |
| Sec.4.12: Specific Content Generation | CosmicMan[302], Text2Human[303], CAFE[304], ZS-SBIR[305], HanDiffuser[306], EmoGen[307], Cross Initialization for Face Personalization[308], HcP[309], PanFusion[310], StoryGen[311], Text2Street[312], SVGDreamer[313], Face-Adapter[314], StoryDALL-E[315], TexFit[316], EditWorld[317] |
| Sec.4.13: Fine-grained Control in Generation | Continuous 3D Words[318], FiVA[319], GLIGEN[320], InteractDiffusion[321], PreciseControl[322], Localizing Object-level Shape Variations[323], Motion Guidance[324], SingDiffusion[325], CFLD[326], NoiseCollage[327], Concept Weaver[328], Playground v2.5[329], LayerDiffuse[330], ConForm[331] |
| Sec.4.14: LLM-assisted T2I | LayoutGPT[332], DiagrammerGPT[333], AutoStory[334], CAFE[304], LayoutLLM-T2I[335], MoMA[336], SmartEdit[337], RFNet[338], TIAC[339], RPG[112], ELLA[340], DialogGen[341], MGIE[342] |