notesum.ai
Published at November 15
Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era
cs.CV
cs.AI
cs.HC
cs.LG
cs.MM
Released Date: November 15, 2024
Authors: Thanh Tam Nguyen (1), Zhao Ren (2), Trinh Pham (3), Phi Le Nguyen (4), Hongzhi Yin (5), Quoc Viet Hung Nguyen (1)
Aff.: (1) Griffith University; (2) University of Bremen; (3) Ho Chi Minh City University of Technology; (4) Hanoi University of Science and Technology; (5) The University of Queensland

| Survey | Focused Task | Focused Modality | Key Contents |
| --- | --- | --- | --- |
| (Qin et al., 2024) | Editing | Text | Instruction development, Evaluation concerns |
| (Yin et al., 2023) | Editing | Text | LLM-empowered instructions, Instruction tuning |
| (Li et al., 2024b) | Retrieval | Image, Video, Audio | Image-text composite retrieval, Multimodal composite retrieval |
| (Zhan et al., 2023) | Generation | Image | Text guidance, Audio guidance, Sketch guidance, etc. |
| Ours | Editing | Image, Video, Audio | Instruction mechanisms, Augmentations, Learning strategies, Model designs, Loss functions |