Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges
Subject: cs.LG
Release Date: December 4, 2024
Authors: Minghao Shao¹, Abdul Basit², Ramesh Karri¹, Muhammad Shafique²
Affiliations: ¹New York University, USA; ²New York University Abu Dhabi, UAE

| GAN Model | Year | Key Idea |
|---|---|---|
| Vanilla GAN [16] | 2014 | Introduces adversarial training between a generator and a discriminator; a minimal training-loop sketch follows this table. |
| DCGAN [17] | 2015 | Utilizes convolutional layers to enhance the performance and stability of GANs in image generation tasks. |
| CGAN [18] | 2014 | Introduces conditioning variables (e.g., class labels) into both the generator and discriminator to control the output. |
| WGAN [19] | 2017 | Replaces the original GAN loss with the Wasserstein distance to improve training stability and reduce mode collapse. |
| WGAN-GP [20] | 2017 | Extends WGAN with a gradient penalty term to enforce the Lipschitz constraint more effectively; a sketch of the penalty follows this table. |
| LSGAN [21] | 2017 | Uses least-squares loss instead of the cross-entropy loss to address vanishing gradients and stabilize training. |
| CycleGAN [22] | 2017 | Introduces cycle consistency loss to enable image-to-image translation without paired training data. |
| StyleGAN [23] | 2019 | Introduces a style-based generator architecture, allowing control over different aspects and details of generated images. |
| BigGAN [24] | 2018 | Focuses on scaling up GANs using large batch sizes and deeper architectures to generate higher-quality images. |
| SAGAN [25] | 2018 | Incorporates self-attention mechanisms in GANs to capture long-range dependencies and generate detailed images. |
| Progressive GAN [26] | 2017 | Gradually increases the resolution of generated images during training to achieve more stable results. |
| StarGAN [27] | 2018 | Performs multi-domain image-to-image translation using a single generator and discriminator. |
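
To make the adversarial training idea in the first row concrete, here is a minimal PyTorch sketch of the vanilla GAN objective [16]. The toy data distribution, network sizes, batch size, and learning rates are illustrative assumptions, not taken from the survey.

```python
# A minimal sketch of vanilla GAN adversarial training [16].
# The toy 2-D "real" distribution and all hyperparameters are illustrative.
import torch
import torch.nn as nn

latent_dim, data_dim, batch = 16, 2, 32

# Generator: maps latent noise z to a fake sample.
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
# Discriminator: outputs the probability that a sample is real.
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(batch, data_dim) * 0.5 + 2.0  # toy "real" samples
    fake = G(torch.randn(batch, latent_dim))

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    # fake is detached so this step does not update the generator.
    opt_D.zero_grad()
    loss_D = (bce(D(real), torch.ones(batch, 1))
              + bce(D(fake.detach()), torch.zeros(batch, 1)))
    loss_D.backward()
    opt_D.step()

    # Generator step (non-saturating form): push D(G(z)) -> 1.
    opt_G.zero_grad()
    loss_G = bce(D(fake), torch.ones(batch, 1))
    loss_G.backward()
    opt_G.step()
```

The alternating updates are the core of the scheme: the discriminator learns to separate real from generated samples, while the generator learns to fool the current discriminator.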
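Similarly, the WGAN-GP row describes replacing weight clipping with a gradient penalty on the critic. The sketch below shows one way that penalty is commonly computed; the function and the coefficient `lambda_gp = 10` follow the value used in the original paper [20], but the setup otherwise reuses the toy assumptions above (the critic is any network without a final sigmoid).

```python
# A sketch of the WGAN-GP gradient penalty [20]: the critic's gradient norm
# on samples interpolated between real and fake data is pushed toward 1,
# a soft enforcement of the 1-Lipschitz constraint.
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    # Random interpolation points between paired real and fake samples.
    eps = torch.rand(real.size(0), 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    # Gradient of the critic's scores w.r.t. the interpolated inputs.
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True)[0]
    # Penalize deviation of the per-sample gradient norm from 1.
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()
```

In a WGAN-GP critic update, this term is simply added to the Wasserstein objective, e.g. `loss = critic(fake).mean() - critic(real).mean() + gradient_penalty(critic, real, fake)`.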