Playing Atari with Deep Reinforcement Learning
FCN: Fully Convolutional Networks for Semantic Segmentation
U-Net: Convolutional Networks for Biomedical Image Segmentation
GAN: Generative Adversarial Nets
Attention Is All You Need
GPT-1: Improving Language Understanding by Generative Pre-Training
InstructGPT: Training language models to follow instructions with human feedback
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
ELMo: Deep contextualized word representations
ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Distilling the Knowledge in a Neural Network
DeiT: Training data-efficient image transformers & distillation through attention
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
DETR: End-to-End Object Detection with Transformers
CLIP: Learning Transferable Visual Models From Natural Language Supervision
VAE: Auto-Encoding Variational Bayes
VQ-VAE: Neural Discrete Representation Learning
VQ-VAE-2: Generating Diverse High-Fidelity Images with VQ-VAE-2
KAN: Kolmogorov–Arnold Networks
PixelRNN: Pixel Recurrent Neural Networks
Conditional Image Generation with PixelCNN Decoders
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Efficient Memory Management for Large Language Model Serving with PagedAttention