ELMo: Deep contextualized word representations

arXiv: 1802.05365

Abstract
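
ELMo builds deep contextualized word representations from a bidirectional language model (biLM) pre-trained on a large text corpus: each token's vector is a function of the entire input sentence, so it captures both characteristics of word use (syntax and semantics) and how those uses vary across contexts (polysemy). For a downstream task, the representation of token $k$ is a task-specific weighted sum of all $L+1$ biLM layers:

$$
\mathrm{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, \mathbf{h}_{k,j}^{LM}
$$

where $s^{task}$ are softmax-normalized layer weights and $\gamma^{task}$ scales the whole vector. Below is a minimal PyTorch sketch of this scalar-mix step; the module name, tensor shapes, and toy inputs are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Task-specific weighted sum of biLM layer activations (hypothetical module).

    Computes ELMo_k = gamma * sum_j softmax(s)_j * h_{k,j} over the frozen
    hidden states of a pre-trained bidirectional language model.
    """
    def __init__(self, num_layers: int):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))  # s^task (pre-softmax)
        self.gamma = nn.Parameter(torch.ones(()))                   # gamma^task

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        # layer_states: (num_layers, batch, seq_len, dim) — the h_{k,j}^{LM}
        s = torch.softmax(self.layer_weights, dim=0)                # normalized weights
        mixed = (s.view(-1, 1, 1, 1) * layer_states).sum(dim=0)     # (batch, seq_len, dim)
        return self.gamma * mixed

# Toy usage: a 2-layer biLM exposes L + 1 = 3 layers (token layer + 2 LSTM layers).
mix = ScalarMix(num_layers=3)
states = torch.randn(3, 8, 20, 1024)  # fake biLM activations
elmo_vectors = mix(states)            # (8, 20, 1024), fed to the task model
```

In the paper, these mixed vectors are concatenated with the task model's ordinary input embeddings, which improves the state of the art across six NLP benchmarks (e.g., SQuAD, SNLI, NER).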
