Paper Gallery
Explore influential machine learning papers with summaries, visualizations, and community discussions.
FlexOLMo: Open Language Models for Flexible Data Use
Weijia Shi, Luca Soldaini, Kyle Lo, Shane Arora, Oyvind Tafjord, Noah A. Smith, et al.
We present FlexOLMo, a new class of language models that supports distributed training without data sharing and data-flexible inference. Different model parameters can be independently trained on closed datasets, and these parameters along with their associated data can be flexibly included or excluded from model inferences with no further training.
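A minimal sketch of the flexible-inference idea, assuming a mixture-of-experts-style layer (not FlexOLMo's actual architecture): independently trained expert modules can be included in or excluded from the forward pass at inference time, with no retraining. All class and argument names below are illustrative.

```python
# Illustrative only: expert FFNs stand in for modules trained on separate
# (possibly closed) datasets; `active` selects which ones may be used.
import torch
import torch.nn as nn


class ToggleableExpertLayer(nn.Module):
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor, active: list[int]) -> torch.Tensor:
        # Route only over the experts whose data owners opted in.
        logits = self.router(x)[..., active]          # (batch, seq, |active|)
        weights = torch.softmax(logits, dim=-1)
        outputs = torch.stack([self.experts[i](x) for i in active], dim=-1)
        return x + (outputs * weights.unsqueeze(-2)).sum(dim=-1)


layer = ToggleableExpertLayer(d_model=64, n_experts=4)
x = torch.randn(2, 10, 64)
y_all = layer(x, active=[0, 1, 2, 3])   # all data sources included
y_some = layer(x, active=[0, 2])        # experts 1 and 3 excluded, no retraining
```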
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, et al.
We present Constitutional AI (CAI), a method for training harmless AI assistants without human feedback labels for harms. The resulting assistant is harmless but non-evasive: it engages with harmful queries by explaining its objections to them.
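A minimal sketch of the critique-and-revision idea behind CAI's supervised phase, assuming a hypothetical generate() call to the current assistant; the constitution strings below are paraphrased, not the paper's actual principles.

```python
# Illustrative only: the model critiques and revises its own response
# according to constitutional principles; revised responses then serve as
# supervised fine-tuning targets.
def generate(prompt: str) -> str:
    raise NotImplementedError("call your language model here")


CONSTITUTION = [
    "Identify ways the response is harmful, unethical, or dangerous.",
    "Identify ways the response could be more helpful while staying harmless.",
]


def critique_and_revise(user_query: str) -> str:
    response = generate(user_query)
    for principle in CONSTITUTION:
        critique = generate(
            f"Query: {user_query}\nResponse: {response}\n"
            f"Critique the response. {principle}"
        )
        response = generate(
            f"Query: {user_query}\nResponse: {response}\n"
            f"Critique: {critique}\nRewrite the response to address the critique."
        )
    return response  # used as a fine-tuning target in the supervised phase
```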
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, et al.
We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM.
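A minimal NumPy sketch of the tiling idea: attention is computed block by block with an online softmax, so the full seq_len x seq_len score matrix is never materialized. The real speedups come from keeping these tiles in on-chip SRAM inside a fused GPU kernel, which this pure-Python illustration does not attempt.

```python
import numpy as np


def tiled_attention(Q, K, V, block: int = 64):
    n, d = Q.shape
    out = np.zeros_like(V)
    scale = 1.0 / np.sqrt(d)
    for qs in range(0, n, block):
        q = Q[qs:qs + block] * scale                  # query tile
        m = np.full(q.shape[0], -np.inf)              # running row max
        l = np.zeros(q.shape[0])                      # running softmax denominator
        acc = np.zeros((q.shape[0], d))               # running numerator @ V
        for ks in range(0, n, block):
            s = q @ K[ks:ks + block].T                # scores for this key tile
            m_new = np.maximum(m, s.max(axis=1))
            p = np.exp(s - m_new[:, None])
            correction = np.exp(m - m_new)            # rescale previous blocks
            l = l * correction + p.sum(axis=1)
            acc = acc * correction[:, None] + p @ V[ks:ks + block]
            m = m_new
        out[qs:qs + block] = acc / l[:, None]
    return out


# Matches the reference softmax(QK^T / sqrt(d)) V up to floating-point error.
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 256, 32))
S = Q @ K.T / np.sqrt(32)
ref = np.exp(S - S.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), ref, atol=1e-6)
```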
Training Language Models to Follow Instructions with Human Feedback
Long Ouyang, Jeff Wu, Xu Jiang, et al.
We fine-tune language models using reinforcement learning from human feedback (RLHF). Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations.
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, et al.
We propose Latent Diffusion Models (LDMs), which apply the diffusion process in the latent space of powerful pretrained autoencoders. This allows us to achieve state-of-the-art synthesis results on image data.
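A minimal sketch of one latent-diffusion training step, assuming hypothetical encoder and denoiser modules: the image is compressed to a latent by a frozen pretrained autoencoder, noised at a random timestep, and the network is trained to predict that noise in latent space rather than pixel space.

```python
import torch


def ldm_training_step(images, encoder, denoiser, alphas_cumprod, optimizer):
    with torch.no_grad():
        z0 = encoder(images)                              # compress to latents
    t = torch.randint(0, len(alphas_cumprod), (z0.shape[0],), device=z0.device)
    noise = torch.randn_like(z0)
    a = alphas_cumprod[t].view(-1, *([1] * (z0.dim() - 1)))
    zt = a.sqrt() * z0 + (1 - a).sqrt() * noise           # forward diffusion in latent space
    loss = torch.nn.functional.mse_loss(denoiser(zt, t), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```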
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, et al.
We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million image-text pairs.
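A minimal sketch of the contrastive objective, assuming precomputed image and text embeddings: for a batch of N pairs, the matching pair on the diagonal is the positive and the other N-1 pairings are negatives, with a symmetric cross-entropy over both directions.

```python
import torch
import torch.nn.functional as F


def clip_loss(image_emb, text_emb, temperature: float = 0.07):
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature       # (N, N) similarity matrix
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) +            # image -> text
            F.cross_entropy(logits.t(), targets)) / 2     # text -> image


loss = clip_loss(torch.randn(32, 512), torch.randn(32, 512))
```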
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, et al.
We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance. We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
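A minimal sketch of few-shot prompting, with a hypothetical complete() call standing in for the model: the "training" examples are placed directly in the prompt and no gradient updates are performed.

```python
def complete(prompt: str) -> str:
    raise NotImplementedError("call your language model here")


def few_shot_prompt(examples, query):
    # Format the in-context examples followed by the new query.
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {query}\nA:"


examples = [("Translate 'chat' to English.", "cat"),
            ("Translate 'chien' to English.", "dog")]
prompt = few_shot_prompt(examples, "Translate 'oiseau' to English.")
# answer = complete(prompt)   # the model infers the task from the examples alone
```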
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. BERT is designed to pre-train deep bidirectional representations from unlabeled text.
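A minimal sketch of the masked language modeling setup: a random 15% of tokens are masked and the model is trained to predict the originals using both left and right context. Only the masking and loss-target construction are shown; the token ids follow the common BERT vocabulary and the encoder itself is omitted.

```python
import torch

MASK_ID, VOCAB = 103, 30522            # ids from the standard BERT uncased vocabulary


def mask_tokens(input_ids: torch.Tensor, mask_prob: float = 0.15):
    labels = input_ids.clone()
    masked = torch.rand(input_ids.shape) < mask_prob
    labels[~masked] = -100             # ignore unmasked positions in the loss
    corrupted = input_ids.clone()
    corrupted[masked] = MASK_ID        # (the paper also keeps or randomizes some)
    return corrupted, labels


ids = torch.randint(0, VOCAB, (2, 16))
corrupted, labels = mask_tokens(ids)
# logits = bert(corrupted)             # (2, 16, VOCAB) from a bidirectional encoder
# loss = torch.nn.functional.cross_entropy(logits.view(-1, VOCAB), labels.view(-1))
```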
Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, et al.
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms.
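A minimal sketch of the scaled dot-product attention at the Transformer's core, shown for a single head without masking or the multi-head projections: softmax(QK^T / sqrt(d_k)) V.

```python
import torch
import torch.nn.functional as F


def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # (..., seq_q, seq_k)
    weights = F.softmax(scores, dim=-1)             # attention distribution per query
    return weights @ V                              # weighted sum of values


Q = torch.randn(2, 8, 64)   # (batch, seq, d_k)
out = scaled_dot_product_attention(Q, torch.randn(2, 8, 64), torch.randn(2, 8, 64))
```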
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
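A minimal sketch of a basic residual block: the stacked layers learn a residual function F(x) and the block outputs F(x) + x through an identity shortcut; bottleneck and downsampling variants are omitted.

```python
import torch
import torch.nn as nn


class BasicResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = self.bn2(self.conv2(torch.relu(self.bn1(self.conv1(x)))))
        return torch.relu(residual + x)   # identity shortcut: F(x) + x


block = BasicResidualBlock(64)
y = block(torch.randn(1, 64, 32, 32))
```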
Generative Adversarial Networks
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, et al.
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
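A minimal sketch of the adversarial game on toy 1-D data, with illustrative network sizes: D is trained to separate real samples from generated ones, while G is trained to make D label its samples as real.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(64, 1) * 0.5 + 2.0        # toy "data distribution"
    fake = G(torch.randn(64, 8))                 # generator samples from noise
    # Discriminator step: push real toward label 1, generated toward label 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # Generator step: make the discriminator label generated samples as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```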