Paper Gallery

Explore influential machine learning papers with summaries, visualizations, and community discussions.

FlexOlmo: Open Language Models for Flexible Data Use

Weijia Shi, Luca Soldaini, Kyle Lo, Shane Arora, Oyvind Tafjord, Noah A. Smith, et al.

We present FlexOlmo, a new class of language models that supports distributed training without data sharing and data-flexible inference. Different model parameters can be independently trained on closed datasets, and these parameters, along with their associated data, can be flexibly included or excluded from model inferences with no further training.
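
As a rough sketch of that flexibility (not the paper's actual architecture), the toy mixture below averages independently trained expert modules, so any expert, and with it the influence of its training data, can be opted out at inference with no retraining. The uniform averaging, module shapes, and names are all illustrative stand-ins for the paper's learned routing.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hidden size (illustrative)

# One expert FFN per (possibly closed) dataset; random weights stand in for
# modules that would each be trained independently by their data's owner.
experts = {name: (rng.normal(size=(d, d)), rng.normal(size=(d, d)))
           for name in ["public", "news", "code"]}

def expert_ffn(x, W1, W2):
    return np.maximum(x @ W1, 0.0) @ W2

def flexible_forward(x, active):
    """Combine only the opted-in experts; excluding one needs no retraining."""
    return np.mean([expert_ffn(x, *experts[name]) for name in active], axis=0)

x = rng.normal(size=(1, d))
y_all = flexible_forward(x, ["public", "news", "code"])
y_opt_out = flexible_forward(x, ["public", "code"])  # "news" data excluded
print(y_all.shape, y_opt_out.shape)
```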

Language Models · Federated Learning · Privacy-Preserving AI
30 min
arXiv:2507.07024

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, et al.

We present Constitutional AI (CAI), a method for training harmless AI assistants without human feedback labels for harms. CAI trains a helpful assistant that can engage with harmful queries by explaining its objections to them.
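
A minimal sketch of the critique-and-revision loop that generates CAI's supervised training data, assuming only some instruction-following LM behind a stub `generate`; the principle text and prompt phrasing are illustrative, not taken from the paper.

```python
CONSTITUTION = [
    "Choose the response that is least harmful, and explain any refusal.",
]

def generate(prompt: str) -> str:
    """Stub standing in for any instruction-following LM; not a real API."""
    return f"<model output for: {prompt[:40]}...>"

def critique_and_revise(user_query: str, principle: str) -> str:
    draft = generate(user_query)
    critique = generate(
        f"Critique this response against the principle '{principle}':\n{draft}")
    revision = generate(
        f"Rewrite the response to address the critique.\n"
        f"Response: {draft}\nCritique: {critique}")
    return revision  # revisions become supervised fine-tuning targets

print(critique_and_revise("How do I pick a lock?", CONSTITUTION[0]))
```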

AI Safety · RLHF · Language Models
25 min
arXiv:2212.08073

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Tri Dao, Daniel Y. Fu, Stefano Ermon, et al.

We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM.
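
The core idea can be sketched in a few lines of NumPy: keeping a running row max and normalizer lets the softmax be accumulated one key/value tile at a time, so the full attention matrix is never materialized. This toy version tiles only the keys and values and ignores the SRAM scheduling that gives the real kernel its speed.

```python
import numpy as np

def tiled_attention(Q, K, V, block=64):
    """Exact attention, accumulated one key/value tile at a time."""
    n, d = Q.shape
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)          # running per-row max of the scores
    l = np.zeros(n)                  # running softmax normalizer
    for s in range(0, K.shape[0], block):
        Kb, Vb = K[s:s + block], V[s:s + block]
        S = (Q @ Kb.T) / np.sqrt(d)              # scores for this tile only
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)                # rescale earlier accumulators
        p = np.exp(S - m_new[:, None])
        l = l * alpha + p.sum(axis=1)
        out = out * alpha[:, None] + p @ Vb
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(128, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)                        # reference: full score matrix
P = np.exp(S - S.max(axis=1, keepdims=True))
assert np.allclose(tiled_attention(Q, K, V), (P / P.sum(axis=1, keepdims=True)) @ V)
```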

Optimization · Attention · GPU
25 min
arXiv:2205.14135

Training Language Models to Follow Instructions with Human Feedback

Long Ouyang, Jeff Wu, Xu Jiang, et al.

We fine-tune language models using reinforcement learning from human feedback (RLHF). Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations.
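
The demonstrations supervise an initial fine-tune; human preference comparisons then fit a reward model that PPO optimizes against. A minimal sketch of that reward model's pairwise ranking loss, on toy scalar rewards:

```python
import numpy as np

def reward_ranking_loss(r_chosen, r_rejected):
    """-log sigmoid(r_chosen - r_rejected), averaged over comparisons."""
    diff = np.asarray(r_chosen) - np.asarray(r_rejected)
    return float(np.mean(np.log1p(np.exp(-diff))))

# Toy scalar rewards for (preferred, dispreferred) completions of a prompt.
print(reward_ranking_loss([2.1, 0.3], [0.5, -0.2]))
```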

RLHF · Language Models · Alignment
30 min
arXiv:2203.02155

High-Resolution Image Synthesis with Latent Diffusion Models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, et al.

We propose Latent Diffusion Models (LDMs), which apply the diffusion process in the latent space of powerful pretrained autoencoders. This allows us to achieve state-of-the-art synthesis results on image data.
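
A toy sketch of the key move, with trivial lambdas standing in for the pretrained autoencoder: the forward noising (and, in training, the learned denoising) happens in the compact latent space rather than in pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Trivial stand-ins for a pretrained autoencoder (learned networks in an LDM).
encode = lambda x: 0.5 * x           # image -> latent
decode = lambda z: 2.0 * z           # latent -> image

def noisy_latent(z0, alpha_bar_t):
    """Forward diffusion q(z_t | z_0) applied in latent space."""
    eps = rng.normal(size=z0.shape)
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps

x = rng.normal(size=(8, 8))          # a tiny stand-in "image"
z_t = noisy_latent(encode(x), alpha_bar_t=0.3)
# Training fits a denoiser eps_theta(z_t, t); sampling denoises a random
# latent step by step and finishes with a single decode back to pixels.
print(z_t.shape)
```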

Diffusion Models · Image Generation · Generative AI
40 min
arXiv:2112.10752

Learning Transferable Visual Models From Natural Language Supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, et al.

We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million image-text pairs.
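
The paper presents its objective as NumPy-like pseudocode; the sketch below is a runnable version of that symmetric contrastive loss, where matching image-text pairs sit on the diagonal of the similarity matrix (the fixed temperature here is illustrative; CLIP learns it).

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings."""
    I = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    T = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = (I @ T.T) / temperature       # (batch, batch) cosine similarities
    loss_img = -np.diag(log_softmax(logits)).mean()    # image -> text
    loss_txt = -np.diag(log_softmax(logits.T)).mean()  # text -> image
    return (loss_img + loss_txt) / 2

rng = np.random.default_rng(0)
print(clip_loss(rng.normal(size=(4, 8)), rng.normal(size=(4, 8))))
```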

Vision-Language · Multimodal · Contrastive Learning
35 min
arXiv:2103.00020

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, Nick Ryder, et al.

We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance. We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
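
Few-shot here means conditioning only: the demonstrations are placed in the prompt and no weights are updated. A minimal sketch, using the English-to-French format from the paper's own examples:

```python
# "Training" signal is just demonstrations in the context window.
demos = [
    ("cheese", "fromage"),
    ("dog", "chien"),
]

def few_shot_prompt(demos, query):
    lines = ["Translate English to French."]
    lines += [f"{en} => {fr}" for en, fr in demos]
    lines.append(f"{query} =>")
    return "\n".join(lines)

print(few_shot_prompt(demos, "cat"))  # the model would complete: "chat"
```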

Language Models · Few-shot Learning · Scale
45 min
arXiv:2005.14165

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. BERT is designed to pre-train deep bidirectional representations from unlabeled text.
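
Bidirectionality comes from the masked-LM pretraining task: roughly 15% of positions become prediction targets, and of those 80% are replaced by [MASK], 10% by a random token, and 10% left unchanged. A sketch of that corruption step (the mask id and vocab size match the common BERT-base vocabulary but are incidental):

```python
import numpy as np

def mask_tokens(token_ids, mask_id, vocab_size, rng, p=0.15):
    """BERT-style masked-LM corruption with the paper's 80/10/10 split."""
    ids = np.array(token_ids)
    labels = np.full_like(ids, -100)        # -100 marks positions the loss skips
    targets = rng.random(len(ids)) < p      # ~15% become prediction targets
    labels[targets] = ids[targets]
    roll = rng.random(len(ids))
    ids[targets & (roll < 0.8)] = mask_id                      # 80% -> [MASK]
    rand = targets & (roll >= 0.8) & (roll < 0.9)              # 10% -> random
    ids[rand] = rng.integers(0, vocab_size, size=rand.sum())
    return ids, labels                      # remaining 10% stay unchanged

rng = np.random.default_rng(0)
ids, labels = mask_tokens(list(range(20)), mask_id=103, vocab_size=30522, rng=rng)
print(ids)
print(labels)
```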

NLP · Transformers · Pre-training
20 min
arXiv:1810.04805

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, et al.

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms.
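
Its building block is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; a minimal NumPy version (multi-head attention runs several of these in parallel over learned projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed with a stable softmax."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 8)
```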

Transformers · Attention · NLP
30 min
arXiv:1706.03762

Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
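
The reformulation fits in a few lines: a block outputs relu(F(x) + x), so its layers only have to learn the residual F while the identity shortcut passes signals and gradients through unchanged. A dense-layer sketch (the paper uses conv-BN stacks and projects the shortcut when shapes differ):

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = relu(F(x) + x): the weights learn only the residual F."""
    F = np.maximum(x @ W1, 0.0) @ W2   # stand-in for the paper's conv-BN stack
    return np.maximum(F + x, 0.0)      # identity shortcut (shapes match here)

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(1, d))
W1, W2 = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d))
print(residual_block(x, W1, W2).shape)  # (1, 8)
```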

Deep Learning · Computer Vision · Architecture
25 min
arXiv:1512.03385

Generative Adversarial Networks

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, et al.

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
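
The two sides of the game reduce to a pair of log losses on the discriminator's outputs; a sketch on toy values, using the non-saturating generator loss the paper recommends in practice:

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """D maximizes log D(x) + log(1 - D(G(z))) (so minimizes the negation);
    G here maximizes log D(G(z)), the non-saturating variant from the paper."""
    d_loss = -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))
    g_loss = -np.mean(np.log(d_fake))
    return d_loss, g_loss

# Toy discriminator probabilities on real and generated batches.
print(gan_losses(np.array([0.9, 0.8]), np.array([0.2, 0.4])))
```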

Generative Models · Deep Learning · Computer Vision
20 min
arXiv:1406.2661