
Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, et al.
45 min read · 24,567 citations
Language Models · Few-shot Learning · Scale
View on arXiv

Paper Summary

GPT-3 demonstrates that language models can be few-shot learners: it achieves strong performance on many NLP tasks without any gradient updates or fine-tuning, with tasks and demonstrations specified purely through text interaction with the model. The result suggests that scale alone can produce qualitative improvements in capability.
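To make "without any gradient updates" concrete, here is a minimal sketch of the few-shot setting: the K demonstrations are simply concatenated into a single text prompt, and the model's weights are never touched. The sentiment task, examples, and prompt format below are illustrative choices, not taken from the paper.

```python
# Minimal sketch of few-shot prompting (in-context learning): the model
# sees K labeled demonstrations plus a query as one text prompt, and no
# parameters are updated. Task and examples are illustrative.

def build_few_shot_prompt(demonstrations, query, task_description):
    """Concatenate a task description, K demonstrations, and the query."""
    lines = [task_description, ""]
    for text, label in demonstrations:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

demos = [
    ("The movie was a delight.", "positive"),
    ("I want those two hours back.", "negative"),
]
prompt = build_few_shot_prompt(
    demos,
    query="A tedious, joyless slog.",
    task_description="Classify the sentiment of each input.",
)
print(prompt)  # This string is all the model ever sees; weights stay frozen.
```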

Abstract

We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance. We train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting.
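The 175-billion figure follows almost directly from the model configuration reported in the paper (96 layers, d_model = 12288, the GPT-2 BPE vocabulary of ~50k tokens, a 2048-token context). A back-of-the-envelope count with the standard 12·L·d² approximation for decoder blocks, which ignores biases and layer norms, lands within a percent of the headline number:

```python
# Back-of-the-envelope parameter count for GPT-3 175B using the model
# configuration reported in the paper: 96 layers, d_model = 12288.
# The 12 * L * d^2 block approximation ignores biases and layer norms.

n_layers = 96
d_model = 12288
vocab_size = 50257   # GPT-2 BPE vocabulary, reused by GPT-3
n_ctx = 2048         # context window

# Each block: ~4*d^2 (attention Q, K, V, output) + ~8*d^2 (4x-wide MLP)
block_params = 12 * n_layers * d_model ** 2
embedding_params = vocab_size * d_model + n_ctx * d_model

total = block_params + embedding_params
print(f"~{total / 1e9:.1f}B parameters")  # ~174.6B, i.e. the quoted 175B
```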

Critical Analysis & Questions for Consideration

GPT-3 demonstrated that scale enables emergent capabilities, but the paper's claims about few-shot learning and the implications of massive scale deserve scrutiny.

Scale as Scientific Contribution

GPT-3 showed that scale alone can produce qualitative improvements in capability: few-shot learning, arithmetic, and rudimentary reasoning emerged without task-specific training. This validated the scaling hypothesis in a way that reshaped AI research priorities worldwide.

Few-Shot Learning Oversold

The paper arguably conflates in-context learning with few-shot learning in the established sense: conditioning on demonstrations at inference time, with frozen weights, is not the same as adapting parameters from a handful of examples, as in the meta-learning literature. Moreover, GPT-3 may have seen many "few-shot" test items during pretraining on internet data, so without careful contamination analysis the few-shot claims are hard to take at face value.
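The distinction can be made precise in code: adaptation-based few-shot learning updates parameters on the support examples, while GPT-3's setting only conditions on them. Below is a minimal PyTorch-flavored sketch of the contrast; `model`, `loss_fn`, `tokenize`, and `model.generate` are hypothetical placeholders, not the paper's API.

```python
# Contrast between two things "few-shot" can mean. All model objects
# here are hypothetical placeholders used only to show the difference.
import torch

def few_shot_by_adaptation(model, loss_fn, support_set, lr=1e-5):
    """Classic few-shot learning: take gradient steps on the K examples."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for x, y in support_set:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()                        # weights actually change
    return model

def few_shot_by_conditioning(model, tokenize, support_set, query):
    """GPT-3's setting: the K examples appear only in the prompt."""
    prompt = "".join(f"{x} -> {y}\n" for x, y in support_set) + f"{query} ->"
    with torch.no_grad():                 # no gradients, no weight updates
        return model.generate(tokenize(prompt))
```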

Environmental Impact Downplayed

Training GPT-3 is estimated to have emitted roughly 552 tonnes of CO2e (Patterson et al., 2021). The paper itself devotes only a brief aside to energy use, a serious gap given the climate stakes of ever-larger models.
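The ~552-tonne figure does not appear in the GPT-3 paper; it comes from a later estimate by Patterson et al. (2021), which multiplies estimated training energy by the carbon intensity of the datacenter's grid mix. The arithmetic is easy to reproduce; both inputs below are that paper's published numbers.

```python
# Reproducing the widely cited GPT-3 training-emissions estimate.
# Both inputs are from Patterson et al. (2021), not the GPT-3 paper.

energy_mwh = 1287          # estimated energy to train GPT-3
kg_co2e_per_kwh = 0.429    # carbon intensity of the datacenter's grid mix

tonnes_co2e = energy_mwh * 1000 * kg_co2e_per_kwh / 1000
print(f"~{tonnes_co2e:.0f} tCO2e")  # ~552 tonnes
```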

Benchmark Contamination

With roughly 300B training tokens drawn largely from the web, GPT-3 almost certainly encountered portions of many benchmark test sets. The paper does run an n-gram-based contamination analysis, but the authors acknowledge a filtering bug that left known overlaps in the training data, and the analysis stops short of resolving this fundamental validity threat.
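For reference, the paper's check is based on n-gram overlap between benchmark examples and training documents (13-grams). Here is a minimal sketch of that style of check; the corpus handling is simplified to a pre-built set of n-grams from a toy sentence, whereas a real check must stream all ~300B tokens.

```python
# Minimal sketch of an n-gram contamination check in the spirit of the
# paper's analysis, which flagged test examples sharing 13-grams with
# training documents. The toy "corpus" below is one repeated sentence.

def ngrams(text, n=13):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(test_example, training_ngrams, n=13):
    """Flag a test example if any of its n-grams occurs in training data."""
    return not ngrams(test_example, n).isdisjoint(training_ngrams)

train_ngrams = ngrams("the quick brown fox jumps over the lazy dog " * 3)
test = "the quick brown fox jumps over the lazy dog the quick brown fox jumps"
print(is_contaminated(test, train_ngrams))  # True: shares a 13-gram
```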

Reasoning vs Pattern Matching

The paper presents arithmetic and reasoning as emergent capabilities, but it remains unclear whether these reflect genuine reasoning or sophisticated pattern matching; the paper does little to probe the depth of understanding behind the outputs.
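One way to separate the two hypotheses is to test problems that are equally hard algorithmically but unequally likely to appear verbatim on the web, for example addition with increasingly long random operands. Below is a sketch of such a probe; `query_model` is a hypothetical stand-in for whatever inference interface is available.

```python
# Sketch of a memorization-vs-algorithm probe for arithmetic. A genuine
# adder is indifferent to operand length and frequency; a pattern matcher
# should collapse on rare, long operands. `query_model` is hypothetical.
import random

def make_probes(n_digits, k=100, seed=0):
    """Random addition problems with uniformly sampled n-digit operands."""
    rng = random.Random(seed)
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(k)]

def accuracy(query_model, probes):
    """Exact-match accuracy of the model's answers on the probes."""
    correct = sum(
        query_model(f"What is {a} + {b}?").strip() == str(a + b)
        for a, b in probes
    )
    return correct / len(probes)

# Usage (with a real query_model): accuracy per operand length.
# for d in range(2, 6):
#     print(d, accuracy(query_model, make_probes(d)))
```

Notably, the paper itself reports accuracy dropping sharply from two-digit to five-digit addition, which is at least consistent with the pattern-matching explanation.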

Access Inequality

GPT-3's size makes it inaccessible to most researchers, creating a two-tier system in AI research. The paper doesn't grapple with how this concentration of capability affects the field's democratic ideals.
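The resource barrier is easy to quantify: merely holding 175B parameters in 16-bit precision takes roughly 350 GB of accelerator memory, before activations, the KV cache, or any training state. A quick estimate, assuming fp16 weights and 80 GB cards as a reference point:

```python
# Rough memory footprint for merely hosting GPT-3-scale weights.
# Assumes fp16 (2 bytes per parameter) and 80 GB accelerators; real
# serving needs additional room for activations and the KV cache.

params = 175e9
bytes_per_param = 2                 # fp16
weights_gb = params * bytes_per_param / 1e9
gpus_needed = -(-weights_gb // 80)  # ceiling division

print(f"{weights_gb:.0f} GB of weights -> >= {gpus_needed:.0f} x 80 GB GPUs")
# ~350 GB -> at least 5 high-end accelerators just to load the model
```

Training at this scale is orders of magnitude more demanding still, which is the core of the two-tier concern.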
