Industry Ingest [Archive]

Historical record of major AI breakthroughs and milestones

2025 (4 developments)

Image Generation (Recent)

Google's 'Nano Banana' AI Image Editor Launches in Gemini 2.5 Flash

Google releases Gemini 2.5 Flash Image (codenamed 'Nano Banana'), an AI image editor that maintains character consistency across edits while supporting natural-language transformations and multi-image blending. It is available through the Gemini API at about $0.04 per image.

Foundation Models (Recent)

GPT-5: OpenAI's Most Advanced Model

OpenAI unveils GPT-5, which scores 98.2% on HumanEval and 99.1% on GSM8K. The model demonstrates strong capabilities in code generation and mathematical reasoning.

Foundation Models (Recent)

Claude Opus 4.1: Anthropic's Latest Flagship Model

Anthropic releases Claude Opus 4.1, which achieves 97.5% on HumanEval and 98.7% on GSM8K and features enhanced reasoning and improved long-context understanding.

World Models (Recent)

Genie 3: Google DeepMind's Interactive World Model

DeepMind unveils Genie 3, a foundation world model that generates real-time interactive environments at 24 fps from text prompts. It produces frames auto-regressively and maintains physical consistency without an explicit 3D representation. DeepMind describes world models of this kind as a stepping stone toward AGI.

2024 (10 developments)

Foundation Models

Gemini 2.0: Native Tool Use and Agentic Capabilities

Google releases Gemini 2.0 with native tool use capabilities, enabling AI agents to interact with external systems and perform complex multi-step tasks autonomously.

Foundation Models

Claude 3.5 Computer Use: AI Controls Computers

Anthropic introduces computer use capabilities, allowing Claude to operate a computer by reading screenshots and issuing mouse and keyboard actions, opening new possibilities for automation and accessibility.

Foundation Models

OpenAI o1: Reasoning Through Chain-of-Thought

OpenAI releases o1-preview, a model that uses extended chain-of-thought reasoning to solve complex problems in mathematics, coding, and science.

Foundation Models

Llama 3.1 405B: Meta's Open-Source Frontier Model

Meta releases Llama 3.1 with a 405B parameter model, which Meta described at release as the largest openly available foundation model, broadening access to frontier AI capabilities.

Foundation Models

Claude 3.5 Sonnet: Enhanced Coding and Reasoning

Anthropic releases Claude 3.5 Sonnet with significant improvements in code generation, mathematical reasoning, and visual understanding.

Foundation Models

GPT-4o: Omni-Modal Understanding

OpenAI introduces GPT-4o with native multimodal capabilities, processing text, vision, and audio in a unified architecture for more natural interactions.

Research

AI Index 2024: AI Beats Humans on Several Benchmarks

Stanford HAI's AI Index reveals AI now surpasses human performance on several benchmarks including image classification, visual reasoning, and English understanding, but still lags in complex tasks like math and planning.

Foundation Models

Claude 3: Opus, Sonnet, and Haiku Family

Anthropic releases the Claude 3 model family with vision capabilities, offering three tiers that trade off speed, cost, and capability, from the fast, low-cost Haiku to Opus for complex reasoning tasks.

Computer Vision

Sora: OpenAI's Text-to-Video Generation

OpenAI unveils Sora, a text-to-video model capable of generating minute-long videos with consistent characters, physics, and camera movements.

Foundation Models

Gemini 1.5: 1M Token Context Window

Google introduces Gemini 1.5 with a breakthrough 1 million token context window, enabling processing of entire books, codebases, and long videos.

2023 (7 developments)

Foundation Models

Gemini: Google's Multimodal Model Family

Google launches Gemini, a family of multimodal models designed from the ground up to understand text, images, audio, video, and code.

Foundation Models

GPT-4 Turbo: 128K Context and Improved Performance

OpenAI releases GPT-4 Turbo with 128K context window, improved instruction following, and significantly reduced costs.

Computer Vision

DALL-E 3: Advanced Text Understanding for Images

OpenAI launches DALL-E 3 with dramatically improved text understanding and adherence to prompts, integrated with ChatGPT for iterative refinement.

Foundation Models

Claude 2: 100K Context Window

Anthropic releases Claude 2 with 100K token context window, improved reasoning, and better performance on coding and mathematical tasks.

Foundation Models

PaLM 2: Google's Multilingual Reasoning Model

Google announces PaLM 2 with improved reasoning capabilities, better multilingual support, and efficient scaling across different model sizes.

Foundation Models

GPT-4: Multimodal Capabilities

OpenAI releases GPT-4 with vision capabilities, improved reasoning, and significantly better performance on academic and professional exams.

Foundation Models

Claude: Constitutional AI Assistant

Anthropic launches Claude, an AI assistant trained with Constitutional AI to be helpful, harmless, and honest.

2022 (4 developments)

Foundation Models

ChatGPT: Conversational AI Reaches Mainstream

OpenAI releases ChatGPT, which reaches an estimated 100 million monthly users within two months and sparks global interest in conversational AI.

Computer Vision

Stable Diffusion: Open-Source Image Generation

Stability AI releases Stable Diffusion, democratizing AI art generation with an open-source model that can run on consumer hardware.

Computer Vision

Imagen: Google's Photorealistic Text-to-Image

Google Research unveils Imagen, a text-to-image diffusion model with unprecedented photorealism and deep language understanding.

Foundation Models

PaLM: 540B Parameters Show Emergent Abilities

Google introduces PaLM with 540 billion parameters, demonstrating breakthrough performance and emergent capabilities in reasoning tasks.

2021 (3 developments)

Foundation Models

Codex: AI Code Generation Powers GitHub Copilot

OpenAI releases Codex, a descendant of GPT-3 fine-tuned on code, powering GitHub Copilot and revolutionizing programming assistance.

Computer Vision

DALL-E: Text-to-Image Generation

OpenAI introduces DALL-E, demonstrating the ability to generate creative and diverse images from text descriptions.

Computer Vision

CLIP: Connecting Text and Images

OpenAI releases CLIP, a neural network that learns visual concepts from natural language supervision, enabling zero-shot image classification.

2020 (3 developments)

Research

AlphaFold 2: Solving Protein Folding

DeepMind's AlphaFold 2 achieves a breakthrough in protein structure prediction, solving a 50-year grand challenge in biology.

Foundation Models

GPT-3: 175B Parameters Enable Few-Shot Learning

OpenAI releases GPT-3 with 175 billion parameters, demonstrating remarkable few-shot learning capabilities across diverse tasks.

Research

Scaling Laws for Neural Language Models

OpenAI researchers report predictable power-law relationships between model size, dataset size, training compute, and model loss, guiding future model development.
