Released May 13, 2024
proprietary license
128,000-token context
closed, multimodal

Overview

GPT-4o (“omni”), released by OpenAI in May 2024, is a flagship multimodal model that handles text, image, and audio processing in a single system. Unlike earlier GPT-4 variants, it supports real-time speech-to-speech interaction, enabling natural voice conversations alongside text and image reasoning. Its context window is ~128,000 tokens, its output is capped at a smaller limit (commonly ~16K tokens), and its knowledge cutoff is October 2023.
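
To make the token limits concrete, here is a minimal sketch of the budget arithmetic. It assumes the context window is shared between the prompt and the completion, and it takes 16,384 as the exact value behind "~16K"; both are assumptions, and the constant and function names are hypothetical rather than part of any OpenAI API.

```python
# Illustrative token-budget arithmetic; names and exact values are assumptions.
CONTEXT_WINDOW = 128_000  # total tokens, per the overview
MAX_OUTPUT = 16_384       # assumed exact value for the "~16K" output cap


def max_completion_tokens(prompt_tokens: int,
                          context_window: int = CONTEXT_WINDOW,
                          max_output: int = MAX_OUTPUT) -> int:
    """Return how many tokens the model could still generate for a prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Whatever the prompt does not use is left for the completion...
    remaining = context_window - prompt_tokens
    # ...but the completion can never exceed the output cap or drop below zero.
    return max(0, min(remaining, max_output))
```

For a short prompt the output cap is the binding limit; once a prompt approaches the full window, the remaining space becomes the limit instead.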

The model is optimized for efficiency and multilingual accessibility: it supports over 50 languages, covering ~97% of the world’s speakers, and offers a cost-effective balance of speed and capability. It powers ChatGPT across free and paid tiers, which makes it widely accessible for conversational AI, real-time translation, multimodal assistants, and global-scale communication tools.

Supported Tasks