MediaPipe is an open-source framework developed by Google for building real-time machine learning pipelines across mobile, web, desktop, and edge platforms. First released in 2019, the framework uses a graph-based architecture where pre-built components called Calculators process streaming data such as images, video, and audio through configurable computation graphs. This design allows developers to compose perception pipelines from reusable building blocks without writing custom glue code between models. The current MediaPipe Tasks API replaces the earlier Solutions API and provides a unified cross-platform interface for vision, text, and audio.
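The graph idea can be illustrated with a toy sketch in plain Python. This is not MediaPipe's API: `run_graph` and the three step functions are hypothetical stand-ins, and a real MediaPipe graph supports branching, timestamps, and side packets, which this linear chain omits.

```python
from typing import Callable, Iterable, Iterator, List

# In this sketch a "calculator" is just a function mapping one packet to another.
Calculator = Callable[[object], object]


def run_graph(calculators: List[Calculator], stream: Iterable[object]) -> Iterator[object]:
    """Push each packet from the input stream through the calculators in order."""
    for packet in stream:
        for calc in calculators:
            packet = calc(packet)
        yield packet


# Hypothetical stand-ins for real perception steps.
def to_grayscale(frame):
    # A frame is a list of (r, g, b) pixels; average the three channels.
    return [sum(px) // 3 for px in frame]


def threshold(gray):
    # Mark pixels at or above mid-intensity as foreground.
    return [1 if v >= 128 else 0 for v in gray]


def count_foreground(mask):
    return sum(mask)


frames = [
    [(255, 255, 255), (0, 0, 0), (200, 200, 200)],
    [(10, 10, 10), (130, 120, 140), (0, 0, 0)],
]
results = list(run_graph([to_grayscale, threshold, count_foreground], frames))
print(results)  # [2, 1]
```

Each step knows nothing about its neighbors, so steps can be reordered or swapped without touching the others, which is the property the Calculator design aims for.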
Rather than providing a single model, MediaPipe ships a suite of ready-to-use Tasks that wrap trained models for specific problems. These include Pose Landmarker for 33-point body landmark detection; Hand Landmarker for 21-point hand tracking; Face Landmarker, which extends the earlier 468-point Face Mesh with blendshape outputs for facial expressions; Selfie Segmentation for person-background separation; and Holistic Landmarker for combined body, hand, and face tracking. The Tasks prioritize low-latency on-device inference and support GPU acceleration where available, making the framework a common choice for mobile augmented reality, fitness and wellness applications, gesture-based interfaces, and accessibility features such as sign language recognition.
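As a rough illustration of how a Task is called from Python, the sketch below runs Pose Landmarker on a single image and converts the normalized landmarks to pixel coordinates. It assumes the `mediapipe` package is installed and that a Pose Landmarker model bundle has been downloaded separately. The file paths passed in are placeholders, and `to_pixels` and `detect_pose` are helper names introduced here for illustration, not part of the library.

```python
def to_pixels(landmarks, width, height):
    """Convert normalized landmarks (x, y in [0, 1]) to integer pixel coordinates."""
    return [(int(lm.x * width), int(lm.y * height)) for lm in landmarks]


def detect_pose(image_path, model_path):
    """Return pixel coordinates of the first detected pose, or [] if none is found."""
    # Imported lazily so the helper above can be used without mediapipe installed.
    import mediapipe as mp
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision

    options = vision.PoseLandmarkerOptions(
        base_options=python.BaseOptions(model_asset_path=model_path),
        running_mode=vision.RunningMode.IMAGE,  # single still image, not video
    )
    with vision.PoseLandmarker.create_from_options(options) as landmarker:
        image = mp.Image.create_from_file(image_path)
        result = landmarker.detect(image)

    if not result.pose_landmarks:
        return []
    # pose_landmarks holds one list of landmarks per detected person.
    return to_pixels(result.pose_landmarks[0], image.width, image.height)
```

A call might look like `detect_pose("person.jpg", "pose_landmarker.task")`, with both paths replaced by real files. For video or live camera input the Tasks API offers other running modes, which this sketch does not cover.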