
Google MediaPipe and WebRTC – How AI & ML can transform Live and Streaming Media


MediaPipe is a cross-platform pipeline framework for building custom Machine Learning solutions for live and streaming media. This framework is open-sourced by Google and is currently in the alpha stage.

In this blog, we’ll look at how WebRTC and Google MediaPipe are related and how they can be helpful when combined.

What is WebRTC (Real-Time Communication for the Web)?

With WebRTC, you can add Real-Time Communication capabilities to your application on top of an open standard. It supports sending video, voice, and generic data between peers, allowing developers to build powerful voice and video communication solutions.
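As a minimal sketch of these capabilities, the browser-side setup looks roughly like this (the signaling transport that exchanges the offer and answer is left out, since WebRTC does not prescribe one):

```typescript
// Capture camera and microphone, then attach the tracks to a peer connection.
const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });

const pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] });
stream.getTracks().forEach((track) => pc.addTrack(track, stream));

// Generic data between peers travels over a data channel.
const channel = pc.createDataChannel('chat');
channel.onopen = () => channel.send('hello');

// Create an offer; delivering it to the remote peer (signaling) is up to the application.
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
```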

What can WebRTC do?

There are many different use cases for WebRTC, from basic web apps that use the camera or microphone, to more advanced video-calling applications and screen sharing.
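The screen-sharing case, for instance, uses the same track model as the camera; only the capture call differs. A sketch with the standard getDisplayMedia API:

```typescript
// Ask the user to pick a screen, window, or browser tab to share.
const screen = await navigator.mediaDevices.getDisplayMedia({ video: true });

// Preview locally; the same tracks could be added to an RTCPeerConnection instead.
const preview = document.querySelector('video')!;
preview.srcObject = screen;

// Clean up when the user stops sharing from the browser UI.
screen.getVideoTracks()[0].addEventListener('ended', () => {
  preview.srcObject = null;
});
```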

Using WebRTC as a technology, we have developed our own SDK that lets us build super cool applications on all platforms, whether web, mobile, or desktop. We have also built many applications using third-party SDKs such as Agora across all platforms. Over the past few months, we have been gaining expertise with the Google MediaPipe library as well.

What is MediaPipe used for?

The MediaPipe framework is mainly used for rapid prototyping of perception pipelines built from AI inference models and other reusable components. It also makes it easy to deploy computer vision applications as demos and products on different hardware platforms.

AI Models Vs. Application

Image or video input data is usually loaded as separate streams and analyzed using neural networks built with frameworks such as TensorFlow, PyTorch, CNTK, or MXNet. Such models process data in a simple and deterministic way: one input generates one output, which allows processing to be performed very efficiently. MediaPipe, on the other hand, operates at a much higher semantic level and allows for more complex and dynamic behavior: one input can generate zero, one, or many outputs, something that cannot be modeled with a neural network alone. Video processing and AI perception also call for streaming processing rather than batch methods.

MediaPipe Concept

The MediaPipe framework consists of three main elements:

1. A framework for inference from sensory data (audio or video)

2. A set of performance evaluation tools

3. Reusable components for inference and processing (calculators)

The main components of MediaPipe (see the example graph after this list) are:

1. Packet: The basic unit of a data stream is called a “packet”. It consists of a numeric timestamp and a shared pointer to an immutable payload. 

2. Graph: Processing takes place inside a graph that defines the packet flow paths between nodes. A graph can have any number of inputs and outputs and branch or merge data. 

3. Nodes: Nodes are where most of the graph’s work happens. They are also called “calculators” (for historical reasons) and produce or consume packets. The interface of each node defines the number of input and output ports. 

4. Streams: A stream is a connection between two nodes that transmits a sequence of packets with increasing timestamps.
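To make these pieces concrete, a graph is declared as a configuration wiring streams to nodes. The sketch below is modeled on MediaPipe’s hello-world example: one PassThroughCalculator node that forwards every packet from its input stream to its output stream unchanged.

```
# A minimal CalculatorGraphConfig in protobuf text format.
input_stream: "in"    # packets enter the graph here
output_stream: "out"  # and leave it here

node {
  calculator: "PassThroughCalculator"  # a node ("calculator") that copies input to output
  input_stream: "in"
  output_stream: "out"
}
```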

Who can use MediaPipe?

MediaPipe was built for Machine Learning (ML) teams and software developers who implement production-ready ML applications, or for students and researchers who publish code and prototypes as part of their research work.

What are the Advantages of MediaPipe?

1. This unified framework provides solutions for Android, iOS, desktop, web, and IoT platforms.

2. It is open source, free, fully extensible, and customizable.

3. There are prebuilt ML solutions that demonstrate the full power of the MediaPipe library.

4. It provides APIs for many languages and platforms, including C++, JavaScript, Python, Android, and iOS.

Demo Application using Google MediaPipe Hands

MediaPipe Hands is a high-fidelity hand and finger tracking solution. It employs Machine Learning (ML) to infer 21 3D landmarks of a hand from just a single frame. Whereas current state-of-the-art approaches rely primarily on powerful desktop environments for inference, this method achieves real-time performance on a mobile phone, and even scales to multiple hands.
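MediaPipe publishes this solution for the web as well. A minimal sketch using the @mediapipe/hands JavaScript package (names and options as documented for the legacy Solutions API):

```typescript
import { Hands, Results } from '@mediapipe/hands';

// Load the solution; model assets are fetched from the CDN by file name.
const hands = new Hands({
  locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${file}`,
});
hands.setOptions({ maxNumHands: 2, minDetectionConfidence: 0.5 });

// Each result carries one list of 21 {x, y, z} landmarks per detected hand.
hands.onResults((results: Results) => {
  for (const landmarks of results.multiHandLandmarks ?? []) {
    console.log('wrist:', landmarks[0].x, landmarks[0].y, landmarks[0].z);
  }
});

// Feed frames from a <video> element, e.g. a WebRTC camera stream.
const video = document.querySelector('video')!;
await hands.send({ image: video });
```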


We built a demo application on Google MediaPipe Hands using its hand-pose tracking, where a user can answer questions with ‘Yes’ or ‘No’ by simply giving a thumbs up or thumbs down on camera. The application tracks the hand gesture you make in front of your device’s front camera and records the corresponding answer, with no input device needed. This demo application is one example of what can be built with Google MediaPipe.
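The gesture decision itself is application code, not part of the library. The sketch below is a hypothetical heuristic (our illustration, not the demo’s actual implementation) that classifies a thumb as up or down by comparing the y coordinates of two thumb landmarks; note that normalized image coordinates grow downward:

```typescript
import { NormalizedLandmarkList } from '@mediapipe/hands';

// Landmark indices from the MediaPipe Hands model: 4 = thumb tip, 2 = thumb MCP joint.
const THUMB_TIP = 4;
const THUMB_MCP = 2;

// Hypothetical classifier: 'yes' for thumbs up, 'no' for thumbs down, null if unclear.
function classifyAnswer(landmarks: NormalizedLandmarkList): 'yes' | 'no' | null {
  const tip = landmarks[THUMB_TIP];
  const mcp = landmarks[THUMB_MCP];
  const margin = 0.05; // ignore near-horizontal thumbs rather than guessing

  if (tip.y < mcp.y - margin) return 'yes'; // tip well above the joint: thumbs up
  if (tip.y > mcp.y + margin) return 'no';  // tip well below the joint: thumbs down
  return null;
}
```

Calling classifyAnswer on each multiHandLandmarks entry from the onResults callback above turns a stream of frames into a stream of answers.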

Final Words

Google MediaPipe provides a framework for cross-platform, customizable machine learning solutions for live and streaming media that enable live ML anywhere. This powerful tool is suitable for creating computer vision pipelines and complex applications.

If you are looking for an enterprise-grade solution to build customizable ML solutions, you can contact our team of experts at info@bigsteptech.com.

Dinesh Mangal

Technology Lead @BigStep Technologies. Specialized in WebRTC Technology and expert in solving modern technology problems.

