WebRTC

STUN/TURN

STUN and TURN are networking protocols used to solve a specific problem: How do two devices on the internet talk directly to each other when they are both hidden behind routers and firewalls?

Solving this problem underpins WebRTC (video calls like Zoom/Google Meet), VoIP, and online gaming.

The Problem: NAT (Network Address Translation)

Most devices do not have a "Public IP address." Instead, they have a "Private IP" (like 192.168.1.5) assigned by a router. The router has one Public IP that it shares for the whole house.

When Device A wants to call Device B:

  1. Device A doesn't know its own Public IP.
  2. Even if it did, Device B’s router or firewall will drop the incoming data because nothing inside Device B’s network "requested" it.

Getting past these two obstacles is called NAT Traversal.
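
To make the problem concrete, here is a minimal sketch of a toy NAT in Python. The `ToyNat` class, its method names, and the starting port 40000 are invented for illustration. Real routers also filter on the remote address and expire mappings, which this sketch ignores.

```python
class ToyNat:
    """A deliberately simplified home router doing NAT (illustrative only)."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def send(self, private_ip, private_port):
        """Outgoing packet: create (or reuse) a mapping and return the
        public address that the outside world sees."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.outbound[key])

    def receive(self, public_port):
        """Incoming packet: deliver it only if an earlier outgoing packet
        opened this port; otherwise drop it (return None)."""
        return self.inbound.get(public_port)


nat = ToyNat("203.0.113.5")
unsolicited = nat.receive(40000)         # nothing has opened this port yet
seen_as = nat.send("192.168.1.5", 5000)  # the device sends first
reply_to = nat.receive(seen_as[1])       # a reply arrives on the opened port
print(unsolicited, seen_as, reply_to)
```

The unsolicited packet is dropped, while the reply reaches the device because its own outgoing packet created the mapping first. The device never sees the public address the router chose; learning that address is what STUN is for.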

1. STUN (Session Traversal Utilities for NAT)

The "Mirror" Protocol.

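STUN is the first thing a device tries. Its job is to help a device discover its own Public IP address and port as seen from the internet. (Older versions of STUN also tried to classify the type of NAT; the current specification dropped that because it proved unreliable.)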

  • How it works: A device sends a request to a STUN server on the public internet. The STUN server looks at the incoming packet and says, "Hey, I received this from Public IP 203.0.113.5 on Port 5000." It sends that info back to the device.
  • The Result: Now the device knows its "Public Identity." It can tell the other peer: "Hey, send your video data to 203.0.113.5:5000."
  • Pros: Very fast and lightweight. The STUN server only handles the initial handshake; it doesn't touch the actual call/video data.
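  • Cons: It fails on a significant minority of networks (figures around 20-30% are often quoted, but the rate varies widely by network). If the router uses "Symmetric NAT" (common in large corporate offices), the STUN method won't work: the router assigns a different public port for every destination and only accepts replies from that destination, so the address the STUN server reported is useless to a random peer.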
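
The "mirror" exchange can be sketched at the byte level. The example below follows the message layout from the STUN specification (RFC 5389): a 20-byte header with a fixed magic cookie, and an XOR-MAPPED-ADDRESS attribute carrying the address the server saw. The function names are made up for this sketch, and no network traffic is sent: the "server response" is built locally to stand in for a real STUN server.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """20-byte STUN header: type, body length (0), magic cookie, 96-bit ID."""
    transaction_id = transaction_id or os.urandom(12)
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, transaction_id)


def build_binding_response(transaction_id, ip, port):
    """What a server would send back: the address it saw, XOR-obfuscated."""
    x_port = port ^ (MAGIC_COOKIE >> 16)
    x_addr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, x_port, x_addr)  # 0x01 = IPv4
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    header = struct.pack("!HHI12s", BINDING_SUCCESS, len(attr), MAGIC_COOKIE, transaction_id)
    return header + attr


def parse_mapped_address(message):
    """Walk the attributes and decode the first IPv4 XOR-MAPPED-ADDRESS."""
    msg_type, length, cookie, _tid = struct.unpack("!HHI12s", message[:20])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    offset = 20
    while offset + 4 <= 20 + length:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:
            x_port, x_addr = struct.unpack("!HI", value[2:8])
            ip = socket.inet_ntoa(struct.pack("!I", x_addr ^ MAGIC_COOKIE))
            return ip, x_port ^ (MAGIC_COOKIE >> 16)
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are 32-bit aligned
    raise ValueError("no XOR-MAPPED-ADDRESS attribute found")


tid = bytes(range(12))
request = build_binding_request(tid)
response = build_binding_response(tid, "203.0.113.5", 5000)
mapped = parse_mapped_address(response)
print(len(request), mapped)
```

In a real client, the request bytes would be sent over UDP to a STUN server and the reply passed to `parse_mapped_address`. The address is XOR-ed with the magic cookie so that NAT devices which rewrite IP addresses they spot inside packets do not corrupt it.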

2. TURN (Traversal Using Relays around NAT)

The "Middleman" Protocol.

If STUN fails (usually due to strict corporate firewalls), the devices switch to TURN.

  • How it works: Instead of trying to connect directly, both devices connect to a TURN Server. Device A sends its video to the TURN server, and the server "relays" it to Device B.
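  • The Result: The connection almost always succeeds because both devices are making outgoing connections to a public server, which firewalls almost always allow.
  • Pros: It works on nearly every network, even strict ones, especially when the TURN server is also reachable over TCP or TLS on port 443. It only fails if the network blocks the TURN server itself.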
  • Cons: Expensive and Higher Latency. Because the TURN server has to carry the actual video/audio data, it requires a lot of bandwidth. It also adds a "hop" in the middle, which can cause lag.
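
A rough calculation shows why relaying is costly. The sketch below estimates how much data a TURN server sends out for a single 1-on-1 call. The 1.5 Mbps bitrate is an assumed figure for illustration (it is not from any specification), and the function name is made up.

```python
STREAMS_IN_ONE_TO_ONE_CALL = 2  # each peer's stream is forwarded to the other


def relay_egress_gb(bitrate_mbps, minutes):
    """Data a TURN relay must send out for one relayed 1-on-1 call.

    Assumes both peers send a single stream at `bitrate_mbps` for the
    whole call and ignores protocol overhead.
    """
    megabits = bitrate_mbps * minutes * 60 * STREAMS_IN_ONE_TO_ONE_CALL
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes (decimal)


one_hour_call = relay_egress_gb(bitrate_mbps=1.5, minutes=60)
print(f"{one_hour_call:.2f} GB relayed out per call-hour")
```

Under these assumptions a single hour-long call costs the relay 1.35 GB of outgoing traffic, plus the same amount incoming. With STUN, the server's share of the media traffic is zero.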

3. ICE (The Manager)

You can't talk about STUN and TURN without mentioning ICE (Interactive Connectivity Establishment).

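ICE is the framework that manages the two. When you start a video call, ICE collects every address a device might be reachable at (its "candidates") and tests them, preferring the cheapest route that works:

  1. A local IP (if the devices are on the same Wi-Fi).
  2. A public address discovered via STUN, for a direct connection across the internet.
  3. A TURN relay, as a last resort.

ICE does not wait for one step to fail before starting the next: candidates are gathered up front and checked in priority order, so the fallback happens quickly.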
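
The preference order comes from a priority number attached to each candidate. The sketch below uses the formula and the recommended type preferences from the ICE specification (RFC 8445); the function name and the sample list are illustrative. In ICE terminology, "host" is a local address, "srflx" (server reflexive) is an address learned via STUN, and "relay" is a TURN address.

```python
# Recommended type preferences from RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_preference=65535, component_id=1):
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (
        (TYPE_PREFERENCE[cand_type] << 24)
        + (local_preference << 8)
        + (256 - component_id)
    )


candidates = ["relay", "host", "srflx"]
ranked = sorted(candidates, key=candidate_priority, reverse=True)
print(ranked)
print(candidate_priority("host"))
```

Sorting by this number puts the local route first, the STUN-discovered route second, and the TURN relay last, which is why a relay is only used when nothing better passes its connectivity check.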

WebRTC

WebRTC (Web Real-Time Communication) is an open-source project and a web standard (the browser API is specified by the W3C, the underlying protocols by the IETF) that allows web browsers and mobile applications to communicate with each other in real-time (audio, video, and data) without needing any plugins or external software.

If you join a call in your browser with Google Meet, Discord, or Zoom's web client, WebRTC is most likely doing at least part of the work.

1. The Core Philosophy: Peer-to-Peer (P2P)

Before WebRTC, if you wanted to send video from User A to User B, the video usually had to go through a central server. This added latency (lag) and was expensive for the service provider.

WebRTC aims to connect the two browsers directly to each other. Once the connection is established, the video/audio flows directly between users and no longer passes through the application's server (unless the call has to fall back to a TURN relay).

2. The Three Main APIs

WebRTC is composed of three main JavaScript building blocks:

  • MediaStream (getUserMedia): getUserMedia asks the user for permission to use the camera and microphone and, if granted, returns a MediaStream of audio/video tracks. The browser handles things like echo cancellation and automatic gain control (volume normalization).
  • RTCPeerConnection: This is the "brain." It handles the complex math and networking required to establish a stable connection between two peers, including encryption and bandwidth management.
  • RTCDataChannel: This allows peers to send arbitrary data (non-video) back and forth. It is used for ultra-low latency activities like file sharing or multiplayer gaming state updates.

3. How a WebRTC Connection is Made

A WebRTC connection happens in three phases: Signaling, Connecting, and Streaming.

Phase A: Signaling (The Handshake)

Paradoxically, WebRTC does not define how two devices find each other. You have to provide your own "Signaling Server" (usually using WebSockets).

  1. Peer A creates an SDP Offer (a block of text describing the media it wants to send and receive: codecs, transport parameters, encryption fingerprints, etc.).
  2. Peer A sends this offer to Peer B via the Signaling Server.
  3. Peer B sends back an SDP Answer. Now they know what kind of media they can both support.
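
The offer/answer idea can be sketched with a heavily trimmed SDP. The sample offer below is hand-written for illustration (real offers are much longer and carry ICE credentials and DTLS fingerprints), and the helper names `codecs_by_media` and `negotiate` are made up. The answerer's codec list is also an assumption.

```python
OFFER_SDP = """\
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96 98
a=rtpmap:96 VP8/90000
a=rtpmap:98 VP9/90000
"""


def codecs_by_media(sdp):
    """Map each media section ("audio", "video") to its codec names."""
    result, current = {}, None
    for line in sdp.splitlines():
        if line.startswith("m="):
            current = line[2:].split()[0]
            result[current] = []
        elif line.startswith("a=rtpmap:") and current is not None:
            # "a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]"
            encoding = line.split(" ", 1)[1]
            result[current].append(encoding.split("/")[0])
    return result


def negotiate(offered, supported):
    """Keep, per media kind, the offered codecs the answerer also supports."""
    return {
        kind: [codec for codec in codecs if codec in supported.get(kind, [])]
        for kind, codecs in offered.items()
    }


offered = codecs_by_media(OFFER_SDP)
answer = negotiate(offered, {"audio": ["opus"], "video": ["VP8", "H264"]})
print(offered)
print(answer)
```

Here Peer B supports H264 but was not offered it, and was offered VP9 but does not support it, so the two sides settle on opus for audio and VP8 for video. In a browser this negotiation is done for you by RTCPeerConnection; the sketch only shows what the exchanged text accomplishes.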

Phase B: Connecting (The NAT Traversal)

This is where STUN/TURN (which we discussed earlier) comes in.

  1. Both peers use STUN to find their public IP addresses.
  2. They exchange "ICE Candidates" (possible routes to find them).
  3. If they can't find a direct path, they fall back to a TURN relay server.
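
An ICE candidate travels through the signaling server as a single line of text. The sketch below parses three such lines and orders them the way ICE would prefer them. The addresses, ports, and foundations are made-up example values, and `parse_candidate` is a hypothetical helper that reads only the leading fields and ignores any extensions.

```python
def parse_candidate(line):
    """Parse 'candidate:<foundation> <component> <transport> <priority>
    <address> <port> typ <type> ...' into a dict."""
    fields = line.split()
    return {
        "foundation": fields[0].split(":", 1)[1],
        "component": int(fields[1]),
        "transport": fields[2].lower(),
        "priority": int(fields[3]),
        "address": fields[4],
        "port": int(fields[5]),
        "type": fields[7],
    }


received = [
    "candidate:3 1 udp 16777215 198.51.100.20 61000 typ relay raddr 203.0.113.5 rport 5000",
    "candidate:1 1 udp 2130706431 192.168.1.5 54321 typ host",
    "candidate:2 1 udp 1694498815 203.0.113.5 5000 typ srflx raddr 192.168.1.5 rport 54321",
]

by_priority = sorted(
    (parse_candidate(line) for line in received),
    key=lambda cand: cand["priority"],
    reverse=True,
)
order = [(cand["type"], cand["address"]) for cand in by_priority]
print(order)
```

The "host" candidate is the private address, "srflx" is the public address learned from STUN, and "relay" is an address on the TURN server. Each peer sends its own list like this to the other side, and the pairs are then tested from most to least preferred.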

Phase C: Streaming (The P2P Flow)

Once the connection is established, the media begins to flow.

  • Security: Encryption is mandatory for all WebRTC traffic. Media is protected with SRTP (Secure Real-time Transport Protocol), with keys negotiated over DTLS (Datagram Transport Layer Security); data channels run over DTLS directly.

4. Why is WebRTC a "Big Deal"?

  1. No Plugins: In the 2000s, you needed Adobe Flash or a Java Applet to do video in a browser. WebRTC made it a native part of the web.
  2. Ultra-Low Latency: Because it normally runs over UDP (instead of TCP), it doesn't have to wait for lost packets to be re-sent. This makes it much faster than segment-based streaming protocols like HLS or DASH (which typically have 5–30 seconds of lag).
  3. Adaptive Bitrate: It automatically lowers the video quality if your Wi-Fi gets weak to prevent the call from dropping.

5. The Limitations (The "Mesh" Problem)

WebRTC is perfect for 1-on-1 calls. However, it struggles with Large Group Calls.

  • Mesh Network: If you have 10 people in a call, your computer has to upload your video 9 times (once to each person) and receive 9 incoming streams. This can easily saturate your upload bandwidth and max out your laptop's CPU.
  • The Solution: For large groups, developers use a Media Server (called an SFU - Selective Forwarding Unit). You upload your video once to the server, and the server distributes it to everyone else.
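
The difference is easy to see by counting streams. This is a small sketch with made-up function names, assuming every participant sends exactly one video stream.

```python
def mesh_streams(participants):
    """Full mesh: every peer sends its stream to every other peer."""
    others = participants - 1
    return {
        "uploads_per_peer": others,
        "downloads_per_peer": others,
        "total_uploads": participants * others,
    }


def sfu_streams(participants):
    """SFU: every peer uploads once; the server fans the stream out."""
    return {
        "uploads_per_peer": 1,
        "downloads_per_peer": participants - 1,
        "total_uploads": participants,
    }


mesh = mesh_streams(10)
sfu = sfu_streams(10)
print(mesh)
print(sfu)
```

For a 10-person call, the mesh needs 9 uploads per participant (90 in total), while the SFU needs 1 per participant (10 in total). Each participant still downloads 9 streams either way, but home connections usually have much more download than upload capacity.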