Case Study · Communication Technology

WebRTC communication platform with 80% lower SDK costs and 50% less latency

How we built a real-time communication system with text, audio, and video using native WebRTC — eliminating third-party SDK dependency and cutting communication latency in half.

Industry: SaaS / Communication
Region: International
Timeline: 12 weeks
Stack: React, Node.js, Socket.io, WebRTC (native), coturn, PostgreSQL

The situation

A startup building a product that required embedded real-time communication had evaluated Twilio, Agora, and Daily.co for video and audio. All three charged per minute or per participant, which made the unit economics unworkable at the startup's projected scale: the communication cost would exceed revenue per seat at any realistic usage level. The startup also had a hard compliance requirement that conversation data not transit third-party infrastructure.

A junior developer had prototyped a WebRTC solution but run into NAT traversal failures — connections worked on the same network but broke for participants behind firewalls or on corporate networks. The prototype also had no persistence (closing the tab ended the session permanently) and no room management.

What we built

WebRTC peer-to-peer signalling layer

A Node.js signalling server using Socket.io manages the WebRTC offer/answer/ICE candidate exchange. The signalling server is stateless in terms of media — it only coordinates the handshake and then gets out of the way. Media flows peer-to-peer or through TURN when NAT traversal requires it, not through our servers.
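The relay role described above can be sketched without any transport at all. This is a minimal illustration, not the project's actual server: `SignalRelay`, `SignalMessage`, and the `deliver` callback are hypothetical names, and in a Socket.io deployment `deliver` would presumably wrap an emit to the recipient's socket. The point it shows is that the server forwards offers, answers, and ICE candidates as opaque payloads and never touches media.

```typescript
// Transport-agnostic sketch of a signalling relay (all names are assumptions).
type SignalKind = "offer" | "answer" | "ice-candidate";

interface SignalMessage {
  kind: SignalKind;
  from: string; // sender peer ID
  to: string; // recipient peer ID
  payload: unknown; // SDP or ICE candidate, opaque to the server
}

type Deliver = (msg: SignalMessage) => void;

class SignalRelay {
  // roomId -> (peerId -> delivery callback)
  private rooms = new Map<string, Map<string, Deliver>>();

  // Registers a peer and returns the peers already in the room,
  // so the newcomer knows whom to send offers to.
  join(roomId: string, peerId: string, deliver: Deliver): string[] {
    const room = this.rooms.get(roomId) ?? new Map<string, Deliver>();
    this.rooms.set(roomId, room);
    const existing = Array.from(room.keys()).filter((id) => id !== peerId);
    room.set(peerId, deliver);
    return existing;
  }

  leave(roomId: string, peerId: string): void {
    const room = this.rooms.get(roomId);
    if (!room) return;
    room.delete(peerId);
    if (room.size === 0) this.rooms.delete(roomId);
  }

  // Forwards the message untouched; returns false if it cannot be routed.
  relay(roomId: string, msg: SignalMessage): boolean {
    const room = this.rooms.get(roomId);
    if (!room || !room.has(msg.from)) return false;
    const deliver = room.get(msg.to);
    if (!deliver) return false;
    deliver(msg);
    return true;
  }
}
```

Because the relay only holds room membership and callbacks, it carries no media state; once both peers have exchanged descriptions and candidates, nothing further passes through it.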

coturn TURN/STUN infrastructure

A self-hosted coturn server handles NAT traversal for participants behind restrictive firewalls. STUN covers the common case: peers discover their public addresses and connect directly. TURN relays the media in the hard cases where no direct path can be established. Deploying our own TURN server rather than using a managed service eliminated the per-relay-minute cost and kept media traffic within our infrastructure boundary.
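On the client side, pointing a peer connection at a self-hosted STUN/TURN server comes down to the ICE configuration. The sketch below is illustrative only: `buildIceConfig`, the hostname, and the credentials are placeholders, and the ports are the conventional defaults (3478 for STUN/TURN, 5349 for TURN over TLS), which a real deployment may change. The returned object follows the shape of the browser's `RTCConfiguration` (`iceServers`, `iceTransportPolicy`).

```typescript
// Shape mirrors the browser's RTCConfiguration; type names here are local assumptions.
interface IceServerEntry {
  urls: string[];
  username?: string;
  credential?: string;
}

interface IceConfig {
  iceServers: IceServerEntry[];
  iceTransportPolicy: "all" | "relay";
}

// Builds an ICE configuration for a single self-hosted STUN/TURN host.
// `forceRelay` restricts candidates to TURN relays, which is handy for
// verifying that the relay path works at all.
function buildIceConfig(
  host: string,
  username: string,
  credential: string,
  forceRelay: boolean = false
): IceConfig {
  return {
    iceServers: [
      // STUN: lets a peer learn its public address for a direct connection.
      { urls: [`stun:${host}:3478`] },
      // TURN: relay fallback over UDP, TCP, and TLS for restrictive networks.
      {
        urls: [
          `turn:${host}:3478?transport=udp`,
          `turn:${host}:3478?transport=tcp`,
          `turns:${host}:5349?transport=tcp`,
        ],
        username,
        credential,
      },
    ],
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}
```

In a browser the result would be passed straight to the constructor, for example `new RTCPeerConnection(buildIceConfig("turn.example.com", user, pass))`. Offering TURN over TCP and TLS alongside UDP is a common way to reach participants on corporate networks that block UDP.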

Room and session management

Rooms are persistent and protected by access codes. Participants wait in a waiting-room queue until the host admits them, and sessions are recorded (audio and video) to S3 for replay on the platform. Participants who accidentally disconnect can rejoin a session in progress. Room state persists even if the host momentarily drops: the session doesn't end until it is explicitly closed.
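The room rules above (access code, waiting room, rejoin, and a session that outlives a host drop) can be expressed as a small state machine. This is a hedged, in-memory sketch: `RoomSession` and its method names are invented for illustration, and a real system would persist this state (the stack lists PostgreSQL) rather than hold it in a `Map`.

```typescript
type ParticipantStatus = "waiting" | "admitted" | "disconnected";

// In-memory sketch of room state; all names are assumptions.
class RoomSession {
  private accessCode: string;
  private hostId: string;
  private closed = false;
  // "disconnected" is only ever set for participants who were admitted.
  private participants = new Map<string, ParticipantStatus>();

  constructor(accessCode: string, hostId: string) {
    this.accessCode = accessCode;
    this.hostId = hostId;
    this.participants.set(hostId, "admitted");
  }

  // Returns the resulting status, or null if the join is rejected.
  join(userId: string, code: string): ParticipantStatus | null {
    if (this.closed || code !== this.accessCode) return null;
    const prev = this.participants.get(userId);
    // Previously admitted participants (including the host) skip the queue.
    const next: ParticipantStatus =
      prev === "admitted" || prev === "disconnected" ? "admitted" : "waiting";
    this.participants.set(userId, next);
    return next;
  }

  // Only the host can move someone out of the waiting room.
  admit(byUserId: string, userId: string): boolean {
    if (this.closed || byUserId !== this.hostId) return false;
    if (this.participants.get(userId) !== "waiting") return false;
    this.participants.set(userId, "admitted");
    return true;
  }

  // A dropped connection never closes the room, even for the host.
  disconnect(userId: string): void {
    const status = this.participants.get(userId);
    if (status === "admitted") this.participants.set(userId, "disconnected");
    else if (status === "waiting") this.participants.delete(userId);
  }

  // The session ends only when the host closes it explicitly.
  close(byUserId: string): boolean {
    if (byUserId !== this.hostId) return false;
    this.closed = true;
    return true;
  }

  isOpen(): boolean {
    return !this.closed;
  }
}
```

Keeping "disconnected" distinct from "removed" is what allows a rejoin to bypass the waiting room, while a participant who was never admitted starts over in the queue.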

Text chat and screen sharing

In-session text chat runs over the WebRTC data channel rather than a separate Socket.io connection, which lowers latency and adds no server load. Screen sharing uses the browser's native `getDisplayMedia` API and surfaces as an additional video track alongside the camera feed.
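Chat over a data channel needs little more than a message envelope and a guard on the channel state. The following is a sketch under assumptions: the `ChatMessage` fields and the `"chat"` type tag are invented for illustration, and `ChannelLike` is a structural subset of the browser's `RTCDataChannel` (its `readyState` and `send`) so the logic can be exercised outside a browser.

```typescript
interface ChatMessage {
  id: string;
  sender: string;
  text: string;
  sentAt: number; // milliseconds since epoch
}

// The two RTCDataChannel members this sketch relies on.
interface ChannelLike {
  readyState: string;
  send(data: string): void;
}

// Tags the payload so chat can share a channel with other message types.
function encodeChat(msg: ChatMessage): string {
  return JSON.stringify({ type: "chat", ...msg });
}

// Returns null for anything that is not a well-formed chat message;
// data from a remote peer should never be trusted blindly.
function decodeChat(raw: string): ChatMessage | null {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== "object" || data === null) return null;
    const r = data as Record<string, unknown>;
    const id = r["id"];
    const sender = r["sender"];
    const text = r["text"];
    const sentAt = r["sentAt"];
    if (
      r["type"] !== "chat" ||
      typeof id !== "string" ||
      typeof sender !== "string" ||
      typeof text !== "string" ||
      typeof sentAt !== "number"
    ) {
      return null;
    }
    return { id, sender, text, sentAt };
  } catch {
    return null;
  }
}

// Sends only when the channel is open; returns whether the message went out.
function sendChat(channel: ChannelLike, msg: ChatMessage): boolean {
  if (channel.readyState !== "open") return false;
  channel.send(encodeChat(msg));
  return true;
}
```

In a browser, `channel` would come from `RTCPeerConnection.createDataChannel`, and incoming `message` events would be passed through `decodeChat`. Screen sharing follows a similar pattern on the media side: the video track from the stream returned by `navigator.mediaDevices.getDisplayMedia` is added to the peer connection with `addTrack`, which typically triggers a renegotiation through the signalling layer.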

React embedded component

The entire communication interface is packaged as a React component that accepts a room ID and user token. The host product integrates it with four lines of code. Theming is via CSS custom properties so it matches the host product's design.

Results

  • ~50% reduction in communication latency vs. the third-party SDK prototype
  • ~35% improvement in connection stability (NAT traversal success rate)
  • ~80% reduction in SDK/infrastructure costs vs. per-minute third-party pricing
  • NAT traversal failure rate (the original prototype's core problem) reduced to under 2%
  • Data sovereignty requirement met — no conversation media transits third-party infrastructure

Own your communication infrastructure. Stop paying per minute.
