DataChannel Protocol

Most apps don't need to use the protocol directly — the state machine handles it. Use this when you need to read events the state doesn't expose (tool calls, audio metrics) or when you're sending custom client commands.

Accessing raw events#

Subscribe to the "event" listener — every server message is forwarded as-is:

session.addEventListener("event", (e) => {
  const raw = e.detail; // the parsed JSON from the server
  console.log(raw.event, raw);
});

Server → Client events#

Speech detection (STT)#

Event	Fields	Description
`speech.started`	—	User started physically speaking (VAD detected voice)
`speech.ended`	—	User stopped speaking (VAD silence)
`user.speaking`	`text`	STT partial/interim result — text may change
`user.message`	`text`	STT final result — text is locked, turn is over

Turn detection#

Event	Fields	Description
`turn.pause`	—	Brief silence detected — user might still be talking
`turn.end`	—	Silence confirmed — user's turn is over, LLM starts
`turn.resumed`	—	User started speaking again during the pause

Bot speech (TTS)#

Event	Fields	Description
`bot.speaking`	`message_id`, `text`	TTS generation started. `text` has the full intended response. The widget intentionally starts empty and builds word-by-word.
`bot.word`	`message_id`, `word`, `word_index`	A single word was spoken by TTS. Arrives in real-time as audio plays.
`bot.finished`	`message_id`, `text`	TTS completed normally. `text` is the polished final response.
`bot.interrupted`	`message_id`	User barged in — TTS was cut short.

Audio metrics#

When enabled via session config (analysis.send_audio_metrics):

Event	Fields	Description
`audio.metrics`	`source`, `energy_db`, `rms`, `peak`, `is_speech`, `vad_prob`	Server-side audio analysis. `source` is `"user"` or `"bot"`. Sent every ~100ms.

Use it to build live waveform meters, energy bars, or VAD visualizations.

LLM / tool events#

These events are not processed by the state machine but are forwarded through the "event" listener. They come from the Pinecall server's LLM handler:

Event	Fields	Description
`llm.thinking`	—	LLM started generating a response
`llm.tool_call`	`tool_calls[]`, `msg_id`, `call_id`	LLM requested tool/function calls. Each item has `id`, `name`, `arguments` (JSON string).
`llm.tool_result`	`call_id`, `msg_id`, `results[]`	Tool execution results sent back to LLM. Each item has `tool_call_id`, `result`.
`llm.response`	`text`, `finish_reason`	LLM finished generating (text may be empty for tool-only turns)
`llm.error`	`error`	LLM error occurred

Session limits#

Event	Fields	Description
`session.idle_warning`	`remaining_seconds`	User hasn't spoken — call will timeout in `remaining_seconds`. Drives the `idleWarning` state field.
`session.timeout`	`reason`	Session timed out (`"idle_timeout"` or `"max_duration"`). The client auto-disconnects.

Client → Server commands#

The client sends these through the DataChannel:

Message	Format	Description
Ping	`"ping"` (string)	Keepalive, sent every 1s by the SDK
Mute	`{ "action": "mute" }`	Stop processing user audio server-side
Unmute	`{ "action": "unmute" }`	Resume processing user audio
Configure	`{ "action": "configure", ...config }`	Hot-swap voice, STT, language, or turn detection mid-call
Inject Text	`{ "action": "inject_text", "text": "..." }`	Send text as if the user spoke it (for tool UI interactions)
Set Context	`{ "action": "set_context", "key": "...", "value": "..." }`	Inject/update keyed context in the LLM prompt

Most of these have helper methods on VoiceSession (toggleMute, configure). The lower-level commands (inject_text, set_context) are used by @pinecall/voice-widget to power the Tools API and dynamic context injection.

Worked examples#

Monitoring tool calls#

session.addEventListener("event", (e) => {
  const { event, tool_calls, results } = e.detail;

  if (event === "llm.tool_call" && tool_calls) {
    for (const tc of tool_calls) {
      console.log(`Agent calling ${tc.name}(${tc.arguments})`);
    }
  }
  if (event === "llm.tool_result") {
    console.log("Tool results:", results);
  }
});

Custom audio meter from `audio.metrics`#

const meter = document.getElementById("meter");

session.addEventListener("event", (e) => {
  if (e.detail.event === "audio.metrics" && e.detail.source === "user") {
    meter.style.width = `${e.detail.rms * 100}%`;
  }
});

Injecting text from a button click#

If you have UI components that the user can click to "say" something:

// User clicks "Yes, that's right" instead of saying it
document.getElementById("yes-btn").onclick = () => {
  session.send(JSON.stringify({ action: "inject_text", text: "Yes, that's right" }));
};

The @pinecall/voice-widget exposes this as the sendText() helper — see Tools API.

WebRTC connection flow#

For completeness, here's what happens when you call connect():

Browser                              Voice Server
   │                                       │
   ├─ GET /webrtc/token?agent_id=mara ────►│
   │◄─ { token, expiresIn } ───────────────┤
   │                                       │
   ├─ GET /webrtc/ice-servers ────────────►│
   │◄─ [{ urls: "stun:...", ... }] ────────┤
   │                                       │
   ├─ getUserMedia({ audio: true }) ───────│  (browser-local)
   ├─ new RTCPeerConnection(iceServers) ──│
   ├─ pc.addTrack(micTrack) ──────────────│
   ├─ pc.createDataChannel("events") ─────│
   ├─ pc.createOffer() ───────────────────│
   ├─ pc.setLocalDescription(offer) ──────│
   │                                       │
   ├─ POST /webrtc/offer { sdp, token } ──►│
   │◄─ { sdp: answer } ────────────────────┤
   │                                       │
   ├─ pc.setRemoteDescription(answer) ────│
   │  (ICE candidates exchanged)           │
   │                                       │
   │◄═══════ media + datachannel ═════════►│

What's next#

VoiceSession — the high-level API
State and phases — how raw events map to state
Tools API — building UI that responds to llm.tool_call