Pinecall

DataChannel Protocol

The raw WebRTC DataChannel protocol — every server event and every client command.

Most apps don't need to use the protocol directly — the state machine handles it. Use this when you need to read events the state doesn't expose (tool calls, audio metrics) or when you're sending custom client commands.

Accessing raw events

Subscribe to the "event" listener — every server message is forwarded as-is:

session.addEventListener("event", (e) => {
  const raw = e.detail; // the parsed JSON from the server
  console.log(raw.event, raw);
});
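When a page reacts to many event types, a lookup table keyed on the event name keeps the listener flat. The sketch below assumes only what the snippet above shows: each message is a parsed object whose event field names its type. createEventRouter is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper, not part of the Pinecall SDK: routes parsed server
// messages to per-event handlers, with an optional catch-all.
function createEventRouter(handlers, fallback = () => {}) {
  return (raw) => {
    // Ignore anything without a string `event` field.
    if (!raw || typeof raw.event !== "string") return false;
    // Own-property check so names like "toString" never hit Object.prototype.
    if (Object.prototype.hasOwnProperty.call(handlers, raw.event)) {
      handlers[raw.event](raw);
      return true;
    }
    fallback(raw);
    return false;
  };
}

// Demo with hand-written messages shaped like the tables below.
const seen = [];
const route = createEventRouter(
  {
    "user.message": (m) => seen.push(`user: ${m.text}`),
    "bot.finished": (m) => seen.push(`bot: ${m.text}`),
  },
  (m) => seen.push(`unhandled: ${m.event}`),
);

route({ event: "user.message", text: "Hi there" });
route({ event: "bot.finished", message_id: "m1", text: "Hello!" });
route({ event: "turn.pause" });
console.log(seen);
```

In a page, the router plugs straight into the listener: session.addEventListener("event", (e) => route(e.detail)). The true/false return value makes it easy to log event types nothing handles yet.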

Server → Client events

Speech detection (STT)

| Event | Fields | Description |
|---|---|---|
| speech.started | | User started speaking (VAD detected voice) |
| speech.ended | | User stopped speaking (VAD detected silence) |
| user.speaking | text | STT partial (interim) result; the text may still change |
| user.message | text | STT final result; the text is locked and the turn is over |
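Because user.speaking text may still change while user.message is final, a live transcript typically overwrites one interim line on every partial result and commits it to history on the final one. A minimal sketch of that pattern; reduceTranscript and its state shape are hypothetical, and the messages are hand-written stand-ins for real server events.

```javascript
// Hypothetical transcript reducer: interim STT text (user.speaking) replaces
// the live line; final text (user.message) is appended to the history.
function reduceTranscript(state, msg) {
  switch (msg.event) {
    case "user.speaking":
      // Partial result: may still change, so overwrite rather than append.
      return { ...state, interim: msg.text };
    case "user.message":
      // Final result: lock it into history and clear the interim line.
      return { history: [...state.history, msg.text], interim: "" };
    default:
      // speech.started / speech.ended and everything else leave text alone.
      return state;
  }
}

const initial = { history: [], interim: "" };
const events = [
  { event: "speech.started" },
  { event: "user.speaking", text: "book a" },
  { event: "user.speaking", text: "book a table" },
  { event: "speech.ended" },
  { event: "user.message", text: "Book a table for two." },
];
const finalState = events.reduce(reduceTranscript, initial);
console.log(finalState);
```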

Turn detection

| Event | Fields | Description |
|---|---|---|
| turn.pause | | Brief silence detected; the user might still be talking |
| turn.end | | Silence confirmed; the user's turn is over and the LLM starts |
| turn.resumed | | User started speaking again during the pause |
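A turn.pause can be followed by either turn.resumed or turn.end, so a "still listening" indicator maps naturally onto a tiny state machine. This is a sketch, not the SDK's state machine: nextTurnPhase and the phase names are hypothetical, and treating speech.started as the start of the next turn is an assumption.

```javascript
// Hypothetical turn tracker built only from the events in the tables above.
// "listening" -> "paused" on a brief silence, back to "listening" if the user
// resumes, "ended" once the turn is confirmed over and the LLM starts.
function nextTurnPhase(phase, eventName) {
  switch (eventName) {
    case "turn.pause":
      return "paused";
    case "turn.resumed":
      return "listening";
    case "turn.end":
      return "ended";
    case "speech.started":
      // Assumption: new user speech after "ended" begins the next turn.
      return "listening";
    default:
      return phase;
  }
}

let phase = "listening";
const trace = [];
for (const name of ["turn.pause", "turn.resumed", "turn.pause", "turn.end"]) {
  phase = nextTurnPhase(phase, name);
  trace.push(phase);
}
console.log(trace.join(" -> ")); // paused -> listening -> paused -> ended
```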

Bot speech (TTS)

| Event | Fields | Description |
|---|---|---|
| bot.speaking | message_id, text | TTS generation started. text has the full intended response; the widget intentionally starts empty and builds the caption word by word. |
| bot.word | message_id, word, word_index | TTS spoke a single word. These arrive in real time as the audio plays. |
| bot.finished | message_id, text | TTS completed normally. text is the polished final response. |
| bot.interrupted | message_id | The user barged in and TTS was cut short. |
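bot.speaking already carries the full text, but the widget only reveals words as bot.word events arrive, so a custom caption has to track each message_id separately and handle interruptions. A minimal sketch; createCaptions is a hypothetical helper, and it assumes word_index is the word's position within its message.

```javascript
// Hypothetical caption store for the bot.* events above. Like the widget, a
// caption starts empty on bot.speaking and grows as bot.word events arrive.
function createCaptions() {
  const byId = new Map(); // message_id -> { words, finalText, status }

  return {
    handle(msg) {
      const entry = byId.get(msg.message_id);
      switch (msg.event) {
        case "bot.speaking":
          // msg.text has the full intended response; it is deliberately not shown yet.
          byId.set(msg.message_id, { words: [], finalText: null, status: "speaking" });
          break;
        case "bot.word":
          // Place by word_index so out-of-order or repeated words can't shift others.
          if (entry) entry.words[msg.word_index] = msg.word;
          break;
        case "bot.finished":
          // Show the polished final text instead of the word-by-word build-up.
          byId.set(msg.message_id, { words: entry?.words ?? [], finalText: msg.text, status: "finished" });
          break;
        case "bot.interrupted":
          // Keep only the words that were actually spoken before the barge-in.
          if (entry) entry.status = "interrupted";
          break;
      }
    },
    text(id) {
      const entry = byId.get(id);
      if (!entry) return "";
      // filter() skips holes left by words that haven't arrived yet.
      return entry.finalText ?? entry.words.filter((w) => w !== undefined).join(" ");
    },
    status(id) {
      return byId.get(id)?.status;
    },
  };
}

const captions = createCaptions();
captions.handle({ event: "bot.speaking", message_id: "m1", text: "Sure, I can book that." });
captions.handle({ event: "bot.word", message_id: "m1", word: "Sure,", word_index: 0 });
captions.handle({ event: "bot.word", message_id: "m1", word: "I", word_index: 1 });
captions.handle({ event: "bot.interrupted", message_id: "m1" });
console.log(captions.status("m1"), JSON.stringify(captions.text("m1")));
```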

Audio metrics

When enabled via session config (analysis.send_audio_metrics):

| Event | Fields | Description |
|---|---|---|
| audio.metrics | source, energy_db, rms, peak, is_speech, vad_prob | Server-side audio analysis, sent about every 100 ms. source is "user" or "bot". |

Use it to build live waveform meters, energy bars, or VAD visualizations.
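For a level meter, energy_db is often easier to work with than raw rms because loudness is perceived roughly logarithmically. A sketch assuming energy_db is measured in dBFS (0 dB is full scale, quieter audio is more negative); the -60 dB floor and the meterLevel helper are hypothetical choices, not documented server values, and the sample message is hand-written.

```javascript
// Hypothetical helper: turns one audio.metrics message into a 0..1 meter level.
// Assumes energy_db is dBFS (0 = full scale); FLOOR_DB is an assumed floor to tune.
const FLOOR_DB = -60;

function meterLevel(m) {
  // Linear in dB: FLOOR_DB maps to 0, 0 dB maps to 1, then clamp to [0, 1].
  const raw = (m.energy_db - FLOOR_DB) / -FLOOR_DB;
  return {
    source: m.source, // "user" or "bot"
    level: Math.min(1, Math.max(0, raw)),
    speaking: Boolean(m.is_speech),
  };
}

const sample = { event: "audio.metrics", source: "user", energy_db: -30, rms: 0.03, peak: 0.1, is_speech: true, vad_prob: 0.92 };
console.log(meterLevel(sample)); // { source: 'user', level: 0.5, speaking: true }
```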

LLM / tool events

These events are not processed by the state machine but are forwarded through the "event" listener. They come from the Pinecall server's LLM handler:

| Event | Fields | Description |
|---|---|---|
| llm.thinking | | LLM started generating a response |
| llm.tool_call | tool_calls[], msg_id, call_id | LLM requested tool/function calls. Each item has id, name, and arguments (a JSON string). |
| llm.tool_result | call_id, msg_id, results[] | Tool execution results sent back to the LLM. Each item has tool_call_id and result. |
| llm.response | text, finish_reason | LLM finished generating (text may be empty for tool-only turns) |
| llm.error | error | An LLM error occurred |
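Each llm.tool_call item carries an id and each llm.tool_result item points back to it through tool_call_id, so the two events can be joined into one record per call. A minimal sketch assuming exactly the field names in the table; createToolLog is hypothetical, and the messages (including the get_weather tool) are made up.

```javascript
// Hypothetical helper: joins llm.tool_call items (by `id`) with
// llm.tool_result items (by `tool_call_id`) into one entry per call.
function createToolLog() {
  const calls = new Map(); // tool call id -> { name, args, result }

  // `arguments` arrives as a JSON string; keep the raw text if it doesn't parse.
  const parseArgs = (json) => {
    try {
      return JSON.parse(json);
    } catch {
      return json;
    }
  };

  return {
    handle(msg) {
      if (msg.event === "llm.tool_call") {
        for (const tc of msg.tool_calls ?? []) {
          calls.set(tc.id, { name: tc.name, args: parseArgs(tc.arguments), result: undefined });
        }
      } else if (msg.event === "llm.tool_result") {
        for (const r of msg.results ?? []) {
          const entry = calls.get(r.tool_call_id);
          if (entry) entry.result = r.result;
        }
      }
    },
    entries() {
      return [...calls].map(([id, e]) => ({ id, ...e }));
    },
  };
}

// Hand-written messages; "get_weather" is a made-up tool name.
const toolLog = createToolLog();
toolLog.handle({
  event: "llm.tool_call", msg_id: "m7", call_id: "c1",
  tool_calls: [{ id: "t1", name: "get_weather", arguments: '{"city":"Lisbon"}' }],
});
toolLog.handle({
  event: "llm.tool_result", msg_id: "m7", call_id: "c1",
  results: [{ tool_call_id: "t1", result: "sunny, 18C" }],
});
console.log(toolLog.entries());
```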

Session limits

| Event | Fields | Description |
|---|---|---|
| session.idle_warning | remaining_seconds | The user hasn't spoken; the call will time out in remaining_seconds seconds. Drives the idleWarning state field. |
| session.timeout | reason | The session timed out ("idle_timeout" or "max_duration"). The client disconnects automatically. |
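The state machine already turns session.idle_warning into idleWarning. If you read the raw events instead, for example to show your own banner, a small formatter covers both limits. A sketch; sessionNotice and its wording are hypothetical.

```javascript
// Hypothetical formatter for the session-limit events; returns null otherwise.
const TIMEOUT_TEXT = new Map([
  ["idle_timeout", "The call ended because no one spoke for a while."],
  ["max_duration", "The call reached its maximum length."],
]);

function sessionNotice(msg) {
  if (msg.event === "session.idle_warning") {
    const s = Math.max(0, Math.round(msg.remaining_seconds));
    return `Still there? This call will end in ${s} second${s === 1 ? "" : "s"}.`;
  }
  if (msg.event === "session.timeout") {
    // The client disconnects on its own; this only explains why.
    return TIMEOUT_TEXT.get(msg.reason) ?? "The call has ended.";
  }
  return null;
}

console.log(sessionNotice({ event: "session.idle_warning", remaining_seconds: 10 }));
// Still there? This call will end in 10 seconds.
```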

Client → Server commands

The client sends these through the DataChannel:

| Message | Format | Description |
|---|---|---|
| Ping | "ping" (plain string) | Keepalive, sent every second by the SDK |
| Mute | { "action": "mute" } | Stop processing user audio server-side |
| Unmute | { "action": "unmute" } | Resume processing user audio |
| Configure | { "action": "configure", ...config } | Hot-swap voice, STT, language, or turn detection mid-call |
| Inject Text | { "action": "inject_text", "text": "..." } | Send text as if the user had spoken it (for tool UI interactions) |
| Set Context | { "action": "set_context", "key": "...", "value": "..." } | Inject or update keyed context in the LLM prompt |

Most of these have helper methods on VoiceSession (toggleMute, configure). The lower-level commands (inject_text, set_context) are used by @pinecall/voice-widget to power the Tools API and dynamic context injection.
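The formats differ in one detail worth encoding once: the keepalive is a bare string, while every other command is a JSON object with an action field. A minimal sketch of builders for the formats in the table; the commands object is hypothetical, and which configure keys the server accepts is not covered here.

```javascript
// Hypothetical builders for the client commands in the table above.
// The keepalive is a bare string; every other command is a JSON object
// whose `action` field names the command.
const commands = {
  ping: () => "ping",
  mute: () => JSON.stringify({ action: "mute" }),
  unmute: () => JSON.stringify({ action: "unmute" }),
  // Config keys are passed through as given; accepted keys are not specified here.
  configure: (config) => JSON.stringify({ action: "configure", ...config }),
  injectText: (text) => JSON.stringify({ action: "inject_text", text }),
  setContext: (key, value) => JSON.stringify({ action: "set_context", key, value }),
};

console.log(commands.injectText("Yes, that's right"));
// {"action":"inject_text","text":"Yes, that's right"}
```

Each builder returns a string ready for the same session.send() call used in the inject_text worked example.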

Worked examples

Monitoring tool calls

session.addEventListener("event", (e) => {
  const { event, tool_calls, results } = e.detail;

  if (event === "llm.tool_call" && tool_calls) {
    for (const tc of tool_calls) {
      console.log(`Agent calling ${tc.name}(${tc.arguments})`);
    }
  }
  if (event === "llm.tool_result") {
    console.log("Tool results:", results);
  }
});

Custom audio meter from audio.metrics

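const meter = document.getElementById("meter");

session.addEventListener("event", (e) => {
  const m = e.detail;
  // Only the user's microphone; bot audio arrives with source "bot".
  if (m.event === "audio.metrics" && m.source === "user") {
    // Assumes rms is normalized to 0..1; clamp so the bar never overflows.
    meter.style.width = `${Math.min(1, m.rms) * 100}%`;
  }
});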
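Because audio.metrics arrives about every 100 ms, a bar driven directly by each value can look jumpy. One common remedy, shown here as a sketch rather than anything the SDK does, is an attack/release smoother; createSmoother and its coefficients are hypothetical and worth tuning by eye.

```javascript
// Hypothetical attack/release smoother for meter values that arrive about
// every 100 ms: it rises quickly toward louder input and decays slowly after.
function createSmoother(attack = 0.6, release = 0.15) {
  let value = 0;
  return (target) => {
    // Move a fraction of the way toward the new target on each update.
    const k = target > value ? attack : release;
    value += (target - value) * k;
    return value;
  };
}

const smooth = createSmoother();
const levels = [0, 0.8, 0.8, 0, 0].map((x) => smooth(x));
console.log(levels.map((v) => v.toFixed(3)));
```

In the listener above, pass each rms value through smooth() before setting meter.style.width. Each smoother keeps its own state, so create one per meter (for example, one for "user" and one for "bot").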

Injecting text from a button click

If you have UI components that the user can click to "say" something:

// User clicks "Yes, that's right" instead of saying it
document.getElementById("yes-btn").onclick = () => {
  session.send(JSON.stringify({ action: "inject_text", text: "Yes, that's right" }));
};

The @pinecall/voice-widget exposes this as the sendText() helper — see Tools API.

WebRTC connection flow

For completeness, here's what happens when you call connect():

Browser                              Voice Server
   │                                       │
   ├─ GET /webrtc/token?agent_id=mara ────►│
   │◄─ { token, expiresIn } ───────────────┤
   │                                       │
   ├─ GET /webrtc/ice-servers ────────────►│
   │◄─ [{ urls: "stun:...", ... }] ────────┤
   │                                       │
   ├─ getUserMedia({ audio: true }) ───────│  (browser-local)
   ├─ new RTCPeerConnection(iceServers) ──│
   ├─ pc.addTrack(micTrack) ──────────────│
   ├─ pc.createDataChannel("events") ─────│
   ├─ pc.createOffer() ───────────────────│
   ├─ pc.setLocalDescription(offer) ──────│
   │                                       │
   ├─ POST /webrtc/offer { sdp, token } ──►│
   │◄─ { sdp: answer } ────────────────────┤
   │                                       │
   ├─ pc.setRemoteDescription(answer) ────│
   │  (ICE candidates exchanged)           │
   │                                       │
   │◄═══════ media + datachannel ═════════►│
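The same sequence in code, as a sketch only: it is not the SDK's connect() implementation. The endpoint paths and JSON shapes come from the diagram; connectSketch, its baseUrl, the injected fetchJson, getUserMedia, and PeerConnection parameters, and sending the offer's SDP as a plain string are assumptions. Injecting the browser APIs keeps the flow testable outside a browser.

```javascript
// Hypothetical sketch of the handshake in the diagram above (not the SDK's code).
// Browser APIs are injected so the sequence can run against test doubles.
async function connectSketch({ baseUrl, agentId, fetchJson, getUserMedia, PeerConnection }) {
  // 1. Short-lived token for this agent: { token, expiresIn }.
  const { token } = await fetchJson(
    `${baseUrl}/webrtc/token?agent_id=${encodeURIComponent(agentId)}`,
  );

  // 2. STUN/TURN servers, already shaped like RTCIceServer entries.
  const iceServers = await fetchJson(`${baseUrl}/webrtc/ice-servers`);

  // 3. Local microphone, peer connection, and the "events" DataChannel.
  const stream = await getUserMedia({ audio: true });
  const pc = new PeerConnection({ iceServers });
  for (const track of stream.getAudioTracks()) pc.addTrack(track, stream);
  const channel = pc.createDataChannel("events");

  // 4. SDP offer/answer exchange over HTTP.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const { sdp } = await fetchJson(`${baseUrl}/webrtc/offer`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ sdp: offer.sdp, token }),
  });
  await pc.setRemoteDescription({ type: "answer", sdp });

  // ICE negotiation and media start are then handled by the peer connection.
  return { pc, channel };
}
```

In a page you would pass fetch-based JSON loading, navigator.mediaDevices.getUserMedia, and RTCPeerConnection. If the server expects all ICE candidates inside the offer rather than trickled afterwards, wait for ICE gathering to complete and send pc.localDescription.sdp instead.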

What's next