Pinecall

State and Phases

The reactive state model: status, phases, transcript messages, and lifecycles.

State shape

interface VoiceSessionState {
  status: "idle" | "connecting" | "connected" | "error";
  error: string | null;
  isMuted: boolean;
  phase: "idle" | "listening" | "speaking" | "pause" | "thinking";
  userSpeaking: boolean;
  agentSpeaking: boolean;
  duration: number;             // seconds since connected
  messages: TranscriptMessage[];
  idleWarning: number | null;   // seconds until idle timeout (null = no warning)
}

| Field | Meaning |
| --- | --- |
| status | The connection lifecycle: idle → connecting → connected → (idle on disconnect, or error). |
| error | Populated when status === "error". Always check this when handling errors. |
| isMuted | Mic state. Mirrors setMuted() / toggleMute(). |
| phase | What the conversation is doing right now (see below). |
| userSpeaking | true between speech.started and speech.ended events. Use for live waveform UIs. |
| agentSpeaking | true while TTS is playing. |
| duration | Seconds since status became connected. Updates every second. |
| messages | Full transcript: user and bot turns. See Transcript messages below. |
| idleWarning | When the server emits session.idle_warning, this holds the seconds remaining until timeout. null when no warning is active. |
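
As an illustration of how these fields combine, the sketch below derives a one-line status label from a state snapshot. `statusLabel` and `formatDuration` are hypothetical helper names, not part of the library, and the interface is a trimmed copy of the state shape so the snippet stands alone.

```typescript
// Trimmed copy of the state shape above, limited to the fields used here.
interface StatusFields {
  status: "idle" | "connecting" | "connected" | "error";
  error: string | null;
  isMuted: boolean;
  duration: number;           // seconds since connected
  idleWarning: number | null; // seconds until idle timeout
}

// Hypothetical helper: format whole seconds as m:ss.
function formatDuration(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const minutes = Math.floor(s / 60);
  const seconds = s % 60;
  return `${minutes}:${seconds < 10 ? "0" : ""}${seconds}`;
}

// Hypothetical helper: one line of status text for a header or tooltip.
function statusLabel(state: StatusFields): string {
  switch (state.status) {
    case "idle":
      return "Not connected";
    case "connecting":
      return "Connecting...";
    case "error":
      // error is expected to be populated here, but guard against null anyway.
      return `Error: ${state.error !== null ? state.error : "unknown"}`;
    case "connected": {
      const parts = [`In call ${formatDuration(state.duration)}`];
      if (state.isMuted) parts.push("muted");
      if (state.idleWarning !== null) {
        parts.push(`idle timeout in ${state.idleWarning}s`);
      }
      return parts.join(", ");
    }
  }
}
```

Because the function only reads a snapshot, it can be called from any subscriber without touching the session itself.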

Call phases

phase tells you what the conversation is doing right now. It's the field you'll bind to UI state most often (orb color, animation, status label).

| Phase | Meaning | Triggered by |
| --- | --- | --- |
| idle | Not in a call | Initial state, after disconnect |
| listening | Mic is hot, waiting for speech | Connection established; after bot finishes; after turn.resumed |
| speaking | Agent is speaking (TTS playing) | First bot.word event |
| thinking | Processing user input, waiting for LLM | user.message (STT final), turn.end |
| pause | Turn detection pause; user may still be talking | turn.pause (brief silence detected) |

Typical flow during one exchange:

listening ──► (user speaks) ──► thinking ──► speaking ──► (bot finishes) ──► listening
                                   │ (turn.pause / turn.resumed cycles)
                                 pause
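
The "Triggered by" column can be read as a small state machine. The following is a minimal sketch of that reading, not the library's implementation: `nextPhase` and `tracePhases` are hypothetical names, and `"connected"` / `"disconnected"` are placeholder names for the connection events, which the table does not name.

```typescript
type Phase = "idle" | "listening" | "speaking" | "pause" | "thinking";

// Event names from the phase table, plus two placeholders (assumptions)
// for "connection established" and "disconnect".
type PhaseEvent =
  | "connected"
  | "disconnected"
  | "turn.pause"
  | "turn.resumed"
  | "turn.end"
  | "user.message"
  | "bot.word"
  | "bot.finished";

// Map an incoming event to the phase it triggers.
function nextPhase(current: Phase, event: PhaseEvent): Phase {
  switch (event) {
    case "connected":
    case "turn.resumed":
    case "bot.finished":
      return "listening";
    case "turn.pause":
      return "pause";
    case "user.message":
    case "turn.end":
      return "thinking";
    case "bot.word":
      return "speaking";
    case "disconnected":
      return "idle";
    default:
      return current; // unknown events leave the phase unchanged
  }
}

// Fold a sequence of events into the list of phases visited, starting at idle.
function tracePhases(events: PhaseEvent[]): Phase[] {
  const trace: Phase[] = ["idle"];
  let phase: Phase = "idle";
  for (const event of events) {
    phase = nextPhase(phase, event);
    trace.push(phase);
  }
  return trace;
}
```

Feeding it `["connected", "turn.pause", "turn.resumed", "user.message", "bot.word", "bot.finished"]` walks through the same loop as the flow diagram, including one pause cycle.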

Transcript messages

The messages array contains the full conversation history. Each message is structured:

interface TranscriptMessage {
  id: number;
  role: "user" | "bot";
  text: string;
  isInterim?: boolean;     // user only: STT is still processing
  speaking?: boolean;      // bot only: TTS is playing this message
  interrupted?: boolean;   // bot only: user barged in
  messageId?: string;      // bot only: server-assigned ID
}

Messages mutate in place as STT refines, words stream in, and the bot finishes speaking; they are not replaced. If you bind to messages reactively, the existing entry updates rather than a new one being appended.

User message lifecycle

user.speaking → { role: "user", text: "Hola",     isInterim: true }
                                  text updates as STT refines...
user.speaking → { role: "user", text: "Hola que", isInterim: true }
user.message  → { role: "user", text: "Hola, ¿qué tal?", isInterim: false }

If you're rendering a transcript, render isInterim: true messages with reduced opacity or a "typing" indicator so the user sees that the STT is still processing.
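
To make the lifecycle above concrete, here is a minimal sketch of how a client could fold `user.speaking` and `user.message` into a messages array, updating the open interim entry in place. `applyUserEvent` is a hypothetical function, the `(type, text)` signature is an assumed event shape, and the sequential `id` assignment is an assumption for illustration.

```typescript
// Trimmed copy of TranscriptMessage, limited to the user-side fields.
interface TranscriptMessage {
  id: number;
  role: "user" | "bot";
  text: string;
  isInterim?: boolean;
}

type UserEventType = "user.speaking" | "user.message";

// Update the open interim user message in place, or start a new one.
function applyUserEvent(
  messages: TranscriptMessage[],
  type: UserEventType,
  text: string
): void {
  const isFinal = type === "user.message";
  const last = messages.length > 0 ? messages[messages.length - 1] : undefined;

  if (last !== undefined && last.role === "user" && last.isInterim === true) {
    // Same object, new text: anything holding a reference sees the update.
    last.text = text;
    last.isInterim = !isFinal;
    return;
  }

  // No interim entry is open, so this event starts a new message.
  // Sequential ids are an assumption made for this sketch.
  messages.push({ id: messages.length + 1, role: "user", text, isInterim: !isFinal });
}
```

Replaying the three events from the lifecycle above leaves a single entry whose text is the final transcript and whose `isInterim` is `false`.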

Bot message lifecycle (word-by-word)

bot.speaking  → { role: "bot", text: "",                 speaking: true, messageId: "abc" }
bot.word      → text: "Hello"
bot.word      → text: "Hello there"
bot.word      → text: "Hello there how"
bot.word      → text: "Hello there how are"
bot.word      → text: "Hello there how are you"
bot.finished  → { speaking: false, text: "Hello there, how are you?" }

bot.speaking arrives with the full intended text, but the widget intentionally starts with text: "" and builds word-by-word so the on-screen captions stay in sync with the audio.

bot.finished may include a polished final text (with proper punctuation that the per-word stream doesn't have).
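
The word-by-word build-up can be sketched as a small reducer. The event payload shapes below are assumptions (the source shows only the resulting message text, not the wire format): `bot.word` is assumed to carry a single `word`, and `bot.finished` an optional polished `text`. `applyBotEvent` is a hypothetical name.

```typescript
// Trimmed copy of TranscriptMessage, limited to the bot-side fields.
interface TranscriptMessage {
  id: number;
  role: "user" | "bot";
  text: string;
  speaking?: boolean;
  messageId?: string;
}

// Assumed payload shapes, for illustration only.
type BotEvent =
  | { type: "bot.speaking"; messageId: string }
  | { type: "bot.word"; messageId: string; word: string }
  | { type: "bot.finished"; messageId: string; text?: string };

function applyBotEvent(messages: TranscriptMessage[], event: BotEvent): void {
  if (event.type === "bot.speaking") {
    // Start empty so captions grow in step with the audio.
    messages.push({
      id: messages.length + 1, // sequential ids are an assumption
      role: "bot",
      text: "",
      speaking: true,
      messageId: event.messageId,
    });
    return;
  }

  // Locate the message this event belongs to.
  let target: TranscriptMessage | undefined;
  for (const m of messages) {
    if (m.messageId === event.messageId) target = m;
  }
  if (target === undefined) return;

  if (event.type === "bot.word") {
    target.text = target.text === "" ? event.word : `${target.text} ${event.word}`;
  } else {
    // bot.finished: stop the speaking flag and prefer the polished text if present.
    target.speaking = false;
    if (event.text !== undefined) target.text = event.text;
  }
}
```

Keeping the lookup keyed on `messageId` means late events for an earlier message still land on the right entry.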

Interrupted bot

When the user barges in mid-utterance:

bot.word        → text: "Hello there how"
bot.interrupted → { speaking: false, interrupted: true }

Render interrupted messages with a visual marker (e.g. icon, ellipsis, gray border) so users see the bot was cut off rather than just suddenly stopping.
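
One way to turn the message flags into rendering hints is a pure function that maps a message to CSS class names. This is only a sketch: `messageClasses`, `displayText`, and the class names (`msg-interim`, `msg-interrupted`, and so on) are hypothetical and not taken from any Pinecall package.

```typescript
interface TranscriptMessage {
  id: number;
  role: "user" | "bot";
  text: string;
  isInterim?: boolean;
  speaking?: boolean;
  interrupted?: boolean;
  messageId?: string;
}

// Map message flags to (hypothetical) CSS class names.
function messageClasses(msg: TranscriptMessage): string[] {
  const classes = ["msg", msg.role === "user" ? "msg-user" : "msg-bot"];
  if (msg.isInterim) classes.push("msg-interim");         // e.g. reduced opacity
  if (msg.speaking) classes.push("msg-speaking");         // e.g. active caption
  if (msg.interrupted) classes.push("msg-interrupted");   // e.g. gray border
  return classes;
}

// Append an ellipsis to cut-off bot messages so the break is visible in text too.
function displayText(msg: TranscriptMessage): string {
  return msg.interrupted ? `${msg.text}...` : msg.text;
}
```

For an interrupted bot message with text `"Hello there how"`, this yields the classes `msg msg-bot msg-interrupted` and the display text `"Hello there how..."`.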

Subscribing to changes

The state object is stable by identity: getState() returns the same reference until something changes. This is what makes it safe for React's useSyncExternalStore:

session.subscribe(() => {
  const next = session.getState(); // new reference only if state changed
  // ...
});
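
To show why identity stability matters, here is a minimal, generic store with the same contract: `getState()` returns the same reference until a real change occurs, and subscribers are notified only then. This is an illustrative sketch, not the session's implementation; `createStore` and `update` are hypothetical names, and returning an unsubscribe function from `subscribe` is an assumption (it is the shape React's `useSyncExternalStore` expects).

```typescript
type Listener = () => void;

function createStore<T extends object>(initial: T) {
  let state: T = initial;
  let listeners: Listener[] = [];

  return {
    // Same reference on every call until update() actually changes something.
    getState(): T {
      return state;
    },

    // Register a listener; the returned function removes it again.
    subscribe(listener: Listener): () => void {
      listeners.push(listener);
      return () => {
        listeners = listeners.filter((l) => l !== listener);
      };
    },

    // Apply a partial patch. No-op patches keep the old reference and
    // do not notify, so consumers comparing by identity skip re-rendering.
    update(patch: Partial<T>): void {
      let changed = false;
      for (const key in patch) {
        if (patch[key] !== state[key]) changed = true;
      }
      if (!changed) return;

      state = { ...state, ...patch }; // new object identity signals a change
      for (const listener of listeners) listener();
    },
  };
}
```

A consumer that compares `getState()` results with `===` can therefore treat "same reference" as "nothing to re-render".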

For more targeted updates, subscribe to specific events:

session.addEventListener("phase", (e) => {
  // only fires when phase actually changes
  document.body.dataset.phase = e.detail.phase;
});

Driving UI from phase and agentSpeaking

A common pattern: bind your "orb" or status visual to phase for the overall mode, and use agentSpeaking for a faster-reacting animation layer.

const orb = document.getElementById("orb");

session.subscribe(() => {
  const { phase, agentSpeaking, idleWarning } = session.getState();

  orb.dataset.phase = phase;                    // CSS handles per-phase styling
  orb.classList.toggle("speaking", agentSpeaking);
  orb.classList.toggle("idle-warning", idleWarning !== null);
});

The voice-widget package follows exactly this pattern — see its theming guide for the full set of CSS classes.

What's next