Pinecall

Server-side vs Client-side LLM

Choosing where the LLM runs is the single most important architectural decision when building a Pinecall agent.

The two modes

In server-side mode, the Pinecall server runs the LLM. You give it a prompt, a model, and (optionally) tool definitions. The server handles STT, runs the LLM, and generates TTS; you only handle tool calls.

const agent = pc.agent("receptionist", {
  voice: "elevenlabs:abc",
  language: "en",
  llm: {
    provider: "openai",
    model: "gpt-4.1-mini",
    enabled: true,
    prompt: "You are a helpful receptionist. Be concise.",
  },
});

// Run every tool the model requested, then send all results back in one reply.
agent.on("llm.tool_call", async (data, call) => {
  const results = [];
  for (const tc of data.toolCalls) {
    // Tool arguments arrive as a JSON string.
    const args = JSON.parse(tc.arguments);
    const result = await myToolHandler(tc.name, args);
    results.push({ toolCallId: tc.id, result });
  }
  call.toolResult(data.msgId, results);
});
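
The handler above delegates to `myToolHandler`, which is left undefined. One possible shape for it is a small registry that maps tool names to functions. This is only a sketch: the `get_opening_hours` tool and its data are hypothetical, and returning an `{ error }` object for unknown tools or failed handlers is an assumption about what is useful to feed back to the model, not documented Pinecall behavior.

```typescript
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Hypothetical tool registry; the tool name and its data are illustrative only.
const tools = new Map<string, ToolHandler>([
  [
    "get_opening_hours",
    async (args) => {
      const day = String(args.day ?? "monday").toLowerCase();
      return { day, hours: day === "sunday" ? "closed" : "09:00-17:00" };
    },
  ],
]);

// Dispatches a tool call by name. Unknown tools and handler failures are
// returned as error payloads (an assumption) instead of throwing.
async function myToolHandler(
  name: string,
  args: Record<string, unknown>,
): Promise<unknown> {
  const handler = tools.get(name);
  if (!handler) return { error: `Unknown tool: ${name}` };
  try {
    return await handler(args);
  } catch (err) {
    return { error: err instanceof Error ? err.message : String(err) };
  }
}
```

Catching errors inside the dispatcher means one failing tool does not prevent the results of the other tool calls in the same batch from being returned.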

Client-side LLM (bring your own)

You run the LLM yourself. The server handles STT → text and text → TTS. You receive the user's text on turn.end, generate a response with whatever LLM you want, and stream it back.

import OpenAI from "openai";
const openai = new OpenAI();

const agent = pc.agent("my-bot", { voice: "cartesia:abc", language: "en" });

agent.on("turn.end", async (turn, call) => {
  const stream = call.replyStream(turn);
  try {
    const completion = await openai.chat.completions.create({
      model: "gpt-4.1-mini",
      messages: [
        { role: "system", content: "You are helpful. Be concise." },
        { role: "user", content: turn.text },
      ],
      stream: true,
    });
    // Forward tokens as they arrive; stop once the reply stream is aborted.
    for await (const chunk of completion) {
      if (stream.aborted) break;
      const token = chunk.choices[0]?.delta?.content;
      if (token) stream.write(token);
    }
  } finally {
    // Always close the reply stream, even if the LLM request fails.
    stream.end();
  }
});
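
The example above sends only the current turn to the model, so the LLM has no memory of earlier turns. In client-side mode, keeping conversation history is your job. A minimal sketch of a per-call history store follows; the `ConversationStore` class, the `callId` key, and the trimming limit are all hypothetical names chosen for illustration, and how you obtain a stable per-call identifier depends on your integration.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Keeps one message list per call. The callId key is an assumption:
// use whatever stable per-call identifier your integration exposes.
class ConversationStore {
  private histories = new Map<string, ChatMessage[]>();
  private systemPrompt: string;
  private maxMessages: number;

  constructor(systemPrompt: string, maxMessages = 20) {
    this.systemPrompt = systemPrompt;
    this.maxMessages = maxMessages;
  }

  // Records a message and drops the oldest ones beyond maxMessages.
  add(callId: string, role: "user" | "assistant", content: string): void {
    const history = this.histories.get(callId) ?? [];
    history.push({ role, content });
    if (history.length > this.maxMessages) {
      history.splice(0, history.length - this.maxMessages);
    }
    this.histories.set(callId, history);
  }

  // Returns the system prompt followed by the retained history.
  messages(callId: string): ChatMessage[] {
    return [
      { role: "system", content: this.systemPrompt },
      ...(this.histories.get(callId) ?? []),
    ];
  }

  // Forgets the history, for example when the call ends.
  clear(callId: string): void {
    this.histories.delete(callId);
  }
}
```

In the `turn.end` handler you would add the user's text before calling the LLM, pass `messages(callId)` as the `messages` array, accumulate the streamed tokens, and add the assistant's reply once the stream finishes.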

Which one to choose

| | Server-side | Client-side |
|---|---|---|
| LLM choice | OpenAI, Mistral (more coming) | Any provider, any model, local |
| You handle conversation history | ❌ Server does it | ✅ You do it |
| You see tool calls | ✅ Via llm.tool_call | ✅ You define them |
| Easier to ship | ✅ Yes | Slightly more code |
| Works with WhatsApp | ✅ Yes | ❌ No (WhatsApp is server-side only) |
| Latency | Slightly lower (LLM runs near the audio pipeline) | Depends on your provider |
| Cost | Pinecall passes through provider cost | You pay your provider directly |

Pick server-side if: you're using OpenAI or Mistral, you want the simplest possible code, or you need WhatsApp.

Pick client-side if: you need a specific LLM Pinecall doesn't host (Anthropic, local Ollama, fine-tuned model), you have an existing LangChain/LlamaIndex pipeline, or you need full control over the prompt-building logic.

You can mix them

A single Pinecall instance can host multiple agents, each with a different LLM strategy:

// Server-side agent for WhatsApp + phone
const support = pc.agent("support", {
  llm: { provider: "openai", model: "gpt-4.1-mini", enabled: true, prompt: "..." },
});
support.addChannel("whatsapp", { /* config */ });
support.addChannel("phone", "+13186330963");

// Client-side agent using Anthropic for a specialized use case
const research = pc.agent("research", { voice: "elevenlabs:xyz", language: "en" });
research.addChannel("webrtc");
research.on("turn.end", async (turn, call) => {
  /* call Anthropic, stream back */
});
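
The `/* call Anthropic, stream back */` placeholder follows the same pattern as the OpenAI example regardless of provider: read tokens from the model, write them to the reply stream, stop when the stream is aborted, and always close it. The helper below sketches that loop in a provider-agnostic way. `ReplyStreamLike` and `pipeTokens` are hypothetical names; the interface is inferred only from the `write`, `end`, and `aborted` members used in the client-side example and is not an official Pinecall type.

```typescript
// Minimal shape of the reply stream, inferred from the members used in the
// client-side example. Not an official Pinecall type.
interface ReplyStreamLike {
  readonly aborted: boolean;
  write(text: string): void;
  end(): void;
}

// Forwards tokens from any async source to the reply stream, stops early if
// the stream is aborted, and always closes it. Returns the text actually sent.
async function pipeTokens(
  tokens: AsyncIterable<string>,
  stream: ReplyStreamLike,
): Promise<string> {
  let sent = "";
  try {
    for await (const token of tokens) {
      if (stream.aborted) break;
      if (!token) continue; // skip empty deltas
      stream.write(token);
      sent += token;
    }
  } finally {
    stream.end();
  }
  return sent;
}
```

Any SDK whose streaming response can be adapted to an `AsyncIterable<string>` of text deltas can feed this helper, and the returned string is what you would append to your own conversation history as the assistant's reply.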

What about hybrid?

What if you want to use the server-side LLM but inject context or modify history mid-call? You can:

  • Inject context dynamically: call.addContext("Recent order: #12345 shipped today")
  • Replace the prompt mid-call: call.setPrompt("Now you're in escalation mode.")
  • Set template variables: define {{customer_name}} in the prompt, fill it per-call
  • Modify history: call.addHistory([...]), call.setHistory([...]), call.clearHistory()

See Hot-Reload for the full set of mid-call controls.
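
To make the template-variable idea concrete, here is a local sketch of `{{name}}` substitution. It only illustrates the concept, for example to preview a prompt before a call; `renderPrompt` is a hypothetical helper, and how Pinecall itself fills variables, including how it treats missing ones, is not described on this page.

```typescript
// Replaces {{name}} placeholders with per-call values. Unknown placeholders
// are left untouched so a missing variable is easy to spot in the prompt.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(
    /\{\{\s*(\w+)\s*\}\}/g,
    (match: string, name: string) => {
      const value = Object.prototype.hasOwnProperty.call(vars, name)
        ? vars[name]
        : undefined;
      return value ?? match;
    },
  );
}

const preview = renderPrompt(
  "You are speaking with {{customer_name}}. Be concise.",
  { customer_name: "Ada" },
);
// preview === "You are speaking with Ada. Be concise."
```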

What's next