
Prerequisites

Smithers ships with voice support built in. You need:
  • An OpenAI API key (or another AI SDK-supported provider)
  • smithers-orchestrator version 0.12.8 or later
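The OpenAI provider in @ai-sdk/openai reads its key from the OPENAI_API_KEY environment variable by default, so exporting it is usually all the setup needed:

```shell
# @ai-sdk/openai picks this up automatically; no key needs to be
# passed in code unless you want to override the environment.
export OPENAI_API_KEY="sk-..."
```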

Install

No extra packages are needed; the ai and @ai-sdk/openai dependencies are already included.

Create a Voice Provider

The simplest provider wraps AI SDK models for batch TTS and STT:
import { createAiSdkVoice } from "smithers-orchestrator/voice";
import { openai } from "@ai-sdk/openai";

const voice = createAiSdkVoice({
  speechModel: openai.speech("tts-1"),
  transcriptionModel: openai.transcription("whisper-1"),
});

Add Voice to a Workflow

Wrap tasks with the <Voice> component:
import { Workflow, Task, Voice, createSmithers } from "smithers-orchestrator";
import { z } from "zod";

const { outputs, workflow } = createSmithers({
  transcript: z.object({ text: z.string() }),
  summary: z.object({ content: z.string() }),
});

export default (
  <Workflow>
    <Voice provider={voice} speaker="alloy">
      <Task id="transcribe" output={outputs.transcript} agent={myAgent}>
        Transcribe the audio input and return the text.
      </Task>
      <Task id="summarize" output={outputs.summary} agent={myAgent} dependsOn={["transcribe"]}>
        Summarize the transcript.
      </Task>
    </Voice>
  </Workflow>
);
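The dependsOn field makes summarize wait for transcribe to finish. The orchestrator's actual scheduler is internal to smithers-orchestrator, but conceptually the ordering can be sketched as a simple dependency resolution (the TaskSpec shape here is a hypothetical illustration, not the library's API):

```typescript
// Hypothetical sketch of dependsOn ordering; smithers-orchestrator's
// real scheduler is internal and may run independent tasks in parallel.
type TaskSpec = { id: string; dependsOn?: string[] };

function resolveOrder(tasks: TaskSpec[]): string[] {
  const order: string[] = [];
  const done = new Set<string>();
  let remaining = tasks.slice();
  while (remaining.length > 0) {
    // A task is ready once all of its dependencies have completed.
    const ready = remaining.filter((t) =>
      (t.dependsOn ?? []).every((d) => done.has(d))
    );
    if (ready.length === 0) throw new Error("Cycle in dependsOn");
    for (const t of ready) {
      order.push(t.id);
      done.add(t.id);
    }
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return order;
}

console.log(
  resolveOrder([
    { id: "summarize", dependsOn: ["transcribe"] },
    { id: "transcribe" },
  ])
); // → [ 'transcribe', 'summarize' ]
```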

Use Composite Voice

Mix different providers for input and output:
import { createCompositeVoice, createAiSdkVoice } from "smithers-orchestrator/voice";
import { openai } from "@ai-sdk/openai";

const stt = createAiSdkVoice({
  transcriptionModel: openai.transcription("whisper-1"),
});

const tts = createAiSdkVoice({
  speechModel: openai.speech("tts-1"),
});

const voice = createCompositeVoice({
  input: stt,
  output: tts,
});

Use Realtime Voice

For low-latency bidirectional audio, use the OpenAI Realtime provider:
import { createOpenAIRealtimeVoice } from "smithers-orchestrator/voice";

const realtime = createOpenAIRealtimeVoice({
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-4o-mini-realtime-preview-2024-12-17",
  speaker: "alloy",
});

// Connect before use
await realtime.connect();

// Listen for events
realtime.on("speaking", (data) => {
  // handle audio output
});

realtime.on("writing", (data) => {
  // handle text transcription
});

// Send audio
await realtime.send(audioStream);

// Disconnect when done
realtime.close();
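The "writing" events typically arrive as incremental transcription chunks. The payload shape below ({ text }) is an assumption, not confirmed by this page, but a minimal accumulator for assembling the full transcript could look like:

```typescript
// Buffers partial transcript chunks from "writing" events.
// The { text } payload shape is an assumption for illustration.
type WritingEvent = { text: string };

function createTranscriptBuffer() {
  const chunks: string[] = [];
  return {
    push(event: WritingEvent): void {
      chunks.push(event.text);
    },
    // The transcript assembled so far.
    value(): string {
      return chunks.join("");
    },
  };
}

const buffer = createTranscriptBuffer();
buffer.push({ text: "Hello, " });
buffer.push({ text: "world." });
console.log(buffer.value()); // → "Hello, world."
```

You would call buffer.push inside the realtime.on("writing", ...) handler and read buffer.value() once the turn completes.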

Voice with Effect.ts

Use the Effect service layer for typed voice operations:
import { VoiceService, speak, listen } from "smithers-orchestrator/voice";
import { Effect } from "effect";

const program = Effect.gen(function* () {
  const text = yield* listen(audioStream);
  const audio = yield* speak(`The transcript says: ${text}`);
  return { text, audio };
}).pipe(Effect.provideService(VoiceService, voice));

Supported Providers

Any provider supported by the Vercel AI SDK works with createAiSdkVoice:
Provider     TTS                      STT
OpenAI       openai.speech("tts-1")   openai.transcription("whisper-1")
ElevenLabs   elevenlabs.speech(...)   elevenlabs.transcription(...)
Deepgram     (none)                   deepgram.transcription("nova-3")
Google       google.speech(...)       google.transcription(...)
For realtime speech-to-speech, use createOpenAIRealtimeVoice directly.