API Reference
Kani
- class kani.ext.realtime.OpenAIRealtimeKani(
- api_key: str | None = None,
- model='gpt-4o-realtime-preview-2024-10-01',
- *,
- ws_base: str = 'wss://api.openai.com/v1/realtime',
- headers: dict | None = None,
- system_prompt: str | None = None,
- chat_history: list[ChatMessage] | None = None,
- always_included_messages: list[ChatMessage] | None = None,
- **generation_args,
In addition to all of kani.Kani’s methods, OpenAIRealtimeKani provides the following two methods for interacting with the realtime API.
- async connect(session_config: SessionConfig | None = None)
Connect to the WebSocket (WS) and keep the internal state updated until the engine is closed.
- async full_duplex(
- audio_stream: AsyncIterable[bytes],
- audio_callback: Callable[[bytes], Awaitable] | None = None,
- **kwargs,
Stream audio bytes from the given stream to the realtime model.
Yields a stream for each conversation item created (both USER and ASSISTANT). Each stream will be related to exactly one conversation item (i.e., message), and multiple streams may emit simultaneously.
To consume tokens from a stream, use this method as follows:

    import asyncio

    stream_tasks = set()

    async def handle_stream(stream):
        # do processing for a single message's stream here...
        # this example code does NOT account for multiple simultaneous messages
        async for token in stream:
            print(token, end="")
        msg = await stream.message()

    async for stream in ai.full_duplex(audio_stream):
        task = asyncio.create_task(handle_stream(stream))
        # to keep a live reference to the task
        # see https://docs.python.org/3/library/asyncio-task.html#creating-tasks
        stream_tasks.add(task)
        task.add_done_callback(stream_tasks.discard)

Check out the implementation of chat_in_terminal_audio_async() for more in-depth stream handling (e.g., printing out streams simultaneously without clobbering other messages’ outputs).
Each StreamManager object yielded by this method has a StreamManager.role attribute that indicates whether a message is from the user, the engine, or a function call. This attribute is available before iterating over the stream.
Note
This method will exit once the audio_stream is exhausted (i.e., the iterator raises StopAsyncIteration).
Note
For lower-level control over the realtime chat session (e.g., to send events directly to the server), see RealtimeSession and events.client. For example, you might use the following to request a response when server-side VAD is disabled:

    from kani.ext.realtime import events

    await ai.session.send(events.client.ResponseCreate())

See https://platform.openai.com/docs/api-reference/realtime-client-events for more details.
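Because several streams may emit at once, one simple way to avoid interleaved output is to buffer each message and report it only when its stream finishes. The following is a minimal sketch of that idea, not the approach used by chat_in_terminal_audio_async(). FakeStream, fake_full_duplex, collect, and main are hypothetical stand-ins so that the example runs without a realtime connection; with the real API, the streams would come from full_duplex().

```python
import asyncio


class FakeStream:
    """Hypothetical stand-in for a yielded stream: has a role and yields tokens."""

    def __init__(self, role, tokens, delay):
        self.role = role
        self._tokens = tokens
        self._delay = delay

    async def __aiter__(self):
        for token in self._tokens:
            await asyncio.sleep(self._delay)  # simulate tokens arriving over time
            yield token


async def fake_full_duplex():
    """Hypothetical stand-in for full_duplex(): yields two overlapping streams."""
    yield FakeStream("user", ["hello", " there"], delay=0.02)
    yield FakeStream("assistant", ["hi", "!", " how can I help?"], delay=0.01)


async def collect(stream, finished):
    # Buffer the tokens so that concurrent messages never interleave on screen.
    parts = [token async for token in stream]
    finished.append((stream.role, "".join(parts)))


async def main():
    finished = []
    tasks = set()
    async for stream in fake_full_duplex():
        task = asyncio.create_task(collect(stream, finished))
        tasks.add(task)  # keep a live reference to the task
        task.add_done_callback(tasks.discard)
    await asyncio.gather(*tasks)
    return finished  # (role, text) pairs in completion order
```

The role is read from the stream object itself, which matches the note above that the role is available before iteration starts.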
- Parameters:
audio_stream – An async iterator that emits audio frames (bytes). Audio frames should be encoded as raw 16-bit PCM audio at 24 kHz, 1 channel, little-endian. See get_audio_stream() to get such an audio stream from a system microphone.
audio_callback – An async function that consumes audio frames as emitted by the model. Use play_audio() to play the audio from the system speaker.
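To make the expected input format concrete, here is a small sketch of an async generator that emits frames in the encoding described above (raw 16-bit PCM, 24 kHz, mono, little-endian). It synthesizes a sine tone using only the standard library; the function name sine_pcm16_stream and the 20 ms frame size are illustrative assumptions, not part of the library.

```python
import asyncio
import math
import struct

SAMPLE_RATE = 24_000  # 24 kHz, as required for audio_stream
FRAME_SAMPLES = 480   # 20 ms per frame; an arbitrary choice for this sketch


async def sine_pcm16_stream(freq_hz=440.0, seconds=0.1, amplitude=0.3):
    """Yield raw 16-bit little-endian mono PCM frames of a sine tone."""
    total = int(SAMPLE_RATE * seconds)
    for start in range(0, total, FRAME_SAMPLES):
        count = min(FRAME_SAMPLES, total - start)
        samples = [
            int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * (start + i) / SAMPLE_RATE))
            for i in range(count)
        ]
        # "<h" is a little-endian signed 16-bit integer; one value per sample (1 channel)
        yield struct.pack(f"<{count}h", *samples)
        await asyncio.sleep(0)  # give other tasks a chance to run
```

A generator like this could presumably be passed as audio_stream in place of a microphone stream when testing, since it is an AsyncIterable[bytes].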
Audio Utilities
- async kani.ext.realtime.audio.play_audio(audio_bytes: bytes)
Play the given audio at the next available opportunity, using a global audio queue.
This callback should be passed to full_round_stream(), chat_round_stream(), or OpenAIRealtimeKani.full_duplex() as the audio_callback parameter.
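Any async function that accepts bytes can serve as an audio_callback, so playback is not the only option. As a sketch, the helper below builds a callback that appends the model's audio to a WAV file using the standard library. The name make_wav_recorder is hypothetical. The sketch also assumes that the model emits 16-bit mono PCM at 24 kHz, which matches the documented input format and the default 'pcm16' output format but is not confirmed here.

```python
import asyncio
import io
import wave


def make_wav_recorder(target, sample_rate=24_000):
    """Return (callback, close); the callback appends PCM16 mono frames to a WAV."""
    wav = wave.open(target, "wb")
    wav.setnchannels(1)  # mono
    wav.setsampwidth(2)  # 16-bit samples
    wav.setframerate(sample_rate)  # assumed output sample rate

    async def record_audio(audio_bytes: bytes):
        wav.writeframes(audio_bytes)

    return record_audio, wav.close
```

The returned record_audio function would be passed as audio_callback, and close should be called once the conversation ends so that the WAV header is finalized.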
Command-Line Utilities
- kani.ext.realtime.cli.chat_in_terminal_audio(
- kani: Kani,
- *,
- rounds: int = 0,
- stopword: str = None,
- echo: bool = False,
- ai_first: bool = False,
- width: int = None,
- show_function_args: bool = False,
- show_function_returns: bool = False,
- verbose: bool = False,
- mode: Literal['chat', 'stream', 'full_duplex'] = 'full_duplex',
- mic_id: int | None = None,
Chat with a kani right in your terminal.
Useful for playing with kani, quick prompt engineering, or demoing the library.
If the environment variable KANI_DEBUG is set, debug logging will be enabled.
Warning
This function is only a development utility and should not be used in production.
- Parameters:
rounds (int) – The number of chat rounds to play (defaults to 0 for infinite; chat or stream mode only).
stopword (str) – Break out of the chat loop if the user sends this message (chat or stream mode only).
echo (bool) – Whether to echo the user’s input to stdout after they send a message (e.g. to save in interactive notebook outputs; default false; chat or stream mode only)
ai_first (bool) – Whether the user should send the first message (default) or the model should generate a completion before prompting the user for a message.
width (int) – The maximum width of the printed outputs (default unlimited).
show_function_args (bool) – Whether to print the arguments the model is calling functions with for each call (default false).
show_function_returns (bool) – Whether to print the results of each function call (default false).
verbose (bool) – Equivalent to setting echo, show_function_args, and show_function_returns to True.
mode (str) – The chat mode: “chat” for turn-based chat without streaming, “stream” for turn-based chat with streaming and audio, “full_duplex” for realtime conversation from the system default mic.
mic_id (int) – The microphone ID to use for recording audio (default system default mic; full_duplex mode only)
- async kani.ext.realtime.cli.chat_in_terminal_audio_async(
- kani: OpenAIRealtimeKani,
- *,
- rounds: int = 0,
- stopword: str | None = None,
- echo: bool = False,
- ai_first: bool = False,
- width: int | None = None,
- show_function_args: bool = False,
- show_function_returns: bool = False,
- verbose: bool = False,
- mode: Literal['chat', 'stream', 'full_duplex'] = 'full_duplex',
- mic_id: int = 0,
Async version of chat_in_terminal_audio(). Use it in environments where an asyncio loop is already running (e.g., Google Colab).
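When it is unclear which variant applies, the standard library can report whether an event loop is already running. The helper below is a small sketch of that check; has_running_loop is a hypothetical name and not part of the library.

```python
import asyncio


def has_running_loop() -> bool:
    """Return True when called from code running inside an asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises RuntimeError when no loop is running
        return False
    return True
```

If the check returns True (as it typically does in a notebook cell), awaiting chat_in_terminal_audio_async() is the appropriate choice; otherwise the synchronous chat_in_terminal_audio() can be called directly.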
Additional Classes
- class kani.ext.realtime.session.RealtimeSession(
- api_key: str,
- model='gpt-4o-realtime-preview-2024-10-01',
- *,
- ws_base: str = 'wss://api.openai.com/v1/realtime',
- headers: dict | None = None,
- **generation_args,
This is an internal object used to manage the state of the OpenAI Realtime session.
- async connect()
Connect to the WS, begin a task for event handling, and init the session.
You should usually call OpenAIRealtimeKani.connect() instead of this.
- add_listener( )
Add a listener that is called for every event received from the WS. The listener must be an asynchronous function that takes the event as its single argument.
- remove_listener(callback)
Remove a listener added by add_listener().
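As an illustration of the listener contract described above (an async function taking one event), the sketch below builds a listener that tallies events by type. The names make_event_counter and demo are hypothetical, and the sketch assumes that each event exposes a type attribute; the events in demo are simple stand-ins rather than real server events.

```python
import asyncio
from collections import Counter
from types import SimpleNamespace


def make_event_counter():
    """Return (listener, counts); the listener tallies events by type."""
    counts = Counter()

    async def listener(event):
        # Assumption: each event exposes a `type` attribute naming its kind.
        counts[getattr(event, "type", "unknown")] += 1

    return listener, counts


async def demo():
    listener, counts = make_event_counter()
    # Stand-in events; a real session would invoke the listener for each WS event.
    for kind in ["session.created", "response.created", "response.created"]:
        await listener(SimpleNamespace(type=kind))
    return dict(counts)
```

With a live session, the listener would presumably be registered via add_listener() and later detached by passing the same function object to remove_listener().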
- class kani.ext.realtime.models.TurnDetectionConfig(
- *,
- type: str = 'server_vad',
- threshold: float = 0.5,
- prefix_padding_ms: int = 300,
- silence_duration_ms: int = 500,
- class kani.ext.realtime.models.FunctionDefinition(*, type: str = 'function', name: str, description: str, parameters: dict)
- class kani.ext.realtime.models.ResponseConfig(
- *,
- modalities: list[str] = ['text', 'audio'],
- instructions: str = '',
- voice: Literal['alloy', 'echo', 'shimmer'] = 'alloy',
- output_audio_format: Literal['pcm16', 'g711_ulaw', 'g711_alaw'] = 'pcm16',
- tools: list[FunctionDefinition] = [],
- tool_choice: Literal['auto', 'none', 'required'] | str = 'auto',
- temperature: float = 0.8,
-
- tools: list[FunctionDefinition]
- class kani.ext.realtime.models.SessionConfig(
- *,
- modalities: list[str] = ['text', 'audio'],
- instructions: str = '',
- voice: Literal['alloy', 'echo', 'shimmer'] = 'alloy',
- output_audio_format: Literal['pcm16', 'g711_ulaw', 'g711_alaw'] = 'pcm16',
- tools: list[FunctionDefinition] = [],
- tool_choice: Literal['auto', 'none', 'required'] | str = 'auto',
- temperature: float = 0.8,
- input_audio_format: Literal['pcm16', 'g711_ulaw', 'g711_alaw'] = 'pcm16',
- input_audio_transcription: AudioTranscriptionConfig | None = AudioTranscriptionConfig(model='whisper-1'),
- turn_detection: TurnDetectionConfig | None = TurnDetectionConfig(type='server_vad', threshold=0.5, prefix_padding_ms=300, silence_duration_ms=500),
-
- input_audio_transcription: AudioTranscriptionConfig | None
- turn_detection: TurnDetectionConfig | None
- class kani.ext.realtime.models.TextContentPart(*, type: Literal['input_text', 'text'] = 'input_text', text: str)
- class kani.ext.realtime.models.AudioContentPart(
- *,
- type: Literal['input_audio', 'audio'] = 'input_audio',
- audio: str | None = None,
- transcript: str | None,
- class kani.ext.realtime.models.ConversationItemBase(
- *,
- id: str = None,
- type: str,
- status: Literal['completed', 'in_progress', 'incomplete'] = 'completed',
- class kani.ext.realtime.models.MessageConversationItem(
- *,
- id: str = None,
- type: Literal['message'] = 'message',
- status: Literal['completed', 'in_progress', 'incomplete'] = 'completed',
- role: Literal['user', 'assistant', 'system'],
- content: list[TextContentPart | AudioContentPart] = [],
-
- content: list[TextContentPart | AudioContentPart]
- class kani.ext.realtime.models.FunctionCallConversationItem(
- *,
- id: str = None,
- type: Literal['function_call'] = 'function_call',
- status: Literal['completed', 'in_progress', 'incomplete'] = 'completed',
- call_id: str,
- name: str,
- arguments: str,
- class kani.ext.realtime.models.FunctionCallOutputConversationItem(
- *,
- id: str = None,
- type: Literal['function_call_output'] = 'function_call_output',
- status: Literal['completed', 'in_progress', 'incomplete'] = 'completed',
- output: str,
- class kani.ext.realtime.models.ErrorDetails(*, type: str, code: str | None = None, message: str, param: str | None = None, event_id: str | None = None)
- class kani.ext.realtime.models.SessionDetails(
- *,
- modalities: list[str] = ['text', 'audio'],
- instructions: str = '',
- voice: Literal['alloy', 'echo', 'shimmer'] = 'alloy',
- output_audio_format: Literal['pcm16', 'g711_ulaw', 'g711_alaw'] = 'pcm16',
- tools: list[FunctionDefinition] = [],
- tool_choice: Literal['auto', 'none', 'required'] | str = 'auto',
- temperature: float = 0.8,
- input_audio_format: Literal['pcm16', 'g711_ulaw', 'g711_alaw'] = 'pcm16',
- input_audio_transcription: AudioTranscriptionConfig | None = AudioTranscriptionConfig(model='whisper-1'),
- turn_detection: TurnDetectionConfig | None = TurnDetectionConfig(type='server_vad', threshold=0.5, prefix_padding_ms=300, silence_duration_ms=500),
- id: str,
- object: Literal['realtime.session'],
- class kani.ext.realtime.models.ConversationDetails(*, id: str, object: Literal['realtime.conversation'])
- class kani.ext.realtime.models.UsageDetails(*, total_tokens: int, input_tokens: int, output_tokens: int)
- class kani.ext.realtime.models.RealtimeResponse(
- *,
- id: str,
- object: Literal['realtime.response'],
- status: Literal['in_progress', 'completed', 'cancelled', 'failed', 'incomplete'],
- status_details: dict | None,
- output: list[MessageConversationItem | FunctionCallConversationItem | FunctionCallOutputConversationItem] = [],
- usage: UsageDetails | None,
-
- output: list[MessageConversationItem | FunctionCallConversationItem | FunctionCallOutputConversationItem]
- usage: UsageDetails | None