API Reference#

Kani#

class kani.ext.realtime.OpenAIRealtimeKani(

api_key: str | None = None,

model='gpt-4o-realtime-preview-2024-10-01',

*,

ws_base: str = 'wss://api.openai.com/v1/realtime',

headers: dict | None = None,

system_prompt: str | None = None,

chat_history: list[ChatMessage] | None = None,

always_included_messages: list[ChatMessage] | None = None,

**generation_args,

)[source]#

In addition to all of kani.Kani's methods, OpenAIRealtimeKani provides the following two methods for interacting with the realtime API.

async connect(session_config: SessionConfig | None = None)[source]#: Connect to the WebSocket and update the internal state until the engine is closed.

async full_duplex(

audio_stream: AsyncIterable[bytes],

audio_callback: Callable[[bytes], Awaitable] | None = None,

**kwargs,

) → AsyncIterable[StreamManager][source]#

Stream audio bytes from the given stream to the realtime model.

Yields a stream for each conversation item created (both USER and ASSISTANT). Each stream will be related to exactly one conversation item (i.e., message), and multiple streams may emit simultaneously.

To consume tokens from a stream, use this method like so:

import asyncio

stream_tasks = set()

async def handle_stream(stream):
    # do processing for a single message's stream here...
    # this example code does NOT account for multiple simultaneous messages
    async for token in stream:
        print(token, end="")
    msg = await stream.message()

async for stream in ai.full_duplex(audio_stream):
    task = asyncio.create_task(handle_stream(stream))
    # to keep a live reference to the task
    # see https://docs.python.org/3/library/asyncio-task.html#creating-tasks
    stream_tasks.add(task)
    task.add_done_callback(stream_tasks.discard)

Check out the implementation of chat_in_terminal_audio_async() for more in-depth stream handling (e.g., printing out streams simultaneously without clobbering other messages’ outputs).

Each StreamManager object yielded by this method has a StreamManager.role attribute that can be used to determine whether a message comes from the user, the engine, or a function call. This attribute is available before iterating over the stream.
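As a rough sketch of handling several simultaneous streams without interleaving their output, each stream can be buffered in its own task and printed as one labeled line when it finishes. FakeStream and fake_full_duplex below are hypothetical stand-ins (not part of the library) that only imitate the two properties described above: a role attribute available up front, and async iteration over tokens. The string role values are also an assumption made for illustration.

```python
import asyncio


class FakeStream:
    """Hypothetical stand-in for StreamManager: exposes .role and async iteration."""

    def __init__(self, role, tokens, delay):
        self.role = role
        self._tokens = tokens
        self._delay = delay

    async def __aiter__(self):
        for token in self._tokens:
            await asyncio.sleep(self._delay)  # simulate tokens arriving over time
            yield token


async def fake_full_duplex():
    """Hypothetical stand-in for ai.full_duplex(audio_stream)."""
    yield FakeStream("user", ["hi ", "there"], delay=0.02)
    yield FakeStream("assistant", ["hello", "!"], delay=0.01)


async def handle_stream(stream, transcript):
    # The role is known before iteration, so the line can be labeled up front.
    prefix = f"[{stream.role}] "
    tokens = [token async for token in stream]
    # Emit the whole message at once so concurrent streams never interleave.
    line = prefix + "".join(tokens)
    print(line)
    transcript.append(line)


async def main():
    transcript = []
    stream_tasks = set()
    async for stream in fake_full_duplex():
        task = asyncio.create_task(handle_stream(stream, transcript))
        stream_tasks.add(task)  # keep a live reference to the task
        task.add_done_callback(stream_tasks.discard)
    await asyncio.gather(*stream_tasks)
    return transcript
```

Running asyncio.run(main()) prints one complete line per message. The trade-off is latency: nothing is shown for a message until its stream has finished.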

Note

This method will exit once the audio_stream is exhausted (i.e., the iterator raises StopAsyncIteration).

Note

For lower-level control over the realtime chat session (e.g. to send events directly to the server), see RealtimeSession and events.client. For example, you might use the following to request a response when server-side VAD is disabled:

from kani.ext.realtime import events

await ai.session.send(events.client.ResponseCreate())

See https://platform.openai.com/docs/api-reference/realtime-client-events for more details.

Parameters:

audio_stream – An async iterator that emits audio frames (bytes). Audio frames should be encoded as raw 16-bit PCM audio at 24 kHz, 1 channel, little-endian. See get_audio_stream() to get such an audio stream from a system microphone.
audio_callback – An async function that consumes audio frames as emitted by the model. Use play_audio() to play the audio from the system speaker.
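To make the expected audio_stream format concrete, here is a minimal sketch of an async generator that emits a short sine tone as raw 16-bit, 24 kHz, mono, little-endian PCM frames using only the standard library. The 20 ms frame length and the names sine_audio_stream and collect are assumptions for illustration; the documentation above does not prescribe a frame size.

```python
import asyncio
import math
import struct

SAMPLE_RATE = 24_000  # Hz, as required for audio_stream
FRAME_MS = 20         # assumed frame length; not mandated by the docs


async def sine_audio_stream(freq_hz=440.0, seconds=0.1):
    """Yield raw PCM16 little-endian mono frames containing a sine tone."""
    samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000
    total_samples = int(SAMPLE_RATE * seconds)
    for start in range(0, total_samples, samples_per_frame):
        end = min(start + samples_per_frame, total_samples)
        samples = [
            int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
            for n in range(start, end)
        ]
        # "<h" is a little-endian signed 16-bit integer; one value per sample.
        yield struct.pack(f"<{len(samples)}h", *samples)
        await asyncio.sleep(0)  # give other tasks a chance to run


async def collect(stream):
    """Drain an async audio stream into a list of frames."""
    return [frame async for frame in stream]
```

Because this generator is finite, a full_duplex() call fed by it would exit once the tone has been sent, as described in the note above.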

Audio Utilities#

kani.ext.realtime.audio.get_audio_stream(*_, **__)[source]#

kani.ext.realtime.audio.list_mics(*_, **__)[source]#

async kani.ext.realtime.audio.play_audio(audio_bytes: bytes)[source]#

Play the given audio at the next available opportunity, using a global audio queue.

This callback should be passed as the audio_callback parameter to full_round_stream(), chat_round_stream(), or OpenAIRealtimeKani.full_duplex().
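Any async function that accepts a single bytes argument has the same shape as play_audio. As a sketch, the following callback records the model's audio to a WAV container instead of playing it. It assumes the emitted audio is 16-bit mono PCM at 24 kHz (matching the input format described for full_duplex()); make_wav_recorder is a hypothetical helper, not part of the library.

```python
import asyncio
import io
import wave


def make_wav_recorder(fileobj, sample_rate=24_000):
    """Return (callback, close) where callback appends PCM16 frames to a WAV file."""
    wav = wave.open(fileobj, "wb")
    wav.setnchannels(1)            # mono
    wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
    wav.setframerate(sample_rate)  # assumed output sample rate

    async def on_audio(audio_bytes: bytes) -> None:
        # Same call shape as play_audio: one bytes argument, awaitable result.
        wav.writeframes(audio_bytes)

    return on_audio, wav.close
```

The returned on_audio function could then be passed as audio_callback, with close() called once the session ends so the WAV header is finalized.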

Command-Line Utilities#

kani.ext.realtime.cli.chat_in_terminal_audio( kani: Kani, *, rounds: int = 0, stopword: str = None, echo: bool = False, ai_first: bool = False, width: int = None, show_function_args: bool = False, show_function_returns: bool = False, verbose: bool = False, mode: Literal['chat', 'stream', 'full_duplex'] = 'full_duplex', mic_id: int | None = None, )[source]#

Chat with a kani right in your terminal.

Useful for playing with kani, quick prompt engineering, or demoing the library.

If the environment variable KANI_DEBUG is set, debug logging will be enabled.

Warning

This function is only a development utility and should not be used in production.

Parameters:

rounds (int) – The number of chat rounds to play (defaults to 0 for infinite; chat or stream mode only).
stopword (str) – Break out of the chat loop if the user sends this message (chat or stream mode only).
echo (bool) – Whether to echo the user’s input to stdout after they send a message (e.g. to save in interactive notebook outputs; default false; chat or stream mode only).
ai_first (bool) – Whether the user should send the first message (default) or the model should generate a completion before prompting the user for a message.
width (int) – The maximum width of the printed outputs (default unlimited).
show_function_args (bool) – Whether to print the arguments the model is calling functions with for each call (default false).
show_function_returns (bool) – Whether to print the results of each function call (default false).
verbose (bool) – Equivalent to setting echo, show_function_args, and show_function_returns to True.
mode (str) – The chat mode: “chat” for turn-based chat without streaming, “stream” for turn-based chat with streaming and audio, “full_duplex” for realtime conversation from the system default mic.
mic_id (int) – The microphone ID to use for recording audio (default system default mic; full_duplex mode only).

async kani.ext.realtime.cli.chat_in_terminal_audio_async( kani: OpenAIRealtimeKani, *, rounds: int = 0, stopword: str | None = None, echo: bool = False, ai_first: bool = False, width: int | None = None, show_function_args: bool = False, show_function_returns: bool = False, verbose: bool = False, mode: Literal['chat', 'stream', 'full_duplex'] = 'full_duplex', mic_id: int = 0, )[source]#: Async version of chat_in_terminal_audio(). Use this in environments where an asyncio loop is already running (e.g. Google Colab).

Additional Classes#

class kani.ext.realtime.session.RealtimeSession(

api_key: str,

model='gpt-4o-realtime-preview-2024-10-01',

*,

ws_base: str = 'wss://api.openai.com/v1/realtime',

headers: dict | None = None,

**generation_args,

)[source]#

This is an internal object used to manage the state of the OpenAI Realtime session.

async connect()[source]#

Connect to the WebSocket, start a task for event handling, and initialize the session.

You should usually call OpenAIRealtimeKani.connect() instead of this.

async send(event: ClientEvent)[source]#: Send a client event to the websocket.

add_listener( callback: Callable[[ServerEvent], Awaitable[Any]], )[source]#: Add a listener that is called for every event received from the WebSocket. The listener must be an asynchronous function that takes the event as its single argument.

remove_listener(callback)[source]#: Remove a listener added by add_listener().

async wait_for( event_type: str, predicate: Callable[[ServerEventT], bool] | None = None, timeout: int = 60, ) → ServerEventT[source]#: Wait for the next event of a given type, and return it.
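To illustrate how the listener methods and wait_for() relate, here is one way a wait_for-style helper could be built on top of add_listener() and remove_listener(). This is a sketch under stated assumptions, not the library's implementation: ListenerHub and FakeEvent are hypothetical stand-ins, events are assumed to expose a type string, and timeout is treated as seconds.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeEvent:
    """Hypothetical stand-in for a server event; only `type` is modeled."""
    type: str
    payload: Any = None


class ListenerHub:
    """Hypothetical stand-in for the listener part of a session object."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def remove_listener(self, callback):
        self._listeners.remove(callback)

    async def dispatch(self, event):
        # Call every listener with the event as its single argument.
        for listener in list(self._listeners):
            await listener(event)

    async def wait_for(self, event_type, predicate=None, timeout=60):
        future = asyncio.get_running_loop().create_future()

        async def listener(event):
            matches = event.type == event_type and (predicate is None or predicate(event))
            if matches and not future.done():
                future.set_result(event)

        self.add_listener(listener)
        try:
            # Raises asyncio.TimeoutError if no matching event arrives in time.
            return await asyncio.wait_for(future, timeout)
        finally:
            self.remove_listener(listener)


async def demo():
    hub = ListenerHub()
    waiter = asyncio.create_task(hub.wait_for("response.done", timeout=1))
    await asyncio.sleep(0)  # let the waiter register its listener
    await hub.dispatch(FakeEvent("session.created"))
    await hub.dispatch(FakeEvent("response.done", payload=42))
    return await waiter
```

The temporary listener is removed in a finally block so that it does not leak when the wait times out or is cancelled.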

class kani.ext.realtime.models.AudioTranscriptionConfig(*, model: str = 'whisper-1')[source]#

model: str#

class kani.ext.realtime.models.TurnDetectionConfig( *, type: str = 'server_vad', threshold: float = 0.5, prefix_padding_ms: int = 300, silence_duration_ms: int = 500, )[source]#

type: str#

threshold: float#

prefix_padding_ms: int#

silence_duration_ms: int#
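The millisecond fields can be related to the audio format with a little arithmetic. The sketch below assumes pcm16 input at 24 kHz mono (the format described for full_duplex()) and assumes, following OpenAI's Realtime API documentation, that prefix_padding_ms is the audio retained before detected speech and silence_duration_ms is the silence required to end a turn. ms_to_pcm_bytes is a hypothetical helper.

```python
SAMPLE_RATE_HZ = 24_000  # assumed input rate (pcm16 at 24 kHz)
BYTES_PER_SAMPLE = 2     # 16-bit PCM, 1 channel


def ms_to_pcm_bytes(duration_ms: int) -> int:
    """Number of PCM16 mono bytes that cover duration_ms at 24 kHz."""
    return SAMPLE_RATE_HZ * duration_ms // 1000 * BYTES_PER_SAMPLE


prefix_padding_bytes = ms_to_pcm_bytes(300)  # default prefix_padding_ms
silence_bytes = ms_to_pcm_bytes(500)         # default silence_duration_ms
print(prefix_padding_bytes, silence_bytes)   # prints "14400 24000"
```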

class kani.ext.realtime.models.FunctionDefinition(*, type: str = 'function', name: str, description: str, parameters: dict)[source]#

type: str#

name: str#

description: str#

parameters: dict#

class kani.ext.realtime.models.ResponseConfig( *, modalities: list[str] = ['text', 'audio'], instructions: str = '', voice: Literal['alloy', 'echo', 'shimmer'] = 'alloy', output_audio_format: Literal['pcm16', 'g711_ulaw', 'g711_alaw'] = 'pcm16', tools: list[FunctionDefinition] = [], tool_choice: Literal['auto', 'none', 'required'] | str = 'auto', temperature: float = 0.8, )[source]#

modalities: list[str]#

instructions: str#

voice: Literal['alloy', 'echo', 'shimmer']#

output_audio_format: Literal['pcm16', 'g711_ulaw', 'g711_alaw']#

tools: list[FunctionDefinition]#

tool_choice: Literal['auto', 'none', 'required'] | str#

temperature: float#

class kani.ext.realtime.models.SessionConfig( *, modalities: list[str] = ['text', 'audio'], instructions: str = '', voice: Literal['alloy', 'echo', 'shimmer'] = 'alloy', output_audio_format: Literal['pcm16', 'g711_ulaw', 'g711_alaw'] = 'pcm16', tools: list[FunctionDefinition] = [], tool_choice: Literal['auto', 'none', 'required'] | str = 'auto', temperature: float = 0.8, input_audio_format: Literal['pcm16', 'g711_ulaw', 'g711_alaw'] = 'pcm16', input_audio_transcription: AudioTranscriptionConfig | None = AudioTranscriptionConfig(model='whisper-1'), turn_detection: TurnDetectionConfig | None = TurnDetectionConfig(type='server_vad', threshold=0.5, prefix_padding_ms=300, silence_duration_ms=500), )[source]#

input_audio_format: Literal['pcm16', 'g711_ulaw', 'g711_alaw']#

input_audio_transcription: AudioTranscriptionConfig | None#

turn_detection: TurnDetectionConfig | None#

class kani.ext.realtime.models.TextContentPart(*, type: Literal['input_text', 'text'] = 'input_text', text: str)[source]#

type: Literal['input_text', 'text']#

text: str#

class kani.ext.realtime.models.AudioContentPart( *, type: Literal['input_audio', 'audio'] = 'input_audio', audio: str | None = None, transcript: str | None, )[source]#

type: Literal['input_audio', 'audio']#

audio: str | None#

transcript: str | None#
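Since audio is typed as a string, raw PCM bytes have to be text-encoded before they can be stored there. The sketch below assumes the field carries base64-encoded audio, as in OpenAI's Realtime API; encode_audio and decode_audio are hypothetical helpers, not library functions.

```python
import base64


def encode_audio(pcm_bytes: bytes) -> str:
    """Encode raw audio bytes as an ASCII base64 string."""
    return base64.b64encode(pcm_bytes).decode("ascii")


def decode_audio(audio: str) -> bytes:
    """Decode a base64 audio string back into raw bytes."""
    return base64.b64decode(audio)


pcm = b"\x00\x00\xff\x7f"  # two 16-bit little-endian samples: 0 and 32767
encoded = encode_audio(pcm)
restored = decode_audio(encoded)
```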

class kani.ext.realtime.models.ConversationItemBase( *, id: str = None, type: str, status: Literal['completed', 'in_progress', 'incomplete'] = 'completed', )[source]#

id: str#

type: str#

status: Literal['completed', 'in_progress', 'incomplete']#

class kani.ext.realtime.models.MessageConversationItem( *, id: str = None, type: Literal['message'] = 'message', status: Literal['completed', 'in_progress', 'incomplete'] = 'completed', role: Literal['user', 'assistant', 'system'], content: list[TextContentPart | AudioContentPart] = [], )[source]#

type: Literal['message']#

role: Literal['user', 'assistant', 'system']#

content: list[TextContentPart | AudioContentPart]#

class kani.ext.realtime.models.FunctionCallConversationItem( *, id: str = None, type: Literal['function_call'] = 'function_call', status: Literal['completed', 'in_progress', 'incomplete'] = 'completed', call_id: str, name: str, arguments: str, )[source]#

type: Literal['function_call']#

call_id: str#

name: str#

arguments: str#
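Because arguments is a string rather than a dict, it needs to be parsed before use. A minimal sketch, assuming the string holds a JSON-encoded object (as function call arguments do in OpenAI's API); parse_call_arguments is a hypothetical helper.

```python
import json


def parse_call_arguments(arguments: str) -> dict:
    """Parse a JSON-encoded arguments string into a dict of keyword arguments."""
    parsed = json.loads(arguments) if arguments.strip() else {}
    if not isinstance(parsed, dict):
        raise ValueError(f"expected a JSON object, got {type(parsed).__name__}")
    return parsed


kwargs = parse_call_arguments('{"city": "Tokyo", "units": "metric"}')
```

Treating an empty string as "no arguments" is a design choice made here for convenience; malformed JSON still raises json.JSONDecodeError.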

class kani.ext.realtime.models.FunctionCallOutputConversationItem( *, id: str = None, type: Literal['function_call_output'] = 'function_call_output', status: Literal['completed', 'in_progress', 'incomplete'] = 'completed', output: str, )[source]#

type: Literal['function_call_output']#

output: str#

class kani.ext.realtime.models.ErrorDetails(*, type: str, code: str | None = None, message: str, param: str | None = None, event_id: str | None = None)[source]#

type: str#

code: str | None#

message: str#

param: str | None#

event_id: str | None#

class kani.ext.realtime.models.SessionDetails( *, modalities: list[str] = ['text', 'audio'], instructions: str = '', voice: Literal['alloy', 'echo', 'shimmer'] = 'alloy', output_audio_format: Literal['pcm16', 'g711_ulaw', 'g711_alaw'] = 'pcm16', tools: list[FunctionDefinition] = [], tool_choice: Literal['auto', 'none', 'required'] | str = 'auto', temperature: float = 0.8, input_audio_format: Literal['pcm16', 'g711_ulaw', 'g711_alaw'] = 'pcm16', input_audio_transcription: AudioTranscriptionConfig | None = AudioTranscriptionConfig(model='whisper-1'), turn_detection: TurnDetectionConfig | None = TurnDetectionConfig(type='server_vad', threshold=0.5, prefix_padding_ms=300, silence_duration_ms=500), id: str, object: Literal['realtime.session'], )[source]#

id: str#

object: Literal['realtime.session']#

class kani.ext.realtime.models.ConversationDetails(*, id: str, object: Literal['realtime.conversation'])[source]#

id: str#

object: Literal['realtime.conversation']#

class kani.ext.realtime.models.UsageDetails(*, total_tokens: int, input_tokens: int, output_tokens: int)[source]#

total_tokens: int#

input_tokens: int#

output_tokens: int#

class kani.ext.realtime.models.RealtimeResponse( *, id: str, object: Literal['realtime.response'], status: Literal['in_progress', 'completed', 'cancelled', 'failed', 'incomplete'], status_details: dict | None, output: list[MessageConversationItem | FunctionCallConversationItem | FunctionCallOutputConversationItem] = [], usage: UsageDetails | None, )[source]#

id: str#

object: Literal['realtime.response']#

status: Literal['in_progress', 'completed', 'cancelled', 'failed', 'incomplete']#

status_details: dict | None#

output: list[MessageConversationItem | FunctionCallConversationItem | FunctionCallOutputConversationItem]#

usage: UsageDetails | None#

class kani.ext.realtime.models.RateLimitInfo(*, name: str, limit: int, remaining: int, reset_seconds: float)[source]#

name: str#

limit: int#

remaining: int#

reset_seconds: float#