Nebula-API Operation Documentation

Realtime Conversation Interface Document

Overview#

The Realtime API provides low-latency, real-time text/audio conversation. The gpt-realtime and gpt-realtime-mini models are currently available. The client establishes a persistent WebSocket connection and interacts through an event stream: it sends session settings, conversation messages, and generation requests, and receives incremental text/audio plus usage statistics.

Basic Information#

| Item | Content |
| --- | --- |
| Base URL | wss://llm.ai-nebula.com |
| Endpoint | /v1/realtime?model={model} |
| Authentication | Authorization: Bearer sk-xxxx |
| Protocol | WebSocket (JSON event stream) |
| Supported Models | gpt-realtime, gpt-realtime-mini |
| Audio Format | PCM16 mono, 24000 Hz sample rate, for both input and output (when audio is enabled) |

Event Overview#

Client Sends
session.update: Set/update session configuration (modalities, system instructions, voice, etc.)
conversation.item.create: Send conversation message (input_text or input_audio)
input_audio_buffer.append + input_audio_buffer.commit: Push audio in streamed chunks
response.create: Request to generate a response
Server Returns
session.created / session.updated: Session is ready or has been updated
response.created: Generation has started
response.text.delta / response.text.done: Text delta and completion
response.audio_transcript.delta / response.audio_transcript.done: Audio transcription delta and completion
response.audio.delta / response.audio.done: Audio delta and completion
response.done: Turn finished, includes usage statistics
error: Error event

Call Flow (Recommended)#

1. Establish the connection: wss://llm.ai-nebula.com/v1/realtime?model=gpt-realtime (or gpt-realtime-mini), carrying the Authorization header.
2. Send session.update to configure the session (it can be updated multiple times).
3. Send user messages (text or audio) via conversation.item.create.
4. Send response.create to trigger generation and receive incremental events.
5. Repeat steps 3-4 within the same connection for multi-turn conversations.

Session Configuration Example (session.update)#

{
 "event_id": "evt_001",
 "type": "session.update",
 "session": {
  "modalities": ["text", "audio"],    // Supports "text" / "audio" (one or both)
  "instructions": "You are a friendly assistant",
  "voice": "alloy",            // TTS voice, optional
  "temperature": 0.8,
  "input_audio_format": "pcm16",
  "output_audio_format": "pcm16",
  "input_audio_transcription": { "model": "whisper-1" }
 }
}

Text Message Example (conversation.item.create)#

{
 "event_id": "evt_002",
 "type": "conversation.item.create",
 "item": {
  "id": "item_01",
  "type": "message",
  "role": "user",
  "content": [
   { "type": "input_text", "text": "Hello, please briefly introduce yourself." }
  ]
 }
}

Trigger Generation (response.create)#

{ "event_id": "evt_003", "type": "response.create" }

Typical Server Response#

{ "type": "session.created", "session": { "id": "sess_xxx" } }
{ "type": "response.created", "response": { "id": "resp_xxx" } }
{ "type": "response.text.delta", "delta": "Hello! I am" }
{ "type": "response.text.delta", "delta": " Nebula's realtime assistant." }
{ "type": "response.done",
 "response": {
  "usage": {
   "total_tokens": 123,
   "input_tokens": 45,
   "output_tokens": 78
  }
 }
}
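A receive loop can dispatch on each event's type field and concatenate the response.text.delta events into the full reply. The sketch below is illustrative (the helper name handle_events is not from this document; only the event names are), using just the standard library:

```python
import json

def handle_events(raw_events):
    """Accumulate text deltas from a stream of raw JSON event strings.

    Returns the concatenated reply text plus the usage dict reported
    in the final response.done event (or None if none was seen).
    """
    text_parts = []
    usage = None
    for raw in raw_events:
        event = json.loads(raw)
        etype = event.get("type")
        if etype == "response.text.delta":
            text_parts.append(event["delta"])               # incremental text
        elif etype == "response.done":
            usage = event.get("response", {}).get("usage")  # token statistics
        elif etype == "error":
            raise RuntimeError(event)                       # surface server errors
    return "".join(text_parts), usage

# Feeding it the example events shown above:
sample = [
    '{"type": "session.created", "session": {"id": "sess_xxx"}}',
    '{"type": "response.created", "response": {"id": "resp_xxx"}}',
    '{"type": "response.text.delta", "delta": "Hello! I am"}',
    '{"type": "response.text.delta", "delta": " Nebula\'s realtime assistant."}',
    '{"type": "response.done", "response": {"usage": {"total_tokens": 123}}}',
]
text, usage = handle_events(sample)
```

In a live client the same function body would sit inside the WebSocket receive loop, with each ws.recv() result passed through json.loads.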

Audio Input and Streaming Push#

One-time Audio Sending#

Place input_audio in content. The audio must be base64 encoded first (PCM16 mono, 24 kHz).

{
 "type": "conversation.item.create",
 "item": {
  "type": "message",
  "role": "user",
  "content": [
   { "type": "input_audio", "audio": "<base64-of-pcm16>" }
  ]
 }
}
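Building this event from raw PCM16 bytes is a two-step job: base64-encode the audio, then wrap it in the JSON envelope above. A minimal sketch (the helper name build_audio_message is illustrative, not part of the API):

```python
import base64
import json

def build_audio_message(pcm16_bytes: bytes) -> str:
    """Wrap raw PCM16 mono 24 kHz audio in a conversation.item.create event.

    The Realtime API expects the audio payload base64-encoded.
    """
    event = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_audio",
                 "audio": base64.b64encode(pcm16_bytes).decode("ascii")},
            ],
        },
    }
    return json.dumps(event)

# 100 ms of silence at 24 kHz, 16-bit mono: 2400 samples * 2 bytes each
silence = b"\x00\x00" * 2400
message = build_audio_message(silence)
```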

Streaming Audio#

1. Send input_audio_buffer.append multiple times; the audio field carries a base64-encoded chunk.
2. Send input_audio_buffer.commit so the server assembles the buffered audio into a conversation item.
3. Send response.create to get the response.
{ "type": "input_audio_buffer.append", "audio": "<chunk-1-base64>" }
{ "type": "input_audio_buffer.append", "audio": "<chunk-2-base64>" }
{ "type": "input_audio_buffer.commit", "item": { "type": "message", "role": "user" } }
{ "type": "response.create" }
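The append/commit sequence above can be generated from a raw audio buffer like this. This is a sketch, assuming fixed-size chunking; the generator name and chunk size are illustrative choices, not API requirements:

```python
import base64
import json

def audio_buffer_events(pcm16_bytes: bytes, chunk_size: int = 4800):
    """Yield the append/commit event sequence for streamed audio upload.

    chunk_size is in bytes; 4800 bytes is 100 ms of PCM16 mono at 24 kHz.
    """
    for offset in range(0, len(pcm16_bytes), chunk_size):
        chunk = pcm16_bytes[offset:offset + chunk_size]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
    # The commit turns the buffered audio into a user conversation item.
    yield json.dumps({
        "type": "input_audio_buffer.commit",
        "item": {"type": "message", "role": "user"},
    })

# 0.5 s of silence: 12000 samples * 2 bytes = 24000 bytes -> 5 appends + 1 commit
events = list(audio_buffer_events(b"\x00\x00" * 12000))
```

In a live client each yielded string would be passed to ws.send(), followed by a response.create event.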

Python Quick Start#
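The original quick-start code did not survive extraction; the following is a minimal end-to-end sketch, assuming the third-party websocket-client package (pip install websocket-client) and an API key in the NEBULA_API_KEY environment variable. Helper names are illustrative:

```python
import json
import os

BASE_URL = "wss://llm.ai-nebula.com/v1/realtime"

def build_url(model: str) -> str:
    """Endpoint URL with the model passed as a query parameter."""
    return f"{BASE_URL}?model={model}"

def main():
    # websocket-client is a third-party package: pip install websocket-client
    import websocket

    ws = websocket.create_connection(
        build_url("gpt-realtime"),
        header={"Authorization": f"Bearer {os.environ['NEBULA_API_KEY']}"},
    )
    # 1) Configure the session (text only, for brevity).
    ws.send(json.dumps({
        "type": "session.update",
        "session": {"modalities": ["text"],
                    "instructions": "You are a friendly assistant"},
    }))
    # 2) Send a user message.
    ws.send(json.dumps({
        "type": "conversation.item.create",
        "item": {"type": "message", "role": "user",
                 "content": [{"type": "input_text", "text": "Hello!"}]},
    }))
    # 3) Request a response and print deltas until the turn finishes.
    ws.send(json.dumps({"type": "response.create"}))
    while True:
        event = json.loads(ws.recv())
        if event["type"] == "response.text.delta":
            print(event["delta"], end="", flush=True)
        elif event["type"] == "response.done":
            break
    ws.close()

# main()  # uncomment to run against the live endpoint
```

For multi-turn conversations, repeat steps 2-3 on the same connection, as described in the call flow above.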

Common Errors#

Authentication Failure: Confirm that the API Key in the Authorization header is valid and has the sk- prefix.
Model Does Not Exist: The query parameter model only supports gpt-realtime / gpt-realtime-mini.
Audio Decoding Error: Ensure the audio is PCM16 Mono, Sample Rate 24000Hz, and correctly base64 encoded.
No Incremental Events Received: Check if response.create has been sent, or if the connection is still alive.

External System Realtime Conversation /api/sync/system/realtime#

Allows external systems to initiate realtime conversations using a system access token. The endpoint performs quota deduction and channel selection for the specified user_id and internally forwards the request to /v1/realtime.
Address and Authentication
• URL: wss://llm.ai-nebula.com/api/sync/system/realtime
• Authentication: Authorization: <system_access_token> (no Bearer prefix)
• Query Parameters:
  • user_id (required, int): the ID of the user who will be charged.
  • model (required, string): gpt-realtime or gpt-realtime-mini.
  • group (optional, string): specifies a group; otherwise the user's default group is used.
Behavior and Differences
• Authorization differs: a system access token is used, and charges are attributed to the user_id given in the query parameters.
• A temporary token is created internally for that user, an available channel is selected based on the group, and the request is forwarded to /v1/realtime. The event protocol and response stream are identical.
• The context is marked as a "playground" scenario to ease logging and routing-policy differentiation.
Typical Handshake Example (Python websocket-client)
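The original handshake code did not survive extraction; the following is a minimal sketch, again assuming the third-party websocket-client package. The helper names and the token placeholder are illustrative:

```python
import json
from urllib.parse import urlencode

SYSTEM_URL = "wss://llm.ai-nebula.com/api/sync/system/realtime"

def build_system_url(user_id, model, group=None):
    """System endpoint URL; user_id and model are required, group is optional."""
    params = {"user_id": user_id, "model": model}
    if group:
        params["group"] = group
    return f"{SYSTEM_URL}?{urlencode(params)}"

def handshake(system_token, user_id):
    # websocket-client is a third-party package: pip install websocket-client
    import websocket

    ws = websocket.create_connection(
        build_system_url(user_id, "gpt-realtime"),
        # Note: the raw token, with no "Bearer " prefix.
        header={"Authorization": system_token},
    )
    # The first server event should be session.created.
    first = json.loads(ws.recv())
    assert first["type"] == "session.created"
    return ws  # subsequent events are identical to /v1/realtime
```

After the handshake, send session.update, conversation.item.create, and response.create exactly as on /v1/realtime.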
Usage Tips
• The user_id must exist and must not be disabled; the model must be a configured realtime model.
• Pass group if routing needs to be switched; otherwise the user's default group is used.
• Subsequent event sending is identical to /v1/realtime (e.g., session.update, conversation.item.create, response.create).
Modified on 2025-12-08 04:03:30