Nebula-API Operation Documentation

Realtime Conversation Interface Document

Overview#

The Realtime API provides low-latency, real-time text/audio conversation. The gpt-realtime and gpt-realtime-mini models are currently available. The client establishes a persistent WebSocket connection and interacts through an event stream: it sends session settings, conversation messages, and generation requests, and receives incremental text/audio plus usage statistics.

Basic Information#

| Item | Content |
| --- | --- |
| Base URL | wss://llm.ai-nebula.com |
| Endpoint | /v1/realtime?model={model} |
| Authentication | Authorization: Bearer sk-xxxx |
| Protocol | WebSocket (JSON event stream) |
| Supported Models | gpt-realtime, gpt-realtime-mini |
| Audio Format | PCM16 mono, 24000 Hz sample rate, for both input and output (when audio is enabled) |

Event Overview#

Client Sends
session.update: Set/update session configuration (modalities, system instructions, voice, etc.)
conversation.item.create: Send conversation message (input_text or input_audio)
input_audio_buffer.append + input_audio_buffer.commit: Push audio in streamed chunks
response.create: Request to generate a response
Server Returns
session.created / session.updated: Session is ready or has been updated
response.created: Generation has started
response.text.delta / response.text.done: Text delta and completion
response.audio_transcript.delta / response.audio_transcript.done: Audio transcription delta and completion
response.audio.delta / response.audio.done: Audio delta and completion
response.done: Turn finished, includes usage statistics
error: Error event

Call Flow (Recommended)#

1. Establish the connection: wss://llm.ai-nebula.com/v1/realtime?model=gpt-realtime (or gpt-realtime-mini), carrying the Authorization header.
2. Send session.update to configure the session (it can be updated multiple times).
3. Send user messages (text or audio) via conversation.item.create.
4. Send response.create to trigger generation and receive incremental events.
5. Repeat steps 3-4 within the same connection for multi-turn conversations.

Session Configuration Example (session.update)#

{
 "event_id": "evt_001",
 "type": "session.update",
 "session": {
  "modalities": ["text", "audio"],    // Supports "text" / "audio" (one or both)
  "instructions": "You are a friendly assistant",
  "voice": "alloy",            // TTS voice, optional
  "temperature": 0.8,
  "input_audio_format": "pcm16",
  "output_audio_format": "pcm16",
  "input_audio_transcription": { "model": "whisper-1" }
 }
}

Text Message Example (conversation.item.create)#

{
 "event_id": "evt_002",
 "type": "conversation.item.create",
 "item": {
  "id": "item_01",
  "type": "message",
  "role": "user",
  "content": [
   { "type": "input_text", "text": "Hello, please briefly introduce yourself." }
  ]
 }
}

Trigger Generation (response.create)#

{ "event_id": "evt_003", "type": "response.create" }

Typical Server Response#

{ "type": "session.created", "session": { "id": "sess_xxx" } }
{ "type": "response.created", "response": { "id": "resp_xxx" } }
{ "type": "response.text.delta", "delta": "Hello! I am" }
{ "type": "response.text.delta", "delta": " Nebula's realtime assistant." }
{ "type": "response.done",
 "response": {
  "usage": {
   "total_tokens": 123,
   "input_tokens": 45,
   "output_tokens": 78
  }
 }
}
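A receive loop can dispatch on each event's type field and concatenate the response.text.delta events into the full reply. The sketch below is illustrative (the helper name handle_events is not from this document; only the event names are), using just the standard library:

```python
import json

def handle_events(raw_events):
    """Accumulate text deltas from a stream of raw JSON event strings.

    Returns the concatenated reply text plus the usage dict reported
    in the final response.done event (or None if none was seen).
    """
    text_parts = []
    usage = None
    for raw in raw_events:
        event = json.loads(raw)
        etype = event.get("type")
        if etype == "response.text.delta":
            text_parts.append(event["delta"])               # incremental text
        elif etype == "response.done":
            usage = event.get("response", {}).get("usage")  # token statistics
        elif etype == "error":
            raise RuntimeError(event)                       # surface server errors
    return "".join(text_parts), usage

# Feeding it the example events shown above:
sample = [
    '{"type": "session.created", "session": {"id": "sess_xxx"}}',
    '{"type": "response.created", "response": {"id": "resp_xxx"}}',
    '{"type": "response.text.delta", "delta": "Hello! I am"}',
    '{"type": "response.text.delta", "delta": " Nebula\'s realtime assistant."}',
    '{"type": "response.done", "response": {"usage": {"total_tokens": 123}}}',
]
text, usage = handle_events(sample)
```

In a live client the same function body would sit inside the WebSocket receive loop, with each ws.recv() result passed through json.loads.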

Audio Input and Streaming Push#

One-time Audio Sending#

Place input_audio in content. The audio must be base64 encoded first (PCM16 mono, 24 kHz).

{
 "type": "conversation.item.create",
 "item": {
  "type": "message",
  "role": "user",
  "content": [
   { "type": "input_audio", "audio": "<base64-of-pcm16>" }
  ]
 }
}
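Building this event from raw PCM16 bytes is a two-step job: base64-encode the audio, then wrap it in the JSON envelope above. A minimal sketch (the helper name build_audio_message is illustrative, not part of the API):

```python
import base64
import json

def build_audio_message(pcm16_bytes: bytes) -> str:
    """Wrap raw PCM16 mono 24 kHz audio in a conversation.item.create event.

    The Realtime API expects the audio payload base64-encoded.
    """
    event = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_audio",
                 "audio": base64.b64encode(pcm16_bytes).decode("ascii")},
            ],
        },
    }
    return json.dumps(event)

# 100 ms of silence at 24 kHz, 16-bit mono: 2400 samples * 2 bytes each
silence = b"\x00\x00" * 2400
message = build_audio_message(silence)
```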

Streaming Audio#

1. Send input_audio_buffer.append multiple times; the audio field carries a base64-encoded chunk.
2. Send input_audio_buffer.commit so the server assembles the buffered audio into a conversation item.
3. Send response.create to get the response.
{ "type": "input_audio_buffer.append", "audio": "<chunk-1-base64>" }
{ "type": "input_audio_buffer.append", "audio": "<chunk-2-base64>" }
{ "type": "input_audio_buffer.commit", "item": { "type": "message", "role": "user" } }
{ "type": "response.create" }
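The append/commit sequence above can be generated from a raw audio buffer like this. This is a sketch, assuming fixed-size chunking; the generator name and chunk size are illustrative choices, not API requirements:

```python
import base64
import json

def audio_buffer_events(pcm16_bytes: bytes, chunk_size: int = 4800):
    """Yield the append/commit event sequence for streamed audio upload.

    chunk_size is in bytes; 4800 bytes is 100 ms of PCM16 mono at 24 kHz.
    """
    for offset in range(0, len(pcm16_bytes), chunk_size):
        chunk = pcm16_bytes[offset:offset + chunk_size]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
    # The commit turns the buffered audio into a user conversation item.
    yield json.dumps({
        "type": "input_audio_buffer.commit",
        "item": {"type": "message", "role": "user"},
    })

# 0.5 s of silence: 12000 samples * 2 bytes = 24000 bytes -> 5 appends + 1 commit
events = list(audio_buffer_events(b"\x00\x00" * 12000))
```

In a live client each yielded string would be passed to ws.send(), followed by a response.create event.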

Python Quick Start#
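The original quick-start code did not survive extraction; the following is a minimal end-to-end sketch, assuming the third-party websocket-client package (pip install websocket-client) and an API key in the NEBULA_API_KEY environment variable. Helper names are illustrative:

```python
import json
import os

BASE_URL = "wss://llm.ai-nebula.com/v1/realtime"

def build_url(model: str) -> str:
    """Endpoint URL with the model passed as a query parameter."""
    return f"{BASE_URL}?model={model}"

def main():
    # websocket-client is a third-party package: pip install websocket-client
    import websocket

    ws = websocket.create_connection(
        build_url("gpt-realtime"),
        header={"Authorization": f"Bearer {os.environ['NEBULA_API_KEY']}"},
    )
    # 1) Configure the session (text only, for brevity).
    ws.send(json.dumps({
        "type": "session.update",
        "session": {"modalities": ["text"],
                    "instructions": "You are a friendly assistant"},
    }))
    # 2) Send a user message.
    ws.send(json.dumps({
        "type": "conversation.item.create",
        "item": {"type": "message", "role": "user",
                 "content": [{"type": "input_text", "text": "Hello!"}]},
    }))
    # 3) Request a response and print deltas until the turn finishes.
    ws.send(json.dumps({"type": "response.create"}))
    while True:
        event = json.loads(ws.recv())
        if event["type"] == "response.text.delta":
            print(event["delta"], end="", flush=True)
        elif event["type"] == "response.done":
            break
    ws.close()

# main()  # uncomment to run against the live endpoint
```

For multi-turn conversations, repeat steps 2-3 on the same connection, as described in the call flow above.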

Common Errors#

Authentication Failure: Confirm that the API Key in the Authorization header is valid and has the sk- prefix.
Model Does Not Exist: The query parameter model only supports gpt-realtime / gpt-realtime-mini.
Audio Decoding Error: Ensure the audio is PCM16 Mono, Sample Rate 24000Hz, and correctly base64 encoded.
No Incremental Events Received: Check if response.create has been sent, or if the connection is still alive.

External System Realtime Conversation /api/sync/system/realtime#

Allows external systems to initiate realtime conversations using a system access token. The endpoint performs quota deduction and channel selection for the specified user_id and internally forwards the request to /v1/realtime.
Address and Authentication
• URL: wss://llm.ai-nebula.com/api/sync/system/realtime
• Authentication: Authorization: <system_access_token> (no Bearer prefix)
• Query Parameters:
  • user_id (required, int): the ID of the user who will be charged.
  • model (required, string): gpt-realtime or gpt-realtime-mini.
  • group (optional, string): specifies a group; otherwise the user's default group is used.
Behavior and Differences
• Authorization differs: a system access token is used, and charges are attributed to the user_id given in the query parameters.
• A temporary token is created internally for that user, an available channel is selected based on the group, and the request is forwarded to /v1/realtime. The event protocol and response stream are identical.
• The context is marked as a "playground" scenario to ease logging and routing-policy differentiation.
Typical Handshake Example (Python websocket-client)
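The original handshake code did not survive extraction; the following is a minimal sketch, again assuming the third-party websocket-client package. The helper names and the token placeholder are illustrative:

```python
import json
from urllib.parse import urlencode

SYSTEM_URL = "wss://llm.ai-nebula.com/api/sync/system/realtime"

def build_system_url(user_id, model, group=None):
    """System endpoint URL; user_id and model are required, group is optional."""
    params = {"user_id": user_id, "model": model}
    if group:
        params["group"] = group
    return f"{SYSTEM_URL}?{urlencode(params)}"

def handshake(system_token, user_id):
    # websocket-client is a third-party package: pip install websocket-client
    import websocket

    ws = websocket.create_connection(
        build_system_url(user_id, "gpt-realtime"),
        # Note: the raw token, with no "Bearer " prefix.
        header={"Authorization": system_token},
    )
    # The first server event should be session.created.
    first = json.loads(ws.recv())
    assert first["type"] == "session.created"
    return ws  # subsequent events are identical to /v1/realtime
```

After the handshake, send session.update, conversation.item.create, and response.create exactly as on /v1/realtime.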
Usage Tips
• The user_id must exist and must not be disabled; the model must be a configured realtime model.
• Pass group if routing needs to be switched; otherwise the user's default group is used.
• Subsequent event sending is identical to /v1/realtime (e.g., session.update, conversation.item.create, response.create).
Modified on 2025-12-08 04:03:30