Nebula-API Operation Documentation

GPT Chat General Dialogue Document

Overview#

This document describes how to invoke standard ChatGPT (OpenAI Chat Completions) capabilities via Nebula's OpenAI-compatible interface, including minimal examples, streaming, tool calling, and structured output highlights.

Basic Information#

| Item | Content |
| --- | --- |
| Base URL | https://llm.ai-nebula.com/v1/chat/completions |
| Authentication Method | API Key (Token) |
| Request Headers | Authorization: Bearer sk-xxxx, Content-Type: application/json |

Supported Models (Examples)#

gpt-4o, gpt-4.1, gpt-4o-mini, gpt-3.5-turbo, etc. (subject to routing configuration)

API Interface#

1. Minimal Example (Non-streaming)#
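A minimal non-streaming call can be sketched with only the Python standard library. The API key is a placeholder, and gpt-4o-mini is just one of the supported models listed above:

```python
import json
import urllib.request

API_KEY = "sk-xxxx"  # placeholder: replace with your Nebula API key
URL = "https://llm.ai-nebula.com/v1/chat/completions"

def build_payload(user_text: str) -> dict:
    """Build a minimal Chat Completions request body."""
    return {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": user_text}],
    }

def chat(payload: dict) -> dict:
    """POST the payload and return the parsed OpenAI-style response."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With a valid key, the reply text lives at:
# chat(build_payload("Hello!"))["choices"][0]["message"]["content"]
```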

2. Streaming SSE Example#
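For streaming, the same endpoint is called with stream=true and the body of the response is read as Server-Sent Events. A sketch of the SSE line parsing (model name and key remain placeholders):

```python
import json
import urllib.request

API_KEY = "sk-xxxx"  # placeholder
URL = "https://llm.ai-nebula.com/v1/chat/completions"

def build_stream_payload(user_text: str) -> dict:
    """Chat Completions body with streaming enabled."""
    return {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": user_text}],
        "stream": True,
        # If the channel supports it, report usage inside the stream too:
        "stream_options": {"include_usage": True},
    }

def parse_sse_line(raw: bytes):
    """Extract the delta text from one SSE line, or None for
    non-content lines (comments, [DONE], usage-only chunks)."""
    line = raw.decode("utf-8").strip()
    if not line.startswith("data: ") or line == "data: [DONE]":
        return None
    chunk = json.loads(line[len("data: "):])
    choices = chunk.get("choices") or []
    if not choices:
        return None  # e.g. a final chunk that carries only usage
    return choices[0].get("delta", {}).get("content")

# Streaming request sketch (requires a valid key):
# req = urllib.request.Request(
#     URL,
#     data=json.dumps(build_stream_payload("Hi")).encode("utf-8"),
#     headers={"Authorization": f"Bearer {API_KEY}",
#              "Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     for raw in resp:
#         text = parse_sse_line(raw)
#         if text:
#             print(text, end="", flush=True)
```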

3. Tool Calling (Functions / Tools)#

Complete Tool Calling Process (Two Stages)#

1. Phase 1: the model returns tool_calls (content is usually null, finish_reason=tool_calls). Execute the corresponding function on your server based on tool_calls[*].function.name / arguments.
2. Phase 2: pass each tool's execution result back to the model as a role:"tool" message and continue the completion; the continuation can be non-streaming or streaming.

Notes:
- The tool_call_id must match the one returned in Phase 1.
- If tool execution fails, return readable error information or a degraded result so that the subsequent completion is not blocked.
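The two stages can be sketched as message construction. The get_weather tool below is purely illustrative; tool_call is one element of choices[0].message.tool_calls from the Phase 1 response:

```python
import json

# Hypothetical tool schema; "get_weather" is an illustrative name.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def phase1_payload(user_text: str) -> dict:
    """Phase 1: send the user message together with the tool schema."""
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": user_text}],
        "tools": TOOLS,
    }

def phase2_messages(messages: list, tool_call: dict, result: dict) -> list:
    """Phase 2: append the assistant tool_calls turn plus the tool result,
    echoing the Phase 1 id back as tool_call_id."""
    return messages + [
        {"role": "assistant", "content": None, "tool_calls": [tool_call]},
        {
            "role": "tool",
            "tool_call_id": tool_call["id"],  # must match Phase 1
            "content": json.dumps(result),
        },
    ]
```

The Phase 2 message list is sent back to the same endpoint (with or without stream=true) to obtain the final answer.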

4. Structured Output (response_format/json_schema)#
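A request body for strict JSON output might look like the sketch below; the book_info schema is an illustrative assumption, not part of the API:

```python
def structured_payload(user_text: str) -> dict:
    """Request schema-validated JSON via response_format: json_schema."""
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": user_text}],
        "temperature": 0,   # low temperature improves schema adherence
        "max_tokens": 512,
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "book_info",  # illustrative schema name
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "year": {"type": "integer"},
                    },
                    "required": ["title", "year"],
                    "additionalProperties": False,
                },
            },
        },
    }
```

With strict mode, the model's reply in choices[0].message.content is a JSON string that conforms to the schema and can be parsed directly.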

5. File Input#

In the examples below, a PDF file can be supplied in either of two ways:
1. File URL: reference the PDF via an external URL.
2. Base64-encoded file: send the file content as Base64-encoded input.

Usage Notes
- File size limits: multiple files may be uploaded; each file must not exceed 50 MB, and the total size of all files in a single API request is also limited to 50 MB.
- Supported models: only models that accept text and image inputs, such as gpt-4o, gpt-4o-mini, or o1, can take PDF files as input.
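The Base64 variant can be sketched as follows, using the OpenAI-style "file" content part with a data URL; the filename and question are placeholders:

```python
import base64

def file_payload(pdf_bytes: bytes, question: str) -> dict:
    """Attach a PDF as a Base64 data URL alongside a text question."""
    b64 = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "model": "gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "filename": "document.pdf",  # placeholder name
                        "file_data": f"data:application/pdf;base64,{b64}",
                    },
                },
                {"type": "text", "text": question},
            ],
        }],
    }
```

For the URL variant, the file part carries a link to the externally hosted PDF instead of the inline file_data.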

Response and Usage#

Non-streaming: Returns standard OpenAI structure at once, including choices, usage.
Streaming: Returns SSE chunks; usage aggregation may be attached at the end. If the channel supports stream_options.include_usage=true, chunks may contain real-time usage.

Frequently Asked Questions (FAQ)#

1. How can the stability of structured outputs be improved?
Use response_format: json_schema with a strict JSON Schema; if necessary, also lower temperature and set max_tokens.
2. How should tool execution be handled?
Read tool_calls from the incremental chunks, execute the function on the server, and pass the result back to the model as a tool message.
3. Is reproducible output (seed) supported?
If the channel supports it, seed can be used; implementations vary across vendors, so it is recommended to enable it only for workflows that require reproducibility.

Best Practices#

On the frontend, parse the event stream and render tokens incrementally.
Under strict JSON mode, turn off or lower temperature.
Implement timeout and retry mechanisms for tool calls so that a slow or failed tool does not block the model response.
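A retry wrapper for tool execution can be sketched as below; on final failure it returns a readable error payload (as recommended above) instead of raising, so the completion is not blocked. The attempt counts and backoff are illustrative defaults:

```python
import time

def run_tool(fn, *args, attempts=3, backoff_s=0.5):
    """Run a tool with retries and linear backoff; on final failure,
    return a degraded result carrying a readable error message."""
    for attempt in range(1, attempts + 1):
        try:
            return {"ok": True, "result": fn(*args)}
        except Exception as exc:  # report any tool failure downstream
            if attempt == attempts:
                return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
            time.sleep(backoff_s * attempt)  # wait longer each retry
```

The returned dict (success or degraded) is what gets serialized into the role:"tool" message, so the model always receives something to continue from.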

About "Deep Thinking / Reasoning Process"#

Standard ChatGPT-series models (such as gpt-4o, gpt-4.1, gpt-3.5-turbo) do not expose a visible chain-of-thought; passing enable_thinking in the request body has no effect.
If you need models with reasoning capabilities and reasoning usage statistics (such as o1, o3, o4-mini, etc.), use the Responses API (/v1/responses) instead.
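A minimal Responses API body can be sketched as follows; the o4-mini model name is a placeholder, and the /v1/responses path under the same base host is assumed from the Base URL pattern above:

```python
def responses_payload(user_text: str) -> dict:
    """Minimal body for POST /v1/responses (reasoning models)."""
    return {
        "model": "o4-mini",  # placeholder reasoning model
        "input": user_text,
    }
```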
Last modified: 2025-12-04 07:49:06