
Grok Model (xAI) General Dialogue Interface Document

Overview#

This document describes how to invoke Grok model capabilities through Nebula API's OpenAI-compatible interface, covering a minimal example, streaming, tool calling, and structured output.

Basic Information#

Base URL: https://llm.ai-nebula.com/v1/chat/completions
Authentication Method: API Key (Token)
Request Headers: Authorization: Bearer sk-xxxx, Content-Type: application/json

Supported Models#

grok-3: Standard version
grok-4: Latest version
grok-3-fast: Fast version
grok-4-fast-reasoning: Fast reasoning version, specialized for scenarios requiring deep reasoning

API Interface#

1. Minimal Example (Non-streaming)#
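A minimal non-streaming request can be sketched in Python using only the standard library. The endpoint, headers, and model names come from this document; `build_request`, `chat`, and the `sk-xxxx` key are illustrative placeholders.

```python
import json
import urllib.request

API_URL = "https://llm.ai-nebula.com/v1/chat/completions"
API_KEY = "sk-xxxx"  # placeholder: substitute your own key

def build_request(prompt: str, model: str = "grok-3") -> urllib.request.Request:
    """Assemble an OpenAI-style chat completion request (non-streaming)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def chat(prompt: str, model: str = "grok-3") -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]

# Requires network access and a valid key:
# print(chat("Hello, Grok!"))
```

The same request shape works for every model listed above; only the `model` field changes.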

2. Streaming SSE Example#
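With `"stream": true` in the request body, the response arrives as Server-Sent Events: each event line has the form `data: {json}` and the stream terminates with `data: [DONE]`. A sketch of the client-side parsing (the `extract_deltas` helper is illustrative, not part of the API):

```python
import json

def extract_deltas(sse_lines):
    """Yield content fragments from the raw SSE lines of a streaming response."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip comments, keep-alives, blank separators
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]

# Feeding recorded SSE lines through the parser:
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(extract_deltas(sample)))  # → Hello
```

In a real client the lines would come from the HTTP response body as it arrives, so fragments can be rendered incrementally.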

3. Tool Calling (Functions / Tools)#
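A Phase 1 request declares the available tools in the OpenAI function-calling format that this endpoint is compatible with. The `get_weather` tool below is a hypothetical example:

```python
# Phase 1 request body: the model may answer directly or return tool_calls.
# "get_weather" is an illustrative tool, not a built-in.
phase1_request = {
    "model": "grok-4",
    "messages": [{"role": "user", "content": "What's the weather in Beijing?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call a tool
}
```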

Complete Tool Calling Process (Two Stages)#

1. Phase 1: The model returns tool_calls (content is usually null, finish_reason=tool_calls). Your server executes the corresponding function based on tool_calls[*].function.name/arguments.
2. Phase 2: Pass the tool execution result back to the model as a role:"tool" message and continue the completion. The continuation request may be non-streaming or streaming.
Note:
tool_call_id must match the id returned in Phase 1.
If tool execution fails, return readable error information or a degraded result so that subsequent completions are not blocked.
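The two-stage flow can be sketched as follows. The tool_calls fragment, the `run_tool` stand-in, and the `call_abc123` id are illustrative; the message shapes follow the OpenAI convention described above:

```python
import json

# A Phase 1 response fragment (content is null, finish_reason=tool_calls):
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",  # must be echoed back as tool_call_id
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"city": "Beijing"}'},
    }],
}

def run_tool(name: str, arguments_json: str) -> str:
    """Stand-in for your server-side tool; returns a JSON string result."""
    args = json.loads(arguments_json)
    if name == "get_weather":
        return json.dumps({"city": args["city"], "temp_c": 21})
    # On failure, return readable error info instead of raising,
    # so the model can still continue the completion.
    return json.dumps({"error": f"unknown tool {name}"})

call = assistant_msg["tool_calls"][0]
tool_msg = {
    "role": "tool",
    "tool_call_id": call["id"],  # matches the Phase 1 id
    "content": run_tool(call["function"]["name"], call["function"]["arguments"]),
}

# Phase 2: send the full history back to /v1/chat/completions
# ("stream": true is also supported here).
phase2_messages = [
    {"role": "user", "content": "What's the weather in Beijing?"},
    assistant_msg,
    tool_msg,
]
```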

4. Structured Output (response_format/json_schema)#
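A structured-output request can be sketched as below. The field names follow the OpenAI response_format/json_schema convention this endpoint is compatible with; the `person` schema itself is illustrative:

```python
# Request body asking the model to emit JSON conforming to a strict schema.
structured_request = {
    "model": "grok-4",
    "messages": [{"role": "user", "content": "Extract: Alice is 30 years old."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "person",          # illustrative schema name
            "strict": True,            # enforce exact schema conformance
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
                "additionalProperties": False,
            },
        },
    },
    "temperature": 0,  # lower temperature improves structural stability
}
```

The reply's `message.content` is then a JSON string that can be parsed directly with `json.loads`.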


Response and Usage#

Non-streaming: returns the standard OpenAI structure in one response, including choices and usage.
Streaming: SSE chunked return; an aggregated usage may be appended at the end. If stream_options.include_usage=true is set, chunks may carry real-time usage.
Reasoning Tokens: for models that support reasoning (e.g., grok-4-fast-reasoning), usage in the response distinguishes completion_tokens from reasoning_tokens.
Non-streaming response: text_tokens = completion_tokens - reasoning_tokens
Streaming response: usage statistics are updated in real time or aggregated at the end of the stream.
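The token arithmetic above can be sketched on a usage object, assuming the OpenAI-style layout in which reasoning_tokens sits inside completion_tokens_details (the numbers are made up for illustration):

```python
# Illustrative usage object; values are invented for the example.
usage = {
    "prompt_tokens": 12,
    "completion_tokens": 200,
    "completion_tokens_details": {"reasoning_tokens": 150},
}

# Fall back to 0 for models that report no reasoning tokens.
reasoning = usage.get("completion_tokens_details", {}).get("reasoning_tokens", 0)
text_tokens = usage["completion_tokens"] - reasoning
print(text_tokens)  # → 50
```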

Frequently Asked Questions (FAQ)#

1. How to improve the stability of structured outputs?
Use response_format: json_schema with a strict JSON Schema; if necessary, also lower temperature and set max_tokens.
2. How to handle tool execution?
Read tool_calls from the incremental chunks, execute the function on your server, and pass the result back to the model as a role:"tool" message.
3. Is reproducible output (seed) supported?
The seed parameter is supported; it is recommended to enable it only for workflows that require reproducibility.
4. How to choose a model?
grok-3 and grok-4 are standard versions, suitable for most scenarios.
grok-3-fast is a fast version for scenarios requiring quick responses.
grok-4-fast-reasoning is a reasoning version for deep thinking and complex reasoning.

Best Practices#

Frontend streaming should use event stream parsing and render in real-time.
It is recommended to turn off/lower temperature under strict JSON mode.
Implement timeout and retry mechanisms for tool calls to avoid blocking model responses.
Choose the appropriate model based on task requirements: use grok-3-fast for quick responses and grok-4-fast-reasoning for complex reasoning.

About "Deep Thinking / Reasoning Process"#

Grok models (such as grok-4 and grok-3) support reasoning capabilities but do not output visible chain-of-thought text.
grok-4-fast-reasoning is a fast reasoning version, specialized for scenarios requiring deep reasoning.
The usage field in the response contains reasoning_tokens statistics (inside completion_tokens_details), showing the model's reasoning consumption.
text_tokens = completion_tokens - reasoning_tokens, making it easy to distinguish the actual output text from tokens consumed by the reasoning process.
Last modified: 2025-12-04 07:49:20