
Grok Model (xAI) General Dialogue Interface Document

Overview#

This document describes how to invoke Grok model capabilities through Nebula API's OpenAI-compatible interface, covering a minimal example, streaming, tool calling, and structured output.

Basic Information#

Base URL: https://llm.ai-nebula.com/v1/chat/completions
Authentication Method: API Key (Token)
Request Headers: Authorization: Bearer sk-xxxx, Content-Type: application/json

Supported Models#

grok-3: Standard version
grok-4: Latest version
grok-3-fast: Fast version
grok-4-fast-reasoning: Fast reasoning version, specialized for scenarios requiring deep reasoning

API Interface#

1. Minimal Example (Non-streaming)#
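A minimal non-streaming request can be sketched in Python using only the standard library. The endpoint, headers, and model names come from this document; `build_request`, `chat`, and the `sk-xxxx` key are illustrative placeholders.

```python
import json
import urllib.request

API_URL = "https://llm.ai-nebula.com/v1/chat/completions"
API_KEY = "sk-xxxx"  # placeholder: substitute your own key

def build_request(prompt: str, model: str = "grok-3") -> urllib.request.Request:
    """Assemble an OpenAI-style chat completion request (non-streaming)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def chat(prompt: str, model: str = "grok-3") -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]

# Requires network access and a valid key:
# print(chat("Hello, Grok!"))
```

The same request shape works for every model listed above; only the `model` field changes.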

2. Streaming SSE Example#
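With `"stream": true` in the request body, the response arrives as Server-Sent Events: each event line has the form `data: {json}` and the stream terminates with `data: [DONE]`. A sketch of the client-side parsing (the `extract_deltas` helper is illustrative, not part of the API):

```python
import json

def extract_deltas(sse_lines):
    """Yield content fragments from the raw SSE lines of a streaming response."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip comments, keep-alives, blank separators
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]

# Feeding recorded SSE lines through the parser:
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(extract_deltas(sample)))  # → Hello
```

In a real client the lines would come from the HTTP response body as it arrives, so fragments can be rendered incrementally.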

3. Tool Calling (Functions / Tools)#
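A Phase 1 request declares the available tools in the OpenAI function-calling format that this endpoint is compatible with. The `get_weather` tool below is a hypothetical example:

```python
# Phase 1 request body: the model may answer directly or return tool_calls.
# "get_weather" is an illustrative tool, not a built-in.
phase1_request = {
    "model": "grok-4",
    "messages": [{"role": "user", "content": "What's the weather in Beijing?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call a tool
}
```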

Complete Tool Calling Process (Two Stages)#

1. Phase 1: The model returns tool_calls (content is usually null, finish_reason=tool_calls). Your server executes the corresponding function based on tool_calls[*].function.name/arguments.
2. Phase 2: Pass the tool execution result back to the model as a role:"tool" message and continue the completion. The continuation request may be non-streaming or streaming.
Note:
tool_call_id must match the id returned in Phase 1.
If tool execution fails, return readable error information or a degraded result so that subsequent completions are not blocked.
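The two-stage flow can be sketched as follows. The tool_calls fragment, the `run_tool` stand-in, and the `call_abc123` id are illustrative; the message shapes follow the OpenAI convention described above:

```python
import json

# A Phase 1 response fragment (content is null, finish_reason=tool_calls):
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",  # must be echoed back as tool_call_id
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"city": "Beijing"}'},
    }],
}

def run_tool(name: str, arguments_json: str) -> str:
    """Stand-in for your server-side tool; returns a JSON string result."""
    args = json.loads(arguments_json)
    if name == "get_weather":
        return json.dumps({"city": args["city"], "temp_c": 21})
    # On failure, return readable error info instead of raising,
    # so the model can still continue the completion.
    return json.dumps({"error": f"unknown tool {name}"})

call = assistant_msg["tool_calls"][0]
tool_msg = {
    "role": "tool",
    "tool_call_id": call["id"],  # matches the Phase 1 id
    "content": run_tool(call["function"]["name"], call["function"]["arguments"]),
}

# Phase 2: send the full history back to /v1/chat/completions
# ("stream": true is also supported here).
phase2_messages = [
    {"role": "user", "content": "What's the weather in Beijing?"},
    assistant_msg,
    tool_msg,
]
```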

4. Structured Output (response_format/json_schema)#
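A structured-output request can be sketched as below. The field names follow the OpenAI response_format/json_schema convention this endpoint is compatible with; the `person` schema itself is illustrative:

```python
# Request body asking the model to emit JSON conforming to a strict schema.
structured_request = {
    "model": "grok-4",
    "messages": [{"role": "user", "content": "Extract: Alice is 30 years old."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "person",          # illustrative schema name
            "strict": True,            # enforce exact schema conformance
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
                "additionalProperties": False,
            },
        },
    },
    "temperature": 0,  # lower temperature improves structural stability
}
```

The reply's `message.content` is then a JSON string that can be parsed directly with `json.loads`.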


Response and Usage#

Non-streaming: returns the standard OpenAI structure in one response, including choices and usage.
Streaming: SSE chunked return; an aggregated usage may be appended at the end. If stream_options.include_usage=true is set, chunks may carry real-time usage.
Reasoning Tokens: for models that support reasoning (e.g., grok-4-fast-reasoning), usage in the response distinguishes completion_tokens from reasoning_tokens.
Non-streaming response: text_tokens = completion_tokens - reasoning_tokens
Streaming response: usage statistics are updated in real time or aggregated at the end of the stream.
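The token arithmetic above can be sketched on a usage object, assuming the OpenAI-style layout in which reasoning_tokens sits inside completion_tokens_details (the numbers are made up for illustration):

```python
# Illustrative usage object; values are invented for the example.
usage = {
    "prompt_tokens": 12,
    "completion_tokens": 200,
    "completion_tokens_details": {"reasoning_tokens": 150},
}

# Fall back to 0 for models that report no reasoning tokens.
reasoning = usage.get("completion_tokens_details", {}).get("reasoning_tokens", 0)
text_tokens = usage["completion_tokens"] - reasoning
print(text_tokens)  # → 50
```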

Frequently Asked Questions (FAQ)#

1. How to improve the stability of structured outputs?
Use response_format: json_schema with a strict JSON Schema; if necessary, also lower temperature and set max_tokens.
2. How to handle tool execution?
Read tool_calls from the incremental chunks, execute the function on your server, and pass the result back to the model as a role:"tool" message.
3. Is reproducible output (seed) supported?
The seed parameter is supported; it is recommended to enable it only for workflows that require reproducibility.
4. How to choose a model?
grok-3 and grok-4 are standard versions, suitable for most scenarios.
grok-3-fast is a fast version for scenarios requiring quick responses.
grok-4-fast-reasoning is a reasoning version for deep thinking and complex reasoning.

Best Practices#

Frontend streaming should use event stream parsing and render in real-time.
It is recommended to turn off/lower temperature under strict JSON mode.
Implement timeout and retry mechanisms for tool calls to avoid blocking model responses.
Choose the appropriate model based on task requirements: use grok-3-fast for quick responses and grok-4-fast-reasoning for complex reasoning.

About "Deep Thinking / Reasoning Process"#

Grok models (such as grok-4 and grok-3) support reasoning capabilities but do not output visible chain-of-thought text.
grok-4-fast-reasoning is a fast reasoning version, specialized for scenarios requiring deep reasoning.
The usage field in the response contains reasoning_tokens statistics (inside completion_tokens_details), showing the model's reasoning consumption.
text_tokens = completion_tokens - reasoning_tokens, making it easy to distinguish the actual output text from tokens consumed by the reasoning process.
Last modified: 2025-12-04 07:49:20