Nebula API Operation Documentation

Tongyi Qianwen General Dialogue Interface Document

Overview

This document describes how to call Qwen conversational models through Nebula's OpenAI-compatible interface, including the pass-through and standardized placement of extension parameters such as deep thinking (enable_thinking), search, and speech recognition (ASR).

Basic Information

Item            | Content
----------------|--------------------------------------------------------------
Base URL        | https://llm.ai-nebula.com/v1/chat/completions
Authentication  | API Key (Token)
Request Headers | Authorization: Bearer sk-xxxx, Content-Type: application/json
Refer to the official Qwen Chat API (extension parameters need to be placed in parameters): https://bailian.console.aliyun.com/?tab=api#/api/?type=model&url=2712576

Supported Models (Examples)

• qwen3-omni-flash
• Other Qwen chat models (subject to the routing enabled in your console)

API Interface

1. Minimal Example (Non-streaming)

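A minimal sketch in Python (standard library only), assuming the OpenAI-compatible schema described above. The endpoint and header format come from the Basic Information table; the helper names (build_payload, chat) and the sk-xxxx key are illustrative placeholders.

```python
import json
import urllib.request

BASE_URL = "https://llm.ai-nebula.com/v1/chat/completions"
API_KEY = "sk-xxxx"  # placeholder: substitute your real Nebula API key


def build_payload(prompt, model="qwen3-omni-flash"):
    # Minimal OpenAI-compatible chat payload; stream is False for a
    # plain non-streaming call.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def chat(prompt):
    # POST the payload with the Bearer token header and return the
    # assistant message from the first choice.
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + API_KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling chat("Hello") would issue one blocking request and return the full reply text at once.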
2. Enable Deep Thinking (Streaming SSE)

Deep thinking requires streaming output: set both enable_thinking: true and stream: true.
If you set enable_thinking: true with stream: false, the system automatically disables deep thinking to avoid upstream errors.
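A hedged sketch of the streaming request body, assuming the schema above; the helper name build_thinking_payload is illustrative. It mirrors the documented rule by disabling enable_thinking whenever stream is false.

```python
def build_thinking_payload(prompt, stream=True, model="qwen3-omni-flash"):
    # Deep thinking only works with streaming output, so tie
    # enable_thinking to the stream flag, mirroring the service's
    # auto-disable behavior.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
        # Qwen extension parameters belong in the "parameters" object
        # (see the placement specification in section 3).
        "parameters": {
            "enable_thinking": bool(stream),
            "incremental_output": True,
        },
    }
```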

Optional: Inline reasoning process into content

Add the toggle "nebula_thinking_to_content": true (this affects only the downstream display; it is not passed upstream and does not affect billing).
Effect: reasoning content is wrapped in <think>...</think> and appears in content alongside the normal output, which suits terminals or SDKs that only display content.
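When nebula_thinking_to_content is enabled, clients may want to separate the inlined reasoning from the final answer again. A small sketch (the helper name split_thinking is illustrative), assuming the <think>...</think> wrapping described above:

```python
import re

# Matches one <think>...</think> block; DOTALL lets reasoning span lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(content):
    # Returns (reasoning, answer): reasoning is the concatenated text of
    # all <think>...</think> blocks; answer is content with them removed.
    reasoning = "".join(_THINK_RE.findall(content))
    answer = _THINK_RE.sub("", content).strip()
    return reasoning, answer
```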

3. Qwen Extension Parameter Placement Specification

All Qwen extension parameters should be placed in the parameters object:
• Reasoning/Search: enable_thinking, incremental_output, search_options, enable_search
• Speech Recognition: asr_options
• Sampling/Control: temperature, top_p, top_k, seed, stop, max_tokens
• Constraints/Penalties: presence_penalty, frequency_penalty, etc. (refer to the official documentation)
• Structured Output: response_format (text/json_object/json_schema), json_schema
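A sketch of a request body following this placement spec. Only the parameter names listed above are used; the concrete values (temperature 0.7, top_p 0.9, and so on) are illustrative, not recommendations.

```python
# Example request body: every Qwen extension parameter sits inside
# "parameters", never at the top level.
payload = {
    "model": "qwen3-omni-flash",
    "messages": [{"role": "user", "content": "Which is bigger, 9.11 or 9.9?"}],
    "stream": True,
    "parameters": {
        "enable_thinking": True,      # reasoning (requires stream: true)
        "incremental_output": True,   # emit deltas instead of cumulative text
        "enable_search": True,        # search extension
        "temperature": 0.7,           # sampling controls
        "top_p": 0.9,
        "max_tokens": 1024,
        "response_format": {"type": "text"},  # structured-output control
    },
}
```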

Response and Usage Instructions

• Streaming: returned as SSE chunks, with usage possibly included at the end; upstream usually does not provide reasoning_tokens details, so this value may be 0 even when deep thinking is enabled.
• Non-streaming: returns the full text at once; when combined with enable_thinking: true, deep thinking is automatically disabled to avoid upstream errors.
Example (any chunk):
{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "created": 1762153960,
  "model": "qwen3-omni-flash",
  "choices": [ ... ],
  "usage": {
    "prompt_tokens": 53,
    "completion_tokens": 2123,
    "total_tokens": 2176,
    "completion_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}

Frequently Asked Questions (FAQ)

1. Enabled deep thinking but didn't see the process?
   Confirm stream: true; if the client does not display reasoning_content, add nebula_thinking_to_content: true to inline the reasoning into content.
2. Why is reasoning_tokens 0?
   In compatibility mode, upstream often does not return reasoning token details. We do not guess the split, so a displayed 0 is normal.
3. Error "This model does not support non-streaming output."?
   Deep thinking requires streaming output. Change to stream: true or remove enable_thinking.

Best Practices

• Set top_p/top_k/temperature sensibly and combine them with incremental_output to improve the interaction experience.
• Always place search/ASR parameters inside parameters; top-level placement is corrected automatically, but following the specification is recommended.
• For streaming, the frontend should parse the event stream, noting that the last chunk contains usage.
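The streaming practice above can be sketched as a small SSE consumer, assuming OpenAI-style "data: {...}" lines terminated by "data: [DONE]"; the helper names (iter_sse_chunks, collect_stream) are illustrative.

```python
import json


def iter_sse_chunks(lines):
    # Parse "data: {...}" lines of an OpenAI-style SSE stream; stop at [DONE].
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alive lines, etc.
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_stream(lines):
    # Accumulate delta content and capture usage, which (per the notes
    # above) arrives in the last chunk.
    parts, usage = [], None
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage
```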
Last modified: 2025-12-04 07:49:02