Nebula API Operation Documentation

Tongyi Qianwen General Dialogue Interface Document

Overview

This document describes how to call Qwen conversational models through Nebula's OpenAI-compatible interface, including the pass-through and standardized placement of extension parameters such as deep thinking (enable_thinking), search, and speech recognition (ASR).

Basic Information

Item            | Content
----------------|--------------------------------------------------------------
Base URL        | https://llm.ai-nebula.com/v1/chat/completions
Authentication  | API Key (Token)
Request Headers | Authorization: Bearer sk-xxxx, Content-Type: application/json
Refer to the official Qwen Chat API (extension parameters need to be placed in parameters): https://bailian.console.aliyun.com/?tab=api#/api/?type=model&url=2712576

Supported Models (Examples)

• qwen3-omni-flash
• Other Qwen chat models (subject to the routing enabled in your console)

API Interface

1. Minimal Example (Non-streaming)

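A minimal sketch in Python (standard library only), assuming the OpenAI-compatible schema described above. The endpoint and header format come from the Basic Information table; the helper names (build_payload, chat) and the sk-xxxx key are illustrative placeholders.

```python
import json
import urllib.request

BASE_URL = "https://llm.ai-nebula.com/v1/chat/completions"
API_KEY = "sk-xxxx"  # placeholder: substitute your real Nebula API key


def build_payload(prompt, model="qwen3-omni-flash"):
    # Minimal OpenAI-compatible chat payload; stream is False for a
    # plain non-streaming call.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def chat(prompt):
    # POST the payload with the Bearer token header and return the
    # assistant message from the first choice.
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + API_KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling chat("Hello") would issue one blocking request and return the full reply text at once.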
2. Enable Deep Thinking (Streaming SSE)

Deep thinking requires streaming output: set both enable_thinking: true and stream: true.
If you set enable_thinking: true with stream: false, the system automatically disables deep thinking to avoid upstream errors.
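A hedged sketch of the streaming request body, assuming the schema above; the helper name build_thinking_payload is illustrative. It mirrors the documented rule by disabling enable_thinking whenever stream is false.

```python
def build_thinking_payload(prompt, stream=True, model="qwen3-omni-flash"):
    # Deep thinking only works with streaming output, so tie
    # enable_thinking to the stream flag, mirroring the service's
    # auto-disable behavior.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
        # Qwen extension parameters belong in the "parameters" object
        # (see the placement specification in section 3).
        "parameters": {
            "enable_thinking": bool(stream),
            "incremental_output": True,
        },
    }
```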

Optional: Inline reasoning process into content

Add the toggle "nebula_thinking_to_content": true (this affects only the downstream display; it is not passed upstream and does not affect billing).
Effect: reasoning content is wrapped in <think>...</think> and appears in content alongside the normal output, which suits terminals or SDKs that only display content.
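When nebula_thinking_to_content is enabled, clients may want to separate the inlined reasoning from the final answer again. A small sketch (the helper name split_thinking is illustrative), assuming the <think>...</think> wrapping described above:

```python
import re

# Matches one <think>...</think> block; DOTALL lets reasoning span lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(content):
    # Returns (reasoning, answer): reasoning is the concatenated text of
    # all <think>...</think> blocks; answer is content with them removed.
    reasoning = "".join(_THINK_RE.findall(content))
    answer = _THINK_RE.sub("", content).strip()
    return reasoning, answer
```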

3. Qwen Extension Parameter Placement Specification

All Qwen extension parameters should be placed in the parameters object:
• Reasoning/Search: enable_thinking, incremental_output, search_options, enable_search
• Speech Recognition: asr_options
• Sampling/Control: temperature, top_p, top_k, seed, stop, max_tokens
• Constraints/Penalties: presence_penalty, frequency_penalty, etc. (refer to the official documentation)
• Structured Output: response_format (text/json_object/json_schema), json_schema
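A sketch of a request body following this placement spec. Only the parameter names listed above are used; the concrete values (temperature 0.7, top_p 0.9, and so on) are illustrative, not recommendations.

```python
# Example request body: every Qwen extension parameter sits inside
# "parameters", never at the top level.
payload = {
    "model": "qwen3-omni-flash",
    "messages": [{"role": "user", "content": "Which is bigger, 9.11 or 9.9?"}],
    "stream": True,
    "parameters": {
        "enable_thinking": True,      # reasoning (requires stream: true)
        "incremental_output": True,   # emit deltas instead of cumulative text
        "enable_search": True,        # search extension
        "temperature": 0.7,           # sampling controls
        "top_p": 0.9,
        "max_tokens": 1024,
        "response_format": {"type": "text"},  # structured-output control
    },
}
```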

Response and Usage Instructions

• Streaming: returned as SSE chunks, with usage possibly included at the end; upstream usually does not provide reasoning_tokens details, so this value may be 0 even when deep thinking is enabled.
• Non-streaming: returns the full text at once; when combined with enable_thinking: true, deep thinking is automatically disabled to avoid upstream errors.
Example (any chunk):
{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "created": 1762153960,
  "model": "qwen3-omni-flash",
  "choices": [ ... ],
  "usage": {
    "prompt_tokens": 53,
    "completion_tokens": 2123,
    "total_tokens": 2176,
    "completion_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}

Frequently Asked Questions (FAQ)

1. Enabled deep thinking but didn't see the process?
   Confirm stream: true; if the client does not display reasoning_content, add nebula_thinking_to_content: true to inline the reasoning into content.
2. Why is reasoning_tokens 0?
   In compatibility mode, upstream often does not return reasoning token details. We do not guess the split, so a displayed 0 is normal.
3. Error "This model does not support non-streaming output."?
   Deep thinking requires streaming output. Change to stream: true or remove enable_thinking.

Best Practices

• Set top_p/top_k/temperature sensibly and combine them with incremental_output to improve the interaction experience.
• Always place search/ASR parameters inside parameters; top-level placement is corrected automatically, but following the specification is recommended.
• For streaming, the frontend should parse the event stream, noting that the last chunk contains usage.
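The streaming practice above can be sketched as a small SSE consumer, assuming OpenAI-style "data: {...}" lines terminated by "data: [DONE]"; the helper names (iter_sse_chunks, collect_stream) are illustrative.

```python
import json


def iter_sse_chunks(lines):
    # Parse "data: {...}" lines of an OpenAI-style SSE stream; stop at [DONE].
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alive lines, etc.
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_stream(lines):
    # Accumulate delta content and capture usage, which (per the notes
    # above) arrives in the last chunk.
    parts, usage = [], None
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage
```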
Last modified: 2025-12-04 07:49:02