
Qwen 3 on Fireworks AI: Controllable Chain-of-Thought and Tool Calling at Frontier Scale
By Fireworks AI|5/6/2025
Qwen 3 models are now available with SOTA reasoning, coding, and agentic tool-use capabilities. Try Qwen 3 now.
Until now, open-source LLMs forced a choice: show the chain of thought or call tools deterministically. Qwen 3’s new architecture does both in one pass, and keeps the reasoning block segregated so downstream code can ignore or audit it at will.
Pair that with a 128-expert MoE that activates only eight experts per token (≈22 B live parameters) and you get near-frontier quality at a fraction of the compute, fully Apache-2.0 and live on Fireworks today (Fireworks - Qwen3 235B-A22B model).
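To make the sparsity concrete, here is a toy sketch of top-k expert gating in plain Python. It is illustrative only, not Qwen's actual router: each token scores all 128 experts, but only the 8 highest-scoring ones run, while the other 120 stay idle.

```python
import math
import random

NUM_EXPERTS, TOP_K = 128, 8  # Qwen 3-235B-A22B routes each token to 8 of 128 experts

def route(gate_logits):
    """Pick the top-k experts for one token and softmax-normalize their gate weights."""
    topk = sorted(range(NUM_EXPERTS), key=lambda i: gate_logits[i], reverse=True)[:TOP_K]
    exps = [math.exp(gate_logits[i]) for i in topk]
    total = sum(exps)
    return [(i, w / total) for i, w in zip(topk, exps)]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]  # stand-in for a router's output
chosen = route(logits)
print(len(chosen))  # 8 experts carry this token; their weights sum to 1
```

Only the chosen experts' feed-forward weights participate in the forward pass, which is why a 235 B-parameter model runs with roughly 22 B parameters live per token.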
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

messages = [{"role": "user",
             "content": "What's the weather in Boston today?"}]

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return current weather for a US city",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

# Reasoning-based tool call: the model thinks first, then calls the tool
resp = client.chat.completions.create(
    model="accounts/fireworks/models/qwen3-235b-a22b",
    messages=messages,
    tools=tools,
    max_tokens=4096,
    temperature=0.6,
)
first = resp.choices[0].message
print(first.content)     # contains <think> … </think>
print(first.tool_calls)

# Non-reasoning tool call: reasoning disabled via reasoning_effort
resp = client.chat.completions.create(
    model="accounts/fireworks/models/qwen3-235b-a22b",
    messages=messages,
    tools=tools,
    max_tokens=4096,
    temperature=0.6,
    extra_body={
        "reasoning_effort": "none",
    },
)
second = resp.choices[0].message
print(second.content)    # does not contain <think> … </think>
print(second.tool_calls)
The first call returns the reasoning chain of thought plus the tool call; the second skips thinking and goes straight to the tool call.

Two sampling regimes apply. With reasoning on (reasoning_effort != "none"), the model emits a <think> … </think> block followed by the final answer; the recommended sampling is temperature ≈ 0.6, top_p ≈ 0.95, top_k = 20. With reasoning off (reasoning_effort = "none" or a /no_think tag in the prompt), use temperature ≈ 0.7, top_p ≈ 0.8.

Because the trace sits in its own tag, you can log, redact, or meter it independently, the same pattern we covered in Constrained Generation with Reasoning, for example by stripping the <think> block before storage.

Our endpoint is fully OpenAI compatible, please give it a try!
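Stripping the trace before storage can be a one-liner; here is a minimal sketch using a hypothetical strip_reasoning helper (not part of any SDK), built on the fact that the trace is always delimited by <think> … </think>.

```python
import re

# The reasoning trace always sits inside a single delimited block
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(content: str) -> str:
    """Drop the <think> … </think> trace, keeping only the final answer."""
    return THINK_RE.sub("", content).strip()

raw = "<think>The user asked for weather, so call get_weather.</think>It is sunny in Boston."
print(strip_reasoning(raw))  # -> It is sunny in Boston.
```

The same regex lets you do the opposite, capturing only the trace, if you want to meter or audit reasoning tokens separately from the answer.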
curl https://api.fireworks.ai/inference/v1/chat/completions \
-H "Authorization: Bearer $FIREWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model":"accounts/fireworks/models/qwen3-235b-a22b",
"messages":[{"role":"user","content":"Translate 这本书多少钱?"}],
"reasoning_effort": "none"
}'
With Qwen 3-235B-A22B, open source finally gets a model that exposes a controllable chain of thought, calls tools deterministically, and delivers near-frontier quality under an Apache-2.0 license.
No secret weights, no bespoke SDKs. Just point your existing OpenAI-style client at Fireworks and build.
Questions, feedback, or cool demos? Drop by our Discord or tag us on X.
Happy shipping!