Qwen: Qwen3.5 Flash
qwen3.5-flash
chat, Qwen
Quick Reference
- Input: Text, Image
- Output: Text
- Context: 1M tokens
- Max Output: 65.5K tokens
- Input Price: $0.03/M tokens
- Output Price: $0.29/M tokens
- Author: Alibaba
- Version: main
- Open Source: Yes
Overview
Qwen3.5 Flash is the native vision-language Flash model of the Qwen3.5 series. It is built on a hybrid architecture that combines linear attention with a sparse mixture-of-experts design for higher inference efficiency. Both text-only and multimodal performance improve substantially over the Qwen3 series, and the model is tuned for fast responses, balancing inference speed against output quality.
Input modalities
Text, Image
Output modalities
Text
Capabilities
chat, reasoning, vision
Features
Function Calling
Structured Output
Caching
Prefix Completion
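As an illustration of the Function Calling feature, here is a minimal sketch of a tool definition and a local dispatcher. It assumes the endpoint accepts the OpenAI-style `tools` schema and returns tool-call arguments as a JSON string, which this page does not confirm. `get_weather`, `LOCAL_FUNCTIONS`, and `run_tool_call` are hypothetical names invented for the example.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local tool; a real one would query a weather service."""
    return {"city": city, "forecast": "unknown"}


# Tool description in the OpenAI-style "tools" format (assumed to be accepted).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
LOCAL_FUNCTIONS = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one requested tool call and return its result as a JSON string."""
    if name not in LOCAL_FUNCTIONS:
        raise KeyError(f"Model requested an unknown tool: {name}")
    kwargs = json.loads(arguments_json)  # arguments arrive as a JSON string
    return json.dumps(LOCAL_FUNCTIONS[name](**kwargs))
```

With an OpenAI-compatible client, `TOOLS` would be passed as `tools=TOOLS` to `client.chat.completions.create`, and each entry of `response.choices[0].message.tool_calls` would be handled with `run_tool_call(call.function.name, call.function.arguments)`.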
Pricing
Per-token prices for Qwen: Qwen3.5 Flash.
Pricing tier: input ≤ 128K tokens
| Token Type | Price (USD) | Unit |
|---|---|---|
| Input | $0.03 | per million tokens |
| Output | $0.29 | per million tokens |
| Cache Read | $0.0029 | per million tokens |
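To make the per-million pricing concrete, the following sketch estimates the cost of a single request from the prices in the table above. It rests on two assumptions that this page does not state: cached input tokens are billed at the Cache Read rate instead of the Input rate, and "128K" means 128,000 tokens. `estimate_cost` is a hypothetical helper, not part of any SDK.

```python
from decimal import Decimal

# USD per million tokens, copied from the pricing table.
PRICE_PER_MILLION = {
    "input": Decimal("0.03"),
    "output": Decimal("0.29"),
    "cache_read": Decimal("0.0029"),
}

# Assumption: the "Input <= 128K" tier ends at 128,000 tokens.
TIER_LIMIT = 128_000


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one request within the listed pricing tier."""
    if input_tokens > TIER_LIMIT:
        raise ValueError("Prices for inputs above the 128K tier are not listed")
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total_per_million = (
        uncached_tokens * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cache_read"]  # assumed billing rule
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total_per_million / 1_000_000
```

For example, `estimate_cost(100_000, 2_000)` works out to $0.003 for input plus $0.00058 for output, or $0.00358 in total.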
Specifications
Context Window
1M tokens
Max Input
934K tokens
Max Output
65.5K tokens
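The three limits above fit together if the maximum input is the context window minus the maximum output. The sketch below checks a request against them before it is sent. The exact token counts are assumptions inferred from the rounded figures (1,000,000 for "1M" and 65,536 for "65.5K", which leaves 934,464 for "934K"), and `clamp_max_tokens` is a hypothetical helper.

```python
# Assumed exact values behind the rounded figures on this page.
CONTEXT_WINDOW = 1_000_000               # "1M"
MAX_OUTPUT = 65_536                      # "65.5K", assumed to be 2**16
MAX_INPUT = CONTEXT_WINDOW - MAX_OUTPUT  # 934,464, consistent with "934K"


def clamp_max_tokens(prompt_tokens: int, requested_max_tokens: int) -> int:
    """Return a max_tokens value that respects the model's limits.

    Raises ValueError when the prompt alone exceeds the maximum input size.
    """
    if prompt_tokens > MAX_INPUT:
        raise ValueError(
            f"Prompt has {prompt_tokens} tokens; the limit is {MAX_INPUT}"
        )
    # Space left in the context window after the prompt.
    remaining = CONTEXT_WINDOW - prompt_tokens
    return min(requested_max_tokens, MAX_OUTPUT, remaining)
```

The prompt token count has to come from a tokenizer that matches the model; this page does not say which one to use.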
API Reference
OpenAI-compatible endpoint at https://api.inferoute.ai/v1.
import os

from openai import OpenAI

# Point the OpenAI SDK at the OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key=os.environ.get("INFEROUTE_API_KEY"),  # read the key from the environment
)

try:
    response = client.chat.completions.create(
        model="qwen3.5-flash",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write a haiku about recursion."},
        ],
        max_tokens=512,   # cap on generated tokens
        temperature=0.7,  # sampling temperature
    )
    # Print the text of the first (and only) completion choice.
    print(response.choices[0].message.content)
except Exception as e:
    print(f"Error: {e}")
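Because the model also accepts image input, a request can combine text and an image in a single user message. The sketch below builds such a message in the OpenAI-style content-parts format with a base64 data URL. Whether this endpoint accepts that exact format is an assumption, and `image_message` is a hypothetical helper.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message holding a text part and an inline image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # Inline the image as a data URL (assumed to be supported).
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary can be placed in the `messages` list of the chat completion call shown above, with the image bytes read from a local file opened in binary mode.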