DeepSeek: DeepSeek V4 Flash
deepseek-v4-flash
chat | DeepSeek
Quick Reference
- Input: Text
- Output: Text
- Context: 1M
- Max Output: 393K
- Input Price: $0.14/M
- Output Price: $0.28/M
- Author: DeepSeek
- Version: main
- Open Source: Yes
Overview
DeepSeek V4 Flash is a lightweight, efficient Mixture-of-Experts (MoE) model with 284B total parameters and 13B activated parameters, with native support for a 1M-token context. It offers fast, low-latency inference at a low per-call cost, with balanced overall capability. It targets high-concurrency, lightweight workloads and suits cost-sensitive mainstream use cases such as everyday conversation, content creation, basic RAG, and batch copy processing.
Input modalities
Text
Output modalities
Text
Capabilities
chat
Features
Function Calling
Structured Output
Caching
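As a rough illustration of the Function Calling feature listed above, the sketch below builds a request in the OpenAI chat-completions `tools` format and runs a tool call locally. It assumes the endpoint accepts the standard `tools` parameter, which this page does not spell out. `get_weather`, `build_request`, and `dispatch_tool_call` are hypothetical names invented for this example.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local tool; returns placeholder data for illustration."""
    return {"city": city, "forecast": "unknown"}


# Tool schema in the OpenAI chat-completions "tools" format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
LOCAL_TOOLS = {"get_weather": get_weather}


def build_request(user_text: str) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": "deepseek-v4-flash",
        "messages": [{"role": "user", "content": user_text}],
        "tools": TOOLS,
    }


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the named local tool with JSON-encoded arguments; return JSON text."""
    if name not in LOCAL_TOOLS:
        raise KeyError(f"unknown tool: {name}")
    arguments = json.loads(arguments_json)
    return json.dumps(LOCAL_TOOLS[name](**arguments))
```

With the OpenAI Python SDK, the request would be sent as `client.chat.completions.create(**build_request("..."))`. Each entry in `response.choices[0].message.tool_calls` then carries `function.name` and `function.arguments`, which can be passed to `dispatch_tool_call`.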
Pricing
Token prices for DeepSeek V4 Flash, quoted per million tokens.
| Token Type | Price | Unit |
|---|---|---|
| Input | $0.14 | per million tokens |
| Output | $0.28 | per million tokens |
| Cache Read | $0.0028 | per million tokens |
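To make the price table concrete, here is a minimal cost-estimate sketch. It assumes that cached input tokens are billed at the Cache Read rate instead of the Input rate; the page does not state the billing rule explicitly. `estimate_cost` and `PRICE_PER_MILLION` are illustrative names.

```python
from decimal import Decimal

# Prices per million tokens, copied from the pricing table.
PRICE_PER_MILLION = {
    "input": Decimal("0.14"),
    "output": Decimal("0.28"),
    "cache_read": Decimal("0.0028"),
}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Estimate the cost of one request in dollars.

    Assumption: cached_tokens is the portion of input_tokens served from
    cache, billed at the cache-read rate rather than the input rate.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cache_read"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / Decimal(1_000_000)
```

`Decimal` is used instead of `float` so that small per-token amounts do not pick up binary rounding error when summed over many requests.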
Specifications
Context Window
1M tokens
Max Input
607K tokens
Max Output
393K tokens
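The limits above can be checked on the client side before a request is sent. The sketch below assumes "K" means 1,000 and "M" means 1,000,000 tokens, and that the token counts come from whatever tokenizer the caller uses; `clamp_max_tokens` is a hypothetical helper.

```python
# Limits from the Specifications section, assuming K = 1,000 and M = 1,000,000.
CONTEXT_WINDOW = 1_000_000
MAX_INPUT = 607_000
MAX_OUTPUT = 393_000


def clamp_max_tokens(prompt_tokens: int, requested_output: int) -> int:
    """Return a max_tokens value that respects the published limits.

    Raises ValueError if the prompt alone exceeds the maximum input size.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    if prompt_tokens > MAX_INPUT:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens; the maximum input is {MAX_INPUT}"
        )
    # Defensive check: with these numbers the remaining context is never
    # smaller than MAX_OUTPUT, but the limits may change.
    remaining_context = CONTEXT_WINDOW - prompt_tokens
    return min(requested_output, MAX_OUTPUT, remaining_context)
```

The returned value can be passed as `max_tokens` in a chat completion request.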
API Reference
OpenAI-compatible endpoint at https://api.inferoute.ai/v1.
import os

from openai import OpenAI

# Point the OpenAI SDK at the OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key=os.environ.get("INFEROUTE_API_KEY"),
)

try:
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write a haiku about recursion."},
        ],
        max_tokens=512,
        temperature=0.7,
    )
    # Print the text of the first (and only) completion choice.
    print(response.choices[0].message.content)
except Exception as e:
    # Covers network, authentication, and API errors alike.
    print(f"Error: {e}")
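For callers who prefer not to install the SDK, the same request can be sketched with the Python standard library. This assumes the endpoint follows the usual OpenAI conventions: a `POST` to `/chat/completions` under the base URL with a `Bearer` token in the `Authorization` header. `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://api.inferoute.ai/v1"


def build_chat_request(api_key: str, prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) a chat completion request.

    Assumption: the path and headers follow the standard OpenAI REST layout.
    """
    body = {
        "model": "deepseek-v4-flash",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.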