DeepSeek: DeepSeek V4 Flash

Model ID: deepseek-v4-flash
Capability: chat | Author: DeepSeek

Quick Reference

Input: Text
Output: Text
Context: 1M tokens
Max Output: 393K tokens
Input Price: $0.14/M tokens
Output Price: $0.28/M tokens
Author: DeepSeek
Version: main
Open Source: Yes

Overview

DeepSeek V4 Flash is a lightweight, efficient Mixture-of-Experts (MoE) model with 284B total parameters, of which 13B are activated per token, and native support for a 1M-token context window. It offers fast inference, low latency, and low per-call cost while keeping balanced general capability. It targets high-concurrency, lightweight workloads and suits mainstream cost-sensitive uses such as everyday conversation, content creation, basic retrieval-augmented generation (RAG), and batch processing of written copy.

Input modalities

Text

Output modalities

Text

Capabilities

chat

Features

Function Calling
Structured Output
Caching
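Function Calling is listed as a supported feature. The sketch below assumes the gateway accepts the standard OpenAI "tools" request format and returns requested calls in message.tool_calls, as the OpenAI Python SDK models them; neither is confirmed on this page. The get_weather tool, its schema, and the run_tool_calls helper are hypothetical. It covers only the local half of the loop: describing a tool and executing the calls the model asks for.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable


# Hypothetical tool, for illustration only.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}


# Tool schema in the OpenAI "tools" format (assumed to pass through the
# gateway); send it as tools=TOOLS in client.chat.completions.create(...).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(message: Any) -> list[dict]:
    """Execute each tool call on an assistant message and build the
    role="tool" messages to send back to the model."""
    results = []
    for call in message.tool_calls or []:
        fn = REGISTRY[call.function.name]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call.function.arguments)
        results.append(
            {
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(fn(**args)),
            }
        )
    return results


# Offline usage with a stand-in for response.choices[0].message.
fake_message = SimpleNamespace(
    tool_calls=[
        SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(
                name="get_weather", arguments='{"city": "Paris"}'
            ),
        )
    ]
)
tool_messages = run_tool_calls(fake_message)
print(tool_messages)
```

In a real loop you would append the assistant message and these tool messages to the conversation, then call the API again so the model can use the results.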

Pricing

Token prices for DeepSeek V4 Flash, in US dollars per million tokens:

Input: $0.14 per million tokens
Output: $0.28 per million tokens
Cache read: $0.0028 per million tokens
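To estimate what a request costs, multiply each token count by its rate and divide by one million. The minimal sketch below uses the listed prices and assumes that cached prompt tokens are billed at the cache-read rate instead of the input rate. How the gateway reports cached tokens is not stated here, so the caller passes that count in; estimate_cost_usd is a hypothetical helper.

```python
# Listed prices, in USD per million tokens.
INPUT_PER_M = 0.14
OUTPUT_PER_M = 0.28
CACHE_READ_PER_M = 0.0028


def estimate_cost_usd(
    prompt_tokens: int, completion_tokens: int, cached_tokens: int = 0
) -> float:
    """Estimate one request's cost in USD.

    Assumes cached_tokens is the part of prompt_tokens served from the
    cache and billed at the cache-read rate; the rest uses the input rate.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    total_per_m = (
        uncached * INPUT_PER_M
        + cached_tokens * CACHE_READ_PER_M
        + completion_tokens * OUTPUT_PER_M
    )
    return total_per_m / 1_000_000


# 100K prompt tokens (80K of them cached) plus 2K output tokens.
print(f"${estimate_cost_usd(100_000, 2_000, cached_tokens=80_000):.6f}")
# prints $0.003584
```

Under this assumption, the 80K cached tokens cost about $0.000224 instead of $0.0112 at the full input rate, which is why stable prompt prefixes matter for high-volume workloads.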

Specifications

Context Window

1M tokens

Max Input

607K tokens

Max Output

393K tokens
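The prompt and the completion share one context window, so a request has to respect three limits at once. The minimal pre-flight check below uses the listed figures. It assumes "K" and "M" are decimal (1K = 1,000 tokens) and that the prompt has already been counted with a tokenizer. clamp_max_tokens is a hypothetical helper, and the gateway's own enforcement may differ.

```python
# Listed limits, assuming decimal units (1K = 1,000 tokens).
CONTEXT_WINDOW = 1_000_000
MAX_INPUT = 607_000
MAX_OUTPUT = 393_000


def clamp_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Return a max_tokens value that fits the listed limits.

    Raises ValueError if the prompt alone exceeds the input limit.
    """
    if prompt_tokens > MAX_INPUT:
        raise ValueError(
            f"prompt is {prompt_tokens} tokens; max input is {MAX_INPUT}"
        )
    # The completion is capped both by the output limit and by the room
    # the prompt leaves in the shared context window.
    room = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(requested, MAX_OUTPUT, room))


print(clamp_max_tokens(50_000, 512))      # 512
print(clamp_max_tokens(50_000, 500_000))  # 393000
```

With these figures, 607K plus 393K fills the 1M window exactly, so any prompt within the input limit can still request the full 393K of output. The room term only starts to bind if the limits change.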

API Reference

The model is served through an OpenAI-compatible endpoint at https://api.inferoute.ai/v1, so the official OpenAI Python SDK can call it by setting base_url:

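import os

from openai import APIError, OpenAI

# Point the OpenAI SDK at the Inferoute gateway. Indexing os.environ
# fails fast with a KeyError if the key is missing, instead of sending
# a request with no key (or with an unrelated OPENAI_API_KEY).
client = OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key=os.environ["INFEROUTE_API_KEY"],
)

try:
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write a haiku about recursion."},
        ],
        max_tokens=512,  # cap on completion length (the model allows up to 393K)
        temperature=0.7,  # moderate randomness, suited to creative text
    )
    # With the default n=1, the generated text is on the first choice.
    print(response.choices[0].message.content)
except APIError as e:
    # Base class for the SDK's connection, timeout, and HTTP status errors.
    print(f"Request failed: {e}")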