Z.ai: GLM-5.1

glm-5.1

chat · Zhipu

Quick Reference

Input: Text
Output: Text

Context: 202K tokens
Max Output: 131K tokens

Input Price: $0.88/M
Output Price: $3.53/M

Author: Zhipu AI
Version: main
Open Source: Yes

Overview

GLM-5.1 is Zhipu AI's model for long-horizon tasks. It has 744B total parameters, a context window of about 200K tokens, and up to 128K output tokens (listed as 202K and 131K in the specifications below). It targets logical reasoning, long-text understanding, and code generation while balancing output quality against inference efficiency. Zhipu AI reports strong results across multi-task benchmarks and positions the model for conversational assistants, enterprise applications, and developer tooling.

Input modalities

Text

Output modalities

Text

Capabilities

chat

Features

Function Calling

Structured Output

Caching

Batch Processing

Web Search
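On OpenAI-compatible endpoints, Function Calling usually follows the OpenAI "tools" format. Assuming Inferoute passes that format through unchanged (this page does not document it), the client-side half is a tool schema sent with the request plus a dispatcher that runs the tool the model names and builds the "tool" message to send back. The names get_weather, TOOLS, REGISTRY, and run_tool_call are hypothetical, chosen only for illustration.

```python
import json


# Hypothetical local tool; the returned data is a placeholder.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}


# Tool schema in the OpenAI "tools" format (assumed to be accepted by the endpoint).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may emit to the local functions that implement them.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(call_id: str, name: str, arguments: str) -> dict:
    """Execute one tool call and build the "tool" message to return to the model.

    name and arguments mirror tool_call.function.name and
    tool_call.function.arguments (a JSON string) in the OpenAI response shape.
    """
    func = REGISTRY[name]
    result = func(**json.loads(arguments))
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}
```

In use, pass tools=TOOLS to client.chat.completions.create, then for each call in response.choices[0].message.tool_calls append run_tool_call(call.id, call.function.name, call.function.arguments) to the conversation (after the assistant message that requested it) and send a follow-up request.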

Pricing

Token prices for Z.ai: GLM-5.1, in US dollars per million tokens.

Token Type	Price	Unit
Input	$0.88/M	per million tokens
Output	$3.53/M	per million tokens
Cache Read	$0.09/M	per million tokens
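A request's cost follows directly from the table: each token count is multiplied by its per-million rate and divided by 1,000,000. The sketch below assumes that cached input tokens are a subset of the prompt and are billed at the cache-read rate instead of the normal input rate; this page does not spell out the billing rule, so treat the result as an estimate. estimate_cost_usd is a hypothetical helper name.

```python
# Prices in USD per million tokens, taken from the pricing table.
INPUT_PRICE_PER_M = 0.88
OUTPUT_PRICE_PER_M = 3.53
CACHE_READ_PRICE_PER_M = 0.09


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of a single request.

    Assumption: cached_tokens is the part of input_tokens read from the cache,
    billed at the cache-read rate instead of the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    cost = (
        uncached * INPUT_PRICE_PER_M
        + cached_tokens * CACHE_READ_PRICE_PER_M
        + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    return round(cost, 6)


if __name__ == "__main__":
    # 50,000 prompt tokens (20,000 of them cached) and 4,000 completion tokens.
    print(estimate_cost_usd(50_000, 4_000, cached_tokens=20_000))
```

For that example the arithmetic is 30,000 × 0.88 + 20,000 × 0.09 + 4,000 × 3.53 = 42,320 dollar-tokens per million, or about $0.042.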

Specifications

Context Window

202K tokens

Max Input

71K tokens (equal to the 202K context window minus the 131K maximum output)

Max Output

131K tokens
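Before sending a long prompt, it can help to check it against these limits locally. The exact token counts behind "202K", "71K", and "131K" are not listed on this page, so the sketch below assumes K means 1,000 tokens, which makes it slightly conservative if the true limits are powers of two. The prompt token count must come from your own tokenizer estimate; check_budget is a hypothetical helper.

```python
# Approximate limits from the specifications; "K" is assumed to mean 1,000 tokens.
CONTEXT_WINDOW = 202_000
MAX_INPUT = 71_000
MAX_OUTPUT = 131_000


def check_budget(prompt_tokens: int, max_tokens: int) -> list[str]:
    """Return a list of limit violations; an empty list means the request fits."""
    problems = []
    if prompt_tokens > MAX_INPUT:
        problems.append(f"prompt has {prompt_tokens} tokens; max input is {MAX_INPUT}")
    if max_tokens > MAX_OUTPUT:
        problems.append(f"max_tokens={max_tokens} exceeds max output {MAX_OUTPUT}")
    # General context-window rule: prompt plus requested output must fit.
    if prompt_tokens + max_tokens > CONTEXT_WINDOW:
        problems.append(
            f"prompt + max_tokens = {prompt_tokens + max_tokens} exceeds context {CONTEXT_WINDOW}"
        )
    return problems
```

With the listed figures, 71K + 131K = 202K, so the context-window check only adds information if the input or output limits turn out to differ from what is shown here.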

API Reference

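GLM-5.1 is served through an OpenAI-compatible endpoint at https://api.inferoute.ai/v1, so the official openai Python client works once base_url points at it.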

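import os

from openai import APIError, OpenAI

# Point the OpenAI client at the Inferoute endpoint. Reading the key with
# os.environ[...] fails fast if INFEROUTE_API_KEY is unset, instead of letting
# the client fall back to OPENAI_API_KEY and send that key to a third party.
client = OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key=os.environ["INFEROUTE_API_KEY"],
)

try:
    response = client.chat.completions.create(
        model="glm-5.1",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write a haiku about recursion."},
        ],
        max_tokens=512,  # must stay within the 131K max output
        temperature=0.7,
    )
    print(response.choices[0].message.content)
except APIError as e:
    # Covers connection errors, timeouts, and non-2xx responses from the API.
    print(f"Request failed: {e}")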