# glm-vision-rs

Rust client for the GLM-4.6V vision model from Zhipu AI / Z.AI. Inspired by the official Z.AI MCP Server.

**Note:** This is an unofficial, community-built library. It is not affiliated with, endorsed by, or sponsored by Zhipu AI or Z.AI.

## Background

I wanted to use GLM-4.6V's vision capabilities in my Z.AI coding plan but didn't want to run the official npm-based MCP server just for that. The system prompts and tool designs in this crate are ported from the `@z_ai/mcp-server` npm package into a standalone Rust library with no Node.js dependency.
## Tools

Each tool function in `glm_vision_rs::tools` automatically uses the matching system prompt from `glm_vision_rs::prompts`. The prompts are also exposed as public constants if you want to override them via `client.completion()` or `client.completion_raw()` directly (see the sketch after the table).
| Tool | Prompt (auto) | Description |
|---|---|---|
| `analyze_image` | `GENERAL_IMAGE_ANALYSIS` | General-purpose image description and analysis |
| `extract_text` | `TEXT_EXTRACTION` | Extract text, code, and logs from screenshots |
| `diagnose_error` | `ERROR_DIAGNOSIS` | Diagnose errors with root cause and fix suggestions |
| `understand_diagram` | `DIAGRAM_UNDERSTANDING` | Analyze UML, flowchart, ER, and sequence diagrams |
| `analyze_data_viz` | `DATA_VIZ_ANALYSIS` | Analyze charts, graphs, and dashboards |
| `ui_diff_check` | `UI_DIFF_CHECK` | Compare two UI screenshots for visual regression |
| `ui_to_artifact` | `UI_TO_ARTIFACT_CODE` | Convert a UI screenshot to code (default) |
| | `UI_TO_ARTIFACT_PROMPT` | Convert a UI screenshot to an LLM prompt (`output_type: "prompt"`) |
| | `UI_TO_ARTIFACT_SPEC` | Convert a UI screenshot to a technical spec (`output_type: "spec"`) |
| | `UI_TO_ARTIFACT_DESCRIPTION` | Convert a UI screenshot to a text description (`output_type: "description"`) |
| `analyze_video` | `VIDEO_ANALYSIS` | Analyze video content |
`ui_to_artifact` defaults to generating code. Pass `output_type` to select a different variant: `"prompt"`, `"spec"`, or `"description"`.
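For example, to tweak one of the built-in prompts rather than accept it as-is, pass your modified string through `completion_raw()` yourself (its full signature appears under Advanced below). A minimal sketch, with `client` set up as in the Usage section; the appended instruction text is purely illustrative:

```rust
// Start from the built-in constant and append an extra instruction.
// (The extra text is this example's choice, not part of the crate.)
let custom_prompt = format!(
    "{}\n\nAlways answer in bullet points.",
    glm_vision_rs::prompts::GENERAL_IMAGE_ANALYSIS
);
let image = client.process_image("/path/to/image.png")?;
let raw_json = client
    .completion_raw(&custom_prompt, vec![image], "Describe this image.")
    .await?;
```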
## Providers

Three built-in providers are supported. If none is configured, you must set `base_url` manually (see Configuration below).
| Provider | Endpoint | Description |
|---|---|---|
| `Zhipu` | `https://open.bigmodel.cn/api/paas/v4/` | Zhipu Open Platform |
| `Zai` | `https://api.z.ai/api/paas/v4/` | Z.AI API |
| `ZaiCoding` | `https://api.z.ai/api/coding/paas/v4/` | Z.AI Coding Plan |
## Usage

### HTTP Client

This library does not bundle an HTTP client. You provide your own by implementing the `HttpClient` trait, which lets you use whatever HTTP crate (and version) your project already depends on.
```rust
use glm_vision_rs::{HttpClient, HttpResponse};

// Adapter that backs the HttpClient trait with a reqwest::Client.
struct ReqwestClient(reqwest::Client);

impl HttpClient for ReqwestClient {
    async fn post(
        &self,
        url: &str,
        headers: &[(&str, &str)],
        body: &[u8],
    ) -> Result<HttpResponse, Box<dyn std::error::Error + Send + Sync>> {
        let mut req = self.0.post(url);
        for &(k, v) in headers {
            req = req.header(k, v);
        }
        let resp = req.body(body.to_vec()).send().await?;
        Ok(HttpResponse {
            status: resp.status().as_u16(),
            body: resp.text().await?,
        })
    }
}
```
### Setup

```rust
use glm_vision_rs::{Provider, VisionClient, VisionConfig};

let config = VisionConfig::new("your-api-key")
    .with_provider(Provider::ZaiCoding);

let http = ReqwestClient(
    reqwest::Client::builder()
        .timeout(std::time::Duration::from_secs(config.timeout_secs))
        .build()?,
);

let client = VisionClient::new(config, http);
```
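Everything in this crate is async, so the calls need a runtime. A minimal end-to-end sketch, assuming tokio and reusing the `ReqwestClient` adapter from the HTTP Client section; the `ZAI_API_KEY` variable name is this example's choice, not something the crate reads itself:

```rust
use glm_vision_rs::{Provider, VisionClient, VisionConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // The env var name here is illustrative; the crate takes the key as a plain string.
    let api_key = std::env::var("ZAI_API_KEY")?;
    let config = VisionConfig::new(api_key.as_str()).with_provider(Provider::ZaiCoding);

    let http = ReqwestClient(
        reqwest::Client::builder()
            .timeout(std::time::Duration::from_secs(config.timeout_secs))
            .build()?,
    );
    let client = VisionClient::new(config, http);

    let result = glm_vision_rs::tools::analyze_image(
        &client,
        "https://example.com/photo.jpg",
        "Describe what you see in this image.",
    )
    .await?;
    println!("{result}"); // assuming the tool returns a displayable String

    Ok(())
}
```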
### Configuration

```rust
let config = VisionConfig::new("your-api-key")
    .with_provider(Provider::Zhipu) // or Provider::Zai, Provider::ZaiCoding
    .with_model("glm-4.6v")         // default
    .with_temperature(0.8)          // default
    .with_thinking(true);           // default: enables reasoning mode

// Or use a custom endpoint instead of a provider:
let config = VisionConfig::new("your-api-key")
    .with_base_url("https://custom.example.com/v1/");
```
### Analyze an image

Works with URLs or local file paths:
```rust
let result = glm_vision_rs::tools::analyze_image(
    &client,
    "https://example.com/photo.jpg", // or "/path/to/local.png"
    "Describe what you see in this image.",
)
.await?;
```
### Extract text from a screenshot

```rust
let result = glm_vision_rs::tools::extract_text(
    &client,
    "/path/to/screenshot.png",
    "Extract all visible code from this screenshot.",
    Some("rust"), // optional: programming language hint
)
.await?;
```
### Diagnose an error

```rust
let result = glm_vision_rs::tools::diagnose_error(
    &client,
    "/path/to/error.png",
    "What is this error and how do I fix it?",
    Some("Running cargo build"), // optional: context
)
.await?;
```
### Analyze a diagram

```rust
let result = glm_vision_rs::tools::understand_diagram(
    &client,
    "/path/to/diagram.png",
    "Explain this diagram in detail.",
    Some("sequence"), // optional: diagram type hint
)
.await?;
```
### Analyze a data visualization

```rust
let result = glm_vision_rs::tools::analyze_data_viz(
    &client,
    "/path/to/dashboard.png",
    "What trends do you see in this data?",
    Some("trends"), // optional: analysis focus
)
.await?;
```
### Compare two UI screenshots

```rust
let result = glm_vision_rs::tools::ui_diff_check(
    &client,
    "/path/to/expected.png",
    "/path/to/actual.png",
    "List all visual differences between these two screenshots.",
)
.await?;
```
### Convert UI to artifact

Defaults to code generation. Pass `output_type` to select a different variant:

```rust
// Generate HTML/CSS code (default)
let code = glm_vision_rs::tools::ui_to_artifact(
    &client,
    "/path/to/ui.png",
    None, // defaults to "code"
    "Generate responsive HTML/CSS for this design.",
)
.await?;

// Generate a technical spec
let spec = glm_vision_rs::tools::ui_to_artifact(
    &client,
    "/path/to/ui.png",
    Some("spec"), // or "prompt", "description"
    "Write a technical specification for this UI.",
)
.await?;
```
### Analyze a video

```rust
let result = glm_vision_rs::tools::analyze_video(
    &client,
    "/path/to/video.mp4",
    "Describe what happens in this video.",
)
.await?;
```
### Advanced: raw JSON response

Use `client.completion_raw()` with any prompt for direct access to the API response:

```rust
let image = client.process_image("/path/to/image.png")?;
let raw_json = client
    .completion_raw(
        glm_vision_rs::prompts::GENERAL_IMAGE_ANALYSIS,
        vec![image],
        "Describe this image.",
    )
    .await?;
```
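GLM chat completion responses follow the familiar OpenAI-style shape, so the model's text usually sits at `choices[0].message.content`. A sketch of pulling it out with the `serde_json` crate, assuming `completion_raw()` returns the response body as a JSON string:

```rust
// Sketch only: assumes an OpenAI-style chat completion payload and that
// `raw_json` above is the raw response body as a String.
let value: serde_json::Value = serde_json::from_str(&raw_json)?;
let text = value
    .pointer("/choices/0/message/content")
    .and_then(|v| v.as_str())
    .unwrap_or_default();
println!("{text}");
```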
## Example Results

See `examples/EXAMPLES.md` for full output from all 11 tools, with token counts, timings, and input images.
## License

MIT