Qwen3 VL 32B Instruct

Qwen · qwen/qwen3-vl-32b-instruct

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...

open weights · image · text · text+image->text

Context

Max context: 131072 tokens
Max output: 32768 tokens
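
As a rough illustration of how these two limits interact, here is a minimal sketch of a budget check. It assumes the limits are counted in tokens and that the prompt and the generated output share the same context window; the listing does not state either, so treat both as assumptions. The function names `fits` and `max_prompt_tokens` are hypothetical helpers, not part of any Qwen or provider API.

```python
# Limits copied from the listing above; assumed to be token counts.
MAX_CONTEXT = 131072
MAX_OUTPUT = 32768


def fits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within both listed limits.

    Assumption: prompt and output tokens together must fit in MAX_CONTEXT.
    """
    if requested_output_tokens > MAX_OUTPUT:
        return False
    return prompt_tokens + requested_output_tokens <= MAX_CONTEXT


def max_prompt_tokens(requested_output_tokens: int = MAX_OUTPUT) -> int:
    """Largest prompt that still leaves room for the requested output."""
    reserved = min(requested_output_tokens, MAX_OUTPUT)
    return MAX_CONTEXT - reserved


if __name__ == "__main__":
    print(fits(100_000, 4_096))
    print(max_prompt_tokens())
```

If the provider counts output separately from the context window, the sum check in `fits` would be too strict and only the per-field limits would apply.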

Pricing

Input / 1M tokens: 0.10
Output / 1M tokens: 0.42
Blend / 1M tokens: 0.26
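
The listed blend matches an equal-weight average of the input and output prices, (0.10 + 0.42) / 2 = 0.26. The sketch below reproduces that figure and estimates the cost of a single request. The 50/50 weighting is inferred from the numbers rather than stated on the page, prices are assumed to be per 1M tokens in whatever currency the listing uses, and `blended_price` and `request_cost` are hypothetical helper names.

```python
# Prices copied from the listing; assumed to be per 1M tokens.
INPUT_PER_M = 0.10
OUTPUT_PER_M = 0.42


def blended_price(input_per_m: float, output_per_m: float,
                  input_share: float = 0.5) -> float:
    """Weighted average price per 1M tokens.

    input_share is the assumed fraction of tokens that are input tokens;
    0.5 reproduces the listed blend.
    """
    return input_share * input_per_m + (1.0 - input_share) * output_per_m


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = INPUT_PER_M,
                 output_per_m: float = OUTPUT_PER_M) -> float:
    """Cost of one request, charging input and output at their own rates."""
    total = input_tokens * input_per_m + output_tokens * output_per_m
    return total / 1_000_000


if __name__ == "__main__":
    print(round(blended_price(INPUT_PER_M, OUTPUT_PER_M), 2))
    print(request_cost(2_000, 500))
```

For workloads that are mostly input (long prompts, short answers), passing a higher `input_share` gives a more representative blended figure than the equal-weight default.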

Quality

Quality index:

Provider

Provider: Qwen
Moderated: no