Qwen3 VL 8B Instruct

Qwen · qwen/qwen3-vl-8b-instruct


Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...

open weights · image · text · text+image->text

Context

Max context: 131072 tokens
Max output: 32768 tokens

Pricing

Input / 1M: 0.08
Output / 1M: 0.50
Blend / 1M: 0.29
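
The per-million prices above can be turned into a rough per-request estimate. The sketch below is illustrative only: the function names are hypothetical, the currency is not stated on this page, and the 1:1 input/output weighting for the blend is an assumption, chosen because it reproduces the listed 0.29.

```python
# Prices per 1M tokens, copied from the Pricing section (currency not stated there).
INPUT_PER_M = 0.08
OUTPUT_PER_M = 0.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000


def blended_price(input_share: float = 0.5) -> float:
    """Weighted average price per 1M tokens.

    Assumption: the listed blend uses equal input and output weights,
    since 0.5 * 0.08 + 0.5 * 0.50 matches the 0.29 shown above.
    """
    return input_share * INPUT_PER_M + (1.0 - input_share) * OUTPUT_PER_M


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token reply.
    print(f"request cost: {estimate_cost(100_000, 2_000):.4f}")
    print(f"blend / 1M:   {blended_price():.2f}")
```

A workload dominated by input tokens, as is typical for image-heavy prompts, would have an effective price closer to the input rate than to the blend.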

Quality

Quality index:

Provider

Provider: Qwen
Moderated: no