UI-TARS 7B

ByteDance · bytedance/ui-tars-1.5-7b

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it extends the UI-TARS framework with reinforcement...

open weights · image · text · text+image->text

Context

Max context: 128,000 tokens
Max output: 2,048 tokens
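
The two limits above can be combined into a rough prompt budget. The following is a minimal sketch, assuming the output tokens count against the same context window (the listing does not say whether they do); the function name `max_prompt_tokens` is hypothetical and not part of any provider API.

```python
# Limits copied from the Context section of this listing.
MAX_CONTEXT = 128_000  # total context window, in tokens
MAX_OUTPUT = 2_048     # maximum tokens the model may generate


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Return the prompt budget left after reserving room for the reply.

    ASSUMPTION: output tokens share the context window with the prompt.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT")
    return MAX_CONTEXT - reserved_output


if __name__ == "__main__":
    print(max_prompt_tokens())
```

If the provider counts output separately from the context window, the full 128,000 tokens would be available for the prompt instead.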

Pricing

Input / 1M: 0.10
Output / 1M: 0.20
Blend / 1M: 0.15
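
As a rough illustration of how these figures translate into a per-request cost, here is a minimal sketch. It assumes the prices are per 1M tokens, as the "/ 1M" labels suggest; the currency is not stated in the listing, and the helper names are hypothetical. It also checks that the listed blend figure matches an equal-weight average of the input and output prices, which is an observation about these three numbers rather than a documented formula.

```python
# Prices copied from the Pricing section (per 1M tokens; currency unstated).
INPUT_PER_M = 0.10
OUTPUT_PER_M = 0.20
BLEND_PER_M = 0.15


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in the listing's currency."""
    total = input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M
    return total / 1_000_000


def equal_weight_blend() -> float:
    """Equal-weight (1:1) average of the input and output prices.

    ASSUMPTION: the leaderboard's exact blend weighting is not documented here.
    """
    return (INPUT_PER_M + OUTPUT_PER_M) / 2


if __name__ == "__main__":
    # A 10,000-token prompt with a 2,000-token reply.
    print(f"{request_cost(10_000, 2_000):.6f}")
    print(abs(equal_weight_blend() - BLEND_PER_M) < 1e-9)
```

Workloads that are heavily input-weighted, which is typical for screenshot-driven GUI agents, will land closer to the input price than the blend figure suggests.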

Quality

Quality index:

Provider

Provider: ByteDance
Moderated: no