LexBench

Professional AI Agent Evaluation Platform

LexBench provides comprehensive agent evaluation, supporting multiple benchmarks and agents to help you objectively assess model performance on real-world tasks.

Free to get started · No credit card required · Real-time evaluation results
Vision & Roadmap

Digital Benchmarks

LexBench aims to build a comprehensive agent benchmark suite spanning multimodal capabilities, cross-platform environments, and both digital and physical domains.

We currently support the Browser Use benchmark, which evaluates agents on task automation, information retrieval, navigation, and interaction within browser environments. Looking ahead, we plan to expand to Computer Use, Phone Use, Coding Agent, and Personal Assistant, establishing LexBench as an industry-leading digital evaluation framework.

Browser Use (Available)

Browser Agent: Evaluate agents on task automation, information retrieval, and navigation within web environments

Computer Use (Coming Soon)

Desktop/System Agent: Evaluate agents on OS-level automation, cross-application workflows, and GUI interactions

Phone Use (Coming Soon)

Mobile Agent: Evaluate agents on mobile platforms including touch interactions, app switching, and UI navigation

Coding Agent (Coming Soon)

Code Generation Agent: Evaluate agents on code writing, debugging, refactoring, and project management capabilities

Personal Assistant (Coming Soon)

Personal Assistant Agent: Evaluate agents on schedule management, task planning, information integration, and other personal assistant scenarios

LexBench Evaluation Platform

Platform Stats

Professional and comprehensive browser agent evaluation service

  • Eval Tasks
  • Tested Models
  • Benchmarks
  • Websites Covered
Recent Results

View the latest evaluation results and rankings on the platform

Top Performers

Latest evaluation results sorted by pass rate

# | Agent | Benchmark | Tasks Passed | Score
1 | Claude-4-Sonnet + Agent-TARS | LexBench-Browser | 314/340 | 92.4%
2 | GPT-5 (Thinking) + Manus | LexBench-Browser | 309/340 | 91.0%
3 | Agent-TARS-v2 | Online-Mind2Web | 165/200 | 82.4%
4 | Gemini-3-Pro + Browser-Use | LexBench-Browser | 301/340 | 88.6%
5 | DeepSeek-R1-0528 | BrowseComp | 84/100 | 84.2%
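A pass rate is simply tasks passed divided by tasks attempted, and because each benchmark has its own task set, rankings are only directly comparable within a single benchmark. The sketch below assumes exactly that; the `Result` class and `rank` function are hypothetical illustrations, not a LexBench API. The page does not define how the Score column is computed, so the pass rates printed here can differ slightly from the displayed scores.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Result:
    """Hypothetical leaderboard entry: one agent evaluated on one benchmark."""
    agent: str
    benchmark: str
    passed: int
    total: int

    @property
    def pass_rate(self) -> float:
        """Tasks passed as a percentage of tasks attempted."""
        if self.total <= 0:
            raise ValueError("total must be positive")
        return 100.0 * self.passed / self.total


def rank(results: Iterable[Result], benchmark: Optional[str] = None) -> List[Result]:
    """Optionally keep one benchmark, then sort by pass rate, highest first."""
    rows = [r for r in results if benchmark is None or r.benchmark == benchmark]
    return sorted(rows, key=lambda r: r.pass_rate, reverse=True)


# Task counts copied from the table above.
results = [
    Result("Claude-4-Sonnet + Agent-TARS", "LexBench-Browser", 314, 340),
    Result("GPT-5 (Thinking) + Manus", "LexBench-Browser", 309, 340),
    Result("Agent-TARS-v2", "Online-Mind2Web", 165, 200),
    Result("Gemini-3-Pro + Browser-Use", "LexBench-Browser", 301, 340),
    Result("DeepSeek-R1-0528", "BrowseComp", 84, 100),
]

for i, r in enumerate(rank(results, "LexBench-Browser"), start=1):
    print(f"{i}. {r.agent}: {r.passed}/{r.total} = {r.pass_rate:.1f}%")
```

Filtering before sorting is what makes the comparison fair: mixing LexBench-Browser (340 tasks) with BrowseComp (100 tasks) in one ranking would compare agents on different task sets.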
Why Choose LexBench

Core Features

LexBench provides professional browser agent evaluation to help you comprehensively assess AI agent performance on real web tasks.

01

Diverse Data

Supports LexBench-Browser, Online-Mind2Web, BrowseComp, and more, covering Chinese and English websites across a range of task types and difficulty levels.

02

Professional Evaluation

Uses GPT-4o as the judge model and combines multiple strategies (functional verification, UI comparison, and semantic matching) to produce objective, accurate results.
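The page does not say how the three strategies are combined into one verdict. The sketch below assumes a conservative rule: each strategy returns pass, fail, or "not applicable", and a task passes only if at least one strategy applies and every applicable one passes. All names (`Trajectory`, `functional_check`, `ui_check`, `make_semantic_check`, `evaluate`) and the task fields are hypothetical. The semantic check takes the judge as a plain callable, so a GPT-4o call could be plugged in; the example uses an offline string-comparison stand-in instead of any real API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trajectory:
    """Hypothetical record of the end state of one agent run."""
    final_answer: str
    final_url: str
    page_text: str = ""


# A strategy returns True (pass), False (fail), or None (not applicable).
Strategy = Callable[[Trajectory, dict], Optional[bool]]


def functional_check(traj: Trajectory, task: dict) -> Optional[bool]:
    # Functional verification: did the agent reach the expected page?
    expected = task.get("expected_url")
    if expected is None:
        return None
    return traj.final_url.rstrip("/") == expected.rstrip("/")


def ui_check(traj: Trajectory, task: dict) -> Optional[bool]:
    # Simplified UI comparison: all required strings appear on the final page.
    required = task.get("required_text")
    if required is None:
        return None
    return all(s in traj.page_text for s in required)


def make_semantic_check(judge: Callable[[str, str], bool]) -> Strategy:
    # Semantic matching: delegate the answer comparison to a judge (e.g. an LLM).
    def check(traj: Trajectory, task: dict) -> Optional[bool]:
        reference = task.get("reference_answer")
        if reference is None:
            return None
        return judge(traj.final_answer, reference)
    return check


def evaluate(traj: Trajectory, task: dict, strategies: List[Strategy]) -> bool:
    verdicts = [v for v in (s(traj, task) for s in strategies) if v is not None]
    # Assumed rule: at least one applicable strategy, and all of them pass.
    return bool(verdicts) and all(verdicts)


# Offline stand-in for an LLM judge: case-insensitive exact match.
def stand_in_judge(answer: str, reference: str) -> bool:
    return answer.strip().lower() == reference.strip().lower()


strategies = [functional_check, ui_check, make_semantic_check(stand_in_judge)]
task = {"expected_url": "https://example.com/cart", "reference_answer": "3 items"}
run = Trajectory(final_answer="3 Items", final_url="https://example.com/cart/")
print(evaluate(run, task, strategies))
```

An "all applicable checks must pass" rule favors precision; a majority vote or a weighted score would be equally valid alternatives if false negatives from a single strict check are a concern.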

03

Visual Analytics

Rich visualizations, including pass-rate trends, task distributions, and multi-dimensional radar charts, present evaluation results intuitively.

04

Open Leaderboard

A transparent public leaderboard with multi-dimensional filtering and comparison, so you can quickly see how agents perform.

Start Evaluation

Start evaluating your browser agent now and get detailed performance reports.

Fast evaluation · Data visualization · Multi-model comparison
© 2026 LexBench. All rights reserved.