View the Latest Leaderboard

Professional AI Agent Evaluation Platform

LexBench provides comprehensive agent evaluation capabilities, supporting multiple datasets and agents to help you objectively assess model performance on real-world tasks.

View Leaderboard

Start Evaluation

Free to get started, no credit card required, real-time evaluation results

Evaluation Tasks

Supported Models

Data

Vision & Roadmap

Digital Data

LexBench aims to build a comprehensive Agent Data evaluation system spanning multimodal capabilities, cross-platform environments, and both digital and physical domains.

We currently support the Browser Use Agent Data, which evaluates agents on task automation, information retrieval, navigation, and interaction within browser environments. Looking ahead, we plan to expand to Computer Use, Phone Use, Coding Agent, and Personal Assistant, establishing LexBench as an industry-leading Digital Evaluation Framework.

Available

Browser Use

Browser Agent: Evaluate agents on task automation, information retrieval, and navigation within web environments

Coming Soon

Computer Use

Desktop/System Agent: Evaluate agents on OS-level automation, cross-application workflows, and GUI interactions

Coming Soon

Phone Use

Mobile Agent: Evaluate agents on mobile platforms including touch interactions, app switching, and UI navigation

Coming Soon

Coding Agent

Code Generation Agent: Evaluate agents on code writing, debugging, refactoring, and project management capabilities

Coming Soon

Personal Assistant

Personal Assistant Agent: Evaluate agents on schedule management, task planning, information integration, and other personal assistant scenarios

LexBench Evaluation Platform

Platform Stats

Professional and comprehensive browser agent evaluation service

Eval Tasks

Tested Models

Data

Websites Covered

Core Features

LexBench provides professional browser agent evaluation capabilities to help you comprehensively assess AI Agent performance on real web tasks

Diverse Data

Supports LexBench-Browser, Online-Mind2Web, BrowseComp, and more, covering Chinese and English websites across various task types and difficulty levels

Professional Evaluation

Uses GPT-4o as the evaluation model, combining multiple strategies (functional verification, UI comparison, semantic matching) for objective and accurate results

Visual Analytics

Rich visualizations, including pass-rate trends, task distributions, and multi-dimensional radar charts, that present evaluation results intuitively

Open Leaderboard

Transparent public leaderboard with multi-dimensional filtering and comparison, so you can quickly understand agent performance

Start Evaluation

Start evaluating your browser agent now and get detailed performance reports

Fast evaluation, data visualization, multi-model comparison

Start Evaluation

View Leaderboard

Professional AI Agent Evaluation Platform

Digital Data

Browser Use

Computer Use

Phone Use

Coding Agent

Personal Assistant

Platform Stats

Recent Results

Top Performers

Core Features

Diverse Data

Professional Evaluation

Visual Analytics

Open Leaderboard

Start Evaluation
