Professional AI Agent Evaluation Platform
LexBench provides comprehensive agent evaluation capabilities, supporting multiple datasets and agents to help you objectively assess model performance on real-world tasks.
Digital Data
LexBench aims to build a comprehensive system of agent evaluation datasets spanning multimodal capabilities, cross-platform environments, and both digital and physical domains.
We currently support the Browser Use dataset, which evaluates agents on task automation, information retrieval, navigation, and interaction within browser environments. Looking ahead, we plan to expand to Computer Use, Phone Use, Coding Agent, and Personal Assistant, with the goal of establishing LexBench as an industry-leading digital evaluation framework.
Browser Use
Browser Agent: Evaluate agents on task automation, information retrieval, and navigation within web environments
Computer Use
Desktop/System Agent: Evaluate agents on OS-level automation, cross-application workflows, and GUI interactions
Phone Use
Mobile Agent: Evaluate agents on mobile platforms including touch interactions, app switching, and UI navigation
Coding Agent
Code Generation Agent: Evaluate agents on code writing, debugging, refactoring, and project management capabilities
Personal Assistant
Personal Assistant Agent: Evaluate agents on schedule management, task planning, information integration, and other personal assistant scenarios
Platform Stats
Professional and comprehensive browser agent evaluation service
Eval Tasks
Tested Models
Datasets
Websites Covered
Recent Results
View the latest evaluation results and rankings on the platform
Core Features
LexBench provides professional browser agent evaluation capabilities to help you comprehensively assess AI Agent performance on real web tasks
Diverse Data
Supports LexBench-Browser, Online-Mind2Web, BrowseComp, and more, covering Chinese and English websites across a range of task types and difficulty levels
Professional Evaluation
Uses GPT-4o as the evaluation model, combined with multiple strategies (functional verification, UI comparison, semantic matching), to produce objective and accurate results
Visual Analytics
Rich visualizations, including pass rate trends, task distributions, and multi-dimensional radar charts, present evaluation results intuitively
Open Leaderboard
Transparent public leaderboard with multi-dimensional filtering and comparison, so you can quickly understand agent performance
Start Evaluation
Start evaluating your browser agent now and get detailed performance reports