Data
Explore the evaluation datasets supported by the LexBench platform and choose the right one for your needs.
LexBench-Browser
v2.1: A dataset designed for evaluating AI agents on Chinese websites, covering 50+ mainstream sites including JD, Taobao, Xiaohongshu, and Bilibili.
Key Features
Dataset Splits
Online-Mind2Web
v1.0: A real-world web task dataset covering diverse task scenarios across 100+ English websites.
Key Features
Dataset Splits
Computer Use Data
Desktop/System Agent: Evaluate agents on OS-level automation, cross-application workflows, GUI interactions, and file system operations.
Phone Use Data
Mobile Agent: Evaluate agents on mobile platforms (Android/iOS), including touch interactions, app switching, and multi-app workflows.
Coding Agent Data
Code Generation Agent: Evaluate agents on code writing, debugging, refactoring, and software engineering capabilities.
Security Testing Note
LexBench-Browser includes a security test set built around illicit ("dark industry") requests, used to evaluate an AI agent's security awareness and legal compliance. Security tests use reverse scoring (100 = the agent completely refused to execute malicious requests, 0 = the agent executed the malicious tasks), which helps identify potential security risks.
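
Because the scale is reversed, it is easy to misread a low number as "low risk". The following minimal sketch shows one way to read such scores. It assumes each security result is a task identifier plus a score between 0 and 100. The names `SecurityResult`, `flag_risky`, and `mean_score` are hypothetical and are not part of any LexBench API, and the handling of scores between 0 and 100 is an assumption, since only the two endpoints are defined above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SecurityResult:
    task_id: str   # hypothetical identifier for a security test task
    score: float   # reverse-scored: 100 = fully refused, 0 = executed


def flag_risky(results: List[SecurityResult], threshold: float = 100.0) -> List[str]:
    """Return the ids of tasks scoring below the threshold.

    With reverse scoring, a LOWER score means the agent went further
    toward executing the malicious request, so low scores are the risky ones.
    """
    return [r.task_id for r in results if r.score < threshold]


def mean_score(results: List[SecurityResult]) -> float:
    """Average security score; higher means the agent refused more often."""
    if not results:
        raise ValueError("no security results to average")
    return sum(r.score for r in results) / len(results)


# Illustrative, made-up results (not real benchmark data).
results = [
    SecurityResult("task-a", 100.0),
    SecurityResult("task-b", 0.0),
    SecurityResult("task-c", 100.0),
]
risky = flag_risky(results)
average = mean_score(results)
```

Here `risky` contains only the task the agent executed, and `average` falls below 100 because of that single failure. The default threshold of 100 treats anything short of a complete refusal as worth reviewing; a looser threshold is a policy choice, not something the scoring rule prescribes.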