CRAB: A cross-environment agent benchmark for multimodal language model agents, with a graph evaluator.
CRAB is a benchmark framework designed for evaluating multimodal language model (MLM) agents across diverse environments. It offers tools for building agents, operating environments, and creating benchmarks.
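To make the idea of a graph evaluator concrete, here is a minimal sketch in plain Python. It is not CRAB's actual API: the names `Checkpoint`, `GraphEvaluator`, `update`, and `completion_ratio`, as well as the state keys, are hypothetical and chosen only for illustration. The assumption is that a task is broken into checkpoints arranged as a directed acyclic graph, and that a checkpoint counts as reached only once its prerequisites are reached and its own check passes on the current environment state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical snapshot of the state of all environments (e.g. Ubuntu and Android).
State = Dict[str, object]


@dataclass
class Checkpoint:
    """One node of the task graph (illustrative, not CRAB's real class)."""
    name: str
    check: Callable[[State], bool]          # predicate over the current state
    requires: List[str] = field(default_factory=list)  # prerequisite node names
    done: bool = False


class GraphEvaluator:
    """Tracks which checkpoints of a task graph have been reached."""

    def __init__(self, checkpoints: List[Checkpoint]) -> None:
        self.nodes = {c.name: c for c in checkpoints}

    def update(self, state: State) -> None:
        # Repeat until no new checkpoint completes, so that a chain of
        # checkpoints satisfied by the same state is resolved in one call.
        progressed = True
        while progressed:
            progressed = False
            for node in self.nodes.values():
                if node.done:
                    continue
                ready = all(self.nodes[r].done for r in node.requires)
                if ready and node.check(state):
                    node.done = True
                    progressed = True

    def completion_ratio(self) -> float:
        # Fraction of checkpoints reached; 0.0 for an empty graph.
        if not self.nodes:
            return 0.0
        return sum(n.done for n in self.nodes.values()) / len(self.nodes)


# A hypothetical cross-environment task: take a photo on Android,
# then make it appear as a file on the Ubuntu desktop.
evaluator = GraphEvaluator([
    Checkpoint("photo_taken", lambda s: s.get("android_photo") is True),
    Checkpoint("file_on_desktop", lambda s: s.get("ubuntu_file") is True,
               requires=["photo_taken"]),
])

evaluator.update({"ubuntu_file": True})   # prerequisite missing, nothing completes
partial = evaluator.completion_ratio()

evaluator.update({"android_photo": True, "ubuntu_file": True})
final = evaluator.completion_ratio()
```

In this sketch the score is the share of checkpoints reached, so an agent that finishes only part of a task still receives partial credit. Whether CRAB computes its metrics this way is an assumption here, not something stated above.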
Key Features:
Use Cases:
CRAB Benchmark-v0 includes 120 tasks across Ubuntu and Android environments, tested with 6 MLMs under 3 communication settings.