
CRAB

CRAB: A cross-environment agent benchmark for multimodal language model agents, featuring cross-environment support and a graph evaluator.

Introduction

CRAB: Cross-environment Agent Benchmark

CRAB is a benchmark framework designed for evaluating multimodal language model (MLM) agents across diverse environments. It offers tools for building agents, operating environments, and creating benchmarks.

Key Features:

  • Cross-environment Support: Agents operate across different interfaces, such as Ubuntu and Android, within the same benchmark.
  • Graph Evaluator: Checks intermediate progress toward a task, giving fine-grained performance analysis beyond a binary success rate.
  • Task Generation: Automates task creation with a graph-based method, producing tasks that mimic real-world scenarios.
  • Easy-to-Use: Python-based definitions for agent operations, observations, and benchmark evaluators.
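
The graph evaluator idea can be illustrated with a minimal sketch. It assumes a task is decomposed into checkpoint nodes in a directed acyclic graph, where a node only counts once all of its predecessors are satisfied, and it reports a completion ratio (completed nodes over total nodes) alongside binary success. The class name `GraphEvaluator`, its methods, and the dict-based environment state are hypothetical illustrations, not CRAB's actual API.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical: a checkpoint inspects an observed environment state (a plain dict here).
Checker = Callable[[dict], bool]


class GraphEvaluator:
    """Minimal DAG-based evaluator sketch (hypothetical, not CRAB's real API).

    Nodes are intermediate checkpoints; an edge (u, v) means v is only
    checked after u has already been satisfied.
    """

    def __init__(self, checkers: Dict[str, Checker], edges: List[Tuple[str, str]]):
        self.checkers = checkers
        self.preds: Dict[str, Set[str]] = {name: set() for name in checkers}
        for u, v in edges:
            self.preds[v].add(u)
        self.completed: Set[str] = set()

    def step(self, state: dict) -> None:
        # Check every pending node whose predecessors are all complete.
        # Repeat until nothing new completes, so a chain satisfied by one
        # state is credited in a single step.
        changed = True
        while changed:
            changed = False
            for name, check in self.checkers.items():
                if name in self.completed:
                    continue
                if self.preds[name] <= self.completed and check(state):
                    self.completed.add(name)
                    changed = True

    def completion_ratio(self) -> float:
        # Fraction of checkpoints reached so far.
        return len(self.completed) / len(self.checkers)

    def success(self) -> bool:
        # Binary success: every checkpoint reached.
        return len(self.completed) == len(self.checkers)


# Usage with a hypothetical phone-to-desktop task.
evaluator = GraphEvaluator(
    checkers={
        "file_created_on_phone": lambda s: s.get("phone_file", False),
        "file_on_desktop": lambda s: s.get("desktop_file", False),
        "file_opened": lambda s: s.get("opened", False),
    },
    edges=[("file_created_on_phone", "file_on_desktop"),
           ("file_on_desktop", "file_opened")],
)
evaluator.step({"phone_file": True})
print(round(evaluator.completion_ratio(), 2))  # 0.33
evaluator.step({"phone_file": True, "desktop_file": True})
print(round(evaluator.completion_ratio(), 2))  # 0.67
print(evaluator.success())  # False
```

The partial score (0.67 here) is what distinguishes an agent that got most of the way from one that did nothing, which a pass/fail metric would treat identically. Gating each node on its predecessors is one design choice in this sketch: it keeps a later checkpoint from being credited out of order.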

Use Cases:

  • Evaluating and comparing the performance of different MLM agents.
  • Developing agents that can operate across multiple platforms.
  • Analyzing agent strengths and weaknesses through detailed performance metrics.
  • Automating the creation of complex, real-world tasks for agent training and evaluation.

CRAB Benchmark-v0 includes 120 tasks across Ubuntu and Android environments, tested with 6 MLMs under 3 communication settings.
