Buttercup CRS: AI-Powered Vulnerability Discovery and Patching System

┌─────────────────────────────────────────────────────────────────┐
│ Analysis Summary                                                │
├─────────────────────────────────────────────────────────────────┤
│ Type: Project                                                   │
│ Purpose: AI-Powered Vulnerability Discovery and Patching System │
│ Primary Language: python + json + yaml                          │
│ LOC: 125K                                                       │
│ Test Files: 110                                                 │
│ Architecture: python                                            │
│ Confidence: High                                                │
└─────────────────────────────────────────────────────────────────┘

Analyzed: 008bb9cd from 2025-10-03

Buttercup is a Cyber Reasoning System (CRS) developed by Trail of Bits for the DARPA AI Cyber Challenge (AIxCC). The system automates the discovery and patching of software vulnerabilities in open-source C and Java repositories through AI/ML-assisted fuzzing campaigns built on OSS-Fuzz. When vulnerabilities are found, Buttercup analyzes them and uses a multi-agent, AI-driven patcher to repair them automatically.

The system consists of five core components: an Orchestrator that coordinates the overall workflow, a Seed Generator for creating fuzzing inputs, a Fuzzer for vulnerability discovery, a Program Model for code analysis, and a Patcher for generating security fixes.

Quick Start

git clone --recurse-submodules https://github.com/trailofbits/buttercup.git
cd buttercup
make setup-local
make deploy
make send-libpng-task

Time to first vulnerability scan: ~10 minutes (requires third-party AI API keys)

Alternative Approaches

Solution    Setup Complexity   AI Integration   Target Languages   Cost Model
Buttercup   High               Native LLM       C, Java            Pay-per-API-call
CodeQL      Medium             None             20+ languages      Enterprise license
Semgrep     Low                Limited          30+ languages      Freemium
Snyk        Low                Basic            10+ languages      SaaS subscription
SonarQube   Medium             None             25+ languages      Self-hosted/SaaS

Architecture and Implementation

The system uses a distributed architecture with Kubernetes orchestration. The competition API serves as the central coordination point, implementing a FastAPI-based REST interface with Pydantic models for type safety.

Task management follows a structured approach with clearly defined data models:

from typing import Any

from pydantic import BaseModel

class TaskInfo(BaseModel):
    task_id: str
    name: str | None = None
    project_name: str
    status: str  # active, expired
    duration: int
    deadline: str
    challenge_repo_url: str | None = None
    challenge_repo_head_ref: str | None = None
    challenge_repo_base_ref: str | None = None
    fuzz_tooling_url: str | None = None
    fuzz_tooling_ref: str | None = None
    povs: list[dict[str, Any]] = []
    patches: list[dict[str, Any]] = []
    bundles: list[dict[str, Any]] = []

File: orchestrator/src/buttercup/orchestrator/ui/competition_api/main.py:64-78
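As a usage sketch, a trimmed model in the same style shows the type safety the Pydantic layer provides; `TaskDemo` and its field values are illustrative, not part of Buttercup:

```python
# Illustrative subset of a task model like TaskInfo above (not the
# actual Buttercup class): Pydantic validates and coerces fields on
# construction, rejecting malformed payloads early.
from typing import Any

from pydantic import BaseModel

class TaskDemo(BaseModel):
    task_id: str
    project_name: str
    status: str  # active, expired
    duration: int
    deadline: str
    povs: list[dict[str, Any]] = []

task = TaskDemo(
    task_id="t1",
    project_name="libpng",
    status="active",
    duration="3600",  # numeric string is coerced to int
    deadline="2025-10-04T00:00:00Z",
)
```

Invalid payloads raise `pydantic.ValidationError`, which FastAPI translates into a 422 response for REST clients.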

The artifact management system uses a file-based approach with organized directory structures. The implementation handles different artifact types through a unified interface:

def save_artifact(
    task_id: str,
    artifact_type: str,
    artifact_id: str,
    content: str | dict,
    is_base64: bool = False,
) -> bool:
    """Save an artifact to the appropriate directory structure."""
    try:
        run_dir = get_run_data_dir()
        task_dir = run_dir / task_id / artifact_type
        task_dir.mkdir(parents=True, exist_ok=True)

        if artifact_type == "bundles":
            file_path = task_dir / f"{artifact_id}.json"
        # ... remaining artifact types are handled later in the source file

File: orchestrator/src/buttercup/orchestrator/ui/competition_api/main.py:158-172
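A self-contained sketch of the same directory layout (`<run_dir>/<task_id>/<artifact_type>/<artifact_id>`) shows how the unified interface could dispatch on content type; the helper name and the `.bin` extension are assumptions, not Buttercup's actual code:

```python
# Hedged sketch of file-based artifact storage as described above.
# Dict content is serialized as JSON; string content is written as
# bytes, optionally base64-decoded first.
from __future__ import annotations

import base64
import json
from pathlib import Path

def save_artifact_sketch(
    run_dir: Path,
    task_id: str,
    artifact_type: str,
    artifact_id: str,
    content: str | dict,
    is_base64: bool = False,
) -> Path:
    """Write one artifact under <run_dir>/<task_id>/<artifact_type>/."""
    task_dir = run_dir / task_id / artifact_type
    task_dir.mkdir(parents=True, exist_ok=True)
    if isinstance(content, dict):
        path = task_dir / f"{artifact_id}.json"
        path.write_text(json.dumps(content))
    else:
        data = base64.b64decode(content) if is_base64 else content.encode()
        path = task_dir / f"{artifact_id}.bin"
        path.write_bytes(data)
    return path
```

Keeping everything under one run directory makes cleanup and per-task inspection trivial, at the cost of requiring shared storage across components.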

The system implements singleton patterns for core services with lazy initialization:

def get_database_manager() -> DatabaseManager:
    """Get database manager singleton."""
    global _database_manager
    if _database_manager is None:
        settings = get_settings()
        _database_manager = DatabaseManager(settings.database_url)
    return _database_manager

File: orchestrator/src/buttercup/orchestrator/ui/competition_api/main.py:140-154
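The pattern can be reproduced in a few lines of standard-library Python; `DatabaseManager` here is a stand-in class, not Buttercup's, and the default URL is illustrative:

```python
# Stdlib sketch of the lazy-singleton pattern shown above: the service
# is created on first request and every later call returns the same
# instance.
from __future__ import annotations

class DatabaseManager:
    def __init__(self, url: str) -> None:
        self.url = url

_database_manager: DatabaseManager | None = None

def get_database_manager(url: str = "sqlite:///app.db") -> DatabaseManager:
    """Create the manager on first call, then reuse the cached instance."""
    global _database_manager
    if _database_manager is None:
        _database_manager = DatabaseManager(url)
    return _database_manager
```

Note that this check-then-set is not thread-safe; a multithreaded service would guard the `None` check with a lock or use a module-level initializer.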

Performance Characteristics

System Requirements:

  • CPU: 8 cores minimum
  • Memory: 16 GB RAM
  • Storage: 100 GB available space
  • Network: Stable internet for AI API calls

Component Distribution:

  • Total codebase: 124,911 lines across 544 files
  • Primary language: Python (73,185 lines)
  • Configuration: JSON (24,340 lines) + YAML (16,338 lines)
  • Test coverage: 110 test files

Runtime Dependencies:

  • Kubernetes cluster for orchestration
  • Redis for task registry and state management
  • Third-party LLM APIs (OpenAI, Anthropic, Google)
  • Docker containers for component isolation
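A Redis-backed task registry like the one listed above could store per-task state in a hash; the `task:<id>` key schema is an assumption, and `FakeRedis` is an in-memory stand-in so the sketch runs without a server (real code would call `redis.Redis` with the same `hset`/`hgetall` methods):

```python
# Hedged sketch of a Redis task registry. FakeRedis implements only the
# subset of the redis-py API used below.
class FakeRedis:
    """In-memory stand-in for redis.Redis (hash commands only)."""
    def __init__(self):
        self._data = {}
    def hset(self, key, mapping):
        self._data.setdefault(key, {}).update(mapping)
    def hgetall(self, key):
        return dict(self._data.get(key, {}))

def register_task(r, task_id: str, project: str) -> None:
    # One hash per task keeps fields independently updatable.
    r.hset(f"task:{task_id}", mapping={"project": project, "status": "active"})

def task_status(r, task_id: str) -> str:
    return r.hgetall(f"task:{task_id}").get("status", "unknown")
```
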

Best for: Organizations needing automated vulnerability discovery and patching for C/Java codebases with budget for AI API consumption.

Security Architecture

Credential Management:

  • API keys for third-party LLM services stored as Kubernetes secrets
  • Database credentials managed through environment variables
  • No hardcoded secrets in codebase
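In this arrangement, application code only ever sees environment variables, with Kubernetes mounting the secret values at runtime. A minimal sketch of that consumption side (the variable-naming convention `<PROVIDER>_API_KEY` is an assumption):

```python
# Sketch of the credential pattern above: keys arrive via environment
# variables populated from Kubernetes secrets, never from source code.
import os

def get_llm_api_key(provider: str) -> str:
    """Read the API key for a provider, failing loudly if unset."""
    key = os.environ.get(f"{provider.upper()}_API_KEY", "")
    if not key:
        raise RuntimeError(f"missing API key for {provider}")
    return key
```

Failing fast on a missing key surfaces misconfigured deployments at startup rather than mid-campaign.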

Network Security:

  • Component isolation through Kubernetes networking
  • External API calls limited to configured LLM providers
  • Web UI exposed through controlled port forwarding

Audit Logging:

  • Comprehensive logging through SigNoz deployment
  • LLM usage tracking via optional LangFuse integration
  • Task execution traces for compliance monitoring

Transport Security:

  • HTTPS for all external LLM API communications
  • Internal component communication over Kubernetes service mesh

When to Use Buttercup CRS

The evidence suggests this project fits well for:

  • Research organizations participating in AI security challenges requiring automated vulnerability discovery and patching capabilities
  • Security teams with budget for LLM API consumption who need to process C/Java codebases at scale
  • Organizations with Kubernetes infrastructure looking to integrate AI-driven security analysis into existing workflows

Consider alternatives when:

  • Working primarily with languages other than C and Java (limited target language support)
  • Operating under strict budget constraints (requires ongoing LLM API costs)
  • Needing immediate deployment without complex setup (high infrastructure requirements with Kubernetes dependency)