DeepResearch: Multi-Agent Web Research Framework with Reinforcement Learning
┌─────────────────────────────────────────────────────┐
│ Analysis Summary │
├─────────────────────────────────────────────────────┤
│ Type: Framework │
│ Primary Language: python + markdown + shell │
│ LOC: 13K │
│ Test Files: 0 │
│ Architecture: python │
│ Confidence: Medium │
└─────────────────────────────────────────────────────┘
Analyzed: 569126e3 from 2025-10-05
The DeepResearch project from Alibaba-NLP presents Tongyi DeepResearch, a 30.5 billion parameter large language model designed specifically for long-horizon, deep information-seeking tasks. The framework combines web browsing capabilities, document processing, and reinforcement learning to create an autonomous research agent that can navigate websites, extract information, and synthesize comprehensive responses.
The system demonstrates state-of-the-art performance across multiple agentic search benchmarks including Humanity's Last Exam, BrowseComp, WebWalkerQA, and SimpleQA. The architecture supports both ReAct inference for evaluating core model abilities and an "IterResearch-based Heavy mode" that uses test-time scaling strategies to maximize performance.
Quick Start
The project requires Python 3.10.0 and uses conda for environment management:
# Example with Conda
conda create -n react_infer_env python=3.10.0
conda activate react_infer_env
pip install -r requirements.txt
Configuration involves copying the example environment file and setting up API keys:
# Copy the example environment file
cp .env.example .env
Time to first result: ~10 minutes (after API key configuration)
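A minimal sketch of loading and sanity-checking that configuration with python-dotenv is shown below; the variable names are placeholders, and the authoritative list lives in .env.example:
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file created from .env.example

# Placeholder key names; consult .env.example for the exact variables the project expects.
for key in ("SERPER_API_KEY", "JINA_API_KEY", "OPENAI_API_KEY", "DASHSCOPE_API_KEY"):
    if not os.getenv(key):
        print(f"Warning: {key} is not set")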
Alternative Approaches
| Framework | Training | Inference | Deployment | Ecosystem |
|---|---|---|---|---|
| DeepResearch | Custom RL training | 30B-A3B model | Local/Cloud | Alibaba ecosystem |
| LangChain Agents | N/A (no training) | Varies by LLM | Easy | Large |
| AutoGPT | N/A (no training) | OpenAI API dependent | Medium | Medium |
| ReAct Framework | N/A (no training) | Model dependent | Easy | Growing |
| WebVoyager | Research-focused | Model dependent | Complex | Academic |
Architecture and Implementation
The codebase reveals a sophisticated multi-component architecture centered around web interaction and content processing. The core web navigation functionality resides in WebAgent/WebWalker/src/app.py, which implements link extraction and page visitation tools.
The extract_links_with_text function (lines 56-70) demonstrates the HTML parsing approach:
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def extract_links_with_text(html):
    """
    Args:
        html (str): html content
    Returns:
        str: clickable buttons
    """
    # The crawl root is persisted to disk and reloaded on each call.
    with open("ROOT_URL.txt", "r") as f:
        ROOT_URL = f.read()
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for a_tag in soup.find_all('a', href=True):
        url = a_tag['href']
        text = a_tag.get_text(strip=True)
        # The excerpt is truncated here; the completion below is illustrative,
        # resolving relative links against the stored root and returning the
        # collected anchors as a single string.
        if text:
            links.append(f"[{text}]({urljoin(ROOT_URL, url)})")
    return "\n".join(links)
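A quick way to exercise the helper (the root URL and HTML fragment below are placeholders, not values from the repository):
# Write a placeholder crawl root, since the helper reloads it from disk on every call.
with open("ROOT_URL.txt", "w") as f:
    f.write("https://example.com")

sample_html = '<a href="/docs">Docs</a> <a href="/blog">Blog</a>'
print(extract_links_with_text(sample_html))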
The VisitPage class (lines 211-225) extends BaseTool to provide webpage analysis capabilities:
class VisitPage(BaseTool):
    """
    description: A tool that visits a webpage and extracts the content of the page.
    parameters:
    - name: url
      type: string
      description: The URL of the webpage to visit.
      required: true
    """
    description = 'A tool analyzes the content of a webpage and extracts buttons associated with sublinks. Simply input the button which you want to explore, and the tool will return both the markdown-formatted content of the corresponding page of button and a list of new clickable buttons found on the new page.'
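The excerpt shows only the tool's metadata. The general contract such tool classes expose to the agent (a name, a natural-language description, a parameter schema, and a method invoked with JSON-encoded arguments) can be illustrated with a toy stand-in; this is a sketch of the pattern, not code from the repository:
import json

class ToyVisitTool:
    """Illustrative stand-in for the tool contract assumed above."""
    name = 'visit_page'
    description = 'Visit a page and return its content.'
    parameters = {
        "type": "object",
        "properties": {"url": {"type": "string"}},
        "required": ["url"],
    }

    def call(self, params: str) -> str:
        args = json.loads(params)  # the agent passes JSON-encoded arguments
        return f"Visited {args['url']}"  # a real tool would fetch and summarize the page

# The agent selects a tool by name and invokes it with serialized arguments.
print(ToyVisitTool().call(json.dumps({"url": "https://example.com"})))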
The web content summarization component in WebAgent/WebResummer/src/tool_visit.py implements a robust content processing pipeline. The Visit class (lines 31-45) handles both single URLs and URL arrays:
class Visit(BaseTool):
    # The `description` tells the agent the functionality of this tool.
    name = 'visit'
    description = 'Visit webpage(s) and return the summary of the content.'
    # The `parameters` tell the agent what input parameters the tool has.
    parameters = {
        "type": "object",
        "properties": {
            "url": {
                "type": ["string", "array"],
                "items": {
                    "type": "string"
                },
                "minItems": 1,
                "description": "The URL(s) of the webpage(s) to visit. Can be a single URL or an array of URLs."
            }
        },
        "required": ["url"],  # remainder of the schema restored for readability; the excerpt is truncated here
    }
The system implements token-aware content truncation using tiktoken (lines 19-33):
import tiktoken

def truncate_to_tokens(text: str, max_tokens: int = 95000) -> str:
    # Tokenize with the cl100k_base encoding and cut off anything past the budget.
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)
    if len(tokens) <= max_tokens:
        return text
    truncated_tokens = tokens[:max_tokens]
    return encoding.decode(truncated_tokens)
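A quick check of the behavior (the tiny budget here is only for illustration; the repository's default is 95,000 tokens):
long_text = "lorem ipsum " * 10_000
short_text = truncate_to_tokens(long_text, max_tokens=100)

encoding = tiktoken.get_encoding("cl100k_base")
print(len(encoding.encode(long_text)))   # well above the 100-token budget
print(len(encoding.encode(short_text)))  # at most 100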
Performance Characteristics
Bundle Impact:
- Total dependencies: 188 packages
- Core codebase: 12,937 lines across 77 files
- Python components: 9,873 lines
- Documentation: 2,596 lines of Markdown
Model Specifications:
- Model size: 30.5B total parameters (3.3B activated per token)
- Context length: 128K tokens
- Architecture: Mixture-of-experts (MoE) design
Other Considerations:
- Runtime dependencies: 188 packages including ML frameworks
- Test coverage: 0 test files identified
- Example implementations: 18 example files
- Build tooling: Shell scripts for deployment
Best for: Long-horizon research tasks requiring web navigation and multi-source information synthesis
Model Performance and Requirements
The model architecture uses a 30.5B parameter design with sparse activation (3.3B parameters per token), optimizing for both capability and efficiency. The system supports 128K token context length, enabling processing of extensive web content and research materials.
Training employs end-to-end reinforcement learning with Group Relative Policy Optimization (GRPO), featuring token-level policy gradients and leave-one-out advantage estimation. The training pipeline includes synthetic data generation for agentic pre-training, supervised fine-tuning, and reinforcement learning phases.
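As a rough illustration of the leave-one-out estimator (a generic sketch, not the project's training code): for a group of rollouts answering the same query, each trajectory's advantage is its reward minus the mean reward of the remaining rollouts in the group.
from typing import List

def leave_one_out_advantages(rewards: List[float]) -> List[float]:
    """Advantage of rollout i = r_i - mean(rewards of the other rollouts)."""
    n = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

# Four rollouts for the same research query, scored by an outcome reward.
print(leave_one_out_advantages([1.0, 0.0, 0.0, 1.0]))  # [0.67, -0.67, -0.67, 0.67] approximately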
Inference requirements include GPU memory for the 30B model, though the sparse activation reduces computational demands. The system supports both local deployment and cloud services through Alibaba’s Bailian platform. Response generation involves multiple API calls for web scraping (Jina, Serper), content summarization (OpenAI-compatible), and document processing (Dashscope).
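For context, a web-search call of the kind this pipeline depends on might look like the following sketch against the public Serper API; the environment variable name and error handling are assumptions, not taken from the repository:
import os
import requests

def serper_search(query: str) -> dict:
    # google.serper.dev expects the API key in the X-API-KEY header
    # and the query as a JSON payload.
    resp = requests.post(
        "https://google.serper.dev/search",
        headers={"X-API-KEY": os.environ["SERPER_API_KEY"], "Content-Type": "application/json"},
        json={"q": query},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()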
The framework demonstrates superior performance on research-oriented benchmarks, with the “Heavy mode” providing additional test-time scaling for maximum performance on complex queries.
When to Use DeepResearch
The evidence suggests this project fits well for:
- Academic research requiring comprehensive web-based information gathering across multiple sources
- Enterprise applications needing autonomous document analysis and synthesis capabilities
- Long-horizon question-answering tasks that require multi-step reasoning and evidence collection
Consider alternatives when:
- Simple web scraping tasks that don’t require advanced reasoning or synthesis
- Applications with strict latency requirements where the 30B model overhead is prohibitive
- Scenarios requiring extensive customization of the underlying language model architecture