DeepResearch: Multi-Agent Web Research Framework with Reinforcement Learning
┌─────────────────────────────────────────────────────┐
│ Analysis Summary │
├─────────────────────────────────────────────────────┤
│ Type: Framework │
│ Primary Language: python + markdown + shell │
│ LOC: 13K │
│ Test Files: 0 │
│ Architecture: python │
│ Confidence: Medium │
└─────────────────────────────────────────────────────┘
Analyzed: 569126e3 from 2025-10-05
The DeepResearch project from Alibaba-NLP presents Tongyi DeepResearch, a 30.5 billion parameter large language model designed specifically for long-horizon, deep information-seeking tasks. The framework combines web browsing capabilities, document processing, and reinforcement learning to create an autonomous research agent that can navigate websites, extract information, and synthesize comprehensive responses.
The system demonstrates state-of-the-art performance across multiple agentic search benchmarks including Humanity's Last Exam, BrowseComp, WebWalkerQA, and SimpleQA. The architecture supports both ReAct inference for evaluating core model abilities and an "IterResearch-based Heavy mode" that uses test-time scaling strategies to maximize performance.
Quick Start
The project requires Python 3.10.0 and uses conda for environment management:
# Example with Conda
conda create -n react_infer_env python=3.10.0
conda activate react_infer_env
pip install -r requirements.txt
Configuration involves copying the example environment file and setting up API keys:
# Copy the example environment file
cp .env.example .env
Time to first result: ~10 minutes (after API key configuration)
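A minimal sketch of loading and sanity-checking that configuration with python-dotenv is shown below; the variable names are placeholders, and the authoritative list lives in .env.example:
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file created from .env.example

# Placeholder key names; consult .env.example for the exact variables the project expects.
for key in ("SERPER_API_KEY", "JINA_API_KEY", "OPENAI_API_KEY", "DASHSCOPE_API_KEY"):
    if not os.getenv(key):
        print(f"Warning: {key} is not set")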
Alternative Approaches
| Framework | Training | Inference | Deployment | Ecosystem |
|---|---|---|---|---|
| DeepResearch | Custom RL training | 30B-A3B model | Local/Cloud | Alibaba ecosystem |
| LangChain Agents | N/A (no training) | Varies by LLM | Easy | Large |
| AutoGPT | N/A (no training) | OpenAI API dependent | Medium | Medium |
| ReAct Framework | N/A (no training) | Model dependent | Easy | Growing |
| WebVoyager | Research-focused | Model dependent | Complex | Academic |
Architecture and Implementation
The codebase reveals a sophisticated multi-component architecture centered around web interaction and content processing. The core web navigation functionality resides in WebAgent/WebWalker/src/app.py, which implements link extraction and page visitation tools.
The extract_links_with_text function (lines 56-70) demonstrates the HTML parsing approach:
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def extract_links_with_text(html):
    """
    Args:
        html (str): html content
    Returns:
        str: clickable buttons
    """
    # The crawl root is persisted to disk and reloaded on each call.
    with open("ROOT_URL.txt", "r") as f:
        ROOT_URL = f.read()
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for a_tag in soup.find_all('a', href=True):
        url = a_tag['href']
        text = a_tag.get_text(strip=True)
        # The excerpt is truncated here; the completion below is illustrative,
        # resolving relative links against the stored root and returning the
        # collected anchors as a single string.
        if text:
            links.append(f"[{text}]({urljoin(ROOT_URL, url)})")
    return "\n".join(links)
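A quick way to exercise the helper (the root URL and HTML fragment below are placeholders, not values from the repository):
# Write a placeholder crawl root, since the helper reloads it from disk on every call.
with open("ROOT_URL.txt", "w") as f:
    f.write("https://example.com")

sample_html = '<a href="/docs">Docs</a> <a href="/blog">Blog</a>'
print(extract_links_with_text(sample_html))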
The VisitPage class (lines 211-225) extends BaseTool to provide webpage analysis capabilities:
class VisitPage(BaseTool):
    """
    description: A tool that visits a webpage and extracts the content of the page.
    parameters:
    - name: url
      type: string
      description: The URL of the webpage to visit.
      required: true
    """
    description = 'A tool analyzes the content of a webpage and extracts buttons associated with sublinks. Simply input the button which you want to explore, and the tool will return both the markdown-formatted content of the corresponding page of button and a list of new clickable buttons found on the new page.'
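The excerpt shows only the tool's metadata. The general contract such tool classes expose to the agent (a name, a natural-language description, a parameter schema, and a method invoked with JSON-encoded arguments) can be illustrated with a toy stand-in; this is a sketch of the pattern, not code from the repository:
import json

class ToyVisitTool:
    """Illustrative stand-in for the tool contract assumed above."""
    name = 'visit_page'
    description = 'Visit a page and return its content.'
    parameters = {
        "type": "object",
        "properties": {"url": {"type": "string"}},
        "required": ["url"],
    }

    def call(self, params: str) -> str:
        args = json.loads(params)  # the agent passes JSON-encoded arguments
        return f"Visited {args['url']}"  # a real tool would fetch and summarize the page

# The agent selects a tool by name and invokes it with serialized arguments.
print(ToyVisitTool().call(json.dumps({"url": "https://example.com"})))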
The web content summarization component in WebAgent/WebResummer/src/tool_visit.py implements a robust content processing pipeline. The Visit class (lines 31-45) handles both single URLs and URL arrays:
class Visit(BaseTool):
    # The `description` tells the agent the functionality of this tool.
    name = 'visit'
    description = 'Visit webpage(s) and return the summary of the content.'
    # The `parameters` tell the agent what input parameters the tool has.
    parameters = {
        "type": "object",
        "properties": {
            "url": {
                "type": ["string", "array"],
                "items": {
                    "type": "string"
                },
                "minItems": 1,
                "description": "The URL(s) of the webpage(s) to visit. Can be a single URL or an array of URLs."
            }
        },
        "required": ["url"],  # remainder of the schema restored for readability; the excerpt is truncated here
    }
The system implements token-aware content truncation using tiktoken (lines 19-33):
import tiktoken

def truncate_to_tokens(text: str, max_tokens: int = 95000) -> str:
    # Tokenize with the cl100k_base encoding and cut off anything past the budget.
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)
    if len(tokens) <= max_tokens:
        return text
    truncated_tokens = tokens[:max_tokens]
    return encoding.decode(truncated_tokens)
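A quick check of the behavior (the tiny budget here is only for illustration; the repository's default is 95,000 tokens):
long_text = "lorem ipsum " * 10_000
short_text = truncate_to_tokens(long_text, max_tokens=100)

encoding = tiktoken.get_encoding("cl100k_base")
print(len(encoding.encode(long_text)))   # well above the 100-token budget
print(len(encoding.encode(short_text)))  # at most 100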
Performance Characteristics
Bundle Impact:
- Total dependencies: 188 packages
- Core codebase: 12,937 lines across 77 files
- Python components: 9,873 lines
- Documentation: 2,596 lines of Markdown
Model Specifications:
- Model size: 30.5B total parameters (3.3B activated per token)
- Context length: 128K tokens
- Architecture: Mixture-of-experts (MoE) design
Other Considerations:
- Runtime dependencies: 188 packages including ML frameworks
- Test coverage: 0 test files identified
- Example implementations: 18 example files
- Build tooling: Shell scripts for deployment
Best for: Long-horizon research tasks requiring web navigation and multi-source information synthesis
Model Performance and Requirements
The model architecture uses a 30.5B parameter design with sparse activation (3.3B parameters per token), optimizing for both capability and efficiency. The system supports 128K token context length, enabling processing of extensive web content and research materials.
Training employs end-to-end reinforcement learning with Group Relative Policy Optimization (GRPO), featuring token-level policy gradients and leave-one-out advantage estimation. The training pipeline includes synthetic data generation for agentic pre-training, supervised fine-tuning, and reinforcement learning phases.
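As a rough illustration of the leave-one-out estimator (a generic sketch, not the project's training code): for a group of rollouts answering the same query, each trajectory's advantage is its reward minus the mean reward of the remaining rollouts in the group.
from typing import List

def leave_one_out_advantages(rewards: List[float]) -> List[float]:
    """Advantage of rollout i = r_i - mean(rewards of the other rollouts)."""
    n = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

# Four rollouts for the same research query, scored by an outcome reward.
print(leave_one_out_advantages([1.0, 0.0, 0.0, 1.0]))  # [0.67, -0.67, -0.67, 0.67] approximately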
Inference requirements include GPU memory for the 30B model, though the sparse activation reduces computational demands. The system supports both local deployment and cloud services through Alibaba’s Bailian platform. Response generation involves multiple API calls for web scraping (Jina, Serper), content summarization (OpenAI-compatible), and document processing (Dashscope).
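For context, a web-search call of the kind this pipeline depends on might look like the following sketch against the public Serper API; the environment variable name and error handling are assumptions, not taken from the repository:
import os
import requests

def serper_search(query: str) -> dict:
    # google.serper.dev expects the API key in the X-API-KEY header
    # and the query as a JSON payload.
    resp = requests.post(
        "https://google.serper.dev/search",
        headers={"X-API-KEY": os.environ["SERPER_API_KEY"], "Content-Type": "application/json"},
        json={"q": query},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()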
The framework demonstrates superior performance on research-oriented benchmarks, with the “Heavy mode” providing additional test-time scaling for maximum performance on complex queries.
When to Use DeepResearch
The evidence suggests this project fits well for:
- Academic research requiring comprehensive web-based information gathering across multiple sources
- Enterprise applications needing autonomous document analysis and synthesis capabilities
- Long-horizon question-answering tasks that require multi-step reasoning and evidence collection
Consider alternatives when:
- Simple web scraping tasks that don’t require advanced reasoning or synthesis
- Applications with strict latency requirements where the 30B model overhead is prohibitive
- Scenarios requiring extensive customization of the underlying language model architecture