OpenManus Architecture

Introduction to OpenManus

OpenManus is an open-source project that aims to replicate the capabilities of Manus AI, a general-purpose AI agent system. It uses a modular, containerized framework built with Docker, Python, and JavaScript to create a multi-agent system that can carry out complex tasks autonomously.

It can handle tasks ranging from personalized travel planning to stock analysis by coordinating a team of AI agents, each responsible for part of the problem.

Python 3.9+ | JavaScript ES6+ | Docker | Open Source | Multi-Agent Architecture

Key Features

Multi-Agent System: Collaborative AI agents working together to solve complex tasks
Dockerized Environment: Easy setup and deployment with containerization
Task Execution: Supports tasks like travel planning, data analysis, and content generation
Tool Integration: Web browsing, code execution, and data retrieval capabilities
Modular Design: Easily extendable with new agents, tools, or features

Learning Journey Overview

This guide takes you through a progressive learning journey to understand OpenManus:

Understanding Multi-Agent System Architecture
Exploring Different Agent Types
Diving into the Workflow System
Learning about Tool Integration
Putting it All Together - How Components Work in Harmony

1. Multi-Agent System Architecture

At its core, OpenManus is built on a multi-agent architecture where specialized AI agents collaborate to solve complex tasks. This modular design enables high code reusability, strong extensibility, and clear separation of responsibilities.

OpenManus Component Architecture

Core Components

Agent Layer

The brain of OpenManus, consisting of specialized AI agents that handle different aspects of task execution. Agents are organized in a hierarchy, from basic agents to specialized ones.

LLM Component

Handles interactions with large language models, serving as the intelligence engine that powers decision-making, content generation, and understanding.

Memory Component

Stores and manages conversation history and context, ensuring coherent and contextually relevant interactions across multiple exchanges.
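
To make the role of the memory component concrete, here is a minimal, dependency-free sketch of a bounded message store. The SimpleMemory and ChatMessage names, the max_messages limit, and the trimming policy are assumptions made for illustration; they are not taken from the OpenManus source.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChatMessage:
    """One conversation turn (hypothetical stand-in for the project's message type)."""
    role: str      # e.g. "system", "user", "assistant", "tool"
    content: str


@dataclass
class SimpleMemory:
    """Keeps the most recent messages so later steps can see earlier context."""
    max_messages: int = 100  # assumed limit, chosen for the example
    messages: List[ChatMessage] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append(ChatMessage(role, content))
        # Drop the oldest entries once the limit is exceeded.
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def recent(self, n: int) -> List[ChatMessage]:
        return self.messages[-n:] if n > 0 else []

    def to_dicts(self) -> List[Dict[str, str]]:
        """Render history in the role/content shape that chat APIs commonly accept."""
        return [{"role": m.role, "content": m.content} for m in self.messages]


memory = SimpleMemory(max_messages=2)
memory.add("user", "Plan a trip to Tokyo")
memory.add("assistant", "Which dates?")
memory.add("user", "Next week")
```

With a limit of two, the first message is trimmed and only the last two exchanges remain available as context.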

Tools Component

Provides interfaces for agents to interact with external systems and perform actions like web browsing, code execution, and data retrieval.

Flow Component

Manages the workflows and execution patterns, coordinating how multiple agents collaborate to solve complex tasks.

Prompt Component

Defines the behavior patterns and guidelines for agents, shaping how they respond to tasks and make decisions.

Project Structure

OpenManus Project Structure

OpenManus/
├── docker/                  # Docker configurations
│   ├── frontend/            # Next.js frontend container
│   │   └── Dockerfile       # Frontend container configuration
│   └── unified/             # Backend container configuration
│       ├── Dockerfile       # Backend container configuration
│       └── start.sh         # Container startup script
├── src/                     # Source code
│   ├── agents/              # Multi-agent logic (Python)
│   │   ├── nodes/           # Agent node implementations
│   │   ├── browser_agent.py
│   │   ├── coder_agent.py
│   │   ├── coordinator.py
│   │   ├── reporter_agent.py
│   │   └── research_agent.py
│   ├── components/          # React components
│   ├── config/              # Configuration files
│   ├── graph/               # Graph-based workflow
│   ├── llms/                # LLM integrations
│   ├── pages/               # Next.js pages
│   ├── prompts/             # Agent prompts
│   ├── service/             # Backend services
│   ├── tools/               # Tool implementations
│   ├── utils/               # Utility functions
│   ├── workflow/            # Workflow management
│   ├── client.py            # CLI client for testing
│   └── server.py            # FastAPI server
├── docs/                    # Documentation and API specs
├── package.json             # Next.js frontend dependencies
├── next.config.js           # Next.js configuration
├── docker-compose.yml       # Docker Compose configuration
└── README.md                # Main documentation file

2. Different Agent Types

OpenManus implements a hierarchical agent structure, with each agent type building upon the capabilities of the previous one. This modular approach allows for specialized agents that excel at specific tasks while sharing common functionality.

Agent Hierarchy

BaseAgent → ReActAgent → ToolCallAgent → PlanningAgent → Manus

BaseAgent

BaseAgent is the foundation of the entire agent framework, defining the core attributes and methods that all agents share. It handles basic state management, memory operations, and the execution lifecycle.

BaseAgent Implementation

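from abc import ABC
from typing import Optional

from pydantic import BaseModel, Field

# LLM, Memory, and AgentState are OpenManus project types defined elsewhere.

class BaseAgent(BaseModel, ABC):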
    """Abstract base class for managing agent state and execution."""
    # Core attributes
    name: str = Field(..., description="Unique name of the agent")
    description: Optional[str] = Field(None, description="Optional agent description")

    # Prompts
    system_prompt: Optional[str] = Field(None, description="System-level instruction prompt")
    next_step_prompt: Optional[str] = Field(None, description="Prompt for determining next action")

    # Dependent components
    llm: LLM = Field(default_factory=LLM, description="Language model instance")
    memory: Memory = Field(default_factory=Memory, description="Agent's memory store")
    state: AgentState = Field(default=AgentState.IDLE, description="Current agent state")

    # Execution control
    max_steps: int = Field(default=10, description="Maximum steps before termination")
    current_step: int = Field(default=0, description="Current step in execution")

Key Responsibilities:

Managing agent state (idle, thinking, acting, etc.)
Storing and retrieving messages from memory
Handling basic execution lifecycle
Providing core attributes that all agents need
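
The execution lifecycle described above can be sketched as a loop that moves the agent through its states and stops once max_steps is reached. This is a simplified illustration: the MiniAgent class, the AgentState members other than IDLE, and the run method are assumptions, and the real BaseAgent may differ.

```python
import asyncio
from enum import Enum
from typing import List


class AgentState(str, Enum):
    IDLE = "idle"
    RUNNING = "running"
    FINISHED = "finished"


class MiniAgent:
    """Hypothetical, dependency-free stand-in for an agent with a step budget."""

    def __init__(self, name: str, max_steps: int = 10) -> None:
        self.name = name
        self.max_steps = max_steps
        self.current_step = 0
        self.state = AgentState.IDLE

    async def step(self) -> str:
        # A real agent would consult the LLM here; this one finishes on step 3.
        if self.current_step == 3:
            self.state = AgentState.FINISHED
        return f"step {self.current_step} done"

    async def run(self) -> List[str]:
        if self.state is not AgentState.IDLE:
            raise RuntimeError(f"cannot start from state {self.state.value}")
        self.state = AgentState.RUNNING
        self.current_step = 0
        results: List[str] = []
        try:
            # Stop when the step budget is spent or the agent marks itself finished.
            while self.current_step < self.max_steps and self.state is AgentState.RUNNING:
                self.current_step += 1
                results.append(await self.step())
        finally:
            # Return to IDLE even if a step raised, so the agent can be run again.
            self.state = AgentState.IDLE
        return results


results = asyncio.run(MiniAgent("demo").run())
capped = asyncio.run(MiniAgent("capped", max_steps=2).run())
```

The step budget acts as a safety net: an agent that never decides it is finished still terminates after max_steps iterations.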

ReActAgent

ReActAgent extends BaseAgent by implementing the ReAct (reason-and-act) pattern, which this guide calls "Think-Act". Each step is split into two phases: a thinking phase that decides what to do, and an action phase that carries out the decision.

ReActAgent Implementation

from abc import ABC, abstractmethod

class ReActAgent(BaseAgent, ABC):
    @abstractmethod
    async def think(self) -> bool:
        """Process the current state and decide the next action."""

    @abstractmethod
    async def act(self) -> str:
        """Execute the decided actions."""

    async def step(self) -> str:
        """Execute a single step: think and act."""
        should_act = await self.think()
        if not should_act:
            return "Thinking complete - no action needed"
        return await self.act()

Key Responsibilities:

Implementing the Think-Act pattern for decision making
Separating the reasoning process from action execution
Providing a step method that orchestrates a think-act cycle
Enabling more sophisticated agent behavior through deliberation
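
As a concrete illustration of the think-act cycle, the sketch below subclasses a stripped-down stand-in for ReActAgent. MiniReActAgent and CountdownAgent are hypothetical names, and the counter stands in for a decision that a real agent would obtain from the LLM.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List


class MiniReActAgent(ABC):
    """Simplified stand-in exposing the same think/act/step methods."""

    @abstractmethod
    async def think(self) -> bool:
        """Return True when an action should follow."""

    @abstractmethod
    async def act(self) -> str:
        """Carry out the action chosen during think()."""

    async def step(self) -> str:
        if not await self.think():
            return "Thinking complete - no action needed"
        return await self.act()


class CountdownAgent(MiniReActAgent):
    """Acts while its counter is positive, then reports nothing left to do."""

    def __init__(self, remaining: int) -> None:
        self.remaining = remaining

    async def think(self) -> bool:
        # Reasoning phase: decide whether any action is required.
        return self.remaining > 0

    async def act(self) -> str:
        # Action phase: carry out the decision made in think().
        self.remaining -= 1
        return f"acted, {self.remaining} left"


async def demo() -> List[str]:
    agent = CountdownAgent(remaining=2)
    return [await agent.step() for _ in range(3)]


outputs = asyncio.run(demo())
```

The third call to step returns the "no action needed" message because think reports that nothing remains to be done.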

ToolCallAgent

ToolCallAgent extends ReActAgent by adding the ability to interact with external tools and APIs. This enables the agent to perform actions like web browsing, code execution, and data retrieval.

ToolCallAgent Implementation

class ToolCallAgent(ReActAgent):
    """Base agent class for handling tool/function calls with enhanced abstraction"""

    available_tools: ToolCollection = ToolCollection(
        CreateChatCompletion(), Terminate()
    )
    tool_choices: Literal["none", "auto", "required"] = "auto"

    async def think(self) -> bool:
        # Get the LLM response and tool selection
        response = await self.llm.ask_tool(
            messages=self.messages,
            system_msgs=[Message.system_message(self.system_prompt)]
            if self.system_prompt
            else None,
            tools=self.available_tools.to_params(),
            tool_choice=self.tool_choices,
        )
        self.tool_calls = response.tool_calls

        # Process the response and tool calls
        # ...

    async def act(self) -> str:
        # Execute tool calls
        results = []
        for command in self.tool_calls:
            result = await self.execute_tool(command)
            # Add tool response to memory
            # ...
            results.append(result)

        return "\n\n".join(results)

Key Responsibilities:

Managing available tools and their parameters
Interpreting tool calls from LLM responses
Executing tool operations and handling results
Providing a bridge between the agent's reasoning and external actions
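
The listing above calls execute_tool without showing it. One plausible shape, sketched here under assumptions, looks the tool up by name, decodes the JSON arguments supplied by the model, and converts failures into error strings so that the agent loop keeps running. The ToolCall dataclass, the TOOLS registry, and the add tool are hypothetical and exist only for this example.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict


@dataclass
class ToolCall:
    """Hypothetical record of one tool request from the model."""
    name: str
    arguments: str  # JSON-encoded keyword arguments


async def add(a: float, b: float) -> str:
    return str(a + b)


# Hypothetical registry mapping tool names to async callables.
TOOLS: Dict[str, Callable[..., Awaitable[str]]] = {"add": add}


async def execute_tool(call: ToolCall) -> str:
    tool = TOOLS.get(call.name)
    if tool is None:
        return f"Error: unknown tool '{call.name}'"
    try:
        kwargs = json.loads(call.arguments or "{}")
    except json.JSONDecodeError:
        return f"Error: invalid JSON arguments for '{call.name}'"
    try:
        return await tool(**kwargs)
    except Exception as exc:  # report the failure instead of crashing the loop
        return f"Error: {call.name} failed: {exc}"


ok = asyncio.run(execute_tool(ToolCall("add", '{"a": 2, "b": 3}')))
missing = asyncio.run(execute_tool(ToolCall("search", "{}")))
bad = asyncio.run(execute_tool(ToolCall("add", "{not json")))
```

Returning errors as text rather than raising lets the model see what went wrong on its next think step and choose a different action.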

PlanningAgent

PlanningAgent extends ToolCallAgent by adding planning capabilities, allowing it to break down complex tasks into manageable steps and track progress through the execution of a plan.

PlanningAgent Implementation

class PlanningAgent(ToolCallAgent):
    """
    An agent that creates and manages plans to solve tasks.
    This agent uses a planning tool to create and manage structured plans,
    and tracks progress through individual steps until task completion.
    """
    name: str = "planning"
    description: str = "An agent that creates and manages plans to solve tasks"
    system_prompt: str = PLANNING_SYSTEM_PROMPT
    next_step_prompt: str = NEXT_STEP_PROMPT
    available_tools: ToolCollection = Field(
        default_factory=lambda: ToolCollection(PlanningTool(), Terminate())
    )

    # Step execution tracker
    step_execution_tracker: Dict[str, Dict] = Field(default_factory=dict)
    current_step_index: Optional[int] = None

    async def think(self) -> bool:
        """Decide the next action based on plan status."""
        prompt = (
            f"CURRENT PLAN STATUS:\n{await self.get_plan()}\n\n{self.next_step_prompt}"
            if self.active_plan_id
            else self.next_step_prompt
        )
        self.messages.append(Message.user_message(prompt))

        # Get the current step index
        self.current_step_index = await self._get_current_step_index()
        result = await super().think()
        # Associate tool calls with the current step
        if result and self.tool_calls:
            # ...association logic...
            pass  # placeholder so the elided block remains valid Python
        return result

Key Responsibilities:

Creating and managing plans for complex tasks
Breaking tasks into logical, sequential steps
Tracking progress through plan execution
Providing status updates and adapting plans as needed
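
Progress tracking of this kind needs little more than an ordered list of steps with one status per step. The sketch below is an assumption about what such a structure could look like; the Plan class, its status strings, and the render format are hypothetical and are not the PlanningTool API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Plan:
    """Hypothetical plan record: ordered steps plus one status per step."""
    title: str
    steps: List[str]
    statuses: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.statuses:
            self.statuses = ["not_started"] * len(self.steps)

    def current_step_index(self) -> Optional[int]:
        """Index of the first step that is not completed, or None when done."""
        for i, status in enumerate(self.statuses):
            if status != "completed":
                return i
        return None

    def mark(self, index: int, status: str) -> None:
        self.statuses[index] = status

    def render(self) -> str:
        """Text summary suitable for inclusion in a prompt."""
        marks = {"completed": "[x]", "in_progress": "[>]", "not_started": "[ ]"}
        lines = [
            f"{marks.get(status, '[?]')} {i}. {step}"
            for i, (step, status) in enumerate(zip(self.steps, self.statuses))
        ]
        return "\n".join(lines)


plan = Plan("Tokyo trip", ["Research attractions", "Check prices", "Write itinerary"])
plan.mark(0, "completed")
plan.mark(1, "in_progress")
current = plan.current_step_index()
summary = plan.render()
```

A rendered summary like this is the sort of text that could fill the "CURRENT PLAN STATUS" slot in the prompt built by think.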

Manus

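Manus is the flagship agent of OpenManus. It combines the capabilities of the agent types above with additional specialized tools to act as a versatile, general-purpose AI assistant. Note that the class shown here subclasses ToolCallAgent directly, even though its docstring describes it as extending PlanningAgent.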

Manus Implementation

class Manus(ToolCallAgent):
    """
    A versatile general-purpose agent that uses planning to solve various tasks.
    This agent extends PlanningAgent with a comprehensive set of tools and capabilities,
    including Python execution, web browsing, file operations, and information retrieval
    to handle a wide range of user requests.
    """
    name: str = "manus"
    description: str = "A versatile general-purpose agent"
    system_prompt: str = SYSTEM_PROMPT
    next_step_prompt: str = NEXT_STEP_PROMPT
    available_tools: ToolCollection = Field(
        default_factory=lambda: ToolCollection(
            PythonExecute(), GoogleSearch(), BrowserUseTool(), FileSaver(), Terminate()
        )
    )

Key Responsibilities:

Providing a comprehensive set of tools for general-purpose use
Handling a wide range of user requests and tasks
Combining planning, execution, and specialized capabilities
Serving as the primary interface for end-user interactions

Comparing Agent Capabilities

Agent Type	Basic State Management	Think-Act Pattern	Tool Usage	Planning	Specialized Capabilities
BaseAgent	✅	❌	❌	❌	❌
ReActAgent	✅	✅	❌	❌	❌
ToolCallAgent	✅	✅	✅	❌	❌
PlanningAgent	✅	✅	✅	✅	❌
Manus	✅	✅	✅	✅	✅

3. Workflow System

OpenManus's workflow system orchestrates how agents collaborate to solve complex tasks. The Flow component manages these workflows, determining which agents handle which parts of a task and how their results are integrated.

Workflow Execution

User Input → Coordinator → Research Agent / Planning Agent / Browser Agent / Coder Agent → Reporter Agent → Final Output

Flow Components

BaseFlow Implementation

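from abc import ABC, abstractmethod
from typing import Dict, List, Optional

from pydantic import BaseModel

class BaseFlow(BaseModel, ABC):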
    """Base class for execution flows supporting multiple agents"""

    agents: Dict[str, BaseAgent]
    tools: Optional[List] = None
    primary_agent_key: Optional[str] = None

    @property
    def primary_agent(self) -> Optional[BaseAgent]:
        """Get the primary agent for the flow"""
        return self.agents.get(self.primary_agent_key)

    @abstractmethod
    async def execute(self, input_text: str) -> str:
        """Execute the flow with the given input"""

PlanningFlow Implementation

import time

from pydantic import Field

class PlanningFlow(BaseFlow):
    """A flow that manages planning and execution of tasks using agents."""
    llm: LLM = Field(default_factory=lambda: LLM())
    planning_tool: PlanningTool = Field(default_factory=PlanningTool)
    executor_keys: List[str] = Field(default_factory=list)
    active_plan_id: str = Field(default_factory=lambda: f"plan_{int(time.time())}")
    current_step_index: Optional[int] = None

    async def execute(self, input_text: str) -> str:
        """Execute the planning flow with agents."""
        try:
            # Create the initial plan
            if input_text:
                await self._create_initial_plan(input_text)

            # Execute plan steps
            while await self._has_next_step():
                # Get the current step
                step_info = await self._get_current_step()

                # Select the appropriate executor
                executor = self.get_executor(step_info.get("type"))

                # Execute the step
                result = await self._execute_step(executor, step_info)

                # Update the step status
                await self._update_step_status(step_info["index"], "completed")

            # Complete the plan
            return await self._finalize_plan()

        except Exception as e:
            # Handle exceptions
            return f"Error executing flow: {str(e)}"

Graph-Based Workflow

OpenManus implements a graph-based workflow system that allows for flexible orchestration of agent activities. Nodes in the graph represent agents or actions, while edges represent the flow of data and control.

Key Benefit: This approach enables complex branching logic and parallel execution paths.
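
A graph workflow of this kind can be sketched with plain functions as nodes, where each node returns the name of the next node to run. This is an illustrative assumption, not the contents of the graph/ package: the node functions, the shared state dictionary, and run_graph are hypothetical, and the walk is sequential, so it shows branching but not parallel paths.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, object]
Node = Callable[[State], Optional[str]]  # returns the next node name, or None to stop


def coordinator(state: State) -> Optional[str]:
    # Branch: research first if nothing has been gathered yet.
    return "reporter" if state.get("notes") else "researcher"


def researcher(state: State) -> Optional[str]:
    state["notes"] = ["fact A", "fact B"]
    return "coordinator"


def reporter(state: State) -> Optional[str]:
    state["report"] = "; ".join(state["notes"])
    return None


def run_graph(nodes: Dict[str, Node], start: str, state: State, max_hops: int = 20) -> List[str]:
    """Walk the graph from `start` and return the visited node names."""
    visited: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(visited) >= max_hops:
            raise RuntimeError("graph did not terminate")
        visited.append(current)
        current = nodes[current](state)
    return visited


state: State = {}
path = run_graph(
    {"coordinator": coordinator, "researcher": researcher, "reporter": reporter},
    "coordinator",
    state,
)
```

Here the coordinator is visited twice: once to dispatch research, and once more to hand the gathered notes to the reporter.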

Task Decomposition

When a user submits a task, the workflow system breaks it down into smaller, manageable sub-tasks. Each sub-task is assigned to the most suitable agent based on its capabilities.

Key Benefit: Complex problems become tractable through effective division of labor.

Agent Coordination

The workflow system handles communication and coordination between agents, ensuring they can share information and build upon each other's work. This coordination is managed by specialized flow components.

Key Benefit: Agents can collaborate effectively without direct knowledge of each other's internals.

Result Integration

As agents complete their assigned sub-tasks, their results are collected and integrated into a coherent final output. This integration considers dependencies between sub-tasks and ensures logical flow.

Key Benefit: Users receive unified responses that represent the collective intelligence of the system.
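
Respecting dependencies between sub-tasks amounts to a topological ordering. The sketch below uses graphlib.TopologicalSorter from the Python standard library (3.9+); the sub-task names, the dependency map, and the placeholder outputs are invented for the example.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical sub-tasks mapped to the sub-tasks they depend on.
dependencies: Dict[str, Set[str]] = {
    "research": set(),
    "prices": set(),
    "budget": {"prices"},
    "itinerary": {"research", "budget"},
}

# Placeholder outputs, as if each agent had already finished its sub-task.
outputs: Dict[str, str] = {
    "research": "notes on attractions",
    "prices": "collected price list",
    "budget": "cost breakdown",
    "itinerary": "day-by-day schedule",
}


def integrate(deps: Dict[str, Set[str]], results: Dict[str, str]) -> List[str]:
    """Order report sections so each sub-task appears after its dependencies."""
    order = list(TopologicalSorter(deps).static_order())
    return [f"{name}: {results[name]}" for name in order]


report = integrate(dependencies, outputs)
section_names = [line.split(":")[0] for line in report]
```

TopologicalSorter raises graphlib.CycleError if the dependency map contains a cycle, which is a useful early check on a generated plan.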

Workflow Execution Example

Consider a user asking OpenManus to "Plan a 3-day trip to Tokyo with a budget of $1000":

Task Analysis: The coordinator analyzes the request and identifies it as a travel planning task.
Task Delegation: The coordinator assigns research on Tokyo attractions to the Research Agent, budget analysis to the Planning Agent, and web searches for current prices to the Browser Agent.
Parallel Execution: Agents work simultaneously on their assigned tasks, communicating interim results as needed.
Integration: The Reporter Agent collects all findings and generates a comprehensive 3-day itinerary within budget.
Response: The finalized trip plan is presented to the user, with details on attractions, accommodations, transportation, and costs.
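
The parallel execution step in this example can be sketched with asyncio.gather, which runs several coroutines concurrently. The run_agent and delegate functions and the short sleep that simulates LLM or network latency are assumptions for illustration only.

```python
import asyncio
from typing import Dict


async def run_agent(name: str, subtask: str, delay: float) -> str:
    """Stand-in for an agent call; sleeping simulates LLM or network latency."""
    await asyncio.sleep(delay)
    return f"{name} finished: {subtask}"


async def delegate(assignments: Dict[str, str]) -> Dict[str, str]:
    names = list(assignments)
    # gather() runs the coroutines concurrently and returns results in argument order.
    results = await asyncio.gather(
        *(run_agent(name, assignments[name], 0.01) for name in names)
    )
    return dict(zip(names, results))


findings = asyncio.run(delegate({
    "research": "Tokyo attractions",
    "planning": "budget analysis",
    "browser": "current prices",
}))
```

Because the sub-tasks overlap in time, the total wait is close to the slowest agent rather than the sum of all of them.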

4. Tool Integration

Tools are the interfaces through which OpenManus agents interact with the external world. The flexible tool system allows agents to perform a wide range of actions, from web browsing to code execution.

Tool Integration Architecture

Tool System

BaseTool Implementation

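from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

from pydantic import BaseModel, Field

class BaseTool(ABC, BaseModel):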
    name: str
    description: str
    parameters: Optional[dict] = None

    async def __call__(self, **kwargs) -> Any:
        """Execute the tool with given parameters."""
        return await self.execute(**kwargs)

    @abstractmethod
    async def execute(self, **kwargs) -> Any:
        """Execute the tool with given parameters."""

    def to_param(self) -> Dict:
        """Convert tool to function call format."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }

ToolResult Implementation

class ToolResult(BaseModel):
    """Represents the result of a tool execution."""
    output: Any = Field(default=None)
    error: Optional[str] = Field(default=None)
    system: Optional[str] = Field(default=None)
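
To show how a new tool plugs into this interface, the sketch below defines a small word-count tool and a collection that exposes its schema. It uses plain classes instead of pydantic so that it runs without dependencies; MiniTool, WordCount, and MiniToolCollection are hypothetical stand-ins for BaseTool and ToolCollection, and the result is a plain dict that mirrors ToolResult's output and error fields.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class MiniTool(ABC):
    """Dependency-free stand-in mirroring BaseTool's name/description/parameters."""
    name: str
    description: str
    parameters: Optional[dict] = None

    @abstractmethod
    async def execute(self, **kwargs: Any) -> Any:
        """Run the tool with the given keyword arguments."""

    def to_param(self) -> Dict:
        """Describe the tool in the function-call format shown above."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


class WordCount(MiniTool):
    name = "word_count"
    description = "Count the words in a piece of text."
    parameters = {
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to count."}},
        "required": ["text"],
    }

    async def execute(self, text: str) -> Dict[str, Any]:
        return {"output": len(text.split()), "error": None}


class MiniToolCollection:
    def __init__(self, *tools: MiniTool) -> None:
        self.tool_map = {tool.name: tool for tool in tools}

    def to_params(self) -> List[Dict]:
        return [tool.to_param() for tool in self.tool_map.values()]


tools = MiniToolCollection(WordCount())
schema = tools.to_params()
result = asyncio.run(tools.tool_map["word_count"].execute(text="plan a trip to Tokyo"))
```

The schema list is what an agent would pass to the model as its available tools, and the name in each entry is the key used to route the model's tool call back to execute.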

Web Tools

GoogleSearch: Performs web searches
BrowserUseTool: Opens and navigates web browsers
WebScraper: Extracts data from web pages
APITool: Makes calls to external APIs

Code Tools

PythonExecute: Runs Python code
JSExecute: Executes JavaScript code
ShellTool: Runs shell commands
GitTool: Manages Git repositories

Data Tools

FileSaver: Saves files to disk
FileReader: Reads files from disk
DataAnalyzer: Performs data analysis
DataVisualizer: Creates visualizations

Tool Usage Example

Let's look at how an agent might use the PythonExecute tool to perform data analysis:

PythonExecute Tool Example

# Tool definition
class PythonExecute(BaseTool):
    name: str = "python_execute"
    description: str = "Execute Python code and return the result."
    parameters: dict = {
        "type": "object",
        "properties": {
            "code": {
                "type": "string",
                "description": "The Python code to execute.",
            }
        },
        "required": ["code"],
    }

    async def execute(self, code: str) -> ToolResult:
        try:
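            # Namespace that receives the variables defined by the executed code.
            # Note: exec() with full builtins is NOT a sandbox; only run trusted code.
            local_vars = {}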

            # Execute the code
            exec(code, {"__builtins__": __builtins__}, local_vars)

            # Return the result
            return ToolResult(output=local_vars.get("result", "Code executed successfully"))
        except Exception as e:
            return ToolResult(error=str(e))

import json

# Agent using the tool
async def analyze_data(agent, dataset_url):
    # First, download the dataset
    browser_result = await agent.execute_tool({
        "name": "browser_use",
        "arguments": {"url": dataset_url}
    })

    # Now, analyze the data with Python
    python_result = await agent.execute_tool({
        "name": "python_execute",
        "arguments": {
            "code": """
import pandas as pd
import matplotlib.pyplot as plt

# Load the data
data = pd.read_csv('downloaded_data.csv')

# Perform analysis
summary = data.describe()
correlations = data.corr(numeric_only=True)

# Create visualization (figsize is passed to plot(), which creates its own figure)
data.plot(kind='scatter', x='feature1', y='feature2', figsize=(10, 6))
plt.savefig('analysis_plot.png')

# Store result for return
result = {
    'summary': summary.to_dict(),
    'correlations': correlations.to_dict()
}
"""
        }
    })

    # Save the analysis results as JSON (the plot was already written by the code above)
    save_result = await agent.execute_tool({
        "name": "file_saver",
        "arguments": {
            "file_path": "analysis_results.json",
            "content": json.dumps(python_result.output)
        }
    })

    return {
        "analysis": python_result.output,
        "visualization": "analysis_plot.png",
        "saved_results": save_result.output
    }
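
Because exec runs code inside the agent's own process, one common way to reduce risk is to run the code in a separate interpreter with a time limit. The sketch below is an assumption about how that could look, not how OpenManus does it; run_python_isolated is a hypothetical helper. It provides process isolation and a timeout only, and is not a security sandbox.

```python
import subprocess
import sys
from typing import Dict, Optional


def run_python_isolated(code: str, timeout: float = 5.0) -> Dict[str, Optional[str]]:
    """Run `code` in a fresh interpreter and capture what it prints."""
    try:
        proc = subprocess.run(
            # -I is Python's isolated mode: it ignores PYTHON* environment
            # variables and the user site-packages directory.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"output": None, "error": f"timed out after {timeout} seconds"}
    if proc.returncode != 0:
        return {"output": None, "error": proc.stderr.strip()}
    return {"output": proc.stdout.strip(), "error": None}


ok = run_python_isolated("print(2 + 3)")
failed = run_python_isolated("raise ValueError('boom')")
slow = run_python_isolated("import time; time.sleep(10)", timeout=0.5)
```

A crash or an infinite loop in generated code then ends the child process instead of the agent. Stronger isolation, such as a container with no network access, would still be needed for untrusted input.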

5. Putting It All Together

Now that we've explored the individual components of OpenManus, let's see how they all work together to create a powerful multi-agent AI system capable of handling complex tasks.

Complete System Architecture

End-to-End Task Execution

1. User Input

User submits a task request via the CLI, API, or web interface.

python client.py --task "Build a dashboard to visualize Tesla stock trends for the past year"

2. Task Analysis

The Coordinator Agent analyzes the task and creates a plan using the Planning Agent.


Plan created:
1. Research Tesla stock data sources
2. Collect historical stock data
3. Select visualization framework
4. Create dashboard code
5. Test and refine dashboard

3. Parallel Agent Execution

Multiple agents work on different aspects of the task in parallel.

Research Agent: Finds reliable stock data APIs
Browser Agent: Navigates to financial websites to confirm data availability
Planning Agent: Updates the plan with specific implementation details

4. Tool Utilization

Agents use various tools to perform actions and gather information.


# Browser Agent uses BrowserUseTool
await agent.execute_tool({
    "name": "browser_use",
    "arguments": {"url": "https://finance.yahoo.com/quote/TSLA"}
})

# Coder Agent uses PythonExecute
await agent.execute_tool({
    "name": "python_execute",
    "arguments": {"code": "import yfinance as yf\ndata = yf.download('TSLA', period='1y')\nresult = data.head()"}
})

5. Integration and Output

The Flow component collects results from all agents and assembles the final solution.

Combines code snippets from different agents
Ensures integration between components
Verifies the solution meets requirements
Delivers final code and instructions to the user

Advanced Use Cases

OpenManus's architecture enables a wide range of complex applications:

Autonomous Research Assistant

Conducts comprehensive research on topics, synthesizing information from multiple sources, verifying facts, and generating coherent reports.

Full-Stack Development Helper

Plans, codes, and tests applications based on user requirements, handling both frontend and backend components with appropriate frameworks.

Data Analysis Pipeline

Collects, cleans, analyzes, and visualizes data from various sources, applying appropriate statistical methods and creating insightful visualizations.

Content Generation System

Researches topics, plans content structure, generates written material, creates visual assets, and optimizes for specific platforms and audiences.

Conclusion

OpenManus offers a modular, extensible open-source framework for building multi-agent systems. Having worked through its architecture, you should now be able to:

Understand how multiple specialized agents can collaborate to solve complex tasks
See how the hierarchical agent structure provides both simplicity and power
Recognize the importance of flexible tool integration for real-world capabilities
Appreciate the workflow system that orchestrates agent activities
Implement or extend the OpenManus architecture for your own projects

Next Steps

Ready to dive deeper into OpenManus? Here are some ways to continue your journey:

Try It Yourself

Set up the OpenManus environment and experiment with its capabilities.

Visit GitHub Repository

Extend Functionality

Create new specialized agents or tools to enhance OpenManus's capabilities.

Contribution Guidelines

Join the Community

Connect with other developers and researchers working on AI agent systems.

Discussions & Issues