OpenManus Architecture

A step-by-step guide to understanding the open-source multi-agent AI system

Introduction to OpenManus

OpenManus is an open-source project that aims to replicate the capabilities of Manus AI, a general-purpose AI agent. It uses a modular, containerized stack (Docker, Python, and JavaScript) to build a multi-agent system that can execute complex tasks autonomously.

It handles tasks ranging from personalized travel planning to stock analysis by dividing the work among a team of cooperating AI agents.

Python 3.9+ | JavaScript ES6+ | Docker | Open Source | Multi-Agent Architecture

Key Features

  • Multi-Agent System: Collaborative AI agents working together to solve complex tasks
  • Dockerized Environment: Easy setup and deployment with containerization
  • Task Execution: Supports tasks like travel planning, data analysis, and content generation
  • Tool Integration: Web browsing, code execution, and data retrieval capabilities
  • Modular Design: Easily extendable with new agents, tools, or features

Learning Journey Overview

This guide takes you through a progressive learning journey to understand OpenManus:

  1. Understanding Multi-Agent System Architecture
  2. Exploring Different Agent Types
  3. Diving into the Workflow System
  4. Learning about Tool Integration
  5. Putting it All Together - How Components Work in Harmony

1. Multi-Agent System Architecture

At its core, OpenManus is built on a multi-agent architecture where specialized AI agents collaborate to solve complex tasks. This modular design enables high code reusability, strong extensibility, and clear separation of responsibilities.

OpenManus Component Architecture

[Diagram: Agent Layer, LLM, Memory, Tools Layer, Flow, Prompt]

Core Components

Agent Layer

The brain of OpenManus, consisting of specialized AI agents that handle different aspects of task execution. Agents are organized in a hierarchy, from a basic agent class to increasingly specialized ones.

LLM Component

Handles interactions with large language models, serving as the intelligence engine that powers decision-making, content generation, and understanding.

Memory Component

Stores and manages conversation history and context, ensuring coherent and contextually relevant interactions across multiple exchanges.
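
To make the Memory component's role concrete, here is a minimal sketch of a bounded conversation history. The `Message` and `Memory` classes, the `max_messages` cap, and the method names are assumptions made for illustration, and the sketch uses plain dataclasses rather than whatever OpenManus itself uses.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str      # e.g. "user", "assistant", "tool"
    content: str


@dataclass
class Memory:
    """Hypothetical bounded history; the cap is an assumption of this sketch."""
    max_messages: int = 100
    messages: list[Message] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append(Message(role, content))
        # Drop the oldest messages once the cap is exceeded.
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def recent(self, n: int) -> list[Message]:
        return self.messages[-n:] if n > 0 else []

    def to_dicts(self) -> list[dict]:
        """Convert to the role/content dicts that chat-style APIs commonly expect."""
        return [{"role": m.role, "content": m.content} for m in self.messages]


memory = Memory(max_messages=3)
for i, role in enumerate(["user", "assistant", "user", "assistant"]):
    memory.add(role, f"message {i}")
```

With a cap of three, the first message is discarded when the fourth arrives, so the agent's context stays bounded while the most recent exchanges are kept.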

Tools Component

Provides interfaces for agents to interact with external systems and perform actions like web browsing, code execution, and data retrieval.

Flow Component

Manages the workflows and execution patterns, coordinating how multiple agents collaborate to solve complex tasks.

Prompt Component

Defines the behavior patterns and guidelines for agents, shaping how they respond to tasks and make decisions.

Project Structure

OpenManus/
├── docker/                  # Docker configurations
│   ├── frontend/            # Next.js frontend container
│   │   └── Dockerfile       # Frontend container configuration
│   └── unified/             # Backend container configuration
│       ├── Dockerfile       # Backend container configuration
│       └── start.sh         # Container startup script
├── src/                     # Source code
│   ├── agents/              # Multi-agent logic (Python)
│   │   ├── nodes/           # Agent node implementations
│   │   ├── browser_agent.py
│   │   ├── coder_agent.py
│   │   ├── coordinator.py
│   │   ├── reporter_agent.py
│   │   └── research_agent.py
│   ├── components/          # React components
│   ├── config/              # Configuration files
│   ├── graph/               # Graph-based workflow
│   ├── llms/                # LLM integrations
│   ├── pages/               # Next.js pages
│   ├── prompts/             # Agent prompts
│   ├── service/             # Backend services
│   ├── tools/               # Tool implementations
│   ├── utils/               # Utility functions
│   ├── workflow/            # Workflow management
│   ├── client.py            # CLI client for testing
│   └── server.py            # FastAPI server
├── docs/                    # Documentation and API specs
├── package.json             # Next.js frontend dependencies
├── next.config.js           # Next.js configuration
├── docker-compose.yml       # Docker Compose configuration
└── README.md                # Main documentation file

2. Different Agent Types

OpenManus implements a hierarchical agent structure, with each agent type building upon the capabilities of the previous one. This modular approach allows for specialized agents that excel at specific tasks while sharing common functionality.

Agent Hierarchy

[Diagram: BaseAgent, ReActAgent, ToolCallAgent, PlanningAgent, SWEAgent, Manus]

BaseAgent

BaseAgent is the foundation of the entire agent framework, defining the core attributes and methods that all agents share. It handles basic state management, memory operations, and the execution lifecycle.

BaseAgent Implementation
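# Standard-library and pydantic imports used by the excerpts in this guide.
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Literal, Optional

from pydantic import BaseModel, Field

# LLM, Memory, AgentState, Message, and the tool classes are defined
# elsewhere in the OpenManus codebase.

class BaseAgent(BaseModel, ABC):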
    """Abstract base class for managing agent state and execution."""
    # Core attributes
    name: str = Field(..., description="Unique name of the agent")
    description: Optional[str] = Field(None, description="Optional agent description")

    # Prompts
    system_prompt: Optional[str] = Field(None, description="System-level instruction prompt")
    next_step_prompt: Optional[str] = Field(None, description="Prompt for determining next action")

    # Dependent components
    llm: LLM = Field(default_factory=LLM, description="Language model instance")
    memory: Memory = Field(default_factory=Memory, description="Agent's memory store")
    state: AgentState = Field(default=AgentState.IDLE, description="Current agent state")

    # Execution control
    max_steps: int = Field(default=10, description="Maximum steps before termination")
    current_step: int = Field(default=0, description="Current step in execution")

Key Responsibilities:

  • Managing agent state (idle, thinking, acting, etc.)
  • Storing and retrieving messages from memory
  • Handling basic execution lifecycle
  • Providing core attributes that all agents need
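
The execution lifecycle can be sketched as a loop bounded by `max_steps` that stops early once the agent marks itself finished. This is a simplified illustration, not the OpenManus implementation: `MiniAgent`, its three-step toy logic, and the reduced `AgentState` enum are assumptions, and pydantic and the LLM are left out.

```python
import asyncio
from enum import Enum


class AgentState(str, Enum):
    IDLE = "idle"
    RUNNING = "running"
    FINISHED = "finished"


class MiniAgent:
    """Hypothetical stand-in for a BaseAgent subclass (no pydantic, no LLM)."""

    def __init__(self, name: str, max_steps: int = 10) -> None:
        self.name = name
        self.max_steps = max_steps
        self.current_step = 0
        self.state = AgentState.IDLE
        self.memory: list[str] = []  # plain list instead of a Memory object

    async def step(self) -> str:
        # Toy logic: declare the task finished on the third step.
        if self.current_step >= 3:
            self.state = AgentState.FINISHED
        return f"step {self.current_step} done"

    async def run(self, request: str) -> list[str]:
        self.memory.append(request)
        self.state = AgentState.RUNNING
        results = []
        # Stop when the step budget is used up or the agent finishes.
        while self.current_step < self.max_steps and self.state != AgentState.FINISHED:
            self.current_step += 1
            results.append(await self.step())
        self.state = AgentState.IDLE
        return results


results = asyncio.run(MiniAgent("demo").run("say hello"))
capped = asyncio.run(MiniAgent("capped", max_steps=2).run("say hello"))
```

The `max_steps` guard is what keeps an agent that never reaches a finished state from looping forever; in the second call the budget of two steps ends the run before the toy logic would.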

ReActAgent

ReActAgent extends BaseAgent by implementing the ReAct (reason, then act) pattern, referred to here as "Think-Act". Each step has two phases: a thinking phase that decides what to do next and an action phase that carries out that decision.

ReActAgent Implementation
class ReActAgent(BaseAgent, ABC):
    @abstractmethod
    async def think(self) -> bool:
        """Process the current state and decide the next action."""

    @abstractmethod
    async def act(self) -> str:
        """Execute the decided actions."""

    async def step(self) -> str:
        """Execute a single step: think and act."""
        should_act = await self.think()
        if not should_act:
            return "Thinking complete - no action needed"
        return await self.act()

Key Responsibilities:

  • Implementing the Think-Act pattern for decision making
  • Separating the reasoning process from action execution
  • Providing a step method that orchestrates a think-act cycle
  • Enabling more sophisticated agent behavior through deliberation
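
A concrete subclass makes the think-act cycle easier to follow. The sketch below re-declares a stripped-down base class so that it runs on its own; `ReActSketch` and `CountdownAgent` are hypothetical names, and the "reasoning" is a single comparison where a real agent would consult an LLM.

```python
import asyncio
from abc import ABC, abstractmethod


class ReActSketch(ABC):
    """Simplified stand-in for ReActAgent: no LLM, memory, or pydantic."""

    @abstractmethod
    async def think(self) -> bool:
        """Return True if an action should follow."""

    @abstractmethod
    async def act(self) -> str:
        """Carry out the decided action."""

    async def step(self) -> str:
        if not await self.think():
            return "Thinking complete - no action needed"
        return await self.act()


class CountdownAgent(ReActSketch):
    """Hypothetical agent that acts while its counter is above zero."""

    def __init__(self, counter: int) -> None:
        self.counter = counter

    async def think(self) -> bool:
        return self.counter > 0

    async def act(self) -> str:
        self.counter -= 1
        return f"counter is now {self.counter}"


async def demo() -> list[str]:
    agent = CountdownAgent(2)
    return [await agent.step() for _ in range(3)]


outputs = asyncio.run(demo())
```

The third call to `step` returns the "no action needed" message because `think` reports that nothing is left to do, which is the separation of deliberation from execution described above.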

ToolCallAgent

ToolCallAgent extends ReActAgent by adding the ability to interact with external tools and APIs. This enables the agent to perform actions like web browsing, code execution, and data retrieval.

ToolCallAgent Implementation
class ToolCallAgent(ReActAgent):
    """Base agent class for handling tool/function calls with enhanced abstraction"""

    available_tools: ToolCollection = ToolCollection(
        CreateChatCompletion(), Terminate()
    )
    tool_choices: Literal["none", "auto", "required"] = "auto"

    async def think(self) -> bool:
        # Get the LLM response and tool selection
        response = await self.llm.ask_tool(
            messages=self.messages,
            system_msgs=[Message.system_message(self.system_prompt)]
            if self.system_prompt
            else None,
            tools=self.available_tools.to_params(),
            tool_choice=self.tool_choices,
        )
        self.tool_calls = response.tool_calls

        # Process the response and tool calls
        # ...

    async def act(self) -> str:
        # Execute tool calls
        results = []
        for command in self.tool_calls:
            result = await self.execute_tool(command)
            # Add tool response to memory
            # ...
            results.append(result)

        return "\n\n".join(results)

Key Responsibilities:

  • Managing available tools and their parameters
  • Interpreting tool calls from LLM responses
  • Executing tool operations and handling results
  • Providing a bridge between the agent's reasoning and external actions
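
The bridge between reasoning and action comes down to looking up a tool by name, decoding its arguments, and turning failures into messages the model can read. The following is a minimal sketch under stated assumptions: the `ToolCall` dataclass, the `TOOLS` registry, and the `add` and `echo` tools are invented for illustration and do not mirror the OpenManus classes.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical shape of one tool call requested by the model."""
    name: str
    arguments: str  # JSON-encoded keyword arguments


async def add(a: float, b: float) -> str:
    return str(a + b)


async def echo(text: str) -> str:
    return text


TOOLS = {"add": add, "echo": echo}  # illustrative registry


async def execute_tool(call: ToolCall) -> str:
    tool = TOOLS.get(call.name)
    if tool is None:
        return f"Error: unknown tool '{call.name}'"
    try:
        kwargs = json.loads(call.arguments or "{}")
        return await tool(**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong argument names become a readable error.
        return f"Error: bad arguments for '{call.name}': {exc}"


async def act(tool_calls: list[ToolCall]) -> str:
    results = [await execute_tool(call) for call in tool_calls]
    return "\n\n".join(results)


calls = [
    ToolCall("add", '{"a": 2, "b": 3}'),
    ToolCall("echo", '{"text": "hi"}'),
    ToolCall("nope", "{}"),
]
output = asyncio.run(act(calls))
```

Returning errors as strings instead of raising lets the agent add them to memory and try a different call on the next step.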

PlanningAgent

PlanningAgent extends ToolCallAgent by adding planning capabilities, allowing it to break down complex tasks into manageable steps and track progress through the execution of a plan.

PlanningAgent Implementation
class PlanningAgent(ToolCallAgent):
    """
    An agent that creates and manages plans to solve tasks.
    This agent uses a planning tool to create and manage structured plans,
    and tracks progress through individual steps until task completion.
    """
    name: str = "planning"
    description: str = "An agent that creates and manages plans to solve tasks"
    system_prompt: str = PLANNING_SYSTEM_PROMPT
    next_step_prompt: str = NEXT_STEP_PROMPT
    available_tools: ToolCollection = Field(
        default_factory=lambda: ToolCollection(PlanningTool(), Terminate())
    )

    # Step execution tracker
    step_execution_tracker: Dict[str, Dict] = Field(default_factory=dict)
    current_step_index: Optional[int] = None

    async def think(self) -> bool:
        """Decide the next action based on plan status."""
        prompt = (
            f"CURRENT PLAN STATUS:\n{await self.get_plan()}\n\n{self.next_step_prompt}"
            if self.active_plan_id
            else self.next_step_prompt
        )
        self.messages.append(Message.user_message(prompt))

        # Get the current step index
        self.current_step_index = await self._get_current_step_index()
        result = await super().think()
        # Associate tool calls with the current step
        if result and self.tool_calls:
            # ...association logic...
            pass  # placeholder so the elided block stays valid Python
        return result

Key Responsibilities:

  • Creating and managing plans for complex tasks
  • Breaking tasks into logical, sequential steps
  • Tracking progress through plan execution
  • Providing status updates and adapting plans as needed
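
Progress tracking can be pictured as a list of steps with a status per step, where the "current" step is the first one not yet completed. The `Plan` class, its status strings, and the rendering format below are assumptions made for this sketch; the PlanningTool's actual data model is not shown in this guide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Plan:
    """Hypothetical in-memory plan; status names are illustrative."""
    title: str
    steps: list[str]
    statuses: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.statuses:
            self.statuses = ["not_started"] * len(self.steps)

    def current_step_index(self) -> Optional[int]:
        """Index of the first step that is not completed, or None when done."""
        for i, status in enumerate(self.statuses):
            if status != "completed":
                return i
        return None

    def mark(self, index: int, status: str) -> None:
        self.statuses[index] = status

    def render(self) -> str:
        done = self.statuses.count("completed")
        lines = [f"{self.title} ({done}/{len(self.steps)} completed)"]
        marks = {"completed": "[x]", "in_progress": "[>]"}
        for i, (step, status) in enumerate(zip(self.steps, self.statuses)):
            lines.append(f"{i}. {marks.get(status, '[ ]')} {step}")
        return "\n".join(lines)


plan = Plan("Tokyo trip", ["Research attractions", "Check prices", "Write itinerary"])
plan.mark(0, "completed")
plan.mark(1, "in_progress")
status_text = plan.render()
```

A rendered status like `status_text` is the kind of text that could be placed under a "CURRENT PLAN STATUS" header in the prompt, so the model sees what remains before choosing its next tool call.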

Manus

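Manus is the flagship agent of OpenManus: a general-purpose assistant that adds Python execution, web search, browser control, and file saving to the tool-calling loop. Note that the class shown here derives from ToolCallAgent, although its docstring describes it as extending PlanningAgent.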

Manus Implementation
class Manus(ToolCallAgent):
    """
    A versatile general-purpose agent that uses planning to solve various tasks.
    This agent extends PlanningAgent with a comprehensive set of tools and capabilities,
    including Python execution, web browsing, file operations, and information retrieval
    to handle a wide range of user requests.
    """
    name: str = "manus"
    description: str = "A versatile general-purpose agent"
    system_prompt: str = SYSTEM_PROMPT
    next_step_prompt: str = NEXT_STEP_PROMPT
    available_tools: ToolCollection = Field(
        default_factory=lambda: ToolCollection(
            PythonExecute(), GoogleSearch(), BrowserUseTool(), FileSaver(), Terminate()
        )
    )

Key Responsibilities:

  • Providing a comprehensive set of tools for general-purpose use
  • Handling a wide range of user requests and tasks
  • Combining planning, execution, and specialized capabilities
  • Serving as the primary interface for end-user interactions

Comparing Agent Capabilities

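Agent Type    | Basic State Management | Think-Act Pattern | Tool Usage | Planning | Specialized Capabilities
BaseAgent     | Yes                    | -                 | -          | -        | -
ReActAgent    | Yes                    | Yes               | -          | -        | -
ToolCallAgent | Yes                    | Yes               | Yes        | -        | -
PlanningAgent | Yes                    | Yes               | Yes        | Yes      | -
Manus         | Yes                    | Yes               | Yes        | Yes      | Yes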

3. Workflow System

OpenManus's workflow system orchestrates how agents collaborate to solve complex tasks. The Flow component manages these workflows, determining which agents handle which parts of a task and how their results are integrated.

Workflow Execution

[Diagram: User Input, Coordinator, Research Agent, Planning Agent, Browser Agent, Coder Agent, Reporter Agent, Final Output]

Flow Components

BaseFlow Implementation
class BaseFlow(BaseModel, ABC):
    """Base class for execution flows supporting multiple agents"""

    agents: Dict[str, BaseAgent]
    tools: Optional[List] = None
    primary_agent_key: Optional[str] = None

    @property
    def primary_agent(self) -> Optional[BaseAgent]:
        """Get the primary agent for the flow"""
        return self.agents.get(self.primary_agent_key)

    @abstractmethod
    async def execute(self, input_text: str) -> str:
        """Execute the flow with the given input"""

PlanningFlow Implementation
import time  # used for the default plan ID below

class PlanningFlow(BaseFlow):
    """A flow that manages planning and execution of tasks using agents."""
    llm: LLM = Field(default_factory=lambda: LLM())
    planning_tool: PlanningTool = Field(default_factory=PlanningTool)
    executor_keys: List[str] = Field(default_factory=list)
    active_plan_id: str = Field(default_factory=lambda: f"plan_{int(time.time())}")
    current_step_index: Optional[int] = None

    async def execute(self, input_text: str) -> str:
        """Execute the planning flow with agents."""
        try:
            # Create the initial plan
            if input_text:
                await self._create_initial_plan(input_text)

            # Execute plan steps
            while await self._has_next_step():
                # Get the current step
                step_info = await self._get_current_step()

                # Select the appropriate executor
                executor = self.get_executor(step_info.get("type"))

                # Execute the step
                result = await self._execute_step(executor, step_info)

                # Update the step status
                await self._update_step_status(step_info["index"], "completed")

            # Complete the plan
            return await self._finalize_plan()

        except Exception as e:
            # Handle exceptions
            return f"Error executing flow: {str(e)}"
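
The `get_executor` call in the loop above is not shown in the excerpt. One plausible selection rule, offered here only as a sketch, is to prefer an agent whose key matches the step type, then fall back to the first configured executor, then to the primary agent. The function name `select_executor` and its parameters are assumptions.

```python
from typing import Any, Optional


def select_executor(
    agents: dict[str, Any],
    executor_keys: list[str],
    primary_key: Optional[str],
    step_type: Optional[str] = None,
) -> Optional[Any]:
    """Pick the agent that should run a plan step (illustrative rule)."""
    # 1. A step type that names a registered agent wins.
    if step_type and step_type in agents:
        return agents[step_type]
    # 2. Otherwise use the first configured executor that exists.
    for key in executor_keys:
        if key in agents:
            return agents[key]
    # 3. Fall back to the primary agent (None if it is unset or unknown).
    return agents.get(primary_key)


# Strings stand in for agent objects to keep the example small.
agents = {"coder": "CoderAgent", "research": "ResearchAgent"}
by_type = select_executor(agents, ["research"], "coder", step_type="coder")
by_executor = select_executor(agents, ["research"], "coder", step_type="unknown")
by_primary = select_executor(agents, [], "coder")
```

Ordering the fallbacks this way means a plan can name a specialist per step without failing when the named specialist is not registered.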

Graph-Based Workflow

OpenManus implements a graph-based workflow system that allows for flexible orchestration of agent activities. Nodes in the graph represent agents or actions, while edges represent the flow of data and control.

Key Benefit: This approach enables complex branching logic and parallel execution paths.
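
The node-and-edge idea can be sketched in a few lines: nodes are functions that transform a shared state, and each edge is a function that picks the next node from that state. The node names, the `needs_code` flag, and `run_graph` are hypothetical, and the sketch covers branching only, not parallel paths.

```python
from typing import Callable

State = dict
Node = Callable[[State], State]


def research(state: State) -> State:
    return {**state, "notes": f"notes on {state['task']}"}


def code(state: State) -> State:
    return {**state, "artifact": "script.py"}


def report(state: State) -> State:
    parts = [state[k] for k in ("notes", "artifact") if k in state]
    return {**state, "report": "; ".join(parts)}


NODES: dict[str, Node] = {"research": research, "code": code, "report": report}

# Each edge function inspects the state and names the next node.
EDGES: dict[str, Callable[[State], str]] = {
    "research": lambda s: "code" if s.get("needs_code") else "report",
    "code": lambda s: "report",
    "report": lambda s: "END",
}


def run_graph(start: str, state: State, max_hops: int = 10) -> State:
    current, hops = start, 0
    while current != "END":
        if hops >= max_hops:
            raise RuntimeError("graph did not reach END")
        state = NODES[current](state)      # run the node
        current = EDGES[current](state)    # follow the edge
        hops += 1
    return state


with_code = run_graph("research", {"task": "TSLA", "needs_code": True})
without_code = run_graph("research", {"task": "TSLA"})
```

The conditional edge after `research` is where the branching happens: the same graph skips the `code` node when the state does not ask for it.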

Task Decomposition

When a user submits a task, the workflow system breaks it down into smaller, manageable sub-tasks. Each sub-task is assigned to the most suitable agent based on its capabilities.

Key Benefit: Complex problems become tractable through effective division of labor.
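
Assigning a sub-task to "the most suitable agent" can be approximated by matching the skills a sub-task needs against the skills each agent advertises. The skill sets and the `pick_agent` helper below are invented for illustration; OpenManus may well let the LLM make this choice instead.

```python
from typing import Optional

# Hypothetical skill sets per agent.
AGENT_SKILLS: dict[str, set[str]] = {
    "research_agent": {"search", "summarize"},
    "browser_agent": {"browse"},
    "coder_agent": {"code", "analyze"},
}


def pick_agent(required: set[str]) -> Optional[str]:
    """Return the first agent whose skills cover everything the sub-task needs."""
    for name, skills in AGENT_SKILLS.items():
        if required <= skills:  # subset test
            return name
    return None


subtasks = [
    ("Find Tokyo attractions", {"search"}),
    ("Check current hotel prices", {"browse"}),
    ("Compute the daily budget", {"code"}),
    ("Book the flights", {"payment"}),
]
assignments = {description: pick_agent(needed) for description, needed in subtasks}
```

A sub-task that no agent can cover maps to `None`, which gives the coordinator a clear signal to re-plan or report the gap rather than hand the work to an unsuitable agent.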

Agent Coordination

The workflow system handles communication and coordination between agents, ensuring they can share information and build upon each other's work. This coordination is managed by specialized flow components.

Key Benefit: Agents can collaborate effectively without direct knowledge of each other's internals.

Result Integration

As agents complete their assigned sub-tasks, their results are collected and integrated into a coherent final output. This integration considers dependencies between sub-tasks and ensures logical flow.

Key Benefit: Users receive unified responses that represent the collective intelligence of the system.
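
Respecting dependencies during integration amounts to a topological ordering of the sub-tasks. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the sub-task names, the placeholder results, and the `integrate` helper are assumptions for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical sub-tasks mapped to the sub-tasks they depend on.
DEPENDENCIES = {
    "attractions": set(),
    "prices": set(),
    "budget": {"prices"},
    "itinerary": {"attractions", "budget"},
}

# Placeholder results, as if each agent had already finished.
RESULTS = {
    "itinerary": "<day-by-day plan>",
    "budget": "<cost breakdown>",
    "prices": "<price list>",
    "attractions": "<attraction list>",
}


def integration_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order sub-tasks so each one comes after everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def integrate(results: dict[str, str], dependencies: dict[str, set[str]]) -> str:
    order = integration_order(dependencies)
    missing = [task for task in order if task not in results]
    if missing:
        raise KeyError(f"no result yet for: {', '.join(missing)}")
    return "\n".join(f"{task}: {results[task]}" for task in order)


order = integration_order(DEPENDENCIES)
summary = integrate(RESULTS, DEPENDENCIES)
```

Because the itinerary depends directly or indirectly on every other sub-task, it always lands last in the combined output, whatever order the agents finished in.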

Workflow Execution Example

Consider a user asking OpenManus to "Plan a 3-day trip to Tokyo with a budget of $1000":

  1. Task Analysis: The coordinator analyzes the request and identifies it as a travel planning task.
  2. Task Delegation: The coordinator assigns research on Tokyo attractions to the Research Agent, budget analysis to the Planning Agent, and web searches for current prices to the Browser Agent.
  3. Parallel Execution: Agents work simultaneously on their assigned tasks, communicating interim results as needed.
  4. Integration: The Reporter Agent collects all findings and generates a comprehensive 3-day itinerary within budget.
  5. Response: The finalized trip plan is presented to the user, with details on attractions, accommodations, transportation, and costs.
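
The delegation, parallel execution, and integration steps above can be sketched with `asyncio.gather`. The three agent coroutines and the reporter are hypothetical stand-ins that return labeled placeholder strings; a short sleep takes the place of LLM and network latency.

```python
import asyncio


async def research_agent(task: str) -> str:
    await asyncio.sleep(0.01)  # stands in for LLM and network latency
    return f"[research] attractions for: {task}"


async def planning_agent(task: str) -> str:
    await asyncio.sleep(0.01)
    return f"[planning] budget split for: {task}"


async def browser_agent(task: str) -> str:
    await asyncio.sleep(0.01)
    return f"[browser] current prices for: {task}"


def reporter_agent(findings: list[str]) -> str:
    """Combine the findings into one response, one bullet per finding."""
    return "\n".join(f"- {item}" for item in findings)


async def run_workflow(task: str) -> str:
    # gather() runs the three coroutines concurrently and returns their
    # results in the order the coroutines were passed in.
    findings = await asyncio.gather(
        research_agent(task), planning_agent(task), browser_agent(task)
    )
    return reporter_agent(list(findings))


report = asyncio.run(run_workflow("3-day Tokyo trip, $1000 budget"))
```

Since `gather` preserves argument order, the reporter receives findings in a predictable sequence even though the agents may finish in any order.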

4. Tool Integration

Tools are the interfaces through which OpenManus agents interact with the external world. The flexible tool system allows agents to perform a wide range of actions, from web browsing to code execution.

Tool Integration Architecture

[Diagram: Agent, Tool Manager, Web Tools, Code Tools, Data Tools, Web APIs, Python Runtime, File System, BaseTool, ToolResult]

Tool System

BaseTool Implementation
class BaseTool(ABC, BaseModel):
    name: str
    description: str
    parameters: Optional[dict] = None

    async def __call__(self, **kwargs) -> Any:
        """Execute the tool with given parameters."""
        return await self.execute(**kwargs)

    @abstractmethod
    async def execute(self, **kwargs) -> Any:
        """Execute the tool with given parameters."""

    def to_param(self) -> Dict:
        """Convert tool to function call format."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }

ToolResult Implementation
class ToolResult(BaseModel):
    """Represents the result of a tool execution."""
    output: Any = Field(default=None)
    error: Optional[str] = Field(default=None)
    system: Optional[str] = Field(default=None)
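
To see how the two classes fit together, here is a self-contained sketch of a custom tool. It re-declares simplified versions of `BaseTool` and `ToolResult` with the standard library instead of pydantic so that it runs on its own, and `WordCount` is a hypothetical tool, not one that ships with OpenManus.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolResult:
    output: Any = None
    error: Optional[str] = None


class BaseTool(ABC):
    """Simplified stand-in for the pydantic-based BaseTool shown above."""
    name: str
    description: str
    parameters: Optional[dict] = None

    async def __call__(self, **kwargs) -> Any:
        return await self.execute(**kwargs)

    @abstractmethod
    async def execute(self, **kwargs) -> Any:
        """Run the tool."""

    def to_param(self) -> dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


class WordCount(BaseTool):
    """Hypothetical tool used only for this example."""
    name = "word_count"
    description = "Count the words in a piece of text."
    parameters = {
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to count."}},
        "required": ["text"],
    }

    async def execute(self, text: str) -> ToolResult:
        return ToolResult(output=len(text.split()))


tool = WordCount()
schema = tool.to_param()
result = asyncio.run(tool(text="plan a trip to Tokyo"))
```

The `schema` dictionary follows the function-calling layout that `to_param` produces, which is what gets passed to the LLM as a tool description, while `result` carries the outcome back to the agent.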

Web Tools

  • GoogleSearch: Performs web searches
  • BrowserUseTool: Opens and navigates web browsers
  • WebScraper: Extracts data from web pages
  • APITool: Makes calls to external APIs

Code Tools

  • PythonExecute: Runs Python code
  • JSExecute: Executes JavaScript code
  • ShellTool: Runs shell commands
  • GitTool: Manages Git repositories

Data Tools

  • FileSaver: Saves files to disk
  • FileReader: Reads files from disk
  • DataAnalyzer: Performs data analysis
  • DataVisualizer: Creates visualizations

Tool Usage Example

Let's look at how an agent might use the PythonExecute tool to perform data analysis:

PythonExecute Tool Example
# Tool definition
class PythonExecute(BaseTool):
    name: str = "python_execute"
    description: str = "Execute Python code and return the result."
    parameters: dict = {
        "type": "object",
        "properties": {
            "code": {
                "type": "string",
                "description": "The Python code to execute.",
            }
        },
        "required": ["code"],
    }

    async def execute(self, code: str) -> ToolResult:
        try:
            # Namespace that collects the variables the code defines.
            # Note: exec() with full builtins is NOT a sandbox; run trusted code only.
            local_vars = {}

            # Execute the code
            exec(code, {"__builtins__": __builtins__}, local_vars)

            # Return the result
            return ToolResult(output=local_vars.get("result", "Code executed successfully"))
        except Exception as e:
            return ToolResult(error=str(e))

# Agent using the tool
import json  # needed for json.dumps below

async def analyze_data(agent, dataset_url):
    # First, download the dataset
    browser_result = await agent.execute_tool({
        "name": "browser_use",
        "arguments": {"url": dataset_url}
    })

    # Now, analyze the data with Python
    python_result = await agent.execute_tool({
        "name": "python_execute",
        "arguments": {
            "code": """
import pandas as pd
import matplotlib.pyplot as plt

# Load the data
data = pd.read_csv('downloaded_data.csv')

# Perform analysis
summary = data.describe()
correlations = data.corr(numeric_only=True)  # skip non-numeric columns (pandas 1.5+)

# Create visualization
plt.figure(figsize=(10, 6))
data.plot(kind='scatter', x='feature1', y='feature2')
plt.savefig('analysis_plot.png')

# Store result for return
result = {
    'summary': summary.to_dict(),
    'correlations': correlations.to_dict()
}
"""
        }
    })

    # Save the analysis results as JSON
    save_result = await agent.execute_tool({
        "name": "file_saver",
        "arguments": {
            "file_path": "analysis_results.json",
            "content": json.dumps(python_result.output)
        }
    })

    return {
        "analysis": python_result.output,
        "visualization": "analysis_plot.png",
        "saved_results": save_result.output
    }

5. Putting It All Together

Now that we've explored the individual components of OpenManus, let's see how they all work together to create a powerful multi-agent AI system capable of handling complex tasks.

Complete System Architecture

[Diagram: User, API Server, Flow Manager, Research Agent, Browser Agent, Planning Agent, Coder Agent, LLM, Tools, Memory, External Systems]

End-to-End Task Execution

1. User Input

User submits a task request via the CLI, API, or web interface.

python client.py --task "Build a dashboard to visualize Tesla stock trends for the past year"

2. Task Analysis

The Coordinator Agent analyzes the task and creates a plan using the Planning Agent.

Plan created:
  1. Research Tesla stock data sources
  2. Collect historical stock data
  3. Select visualization framework
  4. Create dashboard code
  5. Test and refine dashboard

3. Parallel Agent Execution

Multiple agents work on different aspects of the task in parallel.

  • Research Agent: Finds reliable stock data APIs
  • Browser Agent: Navigates to financial websites to confirm data availability
  • Planning Agent: Updates the plan with specific implementation details

4. Tool Utilization

Agents use various tools to perform actions and gather information.

# Browser Agent uses BrowserUseTool
await agent.execute_tool({
    "name": "browser_use",
    "arguments": {"url": "https://finance.yahoo.com/quote/TSLA"}
})

# Coder Agent uses PythonExecute
await agent.execute_tool({
    "name": "python_execute",
    "arguments": {
        "code": "import yfinance as yf\ndata = yf.download('TSLA', period='1y')\nresult = data.head()"
    }
})

5. Integration and Output

The Flow Manager collects results from all agents and constructs the final solution.

  • Combines code snippets from different agents
  • Ensures integration between components
  • Verifies the solution meets requirements
  • Delivers final code and instructions to the user

Advanced Use Cases

OpenManus's architecture enables a wide range of complex applications:

Autonomous Research Assistant

Conducts comprehensive research on topics, synthesizing information from multiple sources, verifying facts, and generating coherent reports.

Full-Stack Development Helper

Plans, codes, and tests applications based on user requirements, handling both frontend and backend components with appropriate frameworks.

Data Analysis Pipeline

Collects, cleans, analyzes, and visualizes data from various sources, applying appropriate statistical methods and creating insightful visualizations.

Content Generation System

Researches topics, plans content structure, generates written material, creates visual assets, and optimizes for specific platforms and audiences.

Conclusion

OpenManus offers a modular, extensible open-source framework for building multi-agent systems. With its architecture in view (the agent hierarchy, the flow layer, and the tool interface), you are better placed to run it, read its code, and extend it.

Next Steps

Ready to dive deeper into OpenManus? Here are some ways to continue your journey:

Try It Yourself

Set up the OpenManus environment and experiment with its capabilities.

Visit GitHub Repository

Extend Functionality

Create new specialized agents or tools to enhance OpenManus's capabilities.

Contribution Guidelines

Join the Community

Connect with other developers and researchers working on AI agent systems.

Discussions & Issues