A step-by-step guide to understanding the open-source multi-agent AI system
OpenManus is an open-source project aimed at replicating the capabilities of Manus AI, a groundbreaking general-purpose AI system. It uses a modular, containerized framework built with Docker, Python, and JavaScript to create a multi-agent AI system capable of autonomously executing complex tasks.
This powerful system can handle diverse tasks ranging from personalized travel planning to stock analysis, leveraging a collaborative team of AI agents working together to solve problems.
This guide walks through OpenManus in stages: the core architecture, the agent hierarchy, the flow system, the tool system, how the components work together, and example applications.
At its core, OpenManus is built on a multi-agent architecture where specialized AI agents collaborate to solve complex tasks. This modular design enables high code reusability, strong extensibility, and clear separation of responsibilities.
Agents: the brain of OpenManus, a set of specialized AI agents that handle different aspects of task execution. Agents are organized in a hierarchy, from a basic agent to specialized ones.
LLM: handles interactions with large language models, serving as the intelligence engine that powers decision-making, content generation, and understanding.
Memory: stores and manages conversation history and context, keeping interactions coherent and contextually relevant across multiple exchanges.
Tools: provide interfaces for agents to interact with external systems and perform actions such as web browsing, code execution, and data retrieval.
Flow: manages workflows and execution patterns, coordinating how multiple agents collaborate to solve complex tasks.
Prompts: define the behavior patterns and guidelines for agents, shaping how they respond to tasks and make decisions.
OpenManus/
    docker/                     # Docker configurations
        frontend/               # Next.js frontend container
            Dockerfile          # Frontend container configuration
        unified/                # Backend container configuration
            Dockerfile          # Backend container configuration
            start.sh            # Container startup script
    src/                        # Source code
        agents/                 # Multi-agent logic (Python)
            nodes/              # Agent node implementations
                browser_agent.py
                coder_agent.py
                coordinator.py
                reporter_agent.py
                research_agent.py
        components/             # React components
        config/                 # Configuration files
        graph/                  # Graph-based workflow
        llms/                   # LLM integrations
        pages/                  # Next.js pages
        prompts/                # Agent prompts
        service/                # Backend services
        tools/                  # Tool implementations
        utils/                  # Utility functions
        workflow/               # Workflow management
        client.py               # CLI client for testing
        server.py               # FastAPI server
    docs/                       # Documentation and API specs
    package.json                # Next.js frontend dependencies
    next.config.js              # Next.js configuration
    docker-compose.yml          # Docker Compose configuration
    README.md                   # Main documentation file
OpenManus implements a hierarchical agent structure, with each agent type building upon the capabilities of the previous one. This modular approach allows for specialized agents that excel at specific tasks while sharing common functionality.
BaseAgent is the foundation of the entire agent framework, defining the core attributes and methods that all agents share. It handles basic state management, memory operations, and the execution lifecycle.
from abc import ABC
from typing import Optional

from pydantic import BaseModel, Field

# LLM, Memory, and AgentState come from the project's own modules.

class BaseAgent(BaseModel, ABC):
    """Abstract base class for managing agent state and execution."""

    # Core attributes
    name: str = Field(..., description="Unique name of the agent")
    description: Optional[str] = Field(None, description="Optional agent description")

    # Prompts
    system_prompt: Optional[str] = Field(None, description="System-level instruction prompt")
    next_step_prompt: Optional[str] = Field(None, description="Prompt for determining next action")

    # Dependent components
    llm: LLM = Field(default_factory=LLM, description="Language model instance")
    memory: Memory = Field(default_factory=Memory, description="Agent's memory store")
    state: AgentState = Field(default=AgentState.IDLE, description="Current agent state")

    # Execution control
    max_steps: int = Field(default=10, description="Maximum steps before termination")
    current_step: int = Field(default=0, description="Current step in execution")
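The execution lifecycle that BaseAgent manages can be pictured as a bounded loop over steps. The following is a minimal sketch under stated assumptions, not the project's code: it uses plain dataclasses instead of Pydantic, and `SketchAgent`, `CountdownAgent`, the `run` loop, and the `RUNNING` and `FINISHED` states are hypothetical names chosen for illustration (only `IDLE`, `max_steps`, and `current_step` appear in the excerpt above).

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum


class AgentState(Enum):
    IDLE = "idle"
    RUNNING = "running"    # assumed state, not shown in the excerpt
    FINISHED = "finished"  # assumed state, not shown in the excerpt


@dataclass
class SketchAgent(ABC):
    """Hypothetical stand-in for BaseAgent: state, memory, and a step budget."""

    name: str
    max_steps: int = 10
    current_step: int = 0
    state: AgentState = AgentState.IDLE
    memory: list = field(default_factory=list)

    @abstractmethod
    async def step(self) -> str:
        """Perform one unit of work and return a short summary."""

    async def run(self) -> list:
        """Call step() until the agent finishes or the step budget runs out."""
        self.state = AgentState.RUNNING
        results = []
        while self.state is AgentState.RUNNING and self.current_step < self.max_steps:
            self.current_step += 1
            results.append(await self.step())
        self.state = AgentState.IDLE  # ready for the next task
        return results


@dataclass
class CountdownAgent(SketchAgent):
    """Toy agent that finishes after a fixed number of steps."""

    remaining: int = 3

    async def step(self) -> str:
        self.remaining -= 1
        self.memory.append(self.remaining)
        if self.remaining == 0:
            self.state = AgentState.FINISHED
        return f"step {self.current_step}: {self.remaining} left"


agent = CountdownAgent(name="countdown")
results = asyncio.run(agent.run())
```

The `max_steps` budget is what keeps a misbehaving agent from looping forever: even if `step()` never marks the agent as finished, `run()` stops after the configured number of iterations.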
ReActAgent extends BaseAgent by implementing the "Think-Act" pattern, which divides the agent's execution into two distinct phases: a thinking phase for decision making and an action phase for execution.
from abc import ABC, abstractmethod

class ReActAgent(BaseAgent, ABC):
    @abstractmethod
    async def think(self) -> bool:
        """Process the current state and decide the next action."""

    @abstractmethod
    async def act(self) -> str:
        """Execute the decided actions."""

    async def step(self) -> str:
        """Execute a single step: think and act."""
        should_act = await self.think()
        if not should_act:
            return "Thinking complete - no action needed"
        return await self.act()
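To make the two phases concrete, here is a toy agent that follows the same think-then-act split. It is only a sketch: `EchoReActAgent` and its `inbox` and `pending` attributes are hypothetical and do not exist in OpenManus, and the class stands alone rather than inheriting from the project's base classes.

```python
import asyncio


class EchoReActAgent:
    """Toy agent following the think-then-act split; not the project's class."""

    def __init__(self, inbox):
        self.inbox = list(inbox)  # pending messages to handle
        self.pending = None       # action chosen by think()

    async def think(self) -> bool:
        """Pick the next message; return False when nothing is left to do."""
        if not self.inbox:
            return False
        self.pending = self.inbox.pop(0)
        return True

    async def act(self) -> str:
        """Carry out the action chosen in think()."""
        return self.pending.upper()

    async def step(self) -> str:
        """Same control flow as ReActAgent.step: think first, act only if needed."""
        should_act = await self.think()
        if not should_act:
            return "Thinking complete - no action needed"
        return await self.act()


async def demo() -> list:
    agent = EchoReActAgent(["hello"])
    return [await agent.step(), await agent.step()]


outputs = asyncio.run(demo())
```

The first call to `step()` finds a message and acts on it; the second finds the inbox empty, so `think()` returns `False` and `act()` is never called.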
ToolCallAgent extends ReActAgent by adding the ability to interact with external tools and APIs. This enables the agent to perform actions like web browsing, code execution, and data retrieval.
from typing import Literal

class ToolCallAgent(ReActAgent):
    """Base agent class for handling tool/function calls with enhanced abstraction"""

    available_tools: ToolCollection = ToolCollection(
        CreateChatCompletion(), Terminate()
    )
    tool_choices: Literal["none", "auto", "required"] = "auto"

    async def think(self) -> bool:
        # Get the LLM response and tool selection
        response = await self.llm.ask_tool(
            messages=self.messages,
            system_msgs=[Message.system_message(self.system_prompt)]
            if self.system_prompt
            else None,
            tools=self.available_tools.to_params(),
            tool_choice=self.tool_choices,
        )
        self.tool_calls = response.tool_calls
        # Process the response and tool calls
        # ...

    async def act(self) -> str:
        # Execute tool calls
        results = []
        for command in self.tool_calls:
            result = await self.execute_tool(command)
            # Add tool response to memory
            # ...
            results.append(result)
        return "\n\n".join(results)
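The excerpt calls `execute_tool` without showing it. One plausible shape for that dispatch step is sketched below, assuming tool calls arrive as dictionaries with `name` and `arguments` keys (the form used later in this guide). The `TOOLS` registry, the `add` and `shout` tools, and the wording of the returned strings are all hypothetical.

```python
import asyncio


async def add(a: float, b: float) -> float:
    return a + b


async def shout(text: str) -> str:
    return text.upper()


# Hypothetical registry mapping tool names to async callables.
TOOLS = {"add": add, "shout": shout}


async def execute_tool(call: dict) -> str:
    """Dispatch one tool call of the form {"name": ..., "arguments": {...}}."""
    tool = TOOLS.get(call["name"])
    if tool is None:
        return f"Error: unknown tool '{call['name']}'"
    try:
        result = await tool(**call.get("arguments", {}))
    except TypeError as exc:  # wrong or missing arguments
        return f"Error: {exc}"
    return f"Observed output of `{call['name']}`: {result}"


async def act(tool_calls: list) -> str:
    """Run each requested tool in order and join the observations."""
    results = [await execute_tool(call) for call in tool_calls]
    return "\n\n".join(results)


calls = [
    {"name": "add", "arguments": {"a": 2, "b": 3}},
    {"name": "missing", "arguments": {}},
]
observation = asyncio.run(act(calls))
```

Returning errors as strings rather than raising lets the agent feed the failure back to the language model as an observation, so the model can choose a different tool on the next step.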
PlanningAgent extends ToolCallAgent by adding planning capabilities, allowing it to break down complex tasks into manageable steps and track progress through the execution of a plan.
from typing import Dict, Optional

from pydantic import Field

class PlanningAgent(ToolCallAgent):
    """
    An agent that creates and manages plans to solve tasks.

    This agent uses a planning tool to create and manage structured plans,
    and tracks progress through individual steps until task completion.
    """

    name: str = "planning"
    description: str = "An agent that creates and manages plans to solve tasks"

    system_prompt: str = PLANNING_SYSTEM_PROMPT
    next_step_prompt: str = NEXT_STEP_PROMPT

    available_tools: ToolCollection = Field(
        default_factory=lambda: ToolCollection(PlanningTool(), Terminate())
    )

    # Step execution tracker
    step_execution_tracker: Dict[str, Dict] = Field(default_factory=dict)
    current_step_index: Optional[int] = None

    async def think(self) -> bool:
        """Decide the next action based on plan status."""
        prompt = (
            f"CURRENT PLAN STATUS:\n{await self.get_plan()}\n\n{self.next_step_prompt}"
            if self.active_plan_id
            else self.next_step_prompt
        )
        self.messages.append(Message.user_message(prompt))

        # Get the current step index
        self.current_step_index = await self._get_current_step_index()

        result = await super().think()

        # Associate tool calls with the current step
        if result and self.tool_calls:
            # ...association logic...
            pass

        return result
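The progress tracking behind `_get_current_step_index` is not shown in the excerpt. A simple way to picture it, assuming each plan step carries a status string, is to return the first step that is not yet done. The status names and the plan layout below are assumptions for illustration, not the project's data model.

```python
from typing import List, Optional

# Assumed status vocabulary; a step is "active" until it is completed.
ACTIVE = {"not_started", "in_progress"}


def current_step_index(statuses: List[str]) -> Optional[int]:
    """Return the index of the first unfinished step, or None when all are done."""
    for index, status in enumerate(statuses):
        if status in ACTIVE:
            return index
    return None


plan = {
    "steps": ["Research data sources", "Collect data", "Build dashboard"],
    "statuses": ["completed", "not_started", "not_started"],
}

first = current_step_index(plan["statuses"])   # second step is next
plan["statuses"][first] = "completed"
second = current_step_index(plan["statuses"])  # then the third
plan["statuses"][second] = "completed"
done = current_step_index(plan["statuses"])    # nothing left
```

A `None` result is the natural signal for the agent to stop planning and finalize the task.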
Manus is the flagship agent of OpenManus, combining all the capabilities of previous agent types with additional specialized tools to create a versatile, general-purpose AI assistant.
class Manus(ToolCallAgent):
    """
    A versatile general-purpose agent that uses planning to solve various tasks.

    This agent extends PlanningAgent with a comprehensive set of tools and capabilities,
    including Python execution, web browsing, file operations, and information retrieval
    to handle a wide range of user requests.
    """

    name: str = "manus"
    description: str = "A versatile general-purpose agent"

    system_prompt: str = SYSTEM_PROMPT
    next_step_prompt: str = NEXT_STEP_PROMPT

    available_tools: ToolCollection = Field(
        default_factory=lambda: ToolCollection(
            PythonExecute(), GoogleSearch(), BrowserUseTool(), FileSaver(), Terminate()
        )
    )
Agent Type | Basic State Management | Think-Act Pattern | Tool Usage | Planning | Specialized Capabilities |
---|---|---|---|---|---|
BaseAgent | ✅ | ❌ | ❌ | ❌ | ❌ |
ReActAgent | ✅ | ✅ | ❌ | ❌ | ❌ |
ToolCallAgent | ✅ | ✅ | ✅ | ❌ | ❌ |
PlanningAgent | ✅ | ✅ | ✅ | ✅ | ❌ |
Manus | ✅ | ✅ | ✅ | ✅ | ✅ |
OpenManus's workflow system orchestrates how agents collaborate to solve complex tasks. The Flow component manages these workflows, determining which agents handle which parts of a task and how their results are integrated.
import time
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

from pydantic import BaseModel, Field

class BaseFlow(BaseModel, ABC):
    """Base class for execution flows supporting multiple agents"""

    agents: Dict[str, BaseAgent]
    tools: Optional[List] = None
    primary_agent_key: Optional[str] = None

    @property
    def primary_agent(self) -> Optional[BaseAgent]:
        """Get the primary agent for the flow"""
        return self.agents.get(self.primary_agent_key)

    @abstractmethod
    async def execute(self, input_text: str) -> str:
        """Execute the flow with the given input"""

class PlanningFlow(BaseFlow):
    """A flow that manages planning and execution of tasks using agents."""

    llm: LLM = Field(default_factory=lambda: LLM())
    planning_tool: PlanningTool = Field(default_factory=PlanningTool)
    executor_keys: List[str] = Field(default_factory=list)
    active_plan_id: str = Field(default_factory=lambda: f"plan_{int(time.time())}")
    current_step_index: Optional[int] = None

    async def execute(self, input_text: str) -> str:
        """Execute the planning flow with agents."""
        try:
            # Create the initial plan
            if input_text:
                await self._create_initial_plan(input_text)

            # Execute plan steps
            while await self._has_next_step():
                # Get the current step
                step_info = await self._get_current_step()
                # Select the appropriate executor
                executor = self.get_executor(step_info.get("type"))
                # Execute the step
                result = await self._execute_step(executor, step_info)
                # Update the step status
                await self._update_step_status(step_info["index"], "completed")

            # Complete the plan
            return await self._finalize_plan()
        except Exception as e:
            # Handle exceptions
            return f"Error executing flow: {str(e)}"
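`PlanningFlow.execute` relies on `get_executor` to pick an agent for each step, but the excerpt does not show how. The sketch below is one reasonable selection rule, assuming a step may carry a `type` that matches an agent key; the fallback order and the string stand-ins for agents are assumptions, not the project's confirmed behavior.

```python
from typing import Dict, List, Optional


def get_executor(
    agents: Dict[str, str],
    executor_keys: List[str],
    primary_key: str,
    step_type: Optional[str] = None,
) -> str:
    """Pick an agent for a step (agents are plain strings in this sketch)."""
    # 1. Prefer an agent whose key matches the step type.
    if step_type and step_type in agents:
        return agents[step_type]
    # 2. Otherwise use the first configured executor that exists.
    for key in executor_keys:
        if key in agents:
            return agents[key]
    # 3. Fall back to the primary agent.
    return agents[primary_key]


agents = {"manus": "ManusAgent", "code": "CoderAgent"}

by_type = get_executor(agents, [], "manus", step_type="code")
by_executor_list = get_executor(agents, ["code"], "manus", step_type="search")
fallback = get_executor(agents, [], "manus")
```

Keeping the primary agent as the last resort means every step always has some executor, even when the plan uses step types that no specialized agent covers.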
OpenManus implements a graph-based workflow system that allows for flexible orchestration of agent activities. Nodes in the graph represent agents or actions, while edges represent the flow of data and control.
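As a rough illustration of the node-and-edge idea, the sketch below runs a tiny linear graph in which each node is a function that reads and updates a shared state dictionary. The node names, the `NODES` and `EDGES` tables, and `run_graph` are hypothetical; the guide does not show the project's actual graph implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]


def research(state: State) -> State:
    state["notes"] = f"notes on {state['task']}"
    return state


def code(state: State) -> State:
    state["artifact"] = f"script built from {state['notes']}"
    return state


def report(state: State) -> State:
    state["report"] = f"report covering {state['artifact']}"
    return state


# Nodes are actions; edges say which node receives control (and state) next.
NODES: Dict[str, Callable[[State], State]] = {
    "research": research,
    "code": code,
    "report": report,
}
EDGES: Dict[str, Optional[str]] = {"research": "code", "code": "report", "report": None}


def run_graph(start: str, state: State) -> Tuple[State, List[str]]:
    """Follow edges from the start node, passing state along, until no edge remains."""
    visited: List[str] = []
    node: Optional[str] = start
    while node is not None:
        state = NODES[node](state)
        visited.append(node)
        node = EDGES[node]
    return state, visited


final_state, path = run_graph("research", {"task": "TSLA trends"})
```

A real workflow graph would typically allow branching, so that a coordinator node can choose the next node at run time, but the data flow is the same: each node receives the accumulated state and hands an updated copy to its successor.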
When a user submits a task, the workflow system breaks it down into smaller, manageable sub-tasks. Each sub-task is assigned to the most suitable agent based on its capabilities.
The workflow system handles communication and coordination between agents, ensuring they can share information and build upon each other's work. This coordination is managed by specialized flow components.
As agents complete their assigned sub-tasks, their results are collected and integrated into a coherent final output. This integration considers dependencies between sub-tasks and ensures logical flow.
Consider a user asking OpenManus to "Plan a 3-day trip to Tokyo with a budget of $1000":
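A hypothetical decomposition of this request might look like the following sketch. The sub-tasks, the capability tags, and the `assign` helper are invented for illustration; only the agent names echo the node files listed in the project structure.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed capability tags per agent (illustrative only).
AGENT_CAPABILITIES: Dict[str, Set[str]] = {
    "research_agent": {"search"},
    "browser_agent": {"browse"},
    "coder_agent": {"compute"},
    "reporter_agent": {"write"},
}

# Sub-tasks for "Plan a 3-day trip to Tokyo with a budget of $1000".
SUBTASKS: List[Tuple[str, str]] = [
    ("Find candidate flights and lodging", "search"),
    ("Check current prices on booking pages", "browse"),
    ("Total the costs against the $1000 budget", "compute"),
    ("Write the 3-day itinerary", "write"),
]


def assign(
    subtasks: List[Tuple[str, str]], capabilities: Dict[str, Set[str]]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each sub-task with the first agent that has the needed capability."""
    assignments = []
    for description, needed in subtasks:
        agent = next(
            (name for name, caps in capabilities.items() if needed in caps), None
        )
        assignments.append((description, agent))
    return assignments


assignments = assign(SUBTASKS, AGENT_CAPABILITIES)
```

Each pair in `assignments` corresponds to one sub-task handed to one agent; the results would then be integrated into a single itinerary as described above.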
Tools are the interfaces through which OpenManus agents interact with the external world. The flexible tool system allows agents to perform a wide range of actions, from web browsing to code execution.
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

from pydantic import BaseModel, Field

class BaseTool(ABC, BaseModel):
    name: str
    description: str
    parameters: Optional[dict] = None

    async def __call__(self, **kwargs) -> Any:
        """Execute the tool with given parameters."""
        return await self.execute(**kwargs)

    @abstractmethod
    async def execute(self, **kwargs) -> Any:
        """Execute the tool with given parameters."""

    def to_param(self) -> Dict:
        """Convert tool to function call format."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }

class ToolResult(BaseModel):
    """Represents the result of a tool execution."""

    output: Any = Field(default=None)
    error: Optional[str] = Field(default=None)
    system: Optional[str] = Field(default=None)
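To show how a new tool plugs into this interface, here is a self-contained sketch of a custom tool. It mirrors the shape of `BaseTool` with a plain abstract class so it runs without Pydantic; `SketchTool` and `WordCount` are hypothetical names, and the tool returns a bare integer rather than a `ToolResult` to keep the example short.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class SketchTool(ABC):
    """Plain-Python stand-in for BaseTool (no Pydantic), for illustration only."""

    name: str
    description: str
    parameters: Optional[dict] = None

    async def __call__(self, **kwargs) -> Any:
        return await self.execute(**kwargs)

    @abstractmethod
    async def execute(self, **kwargs) -> Any:
        """Run the tool with the given parameters."""

    def to_param(self) -> Dict:
        """Describe the tool in function-call format for the LLM."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


class WordCount(SketchTool):
    name = "word_count"
    description = "Count the words in a piece of text."
    parameters = {
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to count."}},
        "required": ["text"],
    }

    async def execute(self, text: str) -> int:
        return len(text.split())


tool = WordCount()
schema = tool.to_param()
count = asyncio.run(tool(text="plan a trip to Tokyo"))
```

The `parameters` dictionary is a JSON Schema object; it is what tells the language model which arguments the tool accepts and which are required.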
Let's look at how an agent might use the PythonExecute tool to perform data analysis:
import json

# Tool definition
class PythonExecute(BaseTool):
    name: str = "python_execute"
    description: str = "Execute Python code and return the result."
    parameters: dict = {
        "type": "object",
        "properties": {
            "code": {
                "type": "string",
                "description": "The Python code to execute.",
            }
        },
        "required": ["code"],
    }

    async def execute(self, code: str) -> ToolResult:
        try:
            # Namespace that collects variables defined by the executed code.
            # Note: exec() with full builtins is not a sandbox.
            local_vars = {}
            # Execute the code
            exec(code, {"__builtins__": __builtins__}, local_vars)
            # Return the result
            return ToolResult(output=local_vars.get("result", "Code executed successfully"))
        except Exception as e:
            return ToolResult(error=str(e))

# Code the agent asks the tool to run (kept flush-left so exec() accepts it)
ANALYSIS_CODE = """
import pandas as pd
import matplotlib.pyplot as plt

# Load the data
data = pd.read_csv('downloaded_data.csv')

# Perform analysis
summary = data.describe()
correlations = data.corr()

# Create visualization
plt.figure(figsize=(10, 6))
data.plot(kind='scatter', x='feature1', y='feature2')
plt.savefig('analysis_plot.png')

# Store result for return
result = {
    'summary': summary.to_dict(),
    'correlations': correlations.to_dict()
}
"""

# Agent using the tool
async def analyze_data(agent, dataset_url):
    # First, download the dataset
    browser_result = await agent.execute_tool({
        "name": "browser_use",
        "arguments": {"url": dataset_url}
    })

    # Now, analyze the data with Python
    python_result = await agent.execute_tool({
        "name": "python_execute",
        "arguments": {"code": ANALYSIS_CODE}
    })

    # Save the analysis results
    save_result = await agent.execute_tool({
        "name": "file_saver",
        "arguments": {
            "file_path": "analysis_results.json",
            "content": json.dumps(python_result.output)
        }
    })

    return {
        "analysis": python_result.output,
        "visualization": "analysis_plot.png",
        "saved_results": save_result.output
    }
Now that we've explored the individual components of OpenManus, let's see how they all work together to create a powerful multi-agent AI system capable of handling complex tasks.
User submits a task request via the CLI, API, or web interface.
python client.py --task "Build a dashboard to visualize Tesla stock trends for the past year"
The Coordinator Agent analyzes the task and creates a plan using the Planning Agent.
Plan created:
1. Research Tesla stock data sources
2. Collect historical stock data
3. Select visualization framework
4. Create dashboard code
5. Test and refine dashboard
Multiple agents work on different aspects of the task in parallel.
Agents use various tools to perform actions and gather information.
# Browser Agent uses BrowserUseTool
await agent.execute_tool({
    "name": "browser_use",
    "arguments": {"url": "https://finance.yahoo.com/quote/TSLA"}
})

# Coder Agent uses PythonExecute
await agent.execute_tool({
    "name": "python_execute",
    "arguments": {"code": "import yfinance as yf\ndata = yf.download('TSLA', period='1y')\nresult = data.head()"}
})
The Flow Manager collects results from all agents and constructs the final solution.
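The parallel execution and result collection in the steps above can be sketched with `asyncio.gather`. This is an illustration under assumptions, not the Flow Manager's code: `run_agent` and `run_flow` are hypothetical, and the short sleeps stand in for real LLM and tool calls.

```python
import asyncio


async def run_agent(name: str, subtask: str, delay: float) -> dict:
    """Pretend to work on a sub-task; the sleep stands in for LLM and tool calls."""
    await asyncio.sleep(delay)
    return {"agent": name, "subtask": subtask, "status": "completed"}


async def run_flow() -> list:
    """Run independent sub-tasks concurrently and collect their results."""
    jobs = [
        run_agent("research_agent", "Research Tesla stock data sources", 0.02),
        run_agent("coder_agent", "Select visualization framework", 0.01),
    ]
    # gather() returns results in the order the jobs were passed in,
    # regardless of which one finishes first.
    return await asyncio.gather(*jobs)


results = asyncio.run(run_flow())
summary = "\n".join(
    f"{r['agent']}: {r['subtask']} ({r['status']})" for r in results
)
```

Because `gather` preserves input order, the collected results line up with the plan steps even though the faster agent finishes first, which makes assembling the final output straightforward.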
OpenManus's architecture enables a wide range of complex applications:
Research: conducts comprehensive research on topics, synthesizing information from multiple sources, verifying facts, and generating coherent reports.
Software development: plans, codes, and tests applications based on user requirements, handling both frontend and backend components with appropriate frameworks.
Data analysis: collects, cleans, analyzes, and visualizes data from various sources, applying appropriate statistical methods and creating insightful visualizations.
Content creation: researches topics, plans content structure, generates written material, creates visual assets, and optimizes for specific platforms and audiences.
OpenManus represents a significant step forward in open-source AI agent architecture, offering a modular, extensible framework for building powerful multi-agent systems. Understanding its architecture is the foundation for using, extending, and contributing to the project.
Ready to dive deeper into OpenManus? Here are some ways to continue your journey:
Set up the OpenManus environment and experiment with its capabilities: Visit GitHub Repository
Create new specialized agents or tools to enhance OpenManus's capabilities: Contribution Guidelines
Connect with other developers and researchers working on AI agent systems: Discussions & Issues