Multi-Agent Research Assistant
Automated Research, Writing & Review System
1. Project Overview
Build a multi-agent system where specialized AI agents collaborate to produce high-quality research reports on any given topic. The system takes a research question as input, and through a coordinated workflow of autonomous agents, delivers a comprehensive, reviewed, and cited research report enriched with named entity extraction, source classification, web-based data collection via browser automation, and AI-generated illustrations.
This project integrates all major topics covered throughout the course: transformer architecture understanding, text classification, named entity recognition, LangChain fundamentals, LangGraph orchestration, large language model configuration, Mixture of Experts analysis, AI agent design with ReAct, multi-agent collaboration, Model Context Protocol for tool integration, Large Action Model capabilities for browser automation, and diffusion models for report illustration generation.
| Item | Details |
|------|---------|
| Type | Multi-Agent System with LangGraph Orchestration |
| Team Size | 2-3 Students |
| Difficulty | Advanced |
| Topics Covered | Week 1 (Transformers), Week 2 (Text Classification), Week 3 (NER), Week 4 (LangChain), Week 5 (LangGraph), Week 6 (LLMs), Week 7 (MoE), Week 9 (ReAct Agents), Week 10 (Multi-Agent), Week 11 (MCP), Week 12 (LAMs), Week 13 (Diffusion Models) |
| Deliverables | Working application, source code, documentation, transformer analysis report, presentation, demo video |
2. Problem Statement
Conducting research is a time-consuming process that involves multiple stages: searching for relevant sources, reading and extracting key information, classifying sources by relevance and topic, extracting named entities, organizing findings into a coherent structure, writing a clear report with illustrations, and reviewing it for accuracy and completeness. Each stage requires different skills and focus areas.
This project applies the multi-agent paradigm to automate this pipeline, where each agent specializes in one stage, and a central orchestrator manages the workflow — mirroring how a real research team operates. The system leverages transformer-based language models as the backbone, employs text classification and NER techniques for intelligent information processing, uses the ReAct pattern for autonomous research, exposes tools via the Model Context Protocol, performs browser-based actions through Large Action Model capabilities, and generates visual content using diffusion models.
3. System Architecture
```
              User Input (Research Topic)
                           |
                           v
              +-------------------------+
              |       Orchestrator      |   LangGraph StateGraph - manages state & routing
              +-------------------------+
                           |
    +----------+-----------+-----------+-----------+
    |          |           |           |           |
    v          v           v           v           v
+--------+ +--------+ +----------+ +--------+ +----------+
|Research| |Classif.| |   NER    | |Browser | | Analyzer |
| Agent  | | Agent  | |  Agent   | | Agent  | |  Agent   |
|ReAct + | |Text    | |Entity    | |LAM for | |Theme     |
|Search  | |Classif.| |Extract   | |Web     | |Extract   |
+--------+ +--------+ +----------+ +--------+ +----------+
    |          |           |           |           |
    +----------+-----------+-----------+-----------+
                           |
                           v
                   +---------------+
                   | Writer Agent  |   Drafts report with citations
                   +---------------+
                           |
                           v
                 +------------------+
                 |Illustration Agent|   Diffusion models for figures
                 +------------------+
                           |
                           v
                   +---------------+
                   | Critic Agent  |   Reviews accuracy, gaps, suggestions
                   +---------------+
                           |
                     +-----+-----+
                     |           |
                  APPROVE     REVISE -----> back to Writer Agent
                     |
                     v
             +---------------+
             | Final Report  |   Markdown / PDF with illustrations
             +---------------+
```
4. Technology Stack
| Component | Technology |
|-----------|------------|
| Transformer Backbone | Transformer-based LLMs (GPT-4, Claude, Mixtral); students must document the architecture |
| Agent Orchestration | LangGraph (stateful graph with conditional edges) |
| Chain Framework | LangChain (chains, prompts, output parsers, document loaders) |
| Large Language Model | OpenAI GPT-4 or Claude API (or Ollama for local); optional MoE model (Mixtral) |
| Text Classification | LLM-based classifier or fine-tuned transformer (e.g., BERT) for source categorization |
| Named Entity Recognition | SpaCy NER pipeline or LLM-based entity extraction |
| Web Search | Tavily Search API / Brave Search API |
| Web Scraping | BeautifulSoup / Trafilatura |
| Browser Automation (LAM) | Playwright / Selenium for web actions (form filling, navigation, structured extraction) |
| Tool Protocol | Model Context Protocol (MCP) for standardized agent-tool integration |
| Image Generation | Stable Diffusion API / DALL-E for report illustrations and diagrams |
| Vector Store | ChromaDB (for storing and retrieving research findings) |
| Output Format | Markdown with embedded images, exported to PDF (via markdown2 + weasyprint) |
| User Interface | Streamlit or Gradio |
| Programming Language | Python 3.12+ |
5. Detailed Agent Descriptions
5.1 Orchestrator Agent (Manager)
- Role: Central coordinator that manages the entire workflow
- Receives the research topic from the user and defines research scope
- Routes tasks to appropriate agents based on current state
- Manages shared state (findings, drafts, feedback, entities, classifications)
- Decides when to loop back (revision) or finalize
- Implements human-in-the-loop checkpoints
- Implementation: LangGraph StateGraph with conditional edges
- Course Topic: Week 5 (LangGraph), Week 10 (Multi-Agent Systems)
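As a starting point, the Orchestrator's APPROVE/REVISE routing could look like the sketch below. It is plain Python (no LangGraph dependency) so the decision logic is visible and testable; in the real system this function would be registered on the graph as a conditional edge. The function name `route_after_critic` is illustrative; the threshold (score >= 7) and revision cap (3 cycles) come from this spec.

```python
# Hypothetical routing function for the Orchestrator's review loop.
# In LangGraph this would be attached via add_conditional_edges; here
# it is plain Python so the logic can be read and tested directly.

def route_after_critic(state: dict) -> str:
    """Decide the next node after the Critic Agent runs.

    APPROVE (score >= 7) or an exhausted revision budget routes to
    finalization; otherwise loop back to the Writer with feedback.
    """
    MAX_REVISIONS = 3  # revision cap from the project spec
    if state.get("critic_score", 0) >= 7:
        return "finalize"
    if state.get("revision_count", 0) >= MAX_REVISIONS:
        return "finalize"  # stop revising, ship the best draft so far
    return "writer"        # REVISE: loop back to the Writer Agent
```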
5.2 Research Agent (Information Gatherer)
- Role: Searches the internet and collects relevant information using the ReAct pattern
- Generates effective search queries from the research topic
- Searches multiple sources using Tavily/Brave APIs
- Scrapes relevant web pages for detailed content
- Filters and ranks results by relevance (LLM scoring 1-10)
- Stores findings with source URLs for proper citation
- Exposes search and scrape tools via MCP server
- Pattern: ReAct (Reasoning + Acting) -- Week 9
- Tools: Web Search (MCP), Web Scraper (MCP), URL Reader
- Course Topic: Week 9 (AI Agents and ReAct), Week 11 (MCP)
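The ReAct loop above can be sketched in miniature as follows. The search tool and the "reasoner" are stubs (`fake_search`, a fixed query strategy) standing in for the Tavily/Brave MCP tool and the LLM; only the Thought-Action-Observation structure and the relevance filter (score >= 7, per the 1-10 scoring above) are meant literally.

```python
# Minimal ReAct-style loop with a stubbed search tool. Names
# (run_research, fake_search) are illustrative, not from any library.

def fake_search(query: str) -> list[dict]:
    # Stand-in for the Tavily/Brave search tool exposed over MCP.
    return [{"url": f"https://example.com/{query}", "title": query, "score": 8}]

def run_research(topic: str, max_steps: int = 3) -> list[dict]:
    findings: list[dict] = []
    for step in range(max_steps):
        # Thought: in the real agent the LLM proposes the next query.
        query = f"{topic} aspect {step + 1}"
        # Action: call the search tool.
        results = fake_search(query)
        # Observation: keep only sufficiently relevant results.
        findings.extend(r for r in results if r["score"] >= 7)
        # Thought: the LLM would decide whether to search more or stop;
        # here we stop once enough findings are collected.
        if len(findings) >= 3:
            break
    return findings
```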
5.3 Classification Agent (Source Categorizer)
- Role: Classifies collected sources by relevance and topic category
- Applies text classification to each collected source
- Assigns topic labels (e.g., "technical", "review", "tutorial", "opinion", "dataset")
- Assigns relevance scores (high, medium, low) using classification models
- Filters out low-relevance or off-topic sources before analysis
- Generates a classification summary report
- Pattern: LLM-based text classification or fine-tuned BERT classifier
- Course Topic: Week 2 (Text Classification)
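A minimal shape for the classifier's output is sketched below. The keyword heuristic is only a dependency-free placeholder for the real LLM-based (or fine-tuned BERT) classifier; the label set and relevance tiers follow this spec, while the keyword lists are invented for illustration.

```python
# Toy source classifier: a keyword heuristic standing in for an
# LLM or BERT classifier. Only the output schema is meant literally.

CATEGORIES = {
    "technical": ("architecture", "benchmark", "equation"),
    "tutorial": ("how to", "step by step", "guide"),
    "opinion": ("i think", "in my view", "opinion"),
}

def classify_source(text: str) -> dict:
    lowered = text.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in lowered for k in keywords):
            # Crude relevance tier: more keyword hits -> higher tier.
            hits = sum(k in lowered for k in keywords)
            relevance = "high" if hits > 1 else "medium"
            return {"category": category, "relevance": relevance}
    return {"category": "other", "relevance": "low"}
```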
5.4 NER Agent (Entity Extractor)
- Role: Extracts named entities from all collected research sources
- Identifies key people (researchers, authors), organizations, technologies, dates, and locations
- Builds an entity registry linking entities to their source documents
- Detects entity co-occurrences and relationships
- Provides entity frequency analysis to highlight dominant themes
- Outputs a structured entity report used by the Analyzer and Writer agents
- Implementation: SpaCy NER pipeline or LLM-based extraction with structured output
- Course Topic: Week 3 (Named Entity Recognition)
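The entity registry and frequency analysis might look like the sketch below. A real implementation would get entities from spaCy (`doc.ents`) or an LLM with structured output; the tiny gazetteer here only keeps the example dependency-free, and the names in it are examples from this document.

```python
# Entity registry sketch. The gazetteer is a stand-in for spaCy or
# LLM-based extraction; the record schema matches the state schema.

from collections import Counter

GAZETTEER = {
    "Google": "ORG",
    "Mistral AI": "ORG",
    "Noam Shazeer": "PERSON",
    "Mixtral": "TECH",
}

def extract_entities(text: str, source_url: str) -> list[dict]:
    # Substring matching only; a real NER pipeline handles spans properly.
    return [
        {"text": name, "label": label, "source_url": source_url}
        for name, label in GAZETTEER.items()
        if name in text
    ]

def entity_frequencies(docs: list[tuple[str, str]]) -> Counter:
    """docs is a list of (text, url); counts entity mentions across sources."""
    counts: Counter = Counter()
    for text, url in docs:
        counts.update(e["text"] for e in extract_entities(text, url))
    return counts
```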
5.5 Browser Agent (Web Action Performer)
- Role: Performs complex web interactions that go beyond simple API calls
- Navigates dynamic web pages that require JavaScript rendering
- Fills search forms on academic databases (Google Scholar, Semantic Scholar)
- Extracts structured data from tables and interactive pages
- Downloads PDFs and datasets from research repositories
- Handles authentication flows for gated content when credentials are provided
- Implementation: Playwright-based browser automation with LLM planning
- Course Topic: Week 12 (Large Action Models)
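One way to structure the LAM side is to have the LLM planner emit a declarative action plan that a Playwright executor then runs (`page.goto`, `page.fill`, `page.click`). The sketch below only validates the plan shape and does not drive a real browser; the action vocabulary, selectors, and the `validate_plan` helper are assumptions for illustration.

```python
# LAM-style action plan: a list of browser actions the planner emits
# and a Playwright-based executor would run. Validation only; no
# browser is launched here. Selectors below are illustrative.

ALLOWED_ACTIONS = {"goto", "fill", "click", "extract"}

def validate_plan(plan: list[dict]) -> bool:
    """A plan must start with navigation and use only known actions."""
    if not plan or plan[0]["action"] != "goto":
        return False
    return all(step["action"] in ALLOWED_ACTIONS for step in plan)

scholar_plan = [
    {"action": "goto", "url": "https://scholar.google.com"},
    {"action": "fill", "selector": "input[name=q]", "value": "mixture of experts"},
    {"action": "click", "selector": "button[type=submit]"},
    {"action": "extract", "selector": ".result-item"},
]
```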
5.6 Analyzer Agent (Information Processor)
- Role: Processes raw research data into organized findings
- Receives classified sources and extracted entities as additional input
- Identifies 4-6 key themes and patterns from collected sources
- Groups findings by topic and subtopic, leveraging classification labels
- Detects contradictions between sources
- Ranks findings by importance and relevance
- Creates a structured outline for the report, incorporating entity relationships
- Pattern: Chain-of-Thought reasoning
- Course Topic: Week 4 (LangChain chains and prompts)
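The grouping step can be sketched as below: organize classified sources into themes using the Classification Agent's labels and drop the low-relevance tier, as described above. Theme naming and ranking would normally come from the LLM; `group_findings` is an illustrative name.

```python
# Grouping sketch for the Analyzer Agent: bucket sources by their
# classification label and discard the filtered low-relevance tier.

from collections import defaultdict

def group_findings(sources: list[dict]) -> dict[str, list[dict]]:
    themes: dict[str, list[dict]] = defaultdict(list)
    for src in sources:
        if src["relevance"] == "low":
            continue  # already filtered by the Classification Agent
        themes[src["category"]].append(src)
    return dict(themes)
```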
5.7 Writer Agent (Content Creator)
- Role: Transforms organized findings into a polished report
- Follows the outline from the Analyzer Agent
- Writes clear, academic-style prose with proper citations
- Creates introduction, body sections, and conclusion
- Generates an executive summary
- Integrates extracted entities naturally into the narrative
- Generates prompts for the Illustration Agent to produce relevant figures
- Formats output in clean Markdown with image placeholders
- Pattern: Structured output with citation formatting
- Course Topic: Week 6 (LLM configuration and optimization)
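The citation system could number sources in order of first use and emit a matching References section, along these lines (`build_references` is an illustrative helper name, not a library function):

```python
# Citation sketch: emit a Markdown References block with [n] markers
# matching the order in which sources were first cited in the draft.

def build_references(cited_urls: list[str], registry: dict[str, str]) -> str:
    """cited_urls: URLs in order of first citation.
    registry: url -> title. Returns a Markdown References block."""
    lines = ["## References"]
    for i, url in enumerate(cited_urls, start=1):
        lines.append(f"[{i}] {registry[url]}. {url}")
    return "\n".join(lines)
```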
5.8 Illustration Agent (Visual Content Generator)
- Role: Generates illustrations, diagrams, and figures for the research report
- Receives image generation prompts from the Writer Agent
- Creates conceptual diagrams, architecture illustrations, and data visualizations
- Uses diffusion models (Stable Diffusion API or DALL-E) to generate images
- Optimizes prompts for technical/academic illustration style
- Embeds generated images into the final report at appropriate locations
- Implementation: Diffusion model API integration with prompt engineering
- Course Topic: Week 13 (Diffusion Models)
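Prompt optimization for academic figures can be as simple as appending a fixed style suffix to the Writer Agent's raw figure request before the diffusion API call. The suffix below is an assumption to be tuned per model (Stable Diffusion vs. DALL-E respond differently to style cues):

```python
# Prompt-styling sketch for the Illustration Agent. The style suffix
# is an assumed starting point, not a recommended final value.

STYLE_SUFFIX = (
    "clean technical diagram, flat vector style, white background, "
    "labeled components, no photorealism"
)

def optimize_figure_prompt(raw_prompt: str) -> str:
    return f"{raw_prompt.strip()}, {STYLE_SUFFIX}"
```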
5.9 Critic Agent (Quality Reviewer)
- Role: Reviews the draft and provides actionable feedback
- Checks factual accuracy against original sources
- Identifies gaps in coverage and weak arguments
- Verifies that NER entities and classification labels are used correctly
- Evaluates coherence, flow, illustration relevance, and citation completeness
- Scores the report (1-10) on: accuracy, completeness, clarity, citations, visual quality
- Decision: APPROVE (score >= 7) or REVISE (with specific feedback)
- Pattern: Reflection pattern
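The scoring step can be reduced to averaging the five rubric criteria and applying the threshold above (`review` is an illustrative name; the criteria and the >= 7 cutoff come from this spec):

```python
# Rubric sketch for the Critic Agent: average the five criterion
# scores (each 1-10) and apply the APPROVE/REVISE threshold.

RUBRIC = ("accuracy", "completeness", "clarity", "citations", "visual_quality")

def review(scores: dict[str, int]) -> tuple[int, str]:
    overall = round(sum(scores[c] for c in RUBRIC) / len(RUBRIC))
    return overall, ("APPROVE" if overall >= 7 else "REVISE")
```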
6. Shared State Schema
```python
from typing import TypedDict, List, Optional

class ResearchState(TypedDict):
    topic: str                        # User's research topic
    sub_questions: List[str]          # Generated sub-questions
    raw_sources: List[dict]           # {url, title, content, score}
    classified_sources: List[dict]    # {url, title, category, relevance}
    entities: List[dict]              # {text, label, source_url, frequency}
    entity_relationships: List[dict]  # {entity1, entity2, relation, context}
    browser_results: List[dict]       # {url, action, extracted_data}
    organized_findings: List[dict]    # {theme, findings, sources, entities}
    outline: List[dict]               # {section_title, key_points, image_prompts}
    draft: str                        # Markdown draft
    illustrations: List[dict]         # {section, prompt, image_path}
    critic_feedback: Optional[str]    # Feedback from Critic
    critic_score: Optional[int]       # Score 1-10
    revision_count: int               # Number of revisions done
    final_report: Optional[str]       # Approved final report with images
    transformer_config: dict          # LLM model details and parameters
    moe_analysis: Optional[dict]      # MoE model comparison results
    status: str                       # Current pipeline stage
```
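Each agent node reads this shared state and returns only the keys it updates; LangGraph merges the partial dict back into the state. A node function can therefore be sketched without the framework (the stubbed labels below are placeholders for the real classifier output):

```python
# Node-function sketch: LangGraph nodes take the current state and
# return a partial update dict; the framework performs the merge.

def classification_node(state: dict) -> dict:
    classified = [
        {**src, "category": "technical", "relevance": "high"}  # stubbed labels
        for src in state["raw_sources"]
    ]
    return {"classified_sources": classified, "status": "classified"}
```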
7. Project Phases & Milestones
Phase 1: Transformer Foundations & Text Classification (Weeks 1-2)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 1.1 | Study and document the transformer architecture used by the chosen LLM | Written analysis of attention mechanism, encoder/decoder structure |
| 1.2 | Set up Python project structure and repository | Project repo with requirements.txt |
| 1.3 | Configure API keys (OpenAI/Anthropic, Tavily) and LLM parameters | .env file with credentials, model config documented |
| 1.4 | Install LangChain, LangGraph, ChromaDB, SpaCy | Working development environment |
| 1.5 | Build the Classification Agent: implement text classification for source categorization | Working classifier that labels sources by topic and relevance |
| 1.6 | Test classification on sample documents from different domains | Classification accuracy report |
Course Alignment: Week 1 (Transformer Architecture) -- understand and document the model backbone. Week 2 (Text Classification) -- build the source classification pipeline.
Phase 2: NER & LangChain Setup (Weeks 3-4)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 2.1 | Build the NER Agent using SpaCy or LLM-based extraction | Working NER pipeline extracting people, orgs, tech, dates |
| 2.2 | Implement entity registry and frequency analysis | Structured entity database with co-occurrence tracking |
| 2.3 | Set up LangChain fundamentals: chains, prompt templates, output parsers | Reusable chain components for all agents |
| 2.4 | Build document loaders for various source formats (HTML, PDF, text) | Multi-format document ingestion pipeline |
| 2.5 | Design the shared state schema (TypedDict) | Complete state schema with all fields defined |
Course Alignment: Week 3 (NER) -- build entity extraction. Week 4 (LangChain Fundamentals) -- establish the framework foundation.
Phase 3: LangGraph Orchestration & LLM Configuration (Weeks 5-6)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 3.1 | Create the LangGraph StateGraph skeleton with all agent nodes | Connected graph with conditional edges |
| 3.2 | Implement state transitions and routing logic | Working orchestration flow |
| 3.3 | Configure and optimize LLM parameters (temperature, max tokens, system prompts) | Documented LLM configuration per agent |
| 3.4 | Compare LLM performance across different models and parameter settings | LLM benchmark comparison table |
| 3.5 | Store findings in ChromaDB with embeddings | Searchable vector store |
Course Alignment: Week 5 (LangGraph) -- core orchestration. Week 6 (LLMs) -- model selection and parameter optimization.
Phase 4: MoE Analysis & Research Agent (Week 7, then Week 9)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 4.1 | Analyze MoE architectures: study Mixtral, Switch Transformer, DeepSeek-MoE | Written MoE analysis comparing dense vs. sparse models |
| 4.2 | Optionally integrate an MoE model (Mixtral) for cost-efficient inference | MoE model integration or comparative benchmark |
| 4.3 | Implement the Research Agent with ReAct pattern | Agent that reasons about search strategy and acts on it |
| 4.4 | Integrate Tavily/Brave Search APIs as tools | Working search with result parsing |
| 4.5 | Build web scraper for detailed content extraction | Full text extraction from URLs |
| 4.6 | Build ReAct loop: agent decides to search more or stop | Autonomous iterative search behavior |
Course Alignment: Week 7 (Mixture of Experts) -- analyze and optionally use MoE models. Week 9 (AI Agents and ReAct) -- implement the core research agent.
Phase 5: Multi-Agent Integration & MCP (Weeks 10-11)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 5.1 | Connect all agents into a working multi-agent pipeline | End-to-end workflow: Research, Classification, NER, Analysis, Writing, Review |
| 5.2 | Implement the Analyzer Agent with theme extraction and outline generation | Organized findings with entity-enriched themes |
| 5.3 | Implement the Writer Agent with citation system and entity integration | Complete draft generation with inline citations |
| 5.4 | Implement the Critic Agent with scoring rubric and APPROVE/REVISE logic | Working review loop with max 3 revision cycles |
| 5.5 | Expose agent tools via Model Context Protocol (MCP) servers | MCP-compliant tool definitions for search, scrape, classify, NER |
| 5.6 | Build MCP client integration so agents consume tools via MCP | Standardized tool invocation across agents |
Course Alignment: Week 10 (Multi-Agent Systems) -- full system integration. Week 11 (MCP) -- standardized tool protocol.
Phase 6: Browser Agent (LAM) & Diffusion Models (Weeks 12-13)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 6.1 | Build the Browser Agent using Playwright for web automation | Agent navigates pages, fills forms, extracts structured data |
| 6.2 | Implement academic database interaction (Google Scholar, Semantic Scholar) | Automated paper search and metadata extraction |
| 6.3 | Build the Illustration Agent with diffusion model integration | Agent generates conceptual diagrams and figures |
| 6.4 | Implement image prompt optimization for academic illustrations | High-quality technical illustrations for reports |
| 6.5 | Integrate illustrations into the Writer Agent output pipeline | Reports with embedded generated images |
| 6.6 | Build Streamlit UI with real-time progress indicators | Working web interface showing agent activity |
| 6.7 | Add agent reasoning display in expandable panels | Transparent decision log in UI |
Course Alignment: Week 12 (Large Action Models) -- browser automation agent. Week 13 (Diffusion Models) -- illustration generation.
Phase 7: Testing, Documentation & Presentation (Week 14)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 7.1 | End-to-end testing on 5 diverse research topics | 5 complete research reports with illustrations |
| 7.2 | Performance benchmarking (time, token cost, MoE vs dense comparison) | Metrics per report including model comparison |
| 7.3 | Evaluate NER accuracy, classification precision, and illustration quality | Component-level evaluation metrics |
| 7.4 | Write README, technical documentation, and transformer analysis | Setup guide, architecture diagram, model analysis |
| 7.5 | Prepare presentation slides and demo video | Slides + 5-min video walkthrough |
8. Timeline Summary
| Week | Phase | Key Deliverable |
|------|-------|-----------------|
| 1-2 | Transformer Foundations & Text Classification | Transformer analysis, project setup, working classifier |
| 3-4 | NER & LangChain Setup | NER pipeline, LangChain components, state schema |
| 5-6 | LangGraph & LLM Configuration | StateGraph skeleton, LLM benchmarks, vector store |
| 7 | MoE Analysis | MoE architecture analysis, optional Mixtral integration |
| 8 | Midterm Exam | -- |
| 9 | Research Agent (ReAct) | Working ReAct agent with iterative search |
| 10-11 | Multi-Agent Integration & MCP | Full pipeline, MCP tool servers, review loop |
| 12-13 | Browser Agent (LAM) & Diffusion | Browser automation, illustrations, Streamlit UI |
| 14 | Testing & Presentation | 5 test reports, documentation, demo video |
9. Competition-Based Evaluation
This project follows a Competition-Based Learning (CBL) approach. All teams will build their own version of the Multi-Agent Research Assistant, then compete head-to-head in a live evaluation event. This fosters deeper engagement, peer learning, and real-world benchmarking skills.
9.1 Competition Format
| Stage | When | Description |
|-------|------|-------------|
| Checkpoint 1 | Week 7 | Teams demo their NLP pipeline (classification + NER + research agent). Instructor feedback only, no scoring. |
| Checkpoint 2 | Week 11 | Teams demo full multi-agent pipeline with MCP integration. Peer feedback round. |
| Competition Day | Week 15 | Live head-to-head competition. All teams receive the SAME 3 unseen research topics and run their systems in real-time. |
9.2 Competition Day Protocol
- The instructor reveals 3 research topics (unseen by all teams) at the start of the session.
- All teams run their systems simultaneously on the same topics.
- Each system produces 3 research reports. Time limit: 30 minutes per topic.
- Reports are anonymized and distributed to a judging panel (instructor + 2 external judges or peer teams).
- Teams give a 10-minute live presentation of their architecture and demo.
- Judges score each report and presentation independently using the rubric below.
- Scores are averaged across judges and topics.
9.3 Scoring Rubric (100 points per report)
| Criteria | Points | Description |
|----------|--------|-------------|
| Report Quality | 20 | Coherence, depth, clarity, readability, proper structure (intro, body, conclusion) |
| Source Coverage | 15 | Number and diversity of relevant sources found (minimum 10), proper citations |
| Entity Extraction | 10 | Accuracy and completeness of NER output (people, organizations, technologies, dates) |
| Source Classification | 10 | Correct categorization of sources by type, domain, and relevance tier |
| Illustrations | 10 | Relevance and quality of AI-generated figures, diagrams, or visualizations |
| Speed | 10 | Time to complete the full pipeline. Faster = higher score (within the 30-min limit) |
| Cost Efficiency | 5 | Total API token cost per report. Lower cost = higher score |
| Error Handling | 5 | System handles failures gracefully (API errors, bad sources, timeouts) |
| Innovation | 15 | Creative features: MoE routing, advanced MCP tools, LAM web actions, multi-model strategies |
9.4 Final Grade Breakdown
| Component | Weight | Description |
|-----------|--------|-------------|
| Competition Score | 40% | Average score across 3 topics from judging panel (rubric above) |
| Architecture & Topic Coverage | 20% | Demonstrates all 12 course topics integrated into the system (transformers, NER, MoE, MCP, LAMs, diffusion, etc.) |
| Code Quality & Documentation | 15% | Clean code, GitHub repo, README, technical report with architecture diagrams |
| Live Presentation | 15% | 10-minute demo + Q&A. Clarity, technical depth, team coordination |
| Peer Review | 10% | Each team reviews 2 other teams' reports and provides structured feedback. Quality of review is graded. |
9.5 Competition Prizes & Ranking
| Rank | Award | Bonus |
|------|-------|-------|
| 1st Place | Gold Award | +5 bonus points on final grade |
| 2nd Place | Silver Award | +3 bonus points on final grade |
| 3rd Place | Bronze Award | +2 bonus points on final grade |
| Best Innovation | Innovation Award | +2 bonus points (can stack with placement) |
| Best Report | Quality Award | +2 bonus points (can stack with placement) |
| Fastest System | Speed Award | +1 bonus point |
Academic Integrity: All code must be the team's original work. Teams may use open-source libraries and LLM APIs but must not copy other teams' agent logic or prompts. Plagiarism results in disqualification and a failing grade.
10. Expected Output Example
Input: "The impact of Mixture of Experts architecture on Large Language Model efficiency"
# Research Report: The Impact of MoE on LLM Efficiency
## Executive Summary
This report examines how Mixture of Experts (MoE) architectures have
transformed large language model efficiency by enabling sparse activation,
reducing computational costs while maintaining or improving performance...
## Named Entities Identified
- Organizations: Google, Mistral AI, DeepSeek
- People: Noam Shazeer, William Fedus, Albert Jiang
- Technologies: Switch Transformer, Mixtral 8x7B, DeepSeek-MoE, GShard
- Dates: 2017, 2021, 2024
## Source Classification Summary
- Technical Papers: 8 sources (high relevance)
- Blog Posts / Tutorials: 4 sources (medium relevance)
- News Articles: 2 sources (low relevance, filtered)
## 1. Introduction
The rapid scaling of large language models has created significant
computational challenges...
## 2. Transformer Architecture Background
### 2.1 Self-Attention Mechanism
### 2.2 Feed-Forward Networks as Expert Candidates
## 3. MoE Architecture Fundamentals
### 3.1 Gating Mechanisms
### 3.2 Expert Networks
### 3.3 Top-K Selection
## 4. Key MoE Models
### 4.1 Switch Transformer (Google, 2021)
### 4.2 Mixtral 8x7B (Mistral, 2024)
### 4.3 DeepSeek-MoE
## 5. Performance Analysis
### 5.1 Computational Efficiency
### 5.2 Quality Benchmarks
[Figure 1: MoE vs Dense Model Efficiency Comparison -- generated illustration]
## 6. Challenges and Limitations
## 7. Conclusion
## References
[1] Fedus et al., "Switch Transformers", 2021. https://...
[2] Mistral AI, "Mixtral of Experts", 2024. https://...
[3] Vaswani et al., "Attention Is All You Need", 2017. https://...
...
11. Submission Requirements
| Item | Details |
|------|---------|
| Source Code | GitHub repository with clean commit history and README |
| Transformer Analysis | Written document explaining the transformer architecture of the chosen LLM backbone |
| MoE Comparison | Analysis report comparing MoE vs. dense model architectures with benchmarks |
| Documentation | Technical report: architecture, design decisions, NER/classification evaluation, challenges, results |
| Demo Video | 5-minute screen recording demonstrating the full pipeline including NER, classification, browser actions, and illustrations |
| Sample Reports | 3 generated research reports on different topics with entity summaries, classification reports, and illustrations |
| Presentation | 15-minute live presentation with Q&A covering all course topics demonstrated in the project |
| Deadline | Week 14 of the semester |
12. Useful References
- "Attention Is All You Need" -- Vaswani et al., 2017 (Transformer Architecture)
- "BERT: Pre-training of Deep Bidirectional Transformers" -- Devlin et al., 2019 (Text Classification, NER)
- SpaCy Documentation -- https://spacy.io (Named Entity Recognition)
- LangChain Documentation -- https://python.langchain.com
- LangGraph Documentation -- https://langchain-ai.github.io/langgraph/
- "ReAct: Synergizing Reasoning and Acting in Language Models" -- Yao et al., 2023
- "Switch Transformers: Scaling to Trillion Parameter Models" -- Fedus et al., 2021 (MoE)
- "Mixtral of Experts" -- Mistral AI, 2024 (MoE)
- "A Visual Guide to LLM Agents" -- Maarten Grootendorst, 2025
- "Building Effective Agents" -- Anthropic Blog, 2024
- Model Context Protocol Specification -- https://modelcontextprotocol.io
- "Large Action Models: From Inception to Implementation" -- Salesforce Research, 2024
- Playwright Documentation -- https://playwright.dev (Browser Automation)
- "High-Resolution Image Synthesis with Latent Diffusion Models" -- Rombach et al., 2022 (Diffusion Models)
- Stable Diffusion API Documentation -- https://stability.ai
- Tavily Search API -- https://tavily.com
- "AI Engineering" -- Chip Huyen, O'Reilly, 2025