University of Petra
Faculty of Information Technology
Data Science and Artificial Intelligence
Final Project Description
Course: 606363 — Computing Systems for Data Science and AI   |   Semester: 2nd, 2025-2026   |   Instructor: Dr. Abdulkarim Albanna

Multi-Agent Research Assistant

Automated Research, Writing & Review System

1. Project Overview

Build a multi-agent system where specialized AI agents collaborate to produce high-quality research reports on any given topic. The system takes a research question as input, and through a coordinated workflow of autonomous agents, delivers a comprehensive, reviewed, and cited research report enriched with named entity extraction, source classification, web-based data collection via browser automation, and AI-generated illustrations.

This project integrates all major topics covered throughout the course: transformer architecture understanding, text classification, named entity recognition, LangChain fundamentals, LangGraph orchestration, large language model configuration, Mixture of Experts analysis, AI agent design with ReAct, multi-agent collaboration, Model Context Protocol for tool integration, Large Action Model capabilities for browser automation, and diffusion models for report illustration generation.

Item | Details
Type | Multi-Agent System with LangGraph Orchestration
Team Size | 2-3 Students
Difficulty | Advanced
Topics Covered | Week 1 (Transformers), Week 2 (Text Classification), Week 3 (NER), Week 4 (LangChain), Week 5 (LangGraph), Week 6 (LLMs), Week 7 (MoE), Week 9 (ReAct Agents), Week 10 (Multi-Agent), Week 11 (MCP), Week 12 (LAMs), Week 13 (Diffusion Models)
Deliverables | Working application, source code, documentation, transformer analysis report, presentation, demo video
2. Problem Statement

Conducting research is a time-consuming process that involves multiple stages: searching for relevant sources, reading and extracting key information, classifying sources by relevance and topic, extracting named entities, organizing findings into a coherent structure, writing a clear report with illustrations, and reviewing it for accuracy and completeness. Each stage requires different skills and focus areas.

This project applies the multi-agent paradigm to automate this pipeline, where each agent specializes in one stage, and a central orchestrator manages the workflow — mirroring how a real research team operates. The system leverages transformer-based language models as the backbone, employs text classification and NER techniques for intelligent information processing, uses the ReAct pattern for autonomous research, exposes tools via the Model Context Protocol, performs browser-based actions through Large Action Model capabilities, and generates visual content using diffusion models.

3. System Architecture
User Input (Research Topic)
            |
            v
+-------------------------+
|      Orchestrator       |   LangGraph StateGraph - manages state & routing
+-------------------------+
            |
    +-------+-------+--------+--------+
    |       |       |        |        |
    v       v       v        v        v
+--------+ +--------+ +--------+ +--------+ +--------+
|Research| |Classif.| |  NER   | |Browser | |Analyzer|
| Agent  | | Agent  | | Agent  | | Agent  | | Agent  |
|ReAct + | |Text    | |Entity  | |LAM for | |Theme   |
|Search  | |Classif.| |Extract | |Web     | |Extract |
+--------+ +--------+ +--------+ +--------+ +--------+
    |       |       |        |        |
    +-------+-------+--------+--------+
            |
            v
  +------------------+
  |   Writer Agent   |   Drafts report with citations
  +------------------+
            |
            v
  +------------------+
  |Illustration Agent|   Diffusion models for figures
  +------------------+
            |
            v
  +------------------+
  |   Critic Agent   |   Reviews accuracy, gaps, suggestions
  +------------------+
            |
      +-----+------+
      |            |
   APPROVE      REVISE -----> back to Writer Agent
      |
      v
  +------------------+
  |   Final Report   |   Markdown / PDF with illustrations
  +------------------+
4. Technology Stack
Component | Technology
Transformer Backbone | Transformer-based LLMs (GPT-4, Claude, Mixtral); students must document the architecture
Agent Orchestration | LangGraph (stateful graph with conditional edges)
Chain Framework | LangChain (chains, prompts, output parsers, document loaders)
Large Language Model | OpenAI GPT-4 or Claude API (or Ollama for local); optional MoE model (Mixtral)
Text Classification | LLM-based classifier or fine-tuned transformer (e.g., BERT) for source categorization
Named Entity Recognition | SpaCy NER pipeline or LLM-based entity extraction
Web Search | Tavily Search API / Brave Search API
Web Scraping | BeautifulSoup / Trafilatura
Browser Automation (LAM) | Playwright / Selenium for web actions (form filling, navigation, structured extraction)
Tool Protocol | Model Context Protocol (MCP) for standardized agent-tool integration
Image Generation | Stable Diffusion API / DALL-E for report illustrations and diagrams
Vector Store | ChromaDB (for storing and retrieving research findings)
Output Format | Markdown with embedded images to PDF (via markdown2 + weasyprint)
User Interface | Streamlit or Gradio
Programming Language | Python 3.12+
5. Detailed Agent Descriptions
5.1 Orchestrator Agent (Manager)
5.2 Research Agent (Information Gatherer)
5.3 Classification Agent (Source Categorizer)
5.4 NER Agent (Entity Extractor)
5.5 Browser Agent (Web Action Performer)
5.6 Analyzer Agent (Information Processor)
5.7 Writer Agent (Content Creator)
5.8 Illustration Agent (Visual Content Generator)
5.9 Critic Agent (Quality Reviewer)
6. Shared State Schema
from typing import TypedDict, List, Optional

class ResearchState(TypedDict):
    topic: str                          # User's research topic
    sub_questions: List[str]            # Generated sub-questions
    raw_sources: List[dict]             # {url, title, content, score}
    classified_sources: List[dict]      # {url, title, category, relevance}
    entities: List[dict]                # {text, label, source_url, frequency}
    entity_relationships: List[dict]    # {entity1, entity2, relation, context}
    browser_results: List[dict]         # {url, action, extracted_data}
    organized_findings: List[dict]      # {theme, findings, sources, entities}
    outline: List[dict]                 # {section_title, key_points, image_prompts}
    draft: str                          # Markdown draft
    illustrations: List[dict]           # {section, prompt, image_path}
    critic_feedback: Optional[str]      # Feedback from Critic
    critic_score: Optional[int]         # Score 1-10
    revision_count: int                 # Number of revisions done
    final_report: Optional[str]         # Approved final report with images
    transformer_config: dict            # LLM model details and parameters
    moe_analysis: Optional[dict]        # MoE model comparison results
    status: str                         # Current pipeline stage
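Before the graph runs, the orchestrator must seed this state. A minimal initializer sketch (the function name and default values are illustrative assumptions, not part of the spec):

```python
def make_initial_state(topic: str) -> dict:
    """Seed a ResearchState dict before the LangGraph pipeline runs.

    Defaults here are illustrative assumptions, not part of the spec.
    """
    return {
        "topic": topic,
        "sub_questions": [],
        "raw_sources": [],
        "classified_sources": [],
        "entities": [],
        "entity_relationships": [],
        "browser_results": [],
        "organized_findings": [],
        "outline": [],
        "draft": "",
        "illustrations": [],
        "critic_feedback": None,
        "critic_score": None,           # Not yet reviewed
        "revision_count": 0,            # Critic loop caps this at 3
        "final_report": None,
        "transformer_config": {},
        "moe_analysis": None,
        "status": "started",
    }

state = make_initial_state("Impact of MoE on LLM efficiency")
print(state["status"], state["revision_count"])
```

Keeping every key present from the start lets each agent node read and update the state without guarding against missing fields.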
7. Project Phases & Milestones
Phase 1: Transformer Foundations & Text Classification (Weeks 1-2)
Task | Description | Deliverable
1.1 | Study and document the transformer architecture used by the chosen LLM | Written analysis of attention mechanism, encoder/decoder structure
1.2 | Set up Python project structure and repository | Project repo with requirements.txt
1.3 | Configure API keys (OpenAI/Anthropic, Tavily) and LLM parameters | .env file with credentials, model config documented
1.4 | Install LangChain, LangGraph, ChromaDB, SpaCy | Working development environment
1.5 | Build the Classification Agent: implement text classification for source categorization | Working classifier that labels sources by topic and relevance
1.6 | Test classification on sample documents from different domains | Classification accuracy report
Course Alignment: Week 1 (Transformer Architecture) -- understand and document the model backbone. Week 2 (Text Classification) -- build the source classification pipeline.
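One way to structure the LLM-based classifier in task 1.5 is to prompt for a fixed label set and parse the reply into the {category, relevance} fields the state schema expects. A sketch with the model call left out (the label names and reply format are assumptions; in the real agent the prompt would go through a LangChain chain):

```python
CATEGORIES = ["technical_paper", "blog_post", "news_article"]   # assumed label set
RELEVANCE_TIERS = ["high", "medium", "low"]

def build_prompt(title: str, snippet: str) -> str:
    # Constrained prompt so the LLM's reply is machine-parseable
    return (
        f"Classify this source.\nTitle: {title}\nSnippet: {snippet}\n"
        f"Reply exactly as 'category=<{'|'.join(CATEGORIES)}>; "
        f"relevance=<{'|'.join(RELEVANCE_TIERS)}>'"
    )

def parse_reply(reply: str) -> dict:
    # Parse 'category=...; relevance=...' into schema fields, with safe fallbacks
    fields = dict(
        part.strip().split("=", 1)
        for part in reply.split(";") if "=" in part
    )
    category = fields.get("category", "").strip()
    relevance = fields.get("relevance", "").strip()
    return {
        "category": category if category in CATEGORIES else "unknown",
        "relevance": relevance if relevance in RELEVANCE_TIERS else "low",
    }

print(parse_reply("category=technical_paper; relevance=high"))
```

The fallback labels matter for the accuracy report in task 1.6: malformed replies are counted as "unknown" rather than silently mislabeled.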
Phase 2: NER & LangChain Setup (Weeks 3-4)
Task | Description | Deliverable
2.1 | Build the NER Agent using SpaCy or LLM-based extraction | Working NER pipeline extracting people, orgs, tech, dates
2.2 | Implement entity registry and frequency analysis | Structured entity database with co-occurrence tracking
2.3 | Set up LangChain fundamentals: chains, prompt templates, output parsers | Reusable chain components for all agents
2.4 | Build document loaders for various source formats (HTML, PDF, text) | Multi-format document ingestion pipeline
2.5 | Design the shared state schema (TypedDict) | Complete state schema with all fields defined
Course Alignment: Week 3 (NER) -- build entity extraction. Week 4 (LangChain Fundamentals) -- establish the framework foundation.
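The entity registry in task 2.2 only needs two counters: per-entity frequency and pairwise co-occurrence within a source. A dependency-light sketch (the class name, label strings, and example URL are placeholders):

```python
from collections import Counter
from itertools import combinations

class EntityRegistry:
    """Track entity frequencies and pairwise co-occurrence across sources."""

    def __init__(self):
        self.frequency = Counter()     # entity text -> mention count
        self.cooccurrence = Counter()  # (entity_a, entity_b) -> shared-source count
        self.labels = {}               # entity text -> NER label

    def add_document(self, source_url: str, entities: list[dict]) -> None:
        # entities: [{"text": ..., "label": ...}, ...] as produced by the NER Agent
        for ent in entities:
            self.frequency[ent["text"]] += 1
            self.labels[ent["text"]] = ent["label"]
        # Sort so each unordered pair maps to one canonical key
        unique = sorted({ent["text"] for ent in entities})
        for a, b in combinations(unique, 2):
            self.cooccurrence[(a, b)] += 1

reg = EntityRegistry()
reg.add_document("https://example.com/a", [
    {"text": "Mixtral", "label": "TECH"},
    {"text": "Mistral AI", "label": "ORG"},
])
print(reg.frequency["Mixtral"], reg.cooccurrence[("Mistral AI", "Mixtral")])
```

The co-occurrence counts feed directly into the `entity_relationships` field of the shared state and the Analyzer Agent's theme extraction.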
Phase 3: LangGraph Orchestration & LLM Configuration (Weeks 5-6)
Task | Description | Deliverable
3.1 | Create the LangGraph StateGraph skeleton with all agent nodes | Connected graph with conditional edges
3.2 | Implement state transitions and routing logic | Working orchestration flow
3.3 | Configure and optimize LLM parameters (temperature, max tokens, system prompts) | Documented LLM configuration per agent
3.4 | Compare LLM performance across different models and parameter settings | LLM benchmark comparison table
3.5 | Store findings in ChromaDB with embeddings | Searchable vector store
Course Alignment: Week 5 (LangGraph) -- core orchestration. Week 6 (LLMs) -- model selection and parameter optimization.
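The routing logic in tasks 3.1-3.2 boils down to: each node is a function from state to state, plain edges chain nodes, and the critic edge is conditional on the state. A dependency-free sketch of that idea (node bodies are stubs; in the real system these are LangGraph `StateGraph` nodes and `add_conditional_edges` routes):

```python
# Dependency-free stand-in for LangGraph StateGraph routing (tasks 3.1-3.2).
def research(state):
    state["status"] = "researched"
    return state

def write(state):
    state["draft"] = f"Draft on {state['topic']}"
    return state

def critic(state):
    # Stubbed review: approve once one revision has been made (illustrative rule)
    state["critic_score"] = 9 if state["revision_count"] >= 1 else 5
    return state

def route_after_critic(state) -> str:
    # Conditional edge: REVISE loops back to the writer, capped at 3 revisions
    if state["critic_score"] >= 8 or state["revision_count"] >= 3:
        return "END"
    state["revision_count"] += 1
    return "write"

NODES = {"research": research, "write": write, "critic": critic}
EDGES = {"research": "write", "write": "critic"}   # unconditional transitions

def run(state, entry="research"):
    node = entry
    while node != "END":
        state = NODES[node](state)
        node = route_after_critic(state) if node == "critic" else EDGES[node]
    return state

out = run({"topic": "MoE efficiency", "revision_count": 0})
print(out["critic_score"], out["revision_count"])
```

Swapping the dicts for a real `StateGraph` keeps the same shape while adding persistence and streaming of intermediate state.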
Phase 4: MoE Analysis & Research Agent (Week 7, then Week 9)
Task | Description | Deliverable
4.1 | Analyze MoE architectures: study Mixtral, Switch Transformer, DeepSeek-MoE | Written MoE analysis comparing dense vs. sparse models
4.2 | Optionally integrate an MoE model (Mixtral) for cost-efficient inference | MoE model integration or comparative benchmark
4.3 | Implement the Research Agent with the ReAct pattern | Agent that reasons about search strategy and acts on it
4.4 | Integrate Tavily/Brave Search APIs as tools | Working search with result parsing
4.5 | Build web scraper for detailed content extraction | Full-text extraction from URLs
4.6 | Build ReAct loop: agent decides to search more or stop | Autonomous iterative search behavior
Course Alignment: Week 7 (Mixture of Experts) -- analyze and optionally use MoE models. Week 9 (AI Agents and ReAct) -- implement the core research agent.
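The ReAct loop in task 4.6 alternates Thought, Action, and Observation until the agent decides it has gathered enough sources. A sketch with a scripted decision function standing in for the LLM (the stopping rule, step cap, and stub search tool are assumptions):

```python
def search_tool(query: str) -> list[str]:
    # Stub for the Tavily/Brave search tool; returns fake result titles
    return [f"{query} result {i}" for i in range(3)]

def decide(thoughts: list[str], sources: list[str]) -> str:
    # Stand-in for the LLM's reasoning step: stop once 6 sources are gathered
    return "STOP" if len(sources) >= 6 else "SEARCH"

def react_research(topic: str, max_steps: int = 5) -> list[str]:
    sources, thoughts = [], []
    for step in range(max_steps):
        action = decide(thoughts, sources)                   # Thought -> Action
        thoughts.append(f"step {step}: {action}")
        if action == "STOP":
            break
        observation = search_tool(f"{topic} aspect {step}")  # Act on the world
        sources.extend(observation)                          # Observation feeds next thought
    return sources

print(len(react_research("MoE efficiency")))
```

In the real agent, `decide` is an LLM call that sees the thought history and source summaries; the `max_steps` cap guards against the loop never reaching STOP.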
Phase 5: Multi-Agent Integration & MCP (Weeks 10-11)
Task | Description | Deliverable
5.1 | Connect all agents into a working multi-agent pipeline | End-to-end workflow: Research, Classification, NER, Analysis, Writing, Review
5.2 | Implement the Analyzer Agent with theme extraction and outline generation | Organized findings with entity-enriched themes
5.3 | Implement the Writer Agent with citation system and entity integration | Complete draft generation with inline citations
5.4 | Implement the Critic Agent with scoring rubric and APPROVE/REVISE logic | Working review loop with max 3 revision cycles
5.5 | Expose agent tools via Model Context Protocol (MCP) servers | MCP-compliant tool definitions for search, scrape, classify, NER
5.6 | Build MCP client integration so agents consume tools via MCP | Standardized tool invocation across agents
Course Alignment: Week 10 (Multi-Agent Systems) -- full system integration. Week 11 (MCP) -- standardized tool protocol.
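MCP advertises each tool to clients with a name, a description, and a JSON Schema for its inputs. A dependency-free sketch of how the project's search tool from task 5.5 could be declared and dispatched (the declaration mirrors the shape of an MCP tools/list entry; the handler and validation are stubs, not an MCP server implementation):

```python
import json

# MCP-style tool declaration: name, description, and a JSON Schema for inputs
SEARCH_TOOL = {
    "name": "web_search",
    "description": "Search the web and return ranked results.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "max_results": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}

def call_tool(tool: dict, arguments: dict) -> dict:
    # Minimal check of required arguments against the declared schema
    for field in tool["inputSchema"]["required"]:
        if field not in arguments:
            raise ValueError(f"missing required argument: {field}")
    # Stubbed handler; a real MCP server would run the Tavily/Brave search here
    return {"content": [{"type": "text",
                         "text": f"results for {arguments['query']!r}"}]}

print(json.dumps(call_tool(SEARCH_TOOL, {"query": "mixture of experts"})))
```

Because every tool carries its own schema, the MCP client built in task 5.6 can validate and route calls for search, scrape, classify, and NER through one uniform interface.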
Phase 6: Browser Agent (LAM) & Diffusion Models (Weeks 12-13)
Task | Description | Deliverable
6.1 | Build the Browser Agent using Playwright for web automation | Agent navigates pages, fills forms, extracts structured data
6.2 | Implement academic database interaction (Google Scholar, Semantic Scholar) | Automated paper search and metadata extraction
6.3 | Build the Illustration Agent with diffusion model integration | Agent generates conceptual diagrams and figures
6.4 | Implement image prompt optimization for academic illustrations | High-quality technical illustrations for reports
6.5 | Integrate illustrations into the Writer Agent output pipeline | Reports with embedded generated images
6.6 | Build Streamlit UI with real-time progress indicators | Working web interface showing agent activity
6.7 | Add agent reasoning display in expandable panels | Transparent decision log in UI
Course Alignment: Week 12 (Large Action Models) -- browser automation agent. Week 13 (Diffusion Models) -- illustration generation.
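For task 6.4, prompt optimization can start as simply as wrapping the Writer Agent's figure request in a consistent style template before it reaches the diffusion API. A sketch (the style wording is an assumption teams should tune against their chosen model):

```python
# Fixed style suffix steering the diffusion model toward report-grade figures
STYLE_SUFFIX = (
    "clean technical diagram, flat vector style, labeled components, "
    "white background, no photorealism, suitable for an academic report"
)

def build_image_prompt(section_title: str, concept: str) -> str:
    # Combine the section context and the concept with the fixed academic style
    return f"Diagram for '{section_title}': {concept}. {STYLE_SUFFIX}"

prompt = build_image_prompt(
    "5.1 Computational Efficiency",
    "sparse expert activation vs dense feed-forward compute",
)
print(prompt)
```

The resulting string becomes the `prompt` field in the `illustrations` entries of the shared state, so the Critic Agent can review figure intent alongside the draft.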
Phase 7: Testing, Documentation & Presentation (Week 14)
Task | Description | Deliverable
7.1 | End-to-end testing on 5 diverse research topics | 5 complete research reports with illustrations
7.2 | Performance benchmarking (time, token cost, MoE vs. dense comparison) | Metrics per report including model comparison
7.3 | Evaluate NER accuracy, classification precision, and illustration quality | Component-level evaluation metrics
7.4 | Write README, technical documentation, and transformer analysis | Setup guide, architecture diagram, model analysis
7.5 | Prepare presentation slides and demo video | Slides + 5-min video walkthrough
8. Timeline Summary
Week | Phase | Key Deliverable
1-2 | Transformer Foundations & Text Classification | Transformer analysis, project setup, working classifier
3-4 | NER & LangChain Setup | NER pipeline, LangChain components, state schema
5-6 | LangGraph & LLM Configuration | StateGraph skeleton, LLM benchmarks, vector store
7 | MoE Analysis | MoE architecture analysis, optional Mixtral integration
8 | Midterm Exam | --
9 | Research Agent (ReAct) | Working ReAct agent with iterative search
10-11 | Multi-Agent Integration & MCP | Full pipeline, MCP tool servers, review loop
12-13 | Browser Agent (LAM) & Diffusion | Browser automation, illustrations, Streamlit UI
14 | Testing & Presentation | 5 test reports, documentation, demo video
9. Competition-Based Evaluation

This project follows a Competition-Based Learning (CBL) approach. All teams will build their own version of the Multi-Agent Research Assistant, then compete head-to-head in a live evaluation event. This fosters deeper engagement, peer learning, and real-world benchmarking skills.

9.1 Competition Format
Stage | When | Description
Checkpoint 1 | Week 7 | Teams demo their NLP pipeline (classification + NER + research agent). Instructor feedback only, no scoring.
Checkpoint 2 | Week 11 | Teams demo the full multi-agent pipeline with MCP integration. Peer feedback round.
Competition Day | Week 15 | Live head-to-head competition. All teams receive the SAME 3 unseen research topics and run their systems in real time.
9.2 Competition Day Protocol
  1. The instructor reveals 3 research topics (unseen by all teams) at the start of the session.
  2. All teams run their systems simultaneously on the same topics.
  3. Each system produces 3 research reports. Time limit: 30 minutes per topic.
  4. Reports are anonymized and distributed to a judging panel (instructor + 2 external judges or peer teams).
  5. Teams give a 10-minute live presentation of their architecture and demo.
  6. Judges score each report and presentation independently using the rubric below.
  7. Scores are averaged across judges and topics.
9.3 Scoring Rubric (100 points per report)
Criterion | Points | Description
Report Quality | 20 | Coherence, depth, clarity, readability, proper structure (intro, body, conclusion)
Source Coverage | 15 | Number and diversity of relevant sources found (minimum 10), proper citations
Entity Extraction | 10 | Accuracy and completeness of NER output (people, organizations, technologies, dates)
Source Classification | 10 | Correct categorization of sources by type, domain, and relevance tier
Illustrations | 10 | Relevance and quality of AI-generated figures, diagrams, or visualizations
Speed | 10 | Time to complete the full pipeline. Faster = higher score (within the 30-min limit)
Cost Efficiency | 5 | Total API token cost per report. Lower cost = higher score
Error Handling | 5 | System handles failures gracefully (API errors, bad sources, timeouts)
Innovation | 15 | Creative features: MoE routing, advanced MCP tools, LAM web actions, multi-model strategies
9.4 Final Grade Breakdown
Component | Weight | Description
Competition Score | 40% | Average score across 3 topics from the judging panel (rubric above)
Architecture & Topic Coverage | 20% | Demonstrates all 12 course topics integrated into the system (transformers, NER, MoE, MCP, LAMs, diffusion, etc.)
Code Quality & Documentation | 15% | Clean code, GitHub repo, README, technical report with architecture diagrams
Live Presentation | 15% | 10-minute demo + Q&A. Clarity, technical depth, team coordination
Peer Review | 10% | Each team reviews 2 other teams' reports and provides structured feedback. Quality of review is graded.
9.5 Competition Prizes & Ranking
Rank | Award | Bonus
1st Place | Gold Award | +5 bonus points on final grade
2nd Place | Silver Award | +3 bonus points on final grade
3rd Place | Bronze Award | +2 bonus points on final grade
Best Innovation | Innovation Award | +2 bonus points (can stack with placement)
Best Report | Quality Award | +2 bonus points (can stack with placement)
Fastest System | Speed Award | +1 bonus point
Academic Integrity: All code must be original team work. Teams may use open-source libraries and LLM APIs but must not copy other teams' agent logic or prompts. Plagiarism results in disqualification and a failing grade.
10. Expected Output Example

Input: "The impact of Mixture of Experts architecture on Large Language Model efficiency"

# Research Report: The Impact of MoE on LLM Efficiency

## Executive Summary
This report examines how Mixture of Experts (MoE) architectures have transformed large language model efficiency by enabling sparse activation, reducing computational costs while maintaining or improving performance...

## Named Entities Identified
- Organizations: Google, Mistral AI, DeepSeek
- People: Noam Shazeer, William Fedus, Albert Jiang
- Technologies: Switch Transformer, Mixtral 8x7B, DeepSeek-MoE, GShard
- Dates: 2017, 2021, 2024

## Source Classification Summary
- Technical Papers: 8 sources (high relevance)
- Blog Posts / Tutorials: 4 sources (medium relevance)
- News Articles: 2 sources (low relevance, filtered)

## 1. Introduction
The rapid scaling of large language models has created significant computational challenges...

## 2. Transformer Architecture Background
### 2.1 Self-Attention Mechanism
### 2.2 Feed-Forward Networks as Expert Candidates

## 3. MoE Architecture Fundamentals
### 3.1 Gating Mechanisms
### 3.2 Expert Networks
### 3.3 Top-K Selection

## 4. Key MoE Models
### 4.1 Switch Transformer (Google, 2021)
### 4.2 Mixtral 8x7B (Mistral, 2024)
### 4.3 DeepSeek-MoE

## 5. Performance Analysis
### 5.1 Computational Efficiency
### 5.2 Quality Benchmarks
[Figure 1: MoE vs Dense Model Efficiency Comparison -- generated illustration]

## 6. Challenges and Limitations

## 7. Conclusion

## References
[1] Fedus et al., "Switch Transformers", 2021. https://...
[2] Mistral AI, "Mixtral of Experts", 2024. https://...
[3] Vaswani et al., "Attention Is All You Need", 2017. https://...
...
11. Submission Requirements
Item | Details
Source Code | GitHub repository with clean commit history and README
Transformer Analysis | Written document explaining the transformer architecture of the chosen LLM backbone
MoE Comparison | Analysis report comparing MoE vs. dense model architectures with benchmarks
Documentation | Technical report: architecture, design decisions, NER/classification evaluation, challenges, results
Demo Video | 5-minute screen recording demonstrating the full pipeline, including NER, classification, browser actions, and illustrations
Sample Reports | 3 generated research reports on different topics with entity summaries, classification reports, and illustrations
Presentation | 15-minute live presentation with Q&A covering all course topics demonstrated in the project
Deadline | Week 14 of the semester
12. Useful References