Multi-Agent Research Assistant
Automated Research, Writing & Review System
1. Project Overview
Build a multi-agent system where specialized AI agents collaborate to produce high-quality research reports on any given topic. The system takes a research question as input, and through a coordinated workflow of autonomous agents, delivers a comprehensive, reviewed, and cited research report enriched with named entity extraction, source classification, web-based data collection via browser automation, and AI-generated illustrations.
This project integrates all major topics covered throughout the course: transformer architecture understanding, text classification, named entity recognition, LangChain fundamentals, LangGraph orchestration, large language model configuration, Mixture of Experts analysis, AI agent design with ReAct, multi-agent collaboration, Model Context Protocol for tool integration, Large Action Model capabilities for browser automation, and diffusion models for report illustration generation.
| Item | Details |
|------|---------|
| Type | Multi-Agent System with LangGraph Orchestration |
| Team Size | 2-3 Students |
| Difficulty | Advanced |
| Topics Covered | Week 1 (Transformers), Week 2 (Text Classification), Week 3 (NER), Week 4 (LangChain), Week 5 (LangGraph), Week 6 (LLMs), Week 7 (MoE), Week 9 (ReAct Agents), Week 10 (Multi-Agent), Week 11 (MCP), Week 12 (LAMs), Week 13 (Diffusion Models) |
| Deliverables | Working application, source code, documentation, transformer analysis report, presentation, demo video |
2. Problem Statement
Conducting research is a time-consuming process that involves multiple stages: searching for relevant sources, reading and extracting key information, classifying sources by relevance and topic, extracting named entities, organizing findings into a coherent structure, writing a clear report with illustrations, and reviewing it for accuracy and completeness. Each stage requires different skills and focus areas.
This project applies the multi-agent paradigm to automate this pipeline, where each agent specializes in one stage, and a central orchestrator manages the workflow — mirroring how a real research team operates. The system leverages transformer-based language models as the backbone, employs text classification and NER techniques for intelligent information processing, uses the ReAct pattern for autonomous research, exposes tools via the Model Context Protocol, performs browser-based actions through Large Action Model capabilities, and generates visual content using diffusion models.
3. System Architecture
```
              User Input (Research Topic)
                           |
                           v
              +-------------------------+
              |       Orchestrator      |   LangGraph StateGraph - manages state & routing
              +-------------------------+
                           |
    +----------+-----------+-----------+-----------+
    |          |           |           |           |
    v          v           v           v           v
+--------+ +--------+ +----------+ +--------+ +----------+
|Research| |Classif.| |   NER    | |Browser | | Analyzer |
| Agent  | | Agent  | |  Agent   | | Agent  | |  Agent   |
|ReAct + | |Text    | |Entity    | |LAM for | |Theme     |
|Search  | |Classif.| |Extract   | |Web     | |Extract   |
+--------+ +--------+ +----------+ +--------+ +----------+
    |          |           |           |           |
    +----------+-----------+-----------+-----------+
                           |
                           v
                   +---------------+
                   | Writer Agent  |   Drafts report with citations
                   +---------------+
                           |
                           v
                 +------------------+
                 |Illustration Agent|   Diffusion models for figures
                 +------------------+
                           |
                           v
                   +---------------+
                   | Critic Agent  |   Reviews accuracy, gaps, suggestions
                   +---------------+
                           |
                     +-----+-----+
                     |           |
                  APPROVE     REVISE -----> back to Writer Agent
                     |
                     v
             +---------------+
             | Final Report  |   Markdown / PDF with illustrations
             +---------------+
```
4. Technology Stack
| Component | Technology |
|-----------|------------|
| Transformer Backbone | Transformer-based LLMs (GPT-4, Claude, Mixtral); students must document the architecture |
| Agent Orchestration | LangGraph (stateful graph with conditional edges) |
| Chain Framework | LangChain (chains, prompts, output parsers, document loaders) |
| Large Language Model | OpenAI GPT-4 or Claude API (or Ollama for local); optional MoE model (Mixtral) |
| Text Classification | LLM-based classifier or fine-tuned transformer (e.g., BERT) for source categorization |
| Named Entity Recognition | SpaCy NER pipeline or LLM-based entity extraction |
| Web Search | Tavily Search API / Brave Search API |
| Web Scraping | BeautifulSoup / Trafilatura |
| Browser Automation (LAM) | Playwright / Selenium for web actions (form filling, navigation, structured extraction) |
| Tool Protocol | Model Context Protocol (MCP) for standardized agent-tool integration |
| Image Generation | Stable Diffusion API / DALL-E for report illustrations and diagrams |
| Vector Store | ChromaDB (for storing and retrieving research findings) |
| Output Format | Markdown with embedded images, exported to PDF (via markdown2 + weasyprint) |
| User Interface | Streamlit or Gradio |
| Programming Language | Python 3.12+ |
5. Detailed Agent Descriptions
5.1 Orchestrator Agent (Manager)
- Role: Central coordinator that manages the entire workflow
- Receives the research topic from the user and defines research scope
- Routes tasks to appropriate agents based on current state
- Manages shared state (findings, drafts, feedback, entities, classifications)
- Decides when to loop back (revision) or finalize
- Implements human-in-the-loop checkpoints
- Implementation: LangGraph StateGraph with conditional edges
- Course Topic: Week 5 (LangGraph), Week 10 (Multi-Agent Systems)
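As a starting point, the Orchestrator's APPROVE/REVISE routing could look like the sketch below. It is plain Python (no LangGraph dependency) so the decision logic is visible and testable; in the real system this function would be registered on the graph as a conditional edge. The function name `route_after_critic` is illustrative; the threshold (score >= 7) and revision cap (3 cycles) come from this spec.

```python
# Hypothetical routing function for the Orchestrator's review loop.
# In LangGraph this would be attached via add_conditional_edges; here
# it is plain Python so the logic can be read and tested directly.

def route_after_critic(state: dict) -> str:
    """Decide the next node after the Critic Agent runs.

    APPROVE (score >= 7) or an exhausted revision budget routes to
    finalization; otherwise loop back to the Writer with feedback.
    """
    MAX_REVISIONS = 3  # revision cap from the project spec
    if state.get("critic_score", 0) >= 7:
        return "finalize"
    if state.get("revision_count", 0) >= MAX_REVISIONS:
        return "finalize"  # stop revising, ship the best draft so far
    return "writer"        # REVISE: loop back to the Writer Agent
```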
5.2 Research Agent (Information Gatherer)
- Role: Searches the internet and collects relevant information using the ReAct pattern
- Generates effective search queries from the research topic
- Searches multiple sources using Tavily/Brave APIs
- Scrapes relevant web pages for detailed content
- Filters and ranks results by relevance (LLM scoring 1-10)
- Stores findings with source URLs for proper citation
- Exposes search and scrape tools via MCP server
- Pattern: ReAct (Reasoning + Acting) -- Week 9
- Tools: Web Search (MCP), Web Scraper (MCP), URL Reader
- Course Topic: Week 9 (AI Agents and ReAct), Week 11 (MCP)
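The ReAct loop above can be sketched in miniature as follows. The search tool and the "reasoner" are stubs (`fake_search`, a fixed query strategy) standing in for the Tavily/Brave MCP tool and the LLM; only the Thought-Action-Observation structure and the relevance filter (score >= 7, per the 1-10 scoring above) are meant literally.

```python
# Minimal ReAct-style loop with a stubbed search tool. Names
# (run_research, fake_search) are illustrative, not from any library.

def fake_search(query: str) -> list[dict]:
    # Stand-in for the Tavily/Brave search tool exposed over MCP.
    return [{"url": f"https://example.com/{query}", "title": query, "score": 8}]

def run_research(topic: str, max_steps: int = 3) -> list[dict]:
    findings: list[dict] = []
    for step in range(max_steps):
        # Thought: in the real agent the LLM proposes the next query.
        query = f"{topic} aspect {step + 1}"
        # Action: call the search tool.
        results = fake_search(query)
        # Observation: keep only sufficiently relevant results.
        findings.extend(r for r in results if r["score"] >= 7)
        # Thought: the LLM would decide whether to search more or stop;
        # here we stop once enough findings are collected.
        if len(findings) >= 3:
            break
    return findings
```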
5.3 Classification Agent (Source Categorizer)
- Role: Classifies collected sources by relevance and topic category
- Applies text classification to each collected source
- Assigns topic labels (e.g., "technical", "review", "tutorial", "opinion", "dataset")
- Assigns relevance scores (high, medium, low) using classification models
- Filters out low-relevance or off-topic sources before analysis
- Generates a classification summary report
- Pattern: LLM-based text classification or fine-tuned BERT classifier
- Course Topic: Week 2 (Text Classification)
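A minimal shape for the classifier's output is sketched below. The keyword heuristic is only a dependency-free placeholder for the real LLM-based (or fine-tuned BERT) classifier; the label set and relevance tiers follow this spec, while the keyword lists are invented for illustration.

```python
# Toy source classifier: a keyword heuristic standing in for an
# LLM or BERT classifier. Only the output schema is meant literally.

CATEGORIES = {
    "technical": ("architecture", "benchmark", "equation"),
    "tutorial": ("how to", "step by step", "guide"),
    "opinion": ("i think", "in my view", "opinion"),
}

def classify_source(text: str) -> dict:
    lowered = text.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in lowered for k in keywords):
            # Crude relevance tier: more keyword hits -> higher tier.
            hits = sum(k in lowered for k in keywords)
            relevance = "high" if hits > 1 else "medium"
            return {"category": category, "relevance": relevance}
    return {"category": "other", "relevance": "low"}
```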
5.4 NER Agent (Entity Extractor)
- Role: Extracts named entities from all collected research sources
- Identifies key people (researchers, authors), organizations, technologies, dates, and locations
- Builds an entity registry linking entities to their source documents
- Detects entity co-occurrences and relationships
- Provides entity frequency analysis to highlight dominant themes
- Outputs a structured entity report used by the Analyzer and Writer agents
- Implementation: SpaCy NER pipeline or LLM-based extraction with structured output
- Course Topic: Week 3 (Named Entity Recognition)
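The entity registry and frequency analysis might look like the sketch below. A real implementation would get entities from spaCy (`doc.ents`) or an LLM with structured output; the tiny gazetteer here only keeps the example dependency-free, and the names in it are examples from this document.

```python
# Entity registry sketch. The gazetteer is a stand-in for spaCy or
# LLM-based extraction; the record schema matches the state schema.

from collections import Counter

GAZETTEER = {
    "Google": "ORG",
    "Mistral AI": "ORG",
    "Noam Shazeer": "PERSON",
    "Mixtral": "TECH",
}

def extract_entities(text: str, source_url: str) -> list[dict]:
    # Substring matching only; a real NER pipeline handles spans properly.
    return [
        {"text": name, "label": label, "source_url": source_url}
        for name, label in GAZETTEER.items()
        if name in text
    ]

def entity_frequencies(docs: list[tuple[str, str]]) -> Counter:
    """docs is a list of (text, url); counts entity mentions across sources."""
    counts: Counter = Counter()
    for text, url in docs:
        counts.update(e["text"] for e in extract_entities(text, url))
    return counts
```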
5.5 Browser Agent (Web Action Performer)
- Role: Performs complex web interactions that go beyond simple API calls
- Navigates dynamic web pages that require JavaScript rendering
- Fills search forms on academic databases (Google Scholar, Semantic Scholar)
- Extracts structured data from tables and interactive pages
- Downloads PDFs and datasets from research repositories
- Handles authentication flows for gated content when credentials are provided
- Implementation: Playwright-based browser automation with LLM planning
- Course Topic: Week 12 (Large Action Models)
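One way to structure the LAM side is to have the LLM planner emit a declarative action plan that a Playwright executor then runs (`page.goto`, `page.fill`, `page.click`). The sketch below only validates the plan shape and does not drive a real browser; the action vocabulary, selectors, and the `validate_plan` helper are assumptions for illustration.

```python
# LAM-style action plan: a list of browser actions the planner emits
# and a Playwright-based executor would run. Validation only; no
# browser is launched here. Selectors below are illustrative.

ALLOWED_ACTIONS = {"goto", "fill", "click", "extract"}

def validate_plan(plan: list[dict]) -> bool:
    """A plan must start with navigation and use only known actions."""
    if not plan or plan[0]["action"] != "goto":
        return False
    return all(step["action"] in ALLOWED_ACTIONS for step in plan)

scholar_plan = [
    {"action": "goto", "url": "https://scholar.google.com"},
    {"action": "fill", "selector": "input[name=q]", "value": "mixture of experts"},
    {"action": "click", "selector": "button[type=submit]"},
    {"action": "extract", "selector": ".result-item"},
]
```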
5.6 Analyzer Agent (Information Processor)
- Role: Processes raw research data into organized findings
- Receives classified sources and extracted entities as additional input
- Identifies 4-6 key themes and patterns from collected sources
- Groups findings by topic and subtopic, leveraging classification labels
- Detects contradictions between sources
- Ranks findings by importance and relevance
- Creates a structured outline for the report, incorporating entity relationships
- Pattern: Chain-of-Thought reasoning
- Course Topic: Week 4 (LangChain chains and prompts)
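The grouping step can be sketched as below: organize classified sources into themes using the Classification Agent's labels and drop the low-relevance tier, as described above. Theme naming and ranking would normally come from the LLM; `group_findings` is an illustrative name.

```python
# Grouping sketch for the Analyzer Agent: bucket sources by their
# classification label and discard the filtered low-relevance tier.

from collections import defaultdict

def group_findings(sources: list[dict]) -> dict[str, list[dict]]:
    themes: dict[str, list[dict]] = defaultdict(list)
    for src in sources:
        if src["relevance"] == "low":
            continue  # already filtered by the Classification Agent
        themes[src["category"]].append(src)
    return dict(themes)
```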
5.7 Writer Agent (Content Creator)
- Role: Transforms organized findings into a polished report
- Follows the outline from the Analyzer Agent
- Writes clear, academic-style prose with proper citations
- Creates introduction, body sections, and conclusion
- Generates an executive summary
- Integrates extracted entities naturally into the narrative
- Generates prompts for the Illustration Agent to produce relevant figures
- Formats output in clean Markdown with image placeholders
- Pattern: Structured output with citation formatting
- Course Topic: Week 6 (LLM configuration and optimization)
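The citation system could number sources in order of first use and emit a matching References section, along these lines (`build_references` is an illustrative helper name, not a library function):

```python
# Citation sketch: emit a Markdown References block with [n] markers
# matching the order in which sources were first cited in the draft.

def build_references(cited_urls: list[str], registry: dict[str, str]) -> str:
    """cited_urls: URLs in order of first citation.
    registry: url -> title. Returns a Markdown References block."""
    lines = ["## References"]
    for i, url in enumerate(cited_urls, start=1):
        lines.append(f"[{i}] {registry[url]}. {url}")
    return "\n".join(lines)
```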
5.8 Illustration Agent (Visual Content Generator)
- Role: Generates illustrations, diagrams, and figures for the research report
- Receives image generation prompts from the Writer Agent
- Creates conceptual diagrams, architecture illustrations, and data visualizations
- Uses diffusion models (Stable Diffusion API or DALL-E) to generate images
- Optimizes prompts for technical/academic illustration style
- Embeds generated images into the final report at appropriate locations
- Implementation: Diffusion model API integration with prompt engineering
- Course Topic: Week 13 (Diffusion Models)
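Prompt optimization for academic figures can be as simple as appending a fixed style suffix to the Writer Agent's raw figure request before the diffusion API call. The suffix below is an assumption to be tuned per model (Stable Diffusion vs. DALL-E respond differently to style cues):

```python
# Prompt-styling sketch for the Illustration Agent. The style suffix
# is an assumed starting point, not a recommended final value.

STYLE_SUFFIX = (
    "clean technical diagram, flat vector style, white background, "
    "labeled components, no photorealism"
)

def optimize_figure_prompt(raw_prompt: str) -> str:
    return f"{raw_prompt.strip()}, {STYLE_SUFFIX}"
```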
5.9 Critic Agent (Quality Reviewer)
- Role: Reviews the draft and provides actionable feedback
- Checks factual accuracy against original sources
- Identifies gaps in coverage and weak arguments
- Verifies that NER entities and classification labels are used correctly
- Evaluates coherence, flow, illustration relevance, and citation completeness
- Scores the report (1-10) on: accuracy, completeness, clarity, citations, visual quality
- Decision: APPROVE (score >= 7) or REVISE (with specific feedback)
- Pattern: Reflection pattern
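The scoring step can be reduced to averaging the five rubric criteria and applying the threshold above (`review` is an illustrative name; the criteria and the >= 7 cutoff come from this spec):

```python
# Rubric sketch for the Critic Agent: average the five criterion
# scores (each 1-10) and apply the APPROVE/REVISE threshold.

RUBRIC = ("accuracy", "completeness", "clarity", "citations", "visual_quality")

def review(scores: dict[str, int]) -> tuple[int, str]:
    overall = round(sum(scores[c] for c in RUBRIC) / len(RUBRIC))
    return overall, ("APPROVE" if overall >= 7 else "REVISE")
```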
6. Shared State Schema
```python
from typing import TypedDict, List, Optional

class ResearchState(TypedDict):
    topic: str                        # User's research topic
    sub_questions: List[str]          # Generated sub-questions
    raw_sources: List[dict]           # {url, title, content, score}
    classified_sources: List[dict]    # {url, title, category, relevance}
    entities: List[dict]              # {text, label, source_url, frequency}
    entity_relationships: List[dict]  # {entity1, entity2, relation, context}
    browser_results: List[dict]       # {url, action, extracted_data}
    organized_findings: List[dict]    # {theme, findings, sources, entities}
    outline: List[dict]               # {section_title, key_points, image_prompts}
    draft: str                        # Markdown draft
    illustrations: List[dict]         # {section, prompt, image_path}
    critic_feedback: Optional[str]    # Feedback from Critic
    critic_score: Optional[int]       # Score 1-10
    revision_count: int               # Number of revisions done
    final_report: Optional[str]       # Approved final report with images
    transformer_config: dict          # LLM model details and parameters
    moe_analysis: Optional[dict]      # MoE model comparison results
    status: str                       # Current pipeline stage
```
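Each agent node reads this shared state and returns only the keys it updates; LangGraph merges the partial dict back into the state. A node function can therefore be sketched without the framework (the stubbed labels below are placeholders for the real classifier output):

```python
# Node-function sketch: LangGraph nodes take the current state and
# return a partial update dict; the framework performs the merge.

def classification_node(state: dict) -> dict:
    classified = [
        {**src, "category": "technical", "relevance": "high"}  # stubbed labels
        for src in state["raw_sources"]
    ]
    return {"classified_sources": classified, "status": "classified"}
```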
7. Project Phases & Milestones
Phase 1: Transformer Foundations & Text Classification (Weeks 1-2)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 1.1 | Study and document the transformer architecture used by the chosen LLM | Written analysis of attention mechanism, encoder/decoder structure |
| 1.2 | Set up Python project structure and repository | Project repo with requirements.txt |
| 1.3 | Configure API keys (OpenAI/Anthropic, Tavily) and LLM parameters | .env file with credentials, model config documented |
| 1.4 | Install LangChain, LangGraph, ChromaDB, SpaCy | Working development environment |
| 1.5 | Build the Classification Agent: implement text classification for source categorization | Working classifier that labels sources by topic and relevance |
| 1.6 | Test classification on sample documents from different domains | Classification accuracy report |
Course Alignment: Week 1 (Transformer Architecture) -- understand and document the model backbone. Week 2 (Text Classification) -- build the source classification pipeline.
Phase 2: NER & LangChain Setup (Weeks 3-4)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 2.1 | Build the NER Agent using SpaCy or LLM-based extraction | Working NER pipeline extracting people, orgs, tech, dates |
| 2.2 | Implement entity registry and frequency analysis | Structured entity database with co-occurrence tracking |
| 2.3 | Set up LangChain fundamentals: chains, prompt templates, output parsers | Reusable chain components for all agents |
| 2.4 | Build document loaders for various source formats (HTML, PDF, text) | Multi-format document ingestion pipeline |
| 2.5 | Design the shared state schema (TypedDict) | Complete state schema with all fields defined |
Course Alignment: Week 3 (NER) -- build entity extraction. Week 4 (LangChain Fundamentals) -- establish the framework foundation.
Phase 3: LangGraph Orchestration & LLM Configuration (Weeks 5-6)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 3.1 | Create the LangGraph StateGraph skeleton with all agent nodes | Connected graph with conditional edges |
| 3.2 | Implement state transitions and routing logic | Working orchestration flow |
| 3.3 | Configure and optimize LLM parameters (temperature, max tokens, system prompts) | Documented LLM configuration per agent |
| 3.4 | Compare LLM performance across different models and parameter settings | LLM benchmark comparison table |
| 3.5 | Store findings in ChromaDB with embeddings | Searchable vector store |
Course Alignment: Week 5 (LangGraph) -- core orchestration. Week 6 (LLMs) -- model selection and parameter optimization.
Phase 4: MoE Analysis & Research Agent (Week 7, then Week 9)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 4.1 | Analyze MoE architectures: study Mixtral, Switch Transformer, DeepSeek-MoE | Written MoE analysis comparing dense vs. sparse models |
| 4.2 | Optionally integrate an MoE model (Mixtral) for cost-efficient inference | MoE model integration or comparative benchmark |
| 4.3 | Implement the Research Agent with ReAct pattern | Agent that reasons about search strategy and acts on it |
| 4.4 | Integrate Tavily/Brave Search APIs as tools | Working search with result parsing |
| 4.5 | Build web scraper for detailed content extraction | Full text extraction from URLs |
| 4.6 | Build ReAct loop: agent decides to search more or stop | Autonomous iterative search behavior |
Course Alignment: Week 7 (Mixture of Experts) -- analyze and optionally use MoE models. Week 9 (AI Agents and ReAct) -- implement the core research agent.
Phase 5: Multi-Agent Integration & MCP (Weeks 10-11)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 5.1 | Connect all agents into a working multi-agent pipeline | End-to-end workflow: Research, Classification, NER, Analysis, Writing, Review |
| 5.2 | Implement the Analyzer Agent with theme extraction and outline generation | Organized findings with entity-enriched themes |
| 5.3 | Implement the Writer Agent with citation system and entity integration | Complete draft generation with inline citations |
| 5.4 | Implement the Critic Agent with scoring rubric and APPROVE/REVISE logic | Working review loop with max 3 revision cycles |
| 5.5 | Expose agent tools via Model Context Protocol (MCP) servers | MCP-compliant tool definitions for search, scrape, classify, NER |
| 5.6 | Build MCP client integration so agents consume tools via MCP | Standardized tool invocation across agents |
Course Alignment: Week 10 (Multi-Agent Systems) -- full system integration. Week 11 (MCP) -- standardized tool protocol.
Phase 6: Browser Agent (LAM) & Diffusion Models (Weeks 12-13)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 6.1 | Build the Browser Agent using Playwright for web automation | Agent navigates pages, fills forms, extracts structured data |
| 6.2 | Implement academic database interaction (Google Scholar, Semantic Scholar) | Automated paper search and metadata extraction |
| 6.3 | Build the Illustration Agent with diffusion model integration | Agent generates conceptual diagrams and figures |
| 6.4 | Implement image prompt optimization for academic illustrations | High-quality technical illustrations for reports |
| 6.5 | Integrate illustrations into the Writer Agent output pipeline | Reports with embedded generated images |
| 6.6 | Build Streamlit UI with real-time progress indicators | Working web interface showing agent activity |
| 6.7 | Add agent reasoning display in expandable panels | Transparent decision log in UI |
Course Alignment: Week 12 (Large Action Models) -- browser automation agent. Week 13 (Diffusion Models) -- illustration generation.
Phase 7: Testing, Documentation & Presentation (Week 14)
| Task | Description | Deliverable |
|------|-------------|-------------|
| 7.1 | End-to-end testing on 5 diverse research topics | 5 complete research reports with illustrations |
| 7.2 | Performance benchmarking (time, token cost, MoE vs dense comparison) | Metrics per report including model comparison |
| 7.3 | Evaluate NER accuracy, classification precision, and illustration quality | Component-level evaluation metrics |
| 7.4 | Write README, technical documentation, and transformer analysis | Setup guide, architecture diagram, model analysis |
| 7.5 | Prepare presentation slides and demo video | Slides + 5-min video walkthrough |
8. Timeline Summary
| Week | Phase | Key Deliverable |
|------|-------|-----------------|
| 1-2 | Transformer Foundations & Text Classification | Transformer analysis, project setup, working classifier |
| 3-4 | NER & LangChain Setup | NER pipeline, LangChain components, state schema |
| 5-6 | LangGraph & LLM Configuration | StateGraph skeleton, LLM benchmarks, vector store |
| 7 | MoE Analysis | MoE architecture analysis, optional Mixtral integration |
| 8 | Midterm Exam | -- |
| 9 | Research Agent (ReAct) | Working ReAct agent with iterative search |
| 10-11 | Multi-Agent Integration & MCP | Full pipeline, MCP tool servers, review loop |
| 12-13 | Browser Agent (LAM) & Diffusion | Browser automation, illustrations, Streamlit UI |
| 14 | Testing & Presentation | 5 test reports, documentation, demo video |
9. Competition-Based Evaluation
This project follows a Competition-Based Learning (CBL) approach. All teams will build their own version of the Multi-Agent Research Assistant, then compete head-to-head in a live evaluation event. This fosters deeper engagement, peer learning, and real-world benchmarking skills.
9.1 Competition Format
| Stage | When | Description |
|-------|------|-------------|
| Checkpoint 1 | Week 7 | Teams demo their NLP pipeline (classification + NER + research agent). Instructor feedback only, no scoring. |
| Checkpoint 2 | Week 11 | Teams demo full multi-agent pipeline with MCP integration. Peer feedback round. |
| Competition Day | Week 15 | Live head-to-head competition. All teams receive the SAME 3 unseen research topics and run their systems in real-time. |
9.2 Competition Day Protocol
- The instructor reveals 3 research topics (unseen by all teams) at the start of the session.
- All teams run their systems simultaneously on the same topics.
- Each system produces 3 research reports. Time limit: 30 minutes per topic.
- Reports are anonymized and distributed to a judging panel (instructor + 2 external judges or peer teams).
- Teams give a 10-minute live presentation of their architecture and demo.
- Judges score each report and presentation independently using the rubric below.
- Scores are averaged across judges and topics.
9.3 Scoring Rubric (100 points per report)
| Criteria | Points | Description |
|----------|--------|-------------|
| Report Quality | 20 | Coherence, depth, clarity, readability, proper structure (intro, body, conclusion) |
| Source Coverage | 15 | Number and diversity of relevant sources found (minimum 10), proper citations |
| Entity Extraction | 10 | Accuracy and completeness of NER output (people, organizations, technologies, dates) |
| Source Classification | 10 | Correct categorization of sources by type, domain, and relevance tier |
| Illustrations | 10 | Relevance and quality of AI-generated figures, diagrams, or visualizations |
| Speed | 10 | Time to complete the full pipeline. Faster = higher score (within the 30-min limit) |
| Cost Efficiency | 5 | Total API token cost per report. Lower cost = higher score |
| Error Handling | 5 | System handles failures gracefully (API errors, bad sources, timeouts) |
| Innovation | 15 | Creative features: MoE routing, advanced MCP tools, LAM web actions, multi-model strategies |
9.4 Final Grade Breakdown
| Component | Weight | Description |
|-----------|--------|-------------|
| Competition Score | 40% | Average score across 3 topics from judging panel (rubric above) |
| Architecture & Topic Coverage | 20% | Demonstrates all 12 course topics integrated into the system (transformers, NER, MoE, MCP, LAMs, diffusion, etc.) |
| Code Quality & Documentation | 15% | Clean code, GitHub repo, README, technical report with architecture diagrams |
| Live Presentation | 15% | 10-minute demo + Q&A. Clarity, technical depth, team coordination |
| Peer Review | 10% | Each team reviews 2 other teams' reports and provides structured feedback. Quality of review is graded. |
9.5 Competition Prizes & Ranking
| Rank | Award | Bonus |
|------|-------|-------|
| 1st Place | Gold Award | +5 bonus points on final grade |
| 2nd Place | Silver Award | +3 bonus points on final grade |
| 3rd Place | Bronze Award | +2 bonus points on final grade |
| Best Innovation | Innovation Award | +2 bonus points (can stack with placement) |
| Best Report | Quality Award | +2 bonus points (can stack with placement) |
| Fastest System | Speed Award | +1 bonus point |
Academic Integrity: All code must be the team's original work. Teams may use open-source libraries and LLM APIs but must not copy other teams' agent logic or prompts. Plagiarism results in disqualification and a failing grade.
10. Expected Output Example
Input: "The impact of Mixture of Experts architecture on Large Language Model efficiency"
# Research Report: The Impact of MoE on LLM Efficiency
## Executive Summary
This report examines how Mixture of Experts (MoE) architectures have
transformed large language model efficiency by enabling sparse activation,
reducing computational costs while maintaining or improving performance...
## Named Entities Identified
- Organizations: Google, Mistral AI, DeepSeek
- People: Noam Shazeer, William Fedus, Albert Jiang
- Technologies: Switch Transformer, Mixtral 8x7B, DeepSeek-MoE, GShard
- Dates: 2017, 2021, 2024
## Source Classification Summary
- Technical Papers: 8 sources (high relevance)
- Blog Posts / Tutorials: 4 sources (medium relevance)
- News Articles: 2 sources (low relevance, filtered)
## 1. Introduction
The rapid scaling of large language models has created significant
computational challenges...
## 2. Transformer Architecture Background
### 2.1 Self-Attention Mechanism
### 2.2 Feed-Forward Networks as Expert Candidates
## 3. MoE Architecture Fundamentals
### 3.1 Gating Mechanisms
### 3.2 Expert Networks
### 3.3 Top-K Selection
## 4. Key MoE Models
### 4.1 Switch Transformer (Google, 2021)
### 4.2 Mixtral 8x7B (Mistral, 2024)
### 4.3 DeepSeek-MoE
## 5. Performance Analysis
### 5.1 Computational Efficiency
### 5.2 Quality Benchmarks
[Figure 1: MoE vs Dense Model Efficiency Comparison -- generated illustration]
## 6. Challenges and Limitations
## 7. Conclusion
## References
[1] Fedus et al., "Switch Transformers", 2021. https://...
[2] Mistral AI, "Mixtral of Experts", 2024. https://...
[3] Vaswani et al., "Attention Is All You Need", 2017. https://...
...
11. Submission Requirements
| Item | Details |
|------|---------|
| Source Code | GitHub repository with clean commit history and README |
| Transformer Analysis | Written document explaining the transformer architecture of the chosen LLM backbone |
| MoE Comparison | Analysis report comparing MoE vs. dense model architectures with benchmarks |
| Documentation | Technical report: architecture, design decisions, NER/classification evaluation, challenges, results |
| Demo Video | 5-minute screen recording demonstrating the full pipeline including NER, classification, browser actions, and illustrations |
| Sample Reports | 3 generated research reports on different topics with entity summaries, classification reports, and illustrations |
| Presentation | 15-minute live presentation with Q&A covering all course topics demonstrated in the project |
| Deadline | Week 14 of the semester |
12. Useful References
- "Attention Is All You Need" -- Vaswani et al., 2017 (Transformer Architecture)
- "BERT: Pre-training of Deep Bidirectional Transformers" -- Devlin et al., 2019 (Text Classification, NER)
- SpaCy Documentation -- https://spacy.io (Named Entity Recognition)
- LangChain Documentation -- https://python.langchain.com
- LangGraph Documentation -- https://langchain-ai.github.io/langgraph/
- "ReAct: Synergizing Reasoning and Acting in Language Models" -- Yao et al., 2023
- "Switch Transformers: Scaling to Trillion Parameter Models" -- Fedus et al., 2021 (MoE)
- "Mixtral of Experts" -- Mistral AI, 2024 (MoE)
- "A Visual Guide to LLM Agents" -- Maarten Grootendorst, 2025
- "Building Effective Agents" -- Anthropic Blog, 2024
- Model Context Protocol Specification -- https://modelcontextprotocol.io
- "Large Action Models: From Inception to Implementation" -- Salesforce Research, 2024
- Playwright Documentation -- https://playwright.dev (Browser Automation)
- "High-Resolution Image Synthesis with Latent Diffusion Models" -- Rombach et al., 2022 (Diffusion Models)
- Stable Diffusion API Documentation -- https://stability.ai
- Tavily Search API -- https://tavily.com
- "AI Engineering" -- Chip Huyen, O'Reilly, 2025