Chat with Databases using LangChain

LangChain provides powerful tools to interact with your databases using natural language. This application demonstrates how to create a chat interface that allows users to query databases using conversational language instead of SQL.

Why use LangChain for database interactions?

Query databases using natural language
Convert user questions to optimized SQL
Interpret results in human-readable format
Handle follow-up questions with context
Protect against SQL injection with built-in safeguards

Chat Interface

Hello! I'm your database assistant. I can help you query the sample Chinook database which contains information about a music store. What would you like to know?

Which country's customers spent the most?

Looking at the data, customers from the USA spent the most, with a total of $523.06.

Who are the top 3 best selling artists?

The top 3 best selling artists are:

Iron Maiden - 140 units sold
U2 - 107 units sold
Metallica - 91 units sold

Database Schema

Chinook Database Schema

Customer

Column	Type	Constraints
CustomerId	INTEGER	PRIMARY KEY
FirstName	NVARCHAR(40)	NOT NULL
LastName	NVARCHAR(20)	NOT NULL
Country	NVARCHAR(40)
Email	NVARCHAR(60)	NOT NULL

Invoice

Column	Type	Constraints
InvoiceId	INTEGER	PRIMARY KEY
CustomerId	INTEGER	FOREIGN KEY
InvoiceDate	DATETIME	NOT NULL
BillingCountry	NVARCHAR(40)
Total	NUMERIC(10,2)	NOT NULL

Artist

Column	Type	Constraints
ArtistId	INTEGER	PRIMARY KEY
Name	NVARCHAR(120)

Album

Column	Type	Constraints
AlbumId	INTEGER	PRIMARY KEY
Title	NVARCHAR(160)	NOT NULL
ArtistId	INTEGER	FOREIGN KEY

Key relationships in the Chinook database:

Customer (1) → Invoice (many)
Artist (1) → Album (many)
Album (1) → Track (many)
Track (many) ← InvoiceLine (many) → Invoice (1)

How LangChain Connects to Databases

LangChain SQL Database Toolkit

LangChain provides a specialized toolkit for SQL database interactions through its SQLDatabaseToolkit. This toolkit contains several tools that enable an AI agent to query and interact with SQL databases.

Key Components:

QuerySQLDataBaseTool: Executes SQL queries
InfoSQLDatabaseTool: Gets schema information
ListSQLDatabaseTool: Lists available tables
QuerySQLCheckerTool: Validates SQL queries before execution

The Language Model's Role

The language model (like GPT-4) acts as the bridge between natural language and SQL. It:

Understands the user's question in natural language
Translates the question into a valid SQL query based on database schema
Reviews the query for errors or security issues
Receives raw query results and formats them into user-friendly responses
Maintains conversation context for follow-up questions

Architecture Overview

System Components

Database Connection: SQLAlchemy connection to your database (SQLite, MySQL, PostgreSQL, etc.)
LLM Integration: Connection to OpenAI, Anthropic, or other LLM APIs
LangChain Tools: Specialized tools for SQL query generation and validation
Agent Framework: LangChain's agent system for orchestrating complex interactions
Chat Interface: Frontend for user interactions with the system

Security Considerations

⚠️ Important Security Warning

Building Q&A systems for SQL databases requires executing model-generated SQL queries. There are inherent risks in doing this:

Always use read-only database connections when possible
Scope database permissions as narrowly as possible
Consider implementing additional validation layers
Monitor query patterns and implement rate limiting

Process Flow

1

User Query Input

The user enters a natural language question about the database data.

Example: "Which country's customers spent the most?"

2

Database Schema Analysis

The system uses ListSQLDatabaseTool and InfoSQLDatabaseTool to understand the database structure.

Tables identified: Customer, Invoice, InvoiceLine

3

SQL Query Generation

The LLM converts the natural language question into a SQL query based on the schema.

SELECT c.Country, SUM(i.Total) AS TotalSpent
FROM Customer c
JOIN Invoice i ON c.CustomerId = i.CustomerId
GROUP BY c.Country
ORDER BY TotalSpent DESC
LIMIT 1

4

Query Validation

The QuerySQLCheckerTool validates the SQL query for errors and potential improvements.

Query validated successfully

5

Database Execution

The QuerySQLDataBaseTool executes the SQL query against the database.

[('USA', 523.0600000000003)]

6

Result Interpretation

The LLM translates the raw query results into a natural language response.

"Customers from the USA spent the most, with a total amount spent of $523.06."

Sample Queries You Can Try

Sales Analysis

Which country's customers spent the most?

Sales Geography Aggregation

Artist Popularity

Who are the top 3 best selling artists?

Artists Sales Ranking

Genre Analysis

What's the most popular music genre by sales?

Genres Popularity Analysis

Customer Insights

Which customer has purchased the most tracks?

Customers Purchases Loyalty

Time-based Analysis

What was the highest sales month in 2021?

Time Sales Seasonal

Complex Relationship

Which employee has the highest sales through their customers?

Employees Customers Relationships

Implementation Code

Setting Up the Environment

# Install required packages
pip install langchain langchain-community langchain-openai langgraph

# Import necessary modules
import os
from langchain_openai import ChatOpenAI
from langchain_community.utilities import SQLDatabase
from langchain import hub
from langchain_community.tools.sql_database.tool import QuerySQLDatabaseTool
from langgraph.prebuilt import create_react_agent

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = "your-api-key-here"

Database Connection

# Connect to an SQLite Database
from sqlalchemy import create_engine
from sqlalchemy.pool import StaticPool

# For an existing SQLite database
db_path = "chinook.db"  # Path to your SQLite database
engine = create_engine(f"sqlite:///{db_path}")
db = SQLDatabase(engine)

# Alternatively, for in-memory testing with example data
def get_in_memory_db():
    import sqlite3
    import requests

    url = "https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql"
    response = requests.get(url)
    sql_script = response.text

    connection = sqlite3.connect(":memory:", check_same_thread=False)
    connection.executescript(sql_script)

    return create_engine(
        "sqlite://",
        creator=lambda: connection,
        poolclass=StaticPool,
        connect_args={"check_same_thread": False},
    )

engine = get_in_memory_db()
db = SQLDatabase(engine)

Creating the LangChain Agent

# Initialize Language Model
llm = ChatOpenAI(model="gpt-4")  # You can also use gpt-3.5-turbo for lower cost

# Initialize SQL Database Toolkit
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

# Get the tools from the toolkit
tools = toolkit.get_tools()

# Get a prompt from LangChain Hub
prompt_template = hub.pull("langchain-ai/sql-agent-system-prompt")
system_message = prompt_template.format(dialect="SQLite", top_k=5)

# Create the agent
agent_executor = create_react_agent(
    llm,
    tools,
    state_modifier=system_message
)

Django Integration (views.py)

# views.py for Django integration
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
import json

@csrf_exempt
def chat_with_db(request):
    if request.method == 'POST':
        try:
            data = json.loads(request.body)
            user_message = data.get('message', '')

            # Process the user message with our agent
            response = agent_executor.invoke(
                {"messages": [("user", user_message)]}
            )

            # Extract the final AI message
            ai_message = response["messages"][-1].content

            return JsonResponse({
                'success': True,
                'message': ai_message
            })

        except Exception as e:
            return JsonResponse({
                'success': False,
                'error': str(e)
            })

    return JsonResponse({
        'success': False,
        'error': 'Only POST requests are supported'
    })

Frontend JavaScript

// JavaScript for the chat interface
document.addEventListener('DOMContentLoaded', function() {
    const messageInput = document.querySelector('.chat-input input');
    const sendButton = document.querySelector('.chat-input button');
    const messagesContainer = document.querySelector('.chat-messages');
    const sampleQueries = document.querySelectorAll('.sample-query');

    // Function to add a message to the chat
    function addMessage(message, isUser = false) {
        const messageDiv = document.createElement('div');
        messageDiv.className = isUser ? 'user-message' : 'bot-message';
        messageDiv.innerHTML = message;
        messagesContainer.appendChild(messageDiv);
        messagesContainer.scrollTop = messagesContainer.scrollHeight;
    }

    // Function to send a message to the backend
    async function sendMessage(message) {
        addMessage(message, true);

        // Show loading indicator
        const loadingDiv = document.createElement('div');
        loadingDiv.className = 'bot-message';
        loadingDiv.innerHTML = ' Thinking...';
        messagesContainer.appendChild(loadingDiv);

        try {
            const response = await fetch('/chat_with_db/', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                },
                body: JSON.stringify({ message: message }),
            });

            const data = await response.json();

            // Remove loading indicator
            messagesContainer.removeChild(loadingDiv);

            if (data.success) {
                addMessage(data.message);
            } else {
                addMessage('Error: ' + data.error);
            }
        } catch (error) {
            // Remove loading indicator
            messagesContainer.removeChild(loadingDiv);
            addMessage('An error occurred while connecting to the server.');
        }
    }

    // Event listener for send button
    sendButton.addEventListener('click', function() {
        const message = messageInput.value.trim();
        if (message) {
            sendMessage(message);
            messageInput.value = '';
        }
    });

    // Event listener for enter key
    messageInput.addEventListener('keypress', function(e) {
        if (e.key === 'Enter') {
            const message = messageInput.value.trim();
            if (message) {
                sendMessage(message);
                messageInput.value = '';
            }
        }
    });

    // Event listeners for sample queries
    sampleQueries.forEach(query => {
        query.addEventListener('click', function() {
            const questionText = this.querySelector('p').textContent;
            messageInput.value = questionText;
            messageInput.focus();
        });
    });
});

LangChain DB Chat

External Links