Build a RAG Pipeline Inside Joomla for Intelligent Site Search
Joomla's built-in search has always had the same fundamental limitation. It is keyword-based. A visitor types "how do I reset my account" and the search engine looks for articles containing those exact words. If your article uses the phrase "recover your login credentials" instead, it does not show up. The visitor gets no results, concludes your site does not have the answer, and leaves.
This is not a Joomla problem specifically. It is what keyword search does. It matches strings, not meaning. RAG, Retrieval-Augmented Generation, solves this at the architecture level. Instead of matching keywords, it converts both your content and the search query into vector embeddings, finds content that is semantically similar, and uses an LLM to generate a direct answer from that content. A visitor asking "how do I reset my account" gets a proper answer even if none of your articles use those exact words.
I will walk through the full implementation. We will cover the three main vector storage options honestly so you can make the right choice for your setup, then go deep on building the complete RAG pipeline inside a custom Joomla component using PostgreSQL with pgvector and OpenAI.
What you need: Joomla 4 or 5, PHP 8.1+, Composer, PostgreSQL with the pgvector extension installed, and an OpenAI API key.
What RAG Actually Does, Step by Step
Before writing any code it is worth being clear about what the pipeline actually does at each stage. RAG is one of those terms that gets thrown around loosely and the implementation details matter.
Phase 1: Indexing (runs once, then on content updates)
↓
Fetch all published Joomla articles
↓
Split long articles into chunks
↓
Send each chunk to OpenAI Embeddings API
↓
Store the chunk text and its embedding vector in PostgreSQL
Phase 2: Search (runs on every user query)
↓
User submits a search query
↓
Convert query to an embedding vector via OpenAI
↓
Find the most semantically similar chunks using pgvector
↓
Send retrieved chunks plus the original query to GPT-4o
↓
GPT-4o generates a direct answer grounded in your content
↓
Return the answer and source article links to the user
The indexing phase is the slower one and only needs to run when content changes. The search phase is what your visitors experience and it needs to be fast. Keeping those two concerns separate in the architecture makes both easier to manage.
Three Vector Storage Options for Joomla
This is the decision that shapes the rest of the implementation. There is no universally correct answer here; the right choice depends on your infrastructure, team, and content volume.
Option 1: PostgreSQL with pgvector
pgvector is an open source PostgreSQL extension that adds a native vector data type and similarity search operators. You store embeddings directly in a PostgreSQL table alongside your chunk text and metadata. Similarity search runs as a standard SQL query using the cosine distance operator.
The big advantage is that you are not adding a new infrastructure dependency. If you are already running PostgreSQL, this is just an extension install and a new table. Queries are fast, the data lives in your existing database stack, and you have full control. The limitation is that at very large scale, hundreds of thousands of chunks, you need to tune the index carefully to maintain query speed.
This is what we are building in this post. It is the right default for most Joomla sites.
Option 2: MySQL with a Vector Similarity Workaround
Joomla ships with MySQL as its default database, so this is the path of least resistance from an infrastructure standpoint. MySQL 9.0 added experimental vector support but it is not production-ready for most use cases yet. The practical workaround is to store embeddings as JSON or a serialised float array, fetch candidate chunks using a broad text filter, then do the cosine similarity calculation in PHP.
This works for small content sets, a few hundred articles. It gets slow quickly as the content volume grows because you are doing similarity math in PHP rather than in an optimised database index. If your Joomla site runs MySQL and you cannot add PostgreSQL, this is a viable starting point but plan for a migration if the search volume grows.
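Because the MySQL workaround rests entirely on that in-PHP similarity step, it is worth seeing how small it is. A self-contained sketch (function names are illustrative, and each row is assumed to carry its embedding already decoded from JSON into a float array):

```php
<?php
// Cosine similarity between two embeddings stored as plain float arrays.
// Assumes both vectors have the same length and are non-zero.
function cosineSimilarity(array $a, array $b): float
{
    $dot = $normA = $normB = 0.0;
    foreach ($a as $i => $v) {
        $dot   += $v * $b[$i];
        $normA += $v * $v;
        $normB += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

// Rank candidate rows fetched from MySQL against the query embedding
// and keep the top K. This is the work pgvector would do in-database.
function rankByRelevance(array $queryEmbedding, array $rows, int $topK = 5): array
{
    usort($rows, fn(array $x, array $y) =>
        cosineSimilarity($queryEmbedding, $y['embedding'])
        <=> cosineSimilarity($queryEmbedding, $x['embedding'])
    );
    return array_slice($rows, 0, $topK);
}
```

Every candidate row passes through `cosineSimilarity` on every search, which is exactly why this approach degrades as the content set grows.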
Option 3: External Vector Store, Pinecone or Qdrant
Pinecone and Qdrant are purpose-built vector databases. You send embeddings to their API, they handle storage and indexing, and you query them via HTTP. Both have generous free tiers for getting started.
The advantage is performance at scale and zero infrastructure management on your end. The disadvantages are an additional external dependency, data leaving your infrastructure, API rate limits, and ongoing costs that grow with your content volume. For enterprise Joomla sites with strict data residency requirements, an external service is often a non-starter.
Good fit for teams that want to move fast without managing PostgreSQL, or sites with very high search volume where a dedicated vector store makes sense operationally.
We are going with pgvector. Here is the full build.
Install pgvector and Set Up the Database Table
First, install the pgvector extension in your PostgreSQL database. If you have superuser access:
CREATE EXTENSION IF NOT EXISTS vector;
If you are on a managed PostgreSQL service such as AWS RDS or Supabase, pgvector can usually be enabled from the provider's console without superuser access.
Create the table that will store your article chunks and their embeddings. Run this in your PostgreSQL database (a separate database from Joomla's MySQL database):
CREATE TABLE joomla_article_embeddings (
id SERIAL PRIMARY KEY,
article_id INTEGER NOT NULL,
article_title TEXT NOT NULL,
chunk_index INTEGER NOT NULL,
chunk_text TEXT NOT NULL,
embedding vector(1536),
url TEXT,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX ON joomla_article_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
CREATE INDEX ON joomla_article_embeddings (article_id);
The embedding dimension is 1536 because that is what OpenAI's text-embedding-3-small model outputs. If you use text-embedding-3-large instead, change this to 3072. The ivfflat index is what makes similarity search fast at scale. Note that ivfflat clusters the rows that exist when the index is created, so an index built on an empty table has nothing to cluster on; for best recall, build or rebuild the index after the initial bulk load. The lists value of 100 is a reasonable starting point; tune it upward if you have more than 100,000 chunks.
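The other ivfflat knob sits on the query side: how many of those lists a search actually probes. It is a session-level setting, so you can set it from the same connection your search queries use:

```sql
-- ivfflat searches only `probes` of the `lists` clusters per query.
-- The pgvector default is 1, which is fast but can miss near
-- neighbours; raising it improves recall at some cost in query time.
SET ivfflat.probes = 10;
```

Like the similarity threshold later in this post, the right value is empirical: raise it until recall stops improving on real queries.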
Custom Joomla Component Structure
We will build this as a custom Joomla component. Create the following structure under components/com_ragsearch:
components/com_ragsearch/
ragsearch.xml
src/
Service/
OpenAIService.php
VectorStoreService.php
ArticleChunkerService.php
RAGSearchService.php
Controller/
SearchController.php
View/
Search/
HtmlView.php
tmpl/
default.php
tmpl/
index.php
administrator/components/com_ragsearch/
src/
Controller/
IndexController.php
Install the OpenAI PHP client and a PostgreSQL driver via Composer in your Joomla root:
composer require openai-php/client
composer require doctrine/dbal
The OpenAI Service
Create src/Service/OpenAIService.php:
<?php
namespace Joomla\Component\Ragsearch\Site\Service;
use OpenAI;
class OpenAIService
{
private $client;
public function __construct()
{
$params = \Joomla\CMS\Component\ComponentHelper::getParams('com_ragsearch');
$apiKey = $params->get('openai_api_key');
$this->client = OpenAI::client($apiKey);
}
public function embed(string $text): array
{
$response = $this->client->embeddings()->create([
'model' => 'text-embedding-3-small',
'input' => $text,
]);
return $response->embeddings[0]->embedding;
}
public function embedBatch(array $texts): array
{
$response = $this->client->embeddings()->create([
'model' => 'text-embedding-3-small',
'input' => $texts,
]);
$embeddings = [];
foreach ($response->embeddings as $item) {
$embeddings[$item->index] = $item->embedding;
}
return $embeddings;
}
public function generateAnswer(string $query, array $chunks): string
{
$context = implode("\n\n---\n\n", array_column($chunks, 'chunk_text'));
$response = $this->client->chat()->create([
'model' => 'gpt-4o',
'temperature' => 0.3,
'max_tokens' => 600,
'messages' => [
[
'role' => 'system',
'content' => 'You are a helpful site assistant. Answer the user question
using only the content provided below. If the content does
not contain enough information to answer, say so honestly.
Do not make up information. Keep answers clear and concise.',
],
[
'role' => 'user',
'content' => "Content from our site:\n\n{$context}\n\nQuestion: {$query}",
],
],
]);
return $response->choices[0]->message->content;
}
}
Notice the embedBatch method. When indexing articles, sending texts in batches rather than one at a time cuts the number of API calls significantly and speeds up indexing. Use it during the indexing phase; use embed for single query embeddings at search time.
The Article Chunker Service
Long articles need to be split into chunks before embedding. Embedding an entire 3,000-word article as a single vector produces a representation that is too diffuse to be useful for retrieval. Smaller focused chunks give the similarity search something meaningful to match against.
Create src/Service/ArticleChunkerService.php:
<?php
namespace Joomla\Component\Ragsearch\Site\Service;
class ArticleChunkerService
{
private int $chunkSize = 400;
private int $chunkOverlap = 50;
public function chunk(string $text): array
{
// Strip HTML tags and decode entities (&nbsp; etc.) from the body
$clean = html_entity_decode(strip_tags($text), ENT_QUOTES | ENT_HTML5);
// Normalise whitespace
$clean = preg_replace('/\s+/', ' ', $clean);
$clean = trim($clean);
$words = explode(' ', $clean);
$total = count($words);
$chunks = [];
$start = 0;
while ($start < $total) {
$end = min($start + $this->chunkSize, $total);
$chunkWords = array_slice($words, $start, $end - $start);
$chunks[] = implode(' ', $chunkWords);
if ($end >= $total) {
// Reached the end of the text; stop here rather than emitting
// a trailing chunk that is pure overlap of the previous one
break;
}
// Move forward by chunkSize minus overlap
// so consecutive chunks share some context
$start += ($this->chunkSize - $this->chunkOverlap);
}
return array_values(array_filter($chunks, fn($c) => strlen(trim($c)) > 50));
}
}
The overlap between chunks matters more than it might seem. If a key sentence sits right at the boundary between two chunks, without overlap it gets split in half and neither chunk represents that idea well. A 50-word overlap means boundary content appears in both adjacent chunks, so the similarity search is more likely to retrieve it when it is relevant.
The Vector Store Service
Create src/Service/VectorStoreService.php:
<?php
namespace Joomla\Component\Ragsearch\Site\Service;
use Doctrine\DBAL\DriverManager;
class VectorStoreService
{
private $conn;
public function __construct()
{
$params = \Joomla\CMS\Component\ComponentHelper::getParams('com_ragsearch');
$this->conn = DriverManager::getConnection([
'dbname' => $params->get('pg_database'),
'user' => $params->get('pg_user'),
'password' => $params->get('pg_password'),
'host' => $params->get('pg_host', 'localhost'),
'port' => $params->get('pg_port', 5432),
'driver' => 'pdo_pgsql',
]);
}
public function upsertChunk(
int $articleId,
string $title,
int $chunkIndex,
string $chunkText,
array $embedding,
string $url
): void {
// Delete existing chunks for this article and index first
$this->conn->executeStatement(
'DELETE FROM joomla_article_embeddings
WHERE article_id = :id AND chunk_index = :idx',
['id' => $articleId, 'idx' => $chunkIndex]
);
$vectorLiteral = '[' . implode(',', $embedding) . ']';
$this->conn->executeStatement(
'INSERT INTO joomla_article_embeddings
(article_id, article_title, chunk_index, chunk_text, embedding, url)
VALUES
(:article_id, :title, :chunk_index, :chunk_text, :embedding, :url)',
[
'article_id' => $articleId,
'title' => $title,
'chunk_index' => $chunkIndex,
'chunk_text' => $chunkText,
'embedding' => $vectorLiteral,
'url' => $url,
]
);
}
public function similaritySearch(array $queryEmbedding, int $topK = 5): array
{
$vectorLiteral = '[' . implode(',', $queryEmbedding) . ']';
// PDO does not allow reusing one named placeholder twice in a
// prepared statement, so the vector is bound under two names.
// CAST(... AS vector) avoids PDO misparsing the :param::vector
// shorthand, and the LIMIT is cast to int and inlined.
$sql = "SELECT
article_id,
article_title,
chunk_text,
url,
1 - (embedding <=> CAST(:embedding_sim AS vector)) AS similarity
FROM joomla_article_embeddings
ORDER BY embedding <=> CAST(:embedding_ord AS vector)
LIMIT " . (int) $topK;
$stmt = $this->conn->executeQuery(
$sql,
[
'embedding_sim' => $vectorLiteral,
'embedding_ord' => $vectorLiteral,
]
);
return $stmt->fetchAllAssociative();
}
public function deleteArticle(int $articleId): void
{
$this->conn->executeStatement(
'DELETE FROM joomla_article_embeddings WHERE article_id = :id',
['id' => $articleId]
);
}
}
The <=> operator is pgvector's cosine distance operator. Cosine distance measures the angle between two vectors rather than the straight-line distance between them, which suits text embeddings because direction carries the meaning while magnitude does not. The similarity score in the SELECT is calculated as 1 - cosine_distance, so a score of 1.0 is a perfect match and 0.0 means the vectors are unrelated.
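Both properties are easy to verify directly in psql with literal vectors:

```sql
-- Orthogonal vectors have cosine distance 1 (completely unrelated);
-- a vector and any positive scaling of it have distance ~0, because
-- only the direction matters, not the magnitude.
SELECT '[1,0,0]'::vector <=> '[0,1,0]'::vector AS orthogonal,  -- 1.0
       '[1,2,3]'::vector <=> '[2,4,6]'::vector AS parallel;    -- ~0.0
```

This is also why OpenAI embeddings, which are unit-normalised, pair naturally with the cosine operator.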
The Indexing Controller
This runs from the Joomla administrator backend. It fetches all published articles, chunks them, embeds them in batches, and stores everything in PostgreSQL. You run this once to build the initial index and then on a schedule or via a hook when articles are updated.
Create administrator/components/com_ragsearch/src/Controller/IndexController.php:
<?php
namespace Joomla\Component\Ragsearch\Administrator\Controller;
use Joomla\CMS\MVC\Controller\BaseController;
use Joomla\Component\Ragsearch\Site\Service\OpenAIService;
use Joomla\Component\Ragsearch\Site\Service\VectorStoreService;
use Joomla\Component\Ragsearch\Site\Service\ArticleChunkerService;
class IndexController extends BaseController
{
private int $batchSize = 20;
public function build(): void
{
$db = \Joomla\CMS\Factory::getContainer()->get(\Joomla\Database\DatabaseInterface::class);
$chunker = new ArticleChunkerService();
$openai = new OpenAIService();
$store = new VectorStoreService();
// Fetch all published Joomla articles
$query = $db->getQuery(true)
->select(['a.id', 'a.title', 'a.introtext', 'a.fulltext'])
->from($db->quoteName('#__content', 'a'))
->where($db->quoteName('a.state') . ' = 1');
$articles = $db->setQuery($query)->loadObjectList();
$indexed = 0;
$errors = 0;
foreach ($articles as $article) {
try {
$fullContent = $article->title . "\n\n"
. strip_tags($article->introtext) . "\n\n"
. strip_tags($article->fulltext);
$chunks = $chunker->chunk($fullContent);
if (empty($chunks)) {
continue;
}
// Store the non-SEF route; calling Route::_() here inside the
// administrator application would generate an admin URL
$url = 'index.php?option=com_content&view=article&id=' . $article->id;
// Delete old embeddings for this article before re-indexing
$store->deleteArticle($article->id);
// Process chunks in batches to reduce API calls
$chunkBatches = array_chunk($chunks, $this->batchSize);
foreach ($chunkBatches as $batchIndex => $batch) {
$embeddings = $openai->embedBatch($batch);
foreach ($batch as $i => $chunkText) {
$globalIndex = ($batchIndex * $this->batchSize) + $i;
$embedding = $embeddings[$i] ?? null;
if (!$embedding) {
continue;
}
$store->upsertChunk(
$article->id,
$article->title,
$globalIndex,
$chunkText,
$embedding,
$url
);
}
// Small pause between batches to stay within API rate limits
usleep(200000);
}
$indexed++;
} catch (\Exception $e) {
$errors++;
\Joomla\CMS\Log\Log::add(
'RAG indexing failed for article ' . $article->id . ': ' . $e->getMessage(),
\Joomla\CMS\Log\Log::ERROR,
'com_ragsearch'
);
}
}
$this->app->enqueueMessage(
"Indexing complete. Articles indexed: {$indexed}. Errors: {$errors}.",
$errors > 0 ? 'warning' : 'success'
);
$this->setRedirect('index.php?option=com_ragsearch');
}
}
The RAG Search Service
Create src/Service/RAGSearchService.php:
<?php
namespace Joomla\Component\Ragsearch\Site\Service;
class RAGSearchService
{
public function __construct(
private OpenAIService $openai,
private VectorStoreService $store
) {}
public function search(string $query): array
{
if (strlen(trim($query)) < 3) {
return [
'answer' => 'Please enter a more specific question.',
'sources' => [],
];
}
// Convert the query to a vector embedding
$queryEmbedding = $this->openai->embed($query);
// Find the most semantically similar chunks
$chunks = $this->store->similaritySearch($queryEmbedding, topK: 5);
if (empty($chunks)) {
return [
'answer' => 'No relevant content found for your query. Try rephrasing your question.',
'sources' => [],
];
}
// Filter out low-similarity results
$relevantChunks = array_filter(
$chunks,
fn($c) => ($c['similarity'] ?? 0) > 0.75
);
if (empty($relevantChunks)) {
return [
'answer' => 'I could not find content closely matching your question. Please try different keywords.',
'sources' => [],
];
}
// Generate a direct answer grounded in the retrieved chunks
$answer = $this->openai->generateAnswer($query, $relevantChunks);
// Deduplicate sources by article ID
$sources = [];
foreach ($relevantChunks as $chunk) {
$aid = $chunk['article_id'];
if (!isset($sources[$aid])) {
$sources[$aid] = [
'title' => $chunk['article_title'],
'url' => $chunk['url'],
];
}
}
return [
'answer' => $answer,
'sources' => array_values($sources),
];
}
}
The similarity threshold of 0.75 is worth paying attention to. Below that score the retrieved chunks are probably not relevant enough to be useful for generating an answer. Bear in mind that absolute similarity scores vary between embedding models, and the text-embedding-3 family often scores clearly relevant matches lower than you might expect, so treat 0.75 as a starting point rather than a constant. Log the scores your real content produces and tune the cutoff from there.
The Search Controller and View
Create src/Controller/SearchController.php:
<?php
namespace Joomla\Component\Ragsearch\Site\Controller;
use Joomla\CMS\MVC\Controller\BaseController;
use Joomla\Component\Ragsearch\Site\Service\OpenAIService;
use Joomla\Component\Ragsearch\Site\Service\VectorStoreService;
use Joomla\Component\Ragsearch\Site\Service\RAGSearchService;
class SearchController extends BaseController
{
public function search(): void
{
// Validate the CSRF token that the search form emits
$this->checkToken();
$query = trim($this->input->getString('q', ''));
$result = ['answer' => '', 'sources' => [], 'query' => $query];
if (!empty($query)) {
try {
$service = new RAGSearchService(
new OpenAIService(),
new VectorStoreService()
);
$result = array_merge($result, $service->search($query));
} catch (\Exception $e) {
$result['answer'] = 'Search is temporarily unavailable. Please try again shortly.';
\Joomla\CMS\Log\Log::add('RAG search error: ' . $e->getMessage(), \Joomla\CMS\Log\Log::ERROR, 'com_ragsearch');
}
}
$this->app->setUserState('com_ragsearch.result', $result);
$this->setRedirect(\Joomla\CMS\Router\Route::_('index.php?option=com_ragsearch&view=search'));
}
}
The Joomla view template at src/View/Search/tmpl/default.php (the accompanying HtmlView class should load the com_ragsearch.result user state into $this->result for the template to read):
<?php defined('_JEXEC') or die; ?>
<div class="rag-search">
<form method="POST" action="<?php echo \Joomla\CMS\Router\Route::_('index.php?option=com_ragsearch&task=search.search'); ?>">
<?php echo \Joomla\CMS\HTML\HTMLHelper::_('form.token'); ?>
<input type="text"
name="q"
value="<?php echo htmlspecialchars($this->result['query'] ?? ''); ?>"
placeholder="Ask anything about our site..."
autocomplete="off">
<button type="submit">Search</button>
</form>
<?php if (!empty($this->result['answer'])) : ?>
<div class="rag-answer">
<h3>Answer</h3>
<p><?php echo nl2br(htmlspecialchars($this->result['answer'])); ?></p>
</div>
<?php if (!empty($this->result['sources'])) : ?>
<div class="rag-sources">
<h4>Sources</h4>
<ul>
<?php foreach ($this->result['sources'] as $source) : ?>
<li>
<a href="<?php echo htmlspecialchars($source['url']); ?>">
<?php echo htmlspecialchars($source['title']); ?>
</a>
</li>
<?php endforeach; ?>
</ul>
</div>
<?php endif; ?>
<?php endif; ?>
</div>
Keeping the Index Fresh
The index goes stale the moment an article is updated and not re-indexed. There are two clean ways to handle this in Joomla.
The first is a Joomla plugin that hooks into onContentAfterSave and triggers re-indexing for the saved article specifically. This keeps the index fresh in real time but adds latency to every article save operation.
<?php
defined('_JEXEC') or die;
use Joomla\CMS\Plugin\CMSPlugin;
use Joomla\Component\Ragsearch\Site\Service\ArticleChunkerService;
use Joomla\Component\Ragsearch\Site\Service\OpenAIService;
use Joomla\Component\Ragsearch\Site\Service\VectorStoreService;
class PlgContentRagsearchIndex extends CMSPlugin
{
public function onContentAfterSave(
string $context,
object $article,
bool $isNew
): void {
if ($context !== 'com_content.article') {
return;
}
if ((int) $article->state !== 1) {
return;
}
// Joomla has no built-in job queue, so re-index this one article
// synchronously; this is the save-time latency cost mentioned
// above, and busy sites should prefer the scheduled task instead
$chunker = new ArticleChunkerService();
$openai = new OpenAIService();
$store = new VectorStoreService();
$text = $article->title . "\n\n"
. strip_tags($article->introtext ?? '') . "\n\n"
. strip_tags($article->fulltext ?? '');
$store->deleteArticle((int) $article->id);
$url = 'index.php?option=com_content&view=article&id=' . $article->id;
foreach (array_values($chunker->chunk($text)) as $i => $chunkText) {
$store->upsertChunk(
(int) $article->id,
$article->title,
$i,
$chunkText,
$openai->embed($chunkText),
$url
);
}
}
}
The second approach is a scheduled CLI task that re-indexes all articles on a schedule, say every hour or every night. For sites where content does not change frequently, a nightly re-index via Joomla's task scheduler is simpler and puts zero overhead on the save operation.
For most sites the scheduled approach is the right default. Use the plugin approach only if your content changes continuously throughout the day and freshness matters within minutes.
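For the scheduled route, Joomla 4.1+ ships a Task Scheduler whose tasks can be triggered from cron through the console application. The task ID is specific to your install, and registering the re-index as a task plugin is left to your setup; a hypothetical crontab entry:

```
# Run Joomla's task scheduler nightly at 02:30. The --id value is the
# ID of the re-index task as shown under System > Scheduled Tasks,
# and both paths here are illustrative.
30 2 * * * /usr/bin/php /var/www/yoursite/cli/joomla.php scheduler:run --id=3
```

Running the indexer off the web request path this way also sidesteps PHP's web-facing execution time limits during large re-indexes.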
What This Looks Like for a Real Visitor
Here is a concrete example. Say your Joomla site has articles about software products and a visitor types: "what happens if I cancel my subscription mid-month?"
Your articles probably use phrases like "pro-rata refund policy", "billing cycle", "account downgrade", not the exact words the visitor used. Keyword search returns nothing. The RAG pipeline converts the query to a vector, finds the chunks from your billing and account articles that are semantically closest to that question, feeds them to GPT-4o, and returns something like:
"If you cancel mid-month, your account remains active until the end of your current billing period. You will not be charged for the following month. Refunds for unused days are not issued automatically but can be requested within 7 days of cancellation by contacting support."
Below the answer, the visitor sees links to the two source articles that information came from. They got a direct answer, they can read the full policy if they want to, and they did not have to trawl through search results guessing which article might be relevant.
A Few Things to Know Before Going Live
The embedding cost for the initial indexing run is usually smaller than people expect. A site with 500 articles averaging 400 words each produces somewhere in the region of 500 to 1,000 chunks once chunking and overlap are accounted for. At OpenAI's current pricing for text-embedding-3-small, that initial index costs well under a dollar. Ongoing costs per search query are minimal, one embedding call per query.
Caching search results is worth adding early. Many visitors on the same site ask very similar questions. Store recent query-answer pairs in Joomla's cache layer with a TTL of a few hours. The cache hit rate on popular queries tends to be high and it cuts both API costs and response time meaningfully.
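One way to wire that up is Joomla's callback cache controller. Factory::getCache is the legacy entry point (deprecated in Joomla 4 in favour of the CacheControllerFactory service, but still functional), and note that JCache lifetimes are expressed in minutes; a sketch, assuming $service is a constructed RAGSearchService and $query is the submitted search string:

```php
// Cache RAG answers keyed on the normalised query text. The callback
// controller runs search() on a miss and replays the stored result
// on a hit, so popular questions skip both OpenAI calls entirely.
$cache = \Joomla\CMS\Factory::getCache('com_ragsearch', 'callback');
$cache->setCaching(true);
$cache->setLifeTime(240); // minutes, i.e. a four-hour TTL

$cacheId = md5(mb_strtolower(trim($query)));
$result  = $cache->get([$service, 'search'], [$query], $cacheId);
```

Normalising case and whitespace before hashing the cache ID is what lets "How do I reset my account" and "how do i reset my account" share an entry.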
Finally, keep an eye on what people actually search for. Log the queries, log whether the similarity search returned results above the threshold, and log whether users clicked the source links. After a few weeks you will see which questions the pipeline handles well and which ones consistently miss. That data tells you whether your chunking strategy needs adjusting, whether your similarity threshold is set correctly, and whether there are content gaps on your site worth addressing.
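A small log table makes all three of those questions answerable with plain SQL. One possible shape, living alongside the embeddings table in PostgreSQL (table and column names are illustrative):

```sql
CREATE TABLE rag_search_log (
    id SERIAL PRIMARY KEY,
    query TEXT NOT NULL,
    top_similarity REAL,                  -- best score pgvector returned
    above_threshold BOOLEAN NOT NULL,     -- did anything clear the cutoff?
    clicked_source BOOLEAN DEFAULT FALSE, -- set when a source link is followed
    created_at TIMESTAMP DEFAULT NOW()
);
```

Queries that repeatedly log above_threshold = FALSE are your content gaps; a falling average top_similarity after a content restructure is your signal to revisit chunking.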
The RAG pattern is one of the most practically useful things you can add to a content-heavy Joomla site. It turns a search box that frustrates visitors into one that actually helps them find what they need, in their own words, without requiring your content to match their exact phrasing.
Drupal and LangChain: Building Multi-Step AI Pipelines for Enterprise CMS
Enterprise content teams have a problem that does not get talked about enough. It is not producing content, most large organisations have plenty of that. The problem is what happens to content before it gets published. Review queues that stretch for days, moderation bottlenecks where one editor is the single point of failure, policy checks that get skipped under deadline pressure, and taxonomy tagging that is inconsistent across a team of twenty people all making their own judgment calls.
I worked with an enterprise Drupal site last year that had over 3,000 pieces of content sitting in a moderation queue at any given time. Four editors, no automation, no triage. Good content was getting buried under low-quality submissions and the editors were spending most of their time on mechanical checks rather than actual editorial judgment.
What they needed was a multi-step AI pipeline sitting between content submission and human review. Something that could screen content automatically, flag policy violations, suggest taxonomy terms, score quality, and route content to the right reviewer based on what it found. That is what this post is about.
We will cover how LangChain fits into a Drupal architecture, the honest tradeoffs between the Python, JavaScript, and PHP approaches, and then go deep on building the full pipeline in PHP inside a custom Drupal module.
Why LangChain matters here
LangChain is a framework for building applications that chain multiple AI calls together, each step taking the output of the previous one as its input. Instead of sending one big prompt to an LLM and hoping for a good result, you break the problem into focused steps. One step checks for policy violations. The next scores content quality. The next suggests taxonomy terms. The next decides where to route the content for review. Each step does one thing well.
The reason this matters for an enterprise CMS is that single-prompt AI approaches get inconsistent quickly when content is varied and complex. A one-shot prompt that tries to check policy compliance, assess quality, tag taxonomy, and make a routing decision all at once tends to produce mediocre results across all four. Breaking it into a chain where each step is focused produces meaningfully better output, and more importantly, it makes each step auditable. You can see exactly where the pipeline flagged something and why.
LangChain was originally built in Python, which is where it is most mature. A JavaScript version called LangChain.js followed. There is no official PHP version, which creates an interesting architecture question for Drupal teams.
Three ways to use LangChain with Drupal
Before picking an approach, it is worth understanding what each option actually involves in practice. I have seen teams choose the wrong one based on familiarity rather than fit, and it costs them later.
Option 1: Python LangChain as a Separate Microservice
You build a small Python FastAPI or Flask service that runs LangChain pipelines. Drupal calls this service via HTTP when content needs processing and receives structured JSON back. The pipeline logic lives entirely in Python, Drupal just sends content and handles the response.
This is the most powerful option because you get the full LangChain Python ecosystem, including document loaders, vector stores, agents, and memory. The tradeoff is operational complexity. You are now running and maintaining two separate services, Python and PHP, and your team needs to be comfortable in both.
Good fit for: teams with Python expertise already on staff, complex pipelines that need LangChain agents or vector retrieval, and organisations with proper infrastructure for running multiple services.
Option 2: LangChain.js via a Node.js Microservice
Similar architecture to Option 1 but the sidecar service runs Node.js with LangChain.js instead of Python. Drupal calls it the same way via HTTP. LangChain.js has caught up significantly to the Python version in recent versions and covers most common pipeline patterns.
The advantage over Python is that JavaScript is more widely known across web development teams. The disadvantage is that LangChain.js still lags behind Python on some advanced features, and you still have the same two-service operational overhead.
Good fit for: teams with frontend JavaScript experience who want to avoid Python, simpler pipeline patterns, and organisations already running Node.js services.
Option 3: PHP Pipeline Mimicking LangChain Patterns (What We Are Building)
You implement the same chaining concepts directly in PHP using the OpenAI PHP client, no LangChain library involved. Each step in the pipeline is a focused PHP class. They chain together through a Pipeline orchestrator. The output of each step feeds into the next.
This approach keeps everything inside Drupal, no additional services to deploy or maintain, no cross-language boundaries, no HTTP overhead between steps. The tradeoff is that you implement the chaining logic yourself rather than using a ready-made framework.
Honestly, for most enterprise Drupal use cases this is the right call. The LangChain library provides a lot of features you will not need for a content moderation pipeline. What you need is the chaining pattern, structured prompts, and reliable JSON outputs, and all of that is straightforward to implement in PHP.
Good fit for: Drupal teams without Python or Node.js expertise, pipelines that do not require vector retrieval or complex agents, and organisations that want the full pipeline inside their existing Drupal infrastructure.
That is the option we are going deep on. Here is what we are building.
The Pipeline We Are Building
Four steps, each focused on one job:
Content submitted to Drupal
↓
Step 1: Policy Compliance Check
Does the content violate any publishing policies?
Output: pass / flag / reject + reason
↓
Step 2: Quality Assessment
Is the content well-written, complete, and suitable for publishing?
Output: quality score 1-10 + specific feedback
↓
Step 3: Taxonomy Suggestion
What terms should be applied to this content?
Output: suggested taxonomy terms with confidence scores
↓
Step 4: Routing Decision
Based on the above, where should this content go?
Output: auto-approve / send to editor / send to senior editor / reject
↓
Content routed to correct moderation state in Drupal
Each step receives the original content plus the outputs of all previous steps. By the time Step 4 runs, it has the policy check result, the quality score, and the taxonomy suggestions available to inform its routing decision. That context is what makes the routing intelligent rather than mechanical.
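The chaining mechanics themselves are only a few lines. A self-contained sketch of the pattern with two stand-in steps (both illustrative, not the real pipeline steps):

```php
<?php
// Each step receives the original content plus an accumulating context
// holding the results of every earlier step, keyed by step name.
interface StepInterface
{
    public function name(): string;
    public function execute(string $content, array $context): array;
}

final class WordCountStep implements StepInterface
{
    public function name(): string { return 'word_count'; }

    public function execute(string $content, array $context): array
    {
        return ['words' => str_word_count($content)];
    }
}

final class LengthVerdictStep implements StepInterface
{
    public function name(): string { return 'length_verdict'; }

    public function execute(string $content, array $context): array
    {
        // A later step reads an earlier step's output from $context
        $words = $context['word_count']['words'] ?? 0;
        return ['verdict' => $words >= 3 ? 'long enough' : 'too short'];
    }
}

function runPipeline(string $content, array $steps): array
{
    $context = [];
    foreach ($steps as $step) {
        $context[$step->name()] = $step->execute($content, $context);
    }
    return $context;
}
```

The returned context array, keyed by step name, is exactly the shape the final routing step consumes, and it doubles as an audit record of what each step decided.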
Setting up the custom Drupal module
We will build this as a custom Drupal module. Create the module structure:
modules/custom/ai_content_pipeline/
ai_content_pipeline.info.yml
ai_content_pipeline.services.yml
ai_content_pipeline.module
src/
Pipeline/
ContentModerationPipeline.php
Steps/
PolicyComplianceStep.php
QualityAssessmentStep.php
TaxonomySuggestionStep.php
RoutingDecisionStep.php
Contracts/
PipelineStepInterface.php
Service/
OpenAIService.php
The ai_content_pipeline.info.yml:
name: 'AI Content Pipeline'
type: module
description: 'Multi-step AI pipeline for intelligent content moderation'
core_version_requirement: ^10 || ^11
package: Custom
dependencies:
- drupal:node
Install the OpenAI PHP client via Composer in your Drupal project root:
composer require openai-php/client
Step 1: The Pipeline Step Interface
Every step in the pipeline implements this interface. It enforces a consistent contract across all steps, which makes the pipeline orchestrator simple to write and easy to extend with new steps later.
Create src/Contracts/PipelineStepInterface.php:
<?php
namespace Drupal\ai_content_pipeline\Contracts;
interface PipelineStepInterface
{
/**
* Execute this pipeline step.
*
* @param string $content The original content being processed.
* @param array $context Results from all previous steps.
*
* @return array Results from this step to pass forward.
*/
public function execute(string $content, array $context): array;
/**
* Human-readable name for this step, used in logging.
*/
public function name(): string;
}
Step 2: The OpenAI Service
Create src/Service/OpenAIService.php:
<?php
namespace Drupal\ai_content_pipeline\Service;
use OpenAI;
class OpenAIService
{
private $client;
public function __construct()
{
$api_key = \Drupal::config('ai_content_pipeline.settings')->get('openai_api_key');
$this->client = OpenAI::client($api_key);
}
public function chat(string $systemPrompt, string $userMessage): string
{
$response = $this->client->chat()->create([
'model' => 'gpt-4o',
'temperature' => 0.2,
'messages' => [
['role' => 'system', 'content' => $systemPrompt],
['role' => 'user', 'content' => $userMessage],
],
]);
return $response->choices[0]->message->content;
}
public function parseJson(string $raw): array
{
$clean = preg_replace('/^```json\s*/i', '', trim($raw));
$clean = preg_replace('/```$/', '', trim($clean));
$data = json_decode(trim($clean), true);
if (json_last_error() !== JSON_ERROR_NONE) {
return ['error' => 'JSON parse failed', 'raw' => $raw];
}
return $data;
}
}
The temperature is set to 0.2, lower than you might expect. For pipeline steps that are making structured decisions, you want as little creative variance as possible. The model should be analytical, not inventive.
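The fence-stripping in parseJson() matters more than it looks: models often wrap JSON in a markdown code fence even when the prompt says not to. Here is the same cleanup as a standalone function you can verify in isolation:

```php
<?php

// Standalone version of the cleanup in OpenAIService::parseJson(), for
// illustration. Strips an optional ```json fence, then decodes.
function parseModelJson(string $raw): array
{
    $clean = preg_replace('/^```json\s*/i', '', trim($raw));
    $clean = preg_replace('/```$/', '', trim($clean));

    $data = json_decode(trim($clean), true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        // Return the raw output so the failure can be logged and inspected.
        return ['error' => 'JSON parse failed', 'raw' => $raw];
    }
    return $data;
}
```

Both a fenced reply and a bare JSON object decode to the same array, which is exactly the tolerance you want from a parser sitting behind an LLM.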
Step 3: The Policy Compliance step
Create src/Steps/PolicyComplianceStep.php:
<?php
namespace Drupal\ai_content_pipeline\Steps;
use Drupal\ai_content_pipeline\Contracts\PipelineStepInterface;
use Drupal\ai_content_pipeline\Service\OpenAIService;
class PolicyComplianceStep implements PipelineStepInterface
{
public function __construct(private OpenAIService $ai) {}
public function name(): string
{
return 'Policy Compliance Check';
}
public function execute(string $content, array $context): array
{
$system = 'You are a content policy compliance reviewer for an enterprise CMS.
Review content against publishing policies and return JSON only.
No markdown, no explanation outside the JSON object.';
$prompt = <<<PROMPT
Review the following content against these publishing policies:
1. No hate speech, discrimination, or offensive language targeting any group.
2. No unverified factual claims presented as established fact.
3. No promotional or advertorial content disguised as editorial.
4. No personally identifiable information about private individuals.
5. No content that could create legal liability (defamation, copyright issues).
Return a JSON object with:
- "status": one of "pass", "flag", or "reject"
- "violations": array of specific violations found, empty array if none
- "reason": brief explanation of the status decision
Content to review:
{$content}
PROMPT;
$raw = $this->ai->chat($system, $prompt);
$result = $this->ai->parseJson($raw);
return [
'policy' => $result,
];
}
}
Step 4: The Quality Assessment step
Create src/Steps/QualityAssessmentStep.php:
<?php
namespace Drupal\ai_content_pipeline\Steps;
use Drupal\ai_content_pipeline\Contracts\PipelineStepInterface;
use Drupal\ai_content_pipeline\Service\OpenAIService;
class QualityAssessmentStep implements PipelineStepInterface
{
public function __construct(private OpenAIService $ai) {}
public function name(): string
{
return 'Quality Assessment';
}
public function execute(string $content, array $context): array
{
$policyStatus = $context['policy']['status'] ?? 'unknown';
$system = 'You are a senior editorial quality reviewer for an enterprise CMS.
Assess content quality objectively and return JSON only.';
$prompt = <<<PROMPT
Assess the quality of the following content. Consider:
- Clarity and readability for a general professional audience
- Completeness: does it cover the topic adequately?
- Structure: is it well organised with a logical flow?
- Accuracy indicators: does it make claims without apparent support?
- Tone: is it appropriate for professional publication?
Note: Policy compliance status from previous check is "{$policyStatus}".
Return a JSON object with:
- "score": integer from 1 to 10
- "strengths": array of what the content does well
- "weaknesses": array of specific quality issues found
- "publishable": boolean, true if quality is sufficient for publication
Content:
{$content}
PROMPT;
$raw = $this->ai->chat($system, $prompt);
$result = $this->ai->parseJson($raw);
return [
'quality' => $result,
];
}
}
Step 5: The Taxonomy Suggestion step
Create src/Steps/TaxonomySuggestionStep.php:
<?php
namespace Drupal\ai_content_pipeline\Steps;
use Drupal\ai_content_pipeline\Contracts\PipelineStepInterface;
use Drupal\ai_content_pipeline\Service\OpenAIService;
class TaxonomySuggestionStep implements PipelineStepInterface
{
private array $availableTerms = [
'topics' => ['Technology', 'Business', 'Health', 'Finance', 'Policy', 'Research', 'Opinion'],
'audience' => ['General', 'Technical', 'Executive', 'Academic'],
'content_type' => ['Analysis', 'News', 'Tutorial', 'Case Study', 'Interview', 'Report'],
];
public function __construct(private OpenAIService $ai) {}
public function name(): string
{
return 'Taxonomy Suggestion';
}
public function execute(string $content, array $context): array
{
$termsJson = json_encode($this->availableTerms);
$system = 'You are a content taxonomy specialist for an enterprise CMS.
Suggest appropriate taxonomy terms and return JSON only.';
$prompt = <<<PROMPT
Suggest taxonomy terms for the following content.
Only suggest terms from the available taxonomy list provided.
Available taxonomy terms:
{$termsJson}
Return a JSON object with:
- "suggestions": object with vocabulary names as keys, each containing:
- "terms": array of suggested term names from the available list
- "confidence": "high", "medium", or "low"
- "primary_topic": the single most relevant topic term
Content:
{$content}
PROMPT;
$raw = $this->ai->chat($system, $prompt);
$result = $this->ai->parseJson($raw);
return [
'taxonomy' => $result,
];
}
}
In a real deployment, replace the hardcoded $availableTerms array with a dynamic lookup from your Drupal taxonomy vocabularies. You can load terms using Drupal's entity query system and pass the full list to the prompt.
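A minimal sketch of that lookup, assuming vocabularies whose machine names match the keys above ('topics', 'audience', 'content_type') exist on your site; it requires a bootstrapped Drupal environment, so treat it as a fragment rather than a drop-in:

```php
<?php

// Sketch: dynamic replacement for the hardcoded $availableTerms array.
// The vocabulary machine names are assumptions; adjust to your site.
private function loadAvailableTerms(): array
{
    $storage = \Drupal::entityTypeManager()->getStorage('taxonomy_term');
    $terms = [];
    foreach (['topics', 'audience', 'content_type'] as $vid) {
        // loadTree() returns lightweight stub objects whose public
        // $name property holds the term label.
        foreach ($storage->loadTree($vid) as $term) {
            $terms[$vid][] = $term->name;
        }
    }
    return $terms;
}
```

Then replace the `$termsJson = json_encode($this->availableTerms);` line with `$termsJson = json_encode($this->loadAvailableTerms());` so the prompt always reflects the live taxonomy.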
Step 6: The Routing Decision step
This is where the pipeline pays off. By the time this step runs, it has the policy result, quality score, and taxonomy confidence from the previous three steps. The routing decision is genuinely informed rather than based on a single signal.
Create src/Steps/RoutingDecisionStep.php:
<?php
namespace Drupal\ai_content_pipeline\Steps;
use Drupal\ai_content_pipeline\Contracts\PipelineStepInterface;
use Drupal\ai_content_pipeline\Service\OpenAIService;
class RoutingDecisionStep implements PipelineStepInterface
{
public function __construct(private OpenAIService $ai) {}
public function name(): string
{
return 'Routing Decision';
}
public function execute(string $content, array $context): array
{
$policyStatus = $context['policy']['status'] ?? 'unknown';
$policyReason = $context['policy']['reason'] ?? '';
$qualityScore = $context['quality']['score'] ?? 0;
// Cast to a literal string: interpolating a raw false into the prompt would render as an empty string
$publishable = ($context['quality']['publishable'] ?? false) ? 'true' : 'false';
$weaknesses = json_encode($context['quality']['weaknesses'] ?? []);
$violations = json_encode($context['policy']['violations'] ?? []);
$system = 'You are a content workflow manager for an enterprise CMS.
Make routing decisions based on pipeline analysis results.
Return JSON only.';
$prompt = <<<PROMPT
Based on the pipeline analysis below, decide how this content should be routed.
Pipeline results:
- Policy status: {$policyStatus}
- Policy reason: {$policyReason}
- Policy violations: {$violations}
- Quality score: {$qualityScore} / 10
- Publishable assessment: {$publishable}
- Quality weaknesses: {$weaknesses}
Routing options:
- "auto_approve": policy passed, quality score 9-10, no issues
- "editor_review": policy passed, quality score 6-8, minor issues only
- "senior_editor_review": policy flagged or quality score 4-5, needs experienced judgment
- "reject": policy status is reject, or quality score below 4
Return a JSON object with:
- "decision": one of the four routing options above
- "reason": clear explanation of why this routing was chosen
- "reviewer_notes": specific things the human reviewer should check, as an array
- "priority": "high", "normal", or "low"
PROMPT;
$raw = $this->ai->chat($system, $prompt);
$result = $this->ai->parseJson($raw);
return [
'routing' => $result,
];
}
}
Step 7: The Pipeline Orchestrator
This is the class that wires everything together. It runs each step in sequence, collects the context, handles failures gracefully, and returns the full pipeline result.
Create src/Pipeline/ContentModerationPipeline.php:
<?php
namespace Drupal\ai_content_pipeline\Pipeline;
use Drupal\ai_content_pipeline\Contracts\PipelineStepInterface;
use Drupal\Core\Logger\LoggerChannelFactoryInterface;
class ContentModerationPipeline
{
private array $steps = [];
private $logger;
public function __construct(LoggerChannelFactoryInterface $loggerFactory)
{
$this->logger = $loggerFactory->get('ai_content_pipeline');
}
public function addStep(PipelineStepInterface $step): self
{
$this->steps[] = $step;
return $this;
}
public function run(string $content): array
{
$context = [];
$stepLog = [];
$startTime = microtime(true);
foreach ($this->steps as $step) {
$stepName = $step->name();
$stepStart = microtime(true);
try {
$result = $step->execute($content, $context);
$context = array_merge($context, $result);
$stepLog[] = [
'step' => $stepName,
'status' => 'completed',
'duration' => round(microtime(true) - $stepStart, 2) . 's',
];
$this->logger->info('Pipeline step completed: @step', ['@step' => $stepName]);
} catch (\Exception $e) {
$this->logger->error('Pipeline step failed: @step, Error: @error', [
'@step' => $stepName,
'@error' => $e->getMessage(),
]);
$stepLog[] = [
'step' => $stepName,
'status' => 'failed',
'error' => $e->getMessage(),
];
// On failure, route to senior editor for manual review
$context['routing'] = [
'decision' => 'senior_editor_review',
'reason' => "Pipeline step '{$stepName}' failed. Manual review required.",
'reviewer_notes' => ['Pipeline encountered an error, please review manually.'],
'priority' => 'high',
];
break;
}
}
return [
'context' => $context,
'steps' => $stepLog,
'total_duration' => round(microtime(true) - $startTime, 2) . 's',
];
}
}
Step 8: Wiring it into Drupal's Moderation workflow
Now we connect the pipeline to Drupal's content workflow. This hook fires when a node is presaved, runs the pipeline, and applies the routing decision as a moderation state.
In ai_content_pipeline.module:
<?php
use Drupal\node\NodeInterface;
use Drupal\ai_content_pipeline\Service\OpenAIService;
use Drupal\ai_content_pipeline\Pipeline\ContentModerationPipeline;
use Drupal\ai_content_pipeline\Steps\PolicyComplianceStep;
use Drupal\ai_content_pipeline\Steps\QualityAssessmentStep;
use Drupal\ai_content_pipeline\Steps\TaxonomySuggestionStep;
use Drupal\ai_content_pipeline\Steps\RoutingDecisionStep;
function ai_content_pipeline_node_presave(NodeInterface $node): void
{
// Only run the pipeline on newly created nodes
if (!$node->isNew()) {
return;
}
// The null coalescing guards against nodes with an empty body field
$content = $node->getTitle() . "\n\n" . ($node->get('body')->value ?? '');
if (empty(trim($content))) {
return;
}
$ai = new OpenAIService();
$pipeline = new ContentModerationPipeline(\Drupal::service('logger.factory'));
$pipeline
->addStep(new PolicyComplianceStep($ai))
->addStep(new QualityAssessmentStep($ai))
->addStep(new TaxonomySuggestionStep($ai))
->addStep(new RoutingDecisionStep($ai));
$result = $pipeline->run($content);
$routing = $result['context']['routing'] ?? null;
if (!$routing) {
return;
}
// Map routing decision to Drupal moderation states
$stateMap = [
'auto_approve' => 'published',
'editor_review' => 'needs_review',
'senior_editor_review' => 'needs_review',
'reject' => 'rejected',
];
$decision = $routing['decision'] ?? 'editor_review';
$state = $stateMap[$decision] ?? 'needs_review';
if ($node->hasField('moderation_state')) {
$node->set('moderation_state', $state);
}
// Store pipeline results in a field for reviewer reference
if ($node->hasField('field_ai_review_notes')) {
$notes = "Routing: {$decision}\n";
$notes .= "Reason: {$routing['reason']}\n\n";
$notes .= "Reviewer notes:\n" . implode("\n", $routing['reviewer_notes'] ?? []);
$node->set('field_ai_review_notes', $notes);
}
}
What the Pipeline output looks like in practice
Here is a realistic example of the full pipeline result for a piece of content that passed policy checks but had quality issues. This is what your editors would see in the review notes field.
{
"context": {
"policy": {
"status": "pass",
"violations": [],
"reason": "Content meets all publishing policy requirements."
},
"quality": {
"score": 6,
"publishable": true,
"strengths": ["Clear headline", "Good factual grounding"],
"weaknesses": ["Conclusion is abrupt and underdeveloped", "Second section lacks supporting evidence"]
},
"taxonomy": {
"suggestions": {
"topics": { "terms": ["Technology", "Business"], "confidence": "high" },
"audience": { "terms": ["Executive"], "confidence": "medium" },
"content_type": { "terms": ["Analysis"], "confidence": "high" }
},
"primary_topic": "Technology"
},
"routing": {
"decision": "editor_review",
"reason": "Policy passed but quality score of 6 indicates minor issues that need editorial attention before publication.",
"reviewer_notes": [
"Strengthen the conclusion, currently ends abruptly",
"Add supporting evidence or sources to the second section",
"Taxonomy auto-applied, verify the Executive audience tag is correct"
],
"priority": "normal"
}
},
"steps": [
{ "step": "Policy Compliance Check", "status": "completed", "duration": "1.8s" },
{ "step": "Quality Assessment", "status": "completed", "duration": "2.1s" },
{ "step": "Taxonomy Suggestion", "status": "completed", "duration": "1.6s" },
{ "step": "Routing Decision", "status": "completed", "duration": "1.4s" }
],
"total_duration": "6.9s"
}
The reviewer opens the content, sees it has been routed to them with a quality score of 6, reads the specific reviewer notes, and knows exactly what to look at. No need to read the whole piece from scratch looking for problems. That is the practical value here.
Things worth knowing before you deploy
Seven seconds of pipeline processing on every content submission is not acceptable for a synchronous save operation. Move the pipeline into a queued job that fires after the initial save, using Drupal's Queue API or a custom queue worker. Store a "pending AI review" state that content sits in while the pipeline runs, then update the moderation state when the job completes.
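A sketch of what that queue worker could look like, assuming the presave hook only enqueues the node id and the plugin id below (which is hypothetical) is registered the usual way:

```php
<?php

namespace Drupal\ai_content_pipeline\Plugin\QueueWorker;

use Drupal\Core\Queue\QueueWorkerBase;
use Drupal\node\Entity\Node;

/**
 * Runs the AI pipeline on queued nodes during cron.
 *
 * @QueueWorker(
 *   id = "ai_content_pipeline_worker",
 *   title = @Translation("AI content pipeline worker"),
 *   cron = {"time" = 60}
 * )
 */
class PipelineQueueWorker extends QueueWorkerBase
{
    public function processItem($data): void
    {
        $node = Node::load($data['nid']);
        if (!$node) {
            return;
        }

        // Build and run the pipeline exactly as in the presave hook,
        // then apply the routing decision and save the node.
        // (Pipeline construction omitted for brevity.)
    }
}
```

The presave hook then shrinks to `\Drupal::queue('ai_content_pipeline_worker')->createItem(['nid' => $node->id()]);` plus setting the holding moderation state, and the save returns immediately.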
The system prompts in each step are where the real customisation happens. The policy step above uses generic rules, but for a real enterprise deployment you would replace those with your organisation's actual editorial policies, pulled from a config form or a dedicated policy content type in Drupal itself. That way non-technical editors can update the policy rules without touching code.
On cost, four GPT-4o calls per content submission adds up across a high-volume site. For content that does not need the full pipeline, like very short pieces or resubmissions, consider a lighter first-pass check using GPT-4o-mini before deciding whether to run the full chain. The classification step costs a fraction of the full pipeline and can filter out a significant portion of submissions early.
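An illustrative version of that first-pass triage; the function name and the 200-word threshold are assumptions, not part of the module above:

```php
<?php

use OpenAI;

// Cheap triage: decide whether the full four-step pipeline is worth running.
function needsFullPipeline(string $apiKey, string $content): bool
{
    // Very short pieces skip the AI call entirely.
    if (str_word_count(strip_tags($content)) < 200) {
        return false;
    }

    $client = OpenAI::client($apiKey);
    $response = $client->chat()->create([
        'model' => 'gpt-4o-mini',
        'temperature' => 0,
        'messages' => [
            ['role' => 'system', 'content' => 'You triage CMS content submissions. Answer with exactly "full" or "skip".'],
            ['role' => 'user', 'content' => "Does this submission need a detailed editorial review?\n\n" . $content],
        ],
    ]);

    return trim($response->choices[0]->message->content) === 'full';
}
```

Submissions that come back "skip" can go straight to a standard editor queue, reserving the four GPT-4o calls for content that warrants them.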
Finally, keep the pipeline results. Store each step's output against the content revision in a custom table or a long text field. After a few months you will have data on what the pipeline flags most often, how accurate the routing decisions are, and where editors are overriding the AI recommendations. That feedback loop is what lets you improve the system prompts over time and actually measure whether the pipeline is helping.
The pattern here, focused steps, structured JSON outputs, full context passed forward, graceful failure handling, is the same pattern LangChain formalises in its framework. Building it directly in PHP means you keep it inside your existing Drupal infrastructure with no additional services to run. For most enterprise Drupal teams, that is the right tradeoff.
Build a WhatsApp AI Assistant Using Laravel, Twilio and OpenAI
A few months ago a client came to us with a pretty common problem. Their support team was spending most of the day answering the same twenty questions over and over. Shipping times, return policies, order status, payment methods. The questions were predictable. The answers were documented. But every single one still needed a human to respond.
They were already using WhatsApp for customer communication, so the ask was simple: can we put something intelligent on that channel so the team can focus on the cases that actually need them? That is how we ended up building a WhatsApp AI assistant using Laravel, Twilio, and OpenAI, and it is exactly what this post covers.
By the end you will have a working bot that receives WhatsApp messages through a Twilio webhook, maintains conversation memory per customer so context carries across messages, and uses OpenAI to generate replies that sound like a real support agent. The whole thing runs on standard Laravel, no exotic packages.
What you need:
- Laravel 10 or 11
- Twilio account with WhatsApp sandbox access
- OpenAI API key
- Publicly accessible URL for your webhook
If you are working locally, ngrok handles that last part cleanly.
How the system works before we write any code
It is worth spending a minute on the architecture before jumping in. When a customer sends a WhatsApp message, Twilio receives it and forwards it to your webhook URL as an HTTP POST request. Laravel handles that request, pulls the customer's conversation history from cache, appends the new message, sends the full context to OpenAI, gets a reply, stores the updated history back in cache, and sends the response back to Twilio which delivers it to WhatsApp.
Customer sends WhatsApp message
↓
Twilio receives it and POSTs to your Laravel webhook
↓
Laravel pulls conversation history from Cache
↓
Appends new message to history
↓
Sends full conversation context to OpenAI
↓
OpenAI returns a support reply
↓
Laravel stores updated history in Cache
↓
Laravel responds with TwiML so Twilio delivers the message
↓
Customer receives the reply on WhatsApp
The conversation memory is the part most tutorials skip. Without it, every message the customer sends is treated as a brand new conversation. The bot has no idea what was just discussed. That makes for a frustrating experience, especially in support scenarios where context matters a lot.
Step 1: Install Laravel and required packages
composer create-project laravel/laravel whatsapp-ai-assistant
cd whatsapp-ai-assistant
composer require openai-php/laravel twilio/sdk
Publish the OpenAI config:
php artisan vendor:publish --provider="OpenAI\Laravel\ServiceProvider"
Add your credentials to .env:
OPENAI_API_KEY=sk-your-openai-key-here
TWILIO_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_AUTH_TOKEN=your-auth-token-here
TWILIO_WHATSAPP_FROM=whatsapp:+14155238886
The number in TWILIO_WHATSAPP_FROM is Twilio's shared WhatsApp sandbox number. Once you go to production and get a dedicated number approved by WhatsApp, you update it there.
Add the Twilio values to config/services.php so you can access them cleanly throughout the app:
'twilio' => [
'sid' => env('TWILIO_SID'),
'auth_token' => env('TWILIO_AUTH_TOKEN'),
'from' => env('TWILIO_WHATSAPP_FROM'),
],
Step 2: The Conversation Memory Service
This is the part that makes the bot actually useful in a support context. Each customer gets their own conversation history stored in Laravel Cache, keyed by their WhatsApp number. Every time they send a message, we load their history, add the new message, send the whole thing to OpenAI, then save the updated history back.
Create app/Services/ConversationMemoryService.php:
<?php
namespace App\Services;
use Illuminate\Support\Facades\Cache;
class ConversationMemoryService
{
private int $maxMessages = 20;
private int $ttlMinutes = 60;
/**
* Get conversation history for a given WhatsApp number.
*/
public function getHistory(string $phone): array
{
return Cache::get($this->key($phone), []);
}
/**
* Append a new message to the conversation history.
*/
public function addMessage(string $phone, string $role, string $content): void
{
$history = $this->getHistory($phone);
$history[] = [
'role' => $role,
'content' => $content,
];
// Keep history trimmed so we do not blow the context window
if (count($history) > $this->maxMessages) {
$history = array_slice($history, -$this->maxMessages);
}
Cache::put($this->key($phone), $history, now()->addMinutes($this->ttlMinutes));
}
/**
* Clear conversation history, useful for reset commands.
*/
public function clearHistory(string $phone): void
{
Cache::forget($this->key($phone));
}
private function key(string $phone): string
{
return 'whatsapp_conversation_' . md5($phone);
}
}
The maxMessages limit of 20 is deliberate. The model has a finite context window, and sending an entire day's worth of messages with every request gets expensive fast. Keeping the last 20 messages gives the bot enough context to be helpful without unnecessary API cost.
The TTL of 60 minutes means if a customer goes quiet for an hour and comes back, the conversation starts fresh. You can adjust both of these to fit your support workflow.
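The trimming behaviour is easy to verify in isolation. This standalone function mirrors the slice logic inside addMessage():

```php
<?php

// Mirrors the history trimming in ConversationMemoryService::addMessage():
// once the history exceeds the limit, only the most recent entries survive.
function trimHistory(array $history, int $maxMessages = 20): array
{
    if (count($history) > $maxMessages) {
        // Negative offset keeps the tail of the array.
        $history = array_slice($history, -$maxMessages);
    }
    return $history;
}
```

After 25 messages, only messages 6 through 25 remain, so the oldest exchanges fall away first while the recent context the model actually needs is preserved.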
Step 3: The WhatsApp AI Service
This service handles the OpenAI side. It takes the customer's phone number and their latest message, builds the full conversation context including a system prompt that defines the bot's behaviour, and returns a reply.
Create app/Services/WhatsAppAIService.php:
<?php
namespace App\Services;
use OpenAI\Laravel\Facades\OpenAI;
class WhatsAppAIService
{
public function __construct(
private ConversationMemoryService $memory
) {}
public function respond(string $phone, string $userMessage): string
{
// Save the customer's message to history first
$this->memory->addMessage($phone, 'user', $userMessage);
// Build messages array with system prompt at the top
$messages = array_merge(
[$this->systemPrompt()],
$this->memory->getHistory($phone)
);
$response = OpenAI::chat()->create([
'model' => 'gpt-4o',
'temperature' => 0.5,
'max_tokens' => 300,
'messages' => $messages,
]);
$reply = trim($response->choices[0]->message->content);
// Save the assistant reply to history so context carries forward
$this->memory->addMessage($phone, 'assistant', $reply);
return $reply;
}
private function systemPrompt(): array
{
return [
'role' => 'system',
'content' => 'You are a friendly and professional customer support assistant
for an e-commerce store. You help customers with questions about
orders, shipping, returns, and payments. Keep replies concise and
clear, ideally under 3 sentences, since this is a WhatsApp conversation.
If you do not know something specific about an order, ask the customer
for their order number and let them know a human agent will follow up.
Never make up order details or policies you are not sure about.',
];
}
}
A few things worth pointing out here. The max_tokens: 300 keeps replies short, which is exactly what you want for WhatsApp. Nobody wants to read a five paragraph response on their phone. The system prompt explicitly tells the bot not to make up order details, which is important for a support context where hallucinated information would cause real problems.
The temperature is 0.5, slightly higher than what I used in the code review bot from the last post. Support responses need to feel natural and conversational, so a bit more variation is fine here.
Step 4: The Webhook Controller
php artisan make:controller WhatsAppWebhookController
<?php
namespace App\Http\Controllers;
use Illuminate\Http\Request;
use Illuminate\Http\Response;
use App\Services\WhatsAppAIService;
use App\Services\ConversationMemoryService;
class WhatsAppWebhookController extends Controller
{
public function __construct(
private WhatsAppAIService $aiService,
private ConversationMemoryService $memory
) {}
public function handle(Request $request): Response
{
$from = $request->input('From', '');
$message = trim($request->input('Body', ''));
if (empty($from) || empty($message)) {
// An empty <Response> tells Twilio there is nothing to send back,
// which avoids delivering a blank <Message> to the customer
return response('<?xml version="1.0" encoding="UTF-8"?><Response></Response>', 200)
->header('Content-Type', 'text/xml');
}
// Allow customers to reset their conversation
if (strtolower($message) === 'reset') {
$this->memory->clearHistory($from);
return $this->twiml('Conversation reset. How can I help you today?');
}
// Handle media messages gracefully
if ($request->has('MediaUrl0')) {
return $this->twiml('Thanks for the image. A human agent will review it and get back to you shortly.');
}
$reply = $this->aiService->respond($from, $message);
return $this->twiml($reply);
}
/**
* Build a TwiML response that Twilio uses to send the WhatsApp message.
*/
private function twiml(string $message): Response
{
$xml = '<?xml version="1.0" encoding="UTF-8"?>';
$xml .= '<Response>';
$xml .= '<Message>' . htmlspecialchars($message) . '</Message>';
$xml .= '</Response>';
return response($xml, 200)->header('Content-Type', 'text/xml');
}
}
The reset command is a small touch but worth having. If a customer gets into a confusing exchange and wants to start over, they just send "reset" and the history clears. Useful for testing too.
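It is worth seeing why the htmlspecialchars() call inside twiml() is not optional. This standalone copy of the helper shows that a reply containing an ampersand still produces well-formed XML; without the escaping, Twilio would receive invalid TwiML and the message would never be delivered:

```php
<?php

// Standalone copy of the controller's twiml() helper, returning the XML
// string directly so the escaping can be checked in isolation.
function twiml(string $message): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>';
    $xml .= '<Response>';
    // htmlspecialchars() escapes &, <, > and quotes so AI-generated
    // replies cannot break the XML structure.
    $xml .= '<Message>' . htmlspecialchars($message) . '</Message>';
    $xml .= '</Response>';
    return $xml;
}
```

AI replies are free text, so characters like `&` will show up eventually; escaping at the boundary is the cheap insurance.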
Step 5: Route and CSRF Exception
Add the webhook route in routes/web.php:
use App\Http\Controllers\WhatsAppWebhookController;
Route::post('/webhook/whatsapp', [WhatsAppWebhookController::class, 'handle'])
->name('webhook.whatsapp');
Twilio sends POST requests to your webhook, and Laravel's CSRF middleware will block them by default because Twilio does not send a CSRF token. You need to exclude this route from CSRF protection.
In Laravel 10, open app/Http/Middleware/VerifyCsrfToken.php and add the route to the exceptions array:
<?php
namespace App\Http\Middleware;
use Illuminate\Foundation\Http\Middleware\VerifyCsrfToken as Middleware;
class VerifyCsrfToken extends Middleware
{
protected $except = [
'webhook/whatsapp',
];
}
In Laravel 11, open bootstrap/app.php and update it there:
->withMiddleware(function (Middleware $middleware) {
$middleware->validateCsrfTokens(except: [
'webhook/whatsapp',
]);
})
This is one of those things that trips people up the first time they set up a Twilio webhook on Laravel. The request just silently fails and you get no clear error message. If your webhook is not responding, check this before anything else.
Step 6: Validating That Requests Actually Come From Twilio
Since this webhook is publicly accessible, you should verify that incoming requests actually came from Twilio and not from someone who found your endpoint. Twilio signs every request with your auth token and sends the signature in the X-Twilio-Signature header.
Create a middleware to handle this:
php artisan make:middleware ValidateTwilioRequest
<?php
namespace App\Http\Middleware;
use Closure;
use Illuminate\Http\Request;
use Twilio\Security\RequestValidator;
class ValidateTwilioRequest
{
public function handle(Request $request, Closure $next): mixed
{
$validator = new RequestValidator(config('services.twilio.auth_token'));
$signature = $request->header('X-Twilio-Signature', '');
$url = $request->fullUrl();
$params = $request->post();
if (!$validator->validate($signature, $url, $params)) {
abort(403, 'Invalid Twilio signature.');
}
return $next($request);
}
}
Apply it to the webhook route:
Route::post('/webhook/whatsapp', [WhatsAppWebhookController::class, 'handle'])
->middleware(\App\Http\Middleware\ValidateTwilioRequest::class)
->name('webhook.whatsapp');
Skip this during local development if it causes issues. Twilio signature validation depends on the exact URL matching, which can get complicated with ngrok. Enable it in staging and production.
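For reference, this is essentially what RequestValidator computes under the hood, reimplemented here for illustration (the real class also handles edge cases like port normalisation): Twilio sorts the POST parameters by key, appends each key and value to the full URL, and HMAC-SHA1s the result with your auth token.

```php
<?php

// Illustrative recomputation of the X-Twilio-Signature value.
// Use Twilio's RequestValidator in real code; this just shows the scheme.
function computeTwilioSignature(string $authToken, string $url, array $params): string
{
    // Parameters are concatenated in alphabetical key order.
    ksort($params);
    $data = $url;
    foreach ($params as $key => $value) {
        $data .= $key . $value;
    }
    // Raw binary HMAC-SHA1, then base64, matching the header format.
    return base64_encode(hash_hmac('sha1', $data, $authToken, true));
}
```

Because the signature covers the exact URL and every parameter, any tampering with the payload (or a mismatched ngrok URL) changes the value, which is why validation fails when the URL Twilio signed differs from the one your app reconstructs.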
Step 7: Exposing Your Local Server With ngrok
Twilio needs a publicly accessible URL to send webhook requests to. During development, ngrok creates a temporary public tunnel to your local machine. Install it from ngrok.com, then run both of these in separate terminals:
php artisan serve
ngrok http 8000
ngrok will give you a forwarding URL that looks something like https://a1b2c3d4.ngrok.io. Your full webhook URL will be:
https://a1b2c3d4.ngrok.io/webhook/whatsapp
Copy that URL, you will need it in the next step.
Step 8: Configure the Twilio WhatsApp Sandbox
Log into your Twilio console and go to Messaging, then Try it out, then Send a WhatsApp message. You will land on the sandbox configuration page.
Find the field labelled "When a message comes in" and paste your ngrok webhook URL there. Make sure the request method is set to HTTP POST, not GET.
To connect your personal WhatsApp number to the sandbox for testing, send the join code shown on that page (something like "join bright-example") to the Twilio sandbox number on WhatsApp. You will get a confirmation message once you are connected.
Send any message from your WhatsApp now and watch it flow through the full stack.
What a Real Customer Support Conversation Looks Like
Here is an example of the kind of exchange this bot handles well. A customer asking about a return:
Customer : Hi, I want to return a jacket I bought last week
Bot : Hi there! Happy to help with your return. Could you share your
order number so I can check the details?
Customer : It is ORDER-48291
Bot : Thanks! Our standard return window is 30 days from delivery,
so you are well within that period. Would you like me to walk
you through the return steps or connect you with a human agent?
Customer : Walk me through it please
Bot : Sure. Repack the item securely, then visit our returns portal
and enter your order number to generate a prepaid label. Drop
it at any courier location and your refund will process within
3 to 5 business days once we receive it.
Customer : What if I lost the original packaging?
Bot : No problem at all. Any sturdy box or bag works fine.
The original packaging is not required for returns.
Four exchanges, the bot carried the order number across messages without the customer repeating it, gave accurate policy information, and offered a clear escalation path. That is exactly what a good first-line support interaction should look like.
Rate Limiting Per Customer
If one customer sends fifty messages in a minute, you do not want to fire fifty OpenAI API calls. Add rate limiting per phone number using Laravel's built-in rate limiter, right at the top of the handle method in your controller:
use Illuminate\Support\Facades\RateLimiter;
$key = 'whatsapp_' . md5($from);
if (RateLimiter::tooManyAttempts($key, 10)) {
return $this->twiml('You are sending messages too quickly. Please wait a moment and try again.');
}
RateLimiter::hit($key, 60);
This allows 10 messages per minute per customer before the rate limit kicks in. Adjust the numbers based on how your support volume actually looks.
Moving From Sandbox to Production
The sandbox works well for testing but has real limitations. Every customer has to send a join code before the bot can message them, and the sandbox number is shared across all Twilio accounts. For an actual deployment you need a dedicated WhatsApp Business number approved through Meta.
The approval process goes through Twilio's WhatsApp sender registration. You submit your business details, Meta reviews and approves the number, and once that is done you update TWILIO_WHATSAPP_FROM in your production environment and point the webhook to your live URL. The rest of the code does not change.
On the infrastructure side, switch the cache to Redis in production (CACHE_DRIVER=redis in Laravel 10, or CACHE_STORE=redis in Laravel 11). The file cache works locally, but Redis handles concurrent requests from multiple customers properly and survives server restarts without losing conversation history mid-session.
Three things to add before handing this to a client
The core works well but a production support bot needs a bit more to be truly reliable.
First, a database log of every conversation. Both for debugging and for reviewing what the bot is actually saying to customers. A simple whatsapp_messages table with columns for phone, role, content, and created_at is enough to start. You will thank yourself for having this the first time the bot says something unexpected.
Second, a human handoff trigger. If the customer says something like "I want to speak to a real person" or the bot detects repeated frustration in the conversation, it should stop trying to resolve things automatically and flag the conversation for the support team. A keyword check handles the obvious cases, and you can ask OpenAI to classify sentiment alongside the reply for the subtler ones.
Third, a basic admin view showing active conversations, the most common questions coming in, and average response times. That data is useful for improving the system prompt and for giving the support team visibility into what the bot is handling versus what it is escalating.
Those three additions turn a working prototype into something you can confidently hand over and actually maintain.
Build an AI Code Review Bot with Laravel — Real-World Use Case
Let me tell you how this idea actually started. A few months back, our team was doing PR reviews and I kept writing the same comment over and over, something like "this will cause an N+1 issue, please use eager loading." Different developer, different PR, same problem. The third time in two weeks I typed that comment, I figured there had to be a smarter way to handle this first pass.
That is what this is. Not some fancy AI product. Just a practical Laravel tool that takes a PHP code snippet, sends it to OpenAI, and gives back structured feedback before a human reviewer even opens the PR. The idea is simple: catch the obvious stuff automatically so your senior devs can spend their review time on things that actually need a human brain.
I will walk through the full build. By the end you will have a working Laravel app that accepts code, returns severity-tagged issues, security flags, suggestions, and a quality score. We will also hook it up to a queue so the UI does not freeze waiting on the API.
What you need before starting: Laravel 10 or 11, PHP 8.1+, Composer, and an OpenAI API key. That is it.
Why not PHPStan or CodeSniffer?
Because they are rule-based. They catch what they have been told to catch, nothing more.
PHPStan at max level is genuinely good. I use it. But here is the thing: some of the worst bugs in production do not violate a single linting rule. An N+1 query loop is syntactically perfect. A function that silently returns null on failure will not trigger any warning. A missing authorization check on a route will not show up in static analysis at all.
An LLM understands context. It can look at code and say "this will fall apart under load" or "this validation will silently pass null." That is a different category of feedback altogether. Use both, they are not competing with each other.
| What Gets Checked | PHPStan / PHPCS | AI Reviewer |
|---|---|---|
| Syntax and type errors | Strong | Yes |
| Coding standards | Strong | Yes |
| N+1 / query logic problems | No | Yes |
| Security patterns | Partial | Yes |
| Architecture suggestions | No | Yes |
| Explains why something is wrong | No | Yes |
How everything fits together
Before touching any code, here is the flow:
Developer submits PHP code via a form
↓
Laravel controller validates it
↓
CodeReviewService builds a structured prompt
↓
OpenAI GPT-4o analyses the code
↓
JSON response gets parsed
↓
Feedback renders back to the developer
No complex abstractions, no unnecessary packages beyond the OpenAI client. The structure is clean enough that adding features later (storing review history, GitHub webhook triggers, Slack notifications) is straightforward.
Step 1: Install Laravel and the OpenAI Package
composer create-project laravel/laravel ai-code-reviewer
cd ai-code-reviewer
composer require openai-php/laravel
Publish the config file:
php artisan vendor:publish --provider="OpenAI\Laravel\ServiceProvider"
Then open your .env and add your key:
OPENAI_API_KEY=sk-your-key-here
OPENAI_ORGANIZATION=
One thing I will say plainly. I have seen API keys committed to git repos more times than I would like. Double check that .env is in your .gitignore before anything else.
Step 2: Create a Service - CodeReviewService
Third-party API calls belong in a service class. Not in a controller, not in a model. This keeps things testable and means when you want to swap GPT-4o for a different model down the line, you change exactly one file.
Create app/Services/CodeReviewService.php manually:
<?php
namespace App\Services;
use OpenAI\Laravel\Facades\OpenAI;
class CodeReviewService
{
public function review(string $code): array
{
$response = OpenAI::chat()->create([
'model' => 'gpt-4o',
'temperature' => 0.3,
'messages' => [
[
'role' => 'system',
'content' => 'You are a senior PHP developer and Laravel architect.
Review PHP code and return feedback as valid JSON only.
No markdown. No explanation outside the JSON object.',
],
[
'role' => 'user',
'content' => $this->buildPrompt($code),
],
],
]);
return $this->parse($response->choices[0]->message->content);
}
private function buildPrompt(string $code): string
{
return <<<PROMPT
Review the PHP/Laravel code below. Return a JSON object with these keys:
- "summary": 1-2 sentence overall assessment.
- "score": integer 1 to 10 for code quality.
- "issues": array of objects with:
- "severity": "critical", "warning", or "info"
- "line_hint": function name or rough location
- "message": clear explanation of the problem
- "suggestions": array of improvement suggestions as strings.
- "security_flags": array of security concerns, or empty array.
Code:
```php
{$code}
```
PROMPT;
}
private function parse(string $raw): array
{
$clean = preg_replace('/^```json\s*/i', '', trim($raw));
$clean = preg_replace('/```$/', '', trim($clean));
$data = json_decode(trim($clean), true);
if (json_last_error() !== JSON_ERROR_NONE) {
return [
'summary' => 'Response could not be parsed. Try submitting again.',
'score' => null,
'issues' => [],
'suggestions' => [],
'security_flags' => [],
];
}
return $data;
}
}
The temperature of 0.3 is intentional. Lower temperature means less randomness, so the model stays focused and gives consistent output. For creative writing you would push that higher. For structured technical analysis, you want predictable, not creative.
Also notice the parse method strips markdown fences. GPT-4o usually returns clean JSON when you ask for it, but it occasionally wraps the output in backtick fences anyway. This handles that without breaking anything.
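The fence-stripping logic is easy to verify outside the service. Here is a standalone run of the same two preg_replace calls against a deliberately fenced response:

```php
<?php
// Simulate GPT wrapping its JSON in markdown fences,
// then strip them the same way parse() does.
$raw = "```json\n{\"score\": 7}\n```";

$clean = preg_replace('/^```json\s*/i', '', trim($raw));
$clean = preg_replace('/```$/', '', trim($clean));
$data  = json_decode(trim($clean), true);

print_r($data); // Array ( [score] => 7 )
```

If the model returns clean JSON with no fences, both replacements are no-ops and the decode succeeds either way.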
Step 3: Controller and Routes
php artisan make:controller CodeReviewController
<?php
namespace App\Http\Controllers;
use Illuminate\Http\Request;
use App\Services\CodeReviewService;
class CodeReviewController extends Controller
{
public function __construct(
private CodeReviewService $reviewService
) {}
public function index()
{
return view('code-review.index');
}
public function review(Request $request)
{
$request->validate([
'code' => 'required|string|min:10|max:5000',
]);
$feedback = $this->reviewService->review($request->input('code'));
return view('code-review.result', compact('feedback'));
}
}
Add the routes in routes/web.php:
use App\Http\Controllers\CodeReviewController;
Route::get('/code-review', [CodeReviewController::class, 'index'])
->name('code-review.index');
Route::post('/code-review', [CodeReviewController::class, 'review'])
->name('code-review.review');
Step 4: Blade Views
Keeping these minimal. The styling comes from your existing setup, no need to add anything extra here.
resources/views/code-review/index.blade.php
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>AI Code Reviewer</title>
</head>
<body>
<h1>AI Code Reviewer</h1>
<p>Paste PHP or Laravel code below and get structured feedback instantly.</p>
<form method="POST" action="{{ route('code-review.review') }}">
@csrf
<textarea name="code" rows="15" cols="80"
placeholder="Paste your PHP code here...">{{ old('code') }}</textarea>
@error('code')
<p>{{ $message }}</p>
@enderror
<br>
<button type="submit">Review Code</button>
</form>
</body>
</html>
resources/views/code-review/result.blade.php
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Review Result</title>
</head>
<body>
<h1>Code Review Result</h1>
<p>{{ $feedback['summary'] ?? '' }}</p>
@isset($feedback['score'])
<p><strong>Quality Score: {{ $feedback['score'] }} / 10</strong></p>
@endisset
@if(!empty($feedback['issues']))
<h2>Issues Found</h2>
@foreach($feedback['issues'] as $issue)
<div>
<strong>[{{ strtoupper($issue['severity']) }}]</strong>
@if(!empty($issue['line_hint']))
, {{ $issue['line_hint'] }}
@endif
<p>{{ $issue['message'] }}</p>
</div>
<hr>
@endforeach
@else
<p>No major issues found.</p>
@endif
@if(!empty($feedback['security_flags']))
<h2>Security Flags</h2>
<ul>
@foreach($feedback['security_flags'] as $flag)
<li>{{ $flag }}</li>
@endforeach
</ul>
@endif
@if(!empty($feedback['suggestions']))
<h2>Suggestions</h2>
<ul>
@foreach($feedback['suggestions'] as $s)
<li>{{ $s }}</li>
@endforeach
</ul>
@endif
<p><a href="{{ route('code-review.index') }}">Review another snippet</a></p>
</body>
</html>
Step 5: Queue the API Call So the UI Does Not Block
GPT-4o usually responds in 2 to 4 seconds for short snippets, sometimes longer. That is not great for a synchronous web request, and on some server configs it will hit a timeout before the response comes back. For any production setup, queue it.
php artisan make:job ProcessCodeReview
<?php
namespace App\Jobs;
use App\Services\CodeReviewService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Cache;
class ProcessCodeReview implements ShouldQueue
{
use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
public int $timeout = 60;
public int $tries = 2;
public function __construct(
private string $code,
private string $cacheKey
) {}
public function handle(CodeReviewService $service): void
{
$result = $service->review($this->code);
Cache::put($this->cacheKey, $result, now()->addMinutes(10));
}
}
Update the controller to dispatch the job and add a polling method:
public function review(Request $request)
{
$request->validate(['code' => 'required|string|min:10|max:5000']);
$key = 'review_' . md5($request->input('code') . uniqid());
ProcessCodeReview::dispatch($request->input('code'), $key);
return view('code-review.waiting', ['cacheKey' => $key]);
}
public function poll(string $key)
{
$feedback = Cache::get($key);
if (!$feedback) {
return response()->json(['status' => 'pending']);
}
return response()->json(['status' => 'done', 'feedback' => $feedback]);
}
For local development, set QUEUE_CONNECTION=sync in your .env and jobs will run immediately without needing a worker. In production use redis or database.
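The poll method needs a route to be reachable. A minimal sketch for routes/web.php; the URL and route name here are assumptions, so match them to whatever your waiting view actually fetches:

```php
<?php

use App\Http\Controllers\CodeReviewController;
use Illuminate\Support\Facades\Route;

// Hypothetical polling route; the waiting view can hit this
// every couple of seconds until status flips to "done".
Route::get('/code-review/status/{key}', [CodeReviewController::class, 'poll'])
    ->name('code-review.poll');
```

A few lines of JavaScript in the waiting view calling this endpoint on an interval, then rendering the feedback when status is "done", completes the loop.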
What the Bot Actually Catches: A Real Example
Here is a piece of code I have seen in various forms across different projects. It works. On a test database with ten orders nobody notices anything wrong with it.
<?php
public function getUserOrders($userId)
{
$orders = DB::table('orders')->where('user_id', $userId)->get();
foreach ($orders as $order) {
$items = DB::table('order_items')->where('order_id', $order->id)->get();
echo $order->id . ': ' . count($items) . ' items<br>';
}
}
Put this in front of a customer with 400 orders and watch what happens to your database.
Paste that into the reviewer and here is what comes back:
{
"summary": "Code is functional but has a critical N+1 query problem and bypasses Eloquent entirely. Direct output with echo also breaks MVC separation and makes this code untestable.",
"score": 3,
"issues": [
{
"severity": "critical",
"line_hint": "foreach loop, DB::table order_items",
"message": "N+1 query problem. One database query fires per order inside the loop. With 400 orders that becomes 401 queries. Use Eloquent with eager loading: Order::with('items')->where('user_id', $userId)->get()"
},
{
"severity": "warning",
"line_hint": "DB::table()",
"message": "Raw query builder bypasses Eloquent model logic, accessors, and relationships. Switching to Eloquent models makes the code significantly easier to maintain and test."
},
{
"severity": "info",
"line_hint": "echo statement",
"message": "Direct output inside a controller or service method violates MVC. Return structured data and handle rendering in the view layer."
}
],
"suggestions": [
"Define a hasMany relationship on Order pointing to OrderItem.",
"Replace DB::table calls with Order::with('items')->where('user_id', $userId)->get()",
"Return a collection and let Blade handle the output, do not echo from service methods."
],
"security_flags": [
"$userId passes into a query with no type check or validation. Confirm this is an authenticated, validated integer before it reaches any DB call."
]
}
Score of 3, one critical issue, one warning, one info note, and a security flag. All accurate, all actionable. That took under four seconds and it is exactly the kind of feedback that usually takes a few minutes of a senior developer's time to write out properly.
Where This Fits in an Actual Workflow
I want to be direct about this because I have seen people set up tools like this and then either over-rely on them or drop them after two weeks. The right use here is as a first-pass gate, not a replacement for peer review.
The workflow that actually makes sense: developer opens a PR, the bot triggers via a GitHub webhook, posts its feedback as a comment on the PR, and the human reviewer knows the basics have already been handled. They skip straight to the parts that need real judgment, design decisions, edge cases, whether the approach fits the broader architecture.
That is where this earns its place. Not by replacing review. By removing the repetitive first ten minutes of it.
A few things to know before building this:
The prompt structure matters more than anything else in this whole build. Early versions I tried came back as freeform text, which is hard to work with in a UI. Asking the model to return only JSON with field names you define upfront makes parsing reliable every time. Do not skip that part.
GPT-4o is noticeably better than GPT-3.5 for this kind of task, not just in accuracy but in how it explains problems. "Use eager loading" is less useful than "this fires one query per iteration, here is the exact fix." The difference in API cost is worth it if you are using this on a real codebase.
One more thing. Do not feed entire files in at once, at least not to start. Keep the input focused: a single method, one class, a specific feature. Smaller focused reviews produce better feedback. You can extend the input limit later once you are happy with the output quality.
From here the natural extensions to build are a GitHub webhook integration to trigger reviews on every PR automatically, a review history table to track quality trends over time, custom system prompts per project so the bot reviews against your team's conventions specifically, and Slack notifications when a review completes. None of that is complicated to add on top of what we have built here.
If you found this useful, drop a comment below.
Building a RAG System in Laravel from Scratch
Most RAG tutorials start with "first, sign up for Pinecone." I'm going to skip that entirely. For the majority of Laravel applications, a dedicated vector database is overkill. You already have MySQL. You already have Laravel's queue system. That's enough to build a fully functional retrieval augmented generation pipeline that works well into the tens of thousands of documents.
RAG solves a specific problem. LLMs are trained on general data up to a cutoff date. They know nothing about your application's content, your internal docs, your product knowledge base, or anything else specific to your domain. RAG fixes this by retrieving relevant content from your own data and injecting it into the prompt as context before asking the model to answer. The model stops guessing and starts answering based on what you actually have.
Here is how to build it properly in Laravel.
What We Are Building
A pipeline that does four things:
- Accepts documents (articles, pages, PDFs, anything text-based) and stores them with their embeddings
- When a user asks a question, converts that question into an embedding
- Finds the most semantically similar documents using cosine similarity against your stored embeddings
- Feeds those documents as context to GPT and returns a grounded answer
No external services beyond OpenAI. No Docker containers for a vector DB. Just Laravel, MySQL, and two API calls per query.
Requirements
- Laravel 10 or 11
- PHP 8.1+
- MySQL 8.0+
- OpenAI API key
- Guzzle (ships with Laravel)
Step 1: The Documents Table
php artisan make:migration create_documents_table
public function up(): void
{
Schema::create('documents', function (Blueprint $table) {
$table->id();
$table->string('title');
$table->longText('content');
$table->longText('embedding')->nullable(); // JSON float array
$table->string('source')->nullable(); // URL, filename, etc.
$table->timestamps();
});
}
php artisan migrate
The embedding column stores a JSON-encoded array of 1536 floats (for text-embedding-3-small). Yes, it's a text column, not a native vector type. MySQL 9 adds vector support but for now JSON in a longText column works fine for most use cases.
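Since the stored vector is just JSON text, the round trip is plain json_encode and json_decode. A quick standalone check, no Laravel involved, showing the same thing the longText column plus array cast does:

```php
<?php
// Simulate storing and reloading an embedding as JSON text.
// A real vector has 1536 floats; three are enough to show the round trip.
$vector = [0.0123, -0.4567, 0.8910];

$stored = json_encode($vector);        // what goes into the column
$loaded = json_decode($stored, true);  // what the cast hands back

// With PHP's default serialize_precision, floats survive exactly.
var_dump($loaded === $vector); // bool(true)
```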
Step 2: The Document Model
php artisan make:model Document
namespace App\Models;
use Illuminate\Database\Eloquent\Model;
class Document extends Model
{
protected $fillable = ['title', 'content', 'embedding', 'source'];
protected $casts = [
'embedding' => 'array',
];
}
The embedding cast handles the JSON encoding and decoding automatically. When you set $document->embedding = $vectorArray, Laravel serializes it. When you read it back, you get a PHP array of floats.
Step 3: The Embedding Service
Keep all OpenAI communication in one place. This makes it easy to swap providers later.
Laravel has no make:service Artisan command, so create app/Services/EmbeddingService.php by hand:
namespace App\Services;
use Illuminate\Support\Facades\Http;
class EmbeddingService
{
private string $apiKey;
private string $model = 'text-embedding-3-small';
public function __construct()
{
$this->apiKey = config('services.openai.key');
}
public function embed(string $text): array
{
// Trim to ~8000 tokens to stay within model limits
$text = mb_substr(strip_tags($text), 0, 32000);
$response = Http::withToken($this->apiKey)
->post('https://api.openai.com/v1/embeddings', [
'model' => $this->model,
'input' => $text,
]);
if ($response->failed()) {
throw new \RuntimeException('OpenAI embedding request failed: ' . $response->body());
}
return $response->json('data.0.embedding');
}
public function cosineSimilarity(array $a, array $b): float
{
$dot = 0.0;
$magA = 0.0;
$magB = 0.0;
foreach ($a as $i => $val) {
$dot += $val * $b[$i];
$magA += $val ** 2;
$magB += $b[$i] ** 2;
}
$denominator = sqrt($magA) * sqrt($magB);
return $denominator > 0 ? $dot / $denominator : 0.0;
}
}
Register the key in config/services.php:
'openai' => [
'key' => env('OPENAI_API_KEY'),
],
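It is worth sanity-checking the similarity math once outside Laravel. This standalone copy of the cosineSimilarity method should return 1 for identical vectors and 0 for orthogonal ones:

```php
<?php
// Standalone copy of EmbeddingService::cosineSimilarity for a quick check.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $magA = 0.0;
    $magB = 0.0;

    foreach ($a as $i => $val) {
        $dot  += $val * $b[$i];
        $magA += $val ** 2;
        $magB += $b[$i] ** 2;
    }

    $denominator = sqrt($magA) * sqrt($magB);

    return $denominator > 0 ? $dot / $denominator : 0.0;
}

echo cosineSimilarity([1.0, 0.0], [1.0, 0.0]); // 1
echo cosineSimilarity([1.0, 0.0], [0.0, 1.0]); // 0
```

Real embedding pairs land somewhere between those extremes, which is exactly what the retrieval threshold in the next step filters on.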
Step 4: Indexing Documents
A command to process documents and store their embeddings. You run this once on existing content, then hook it into your document creation flow going forward.
php artisan make:command IndexDocuments
namespace App\Console\Commands;
use App\Models\Document;
use App\Services\EmbeddingService;
use Illuminate\Console\Command;
class IndexDocuments extends Command
{
protected $signature = 'rag:index {--fresh : Re-index all documents}';
protected $description = 'Generate and store embeddings for all documents';
public function handle(EmbeddingService $embedder): int
{
$query = Document::query();
if (!$this->option('fresh')) {
$query->whereNull('embedding');
}
$documents = $query->get();
$bar = $this->output->createProgressBar($documents->count());
foreach ($documents as $doc) {
try {
$doc->embedding = $embedder->embed($doc->title . "\n\n" . $doc->content);
$doc->save();
$bar->advance();
} catch (\Exception $e) {
$this->error("Failed on document {$doc->id}: " . $e->getMessage());
}
// Respect OpenAI rate limits
usleep(200000); // 200ms between requests
}
$bar->finish();
$this->newLine();
$this->info('Indexing complete.');
return self::SUCCESS;
}
}
Run it:
php artisan rag:index
Notice I'm concatenating title and content before embedding. The title carries a lot of semantic weight and including it improves retrieval accuracy noticeably.
Step 5: The Retrieval Logic
This is the core of RAG. Given a query, find the most relevant documents.
namespace App\Services;
use App\Models\Document;
class RetrievalService
{
public function __construct(private EmbeddingService $embedder) {}
public function retrieve(string $query, int $topK = 5, float $threshold = 0.75): array
{
$queryVector = $this->embedder->embed($query);
$documents = Document::whereNotNull('embedding')->get();
$scored = $documents->map(function (Document $doc) use ($queryVector) {
return [
'document' => $doc,
'score' => $this->embedder->cosineSimilarity($queryVector, $doc->embedding),
];
})
->filter(fn($item) => $item['score'] >= $threshold)
->sortByDesc('score')
->take($topK)
->values();
return $scored->toArray();
}
}
The $threshold of 0.75 filters out loosely related documents. You may need to tune this for your content, lower it if you're getting no results, raise it if you're getting irrelevant ones. Anywhere between 0.70 and 0.85 is usually sensible.
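The filter-sort-take chain is easy to verify in isolation with plain arrays instead of models. The scores and titles below are made up purely to show how threshold and topK interact:

```php
<?php
// Illustrative scores only; in the real service each entry
// carries a Document model instead of a title string.
$scored = [
    ['title' => 'Queues',     'score' => 0.91],
    ['title' => 'Caching',    'score' => 0.78],
    ['title' => 'Unrelated',  'score' => 0.42],
    ['title' => 'Middleware', 'score' => 0.80],
];

$threshold = 0.75;
$topK = 2;

// Same logic as RetrievalService: drop weak matches, best first, cap at topK.
$kept = array_values(array_filter($scored, fn($s) => $s['score'] >= $threshold));
usort($kept, fn($a, $b) => $b['score'] <=> $a['score']);
$kept = array_slice($kept, 0, $topK);

print_r(array_column($kept, 'title')); // Queues, Middleware
```

Note that "Caching" clears the threshold but still gets cut by topK; both knobs matter.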
Step 6: The RAG Query Service
This ties retrieval and generation together.
namespace App\Services;
use Illuminate\Support\Facades\Http;
class RagService
{
private string $apiKey;
public function __construct(private RetrievalService $retriever)
{
$this->apiKey = config('services.openai.key');
}
public function ask(string $question): array
{
// Step 1: Retrieve relevant documents
$results = $this->retriever->retrieve($question, topK: 4);
if (empty($results)) {
return [
'answer' => 'I could not find relevant information to answer this question.',
'sources' => [],
];
}
// Step 2: Build context from retrieved docs
$context = collect($results)
->map(fn($r) => "### {$r['document']->title}\n{$r['document']->content}")
->join("\n\n---\n\n");
// Step 3: Send to GPT with context
$response = Http::withToken($this->apiKey)
->post('https://api.openai.com/v1/chat/completions', [
'model' => 'gpt-4o-mini',
'temperature' => 0.2,
'messages' => [
[
'role' => 'system',
'content' => "You are a helpful assistant. Answer questions using only the context provided below. If the answer is not in the context, say so clearly. Do not make up information.\n\nContext:\n{$context}"
],
[
'role' => 'user',
'content' => $question,
]
],
]);
return [
'answer' => $response->json('choices.0.message.content'),
'sources' => collect($results)->map(fn($r) => [
'title' => $r['document']->title,
'source' => $r['document']->source,
'score' => round($r['score'], 3),
])->toArray(),
];
}
}
Two things worth noting here. Temperature is set to 0.2, not the default 0.7. You want deterministic, factual answers when doing RAG, not creative ones. And the system prompt explicitly tells the model to stay within the provided context and admit when it doesn't know. Without that instruction, GPT will hallucinate rather than say "I don't have that information."
Step 7: The Controller
php artisan make:controller RagController
namespace App\Http\Controllers;
use App\Services\RagService;
use Illuminate\Http\Request;
class RagController extends Controller
{
public function __construct(private RagService $rag) {}
public function ask(Request $request)
{
$request->validate(['question' => 'required|string|max:500']);
$result = $this->rag->ask($request->input('question'));
return response()->json($result);
}
}
Register the route in routes/api.php
Route::post('/ask', [RagController::class, 'ask']);
Step 8: Test It
Seed a couple of documents first:
Document::create([
'title' => 'Laravel Queue Configuration',
'content' => 'Laravel queues allow you to defer time-consuming tasks...',
'source' => 'https://laravel.com/docs/queues',
]);
Run the indexer:
php artisan rag:index
Then hit the endpoint:
curl -X POST http://your-app.test/api/ask \
-H "Content-Type: application/json" \
-d '{"question": "How do I configure Laravel queues?"}'
Response:
{
"answer": "Laravel queues are configured via the config/queue.php file...",
"sources": [
{
"title": "Laravel Queue Configuration",
"source": "https://laravel.com/docs/queues",
"score": 0.891
}
]
}
Where This Falls Down at Scale
This setup works well up to roughly 50,000 documents. Beyond that, loading all embeddings into memory for comparison becomes a problem. At that point your options are:
- Add a MySQL generated column + raw SQL dot product approximation to filter candidates before full cosine comparison
- Move to pgvector if you can switch to PostgreSQL, which handles this natively and efficiently
- Then and only then consider Pinecone or Weaviate
Most Laravel projects never reach that threshold. Start simple, measure, then scale the storage layer when you actually need to.
What to Build on Top of This
Once the core pipeline is working, the useful next steps are: caching query embeddings so repeated questions don't hit the API twice, chunking long documents into 500-token segments before embedding so retrieval is more granular, adding a feedback mechanism so users can flag bad answers and you can track retrieval quality over time, and per-user conversation history so the model has context across multiple turns.
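The chunking step mentioned above can be sketched with a simple word-based splitter. Token counts are approximated as words here (roughly 375 words per 500 tokens); a real tokenizer would be more precise, and the chunk size is a tuning knob, not a hard rule:

```php
<?php
// Rough word-based chunker for long documents before embedding.
// The words-per-chunk default approximates ~500 tokens; adjust freely.
function chunkText(string $text, int $wordsPerChunk = 375): array
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];

    foreach (array_chunk($words, $wordsPerChunk) as $group) {
        $chunks[] = implode(' ', $group);
    }

    return $chunks;
}

$chunks = chunkText(str_repeat('word ', 1000), 375);
echo count($chunks); // 3
```

Each chunk then gets its own row and its own embedding, so retrieval pulls back the relevant passage rather than an entire document.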
That is a production-ready RAG foundation in Laravel with no external vector database. The whole thing is maybe 200 lines of actual PHP spread across four service classes and one command.