Drupal and LangChain: Building Multi-Step AI Pipelines for Enterprise CMS

PHP CMS Frameworks March 28, 2026

Enterprise content teams have a problem that does not get talked about enough. It is not producing content; most large organisations have plenty of that. The problem is what happens to content before it gets published: review queues that stretch for days, moderation bottlenecks where one editor is the single point of failure, policy checks that get skipped under deadline pressure, and taxonomy tagging that is inconsistent across a team of twenty people all making their own judgment calls.

I worked with an enterprise Drupal site last year that had over 3,000 pieces of content sitting in a moderation queue at any given time. Four editors, no automation, no triage. Good content was getting buried under low-quality submissions and the editors were spending most of their time on mechanical checks rather than actual editorial judgment.

What they needed was a multi-step AI pipeline sitting between content submission and human review. Something that could screen content automatically, flag policy violations, suggest taxonomy terms, score quality, and route content to the right reviewer based on what it found. That is what this post is about.

We will cover how LangChain fits into a Drupal architecture, the honest tradeoffs between the Python, JavaScript, and PHP approaches, and then go deep on building the full pipeline in PHP inside a custom Drupal module.

Why LangChain matters here

LangChain is a framework for building applications that chain multiple AI calls together, each step taking the output of the previous one as its input. Instead of sending one big prompt to an LLM and hoping for a good result, you break the problem into focused steps. One step checks for policy violations. The next scores content quality. The next suggests taxonomy terms. The next decides where to route the content for review. Each step does one thing well.

The reason this matters for an enterprise CMS is that single-prompt AI approaches get inconsistent quickly when content is varied and complex. A one-shot prompt that tries to check policy compliance, assess quality, tag taxonomy, and make a routing decision all at once tends to produce mediocre results across all four. Breaking it into a chain where each step is focused produces meaningfully better output, and more importantly, it makes each step auditable. You can see exactly where the pipeline flagged something and why.
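The chaining idea itself needs no framework at all. A minimal sketch in plain PHP, where simple callables stand in for the LLM-backed steps built later in this post:

```php
<?php

// Each step is a callable that receives the accumulated context and
// returns new keys to merge in. The hardcoded return values stand in
// for what the LLM calls produce in the real pipeline.
$steps = [
    fn(array $ctx): array => ['policy' => 'pass'],
    fn(array $ctx): array => ['quality' => $ctx['policy'] === 'pass' ? 8 : 2],
    fn(array $ctx): array => ['route' => $ctx['quality'] >= 6 ? 'editor_review' : 'reject'],
];

$context = [];
foreach ($steps as $step) {
    $context = array_merge($context, $step($context));
}
// $context: ['policy' => 'pass', 'quality' => 8, 'route' => 'editor_review']
```

The full implementation below is this same loop with real classes, prompts, and error handling wrapped around it.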

LangChain was originally built in Python, which is where it is most mature. A JavaScript version called LangChain.js followed. There is no official PHP version, which creates an interesting architecture question for Drupal teams.

Three ways to use LangChain with Drupal

Before picking an approach, it is worth understanding what each option actually involves in practice. I have seen teams choose the wrong one based on familiarity rather than fit, and it costs them later.

Option 1: Python LangChain as a Separate Microservice

You build a small Python FastAPI or Flask service that runs LangChain pipelines. Drupal calls this service via HTTP when content needs processing and receives structured JSON back. The pipeline logic lives entirely in Python; Drupal just sends content and handles the response.

This is the most powerful option because you get the full LangChain Python ecosystem, including document loaders, vector stores, agents, and memory. The tradeoff is operational complexity. You are now running and maintaining two separate services, Python and PHP, and your team needs to be comfortable in both.

Good fit for: teams with Python expertise already on staff, complex pipelines that need LangChain agents or vector retrieval, and organisations with proper infrastructure for running multiple services.
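For context, the Drupal side of this option is just an HTTP call. A hedged sketch using Drupal's built-in Guzzle client, where the sidecar URL, endpoint name, and payload shape are all illustrative, not part of any real service:

```php
<?php

// Drupal side of the microservice approach: POST the content to the
// Python sidecar and decode the structured result it returns.
// The endpoint and JSON shape here are assumptions for illustration.
$response = \Drupal::httpClient()->post('http://langchain-sidecar:8000/moderate', [
    'json'    => [
        'title' => $node->getTitle(),
        'body'  => $node->get('body')->value,
    ],
    'timeout' => 30,
]);

$result = json_decode((string) $response->getBody(), true);
// $result would carry the same policy / quality / taxonomy / routing
// keys that the all-PHP pipeline below produces.
```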

Option 2: LangChain.js via a Node.js Microservice

Similar architecture to Option 1, but the sidecar service runs Node.js with LangChain.js instead of Python. Drupal calls it the same way via HTTP. LangChain.js has closed much of the gap with the Python version in recent releases and covers most common pipeline patterns.

The advantage over Python is that JavaScript is more widely known across web development teams. The disadvantage is that LangChain.js still lags behind Python on some advanced features, and you still have the same two-service operational overhead.

Good fit for: teams with frontend JavaScript experience who want to avoid Python, simpler pipeline patterns, and organisations already running Node.js services.

Option 3: PHP Pipeline Mimicking LangChain Patterns (What We Are Building)

You implement the same chaining concepts directly in PHP using the OpenAI PHP client, no LangChain library involved. Each step in the pipeline is a focused PHP class. They chain together through a Pipeline orchestrator. The output of each step feeds into the next.

This approach keeps everything inside Drupal, no additional services to deploy or maintain, no cross-language boundaries, no HTTP overhead between steps. The tradeoff is that you implement the chaining logic yourself rather than using a ready-made framework.

Honestly, for most enterprise Drupal use cases this is the right call. The LangChain library provides a lot of features you will not need for a content moderation pipeline. What you need is the chaining pattern, structured prompts, and reliable JSON outputs, and all of that is straightforward to implement in PHP.

Good fit for: Drupal teams without Python or Node.js expertise, pipelines that do not require vector retrieval or complex agents, and organisations that want the full pipeline inside their existing Drupal infrastructure.

That is the option we are going deep on. Here is what we are building.

The Pipeline We Are Building

Four steps, each focused on one job:

Content submitted to Drupal
        ↓
Step 1: Policy Compliance Check
        Does the content violate any publishing policies?
        Output: pass / flag / reject + reason
        ↓
Step 2: Quality Assessment
        Is the content well-written, complete, and suitable for publishing?
        Output: quality score 1-10 + specific feedback
        ↓
Step 3: Taxonomy Suggestion
        What terms should be applied to this content?
        Output: suggested taxonomy terms with confidence scores
        ↓
Step 4: Routing Decision
        Based on the above, where should this content go?
        Output: auto-approve / send to editor / send to senior editor / reject
        ↓
Content routed to correct moderation state in Drupal

Each step receives the original content plus the outputs of all previous steps. By the time Step 4 runs, it has the policy check result, the quality score, and the taxonomy suggestions available to inform its routing decision. That context is what makes the routing intelligent rather than mechanical.

Setting up the custom Drupal module

We will build this as a custom Drupal module. Create the module structure:

modules/custom/ai_content_pipeline/
    ai_content_pipeline.info.yml
    ai_content_pipeline.services.yml
    ai_content_pipeline.module
    src/
        Pipeline/
            ContentModerationPipeline.php
        Steps/
            PolicyComplianceStep.php
            QualityAssessmentStep.php
            TaxonomySuggestionStep.php
            RoutingDecisionStep.php
        Contracts/
            PipelineStepInterface.php
        Service/
            OpenAIService.php

The ai_content_pipeline.info.yml:

name: 'AI Content Pipeline'
type: module
description: 'Multi-step AI pipeline for intelligent content moderation'
core_version_requirement: ^10 || ^11
package: Custom
dependencies:
  - drupal:node
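The file tree above also lists ai_content_pipeline.services.yml. The hook implementation later in this post instantiates the classes directly to keep the example short, so the file can stay minimal, but if you prefer container injection, a sketch of the registration might look like this (the service names are my own convention):

```yaml
services:
  ai_content_pipeline.openai:
    class: Drupal\ai_content_pipeline\Service\OpenAIService
  ai_content_pipeline.pipeline:
    class: Drupal\ai_content_pipeline\Pipeline\ContentModerationPipeline
    arguments: ['@logger.factory']
```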

Install the OpenAI PHP client via Composer in your Drupal project root:

composer require openai-php/client
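The OpenAIService below reads the API key from the ai_content_pipeline.settings config object, and this post does not build a settings form for it. The simplest way to supply the key, which also keeps the secret out of exported configuration, is an override in settings.php; a sketch:

```php
<?php
// In settings.php (or settings.local.php): pull the API key from the
// environment so it never lands in exported config or version control.
$config['ai_content_pipeline.settings']['openai_api_key'] = getenv('OPENAI_API_KEY');
```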

Step 1: The Pipeline step interface

Every step in the pipeline implements this interface. It enforces a consistent contract across all steps, which makes the pipeline orchestrator simple to write and easy to extend with new steps later.

Create src/Contracts/PipelineStepInterface.php:

<?php

namespace Drupal\ai_content_pipeline\Contracts;

interface PipelineStepInterface
{
    /**
     * Execute this pipeline step.
     *
     * @param string $content  The original content being processed.
     * @param array  $context  Results from all previous steps.
     *
     * @return array  Results from this step to pass forward.
     */
    public function execute(string $content, array $context): array;

    /**
     * Human-readable name for this step, used in logging.
     */
    public function name(): string;
}

Step 2: The OpenAI Service

Create src/Service/OpenAIService.php:

<?php

namespace Drupal\ai_content_pipeline\Service;

use OpenAI;

class OpenAIService
{
    private $client;

    public function __construct()
    {
        // The static \Drupal accessor keeps the example short; in
        // production, inject config.factory via the service container.
        $api_key = \Drupal::config('ai_content_pipeline.settings')->get('openai_api_key');
        $this->client = OpenAI::client($api_key);
    }

    public function chat(string $systemPrompt, string $userMessage): string
    {
        $response = $this->client->chat()->create([
            'model'       => 'gpt-4o',
            'temperature' => 0.2,
            'messages'    => [
                ['role' => 'system', 'content' => $systemPrompt],
                ['role' => 'user',   'content' => $userMessage],
            ],
        ]);

        return $response->choices[0]->message->content;
    }

    public function parseJson(string $raw): array
    {
        // Strip a leading ```json or bare ``` fence, and a trailing ``` fence.
        $clean = preg_replace('/^```(?:json)?\s*/i', '', trim($raw));
        $clean = preg_replace('/```\s*$/', '', trim($clean));
        $data  = json_decode(trim($clean), true);

        if (json_last_error() !== JSON_ERROR_NONE) {
            return ['error' => 'JSON parse failed', 'raw' => $raw];
        }

        return $data;
    }
}

The temperature is set to 0.2, lower than you might expect. For pipeline steps that are making structured decisions, you want as little creative variance as possible. The model should be analytical, not inventive.
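The fence-stripping in parseJson matters because models frequently wrap their JSON in a markdown code block even when told not to. A standalone version of that cleanup, runnable outside Drupal and written to tolerate both ```json and bare ``` fences:

```php
<?php

// Replicates the parseJson cleanup: strip a leading code fence and a
// trailing fence, then decode the remaining JSON.
function parse_llm_json(string $raw): array
{
    $clean = preg_replace('/^```(?:json)?\s*/i', '', trim($raw));
    $clean = preg_replace('/```\s*$/', '', trim($clean));
    $data  = json_decode(trim($clean), true);

    return is_array($data) ? $data : ['error' => 'JSON parse failed', 'raw' => $raw];
}

$raw    = "```json\n{\"status\": \"pass\", \"violations\": []}\n```";
$parsed = parse_llm_json($raw);
// $parsed: ['status' => 'pass', 'violations' => []]
```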

Step 3: The Policy Compliance step

Create src/Steps/PolicyComplianceStep.php:

<?php

namespace Drupal\ai_content_pipeline\Steps;

use Drupal\ai_content_pipeline\Contracts\PipelineStepInterface;
use Drupal\ai_content_pipeline\Service\OpenAIService;

class PolicyComplianceStep implements PipelineStepInterface
{
    public function __construct(private OpenAIService $ai) {}

    public function name(): string
    {
        return 'Policy Compliance Check';
    }

    public function execute(string $content, array $context): array
    {
        $system = 'You are a content policy compliance reviewer for an enterprise CMS.
                   Review content against publishing policies and return JSON only.
                   No markdown, no explanation outside the JSON object.';

        $prompt = <<<PROMPT
Review the following content against these publishing policies:

1. No hate speech, discrimination, or offensive language targeting any group.
2. No unverified factual claims presented as established fact.
3. No promotional or advertorial content disguised as editorial.
4. No personally identifiable information about private individuals.
5. No content that could create legal liability (defamation, copyright issues).

Return a JSON object with:
- "status": one of "pass", "flag", or "reject"
- "violations": array of specific violations found, empty array if none
- "reason": brief explanation of the status decision

Content to review:
{$content}
PROMPT;

        $raw    = $this->ai->chat($system, $prompt);
        $result = $this->ai->parseJson($raw);

        return [
            'policy' => $result,
        ];
    }
}

Step 4: The Quality Assessment step

Create src/Steps/QualityAssessmentStep.php:

<?php

namespace Drupal\ai_content_pipeline\Steps;

use Drupal\ai_content_pipeline\Contracts\PipelineStepInterface;
use Drupal\ai_content_pipeline\Service\OpenAIService;

class QualityAssessmentStep implements PipelineStepInterface
{
    public function __construct(private OpenAIService $ai) {}

    public function name(): string
    {
        return 'Quality Assessment';
    }

    public function execute(string $content, array $context): array
    {
        $policyStatus = $context['policy']['status'] ?? 'unknown';

        $system = 'You are a senior editorial quality reviewer for an enterprise CMS.
                   Assess content quality objectively and return JSON only.';

        $prompt = <<<PROMPT
Assess the quality of the following content. Consider:

- Clarity and readability for a general professional audience
- Completeness: does it cover the topic adequately?
- Structure: is it well organised with a logical flow?
- Accuracy indicators: does it make claims without apparent support?
- Tone: is it appropriate for professional publication?

Note: Policy compliance status from previous check is "{$policyStatus}".

Return a JSON object with:
- "score": integer from 1 to 10
- "strengths": array of what the content does well
- "weaknesses": array of specific quality issues found
- "publishable": boolean, true if quality is sufficient for publication

Content:
{$content}
PROMPT;

        $raw    = $this->ai->chat($system, $prompt);
        $result = $this->ai->parseJson($raw);

        return [
            'quality' => $result,
        ];
    }
}

Step 5: The Taxonomy Suggestion step

Create src/Steps/TaxonomySuggestionStep.php:

<?php

namespace Drupal\ai_content_pipeline\Steps;

use Drupal\ai_content_pipeline\Contracts\PipelineStepInterface;
use Drupal\ai_content_pipeline\Service\OpenAIService;

class TaxonomySuggestionStep implements PipelineStepInterface
{
    private array $availableTerms = [
        'topics'     => ['Technology', 'Business', 'Health', 'Finance', 'Policy', 'Research', 'Opinion'],
        'audience'   => ['General', 'Technical', 'Executive', 'Academic'],
        'content_type' => ['Analysis', 'News', 'Tutorial', 'Case Study', 'Interview', 'Report'],
    ];

    public function __construct(private OpenAIService $ai) {}

    public function name(): string
    {
        return 'Taxonomy Suggestion';
    }

    public function execute(string $content, array $context): array
    {
        $termsJson = json_encode($this->availableTerms);

        $system = 'You are a content taxonomy specialist for an enterprise CMS.
                   Suggest appropriate taxonomy terms and return JSON only.';

        $prompt = <<<PROMPT
Suggest taxonomy terms for the following content.
Only suggest terms from the available taxonomy list provided.

Available taxonomy terms:
{$termsJson}

Return a JSON object with:
- "suggestions": object with vocabulary names as keys, each containing:
    - "terms": array of suggested term names from the available list
    - "confidence": "high", "medium", or "low"
- "primary_topic": the single most relevant topic term

Content:
{$content}
PROMPT;

        $raw    = $this->ai->chat($system, $prompt);
        $result = $this->ai->parseJson($raw);

        return [
            'taxonomy' => $result,
        ];
    }
}

In a real deployment, replace the hardcoded $availableTerms array with a dynamic lookup from your Drupal taxonomy vocabularies. You can load terms using Drupal's entity query system and pass the full list to the prompt.
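A sketch of that replacement, assuming your vocabularies use the machine names topics, audience, and content_type (adjust these to your site). TermStorage::loadTree() returns lightweight term objects with a name property, which is all the prompt needs:

```php
<?php

// Inside TaxonomySuggestionStep: build the term list from the live
// vocabularies instead of a hardcoded array. Requires a Drupal runtime.
private function loadAvailableTerms(): array
{
    $storage = \Drupal::entityTypeManager()->getStorage('taxonomy_term');

    $terms = [];
    foreach (['topics', 'audience', 'content_type'] as $vid) {
        foreach ($storage->loadTree($vid) as $term) {
            $terms[$vid][] = $term->name;
        }
    }

    return $terms;
}
```

For large vocabularies, consider sending only the most commonly used terms; a prompt carrying thousands of term names costs tokens and dilutes the model's attention.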

Step 6: The Routing Decision step

This is where the pipeline pays off. By the time this step runs, it has the policy result, quality score, and taxonomy confidence from the previous three steps. The routing decision is genuinely informed rather than based on a single signal.

Create src/Steps/RoutingDecisionStep.php:

<?php

namespace Drupal\ai_content_pipeline\Steps;

use Drupal\ai_content_pipeline\Contracts\PipelineStepInterface;
use Drupal\ai_content_pipeline\Service\OpenAIService;

class RoutingDecisionStep implements PipelineStepInterface
{
    public function __construct(private OpenAIService $ai) {}

    public function name(): string
    {
        return 'Routing Decision';
    }

    public function execute(string $content, array $context): array
    {
        $policyStatus = $context['policy']['status'] ?? 'unknown';
        $policyReason = $context['policy']['reason'] ?? '';
        $qualityScore = $context['quality']['score'] ?? 0;
        // Convert to a string explicitly: interpolating a raw false into
        // the prompt below would render as an empty string.
        $publishable  = ($context['quality']['publishable'] ?? false) ? 'true' : 'false';
        $weaknesses   = json_encode($context['quality']['weaknesses'] ?? []);
        $violations   = json_encode($context['policy']['violations']  ?? []);

        $system = 'You are a content workflow manager for an enterprise CMS.
                   Make routing decisions based on pipeline analysis results.
                   Return JSON only.';

        $prompt = <<<PROMPT
Based on the pipeline analysis below, decide how this content should be routed.

Pipeline results:
- Policy status: {$policyStatus}
- Policy reason: {$policyReason}
- Policy violations: {$violations}
- Quality score: {$qualityScore} / 10
- Publishable assessment: {$publishable}
- Quality weaknesses: {$weaknesses}

Routing options:
- "auto_approve": policy passed, quality score 9-10, no issues
- "editor_review": policy passed, quality score 6-8, minor issues only
- "senior_editor_review": policy flagged or quality score 4-5, needs experienced judgment
- "reject": policy status is reject, or quality score below 4

Return a JSON object with:
- "decision": one of the four routing options above
- "reason": clear explanation of why this routing was chosen
- "reviewer_notes": specific things the human reviewer should check, as an array
- "priority": "high", "normal", or "low"
PROMPT;

        $raw    = $this->ai->chat($system, $prompt);
        $result = $this->ai->parseJson($raw);

        return [
            'routing' => $result,
        ];
    }
}

Step 7: The Pipeline Orchestrator

This is the class that wires everything together. It runs each step in sequence, collects the context, handles failures gracefully, and returns the full pipeline result.

Create src/Pipeline/ContentModerationPipeline.php:

<?php

namespace Drupal\ai_content_pipeline\Pipeline;

use Drupal\ai_content_pipeline\Contracts\PipelineStepInterface;
use Drupal\Core\Logger\LoggerChannelFactoryInterface;

class ContentModerationPipeline
{
    private array $steps = [];
    private $logger;

    public function __construct(LoggerChannelFactoryInterface $loggerFactory)
    {
        $this->logger = $loggerFactory->get('ai_content_pipeline');
    }

    public function addStep(PipelineStepInterface $step): self
    {
        $this->steps[] = $step;
        return $this;
    }

    public function run(string $content): array
    {
        $context   = [];
        $stepLog   = [];
        $startTime = microtime(true);

        foreach ($this->steps as $step) {
            $stepName  = $step->name();
            $stepStart = microtime(true);

            try {
                $result  = $step->execute($content, $context);
                $context = array_merge($context, $result);

                $stepLog[] = [
                    'step'     => $stepName,
                    'status'   => 'completed',
                    'duration' => round(microtime(true) - $stepStart, 2) . 's',
                ];

                $this->logger->info('Pipeline step completed: @step', ['@step' => $stepName]);

            } catch (\Exception $e) {
                $this->logger->error('Pipeline step failed: @step, Error: @error', [
                    '@step'  => $stepName,
                    '@error' => $e->getMessage(),
                ]);

                $stepLog[] = [
                    'step'   => $stepName,
                    'status' => 'failed',
                    'error'  => $e->getMessage(),
                ];

                // On failure, route to senior editor for manual review
                $context['routing'] = [
                    'decision'       => 'senior_editor_review',
                    'reason'         => "Pipeline step '{$stepName}' failed. Manual review required.",
                    'reviewer_notes' => ['Pipeline encountered an error, please review manually.'],
                    'priority'       => 'high',
                ];

                break;
            }
        }

        return [
            'context'       => $context,
            'steps'         => $stepLog,
            'total_duration' => round(microtime(true) - $startTime, 2) . 's',
        ];
    }
}

Step 8: Wiring it into Drupal's Moderation workflow

Now we connect the pipeline to Drupal's content workflow. This hook fires when a node is presaved, runs the pipeline, and applies the routing decision as a moderation state.

In ai_content_pipeline.module:

<?php

use Drupal\node\NodeInterface;
use Drupal\ai_content_pipeline\Service\OpenAIService;
use Drupal\ai_content_pipeline\Pipeline\ContentModerationPipeline;
use Drupal\ai_content_pipeline\Steps\PolicyComplianceStep;
use Drupal\ai_content_pipeline\Steps\QualityAssessmentStep;
use Drupal\ai_content_pipeline\Steps\TaxonomySuggestionStep;
use Drupal\ai_content_pipeline\Steps\RoutingDecisionStep;

function ai_content_pipeline_node_presave(NodeInterface $node): void
{
    // Only run the pipeline on newly created nodes.
    if (!$node->isNew()) {
        return;
    }

    if (!$node->hasField('body')) {
        return;
    }

    // Guard against a null body value; trim(null) is deprecated in PHP 8.1+.
    $content = $node->getTitle() . "\n\n" . ($node->get('body')->value ?? '');

    if (trim($content) === '') {
        return;
    }

    $ai       = new OpenAIService();
    $pipeline = new ContentModerationPipeline(\Drupal::service('logger.factory'));

    $pipeline
        ->addStep(new PolicyComplianceStep($ai))
        ->addStep(new QualityAssessmentStep($ai))
        ->addStep(new TaxonomySuggestionStep($ai))
        ->addStep(new RoutingDecisionStep($ai));

    $result   = $pipeline->run($content);
    $routing  = $result['context']['routing'] ?? null;

    if (!$routing) {
        return;
    }

    // Map routing decision to Drupal moderation states
    $stateMap = [
        'auto_approve'         => 'published',
        'editor_review'        => 'needs_review',
        'senior_editor_review' => 'needs_review',
        'reject'               => 'rejected',
    ];

    $decision = $routing['decision'] ?? 'editor_review';
    $state    = $stateMap[$decision] ?? 'needs_review';

    if ($node->hasField('moderation_state')) {
        $node->set('moderation_state', $state);
    }

    // Store pipeline results in a field for reviewer reference
    if ($node->hasField('field_ai_review_notes')) {
        $notes  = "Routing: {$decision}\n";
        $notes .= "Reason: {$routing['reason']}\n\n";
        $notes .= "Reviewer notes:\n" . implode("\n", $routing['reviewer_notes'] ?? []);
        $node->set('field_ai_review_notes', $notes);
    }
}

What the Pipeline output looks like in practice

Here is a realistic example of the full pipeline result for a piece of content that passed policy checks but had quality issues. This is what your editors would see in the review notes field.

{
  "context": {
    "policy": {
      "status": "pass",
      "violations": [],
      "reason": "Content meets all publishing policy requirements."
    },
    "quality": {
      "score": 6,
      "publishable": true,
      "strengths": ["Clear headline", "Good factual grounding"],
      "weaknesses": ["Conclusion is abrupt and underdeveloped", "Second section lacks supporting evidence"]
    },
    "taxonomy": {
      "suggestions": {
        "topics": { "terms": ["Technology", "Business"], "confidence": "high" },
        "audience": { "terms": ["Executive"], "confidence": "medium" },
        "content_type": { "terms": ["Analysis"], "confidence": "high" }
      },
      "primary_topic": "Technology"
    },
    "routing": {
      "decision": "editor_review",
      "reason": "Policy passed but quality score of 6 indicates minor issues that need editorial attention before publication.",
      "reviewer_notes": [
        "Strengthen the conclusion, currently ends abruptly",
        "Add supporting evidence or sources to the second section",
        "Taxonomy auto-applied, verify the Executive audience tag is correct"
      ],
      "priority": "normal"
    }
  },
  "steps": [
    { "step": "Policy Compliance Check", "status": "completed", "duration": "1.8s" },
    { "step": "Quality Assessment",      "status": "completed", "duration": "2.1s" },
    { "step": "Taxonomy Suggestion",     "status": "completed", "duration": "1.6s" },
    { "step": "Routing Decision",        "status": "completed", "duration": "1.4s" }
  ],
  "total_duration": "6.9s"
}

The reviewer opens the content, sees it has been routed to them with a quality score of 6, reads the specific reviewer notes, and knows exactly what to look at. No need to read the whole piece from scratch looking for problems. That is the practical value here.

Things worth knowing before you deploy

Seven seconds of pipeline processing on every content submission is not acceptable for a synchronous save operation. Move the pipeline into a queued job that fires after the initial save, using Drupal's Queue API or a custom queue worker. Store a "pending AI review" state that content sits in while the pipeline runs, then update the moderation state when the job completes.
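A hedged sketch of that queue worker, assuming the presave hook enqueues node IDs instead of running the pipeline inline (the plugin ID and cron interval are my own choices; the pipeline wiring inside processItem mirrors the hook above):

```php
<?php

namespace Drupal\ai_content_pipeline\Plugin\QueueWorker;

use Drupal\Core\Queue\QueueWorkerBase;
use Drupal\node\Entity\Node;

/**
 * Runs the AI pipeline asynchronously after the initial save.
 *
 * @QueueWorker(
 *   id = "ai_content_pipeline_worker",
 *   title = @Translation("AI content pipeline worker"),
 *   cron = {"time" = 120}
 * )
 */
class PipelineQueueWorker extends QueueWorkerBase
{
    public function processItem($data)
    {
        $node = Node::load($data['nid']);
        if (!$node) {
            return;
        }

        // Build and run the pipeline exactly as in the presave hook,
        // then set moderation_state from the routing decision and save.
    }
}
```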

The system prompts in each step are where the real customisation happens. The policy step above uses generic rules, but for a real enterprise deployment you would replace those with your organisation's actual editorial policies, pulled from a config form or a dedicated policy content type in Drupal itself. That way non-technical editors can update the policy rules without touching code.

On cost, four GPT-4o calls per content submission adds up across a high-volume site. For content that does not need the full pipeline, like very short pieces or resubmissions, consider a lighter first-pass check using GPT-4o-mini before deciding whether to run the full chain. The classification step costs a fraction of the full pipeline and can filter out a significant portion of submissions early.
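One way to sketch that pre-filter: add a model parameter to the service and run a single cheap classification before committing to the full chain. The chatWith method, triage prompt, and one-word protocol below are illustrative assumptions, not part of the code above:

```php
<?php

// Hypothetical variant of OpenAIService::chat() accepting a model name,
// used for a cheap gpt-4o-mini triage pass before the four-step chain.
public function chatWith(string $model, string $systemPrompt, string $userMessage): string
{
    $response = $this->client->chat()->create([
        'model'       => $model,
        'temperature' => 0.0,
        'messages'    => [
            ['role' => 'system', 'content' => $systemPrompt],
            ['role' => 'user',   'content' => $userMessage],
        ],
    ]);

    return $response->choices[0]->message->content;
}

// Caller: a one-word triage verdict decides whether the full pipeline
// runs at all for this submission.
$verdict = $ai->chatWith(
    'gpt-4o-mini',
    'Answer with exactly one word: "full" if this content needs detailed '
    . 'review, "light" otherwise.',
    $content
);
$needsFullPipeline = trim(strtolower($verdict)) === 'full';
```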

Finally, keep the pipeline results. Store each step's output against the content revision in a custom table or a long text field. After a few months you will have data on what the pipeline flags most often, how accurate the routing decisions are, and where editors are overriding the AI recommendations. That feedback loop is what lets you improve the system prompts over time and actually measure whether the pipeline is helping.

The pattern here is the same one LangChain formalises in its framework: focused steps, structured JSON outputs, full context passed forward, and graceful failure handling. Building it directly in PHP means you keep it inside your existing Drupal infrastructure with no additional services to run. For most enterprise Drupal teams, that is the right tradeoff.
