Most Joomla sites serve the same content to every registered user regardless of what they have read before, what they searched for, or how long they spent on specific topics. A user who has read six articles about Joomla security gets the same homepage recommendations as someone who only reads performance tuning content. That is a missed opportunity every single time they log in.
Personalisation is not a new idea but it has historically required either expensive third-party platforms or a data science team to implement properly. What has changed is that OpenAI's embedding and completion APIs make it possible to build a genuinely useful personalisation engine inside your existing Joomla installation without either of those things.
What we are building here is a system that tracks what registered users click on, which articles they read, how long they spend reading, and what they search for. It uses that behaviour data to build an interest profile per user, then uses OpenAI embeddings to find articles that match that profile semantically, not just by category tag. The result is a recommendations module that gets more accurate the more a user engages with the site.
What you need: Joomla 4 or 5, PHP 8.1+, Composer, MySQL, and an OpenAI API key.
How the System Works End to End
Before writing any code, the architecture is worth understanding clearly. There are three distinct parts to this system and keeping them separate makes the whole thing easier to build and maintain.
Part 1: Behaviour Tracking
User reads an article, searches, or clicks a link
↓
JavaScript sends event data to a Joomla plugin endpoint
↓
Event stored in user_behaviour_events table
Part 2: Profile Building (runs on cron every hour)
Fetch recent behaviour events per user
↓
Build a text summary of user interests from event data
↓
Send summary to OpenAI Embeddings API
↓
Store interest profile vector in user_interest_profiles table
Part 3: Recommendations (runs on module render)
Load current user's interest profile vector
↓
Compare against pre-computed article embedding vectors
↓
Return top N articles by cosine similarity
↓
Render as a recommendations module on any page
The profile building step is the key insight here. Rather than trying to match individual behaviour events to articles directly, we build a text summary of each user's interests from their behaviour data, embed that summary as a vector, and then find articles that are semantically close to it. This means a user who reads articles about "Joomla template overrides" and "child theme development" will get recommendations about "Joomla layout XML" even if that exact phrase never appeared in their behaviour history.
Database Tables
We need three tables: one for raw behaviour events, one for computed user interest profiles, and one for pre-computed article embeddings. Run these in your Joomla database, replacing #__ with your site's table prefix:
CREATE TABLE `#__user_behaviour_events` (
`id` INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
`user_id` INT UNSIGNED NOT NULL,
`event_type` ENUM('view','click','search','time_on_page') NOT NULL,
`article_id` INT UNSIGNED NULL,
`search_query` VARCHAR(500) NULL,
`duration_seconds` INT UNSIGNED NULL,
`metadata` JSON NULL,
`created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
INDEX `idx_user_events` (`user_id`, `created_at`),
INDEX `idx_article_events` (`article_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `#__user_interest_profiles` (
`user_id` INT UNSIGNED PRIMARY KEY,
`interest_summary` TEXT NOT NULL,
`embedding` JSON NOT NULL,
`events_count` INT UNSIGNED NOT NULL DEFAULT 0,
`updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `#__article_embeddings` (
`article_id` INT UNSIGNED PRIMARY KEY,
`embedding` JSON NOT NULL,
`indexed_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
MySQL does not have a native vector type so we are storing embeddings as JSON arrays. For sites with large article catalogues, over a few thousand articles, consider migrating to PostgreSQL with pgvector for proper vector indexing and faster similarity queries. For most Joomla sites MySQL with JSON embeddings works fine.
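To make the storage trade-off concrete, here is a minimal sketch of the JSON round trip and a rough size estimate. It assumes 1536-dimensional vectors (the default output size of text-embedding-3-small) and uses dummy values; the helper names encodeEmbedding and decodeEmbedding are illustrative and not part of Joomla.

```php
<?php
// Hypothetical helpers showing how an embedding travels through a JSON column.

/** Encode a float vector for storage in a JSON column. */
function encodeEmbedding(array $vector): string
{
    return json_encode(array_values($vector), JSON_THROW_ON_ERROR);
}

/** Decode a stored embedding back into a list of floats. */
function decodeEmbedding(string $json): array
{
    return array_map('floatval', json_decode($json, true, 512, JSON_THROW_ON_ERROR));
}

// Dummy 1536-dimension vector (assumed size; real values come from the API).
$vector = [];
for ($i = 0; $i < 1536; $i++) {
    $vector[] = sin($i) / 40;
}

$json     = encodeEmbedding($vector);
$bytes    = strlen($json);
$restored = decodeEmbedding($json);

// Rough storage footprint for a catalogue of 5000 articles.
printf(
    "One embedding: %d bytes as JSON; 5000 articles: about %.1f MB\n",
    $bytes,
    $bytes * 5000 / 1048576
);
```

Every one of those rows has to be loaded and decoded before it can be compared, which is where the "few thousand articles" ceiling comes from.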
Part 1: Behaviour Tracking Plugin
Create a Joomla system plugin that handles two things: serving a lightweight JavaScript tracker, and receiving the behaviour events that tracker sends back.
Plugin structure at plugins/system/behaviourtracker/:
plugins/system/behaviourtracker/
behaviourtracker.php
behaviourtracker.xml
src/
Extension/
BehaviourTracker.php
media/
js/
tracker.js
The main plugin class at src/Extension/BehaviourTracker.php:
<?php
namespace Joomla\Plugin\System\BehaviourTracker\Extension;
use Joomla\CMS\Plugin\CMSPlugin;
use Joomla\CMS\Factory;
use Joomla\CMS\Uri\Uri;
class BehaviourTracker extends CMSPlugin
{
public function onAfterDispatch(): void
{
$app = Factory::getApplication();
// getIdentity() replaces the deprecated Factory::getUser()
$user = $app->getIdentity();
// Only track logged-in users on site pages
if ($user->guest || $app->isClient('administrator')) {
return;
}
$input = $app->getInput();
$option = $input->getCmd('option');
$view = $input->getCmd('view');
// Track article views automatically on the server side
if ($option === 'com_content' && $view === 'article') {
$articleId = $input->getInt('id');
if ($articleId) {
$this->recordEvent($user->id, 'view', $articleId);
}
}
// Inject the JavaScript tracker on all frontend pages
$doc = $app->getDocument();
$baseUrl = Uri::root();
$doc->addScriptOptions('behaviourTracker', [
'userId' => $user->id,
'endpoint' => $baseUrl . 'index.php?option=com_ajax&group=system&plugin=behaviourtracker&format=json',
'articleId'=> ($option === 'com_content' && $view === 'article')
? $input->getInt('id') : null,
]);
$doc->addScript(Uri::root() . 'media/plg_system_behaviourtracker/js/tracker.js', [], ['defer' => true]);
}
public function onAjaxBehaviourtracker(): void
{
$app = Factory::getApplication();
$user = $app->getIdentity();
if ($user->guest) {
echo json_encode(['status' => 'ignored']);
$app->close();
}
$input = $app->getInput()->json;
$eventType = $input->getString('event_type');
$articleId = $input->getInt('article_id', 0);
$query = $input->getString('search_query', '');
$duration = $input->getInt('duration_seconds', 0);
$allowed = ['click', 'search', 'time_on_page'];
if (!in_array($eventType, $allowed, true)) {
echo json_encode(['status' => 'invalid']);
$app->close();
}
$this->recordEvent(
$user->id,
$eventType,
$articleId ?: null,
$query ?: null,
$duration ?: null
);
echo json_encode(['status' => 'ok']);
$app->close();
}
private function recordEvent(
int $userId,
string $eventType,
?int $articleId = null,
?string $searchQuery = null,
?int $duration = null
): void {
$db = Factory::getDbo();
$row = (object)[
'user_id' => $userId,
'event_type' => $eventType,
'article_id' => $articleId,
'search_query' => $searchQuery,
'duration_seconds' => $duration,
'created_at' => date('Y-m-d H:i:s'),
];
$db->insertObject('#__user_behaviour_events', $row);
}
}
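Now the JavaScript tracker at media/js/tracker.js inside the plugin folder. The plugin loads it from media/plg_system_behaviourtracker/js/tracker.js, so the manifest's media section needs to install it to that location: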
(function () {
const config = Joomla.getOptions('behaviourTracker');
if (!config || !config.userId) return;
const endpoint = config.endpoint;
const articleId = config.articleId;
function send(payload) {
navigator.sendBeacon
? navigator.sendBeacon(endpoint, JSON.stringify(payload))
: fetch(endpoint, { method: 'POST', body: JSON.stringify(payload),
keepalive: true });
}
// Track time on page for article views
if (articleId) {
const startTime = Date.now();
// pagehide fires more reliably than beforeunload, especially on mobile browsers
window.addEventListener('pagehide', function () {
const duration = Math.round((Date.now() - startTime) / 1000);
// Only record if they spent more than 10 seconds
if (duration > 10) {
send({
event_type: 'time_on_page',
article_id: articleId,
duration_seconds: duration,
});
}
});
}
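// Track internal link clicks to articles.
// Note: this only matches non-SEF URLs (index.php?option=com_content...&id=N);
// with SEF URLs enabled the pattern below will not match.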
document.addEventListener('click', function (e) {
const link = e.target.closest('a[href]');
if (!link) return;
const href = link.getAttribute('href');
const match = href.match(/[?&]id=(\d+)/);
if (match && href.includes('com_content')) {
send({
event_type: 'click',
article_id: parseInt(match[1], 10),
});
}
});
// Track search queries
const searchForm = document.querySelector('form[action*="com_search"], form[action*="com_finder"]');
if (searchForm) {
searchForm.addEventListener('submit', function () {
const input = searchForm.querySelector('input[type="text"], input[name="q"], input[name="searchword"]');
if (input && input.value.trim().length > 2) {
send({
event_type: 'search',
search_query: input.value.trim(),
});
}
});
}
})();
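Using navigator.sendBeacon matters most for the time-on-page event. When a user navigates away, an ordinary fetch request can be cancelled before it completes. sendBeacon is designed for this situation: the browser queues the request and sends it in the background as the page unloads. Delivery is best-effort rather than guaranteed, so treat the data as a signal, not an exact log. The fallback path uses fetch with keepalive: true for the same reason.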
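Because beacon requests are fire-and-forget, the browser never sees a validation error, so it is worth normalising payloads on the server before they reach the events table. The following is a minimal sketch of that idea; normaliseEvent is a hypothetical helper, and the one-hour duration cap is an assumed ceiling rather than anything the plugin above enforces.

```php
<?php
// Hypothetical helper: validate and normalise a decoded tracker payload
// before it is written to the behaviour events table.
function normaliseEvent(array $payload): ?array
{
    $allowed = ['click', 'search', 'time_on_page'];
    $type    = $payload['event_type'] ?? '';

    if (!in_array($type, $allowed, true)) {
        return null; // unknown event types are dropped
    }

    $articleId = isset($payload['article_id']) ? (int) $payload['article_id'] : 0;
    $query     = trim((string) ($payload['search_query'] ?? ''));
    $duration  = isset($payload['duration_seconds']) ? (int) $payload['duration_seconds'] : 0;

    return [
        'event_type'       => $type,
        'article_id'       => $articleId > 0 ? $articleId : null,
        // search_query is VARCHAR(500), so truncate defensively
        'search_query'     => $query !== '' ? mb_substr($query, 0, 500) : null,
        // Assumed cap of one hour so an idle tab does not skew the profile
        'duration_seconds' => $duration > 0 ? min($duration, 3600) : null,
    ];
}

// Example: a tab left open for 25 hours is capped at 3600 seconds.
$event = normaliseEvent(
    json_decode('{"event_type":"time_on_page","article_id":42,"duration_seconds":90000}', true)
);
```

Returning null for unknown event types mirrors the allow-list check in onAjaxBehaviourtracker, while the truncation and cap guard against oversized or implausible values.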
Part 2: Interest Profile Builder
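This is the part that makes the recommendations smart rather than just "articles in the same category." It is a service class that a CLI script runs from cron every hour. The OpenAI::client() call it uses comes from the openai-php/client Composer package, so install that first (composer require openai-php/client) and make sure Composer's autoloader is loaded.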
Create components/com_personalisation/src/Service/ProfileBuilder.php:
<?php
namespace Joomla\Component\Personalisation\Site\Service;
use Joomla\CMS\Factory;
use OpenAI;
class ProfileBuilder
{
private $openai;
private int $lookbackDays = 30;
private int $minEvents = 3;
public function __construct()
{
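// Use the namespaced class; the \JComponentHelper alias needs the backward compatibility plugin on Joomla 5
$params = \Joomla\CMS\Component\ComponentHelper::getParams('com_personalisation');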
$this->openai = OpenAI::client($params->get('openai_api_key'));
}
public function buildForAllUsers(): array
{
$db = Factory::getDbo();
// Find users who have behaviour events and need profile updates
$query = $db->getQuery(true)
->select('DISTINCT e.user_id')
->from($db->quoteName('#__user_behaviour_events', 'e'))
->where($db->quoteName('e.created_at') . ' > ' .
$db->quote(date('Y-m-d H:i:s', strtotime("-{$this->lookbackDays} days"))))
->group($db->quoteName('e.user_id'))
->having('COUNT(*) >= ' . $this->minEvents);
$userIds = $db->setQuery($query)->loadColumn();
$built = 0;
$errors = 0;
foreach ($userIds as $userId) {
try {
$this->buildForUser((int) $userId);
$built++;
} catch (\Exception $e) {
$errors++;
\Joomla\CMS\Log\Log::add(
"Profile build failed for user {$userId}: " . $e->getMessage(),
\Joomla\CMS\Log\Log::ERROR,
'com_personalisation'
);
}
// Avoid hitting OpenAI rate limits
usleep(100000);
}
return ['built' => $built, 'errors' => $errors];
}
public function buildForUser(int $userId): void
{
$db = Factory::getDbo();
$events = $this->fetchEvents($userId);
if (empty($events)) {
return;
}
// Build interest summary from behaviour events
$summary = $this->buildInterestSummary($userId, $events);
// Get embedding for the interest summary
$response = $this->openai->embeddings()->create([
'model' => 'text-embedding-3-small',
'input' => $summary,
]);
$embedding = $response->embeddings[0]->embedding;
// Upsert the user interest profile
$existing = $db->setQuery(
$db->getQuery(true)
->select('user_id')
->from('#__user_interest_profiles')
->where('user_id = ' . (int) $userId)
)->loadResult();
$row = (object)[
'user_id' => $userId,
'interest_summary' => $summary,
'embedding' => json_encode($embedding),
'events_count' => count($events),
'updated_at' => date('Y-m-d H:i:s'),
];
$existing
? $db->updateObject('#__user_interest_profiles', $row, 'user_id')
: $db->insertObject('#__user_interest_profiles', $row);
}
private function fetchEvents(int $userId): array
{
$db = Factory::getDbo();
$since = date('Y-m-d H:i:s', strtotime("-{$this->lookbackDays} days"));
$query = $db->getQuery(true)
->select([
'e.event_type',
'e.article_id',
'e.search_query',
'e.duration_seconds',
'a.title AS article_title',
'a.catid',
'c.title AS category_title',
])
->from($db->quoteName('#__user_behaviour_events', 'e'))
->leftJoin($db->quoteName('#__content', 'a') . ' ON a.id = e.article_id')
->leftJoin($db->quoteName('#__categories', 'c') . ' ON c.id = a.catid')
->where('e.user_id = ' . (int) $userId)
->where('e.created_at > ' . $db->quote($since))
->order('e.created_at DESC')
->setLimit(200);
return $db->setQuery($query)->loadAssocList();
}
private function buildInterestSummary(int $userId, array $events): string
{
$viewedTitles = [];
$searchQueries = [];
$categories = [];
$longReads = [];
foreach ($events as $event) {
if ($event['article_title']) {
$viewedTitles[] = $event['article_title'];
}
if ($event['category_title']) {
$categories[] = $event['category_title'];
}
if ($event['search_query']) {
$searchQueries[] = $event['search_query'];
}
// Articles read for more than 90 seconds indicate strong interest
if ($event['duration_seconds'] > 90 && $event['article_title']) {
$longReads[] = $event['article_title'];
}
}
// Deduplicate and limit to avoid overly long summaries
$viewedTitles = array_unique(array_slice($viewedTitles, 0, 20));
$searchQueries = array_unique(array_slice($searchQueries, 0, 10));
$categories = array_values(array_unique($categories));
$longReads = array_unique($longReads);
// Count category frequency to identify dominant interests
// Filter out NULL categories (search events have none) before counting
$catCounts = array_count_values(array_filter(array_column($events, 'category_title')));
arsort($catCounts);
$topCategories = array_keys(array_slice($catCounts, 0, 5, true));
$parts = [];
if (!empty($topCategories)) {
$parts[] = "Primary interests: " . implode(', ', $topCategories) . ".";
}
if (!empty($searchQueries)) {
$parts[] = "Searched for: " . implode(', ', $searchQueries) . ".";
}
if (!empty($longReads)) {
$parts[] = "Read thoroughly: " . implode(', ', $longReads) . ".";
}
if (!empty($viewedTitles)) {
$parts[] = "Also viewed: " . implode(', ', $viewedTitles) . ".";
}
return implode(' ', $parts);
}
}
The interest summary construction is worth explaining. We weight long reads (over 90 seconds) separately because time spent reading is a stronger signal of genuine interest than a quick click. Category frequency tells us which topics dominate a user's browsing. Search queries tell us what they were actively looking for, not just passively browsing. Combining these three signals produces a richer interest summary than article titles alone.
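To see how the three signals combine, here is a small worked sketch on made-up events shaped like the rows fetchEvents() returns. It is a simplified stand-in for buildInterestSummary, not a replacement for it, and the sample titles and categories are invented for illustration.

```php
<?php
// Made-up events shaped like rows from fetchEvents().
$events = [
    ['category_title' => 'Joomla Theming',  'article_title' => 'Child Templates',    'duration_seconds' => 140,  'search_query' => null],
    ['category_title' => 'Joomla Theming',  'article_title' => 'Template Overrides', 'duration_seconds' => 35,   'search_query' => null],
    ['category_title' => 'Joomla Security', 'article_title' => 'Two-Factor Login',   'duration_seconds' => null, 'search_query' => null],
    ['category_title' => null,              'article_title' => null,                 'duration_seconds' => null, 'search_query' => 'override layout XML'],
];

// Signal 1: category frequency. Drop NULLs, count, sort by count descending.
$catCounts = array_count_values(array_filter(array_column($events, 'category_title')));
arsort($catCounts);
$topCategories = array_keys(array_slice($catCounts, 0, 5, true));

// Signal 2: long reads, meaning more than 90 seconds on an article.
$longReads = [];
foreach ($events as $e) {
    if ($e['article_title'] !== null && (int) $e['duration_seconds'] > 90) {
        $longReads[] = $e['article_title'];
    }
}

// Signal 3: what the user actively searched for.
$searches = array_values(array_filter(array_column($events, 'search_query')));

$summary = 'Primary interests: ' . implode(', ', $topCategories) . '. '
    . 'Searched for: ' . implode(', ', $searches) . '. '
    . 'Read thoroughly: ' . implode(', ', array_unique($longReads)) . '.';

echo $summary, "\n";
// Primary interests: Joomla Theming, Joomla Security. Searched for: override layout XML. Read thoroughly: Child Templates.
```

The 35-second view still counts towards the category frequency but does not appear under "Read thoroughly", which is how the summary separates casual clicks from sustained reading.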
Part 3: Article Indexer
Before we can recommend articles, we need embeddings for each article. This service indexes your Joomla content into the article embeddings table.
Create components/com_personalisation/src/Service/ArticleIndexer.php:
<?php
namespace Joomla\Component\Personalisation\Site\Service;
use Joomla\CMS\Factory;
use OpenAI;
class ArticleIndexer
{
private $openai;
private int $batchSize = 20;
public function __construct()
{
$params = \Joomla\CMS\Component\ComponentHelper::getParams('com_personalisation');
$this->openai = OpenAI::client($params->get('openai_api_key'));
}
public function indexAll(bool $forceReindex = false): array
{
$db = Factory::getDbo();
$query = $db->getQuery(true)
->select(['a.id', 'a.title', 'a.introtext', 'a.fulltext', 'c.title AS category'])
->from($db->quoteName('#__content', 'a'))
->leftJoin($db->quoteName('#__categories', 'c') . ' ON c.id = a.catid')
->where('a.state = 1');
if (!$forceReindex) {
// Only index articles without existing embeddings
$query->leftJoin($db->quoteName('#__article_embeddings', 'ae') . ' ON ae.article_id = a.id')
->where('ae.article_id IS NULL');
}
$articles = $db->setQuery($query)->loadObjectList();
if (empty($articles)) {
// Same keys as the normal return value so callers can always read 'errors'
return ['indexed' => 0, 'errors' => 0];
}
$indexed = 0;
$errors = 0;
$batches = array_chunk($articles, $this->batchSize);
foreach ($batches as $batch) {
$texts = array_map(function ($article) {
$body = strip_tags($article->introtext . ' ' . $article->fulltext);
$body = preg_replace('/\s+/', ' ', $body);
// mb_substr avoids cutting a multibyte UTF-8 character in half
$body = mb_substr(trim($body), 0, 2000);
return $article->category . ': ' . $article->title . '. ' . $body;
}, $batch);
try {
$response = $this->openai->embeddings()->create([
'model' => 'text-embedding-3-small',
'input' => $texts,
]);
foreach ($batch as $i => $article) {
$embedding = $response->embeddings[$i]->embedding ?? null;
if (!$embedding) {
$errors++;
continue;
}
$this->upsertEmbedding($article->id, $embedding);
$indexed++;
}
usleep(200000);
} catch (\Exception $e) {
$errors += count($batch);
\Joomla\CMS\Log\Log::add('Article indexing error: ' . $e->getMessage(), \Joomla\CMS\Log\Log::ERROR, 'com_personalisation');
}
}
return ['indexed' => $indexed, 'errors' => $errors];
}
private function upsertEmbedding(int $articleId, array $embedding): void
{
$db = Factory::getDbo();
$existing = $db->setQuery(
$db->getQuery(true)
->select('article_id')
->from('#__article_embeddings')
->where('article_id = ' . $articleId)
)->loadResult();
$row = (object)[
'article_id' => $articleId,
'embedding' => json_encode($embedding),
'indexed_at' => date('Y-m-d H:i:s'),
];
$existing
? $db->updateObject('#__article_embeddings', $row, 'article_id')
: $db->insertObject('#__article_embeddings', $row);
}
}
Notice the article text is prefixed with its category title before embedding. "Joomla Performance: Why your Joomla site is slow and how to fix it" produces a more accurate embedding than the title alone because the category provides context that helps distinguish articles on similar topics across different subject areas.
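The text preparation step can be tried on its own. This is a minimal sketch assuming the same 2000-character limit as the indexer; prepareArticleText is a hypothetical helper name, and it uses mb_substr so that truncation never splits a multibyte character.

```php
<?php
// Hypothetical helper mirroring the indexer's text preparation:
// strip markup, collapse whitespace, truncate, then prefix category and title.
function prepareArticleText(string $category, string $title, string $html, int $maxChars = 2000): string
{
    $body = strip_tags($html);
    $body = preg_replace('/\s+/', ' ', $body);
    $body = mb_substr(trim($body), 0, $maxChars);

    return $category . ': ' . $title . '. ' . $body;
}

$text = prepareArticleText(
    'Joomla Performance',
    'Why your Joomla site is slow',
    "<p>Caching   is\n the first thing to check.</p>"
);

echo $text, "\n";
// Joomla Performance: Why your Joomla site is slow. Caching is the first thing to check.
```

The category prefix costs only a few tokens per article, so it is a cheap way to add context to every embedding.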
Part 4: The Recommendation Engine
Create components/com_personalisation/src/Service/RecommendationEngine.php:
<?php
namespace Joomla\Component\Personalisation\Site\Service;
use Joomla\CMS\Factory;
class RecommendationEngine
{
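// Tune this against real scores from your own data: text-embedding-3-small
// may score related texts well below 0.70, which would leave the module empty
private float $similarityThreshold = 0.70;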
public function getRecommendations(int $userId, int $limit = 5): array
{
$db = Factory::getDbo();
// Load the user's interest profile
$profile = $db->setQuery(
$db->getQuery(true)
->select(['embedding', 'events_count'])
->from('#__user_interest_profiles')
->where('user_id = ' . $userId)
)->loadObject();
if (!$profile) {
// No profile yet, return popular articles as fallback
return $this->getPopularArticles($limit);
}
$userVector = json_decode($profile->embedding, true);
if (empty($userVector)) {
return $this->getPopularArticles($limit);
}
// Load all article embeddings
$articles = $db->setQuery(
$db->getQuery(true)
->select(['ae.article_id', 'ae.embedding', 'a.title', 'a.alias',
'a.catid', 'a.introtext', 'a.created', 'a.hits',
'c.title AS category', 'c.alias AS cat_alias'])
->from($db->quoteName('#__article_embeddings', 'ae'))
->join('INNER', $db->quoteName('#__content', 'a') . ' ON a.id = ae.article_id')
->join('LEFT', $db->quoteName('#__categories', 'c') . ' ON c.id = a.catid')
->where('a.state = 1')
)->loadObjectList();
// Get articles the user has already read so we don't recommend them again
$readArticleIds = $db->setQuery(
$db->getQuery(true)
->select('DISTINCT article_id')
->from('#__user_behaviour_events')
->where('user_id = ' . $userId)
->where('article_id IS NOT NULL')
->where("event_type IN ('view', 'click')")
)->loadColumn();
$readSet = array_flip($readArticleIds);
// Score each article by cosine similarity to user interest vector
$scored = [];
foreach ($articles as $article) {
// Skip already-read articles
if (isset($readSet[$article->article_id])) {
continue;
}
$articleVector = json_decode($article->embedding, true);
if (empty($articleVector)) {
continue;
}
$similarity = $this->cosineSimilarity($userVector, $articleVector);
if ($similarity >= $this->similarityThreshold) {
$scored[] = [
'article_id' => $article->article_id,
'title' => $article->title,
'alias' => $article->alias,
'catid' => $article->catid,
'category' => $article->category,
'cat_alias' => $article->cat_alias,
'introtext' => strip_tags($article->introtext),
'created' => $article->created,
'hits' => $article->hits,
'similarity' => $similarity,
];
}
}
// Sort by similarity score, highest first
usort($scored, fn($a, $b) => $b['similarity'] <=> $a['similarity']);
return array_slice($scored, 0, $limit);
}
private function cosineSimilarity(array $a, array $b): float
{
$dot = 0.0;
$magA = 0.0;
$magB = 0.0;
$len = min(count($a), count($b));
for ($i = 0; $i < $len; $i++) {
$dot += $a[$i] * $b[$i];
$magA += $a[$i] ** 2;
$magB += $b[$i] ** 2;
}
$magA = sqrt($magA);
$magB = sqrt($magB);
return ($magA * $magB) > 0 ? $dot / ($magA * $magB) : 0.0;
}
private function getPopularArticles(int $limit): array
{
$db = Factory::getDbo();
return $db->setQuery(
$db->getQuery(true)
->select(['a.id AS article_id', 'a.title', 'a.alias', 'a.catid',
'a.introtext', 'a.created', 'a.hits',
'c.title AS category', 'c.alias AS cat_alias'])
->from($db->quoteName('#__content', 'a'))
->leftJoin($db->quoteName('#__categories', 'c') . ' ON c.id = a.catid')
->where('a.state = 1')
->order('a.hits DESC')
->setLimit($limit)
)->loadAssocList();
}
}
The fallback to popular articles for users without a profile yet is important. A new registered user has no behaviour history, so returning nothing or an error is a bad experience. Popular articles are a reasonable default until enough behaviour data accumulates, which typically takes two to three sessions.
Excluding already-read articles from recommendations is something a lot of personalisation implementations skip. There is nothing more frustrating for a user than being recommended an article they read last week. The read set lookup adds minimal overhead and makes the recommendations feel genuinely useful.
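The scoring loop and the read-set exclusion can be exercised in isolation with toy vectors. This is a minimal sketch using three-dimensional vectors in place of real embeddings; rankArticles is a hypothetical function that condenses the logic in getRecommendations, and the article IDs and vectors are made up.

```php
<?php
// Cosine similarity between two equal-length float vectors.
function cosineSimilarity(array $a, array $b): float
{
    $dot = $magA = $magB = 0.0;
    $len = min(count($a), count($b));
    for ($i = 0; $i < $len; $i++) {
        $dot  += $a[$i] * $b[$i];
        $magA += $a[$i] ** 2;
        $magB += $b[$i] ** 2;
    }
    $denominator = sqrt($magA) * sqrt($magB);

    return $denominator > 0 ? $dot / $denominator : 0.0;
}

// Hypothetical condensed version of the recommendation loop.
// $articles maps article ID => embedding vector.
function rankArticles(array $userVector, array $articles, array $readIds, float $threshold, int $limit): array
{
    $readSet = array_flip($readIds); // isset() on keys is a constant-time lookup
    $scored  = [];

    foreach ($articles as $id => $vector) {
        if (isset($readSet[$id])) {
            continue; // never recommend something already read
        }
        $score = cosineSimilarity($userVector, $vector);
        if ($score >= $threshold) {
            $scored[$id] = $score;
        }
    }

    arsort($scored); // highest similarity first, keys preserved

    return array_slice($scored, 0, $limit, true);
}

$user     = [1.0, 0.0, 0.0];
$articles = [
    10 => [0.9, 0.1, 0.0], // very close to the user vector
    11 => [0.0, 1.0, 0.0], // orthogonal, similarity 0
    12 => [1.0, 0.0, 0.0], // perfect match, but already read
    13 => [0.7, 0.7, 0.0], // similarity about 0.707, just above the threshold
];

$ranked = rankArticles($user, $articles, [12], 0.70, 5);
print_r(array_keys($ranked)); // article 10 first, then 13
```

Article 12 would have scored highest, which shows why the exclusion has to happen before ranking rather than as an afterthought.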
Part 5: The Recommendations Module
Create a Joomla module that renders the recommendations anywhere on the site. Module structure at modules/mod_personalised_recommendations/:
modules/mod_personalised_recommendations/
mod_personalised_recommendations.php
mod_personalised_recommendations.xml
tmpl/
default.php
The main module file mod_personalised_recommendations.php:
<?php
defined('_JEXEC') or die;
use Joomla\CMS\Factory;
use Joomla\Component\Personalisation\Site\Service\RecommendationEngine;
$user = Factory::getApplication()->getIdentity();
if ($user->guest) {
return;
}
$limit = (int) $params->get('article_count', 5);
$engine = new RecommendationEngine();
try {
$recommendations = $engine->getRecommendations($user->id, $limit);
} catch (\Exception $e) {
\Joomla\CMS\Log\Log::add('Recommendations module error: ' . $e->getMessage(), \Joomla\CMS\Log\Log::ERROR, 'mod_personalised_recommendations');
$recommendations = [];
}
if (empty($recommendations)) {
return;
}
require \Joomla\CMS\Helper\ModuleHelper::getLayoutPath('mod_personalised_recommendations', $params->get('layout', 'default'));
The module template at tmpl/default.php:
<?php defined('_JEXEC') or die; ?>
<div class="mod-personalised-recommendations">
<h3><?php echo htmlspecialchars($params->get('header_text', 'Recommended for You')); ?></h3>
<ul>
<?php foreach ($recommendations as $item) : ?>
<li>
<a href="<?php echo \Joomla\CMS\Router\Route::_(
'index.php?option=com_content&view=article&id=' . $item['article_id']
. '&catid=' . ($item['catid'] ?? '')
); ?>">
<?php echo htmlspecialchars($item['title']); ?>
</a>
<?php if ($params->get('show_category', 1) && !empty($item['category'])) : ?>
<span class="article-category">
<?php echo htmlspecialchars($item['category']); ?>
</span>
<?php endif; ?>
<?php if ($params->get('show_intro', 1) && !empty($item['introtext'])) : ?>
<p><?php echo htmlspecialchars(mb_substr(strip_tags($item['introtext']), 0, 120)) . '...'; ?></p>
<?php endif; ?>
</li>
<?php endforeach; ?>
</ul>
</div>
Wiring It All Together With Cron
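The profile builder needs to run on a schedule. Create a Joomla CLI script at cli/personalisation_cron.php. Joomla 4 and 5 treat standalone CLI scripts like this as legacy, in favour of console commands and Scheduled Tasks, so the bootstrap lines may need adapting to your version: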
<?php
define('_JEXEC', 1);
define('JPATH_BASE', dirname(__DIR__));
require_once JPATH_BASE . '/includes/defines.php';
require_once JPATH_BASE . '/includes/framework.php';
use Joomla\CMS\Factory;
use Joomla\Component\Personalisation\Site\Service\ProfileBuilder;
use Joomla\Component\Personalisation\Site\Service\ArticleIndexer;
$app = Factory::getApplication('cli');
$task = $argv[1] ?? 'profiles';
switch ($task) {
case 'index':
$indexer = new ArticleIndexer();
$result = $indexer->indexAll();
echo "Indexed: {$result['indexed']}, Errors: {$result['errors']}\n";
break;
case 'profiles':
default:
$builder = new ProfileBuilder();
$result = $builder->buildForAllUsers();
echo "Profiles built: {$result['built']}, Errors: {$result['errors']}\n";
break;
}
Add these to your server crontab:
# Rebuild user interest profiles every hour
0 * * * * php /path/to/joomla/cli/personalisation_cron.php profiles >> /var/log/joomla_personalisation.log 2>&1
# Index new articles every 6 hours
0 */6 * * * php /path/to/joomla/cli/personalisation_cron.php index >> /var/log/joomla_personalisation.log 2>&1
Run the article indexer manually first before anything else:
php /path/to/joomla/cli/personalisation_cron.php index
This builds embeddings for all your existing articles. Depending on how many articles you have, this might take a few minutes and cost a small amount in OpenAI API tokens. For 500 articles it typically costs less than $0.10 using text-embedding-3-small.
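The arithmetic behind that figure is easy to reproduce. This is a rough sketch under two stated assumptions: about 4 characters per token for English text, and a price of $0.02 per million tokens for text-embedding-3-small. Both are assumptions to check against current OpenAI pricing, and estimateIndexingCost is a hypothetical helper.

```php
<?php
// Rough indexing cost estimate.
// Assumptions: ~4 characters per token, $0.02 per million tokens.
function estimateIndexingCost(
    int $articles,
    int $charsPerArticle = 2000,   // matches the indexer's truncation limit
    float $charsPerToken = 4.0,    // assumed average for English text
    float $usdPerMillionTokens = 0.02 // assumed price, verify before relying on it
): array {
    $tokens = (int) ceil($articles * $charsPerArticle / $charsPerToken);

    return [
        'tokens' => $tokens,
        'usd'    => $tokens / 1_000_000 * $usdPerMillionTokens,
    ];
}

$estimate = estimateIndexingCost(500);
printf("%d tokens, roughly $%.4f\n", $estimate['tokens'], $estimate['usd']);
// 250000 tokens, roughly $0.0050
```

Under these assumptions 500 articles come in well under the $0.10 mentioned above, and the hourly profile rebuilds are cheaper still because each interest summary is only a few sentences long.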
What the Recommendations Look Like in Practice
Here is a realistic example of what the system produces after a user has been active for a few sessions. The user has read articles about Joomla template development, searched for "override layout XML", and spent over two minutes reading an article about Joomla child templates.
Their interest summary built by the profile builder:
Primary interests: Joomla Development, Joomla Theming.
Searched for: override layout XML, Joomla template child.
Read thoroughly: How to Create a Child Template in Joomla 5.
Also viewed: Joomla Template Overrides Explained, Understanding Joomla Layout XML,
Joomla Module Chrome Types, Adding Custom CSS to a Joomla Template.
Top recommendations returned by the engine (articles they have not read yet):
[
{
"title": "Joomla 5 Template Positions: A Complete Guide",
"category": "Joomla Development",
"similarity": 0.912
},
{
"title": "How to Override Joomla Core Templates Without Hacking Core",
"category": "Joomla Development",
"similarity": 0.887
},
{
"title": "Using Bootstrap 5 Effectively in Custom Joomla Templates",
"category": "Joomla Theming",
"similarity": 0.871
},
{
"title": "Joomla Module Assignment by Menu Item and User Group",
"category": "Joomla Development",
"similarity": 0.843
},
{
"title": "Debugging Joomla Layout Issues with the Developer Toolbar",
"category": "Joomla Development",
"similarity": 0.831
}
]
All five are relevant, none are articles the user has already read, and the recommendations are semantically accurate even though the user never used the exact phrase "template positions" or "Bootstrap" in their search queries. That is the embedding similarity working correctly.
A Few Things Worth Knowing Before You Deploy
The cosine similarity calculation in PHP is fine for article catalogues up to a few thousand articles. Beyond that, loading, JSON-decoding, and comparing every article embedding against the user's profile on each module render starts to add up. At that scale, moving the article embeddings to PostgreSQL with pgvector and running the similarity query in the database will keep response times fast.
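Before reaching for pgvector, one cheap saving is available: when both vectors have unit length, cosine similarity reduces to a plain dot product, which removes the two square roots and the magnitude sums from every comparison. OpenAI documents its embeddings as normalised to length 1, but normalising once at index time costs little and makes the assumption safe. The sketch below uses hypothetical helper names and toy two-dimensional vectors.

```php
<?php
// Scale a vector to unit length (left unchanged if it is all zeros).
function normalise(array $v): array
{
    $norm = sqrt(array_sum(array_map(fn ($x) => $x * $x, $v)));

    return $norm > 0 ? array_map(fn ($x) => $x / $norm, $v) : $v;
}

// Dot product; equals cosine similarity when both inputs are unit length.
function dot(array $a, array $b): float
{
    $sum = 0.0;
    $len = min(count($a), count($b));
    for ($i = 0; $i < $len; $i++) {
        $sum += $a[$i] * $b[$i];
    }

    return $sum;
}

$a = normalise([3.0, 4.0]); // becomes [0.6, 0.8]
$b = normalise([4.0, 3.0]); // becomes [0.8, 0.6]

$similarity = dot($a, $b);  // 0.6 * 0.8 + 0.8 * 0.6 = 0.96
printf("%.2f\n", $similarity);
```

This does not remove the cost of decoding JSON on every render, so caching the decoded vectors is the other obvious step before changing databases.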
Privacy is worth thinking about carefully before deploying behaviour tracking. Depending on your jurisdiction, tracking logged-in user behaviour may require disclosure in your privacy policy and potentially explicit consent. At minimum, update your privacy policy to mention that behaviour data is collected to improve content recommendations. For sites with European users, review GDPR requirements around behavioural profiling before going live.
The 30-day lookback window in the profile builder is a reasonable default but worth tuning for your site. On a site where users visit daily, 30 days captures good signal. On a site where users visit monthly, a 90-day window gives more data to work with. Adjust the $lookbackDays property in ProfileBuilder to match your site's typical visit cadence.
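The cutoff calculation itself is simple enough to test separately when experimenting with different windows. This is a minimal sketch; lookbackCutoff is a hypothetical helper, and it assumes timestamps are compared in UTC, which you would need to align with however your database stores created_at.

```php
<?php
// Hypothetical helper: compute the earliest event timestamp to include
// for a given lookback window, formatted for a MySQL DATETIME comparison.
function lookbackCutoff(int $lookbackDays, ?DateTimeImmutable $now = null): string
{
    // Assumption: events are stored and compared in UTC.
    $now = $now ?? new DateTimeImmutable('now', new DateTimeZone('UTC'));

    return $now->modify("-{$lookbackDays} days")->format('Y-m-d H:i:s');
}

$reference = new DateTimeImmutable('2024-03-31 12:00:00', new DateTimeZone('UTC'));

echo lookbackCutoff(30, $reference), "\n"; // 2024-03-01 12:00:00
echo lookbackCutoff(90, $reference), "\n"; // 2024-01-01 12:00:00
```

Passing a fixed reference time makes it easy to check what a 30-day versus 90-day window would include for a given user before changing the property on a live site.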
Finally, add a simple feedback mechanism if you can. Even a small thumbs up or thumbs down on recommended articles gives you data to validate whether the recommendations are actually landing well with your users. If most users ignore or actively dismiss recommendations, that tells you something about either the similarity threshold, the quality of your interest summaries, or the article content itself. Without that feedback loop, you are flying blind on whether the engine is actually helping.