3rd Mover: I develop, therefore I am
 
Amazonification
I use the word amazonification to describe when we dynamically add links and content to a page, based on what other viewers of the page have been doing before and after having visited that particular page. This article describes why and how to do it.

By Martin Joergensen

Persuasive design
This design principle is also referred to as persuasive design: design where you try to persuade your users to browse on even though they have already finished their business or task on your site. I like amazonification better, but people may not know what I mean...

You use their short lack of focus to stimulate their interest in something relevant. These short seconds are often called seducible moments.
 
The idea of amazonification (sometimes referred to as amazonization) is to use the behavior of a mass of users to deduce some relevance between parts of a site, and use that relevance to dynamically show people links to or content from other relevant or potentially interesting parts of the site.
I will exemplify the concept using articles as the parts, but in essence they could be anything: pictures in an online gallery, merchandise in a store, keywords or other elements.

The reason for this somewhat cryptic word is of course what the site Amazon.com has been doing for years. Even though former Amazon guru and evangelist (formal title: Chief Scientist), German Andreas Weigend, is now touting that the Amazon approach might not be the best anyhow, I will still refer to the concept as amazonification. Amazon still uses it, and I still like both the term and the concept.
 
What we usually do
Usually it’s the job of an author or editor to create the connection to other relevant articles on a site. This is done by selecting a category for the article, maybe marking it up with some keywords or simply by manually linking to relevant articles.
Keywords and categorization are common means of enabling a system to create an updated and dynamic list of relevant links to an article. They are typically also used to build section front pages, or theme pages on a particular subject.
 
What we want to do
Now users have their own ways. The relevance seen by the author or editor and the links created on the basis of their ideas may not always be the most interesting to the user.
The users can very well see some other connection between elements, which are not covered by the category or keyword system, and which have not been discovered by the author or editor.
The site also develops, and what was relevant when the article was written, may not be relevant anymore, because something newer is likely to have superseded what was most important before.

The way the users indicate connection between things is by clicking on links and traversing the site.
By harvesting information on users’ behavior and analyzing it, we can create new paths between pages, which were not there from the start.
Provided that there’s a reason behind the users’ clicking, we will even get meaningful paths, which can be helpful to other users.

Of course there is also a danger that users create absolutely irrelevant connections by clicking almost randomly. But two things will help us avoid this situation:
- First of all most users are actually sane and do click based on interest and relevance.
- And secondly we add a democracy factor to our system and only consider stuff relevant when a certain number of users have indicated the same relevance.
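
To make the democracy factor concrete, here is a minimal in-memory sketch. The function name co_visited() and the list of visitor/element pairs are made up for the illustration and are not part of the class described later; the real system keeps its data in a database.

```php
<?php
// Hypothetical sketch: co_visited() is not part of visitor.class.php.
// $visits is a list of array(visitor_id, element_id) pairs.
function co_visited($visits, $current, $min_users)
{
    // Collect the set of elements each visitor has seen
    $seen = array();
    foreach ($visits as $visit) {
        list($visitor, $element) = $visit;
        $seen[$visitor][$element] = true;
    }

    // Count, per other element, how many visitors saw it together with $current
    $counts = array();
    foreach ($seen as $elements) {
        if (!isset($elements[$current])) {
            continue;
        }
        foreach (array_keys($elements) as $element) {
            if ($element != $current) {
                $counts[$element] = isset($counts[$element]) ? $counts[$element] + 1 : 1;
            }
        }
    }

    // The democracy factor: drop connections backed by too few visitors
    foreach ($counts as $element => $count) {
        if ($count < $min_users) {
            unset($counts[$element]);
        }
    }

    // Most popular connections first
    arsort($counts);
    return array_keys($counts);
}

$visits = array(
    array('a', 1), array('a', 2),
    array('b', 1), array('b', 2), array('b', 3),
    array('c', 1), array('c', 2),
);
$result = co_visited($visits, 1, 2);
```

With a threshold of two users, element 3 is dropped because only visitor b connected it to element 1, while element 2 survives because all three visitors did.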
 
Critical mass
Gathering information of the kind we are dealing with here usually only makes sense when a site has a significant number of visitors who also visit more than a few pages during a fairly short period.
Exact numbers are difficult to establish, but I will stick out my neck and try to give an estimate.
A daily visitorship in the thousands and a certainty that each visitor sees more than, say, five pages apart from the front page over a period of a couple of weeks will aid the validity of the harvested data and give meaning to the information we can deduce from it.
In the example that accompanies this article I will set up some of these numbers as variables, which you can tune to fit your needs.
 
Crucial information
To obtain the necessary data we need to roll out a small toolset. We need to clearly be able to identify a couple of crucial elements:
- the individual user
- the single element (article in my example)

We identify the user by assigning each one an id and saving it in a cookie on the user’s machine. Your site might already do this for other purposes, in which case the existing code will do fine. I will supply a built-in user id routine for this little system anyway, just in case.

The second part is just whatever id the element we’re tracking may have. For the sake of simplicity, I will pretend that we have only one type of entity, the article, and one single set of unique ids. Oftentimes the world is more complex, but you will have to sort that out yourself.
Nothing hinders the handling of several content types. Just write the type—typically an object class—along with the id of the element, and visitors could then be allowed to create relevance between pictures, people, articles and much more. When returning relevance data, you just need to return the element type too, and incorporate this into the system that shows the relevant items.
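
As a rough sketch of that last step, assume the relevance query returned rows with both a type and an id. The column names element_type and element_id and the helper group_by_type() are hypothetical; the tables in this article only store element_id.

```php
<?php
// Hypothetical sketch: element_type is an assumed extra column,
// and group_by_type() is not part of visitor.class.php.
function group_by_type($rows)
{
    // Sort the returned ids into one list per content type,
    // so each type can be fetched and displayed by its own code
    $grouped = array();
    foreach ($rows as $row) {
        $grouped[$row['element_type']][] = (int)$row['element_id'];
    }
    return $grouped;
}

$rows = array(
    array('element_type' => 'article', 'element_id' => 12),
    array('element_type' => 'picture', 'element_id' => 7),
    array('element_type' => 'article', 'element_id' => 31),
);
$grouped = group_by_type($rows);
```

Each list can then be handed to whatever code already knows how to show articles, pictures and so on.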

Well, I’m digressing...

For now we will only register two pieces of information—visitor id and element id—together with a date and time for every article viewed.
 
Our tools
We will save all data in a database and build a small PHP visitor class to help handle the tasks at hand.
I will run you through the tables and the class, visitor.class.php, in a moment, but I will start by describing how you use it.

Its constructor, visitor(), does several central chores, and all you need to do on each page to make sure a visit is registered is to call this method. Place the line:

$visitor=new visitor($element_id);

before any output is made by your page. Since the routine depends on being able to write a cookie, it must run before headers are sent. The object tries to establish this itself by calling headers_sent(), and will avoid error messages. But if headers have been sent before it runs, no information will be saved, rendering the object inane and pretty useless.
$element_id is the unique id of whatever you need to find relevance between.

All other methods are private to this object except the one called relevant(). By calling this:

$relevant=$visitor->relevant($element_id);

we get in return an array of the element_id’s of hopefully relevant articles that the largest number of other users visited during a certain period. How you utilize these id’s is up to you, but digging out the articles they represent and presenting them as headlines with links might be a good suggestion. Your mileage may vary...
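
One possible way of turning the returned ids into links is sketched here. The helper relevant_links() and the $titles lookup are assumptions for the example; on a real site the headlines would come from your own article table.

```php
<?php
// Hypothetical helper: not part of visitor.class.php.
// $titles maps element ids to headlines.
function relevant_links($ids, $titles)
{
    $html = '';
    foreach ($ids as $id) {
        // Skip ids we cannot resolve, such as deleted articles
        if (!isset($titles[$id])) {
            continue;
        }
        $html .= '<li><a href="?id=' . (int)$id . '">'
            . htmlspecialchars($titles[$id]) . '</a></li>';
    }
    // Return nothing at all rather than an empty list
    return $html === '' ? '' : '<ul>' . $html . '</ul>';
}

$html = relevant_links(
    array(3, 9),
    array(3 => 'Pike flies', 9 => 'Knots & leaders')
);
```

The headlines are passed through htmlspecialchars() so that characters like & do not break the markup.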

If the return array is empty, there were too few relevance instances to establish a list considered valid. The lower limit for this is set inside the class as shown further ahead in the article.
 
The database tables
We need two tables in the database: one for the visitors and one for the visits.
It may surprise you, but I have called them visitors and visits, and you will find a small SQL-file, which will create them for you, in the code supplied with this article.
Make sure that you get the indexes set up properly. The added speed is important, especially in the case of large sites with many visitors.

The visitor id is eight digits in this case, but could also be alphanumeric or any other format. It just needs to be unique to the single user. Adjust the id column format according to your preferred or existing id format.
 
Addendum: the site has since grown to well above 10,000 visitors each day, and I have not experienced any problems. Still, I have trimmed the stored data to stay below 150,000 visitors and 50,000 visits.

Watch the table sizes
Monitor the size of these two tables, particularly visits. Since they are accessed several times with each view of a page, you don’t want them to grow too large. That could potentially slow down page loading times.
Adjust the numbers $max_days and $visitor_max_days referenced below to limit the size of the tables. When any of them grow above 100,000 rows you may consider adjusting the number of days you preserve information as described below. This number of course depends on the number of elements (articles), traffic patterns on the site you run and not least your server resources.

Global FlyFisher, which uses this system, currently has 135,000 unique visitors and about 35,000 page visits registered, which causes no problems and no detectable lag in page presentation. These numbers seem stable with 2-3,000 unique visitors a day combined with two months’ preservation time for visitor data and two weeks for visits data.
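
If you want a starting point for the number of days, a back-of-the-envelope calculation may help. The helper days_to_keep() is hypothetical and assumes that rows arrive at a roughly steady daily rate, which real traffic rarely does.

```php
<?php
// Hypothetical helper: not part of visitor.class.php.
// Estimates how many days of data fit under a row limit,
// assuming a steady number of new rows per day.
function days_to_keep($max_rows, $rows_per_day)
{
    $rows_per_day = max(1, (int)$rows_per_day);
    // Never go below one day, or nothing would be kept at all
    return max(1, (int)floor($max_rows / $rows_per_day));
}

// 35,000 visits over two weeks is 2,500 rows a day
$days = days_to_keep(100000, 2500);
```

Treat the result as a first guess for $max_days, and then watch the actual table size.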
 
Visitor class in detail
If you run your finger down the visitor() constructor you will see what happens in more detail:
First of all we check for search engine robots. Their selection of pages will usually be based on random patterns rather than some idea of relevance. We won’t take the chance.
If the function is_bot() returns true, we go no further. Else we continue. You can add patterns for any crawlers or user agents you want to exclude to the list.

After that we set a few variables to use in the class.
We set the table names for the database. This is convenient for the sake of renaming tables. Just set the names here, and it’s taken care of. The same goes for the name of the cookie.

Then we transfer the id of the element, which the current user is looking at, for later use in $this->element_id.

The next few variables need some explanation.

$this->max_days tells the system how long we keep information on who visited which pages. Visit data older than this number of days will be deleted. This number can be adjusted depending on the number of visitors, page views and how often your site is updated.
10-20 days seems suitable for a site with weekly updates and hundreds to thousands of visitors a day. Less traffic may call for a larger number and heavy traffic means that enough information is gathered in shorter time.

$this->visitor_max_days is the number of days we save visitor information in our visitors list. If visitors are inactive for more than this number of days, we remove them to stop the list from growing too large.
You can adjust this number depending on the number of returning users your site has. 30-60 days is fine in most cases unless you have a massively visited site or the very opposite.

$this->num_users indicates how many users we need to have seen the same two pages in order for them to be considered relevant to each other. This number will usually be about 5-20, but needs adjustment as the system starts working and its results can be scrutinized. Set it to 1 initially in order for any relevance data to appear while you develop and test your application, but remember to increase it later on.

$this->num_elements tells the system how many relevant elements we want returned. You could also have this as an argument to the function relevant() for more flexibility.
 
The methods
After the variables have been set, we call different methods to do the magic of registering the data.

get_cookie() fetches any code that might be on the user’s machine from an earlier visit.
set_cookie() sets a cookie. If no code has been set before, it generates one through the method generate_id(), which will supply a unique random code for a new user. When this has been done the new or existing code is written onto the user’s computer and set to last the same amount of time as we keep the user’s information in the database.

update_lastest_visit() will then update the same information in the visitors table in the database, refreshing the date of the visit and postponing the day that this visitor’s data will be deleted. The same routine will delete all outdated visitors each time it’s called. This ensures that old visitor data is purged.

Our last job is to register the essence of the whole game: what element the user actually looked at. This is done in the function register_element(), which also has two jobs. First it deletes all data that has grown too old, and then it inserts the new data into the table.
 
Check it out
You can see an example of this system at work on my non-commercial playground website The Global FlyFisher.
Take any recent article like the Danish Gallery and go to the bottom. On the left-hand side you see the top five articles that people who visited the current article also visited.
 
Data mining
The routines above are designed to take care of themselves, and just build a dataset describing the behavior of the visitors over whatever period we set up in the class.

The second half of the project is to dig into this data and bring out the information we intended to display in the first place: elements relevant to the current one.

For this we have the method relevant().

This method will take the current user’s data—which we already gathered and stored in the object—and find the elements that other users found to be somehow connected to the element that was just viewed.

If you want the method to find relevant elements to some other element than the current, just supply the id number for another element as an argument to the method.

The method does a little SQL gymnastics. I’m sure a real SQL whiz could obtain what we need in a more elegant way, but the approach used has worked well for me.

First we select a number of visitor ids for the people who were most busy looking at the current element, be it an article, a store item, a picture or whatever. We leave out whatever the current visitor did him or herself.

The next step is to select the topmost other elements that the selected users also looked at, sorted by number of visits to that page.
The method then just stores the element ids in an array and returns this array.
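
For the SQL-minded, here is one way the two steps might be folded into a single self-join. This is a different technique from the one the class uses: it counts distinct visitors per connection instead of first picking the most active visitors. The function relevant_query() is hypothetical, and it assumes the visits table from this article and a database driver that accepts ? placeholders.

```php
<?php
// Hypothetical alternative to the two queries in relevant().
// Returns the SQL text and the parameters to bind to its placeholders.
function relevant_query($element_id, $visitor_id, $min_users, $limit)
{
    $sql = 'SELECT v2.element_id, COUNT(DISTINCT v2.id) AS visitors '
         . 'FROM visits v1 '
         . 'JOIN visits v2 ON v2.id = v1.id '
         // v1 picks everyone but the current visitor who saw the element,
         // v2 picks everything else those visitors saw
         . 'WHERE v1.element_id = ? AND v2.element_id <> ? AND v1.id <> ? '
         . 'GROUP BY v2.element_id '
         // The democracy factor, applied per connection
         . 'HAVING COUNT(DISTINCT v2.id) >= ? '
         . 'ORDER BY visitors DESC '
         . 'LIMIT ' . (int)$limit;

    $params = array(
        (int)$element_id,
        (int)$element_id,
        (string)$visitor_id,
        (int)$min_users,
    );
    return array($sql, $params);
}

list($sql, $params) = relevant_query(42, '12345678', 5, 5);
```

I have not measured this against the two-step version, so check the query plan on your own data before trusting it.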

What you do with the result is up to you, but in most cases you will want to fetch article headlines, item descriptions or similar information and create a short list of links. Present them with the text “People who looked at this also looked at:” and you have amazonified your page.
 
Speed issues
When the amount of data grows large enough you will probably run into some performance issues in the database handling. The routines that delete from the tables each time a visitor visits a new place can consume quite a bit of time when working on a significantly large table (like 100,000 entries or more depending on your server and database).

In that case you will have to purge the tables only occasionally and not every time a visit is updated. Add some random function or some timing rule and let the maintenance take place once an hour or however often you find it necessary. The results are not affected in a way that's visible to the user.
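
Both variants can be sketched in a few lines. The helper names purge_by_chance() and purge_is_due() are made up for the example, and where you store the time of the last purge is up to you.

```php
<?php
// Hypothetical helpers: not part of visitor.class.php.

// Random rule: returns true on average once in every $one_in calls
function purge_by_chance($one_in)
{
    return mt_rand(1, max(1, (int)$one_in)) === 1;
}

// Timing rule: returns true when at least $interval seconds have passed
// since the last purge. Both times are Unix timestamps.
function purge_is_due($last_purge, $now, $interval = 3600)
{
    return ($now - $last_purge) >= $interval;
}

// Example: purge roughly once per 100 page views
if (purge_by_chance(100)) {
    // run the DELETE queries here
}
```

The random rule needs no stored state at all, which makes it the easiest one to drop into the existing methods.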

You can also opt for a cron-like scheme, where an occasional maintenance-routine covers the purging of these tables along with other tasks.

Code
Below you will find the code for the amazonification class and further down an example of its usage.

You need a small database to make them work. Here is an SQL-dump that can build the two tables needed. The code below uses root access with no password. Not an example to follow in the real world!


-- 
-- Database: `amazonifocation`
-- 
-- --------------------------------------------------------
-- 
-- Table structure for table `visitors`
-- 

CREATE TABLE `visitors` (
  `id` varchar(8) NOT NULL default '',
  `latest_visit` date NOT NULL default '0000-00-00',
  UNIQUE KEY `id` (`id`),
  KEY `latest_visit` (`latest_visit`)
);

-- --------------------------------------------------------
-- 
-- Table structure for table `visits`
-- 

CREATE TABLE `visits` (
  `id` varchar(8) NOT NULL default '',
  `created` date NOT NULL default '0000-00-00',
  `element_id` int(11) NOT NULL default '0',
  KEY `id` (`id`),
  KEY `created` (`created`),
  KEY `element_id` (`element_id`)
);

 
<?php

// *******************************************************************
// visitor.class.php - a simple amazonification toolbox
// Developed by Martin Joergensen, (c) 2005
// martin@globalflyfisher.com, http://globalflyfisher.com
// This code is licensed under the Noncommercial Creative Commons License
// You are not only encouraged to use it noncommercially but also to develop it
// Please credit me if you use or redistribute it
//
// Written for PHP 4/5: the mysql_* functions were removed in PHP 7.0, and a
// method named after the class is no longer used as constructor as of PHP 8.0

class visitor
{
    // *******************************************************************
    // Public methods

    // Constructor - element_id is the unique id of the element type we're tracking
    // It could be an article ID, item number or the like
    function visitor($element_id)
        {
        // Ignore search engine crawlers etc.
        if ($this->is_bot())
            return;

        // Set table names
        $this->visits_table='visits';
        $this->table='visitors';

        // Name for the cookie
        $this->cookie_name='amazonification_id';

        // Remember the element we want to track
        $this->element_id=$element_id;

        // Save visitor's behaviour data for $this->max_days days
        $this->max_days=10;

        // Save visitor's data for $this->visitor_max_days days
        $this->visitor_max_days=30;

        // Minimal number users for a valid relevance list - 1 for testing - higher for real life
        // IMPORTANT! Increase this number before going live
        $this->num_users=1;

        // Number of relevant elements to return
        $this->num_elements=5;

        // Update visitor data
        $this->get_cookie();
        $this->set_cookie();
        $this->update_lastest_visit();
        $this->register_element();
        }


    
    // Get list of elements relevant to the current one
    function relevant($element_id=0)
        {
        $ret=array();

        // Nothing was set up if the constructor bailed out on a crawler
        if (empty($this->visits_table))
            return $ret;

        // Use argument if supplied, else use current element_id
        $element_id=(int)($element_id ? $element_id : $this->element_id);

        // Get other visitors who looked at this element and were most active
        $result=mysql_query('SELECT id, COUNT(id) AS views
            FROM '.$this->visits_table.'
            WHERE element_id="'.$element_id.'"
            AND id!="'.$this->id.'"
            GROUP BY id
            ORDER BY views DESC, MAX(created) DESC
            LIMIT '.$this->num_users);

        // Go further only when the critical number of other users have seen this element
        if ($result && mysql_num_rows($result)==$this->num_users) {
            // Create an SQL query that selects all the other elements they looked at - most popular first
            $inc='';
            while ($d=mysql_fetch_object($result))
                $inc.='OR id="'.$d->id.'" ';
            $result=mysql_query('SELECT COUNT(*) AS views, element_id
                FROM '.$this->visits_table.'
                WHERE element_id!="'.$element_id.'"
                AND (0 '.$inc.')
                GROUP BY element_id
                ORDER BY views DESC
                LIMIT '.$this->num_elements);

            // Go through each and store in an array
            while ($result && $d=mysql_fetch_object($result)) {
                $ret[]=$d->element_id;
                }
            }

        // Return the array as a result - empty when too few users were found
        return $ret;
        }

    
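    // *******************************************************************
    // Private methods

    // Check for search engine robots
    function is_bot()
        {
        // Add whatever type of agent you want to ignore - if any
        // preg_match() is used because eregi() was removed in PHP 7.0
        $agent=isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
        return (bool)preg_match('/msnbot|googlebot|ask.jeeves|findlinks|movabletype|gigabot|python-urllib/i',
            $agent);
        }

    // Fetch cookie value and store
    function get_cookie()
        {
        // Get the current visitor id - if any - from the cookie value
        // Keep letters and digits only, since the id is used directly in SQL queries
        $this->id=isset($_COOKIE[$this->cookie_name])
            ? preg_replace('/[^A-Za-z0-9]/', '', $_COOKIE[$this->cookie_name])
            : '';
        }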

    function set_cookie()
        {
        // Avoid writing cookie after screen output
        if (!headers_sent()) {
            // Generate a new code if none has been read
            if (!$this->id)
                $this->id=$this->generate_id();

            // Set the cookie on the user's machine
            setcookie($this->cookie_name, $this->id, time()+(3600*24*$this->visitor_max_days), '/');
            }
        }

    function delete_cookie()
        {
        // Avoid writing cookie after screen output
        if (!headers_sent())
            // Delete the cookie on the user's machine
            setcookie($this->cookie_name, '', time()-3600, '/');

        // Delete the internal representation of same cookie
        unset($_COOKIE[$this->cookie_name]);
        }

    function generate_id()
        {
        do {
            // Create a random eight digits number
            $code=rand(10000000, 99999999);

            // Check for its existence in the database
            $result=mysql_query('SELECT id FROM '.$this->table.' WHERE id="'.$code.'"');

            // Do it again if it already exists
            } while ($result && mysql_num_rows($result));

        // Write it as a new one when a unique code has been found
        mysql_query('INSERT INTO '.$this->table.' (id, latest_visit) VALUES ("'.$code.'", NOW())');
        return $code;
        }

    function update_lastest_visit()
        {
        // Update this visitor's last visit
        mysql_query('UPDATE '.$this->table.' SET latest_visit=NOW() WHERE id="'.$this->id.'"');

        // If none were updated, create it as a new entry
        // mysql_affected_rows() is also 0 when latest_visit already holds today's date;
        // the UNIQUE key on id then rejects the duplicate INSERT
        if (!mysql_affected_rows())
            mysql_query('INSERT INTO '.$this->table.
                ' (id, latest_visit) VALUES ("'.$this->id.'", NOW())');

        // Clean out passive visitors
        mysql_query('DELETE FROM '.$this->table.
            ' WHERE (TO_DAYS(NOW())-TO_DAYS(latest_visit))>'.$this->visitor_max_days);
        }

    function register_element()
        {
        // Clean out old entries from visits table
        mysql_query('DELETE FROM '.$this->visits_table.
            ' WHERE (TO_DAYS(NOW())-TO_DAYS(created))>'.$this->max_days);

        // Insert the current visitor and element information into the database
        mysql_query('INSERT INTO '.$this->visits_table.
            ' (id, created, element_id) VALUES ("'.$this->id.'", NOW(), "'.$this->element_id.'")');
        }

    // *******************************************************************
}

?>

An example of a simple index.php, which uses the amazonification-routines above.
 
<?php

// *******************************************************************
// Simple demonstration of visitor.class.php
// Developed by Martin Joergensen, (c) 2005
// martin@globalflyfisher.com, http://globalflyfisher.com
// This code is licensed under the Noncommercial Creative Commons License
// You are not only encouraged to use it noncommercially but also to develop it
// Please credit me if you use or redistribute it

// Call this page with different ?id=nn from different browsers or machines to see it work

// Replace with whatever database info your site requires
$db=mysql_connect('localhost', 'root', '');
mysql_select_db('amazonifocation', $db);

// Include the class
include_once('visitor.class.php');

// visitor MUST be created before any output is done
// The id is cast to an integer, since it ends up in the SQL queries
$visitor=new visitor(isset($_GET['id']) ? (int)$_GET['id'] : 0);

?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Amazonification</title>
</head>
<body>
<?php
$relevant=$visitor->relevant();
if (is_array($relevant))
    foreach ($relevant as $key=>$id)
        echo '<a href="?id='.$id.'">'.$id.'</a><br />';
?>
</body>
</html>

 

 © 2010 3rd Mover