Cronjob Auto Update RSS Cache
I recently wrote a tutorial on using SimplePie to parse your Google Reader shared items Atom feed (or any other RSS/Atom feed, for that matter). However, after implementing it on this site (in the sidebar here) and testing it for a month or so, I’ve found that the page loads very slowly whenever SimplePie reparses the feed and updates the cache. I had it set to request an update every hour (3600 seconds), which in hindsight is probably too often for my needs.
The ideal solution would be for the cache to update in the background on the server, rather than when a user loads the page after 3600 seconds have passed since the last update, which causes a very slow page load and a bad user experience. In other words, by the time a user visits the page, the cache has already been updated, so the page appears without a long delay. The PHP at the top of the page simply reads the information from the cache file rather than reparsing and caching the feed.
By setting up a cronjob I’ve been able to specify exactly when I want SimplePie to update the cache, which means the page on my site loads without having to reparse and cache the feed.
The problem with setting up a cronjob is that my web hosting isn’t set up for cronjobs or scheduled tasks. To get them I would need to upgrade my hosting, so I have found another solution. There are several remote sites that let you set up a cronjob on their server, which then calls your file automatically at the times you specify.
In the rest of this article I will show you how to create a cronjob file, how to update the page that displays your feed so that it only reads from the cache, and how to set up the scheduled cronjob via an online scheduled task service. For example purposes, I am basing the rest of this article on the SimplePie tutorial that I wrote previously.
SimplePie Example Page
Below is the original code for parsing and pulling in a Google Reader shared items Atom feed. Tutorial here.
<?php
require '/home/folder/public_html/php/simplepie.inc';
$url = 'http://www.google.com/reader/public/atom/user%2F13279602483212565421%2Fstate%2Fcom.google%2Fbroadcast';
$feed = new SimplePie();
$feed->set_feed_url($url);
$feed->init();
$start = 0;
$length = 5;
$feed->strip_htmltags(array('img','embed','center','strong'));
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>SimplePie Test</title>
</head>
<body>
<h1>SimplePie Test</h1>
<ul>
<?php
// loop through items
foreach ($feed->get_items($start,$length) as $item):
?>
<li><a href="<?php echo $item->get_link(); ?>"><?php echo substr($item->get_title(), 0, 45) . '...'; ?></a> | <?php echo $item->get_date('j F Y'); ?><br /><?php echo substr($item->get_description(), 0, 160) . '...'; ?></li>
<?php endforeach; ?>
</ul>
</body>
</html>
Create a Cronjob File
By using the example page above, it is simply a case of copying the PHP at the very top of the code into a separate PHP file. So your PHP file should have the following code:
<?php
// Change location to match simplepie.inc file location on your server
require '/home/folder/public_html/php/simplepie.inc';
// Change atom feed URL to your own
$url = 'http://www.google.com/reader/public/atom/user%2F13279602483212565421%2Fstate%2Fcom.google%2Fbroadcast';
$feed = new SimplePie();
$feed->set_feed_url($url);
$feed->set_cache_duration(0);
$feed->set_timeout(5);
// Change location of cache folder in line below
$feed->set_cache_location($_SERVER['DOCUMENT_ROOT'] . '/about/cache');
$feed->init();
$start = 0;
$length = 5;
$feed->strip_htmltags(array('img','embed','center','strong'));
?>
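Notice that the line $feed->set_cache_duration(0); has a value of 0, so each time the file is called SimplePie treats the cache as expired, reparses the feed and updates the cache. The line $feed->set_timeout(5); below it gives the remote feed up to 5 seconds to respond. The path passed to $feed->set_cache_location() must be the cache folder that your page reads from.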
Now upload the PHP file to your server. You will need to refer to this file using the cron service later.
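Because a remote service will be fetching this file over HTTP, it can help if the file signals whether the update worked. The following is a minimal sketch and not part of the original tutorial: cron_status() is a hypothetical helper that maps the value returned by SimplePie's error() method to an HTTP status code and a one-line message. It assumes the cron service records the response in its history.

```php
<?php
/**
 * Hypothetical helper: turn a SimplePie error value into an HTTP status
 * code and a short message for the cron service's log.
 * $error is whatever $feed->error() returned (null or empty when all is well).
 */
function cron_status($error)
{
    if (empty($error)) {
        return array(200, 'Cache updated');
    }
    // The error may be a single string or an array of strings
    $message = is_array($error) ? implode('; ', $error) : (string) $error;
    return array(500, 'Cache update failed: ' . $message);
}

// Possible usage at the end of the cron file, after $feed->init():
// list($code, $message) = cron_status($feed->error());
// header('Content-Type: text/plain', true, $code);
// echo $message;
```

Returning a 500 status on failure is a design choice for this sketch. A plain text message on its own would also work if the service only stores the response body.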
Update SimplePie Example Page
On your original page where the parsed Atom feed appears you will now need to make some changes to the PHP at the top of the page. This is shown below:
<?php
require '/home/folder/public_html/php/simplepie.inc';
$url = 'http://www.google.com/reader/public/atom/user%2F13279602483212565421%2Fstate%2Fcom.google%2Fbroadcast';
$feed = new SimplePie();
$feed->set_feed_url($url);
$feed->set_cache_duration(999999999);
$feed->set_timeout(-1);
$feed->init();
$start = 0;
$length = 5;
$feed->strip_htmltags(array('img','embed','center','strong'));
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>SimplePie Test</title>
</head>
<body>
<h1>SimplePie Test</h1>
<ul>
<?php
// loop through items
foreach ($feed->get_items($start,$length) as $item):
?>
<li><a href="<?php echo $item->get_link(); ?>"><?php echo substr($item->get_title(), 0, 45) . '...'; ?></a> | <?php echo $item->get_date('j F Y'); ?><br /><?php echo substr($item->get_description(), 0, 160) . '...'; ?></li>
<?php endforeach; ?>
</ul>
</body>
</html>
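You will notice that the two differences are the inclusion of $feed->set_cache_duration(999999999); and $feed->set_timeout(-1);. With such a long cache duration the page never treats the cache as expired, so it stops reparsing and caching the Atom feed itself and instead uses the cache that the separate PHP file refreshes on a schedule. Both files must use the same cache folder: this page relies on SimplePie's default location (a folder named cache alongside the page), so the path given to set_cache_location() in the separate PHP file has to match it. Once you have uploaded your updated SimplePie example page and the separate PHP file, you will need to set up the cronjob via an online service such as webcron.org.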
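To confirm that the scheduled job really is refreshing the cache, one option is to look at when the cache files were last written. The sketch below is an illustration rather than part of the original setup: newest_cache_age() is a hypothetical helper, and it assumes SimplePie's file cache stores its entries with a .spc extension in the cache folder.

```php
<?php
/**
 * Hypothetical helper: return the age in seconds of the most recently
 * modified *.spc file in $cacheDir, or null if there are none.
 * Assumes SimplePie's file cache writes entries with a .spc extension.
 */
function newest_cache_age($cacheDir, $now = null)
{
    $now = ($now === null) ? time() : $now;
    $files = glob(rtrim($cacheDir, '/') . '/*.spc');
    if (empty($files)) {
        return null;
    }
    // Most recent modification time across all cache files
    $newest = max(array_map('filemtime', $files));
    return $now - $newest;
}

// Possible usage (the path is only an example):
// $age = newest_cache_age($_SERVER['DOCUMENT_ROOT'] . '/about/cache');
// echo ($age === null) ? 'No cache yet' : 'Cache last updated ' . $age . ' seconds ago';
```

If the reported age keeps growing past the interval you scheduled, the cron service is probably not reaching the separate PHP file, or the two files are pointing at different cache folders.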
Set Up Cronjob Via WebCron
Services such as WebCron are very easy to use, and for many people they are the only option for scheduled tasks unless you are prepared to pay for an upgrade with your web host.
Once you have registered as a user, it is a very simple process of referring to the PHP file location and then setting up the cronjob schedule.
To add a new cronjob, simply click the link ‘+ Add a cronjob’.
On the next page, set up a cronjob schedule. This is fairly straightforward. Selecting All in every category makes the cronjob run every five minutes. If you don’t need to update that often, you can specify certain months, days of the week or times of the day.
I set mine up to run every month (All in Month column), every day (All in Day column), on weekdays only (Mon – Fri in Week day column), at 10am, 12pm, 2pm, 4pm, 6pm and 9pm (10, 12, 14, 16, 18, 21 in Hour column), on the hour (0 in the Minutes column).
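For readers more used to standard cron syntax, the column values above correspond roughly to the expression 0 10,12,14,16,18,21 * * 1-5. As a sketch of what that means (the cron service does not require you to write any of this), the hypothetical function schedule_matches() below checks whether a given time falls on one of the scheduled runs. It assumes times are read in whatever timezone the DateTime object carries. How the service itself handles timezones is not covered here.

```php
<?php
/**
 * Hypothetical check: does $when fall on the schedule described above?
 * Equivalent crontab expression: 0 10,12,14,16,18,21 * * 1-5
 */
function schedule_matches(DateTime $when)
{
    $hours   = array(10, 12, 14, 16, 18, 21);
    $minute  = (int) $when->format('i');  // minutes, 0-59
    $hour    = (int) $when->format('G');  // 24-hour clock, 0-23
    $weekday = (int) $when->format('N');  // 1 = Monday ... 7 = Sunday

    return $minute === 0
        && in_array($hour, $hours, true)
        && $weekday <= 5;
}

// Possible usage:
// schedule_matches(new DateTime('2024-01-01 12:00')); // a Monday at noon
// schedule_matches(new DateTime('2024-01-06 10:00')); // a Saturday morning
```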
Now it is a simple case of submitting your new cronjob and then hitting the test link. You can also track the history to see if there have been any errors in fetching your PHP file and running the cronjob.