SimplePie: Parse RSS/Atom Feeds
I recently wrote a post about using MagpieRSS to parse my Google Reader shared items Atom feed. However, I have come across various bugs in the last few days, including extremely slow page loads, out-of-date cache files and character encoding issues.
This is an example of the error message I was getting:
Warning: MagpieRSS: Failed to fetch http://www.google.com/reader/public/atom/user%2F13279602483212565421%2Fstate%2Fcom.google%2Fbroadcast (HTTP Error: connection failed (11) in /home/public_html/folder/magpierss/rss_fetch.inc on line 238
I’ve read various articles on the causes of the error. Some suggested an issue with the web host or character encoding problems. Rather than spend hours fixing the issue I decided to try SimplePie instead (something I’ve been meaning to do for a long time).
The following tutorial will talk you through setting up SimplePie and then parsing your Google Reader shared items Atom feed (however, the instructions can easily be adapted for parsing Twitter RSS feeds or pretty much any other RSS/Atom feed).
SimplePie requires the following in order to work correctly:
Installing SimplePie
1. Download the latest SimplePie zip file from the official site.
2. Create a folder called php and a folder called cache in the root directory of your site.
3. Change file permissions (CHMOD) on the cache folder to 777 (755 or 775 may also work depending on your server settings).
4. Unzip the SimplePie file that you downloaded, and upload simplepie.inc to the php folder using FTP. Now you are ready to test your server to check that it is compatible with SimplePie.
5. Create a test page to check that SimplePie is correctly parsing your RSS/Atom feed. Copy and paste the following example page into a text editor and change the require location of the simplepie.inc file to match its location on your server. Also change the $url = location to the URL for the RSS/Atom feed you want to parse. Save the file with a .php file extension and upload to your server.
<?php
require '/home/folder/public_html/php/simplepie.inc';
$url = 'http://www.google.com/reader/public/atom/user%2F13279602483212565421%2Fstate%2Fcom.google%2Fbroadcast';
$feed = new SimplePie();
$feed->set_feed_url($url);
$feed->init();
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>SimplePie Test</title>
</head>
<body>
<h1>SimplePie Test</h1>
<ul>
<?php
// loop through items
foreach ($feed->get_items() as $item):
?>
<li><a href="<?php echo $item->get_link(); ?>"><?php echo $item->get_title(); ?></a> | <?php echo $item->get_date('j F Y'); ?><br /><?php echo $item->get_description(); ?></li>
<?php endforeach; ?>
</ul>
</body>
</html>
6. Navigate to the URL for the page you just uploaded to your server and you should see your RSS/Atom feed content appear in a rudimentary format. This means it works!
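If the test page comes up empty, one thing worth ruling out is the cache folder from steps 2 and 3. The following is a minimal sketch of such a check, not something SimplePie provides: check_cache_folder() is a hypothetical helper name, and the cache path in the example is an assumption that you should adjust to your own site.

```php
<?php
// Hypothetical helper, not part of SimplePie: reports why a cache folder
// would be unusable, or returns null when it looks fine.
function check_cache_folder($cacheDir)
{
    if (!is_dir($cacheDir)) {
        return 'Cache folder not found: ' . $cacheDir;
    }
    if (!is_writable($cacheDir)) {
        return 'Cache folder is not writable: ' . $cacheDir;
    }
    return null;
}

// Example: assumes the cache folder sits next to this script.
$problem = check_cache_folder(dirname(__FILE__) . '/cache');
echo $problem === null ? "Cache folder looks usable.\n" : $problem . "\n";
```

If the folder is reported as not writable, revisit the CHMOD setting from step 3.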
Parsing Google Reader Shared Items
When parsing your shared items you may want some extra control over what elements are shown on your page. The following section looks at some simple implementation changes that will allow you to display your feed content in the format you want.
Control how many feed items are shown
By adding $start = 0; and $length = 5; to the PHP in the head of your page, and passing them to get_items(), you can control which item SimplePie starts from and how many items are displayed on your page. With this extra control your page will look as follows:
<?php
require '/home/folder/public_html/php/simplepie.inc';
$url = 'http://www.google.com/reader/public/atom/user%2F13279602483212565421%2Fstate%2Fcom.google%2Fbroadcast';
$feed = new SimplePie();
$feed->set_feed_url($url);
$feed->init();
$start = 0;
$length = 5;
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>SimplePie Test</title>
</head>
<body>
<h1>SimplePie Test</h1>
<ul>
<?php
// loop through items
foreach ($feed->get_items($start,$length) as $item):
?>
<li><a href="<?php echo $item->get_link(); ?>"><?php echo $item->get_title(); ?></a> | <?php echo $item->get_date('j F Y'); ?><br /><?php echo $item->get_description(); ?></li>
<?php endforeach; ?>
</ul>
</body>
</html>
Notice that the PHP loop within the page body has now changed from foreach ($feed->get_items() as $item): to foreach ($feed->get_items($start,$length) as $item):.
$start is set to 0 and $length to 5 in the PHP within the page head, which means SimplePie will display the five most recent feed items.
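To get a feel for what $start and $length do, it can help to picture the feed items as a plain PHP array. The sketch below assumes that get_items($start, $length) selects items the same way array_slice() does; the item titles and the items_for_page() helper are made up for illustration and are not part of SimplePie.

```php
<?php
// Stand-in data: ten made-up item titles, newest first.
$items = array();
for ($i = 1; $i <= 10; $i++) {
    $items[] = 'Item ' . $i;
}

// Hypothetical helper: returns the items for a 1-based page number.
function items_for_page($items, $page, $perPage)
{
    $start = ($page - 1) * $perPage; // page 1 starts at 0, page 2 at 5, ...
    return array_slice($items, $start, $perPage);
}

$firstPage = items_for_page($items, 1, 5);  // same window as $start = 0, $length = 5
$secondPage = items_for_page($items, 2, 5); // same window as $start = 5, $length = 5

echo implode(', ', $firstPage) . "\n";
echo implode(', ', $secondPage) . "\n";
```

Under that assumption, setting $start = 5 and $length = 5 on the page above would show the sixth to tenth most recent items.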
Truncate long titles
If you are displaying your parsed feed in a sidebar, long titles may wrap across several lines. You can limit the length of the titles that are displayed and add an ellipsis (…) to avoid wrapping. To truncate long titles your page will look as follows:
<?php
require '/home/folder/public_html/php/simplepie.inc';
$url = 'http://www.google.com/reader/public/atom/user%2F13279602483212565421%2Fstate%2Fcom.google%2Fbroadcast';
$feed = new SimplePie();
$feed->set_feed_url($url);
$feed->init();
$start = 0;
$length = 5;
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>SimplePie Test</title>
</head>
<body>
<h1>SimplePie Test</h1>
<ul>
<?php
// loop through items
foreach ($feed->get_items($start,$length) as $item):
?>
<li><a href="<?php echo $item->get_link(); ?>"><?php echo substr($item->get_title(), 0, 45) . '...'; ?></a> | <?php echo $item->get_date('j F Y'); ?><br /><?php echo $item->get_description(); ?></li>
<?php endforeach; ?>
</ul>
</body>
</html>
Within the body the following change has been made: <?php echo substr($item->get_title(), 0, 45) . '...'; ?>. PHP's substr() keeps the first 45 characters of the title (this includes symbols and spaces), and the three dots are appended to show that the title has been truncated. Note that this simple version appends the dots to every title, including titles shorter than 45 characters.
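If you only want the dots on titles that were actually cut, a small helper can do the check first. This is a sketch rather than anything SimplePie ships: truncate_text() is a hypothetical name, and it uses the mbstring functions when they are available because substr() counts bytes and can split a multi-byte UTF-8 character.

```php
<?php
// Hypothetical helper, not part of SimplePie: shortens $text to $limit
// characters and appends $suffix only when something was removed.
function truncate_text($text, $limit, $suffix = '...')
{
    // Prefer the multi-byte functions when the mbstring extension is present.
    $multibyte = function_exists('mb_substr');
    $length = $multibyte ? mb_strlen($text, 'UTF-8') : strlen($text);
    if ($length <= $limit) {
        return $text; // short enough, leave untouched
    }
    $cut = $multibyte ? mb_substr($text, 0, $limit, 'UTF-8') : substr($text, 0, $limit);
    // Drop trailing whitespace so the dots sit against the last word.
    return rtrim($cut) . $suffix;
}

echo truncate_text('A short title', 45) . "\n";
echo truncate_text('A much longer title that would otherwise wrap in a narrow sidebar', 45) . "\n";
```

In the page body you would then echo truncate_text($item->get_title(), 45) in place of the substr() call.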
Truncate descriptions and exclude elements from display
You can do the same with the RSS/Atom feed description by changing the standard PHP echo description statement to: <?php echo substr($item->get_description(), 0, 160) . '...'; ?>. This displays up to 160 characters before truncating the description.
One issue with parsing and displaying the feed description is that images and video boxes will display. This is fine in some situations, but if you want to show your content in a simple form such as in a sidebar, showing this extra content may not be suitable and may break your pages. To deal with these we can specify elements that we don’t want SimplePie to display in the parsed feed. This is done in the page head with the following statement: $feed->strip_htmltags(array());.
This can be adjusted so that we can exclude certain elements from our parsed feed description, as follows: $feed->strip_htmltags(array('img','embed','center','strong'));. You can easily add and remove elements from the array depending on what you want to exclude from display.
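One thing to watch when combining the two techniques: cutting an HTML description at a fixed position can land in the middle of a tag that was not stripped. A possible alternative, sketched below under the assumption that a plain-text summary is acceptable, is to remove all tags with PHP's strip_tags() before truncating. summarize_html() is a hypothetical helper name, not a SimplePie function.

```php
<?php
// Hypothetical helper, not part of SimplePie: turns an HTML description
// into a short plain-text summary that is safe to echo into a page.
function summarize_html($html, $limit = 160, $suffix = '...')
{
    // Remove all tags first so the cut can never land inside one.
    $text = strip_tags($html);
    // Decode entities so "&amp;" counts as one character, then tidy whitespace.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/', ' ', $text));

    // Prefer the multi-byte functions when the mbstring extension is present.
    $multibyte = function_exists('mb_substr');
    $length = $multibyte ? mb_strlen($text, 'UTF-8') : strlen($text);
    if ($length > $limit) {
        $cut = $multibyte ? mb_substr($text, 0, $limit, 'UTF-8') : substr($text, 0, $limit);
        $text = rtrim($cut) . $suffix;
    }
    // Re-encode for output because the entities were decoded above.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo summarize_html('<p>Hello <strong>world</strong> &amp; friends</p>', 9) . "\n";
```

In the page body this would be called as summarize_html($item->get_description(), 160). The trade-off is that links and formatting inside the description are lost.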
Our final page code will look as follows:
<?php
require '/home/folder/public_html/php/simplepie.inc';
$url = 'http://www.google.com/reader/public/atom/user%2F13279602483212565421%2Fstate%2Fcom.google%2Fbroadcast';
$feed = new SimplePie();
$feed->set_feed_url($url);
$feed->init();
$start = 0;
$length = 5;
$feed->strip_htmltags(array('img','embed','center','strong'));
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>SimplePie Test</title>
</head>
<body>
<h1>SimplePie Test</h1>
<ul>
<?php
// loop through items
foreach ($feed->get_items($start,$length) as $item):
?>
<li><a href="<?php echo $item->get_link(); ?>"><?php echo substr($item->get_title(), 0, 45) . '...'; ?></a> | <?php echo $item->get_date('j F Y'); ?><br /><?php echo substr($item->get_description(), 0, 160) . '...'; ?></li>
<?php endforeach; ?>
</ul>
</body>
</html>
The above code lets you:
- Show a set number of feed items
- Truncate long feed titles
- Show part of the feed description and truncate it past a certain point
- Exclude certain elements from the feed description from being displayed
Hi, love the script example, thanks! It is much cleaner than many I have seen. Can you help me with how to leave out “Event Status: Confirmed” after every event? And less important, can the published date be removed too?
Thanks.
Hi Gina
I haven’t had the problem that you mention with “Event Status: Confirmed”. If you send me a link I will have a look for you.
To answer your second question though, it’s quite easy to remove the published date by removing the following line of PHP code: <?php echo $item->get_date('j F Y'); ?>. Have a look at the example code in the post to see what I mean.
Basically by removing that piece of PHP code you are stopping the date from being output onto your page.
I hope that helps and thanks for stopping by!
Hey Rob,
Thanks for your reply. I was able to get rid of the Event Status by shortening the description length at echo substr($item->get_description(), 0, 45). It must be part of the Google description.
Decided to keep the published date, but thanks for pointing out the easy fix!
Great tutorial! Works like a charm!
Thanks Jules, I’m glad it was useful for you.
Hi there – can you please tell me if its possible to use this to just display one post via rss but the full post not just a select few lines etc – i have been trying for a while now and im stuck unable to do so!
Many thanks
Harry
Hi Harry, yes it’s possible. You will need to make two adjustments to the code above.
First:
$start = 0;
$length = 1;
Then remove the truncation code from the output as follows:
<?php echo $item->get_description(); ?>
Thanks for the reply – i have just tried that but it doesnt seem to work i get only a few characters now with this code:
<?php
[...]
$feed->set_feed_url($url);
$feed->init();
$start = 0;
$length = 1;
$feed->strip_htmltags(array('img','embed','center','strong'));
?>
[...]
<title>SimplePie Test</title>
[...]
<h1>SimplePie Test</h1>
[...]
foreach ($feed->get_items($start,$length) as $item):
?>
<a href="<?php echo $item->get_link(); ?>"><?php echo substr($item->get_title(), 0, 45) . '...'; ?>
Can you see what im doing wrong?
Many Thanks!
sorry i see that that didnt post correctly – my code is up here = http://harryfinn.co.uk/doitright/test.php – can you see whats wrong?
Yes, the title is still truncating.
Change this line so that it reads: "><?php echo $item->get_title(); ?>.
is there any way to get just a first or only one image from content?
substr($item->get_content(), 0, 45, (image -1) . ‘…’;
Thank you
It is possible, yes, but it can be awkward because images in feeds are not a uniform size. To show images you simply remove 'img' from the $feed->strip_htmltags array at the top of the page, although this displays every image in the content rather than just the first one. However, I don’t recommend this approach, and there are other scripts and plugins available for handling images better. If you want to display images from a Flickr feed for example, I have written a tutorial here.
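For the "first image only" part of the question, one rough option is to pull the first img src out of the item content yourself. The sketch below is an assumption-laden illustration, not a SimplePie feature: first_image_src() is a hypothetical helper, it relies on a regular expression (a proper HTML parser would be more robust), and it only finds anything if 'img' has been left out of the strip_htmltags array.

```php
<?php
// Hypothetical helper, not part of SimplePie: returns the src of the
// first <img> tag found in $html, or null when there is none.
function first_image_src($html)
{
    // Match the first <img ...> tag and capture its quoted src value.
    $pattern = '/<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1/is';
    if (!preg_match($pattern, $html, $matches)) {
        return null;
    }
    // Attribute values are entity-encoded in HTML, so decode them.
    return html_entity_decode($matches[2], ENT_QUOTES, 'UTF-8');
}

$sample = '<p>Hi <img alt="" src="a.png"> and <img src="b.png"></p>';
echo first_image_src($sample) . "\n";
```

Inside the loop you could pass $item->get_content() to the helper and, when it returns a URL, echo your own img tag with a fixed width so that differently sized images do not break the layout.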
This article was very helpful and I have it parsing my blog just like I want, except that when I post something new it doesn’t seem to come up on my site right away. It seems to be in the atom feed thing but not on the site. Why isn’t it rechecking the atom url?
Hi Katelyn
I wrote another article that deals with this issue you are having. Basically you need to set up a Cron Job that will automatically execute the SimplePie script at intervals that you set (e.g. once every hour, once every five minutes etc).
Here’s the post I wrote: http://brightscape.net/blog/cronjob-auto-update-rss-cache/
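Separately from the cron approach, SimplePie's own cache settings also affect how quickly new posts appear, because the feed is served from the cache until the cache duration runs out (one hour by default, as far as I am aware). The following sketch wraps the relevant calls in a hypothetical helper, configure_feed_cache(); the cache path and the 300-second duration are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: applies cache settings to a SimplePie object.
// It must be called before $feed->init(), because init() is the point
// at which SimplePie fetches the feed or reads it from the cache.
function configure_feed_cache($feed, $cacheDir, $seconds)
{
    $feed->set_cache_location($cacheDir); // folder SimplePie writes cache files to
    $feed->set_cache_duration($seconds);  // how long a cached copy is reused
    $feed->enable_cache(true);
    return $feed;
}

// Usage sketch (paths are assumptions, adjust to your server):
// require '/home/folder/public_html/php/simplepie.inc';
// $feed = new SimplePie();
// $feed->set_feed_url($url);
// configure_feed_cache($feed, '/home/folder/public_html/cache', 300);
// $feed->init();
```

A shorter duration means fresher content but more requests to the feed's server, so it is worth keeping the value as high as you can tolerate.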