How to parse RSS feeds with PHP


  • Date: 2007-02-19
  • Author: Paul George

    XML stands for EXtensible Markup Language. It is a simplified subset of Standard Generalized Markup Language (SGML), and its primary purpose is to make it easier to share data across different information systems, particularly systems connected via the Internet.

    RSS is a Web content syndication format. Its name is an acronym for Really Simple Syndication. In other words, RSS is a lightweight XML format designed for sharing headlines and other Web content. More details about the RSS 2.0 specification can be found at http://blogs.law.harvard.edu/tech/rss.

    People often want to read RSS files and display the content on their own site using a custom layout. This article walks through the entire process of parsing RSS 2.0 files with PHP.

    Requirements

    To test the code in this tutorial you need a web server (I am using Apache: http://httpd.apache.org) configured with support for PHP (http://www.php.net). You can find plenty of articles and tutorials on the web on how to install Apache and PHP.

    Available methods for parsing an XML file

    There are two common approaches to reading XML files, no matter what the programming language is: SAX (Simple API for XML) and DOM (Document Object Model). I will briefly describe each of them and then choose the one that suits us best.

    SAX (Simple API for XML) is an event-based API. Every time a tag is opened or closed, or the parser finds some text, it calls a user-defined function for that event and passes it the node or text information. The advantage of a SAX parser is that it is really lightweight. The parser doesn't keep anything in memory for very long, so it can be used for extremely large files. The disadvantage is that writing the SAX event handlers takes some time and coding experience.

    The DOM (Document Object Model) defines a standard way of accessing and manipulating XML documents. It presents an XML document as a tree structure (a node tree), with the elements, attributes, and text defined as nodes. An API implementing the DOM standard reads the entire XML document into memory and provides a set of functions for manipulating the data. The drawback of this powerful method is that it is not recommended for large XML documents, because building the model of the document would take too much memory.

    Because people usually deal with normal-sized files, and not everybody has the time or skills to write a complete set of SAX handlers, we'll use the DOM method. So let's get started.

    As an RSS example we'll use the following file: http://www.softarea51.com/rss/windows/Web_Development/XML_CSS_Utilities.xml. A part of this file is given below.
    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
    	<channel>
    		<title>SoftArea51 - latest XML &amp; CSS Utilities software for Windows</title>
    		<link>http://www.softarea51.com/windows/Web_Development/XML_CSS_Utilities/LatestReleases-1.html</link>
    		<description>Try and buy latest XML &amp; CSS Utilities software for Windows</description>
    		<language>en-us</language>
    		<image>
    			<title>SoftArea51 - latest XML &amp; CSS Utilities software for Windows</title>
    			<url>http://www.softarea51.com/images/logo.gif</url>
    			<link>http://www.softarea51.com/</link>
    			<description>Try and buy latest XML &amp; CSS Utilities software for Windows</description>
    		</image>
    		<item>
    			<title>Feed Mix</title>
    			<link>http://www.softarea51.com/windows/Web_Development/XML_CSS_Utilities/Review-Feed_Mix.html</link>
    			<description>Feed Mix is a feature-rich RSS editor with the unique ability to create a new RSS feed from several others that already exist...</description>
    		</item>
    		<item>
    			<title>RSS Submit</title>
    			<link>http://www.softarea51.com/windows/Web_Development/XML_CSS_Utilities/Review-RSS_Submit.html</link>
    			<description>RSS Submit is the most powerful RSS feed promotion tool available...</description>
    		</item>
    		<item>
    			<title>PAD-Script</title>
    			<link>http://www.softarea51.com/windows/Web_Development/XML_CSS_Utilities/Review-PAD_Script.html</link>
    			<description>Avoid having to update all your PAD files whenever the PAD format changes...</description>
    		</item>
    		<item>
    			<title>PAD Data Extractor Tool</title>
    			<link>http://www.softarea51.com/windows/Web_Development/XML_CSS_Utilities/Review-PAD_Data_Extractor_Tool.html</link>
    			<description>Data Doctor XML PAD information extractor software tools extract important data from online website XML file...</description>
    		</item>
    		
    	</channel>
    </rss>
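
For comparison, here is a minimal sketch of how the SAX approach described earlier might handle a feed shaped like this one, using PHP's event-based XML Parser functions. The inline sample feed, its example.com links, and the variable names ($saxItems, $current, $tag) are assumptions made for illustration; they are not part of the original tutorial.

```php
<?php
// Small inline feed so the sketch runs without network access (assumed sample data).
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Sample channel</title>
    <item>
      <title>Feed Mix</title>
      <link>http://www.example.com/feed-mix.html</link>
    </item>
    <item>
      <title>RSS Submit</title>
      <link>http://www.example.com/rss-submit.html</link>
    </item>
  </channel>
</rss>
XML;

$saxItems = array(); // finished items
$current  = null;    // item being built, or null while outside <item>
$tag      = '';      // name of the element that was opened most recently

$parser = xml_parser_create('UTF-8');
// Keep element names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // Called for every opening tag.
    function ($p, $name, $attrs) use (&$current, &$tag) {
        if ($name === 'item') {
            $current = array('title' => '', 'link' => '');
        }
        $tag = $name;
    },
    // Called for every closing tag.
    function ($p, $name) use (&$saxItems, &$current, &$tag) {
        if ($name === 'item') {
            $saxItems[] = $current;
            $current = null;
        }
        $tag = '';
    }
);

// Called for text; it may arrive in several chunks, so append rather than assign.
xml_set_character_data_handler($parser, function ($p, $text) use (&$current, &$tag) {
    if ($current !== null && isset($current[$tag])) {
        $current[$tag] .= $text;
    }
});

xml_parse($parser, $xml, true);
```

Even for two fields, the handlers have to track which element is open and whether the parser is inside an item. That bookkeeping is the extra effort the DOM approach avoids.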
    
    To get the useful data out of the RSS file, we loop through the item nodes and extract the information we need. The script below parses the feed shown above:
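    <?php

        $doc = new DOMDocument();
        // load() returns false if the feed cannot be fetched or is not well-formed XML
        if (!$doc->load('http://www.softarea51.com/rss/windows/Web_Development/XML_CSS_Utilities.xml')) {
            die('Unable to load the RSS feed.');
        }

        // Return the text of the first $tagName element inside $node, or an empty
        // string when it is missing (item elements such as pubDate are optional)
        function getNodeText($node, $tagName) {
            $list = $node->getElementsByTagName($tagName);
            return $list->length > 0 ? $list->item(0)->nodeValue : '';
        }

        $arrFeeds = array();
        foreach ($doc->getElementsByTagName('item') as $node) {
            $arrFeeds[] = array(
                'title'       => getNodeText($node, 'title'),
                'description' => getNodeText($node, 'description'),
                'link'        => getNodeText($node, 'link'),
                'date'        => getNodeText($node, 'pubDate')
            );
        }

    ?>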
    
    The script starts by creating a new DOMDocument object and loading the RSS file into it with the load method. It then uses the getElementsByTagName method to get a list of all the elements with the given name (in our case 'item'). Within the loop over the item nodes, the script calls getElementsByTagName again on each item to find the title, description, link and pubDate elements and reads the nodeValue of each one. The nodeValue is the text within the node. Each set of values is stored in an array, and each of those arrays becomes one entry in the larger array that holds our structured RSS data.

    As you can see, the job was easy enough. All the data is now held in the $arrFeeds array, it is well structured, and you can display it using whatever layout you like.

    This tutorial was originally published on SoftArea51 at http://www.softarea51.com/tutorials/parse_rss_with_php.html.
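
As one possible way to display the parsed data, the sketch below turns an array shaped like $arrFeeds into an HTML list. The renderFeedList helper and the sample entries are hypothetical and were chosen only for illustration. The sketch escapes every value with htmlspecialchars, which treats descriptions as plain text; feeds that deliberately carry HTML in their descriptions would need different handling.

```php
<?php
// Stand-in for the $arrFeeds array built by the parsing script (assumed sample values).
$arrFeeds = array(
    array(
        'title'       => 'Feed Mix',
        'description' => 'A feature-rich RSS editor...',
        'link'        => 'http://www.example.com/feed-mix.html',
        'date'        => '',
    ),
    array(
        'title'       => 'Tips & Tricks',
        'description' => 'Hypothetical second entry.',
        'link'        => 'http://www.example.com/tips.html',
        'date'        => '',
    ),
);

// Hypothetical helper: build an HTML list with one linked entry per feed item.
function renderFeedList(array $feeds)
{
    $html = "<ul>\n";
    foreach ($feeds as $feed) {
        // Escape every value so feed text cannot inject markup into the page.
        $html .= sprintf(
            "  <li><a href=\"%s\">%s</a><br>%s</li>\n",
            htmlspecialchars($feed['link'], ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($feed['title'], ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($feed['description'], ENT_QUOTES, 'UTF-8')
        );
    }
    return $html . "</ul>\n";
}

$feedHtml = renderFeedList($arrFeeds);
echo $feedHtml;
```

Keeping the markup in one function means the layout can change without touching the parsing code.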

    About the author

    Paul George is a software engineer with a solid background in computer programming and teaching methods.

    http://www.softarea51.com

     