Recently, I was digging through some of my old code and came across a lot of projects that revolve around Google. One of them was an app I wrote back when Google Trends first came out: a simple application that scraped the Google Trends page to see what the hottest trends were. I don’t remember exactly why I wrote it, but I apparently had a reason at the time. Unfortunately, the app’s simple scraping mechanism no longer works with the updated Google Trends site. So, instead of throwing out the code, I decided to bring it up to date. Besides, who knows when I might need something like this in the future. Since I’m sure there are plenty of others out there who could use this code right now, I’m going to take a minute to share it with you.

To begin with, the original method of simply scraping the Google Trends homepage at http://www.google.com/trends/ no longer works because Google has changed the way it generates its pages. Like most other Google pages, Google Trends now builds its entire user interface on the fly using JavaScript. However, Google still makes it easy to get the current hottest trends by providing an RSS feed at http://www.google.com/trends/hottrends/atom/feed. I’ve shown how to create RSS readers in the past, so this won’t be anything new, but this time I will use a simpler method for parsing the XML response: the XmlDocument class along with its SelectNodes and SelectSingleNode methods. Since the Google Trends feed also uses namespaces, I will make use of the XmlNamespaceManager as well. Some of the items in the feed include markup, so I am also adding a simple method that strips away any HTML, leaving us with clean text to work with.

The first thing you need to do to get the XML from the Google Trends RSS feed is create a new WebClient object. Using that object, you can download the entire XML response as text by calling the DownloadString method and passing it the URL of the feed.

WebClient wc = new WebClient();
// The URL must be passed as a quoted string literal
String html = wc.DownloadString("http://www.google.com/trends/hottrends/atom/feed");
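If you want the download step to be a little more forgiving, here is a minimal sketch of one way to wrap it. The DownloadFeed helper and the DownloadSketch class are names I made up for illustration, and setting the encoding to UTF-8 is an assumption about how the feed is served, so treat this as a starting point rather than the definitive approach.

```csharp
using System;
using System.Net;
using System.Text;

class DownloadSketch
{
    // Hypothetical helper: returns the feed text, or null if the request fails
    static string DownloadFeed(string url)
    {
        // WebClient implements IDisposable, so "using" releases it when we are done
        using (WebClient wc = new WebClient())
        {
            // Assumption: the feed is served as UTF-8
            wc.Encoding = Encoding.UTF8;
            try
            {
                return wc.DownloadString(url);
            }
            catch (WebException ex)
            {
                // Network failures and HTTP errors surface as WebException
                Console.WriteLine("Could not download the feed: " + ex.Message);
                return null;
            }
        }
    }

    static void Main()
    {
        string xml = DownloadFeed("http://www.google.com/trends/hottrends/atom/feed");
        Console.WriteLine(xml != null ? "Downloaded " + xml.Length + " characters" : "No feed available");
    }
}
```

Returning null on failure keeps the sketch short; in a real app you might prefer to let the exception bubble up or retry.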

Once you have the XML as a string, you will want to load it into a new XmlDocument by calling the LoadXml method.

XmlDocument doc = new XmlDocument();
doc.LoadXml(html);
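LoadXml throws an XmlException when the text it receives is not well-formed XML, which can happen if the server returns an error page instead of the feed. Here is a small sketch of guarding against that. The XML string below is a hypothetical stand-in for the downloaded feed, not real Google Trends data.

```csharp
using System;
using System.Xml;

class LoadXmlSketch
{
    static void Main()
    {
        // Hypothetical stand-in for the text returned by DownloadString
        string xml = "<rss><channel><item><title>Example trend</title></item></channel></rss>";

        XmlDocument doc = new XmlDocument();
        try
        {
            doc.LoadXml(xml);
            // DocumentElement is the root element of the parsed document
            Console.WriteLine("Root element: " + doc.DocumentElement.Name); // prints "Root element: rss"
        }
        catch (XmlException ex)
        {
            // Raised when the input is not well-formed XML
            Console.WriteLine("Feed was not valid XML: " + ex.Message);
        }
    }
}
```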

Before we go too far, we need to set up the XmlNamespaceManager object, which we will use later.

XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
// Map the "ht" prefix to the namespace URI used by the feed
nsmgr.AddNamespace("ht", "http://www.google.com/trends/hottrends");

Next, you will want to set up a new XmlNodeList that will contain all of the items found within your XML document. Using a simple XPath expression, we can navigate down the element chain and extract every occurrence of “<item>” like this:

XmlNodeList items = doc.SelectNodes("//rss/channel/item");

Now that you have a list of all items in the XML, you can iterate over the list using a standard “foreach” loop. Along the way, you can pluck out the information you want by again using the SelectSingleNode and SelectNodes methods. For the purposes of this article, we will pluck out the title and description of each item. Since some items do not include descriptions, SelectSingleNode can return null, so you will want to include a null check before calling “.Value” on the description node. Calling “.Value” on a text node is what returns the content between the open and close tags. For example, if we have the following XML:

<?xml version="1.0" encoding="UTF-8" ?>
<item>
<title>Prodigy Productions, LLC</title>
</item>



we can access “Prodigy Productions, LLC” by calling doc.SelectSingleNode("//item/title/text()").Value.
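To see both the happy path and the null check in action, here is a short sketch that runs against the sample XML above. The class name is my own, and the missing description element is deliberate so the null branch gets exercised.

```csharp
using System;
using System.Xml;

class SelectSingleNodeSketch
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<item><title>Prodigy Productions, LLC</title></item>");

        // text() selects the text node inside <title>, so .Value is its content
        XmlNode titleText = doc.SelectSingleNode("//item/title/text()");
        Console.WriteLine(titleText.Value); // prints "Prodigy Productions, LLC"

        // There is no <description> here, so SelectSingleNode returns null
        XmlNode descriptionText = doc.SelectSingleNode("//item/description/text()");
        string description = descriptionText != null ? descriptionText.Value : "";
        Console.WriteLine("Description length: " + description.Length); // prints "Description length: 0"
    }
}
```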

Alrighty, now that we know how to extract certain portions of content from the XML, we can use the same technique to extract the list of news items associated with each hot trend. When you start pulling the titles and snippets from each news item, you will notice that these elements are prefixed with “ht:”, as in “ht:news_item”. This is where the XmlNamespaceManager comes into play. To use it, all you have to do is pass it as a second argument to the SelectNodes and SelectSingleNode methods like this:

XmlNodeList news_items = item.SelectNodes("ht:news_item", nsmgr);
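Here is a self-contained sketch of the namespace lookup so you can try it without hitting the feed. The XML is a made-up miniature of the feed's structure, with placeholder headlines rather than real data. As far as I know, leaving out the nsmgr argument on a prefixed query makes the call throw an XPathException, so the second argument is not optional here.

```csharp
using System;
using System.Xml;

class NamespaceSketch
{
    static void Main()
    {
        // Hypothetical miniature of the feed; the headlines are placeholders
        string xml =
            "<rss xmlns:ht=\"http://www.google.com/trends/hottrends\"><channel><item>" +
            "<title>Example trend</title>" +
            "<ht:news_item><ht:news_item_title>First headline</ht:news_item_title></ht:news_item>" +
            "<ht:news_item><ht:news_item_title>Second headline</ht:news_item_title></ht:news_item>" +
            "</item></channel></rss>";

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // The prefix used in XPath is matched by namespace URI, not by the prefix text in the document
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("ht", "http://www.google.com/trends/hottrends");

        XmlNode item = doc.SelectSingleNode("//rss/channel/item");
        XmlNodeList newsItems = item.SelectNodes("ht:news_item", nsmgr);
        Console.WriteLine(newsItems.Count); // prints 2

        foreach (XmlNode newsItem in newsItems)
        {
            // Prints "First headline", then "Second headline"
            Console.WriteLine(newsItem.SelectSingleNode("ht:news_item_title/text()", nsmgr).Value);
        }
    }
}
```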

Another thing you might or might not care about when working with the news items is that they typically contain extra markup, which makes it easy to style them when displaying the results in an HTML page. But since we are writing our results to the console, we don’t need this extra markup. For that, we can use our good friend Regex and pass it a simple pattern as shown here:

        const string HTML_TAG_PATTERN = "<.*?>";
        public static string StripHTML(string inputString)
        {
            return Regex.Replace(inputString, HTML_TAG_PATTERN, string.Empty);
        }
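Here is a quick sketch of StripHTML at work on a made-up snippet. Two things worth knowing: the pattern only removes tags, so HTML entities such as &amp; are left alone, and because “.” does not match newlines by default, a tag that spans multiple lines would not be removed. If you also want entities decoded, WebUtility.HtmlDecode from System.Net is one option; using it here is my own addition, not part of the original app.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

class StripHtmlSketch
{
    const string HTML_TAG_PATTERN = "<.*?>";

    static string StripHTML(string inputString)
    {
        // The lazy quantifier stops at the first '>' so each tag is removed separately
        return Regex.Replace(inputString, HTML_TAG_PATTERN, string.Empty);
    }

    static void Main()
    {
        // Hypothetical snippet, similar in shape to what a news item might contain
        string snippet = "<b>Google</b> Trends &amp; <i>news</i>";

        string stripped = StripHTML(snippet);
        Console.WriteLine(stripped);                        // prints "Google Trends &amp; news"
        Console.WriteLine(WebUtility.HtmlDecode(stripped)); // prints "Google Trends & news"
    }
}
```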

That’s it. You are now ready to get the current hottest trends on Google. Below is the entire code used for this article. You can also download my entire Solution project from http://www.prodigyproductionsllc.com/downloads/GoogleTrends.zip. Enjoy!

using System;
using System.Text.RegularExpressions;
using System.Net;
using System.Xml;

namespace GoogleTrends
{
    class Program
    {
        static void Main(string[] args)
        {
            // Download the raw feed XML; WebClient is IDisposable, so release it when done
            String html;
            using (WebClient wc = new WebClient())
            {
                html = wc.DownloadString("http://www.google.com/trends/hottrends/atom/feed");
            }

            XmlDocument doc = new XmlDocument();
            doc.LoadXml(html);

            XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
            nsmgr.AddNamespace("ht", "http://www.google.com/trends/hottrends");

            XmlNodeList items = doc.SelectNodes("//rss/channel/item");
            foreach (XmlNode item in items)
            {
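                // SelectSingleNode returns null for a missing or empty element, so check before reading .Value
                XmlNode titleNode = item.SelectSingleNode("title/text()");
                string title = titleNode != null ? titleNode.Value : "";
                XmlNode descriptionNode = item.SelectSingleNode("description/text()");
                string description = descriptionNode != null ? descriptionNode.Value : "";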

                Console.WriteLine("Title: " + title);
                Console.WriteLine("Description: " + description);

                XmlNodeList news_items = item.SelectNodes("ht:news_item", nsmgr);
                foreach (XmlNode news_item in news_items)
                {
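                    // A news item may lack a title or snippet, so guard against null nodes
                    XmlNode newsTitleNode = news_item.SelectSingleNode("ht:news_item_title/text()", nsmgr);
                    XmlNode newsSnippetNode = news_item.SelectSingleNode("ht:news_item_snippet/text()", nsmgr);
                    string news_title = newsTitleNode != null ? StripHTML(newsTitleNode.Value) : "";
                    string news_snippet = newsSnippetNode != null ? StripHTML(newsSnippetNode.Value) : "";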

                    Console.WriteLine(" - News Title: " + news_title);
                    Console.WriteLine(" - News Snippet: " + news_snippet + Environment.NewLine);
                }

                Console.WriteLine(Environment.NewLine);
            }

            Console.ReadLine(); // This is here so we can view the trends before the app closes
        }

        const string HTML_TAG_PATTERN = "<.*?>";
        public static string StripHTML(string inputString)
        {
            return Regex.Replace(inputString, HTML_TAG_PATTERN, string.Empty);
        }
    }
}
