Byte 1: Displaying an RSS Feed

  • Description: Your final product will be a web-enabled application that can query and display an RSS feed (the address should be something like [yourpseudonym]-rss.appspot.com)
  • Source Code: See https://github.com/jmankoff/data, in Assignments/jmankoff-rss
  • For some assignments we will provide complete or partial source code that you can look at. It is recommended that you try to construct your own source code using the tutorial and only refer to the provided code as needed. This is especially important since we build up the source code iteratively in the tutorial, gradually replacing portions of it, and the provided source code only shows a single view (the final version). In addition, it will rarely be the case that you can use that source code entirely unmodified to complete an assignment.

Overview

In this project, you will create a small application that displays data from an RSS feed. The work you do in this project byte is something you will build on throughout this class. This assignment has the following learning goals:

  • Setting up your environment
  • A first experience with Python
  • Learning how to acquire data from an external source
  • A first experience with the RSS format
  • A first experience with programmable HTML
  • A first experience with forms
  • A first experience setting up a question and deciding what data helps to answer the question.

Detailed instructions for Byte 1

This project requires you to use Python 2.7 (please note the version number) and some additional libraries that are available for Python. To learn more about Python, you may want to explore www.pythontutor.com. The textbook Introduction to Computing and Programming in Python is an excellent introductory book aimed at non-programmers.

Setting Up Python using Google Apps

Google Apps is a development environment that will let you place your code on the web with relative ease. An excellent "Getting Started" tutorial will walk you through the initial creation of a simple application that displays plain text on the web.

  • You will need to select your language (select Python)
  • A framework (select Flask). You will also need to supply a project name (I used 'jmankoff-rss'). When you create a Google web application, you will need a unique identifier for it that no one else on the web has used. A good idea for the assignments in this class is to prefix them with a unique id you choose (you can use your username, but then students grading you may know who you are; an anonymous id is fine too).
  • Install google app engine on your machine (allow it to make command symlinks as you will want to do some things at the command line).

The tutorial is quite detailed and helpful. Be sure to follow it all the way through until you can load your website on the web.

Setting up a locally hosted version of your website

You will need to tell Google Appspot about your website to load and debug things locally. The google AppEngine Launcher application looks something like this. To add my app (jmankoff-rss) I clicked on the + button on the bottom left, navigated to the appropriate directory, and filled in other details.

You can ignore "Admin Port" for now. "Port" (which you would have designated at set up time) is the port on which your local application is running. You have to press the 'Run' button to actually visit this application. The default is 8080, in which case you can view the results of your code at http://localhost:8080/.

Installing Libraries

Before we start, let's make sure we've included the correct set of libraries in our application. For this project we will be using:

  • Bootstrap. Head over to http://getbootstrap.com/getting-started/ to download it. Unzip it, and move the subdirectories (css, fonts, js) into the main directory of your application ('jmankoff-rss/' in my case).
  • JQuery. Download the compressed production version at https://jquery.com/download/, and place it in the 'js' directory that you just added to your application.
  • Jinja2 (the Python Template Engine). Jinja is already installed, but you do need to tell appspot you are using it (see below)

Now, set up your app.yaml with the correct information:

# Handlers define how to route requests to your application.
handlers:
- url: /js
  static_dir: js
  application_readable: true
- url: /fonts
  static_dir: fonts
  application_readable: true
- url: /css
  static_dir: css
  application_readable: true
- url: /templates
  static_dir: templates
  application_readable: true
# ... and further down after a bunch of comments
libraries:
- name: jinja2
  version: latest
- name: webapp2
  version: latest

Customizing your Application

To begin serving HTML pages, follow the Jinja2 tutorial. We've already updated the app.yaml file. Now we have to update the main.py file to follow through. Add the following lines at the top of the file:

# Imports
import os
import jinja2
import webapp2
import logging
JINJA_ENVIRONMENT = jinja2.Environment(
    loader=jinja2.FileSystemLoader(os.path.dirname(__file__)),
    extensions=['jinja2.ext.autoescape'],
    autoescape=True)

And change the hello() route as follows:

@app.route('/')
def hello():
    template = JINJA_ENVIRONMENT.get_template('templates/index.html')
    return template.render()

Create a directory inside the [yourname]-byte1 folder named 'templates' and put a file named 'index.html' inside. 'index.html' should contain the following html.

<!DOCTYPE html>
<html>
<head>
<title>Byte 1 Tutorial</title>
</head>
<body>
<h1>Data Pipeline Project Byte 1 Example</h1>
</body>
</html>

The result, when you load it, should look like this:

Data Pipeline Project Rss Byte Example

Now we want to add some Bootstrap styling. The sample index.html and about.html files provided with your byte source code are based on a Bootstrap theme. You can view more themes on the Getting Started page at http://getbootstrap.com/getting-started/ and download source code for example themes. Just be aware that you will need to modify these themes to reflect the directory structure of your Google Appspot application. Specifically, you should use 'css/...' to refer to CSS files, and 'js/...' to refer to JavaScript files.

Debugging your Application

You can quickly and easily test your scripts as you go using a local web page. The Google App Engine Launcher gives you the information you need to do this: Edit main.py, save it, and check that your application is running (There should be a small green arrow to the left of it in the application launcher):

You can ignore "Admin Port" for now. "Port" (which you would have designated at set up time) is the port on which your local application is running. The default is 8080, in which case you can view the results of your code at http://localhost:8080/ [Note: my port in the image above is 8082 because I changed the default, and my corresponding URL would be localhost:8082]. The "Logs" window is also extremely helpful. You can output debugging text there by calling the Python function logging.info() (note the lowercase module name).

Thus, a very good debugging and editing cycle is [Edit main.py] [reload local web page] [check results and log to make sure your code is doing what you think it is] [rinse and repeat]
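As a minimal sketch of the logging step in that cycle, the snippet below logs a value and returns it unchanged. The helper name log_search_term is hypothetical, and the call to logging.basicConfig is an assumption that only matters when you run the snippet outside the development server, which normally configures logging for you.

```python
import logging

# Assumption: only needed when running this file on its own; it makes
# INFO-level messages visible on the console.
logging.basicConfig(level=logging.INFO)


def log_search_term(search_term):
    """Hypothetical helper: log the value we received and hand it back."""
    # The message shows up in the "Logs" window when run under the launcher.
    logging.info("search term received: %s", search_term)
    return search_term


log_search_term("dog")
```

Sprinkling calls like this through main.py lets you confirm that each step receives the value you expect before you look at the rendered page.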

Collecting information from your RSS feed in your application

Now that you can show HTML on the web using Bootstrap, it is time to show dynamic information from your RSS feed. To do so you will need to construct a url as follows:

http://www.bing.com/search?q=[search term]&format=rss

For example, if you want to search for dogs, use:

http://www.bing.com/search?q=dog&format=rss
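Search terms that contain spaces or symbols need to be escaped before they go into a URL. The sketch below is one way to build the URL with the standard library; the function name build_feed_url and the constant BING_SEARCH are illustrative names, not part of the provided source code. The try/except import keeps the snippet usable on both Python 2.7 and Python 3.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

BING_SEARCH = "http://www.bing.com/search"


def build_feed_url(search_term):
    """Return the RSS search URL for search_term, escaping spaces and symbols."""
    # A list of pairs keeps the parameters in a predictable order.
    query = urlencode([("q", search_term), ("format", "rss")])
    return BING_SEARCH + "?" + query


print(build_feed_url("dog"))
```

A term such as "hot dogs" is encoded with a plus sign in place of the space, so the server receives the full phrase as a single query.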

You will need to modify main.py to take data from this feed and display it on the page. There are several ways to do this, ranging from downloading and parsing the raw XML yourself to using a third-party library that specializes in feeds. This homework will walk you through the latter solution. We will use the feedparser library. Feedparser documentation is available at http://pythonhosted.org/feedparser.
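For comparison, the do-it-yourself route could look roughly like the sketch below, which uses Python's built-in xml.etree.ElementTree on a small hand-written RSS document. SAMPLE_RSS and parse_items are made up for illustration; a real feed has more fields and more irregularities, which is one reason to prefer a dedicated library.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; real feeds contain many more elements.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example results</title>
    <item>
      <title>First dog article</title>
      <link>http://example.com/1</link>
      <description>All about dogs.</description>
    </item>
    <item>
      <title>Second dog article</title>
      <link>http://example.com/2</link>
      <description>More about dogs.</description>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return a list of dicts, one per <item> element in the feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
        })
    return items


for entry in parse_items(SAMPLE_RSS):
    print(entry["title"], entry["link"])
```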

Because Feedparser is not part of the Python standard library, we will need to make sure Google has access to it. This requires downloading it and copying it (specifically, the file feedparser.py) into the same directory as main.py. Once this is done, you should be able to add import feedparser to the top of main.py.

Now that we have a way of testing the code that you write, let's talk about how to parse a feed. The basic approach is as follows:

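import feedparser
import logging

# Replace [search term] with the term you want to search for
feed = feedparser.parse("http://www.bing.com/search?q=[search term]&format=rss")
# Log each entry so you can inspect its fields in the Logs window
for item in feed["items"]:
    logging.info(item)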

Displaying the Feed Contents in Jinja

First, collect the information. We will take advantage of a Python shortcut here, the list comprehension, which builds a list from a loop in a single expression:

data = [{"link": item.link, "title": item.title, "description": item.summary_detail} for item in feed["items"]]
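If the one-line form is unfamiliar, the sketch below shows that it produces the same list as an ordinary loop. The Entry namedtuple is only a stand-in for the entries that feedparser returns, and the sample values are invented.

```python
from collections import namedtuple

# Stand-in for a feedparser entry; real entries expose many more fields.
Entry = namedtuple("Entry", ["link", "title", "summary_detail"])

entries = [
    Entry("http://example.com/1", "First", {"value": "Summary one"}),
    Entry("http://example.com/2", "Second", {"value": "Summary two"}),
]

# Ordinary loop: start with an empty list and append one dict per entry.
data_loop = []
for item in entries:
    data_loop.append({"link": item.link,
                      "title": item.title,
                      "description": item.summary_detail})

# List comprehension: the same work written as a single expression.
data = [{"link": item.link, "title": item.title, "description": item.summary_detail}
        for item in entries]

print(data == data_loop)
```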

One of the most powerful aspects of Jinja is its ability to display dynamic information provided by Python. We can pass one or more variables to Jinja by placing them in context:

return template.render(feed=data)

Next, update the 'index.html' file to show the information:

<h2>Feed Contents</h2>
<div class="panel panel-primary">
  {% for item in feed %}
  <div class="panel-heading">
    <h3 class="panel-title"><a href="{{ item["link"] }}">{{ item["title"] }}</a></h3>
  </div>
  <div class="panel-body">
    {{ item["description"]["value"] }}
  </div>
  {% endfor %}
</div>

Note the use of {% ... %}. This indicates some logic that should be executed (in this case a for loop). The contents of {{ ... }} are replaced with their value. The resulting output looks like this:

Letting the user control the search

We will start with a very simple form that you can add to your 'index.html' file:

<form action="search" method="POST">
  Search Term: <input name="search_term" value="cats">
  <input type="submit" value="Enter Search Term">
</form>

We will also need a way to display the search results. This involves placing some additional logic inside the 'index.html' file to display the search terms (aids debugging) and results. We use the if / endif statements for error checking: If the term isn't present, the page will still render.

{% if search: %}
<p>Searching for {{search}}</p>
{% endif %}

Now we need to collect the form data. This involves adding a handler for post to 'main.py' as follows:

    def post(self):
        logging.info("post")
        terms = self.request.get('search_term')
        context = {"search": terms}
        self.render_response('index.html', **context)

Note that the input name specified in 'index.html' and the string used in self.request.get need to match up for jinja to show anything. In the code above, 'search_term' will show up (see below) but since we have not provided any results, that part of the web page will not render.
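To see why the names have to match, it helps to look at what the browser sends: a form body made of name=value pairs keyed by each input's name attribute. The sketch below decodes such a body with the standard library. The helper get_field is hypothetical and only illustrates the lookup that the web framework performs for you.

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs  # Python 2.7


def get_field(form_body, name, default=""):
    """Return the first value submitted under `name`, or `default` if absent."""
    fields = parse_qs(form_body)
    values = fields.get(name)
    return values[0] if values else default


# Body sent by a form whose input is named "search_term".
body = "search_term=cats"
print(get_field(body, "search_term"))   # the name matches the input
print(get_field(body, "searchterm"))    # a misspelled name finds nothing
```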

When this is done, after you type a search term in, the web page at http://localhost:8080/ should show the following:

[... and so on]

Finally, we need to use the submitted search term. First we need to make sure that Flask knows we are accepting a POST from the search form (and check the result).

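# Add this import at the top of main.py; it gives access to the submitted form data
from flask import request

@app.route('/search', methods=['POST'])
def search():
    # "search_term" must match the name of the input in index.html
    term = request.form["search_term"]
    logging.info(term)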

Now we build the new search url (still inside the search method) and fetch the result for display.

    template = JINJA_ENVIRONMENT.get_template('templates/index.html')
    url = "http://www.bing.com/search?q=" + term + "&format=rss"
    feed = feedparser.parse(url)
    data = [{"link": item.link, "title": item.title, "description": item.summary_detail} for item in feed["items"]]
    # Pass the term along as well so the template can display it
    return template.render(feed=data, search=term)

Finally, if we want to show the default search term in the form input, we change it to

<form action="search" method="POST">
  Search Term: <input name="search_term" value="{{ search }}"><br>
  <input type="submit" value="Enter Search Term">
</form>

and change the get method in 'main.py' to pass in "cat" as the default search term:

context = {"feed" : feed, "search" : "cat"}

Now we should have a working search form. Here is an example showing the results of a search for "cats"

Setting up a question and answering it

At this point you should have a working version of code that answers a question I set out (what news articles are available about this topic), corresponding to the reference application http://jmankoff-rss.appspot.com/. The deeper thinking in this assignment requires that you select RSS data and display its contents in a way that corresponds to a question you have designed and answered. Note that the example code does not demonstrate this. This is the first step (and of course will eventually be much more iterative) in any data pipeline: figuring out what data you need to answer your question. You should:

  1. Identify a question and ensure that it is clear to the person viewing your assignment what that question is (this will probably involve modifying index.html to show the question).
  2. Identify data that you think can help answer that question (you are limited by what was introduced in this byte to data sources that are searchable, and return an RSS feed. If you want to go beyond that, see below). Here are some possibilities:
    • Learn how to use Bing to filter the search down to a specific type of page or topic. For example, to search only news about Obama on Bing, you can use http://www.bing.com/news/search?q=obama&format=rss
    • Use the search term to filter which items from an RSS feed are displayed, doing the filtering in your own code rather than on someone else's server
    • Feedspot (paid service) or Feedly (paid service, connects to IFTTT which could enable interesting things)
    • Trulia (real estate search -- I searched for 'Pittsburgh, PA')
    • Yahoo! finance (I searched for 'CAT')
  3. Display the results on the web page.
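The option of filtering in your own code can be sketched as follows. This is only an illustration: the function name filter_items is made up, and it assumes each item is a dict whose "title" and "description" values are plain strings, which is simpler than the summary_detail structure used earlier in this byte.

```python
def filter_items(items, term):
    """Keep the items whose title or description mentions term (case-insensitive)."""
    needle = term.strip().lower()
    if not needle:
        # An empty search term keeps everything.
        return list(items)
    return [
        item for item in items
        if needle in item.get("title", "").lower()
        or needle in item.get("description", "").lower()
    ]


# Invented sample data standing in for entries collected from a feed.
sample = [
    {"title": "Dog parks in Pittsburgh", "description": "Where to walk your dog."},
    {"title": "Cat cafes", "description": "Coffee with cats."},
    {"title": "City news", "description": "A new DOG shelter opens."},
]

for item in filter_items(sample, "dog"):
    print(item["title"])
```

Because the filtering happens in main.py, you can fetch a fixed feed once and let the user narrow it down without sending a new query to the feed provider.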

When this assignment was originally written, RSS was everywhere, and Yahoo! pipes even let you turn non RSS stuff into RSS. There is unfortunately much less right now, so if you find good feed ideas, post them to the class!

Questions you should be able to answer when you hand this in

What question does your application help to answer, and how does it let the end user answer that question?

Describe the flow of information from the end user (who enters a search term) until it is displayed back to the end user, in terms of the specific components relevant to the assignment (end user; jinja2; main.py; the RSS feed source).

Where is the URL for the working version of your assignment?

Taking it further (optional things to think about):

A working application based on this tutorial will be sufficient to get full credit for this byte. A reference version can be tried out at http://jmankoff-byte1.appspot.com/ You can get extra credit if you impress the grader. Below are some ideas:

  • Display results on a map or show summary statistics
  • Dynamically update your page as new information arrives
  • Play with other types of actions. For example, Pocket, which you can hook up to IFTTT to do interesting things, produces an RSS feed. The link for your Pocket account will be something like 'https://getpocket.com/users/[yourpocketusername]/feed/'
  • Trigger IFTTT recipes from your webpage using the 'Maker' channel (use http://requestb.in/ to debug)
  • Something else you think of that we didn't :)