In this project, you will create a small application that displays data from an RSS feed. The work you do in this project byte is something you will build on throughout this class. This assignment has the following learning goals:
This project requires you to use Python 2.7 (please note the version number) and some additional libraries that are available for Python. To learn more about Python, you may want to explore www.pythontutor.com. The textbook Introduction to Computing and Programming in Python is an excellent introductory book aimed at non-programmers.
Google App Engine is a development environment that will let you place your code on the web with relative ease. An excellent "Getting Started" tutorial walks you through creating a simple application that displays plain text on the web.
The tutorial is quite detailed and helpful. Be sure to follow it all the way through until you can load your website on the web.
You will need to tell Google App Engine about your website to load and debug things locally. The Google App Engine Launcher application looks something like this. To add my app (jmankoff-rss), I clicked the + button on the bottom left, navigated to the appropriate directory, and filled in the other details.
You can ignore "Admin Port" for now. "Port" (which you would have designated at setup time) is the port on which your local application is running. You have to press the 'Run' button to start the application before you can visit it. The default port is 8080, in which case you can view the results of your code at http://localhost:8080/.
Before we start, let's make sure we've included the correct set of libraries in our application. For this project we will be using:
Now, set up your app.yaml with the correct information:
# Handlers define how to route requests to your application.
handlers:
- url: /js
  static_dir: js
  application_readable: true
- url: /fonts
  static_dir: fonts
  application_readable: true
- url: /css
  static_dir: css
  application_readable: true
- url: /templates
  static_dir: templates
  application_readable: true
# ... and further down after a bunch of comments
libraries:
- name: jinja2
  version: latest
- name: webapp2
  version: latest
To begin serving HTML pages, follow the Jinja2 tutorial. We've already updated the app.yaml file; now we have to update main.py to match. Add the following lines at the top of the file:
# Imports
import os
import jinja2
import webapp2
import logging

# Load templates from paths relative to the directory that contains main.py
JINJA_ENVIRONMENT = jinja2.Environment(
    loader=jinja2.FileSystemLoader(os.path.dirname(__file__)),
    extensions=['jinja2.ext.autoescape'],
    autoescape=True)
And change the hello() route as follows:
@app.route('/')
def hello():
    template = JINJA_ENVIRONMENT.get_template('templates/index.html')
    return template.render()
Create a directory inside the [yourname]-byte1 folder named 'templates' and put a file named 'index.html' inside it. 'index.html' should contain the following HTML:
<!DOCTYPE html>
<html>
  <head>
    <title>Byte 1 Tutorial</title>
  </head>
  <body>
    <h1>Data Pipeline Project Byte 1 Example</h1>
  </body>
</html>
The result, when you load it, should look like this:
Now we want to add some Bootstrap styling. The sample index.html and about.html files provided with your byte source code are based on a Bootstrap theme. You can view more themes on the Getting Started page at http://getbootstrap.com/getting-started/ and download source code for example themes. Just be aware that you will need to modify these themes to reflect the directory structure of your App Engine application. Specifically, you should use 'css/...' to refer to CSS files and 'js/...' to refer to JavaScript files.
You can quickly and easily test your scripts as you go using a local web page. The Google App Engine Launcher gives you the information you need to do this: edit main.py, save it, and check that your application is running (there should be a small green arrow to the left of it in the Launcher):
As mentioned earlier, "Port" is the port on which your local application is running; with the default of 8080, you can view the results of your code at http://localhost:8080/. [Note: my port in the image above is 8082 because I changed the default, so my corresponding URL would be http://localhost:8082/.] The "Logs" window is also extremely helpful. You can output debugging text there by using the Python command logging.info().
Thus, a very good debugging and editing cycle is: [edit main.py] [reload local web page] [check results and log to make sure your code is doing what you think it is] [rinse and repeat]
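If you ever run a piece of your code with plain Python outside the Launcher, note that Python's standard logging module only shows WARNING and above by default, so logging.info() messages seem to vanish. The sketch below, assuming a plain local Python run, lowers the root logger's level to INFO and attaches a handler that collects messages so you can inspect them; the ListHandler class is hypothetical, not part of App Engine or the course code.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that stores each log message in a list."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        # getMessage() applies the %-style arguments to the format string
        self.messages.append(record.getMessage())


root = logging.getLogger()
root.setLevel(logging.INFO)  # the root logger's default level is WARNING
capture = ListHandler()
root.addHandler(capture)

logging.info("searching for %s", "cats")      # INFO: recorded
logging.debug("below INFO, so it is dropped")  # DEBUG: filtered out by the level
```

After these calls, capture.messages holds only the INFO message, which is the same filtering the level setting applies to whatever handler actually prints your logs.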
Now that you can show HTML on the web using Bootstrap, it is time to show dynamic information from your RSS feed. To do so, you will need to construct a URL as follows:
http://www.bing.com/search?q=[search term]&format=rss
For example, if you want to search for dogs, use:
http://www.bing.com/search?q=dog&format=rss
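Pasting a term straight into the URL works for a single word like dog, but spaces and symbols such as + or & must be URL-encoded first. A minimal standard-library sketch (the helper name bing_rss_url is hypothetical); the try/except import lets it run on both Python 2.7, which this project uses, and Python 3:

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

BING_SEARCH = "http://www.bing.com/search"


def bing_rss_url(term):
    """Hypothetical helper: build the Bing RSS search URL for a search term."""
    # A list of pairs (not a dict) keeps q before format on Python 2.7 as well;
    # urlencode escapes each value, turning spaces into + and & into %26.
    query = urlencode([("q", term), ("format", "rss")])
    return BING_SEARCH + "?" + query
```

For example, bing_rss_url("hot dogs") returns http://www.bing.com/search?q=hot+dogs&format=rss. On Python 2.7, encode non-ASCII unicode terms to UTF-8 bytes before passing them in, since its urlencode calls str() on each value.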
You will need to modify main.py to take data from this feed and display it on the page. There are several ways to do this, ranging from downloading and parsing the raw XML yourself to using a third-party library that specializes in feeds. This homework will walk you through the latter solution. We will use the feedparser library. Feedparser documentation is available at http://pythonhosted.org/feedparser.
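For comparison, here is roughly what the do-it-yourself route involves. This sketch swaps in the standard library's xml.etree.ElementTree instead of feedparser and runs on a small made-up RSS document rather than a live Bing response; SAMPLE_RSS and parse_items are hypothetical names. It only handles plain RSS 2.0 item elements, whereas feedparser also copes with Atom, malformed feeds, dates, and character encodings, which is why this project uses it.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 document standing in for a real search feed
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>dog - Bing</title>
    <item>
      <title>First result</title>
      <link>http://example.com/1</link>
      <description>About dogs</description>
    </item>
    <item>
      <title>Second result</title>
      <link>http://example.com/2</link>
      <description>More about dogs</description>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return one dict per <item>, with its link, title, and description text."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):  # finds <item> elements at any depth
        items.append({
            "link": item.findtext("link"),
            "title": item.findtext("title"),
            "description": item.findtext("description"),
        })
    return items
```

Calling parse_items(SAMPLE_RSS) returns two dictionaries, one per item, in document order.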
Because feedparser is not part of the Python standard library, we will need to make sure Google has access to it. This requires downloading it and copying it (specifically, the file feedparser.py) into the same directory as main.py. Once this is done, you should be able to add import feedparser to the top of main.py.
Now that we have a way of testing the code that you write, let's talk about how to parse a feed. The basic approach is as follows:
import feedparser
import logging

# Replace [search term] with the term you want to search for
feed = feedparser.parse("http://www.bing.com/search?q=[search term]&format=rss")
for item in feed["items"]:
    logging.info(item)
First, collect the information. We will take advantage of a Python list comprehension, a compact way to build a list with a loop in a single expression:
data = [{"link": item.link, "title": item.title, "description": item.summary_detail} for item in feed["items"]]
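If list comprehensions are new to you, the one-liner above builds exactly the same list as an explicit for loop. A small sketch comparing the two, using a hypothetical FakeEntry class as an offline stand-in for feedparser's entries (real entries also allow this attribute access, and their summary_detail is a dictionary-like object whose "value" key holds the text):

```python
class FakeEntry(object):
    """Hypothetical stand-in for a feedparser entry, for offline experiments."""

    def __init__(self, link, title, summary):
        self.link = link
        self.title = title
        # Mimics summary_detail: a dict whose "value" key holds the summary text
        self.summary_detail = {"type": "text/plain", "value": summary}


def collect_with_loop(entries):
    """Build the list of dicts one item at a time with an explicit loop."""
    data = []
    for item in entries:
        data.append({"link": item.link, "title": item.title,
                     "description": item.summary_detail})
    return data


def collect_with_comprehension(entries):
    """Build the same list with a single list comprehension."""
    return [{"link": item.link, "title": item.title, "description": item.summary_detail}
            for item in entries]


entries = [FakeEntry("http://example.com/1", "First result", "About dogs"),
           FakeEntry("http://example.com/2", "Second result", "More about dogs")]
```

Both functions return equal lists for the same entries; the comprehension is simply the shorter spelling.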
One of the most powerful aspects of Jinja is its ability to display dynamic information provided by Python. We can pass one or more variables to Jinja as keyword arguments to render(), which places them in the template's context:
return template.render(feed=data)
Next, update the 'index.html' file to show the information:
<h2>Feed Contents</h2>
<div class="panel panel-primary">
  {% for item in feed %}
    <div class="panel-heading">
      <h3 class="panel-title"><a href="{{ item["link"] }}">{{ item["title"] }}</a></h3>
    </div>
    <div class="panel-body">
      {{ item["description"]["value"] }}
    </div>
  {% endfor %}
</div>
Note the use of {% ... %}. This indicates some logic that should be executed (in this case a for loop). The contents of {{ ... }} are replaced with their value. The resulting output looks like this:
We will start with a very simple form that you can add to your 'index.html' file:
<form action="search" method="POST">
Search Term: <input name="search_term" value="cats">
<input type="submit" value="Enter Search Term">
</form>
We will also need a way to display the search results. This involves placing some additional logic inside the 'index.html' file to display the search term (which aids debugging) and the results. We use the if / endif statements for error checking: if the term isn't present, the page will still render.
{% if search %}
  <p>Searching for {{ search }}</p>
{% endif %}
Now we need to collect the form data. This involves adding a route to 'main.py' that accepts the form's POST request. It uses Flask's request object, so make sure main.py imports it (from flask import request):
@app.route('/search', methods=['POST'])
def search():
    logging.info("post")
    terms = request.form.get('search_term', '')
    template = JINJA_ENVIRONMENT.get_template('templates/index.html')
    return template.render(search=terms)
Note that the input name specified in 'index.html' and the key used in request.form need to match up for Jinja to show anything. In the code above, 'search_term' will show up (see below), but since we have not provided any results, that part of the web page will not render.
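To see why the names must match, it helps to look at what the browser actually sends: a standard HTML form POST encodes each input as name=value pairs in the request body (application/x-www-form-urlencoded), and the server looks values up by name. A minimal standard-library sketch of that lookup; the form_value helper is hypothetical, since Flask and webapp2 do this parsing for you:

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs  # Python 2.7


def form_value(body, name, default=""):
    """Return the first value submitted for `name`, or `default` if it is absent."""
    fields = parse_qs(body)  # maps each input name to a list of values
    values = fields.get(name)
    return values[0] if values else default


# Roughly the body a browser sends for <input name="search_term"> containing "hot dogs"
body = "search_term=hot+dogs"
```

Here form_value(body, "search_term") returns "hot dogs", while asking for a name the form never used, such as "search", falls back to the empty default. That mismatch is exactly why nothing shows up when the two names differ.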
When this is done, after you type a search term in, the web page at http://localhost:8080/ should show the following:
[... and so on]
Finally, we need to use the search term result. First, make sure that Flask knows we are accepting a POST from the search form (and log the term to check it):
@app.route('/search', methods=['POST'])
def search():
    term = request.form["search_term"]
    logging.info(term)
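Now we build the new search URL (still inside the search() function), fetch the result, and render it. The term is URL-encoded with urllib.quote_plus (Python 2.7; add import urllib at the top of main.py) so that terms containing spaces or symbols still produce a valid URL, and it is passed back to the template so the page can show what was searched for:
    template = JINJA_ENVIRONMENT.get_template('templates/index.html')
    url = "http://www.bing.com/search?q=" + urllib.quote_plus(term.encode('utf-8')) + "&format=rss"
    feed = feedparser.parse(url)
    data = [{"link": item.link, "title": item.title, "description": item.summary_detail} for item in feed["items"]]
    return template.render(feed=data, search=term)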
Finally, if we want to show the default search term in the form input, we change the form to:
<form action="search" method="POST">
  Search Term: <input name="search_term" value="{{ search }}"><br>
  <input type="submit" value="Enter Search Term">
</form>
and change the hello() route in 'main.py' to pass in "cat" as the default search term, alongside the feed data:
    return template.render(feed=data, search="cat")
Now we should have a working search form. Here is an example showing the results of a search for "cats"
At this point you should have a working version of code that answers a question I set out (what news articles are available about this topic?), corresponding to the reference application http://jmankoff-rss.appspot.com/. The deeper thinking in this assignment requires that you select RSS data and display its contents in a way that corresponds to a question you have designed and answered. Note that the example code does not demonstrate this. This is the first step (and of course it will eventually be much more iterative) in any data pipeline: figuring out what data you need to answer your question. You should:
When this assignment was originally written, RSS was everywhere, and Yahoo! Pipes even let you turn non-RSS content into RSS. There is unfortunately much less RSS available now, so if you find good feed ideas, post them to the class!
What question does your application help to answer, and how does it let the end user answer that question?
Describe the flow of information from the end user (who enters a search term) until it is displayed back to the end user, in terms of the specific components relevant to the assignment (end user; Jinja2; main.py; feedparser; the RSS feed).
Where is the URL for the working version of your assignment?
A working application based on this tutorial will be sufficient to get full credit for this byte. A reference version can be tried out at http://jmankoff-byte1.appspot.com/ You can get extra credit if you impress the grader. Below are some ideas:
'https://getpocket.com/users/[yourpocketusername]/feed/'
Or you can use a form to trigger IFTTT actions using