Assignments‎ > ‎

Assignment 4

posted ‎‎Nov 13, 2008 2:41 PM‎‎ by Michael Armbrust   [ updated ‎‎Dec 4, 2008 4:59 PM‎‎ ]

Overview

In this assignment, you will be creating a simple web application called MyTunes using Ruby on Rails with SQLlite as your backend database. Your website will store information about music, like songs and artists, similar to the example from discussion section earlier in the semester.  You will be practicing the database design process by designing a relational schema to represent the data we give you, defining necessary DDL statements, and writing interesting SQL queries against your database.  The technologies used in this assignment are described in more detail below.

We give you a data file in XML that describes a collection of songs, and you will be responsible for designing a relational schema to represent this data, using Ruby to parse the XML file and populate your database.  After you have populated your database, you will write DML statements to answer the queries we specify. 

Part of this assignment is presenting a front-end view to the results of your queries. Ruby on Rails uses the Model-View-Controller paradigm. Your queries will be specified in a Controller and Views will display records matching each query in your web browser. While you won't be graded on aesthetics, feel free to be as creative as you'd like with the appearance of the Views. See the MVC description below.

Administrivia

  • This assignment may be done in partners
  • It will be due December 8, 2008 at 11:59pm

Key Technologies

In this section we describe some of the tools you will be using to complete this assignment.

Ruby

Ruby is a dynamic, open-source, scripting language.  While learning ruby is obviously not a major focus of this class you will need a cursory understanding in order to complete this assignment. We recommend looking at some of the following resources before you get started. Since this is a new language for some people, please don't hesitate to ask the TAs or newgroup questions about ruby basics, as we don't want you to waste time fighting with the language.

XML

The Extensible Markup Language (XML) is a general-purpose specification for describing hierarchical data.  If you know HTML it will probably look very familiar to you.  While the full specification and validation of XML documents can be fairly complex, this should not effect you when completing this assignment.  Below is an example of the XML schema we will be giving you:

 <data>
    <song title="SexyBack (radio version)" length="208.927" sha1="13a4fd349b19aa1a1007c8c1f48a2b53">
        <artist>Justin Timberlake</artist>
        <artist>Timbaland</artist>
        <album>FutureSex/LoveSounds</album>
        <album>SexyBack (single)</album>
    </song>     <user username='michael' password='test'>
<rating song='13a4fd349b19aa1a1007c8c1f48a2b53' score='10'/>
    </user>
</data>

Each element is represented between <>.  Elements can contain child elements (such as artist with in song), attributes (such as title within song), or just plain text (such as Justin Timberlake inside of <artist></artist>.  Some elements have a <start> and </end> tag, and some (like rating) are empty-element tags.  We will be using a Ruby library, REXML, to parse the xml and convert it to a relational schema of your design.  Example code is provided in the project skeleton.

In ruby we have written most of the code for you to parse the XML.  You can access attributes (song.attributes['sha1']) and the text of the entries (artist.text.strip) from the objects passed to the lambda functions you will need to write.

RubyOnRails

Overview

Ruby On Rails is an open-source web development framework written in Ruby.  Its goal is to make it easy to rapidly design, implement and deploy database backed, interactive websites.  We will be providing you with a partially implemented site so that you can focus on schema/query desgin and not on learning the ins and outs of Ruby/Rails.  However, if you are interested in learning more there are a number of great quick start tutorials and screencasts that might help you understand some of the black magic that is going on behind the scenes.

Get Michael's live coding example from class

Understanding MVC

Rails is based around the Model-View-Controller design pattern. 

<wikipedia>
It is common to split an application into separate layers that run on different computers: presentation (UI), domain logic, and data access. In MVC the presentation layer is further separated into view and controller.  MVC is often seen in web applications, where the view is the actual HTML page, and the controller is the code that gathers dynamic data and generates the content within the HTML. Finally, the model is represented by the actual content, usually stored in a database, and the business rules that transform that content based on user actions.

Though MVC comes in different flavors, control flow generally works as follows:

  1. The user interacts with the user interface in some way (e.g. presses a button).
  2. A controller handles the input event from the user interface, often via a registered handler or callback.
  3. The controller notifies the model of the user action, possibly resulting in a change in the model's state. (e.g. controller updates user's Shopping cart).[4]
  4. A view uses the model (indirectly) to generate an appropriate user interface (e.g. the view produces a screen listing the shopping cart contents). The view gets its own data from the model. The model has no direct knowledge of the view.
  5. The user interface waits for further user interactions, which begins then a new cycle.

By decoupling models and views, MVC helps to reduce the complexity in architectural design, and to increase flexibility and reuse.

</wikipedia>

You will see this functionality separated into directories in app/ in the skeleton provided. 

There are three controllers:

  • application.rb - This is the parent controller for all other controllers in your app.  You should not need to put any code here.
  • data_controller.rb - This controller has only a single method reload that takes in an XML document as a parameter.  You can access it by pointing your browser to http://<server>:<port>/data/reload.  You will need to write the code that deletes all data from your database, parses the XML, and inserts it into the relational schema you have designed.
  • query_controller.rb - This controller has a number of functions that run interesting queries users might want to run over the their myTunes database.  Comments are provided describing the specs for each one.  You can test them by visiting http://<server>:<port>/query/.

ActiveRecord

ActiveRecord is an Object-Relational Mapping library used by RubyOnRails.  It's fairly full featured and makes a number of assumptions about the layout of your schema.  If you are familiar with this library feel free to use any basic (no plug-ins) features you. If not, don't worry, as we have provided a list of these conventions and overrides, along with few simple example below about using ActiveRecord to run arbitrarily complex database queries over the Boats/Sailors example from lecture.

Conventions:
  • Tables are the plural of the class name for any active record instance.  (I.e. class Sailor < ActiveRecord:Base would refer to data in the Sailors table)
  • The primary key for any entity is id.
    • You can override this by putting set_primary_key :key or set_primary_keys [:key1, :key2] at the top of the class definition
  • Foreign keys are always named <foreign table name>_id
    • You can override this by specifiying :foreign_key => [:song] after the belongs_to or has_many
Example:

Sailors(sid, sname, rating, age)
Boats(bid, bname, color)
Reserves(sid, bid, day)

class Sailor < ActiveRecord:Base
    set_primary_key :sid
    has_many :reservations, :foreign_key => :sid
end


class Boat < ActiveRecord:Base
    set_primary_key :bid
    has_many :reservations, :foreign_key => :bid
end


class Reservation < ActiveRecord:Base
    set_primary_keys :sid, :bid, :day
    belongs_to :sailor, :foreign_key => :sid
    belongs_to :boat, :foreign_key => :bid
end


Make a new sailor:
s = Sailor.new
s[:sid] = 1
s[:sname] = "Michael"
s[:rating] = 10
s[:age] = 24
s.save


Find a sailor by name:
s = Sailor.find_by_sname("Michael")

Find all red boats:
b = Boat.find_all_by_color("red")

Find all reservations by a sailor:
s = Sailor.find_by_sname("Michael")
r = s.reservations

These are the basics, but sometimes you want to be able to write more complex queries.  For this you can call back on full SQL.  If you wanted all sailors who have reserved a boat today you could write.
s = Sailor.find_by_sql("SELECT DISTINCT sailor.* FROM sailors JOIN reservations ON sailors.sid = reservations.sid WHERE reservation.day = <today>")

You can even add computed collumns.  For example to find all sailors along with their reservation counts you could write:
s = Sailor.find_by_sql("SELECT sailor.*, count(*) as reservation_count JOIN reservations ON sailors.sid = reservations.sid GROUP BY sailors.sid")
s[0] #First sailor in the list
s[0][:sname] #name of the first sailor in the list
s[0][:reservation_count] #Reservation count for the first sailor in the list

Sqlite3

SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.  Unlike postgres which uses a client server model, sqlite is an in-process database.  This simplifies deployment and removes many of the problems people had in previous labs getting postgres to run.

Creating a new database is as easy as typing: sqlite3 <db file>.  This will give you a SQL prompt where you can CREATE TABLE, SELECT, etc.  To exit the interpreter you simply type '.quit'.  For this project you will be working on a database called db/development.sqlite3

Assignment

Get the Skeleton

In your home directory on the instructional machines run the following command:
tar xvf ~cs186/fa08/hw4/skel.tar

You will see a number of directories in the main mytunes directory. Those of interest are:
  • app/
    This is the only directory you should need to make changes in. It contains source files  (*.rb and *.html.erb) for the Models, Views, and Controllers.
    Each controller file is in the controllers directory.
    There is a directory within views for each controller's view files (*.html.erb); each function in each controller has its own view, unless it directs its output to a different function's view by calling render :action => 'other_func_name'.
    Your models will go in the models subdirectory; you will create a model for each of the database relations in your schema.
    You don't need to use the helpers directory in controllers, or the layouts directory in views.
  • config/
    This directory contains the database connection configuration, the controllers' URL mappings, and other environment parameters.
  • db/
    Your database state is stored in this directory.
  • log/
    Log information is stored here. You'll be interested in development.log.

Design Your Schema

You will need to design a relational schema to hold the data.  Feel free to design it as you see fit.  Put the DDL (i.e. CREATE TABLE....) into a file called ./schema.sql in the root directory of the app.  DO NOT USE MIGRATIONS.  We will be executing this from a script to create your database.

To setup the database run:
rm -f db/development.sqlite3; sqlite3 db/development.sqlite3 < schema.sql

You can then play around with it in a console by running:
sqlite3 db/development.sqlite3

Find Functional Dependencies

Determine the set of functional dependencies present in your relational schema. Make any schema changes necessary. Save your functional dependencies in a text file called fd.txt

You do not need to list trivial functional dependencies (X -> X), but you do need to prove your schema is in BCNF by listing other dependencies that do exist and stating why they don't violate any of the BCNF conditions..

Start the Server

You can start the server by running:
script/server -p <port>

Pick any high numbered port (1024+).  Your rails app will then be accessible from http://<server>:<port>.  You can stop it with ctrl-c.  You don't need to restart the server everytime you make code changes, rails will auto-reload the files. Helpful debugging information can be found in log/development.log

Import the Data

We have provided sample data for you in the form of ./data.xml in the root of your app.  You can cut and paste this into the form at http://<server>:<port>/data/reload, after implementing the reload function. Make sure you copy the entire data file.

When importing the data you may find that there is extra whitespace surrounding the artist and album names.  The ruby strip command can be used to clean this up. (i.e.  to get the artist name type "artist.text.strip")

Write the Queries

Note: the comments in query_controller.rb for by_artist and by_album should read Fill the songs array with all songs that are by the artist (or album) with name '#{params[:key]}'

Each query should be specified in the query_controller.rb using the given function name (called an action).
  1. songs
    Show the names of all songs with their artists' names and album names
  2. artists
    Show the names of all artists
  3. albums
    Show the name of each album and its songs
  4. by_artist
    Given an artist name, show all songs by that artist
  5. by_album
    Given an album name, show all songs in that album
  6. add_rating
    Add a rating using the given user name, song, and score
  7. best_song
    Show the highest rated song (across all users)
  8. favorite_artist
    Show the highest rated artist for the given user name
  9. artist_ratings
    Show each artist and its average rating.  Artists without song ratings should still be displayed.
  10. two_artists_songs
    Given the names of two artists, show songs they have worked on together.
  11. distinct_collab
    List artists sorted by who has the largest set of distinct collaborations. The largest set is the number of other artists, not the number of collaborations.
You are free to modify the views HTML if you want to make the site look more awesome.  However, it is important you keep the classes of the tags that surround each part of the answer the same, as this is how we will be grading it.

Optimize Your Site

Too Many Queries

Try looking at the development log as you use different pages on the website by running `tail -f log/development.log` in a separate terminal window.  For each hit to the site you should see a block of text containing the following information:
  • The names of the controller and action that ran
  • The originating ip address and timestamp for the request
  • Debugging information like the session id of the user and parameters that were passed
  • A list of SQL queries executed by the request
  • Performance information
Depending on how careful you were, it it quite possible that some of your pages are executing many (10+) queries per request.  This can significantly impact the performance of your website.  Fix any action that does more than 5 queries.  We suggest looking at eager loading in the RoR API documentation.

Indexes

Similarly, some of you actions may be doing scans over large sections of the database to return an answer.  Thus you should modify your schema to add indicies to speed up the queries you run.  If you add the indexes dynamically at the sqlite console, make sure you also update your schema.sql file so that the indexes will be created when we grade the assignment. Briefly discuss your rationale for adding indexes in the fd.txt file you submit.  One useful tool you can use to find out how queries execute is EXPLAIN from inside of a sqlite3 console session.  Run this on specific queries that you have harvested from the development log.  Since EXPLAIN is not part of any standard, the output will be much different than what you saw in Assignment 3.  An example is shown below:

sqlite> explain select * from songs WHERE SHA1 = "793d5c8018384ff24983d55b6ca3ad38";;
addr opcode p1 p2 p3
---- -------------- ---------- ---------- ---------------------------------
0 Goto 0 23
1 Integer 0 0
2 OpenRead 0 2
3 SetNumColumns 0 3
4 Integer 0 0
5 OpenRead 1 3 keyinfo(1,BINARY)
6 String8 0 0 793d5c8018384ff24983d55b6ca3ad38
7 IsNull -1 20
8 MakeRecord 1 0 a
9 MemStore 0 0
10 MoveGe 1 20
11 MemLoad 0 0
12 IdxGE 1 20 +
13 IdxRowid 1 0

Each row represents an operation that is run on the sqlite virtual machine and a reference of what each of them does can be found here.  While it is not necessary to fully understand all of the instructions for the virtual machine, understanding some important operations (like IdxGE) will help you understand when indexes are being utilized vs table scans.  You should create any indicies that are necessary to efficiently execute the queries in the assignement.

What to turn in

Create a tar file of the things to turn in by running `tar cvf hw4.tar app/ schema.sql`.  Then run the submit command to turn in the following:
  1. hw4.tar - A tar file containing the app/ directory and schema.sql file for your rails app
  2. fd.txt - A description of your schema in BCNF along wiht a list of functional dependencies, as well as rationale for added indexes
  3. MY.PARTNERS - Listing the other person that you worked with

How to Submit

  1. Save your files in a directory called "hw4" within your cs186 home directory.
  2. cd into hw4
  3. Run: submit hw4

Tips

Running example

A running example of the assignment is accessible here:
 
http://sphere.cs.berkeley.edu:3000/query

Working Servers

We suggest you spread yourselves out over as many of the instructional machines as possible.  While in theory any of the machines of type "Linuxi686" or "SunOSi86pc" should work, some of them are overly busy or seem to have weird firewalls.  Below is a list of machines known to work (at least at one point).  Please post any additions or corrections to the newsgroup as you encounter them.  Note: since ruby is interpreted instead of compiled you do not need to 'make clean' and remake in order to switch machines.
  • sphere.cs.berkeley.edu
  • rhombus.cs.berkeley.edu
  • cube.cs.berkeley.edu

Debugging

  • You can run ruby code inside the rails enviroment from the command line using:
    script/console
  • Note your server's log in log/development.log

Port troubleshooting

If you get an error like TCPServer Error: Address already in use, you may be trying to start your server on someone else's port or you may already have a server running on that port. Check if you have a server running by ps -u user_name (works on SunOS machines). If you can't kill the ruby server using ctrl-c, kill it using its process id: kill -9 pid.