
Project overview

Motivation

Data management is one of the fundamental concerns in computing, and solutions can range from storing data in plain text files, to sophisticated relational databases, such as IBM DB2, Microsoft SQL Server, MySQL, or Oracle, supporting concurrent query-based data access from thousands of users.

More recently, there have been efforts to move storage into "the cloud", essentially treating storage as a service managed by someone else so users have reliable and secure access to their data anywhere and anytime without dealing with the complexities of managing sophisticated database management systems and maintaining operating systems and hardware. Amazon's S3 and SimpleDB services and Google Base are examples of such services.

There also exists a class of highly specialized data management solutions for managing and analyzing financial data, network data, or environmental data collected through large sensor fields.

The design, development, understanding, and maintenance of these systems require a wide variety of expertise including data modeling, software engineering, networking, distributed systems, administration and backup, security, computer architecture and performance optimization.

To give you some appreciation for some of the issues involved, you will be designing and implementing a simple storage server over a series of assignment milestones in this course. You will apply software engineering principles, collaborate on software design and development, learn to use popular programming tools, learn to design the software before you develop it, learn to maintain proper documentation of the software artifacts you build, learn to present and demonstrate your software, and in the end have a working implementation of a storage server. Finally, we plan to show in one of the final lectures where the storage server fits in a large distributed data management system built by one of the leading Internet companies.

Working individually and working in teams

Except for the first milestone, Milestone 1, where you work individually, the software project in this course is conducted in teams of three. Teams must work together for the duration of the course. Each course milestone, including the midterm software demonstration, final presentation, design document submissions, and code submissions, is a team effort.

Evaluation of students is based on their individual performance during the software demonstration, the final presentation, the project memos presented to the project manager, and as documented by the students themselves in the attribution tables that are part of the design document. Course marks are not assigned on a per team basis, but on an individual basis.

Instructions for submitting team preferences are specified in Group Selection Instructions. Students who do not select a team by the deadline will be randomly assigned to teams.

You will be expected to formulate team rules. If you are frustrated by the dynamics within your team or by your teammates' behavior, it is important to address this in an appropriate and effective way. Your Project Manager can help you resolve these issues and help you, as a team, develop strategies for working successfully with each other. Remember: most projects do NOT fail for technical reasons; they fail because of planning and team management problems. Address these concerns early and do not postpone the resolution of problems.

Milestone overview

Milestone 1

In Milestone 1, you will design the command-line shell for the storage client and implement basic logging functionalities.
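To make the logging part concrete, here is a minimal sketch in C of one possible logging helper. It is not taken from the skeleton code: the function names `format_log_line` and `log_message`, the timestamp layout, and the `component` label are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: format one log line as
 * "[YYYY-MM-DD HH:MM:SS] component: message\n" (UTC).
 * Returns the line length, or -1 if the line does not fit in buf. */
int format_log_line(char *buf, size_t cap, time_t when,
                    const char *component, const char *message)
{
    char stamp[32];
    struct tm *utc = gmtime(&when);
    int n;

    assert(buf != NULL && component != NULL && message != NULL);
    if (utc == NULL ||
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", utc) == 0)
        return -1;

    n = snprintf(buf, cap, "[%s] %s: %s\n", stamp, component, message);
    if (n < 0 || (size_t)n >= cap)
        return -1;              /* formatting error or truncated output */
    return n;
}

/* Hypothetical helper: write a timestamped line to an open stream
 * (a log file or stdout) and flush it so that the entry survives a crash.
 * Returns 0 on success, -1 on failure. */
int log_message(FILE *out, const char *component, const char *message)
{
    char line[256];

    if (format_log_line(line, sizeof line, time(NULL), component, message) < 0)
        return -1;
    if (fputs(line, out) == EOF)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

A shell could call `log_message(stdout, "client", "connected")` during development and pass a file opened with `fopen` instead when logging to disk. Keeping the formatting separate from the output stream makes the helper easy to unit test.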


By developing these features, you will develop a basic understanding of the skeleton code we provide and familiarize yourself with the code submission instructions we specify with each milestone. Also, the features developed in this milestone will help with the rest of the software project and the midterm demonstration.


When you select a team and start working together after the completion of Milestone 1, you may opt to use one of the three shells and logging features developed by the individual team members, or merge your code to use the most appropriate functionality from each. This choice is yours to make as a team.

Milestone 2

In Milestone 2, you design and develop the basic storage server, and you learn to document your design.


The assignment offers you an understanding of basic data management principles, involves socket-based client/server programming, and simple parsing techniques.
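As an illustration of socket-based communication, the sketch below shows two helpers commonly needed in client/server code: one that keeps sending until a whole message has been written, and one that reads a newline-terminated message. The names `send_all` and `recv_line`, and the choice of a newline as the message delimiter, are assumptions made for this example and are not prescribed by the assignment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send exactly len bytes on a connected socket. send() may accept fewer
 * bytes than requested, so loop until everything has been written.
 * Returns 0 on success, -1 on error. */
int send_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;
}

/* Read one newline-terminated line into buf, without the newline.
 * Returns the line length, or -1 on error, on end of stream before a
 * newline, or if the line does not fit in cap bytes. */
int recv_line(int fd, char *buf, size_t cap)
{
    size_t used = 0;

    assert(cap > 0);
    while (used + 1 < cap) {
        char c;
        ssize_t n = recv(fd, &c, 1, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;          /* peer closed the connection */
        if (c == '\n') {
            buf[used] = '\0';
            return (int)used;
        }
        buf[used++] = c;
    }
    buf[used] = '\0';
    return -1;                  /* line too long for the buffer */
}
```

Reading one byte at a time keeps the sketch short; a real server would likely buffer its reads. The point of both loops is that a single `send` or `recv` call does not necessarily transfer a whole message on a stream socket.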


You must plan your design and discuss it, together with its rationale, in a design document that documents your software project. We place great emphasis on the discussion and presentation of design decisions. You must document the architecture and protocols with pertinent diagrams.


We also aim to stress test the storage server by exposing it to a fair-sized amount of data. One of the tasks of this assignment is to prepare data found online for management with your storage server. We will emphasize the use of appropriate tooling, such as scripting languages or advanced editors, for cleaning and extracting the tables to be managed.


Subsequent assignments will build on software and documentation developed in this milestone. Therefore, clean and clear documentation is essential.

Milestone 3

In Milestone 3 you extend the basic storage server by enabling it to directly process more complex data types specified in the configuration file.

To ease parsing of the configuration file and protocol messages, you are expected to use advanced parsing techniques, which are greatly simplified by the effective use of tools. Complex scanners and parsers can be generated easily and used as functions in your code. As a text scanner, also known as a lexical analyzer, we recommend Lex and, as a parser generator, we recommend Yacc (Yet Another Compiler Compiler). We will discuss the Lex implementation flex and the Yacc implementation bison. Both are freely available as open source and widely used in industry. The use of Lex and Yacc is optional; you are free to implement the required functionality manually.
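For those who choose the manual route, the following sketch shows in plain C the job a scanner performs: splitting input text into typed tokens that a parser can then consume. The token categories and the function name `next_token` are illustrative assumptions; a flex-generated scanner would derive equivalent logic from regular-expression rules, and the example input does not reflect the actual configuration file format.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token categories for this sketch. */
enum token_type { TOK_END, TOK_IDENT, TOK_NUMBER, TOK_PUNCT };

/* Scan the next token starting at *cursor, copy its text into text[],
 * and advance *cursor past it. Tokens longer than cap - 1 characters
 * are silently truncated in text[] but still consumed in full. */
enum token_type next_token(const char **cursor, char *text, size_t cap)
{
    const char *p = *cursor;
    enum token_type type;
    size_t n = 0;

    assert(cap >= 2);
    while (isspace((unsigned char)*p))      /* skip leading whitespace */
        p++;

    if (*p == '\0') {
        type = TOK_END;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        type = TOK_IDENT;                   /* letters, digits, underscore */
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (n + 1 < cap)
                text[n++] = *p;
            p++;
        }
    } else if (isdigit((unsigned char)*p)) {
        type = TOK_NUMBER;                  /* unsigned decimal integer */
        while (isdigit((unsigned char)*p)) {
            if (n + 1 < cap)
                text[n++] = *p;
            p++;
        }
    } else {
        type = TOK_PUNCT;                   /* any other single character */
        text[n++] = *p++;
    }

    text[n] = '\0';
    *cursor = p;
    return type;
}
```

Calling `next_token` repeatedly on the hypothetical input `population: int[8]` yields the identifier `population`, the punctuation `:`, the identifier `int`, the punctuation `[`, the number `8`, the punctuation `]`, and finally `TOK_END`.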

As in Milestone 2, we continue to place great emphasis on unit testing. We expect you to implement and follow a clear test plan, rigorously adding unit tests to the tests already included in the skeleton code.

To ease understanding and transitioning of software artifacts, we introduce standard modeling notations for you to use when describing, designing, and documenting your design.

Milestone 4

In Milestone 4 you further extend the storage server with performance-improving measures based on multi-threading. Multi-tasking and event-driven performance enhancements are optional for those who feel ambitious.

A good understanding of concurrency, atomicity and synchronization as well as performance evaluation plays a key role.
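The sketch below illustrates why synchronization matters, using POSIX threads. Several worker threads increment a shared counter, and a mutex makes each increment atomic with respect to the other threads. The names `shared_counter`, `worker`, and `run_workers` are hypothetical; in a storage server, the protected state would be the stored tables rather than a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* Shared state: every access to value must hold lock. */
struct shared_counter {
    pthread_mutex_t lock;
    long value;
    long increments_per_worker;
};

/* Thread body: increment the shared counter under the mutex. */
static void *worker(void *arg)
{
    struct shared_counter *c = arg;
    long i;

    for (i = 0; i < c->increments_per_worker; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;             /* critical section */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Start nworkers threads, wait for all of them to finish, and return
 * the final counter value. */
long run_workers(int nworkers, long increments_per_worker)
{
    struct shared_counter c;
    pthread_t threads[MAX_WORKERS];
    int started = 0;
    int i;

    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    pthread_mutex_init(&c.lock, NULL);
    c.value = 0;
    c.increments_per_worker = increments_per_worker;

    for (i = 0; i < nworkers; i++) {
        if (pthread_create(&threads[i], NULL, worker, &c) != 0)
            break;              /* could not start more threads */
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&c.lock);
    return c.value;
}
```

With the mutex in place, `run_workers(4, 10000)` returns 40000 whenever all four threads start. Without the lock and unlock calls, concurrent increments can overwrite each other and the result may be lower. Depending on the toolchain, the program may need to be compiled with the `-pthread` option.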

We ask that you prepare a detailed performance evaluation report as part of the final design document submission. In this report, you graph, analyze, interpret, and discuss the implications of concurrency, caching, and other criteria (e.g., design decisions you made, trade-offs you see, etc.) for various workloads.
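One possible way to collect the raw numbers for such a report is sketched below: time a batch of operations with a monotonic clock and convert the result into throughput. The helper names `elapsed_seconds` and `ops_per_second` are assumptions for illustration, and the operation being measured is supplied by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Difference between two timestamps, in seconds. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Run op(arg) the given number of times and return the measured
 * throughput in operations per second. Returns 0.0 if the run was too
 * short for the clock to register any elapsed time. */
double ops_per_second(void (*op)(void *), void *arg, long iterations)
{
    struct timespec start, end;
    double secs;
    long i;

    assert(op != NULL && iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iterations; i++)
        op(arg);
    clock_gettime(CLOCK_MONOTONIC, &end);

    secs = elapsed_seconds(start, end);
    return secs > 0.0 ? (double)iterations / secs : 0.0;
}
```

`CLOCK_MONOTONIC` is used rather than wall-clock time so that adjustments to the system clock do not distort the measurement. Repeating each measurement several times and reporting the spread, not just a single value, generally makes the resulting graphs easier to interpret.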

Summary of key project concepts by milestone

The table below summarizes the key concepts and deliverables of each assignment milestone.


| Milestone | Key concepts | Main deliverables |
| --- | --- | --- |
| Milestone 1 | Familiarization with skeleton code, command-line shell for storage server client, logging on client and server side, and following of submission instructions. | Code and code documentation. |
| Milestone 2 | Basic storage server, client/server concept, socket-based communication, simple parsing, tool-based data manipulation, standard modelling. | Design document, diagrams in standard notation, test plan, code documentation, performance evaluation. |
| Milestone 3 | Complex data types, advanced parsing, standard modelling notation, bug tracking and reporting. | Revised design document, diagrams in standard notation, detailed test plan, and bug report. |
| Milestone 4 | Concurrency, multi-threading, synchronization, performance measurements. | Revised design documents and detailed performance evaluation report. |

Weekly progress report memos

Each week a different member of your team will be responsible for writing up a memo summarizing progress to date, decisions, challenges, problems and possible solutions to problems. The memo should identify technical questions which you suggest could be addressed in lecture. As well, it should include one or two paragraphs reflecting on the work done during that week, discussing particular challenges or lessons learned.

Over the course of the term, each member of the team will write at least 2 of these memos and take the lead in the meeting at which these memos are submitted.

Submitting your design documents

To submit your design documents to Turnitin, follow these steps:

  1. Log in to Turnitin. You likely still have a user profile from last year; if you do not, you will have to create one. Be sure to use your University of Toronto email address. Once the teams have been established and assigned to Communication Instructors/Project Managers, you will receive a Course ID and Password. You will then enroll in your Communication Instructor/Project Manager's section.
  2. Click on the class name and find the appropriate assignment. Click on the submit icon.
  3. Choose "file upload" from the pull-down menu.
  4. Click "browse" and select a file of one of the following types: MS Word, WordPerfect, PDF, HTML, RTF or TXT.
  5. Click "submit" to upload the file.
  6. Review the preview panel to confirm that this is the correct version to submit.
  7. Click "submit paper" at the bottom of the page. Warning: this step must be completed or the paper will not be submitted.
  8. Once you have clicked "submit paper" and the submission is complete, a digital receipt will appear on screen and be sent to you by email.

You may find these instructions with screenshots at http://www.turnitin.com/resources/documentation/turnitin/training/Submitting_a_paper_as_student.pdf