In the last checkpoint, you built a Data Processor to manage datasets and a Query Engine to handle queries on those datasets. In this checkpoint, you will extend both. The Data Processor will need to accept another type of input data, rooms, in the form of HTML files. This input data includes information about the physical spaces where classes are held on campus.
The Query Engine will be extended to support aggregation and more ways of ordering results. For example, the new query language will be able to answer questions like "What is the average of this course?" by averaging all course section averages, or "How many seats are in this building?" by summing the seats in every room in a building.
The specification for these changes is on the Room Specification page.
We will check for copied code on all submissions, so please make sure your work is your own.
The grade for this checkpoint will be calculated identically to Checkpoint 1.
Your grade for Checkpoint 2 will be from AutoTest. It will be assessed according to the bucketed grading described on the Project Grading page, up to a maximum quality of Proficient.
Attending labs is required for the remainder of the term, regardless of your progress on the project. During labs, you and your partner will meet with your TA to discuss progress on the project and any issues you come across.
See the Grading Page for details on how Lab Attendance affects your Project Grade.
The restrictions for Checkpoint 2 are the same as for Checkpoint 1.
The process for submitting your work for Checkpoint 2 is the same as for Checkpoint 1, except now you will invoke the bot with @310-bot #c2 instead of @310-bot #c1. The check target (@310-bot #check) remains available and will continue to be a crucial way to ensure your project is being evaluated as expected.
The feedback AutoTest provides for Checkpoint 2 will be the same as for C1, although the smoke test clusters will change to reflect the updated requirements below.
The Room Specification is found within the Room Specification page. This section gives advice on how to implement the specification.
The room dataset contains HTML files which will need to be parsed.
There is a provided package called parse5 that you should use to parse the HTML files into a more convenient-to-traverse JSON format (you should only need the parse method). Parse5 also has an online playground where you can visualize the structure of a Document, which is the output of a parsed HTML file. You must traverse this document in order to extract the buildings/rooms information.
There are many ways to structure an HTML file to display the same information, so it is important not to hard code the parsing of the HTML tree. Instead, focus on searching the document tree for nodes that match the specification. For example, there can be many <table> elements in the index.htm file, so your code should search for all <table>s and find the one that satisfies the specification (i.e., has valid building/rooms data). Ultimately, if you find yourself looking for Document nodes based on hardcoded positions (e.g., children[0].children[1].children[0].text), you'll want to change your approach!
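The search-based approach described above can be sketched as a small recursive helper. This is only an illustration: HtmlNode is a hand-written, minimal view of the nodes in parse5's default tree format (nodeName, childNodes, attrs, and value on text nodes), and findAll, findCandidateTable, and hasAnyCell are hypothetical names. The hasAnyCell predicate is a placeholder; the real validity check is whatever the Room Specification requires.

```typescript
// Minimal structural view of a node in parse5's default tree format.
// Only the fields used below are listed; real nodes carry more.
interface HtmlNode {
	nodeName: string; // e.g. "table", "td", "#text", "#document"
	childNodes?: HtmlNode[];
	attrs?: Array<{ name: string; value: string }>;
	value?: string; // present on "#text" nodes
}

// Collect every node with the given nodeName, at any depth,
// without assuming where in the tree it lives.
function findAll(node: HtmlNode, nodeName: string): HtmlNode[] {
	const matches: HtmlNode[] = [];
	if (node.nodeName === nodeName) {
		matches.push(node);
	}
	for (const child of node.childNodes ?? []) {
		matches.push(...findAll(child, nodeName));
	}
	return matches;
}

// Return the first <table> accepted by the caller's validity check.
function findCandidateTable(
	root: HtmlNode,
	isValid: (table: HtmlNode) => boolean
): HtmlNode | undefined {
	return findAll(root, "table").find(isValid);
}

// Placeholder predicate (an assumption, not the specification's rule):
// accept any table that contains at least one <td>.
const hasAnyCell = (table: HtmlNode): boolean => findAll(table, "td").length > 0;
```

In a project, the root passed to findCandidateTable would be the Document returned by parse5's parse method; it should be structurally compatible with HtmlNode, though a type cast may be needed.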
Browser Development Tools
HTML is much harder to read than JSON. Every browser comes with development tools to view and interact with the HTML for the displayed page. A great way to familiarize yourself with the structure of the campus.zip is to open the index.htm file in your browser and inspect the HTML elements using the browser development tools. You can use the inspector to move through the HTML tree and click on the links to open up the building files.
Chrome Developer Tools: Open the index.htm file in Chrome, then open the developer tools to inspect elements. You can click links to open building files.
Sending the Request
To send these requests, you must use the http package.
Although the request is a GET, you cannot test the response by pasting the URL directly into your browser's address bar (e.g., in Chrome). The browser will automatically convert the http to https, and the request will be rejected.
The best way to test the Geolocation locally is by using the curl command from your terminal. For example, you can use the following command, where google.com is replaced with your team's URL.
curl -i http://google.com
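In code, a request with the http package might look like the sketch below. The helper name getJson is hypothetical, and the sketch assumes the response body is JSON; check the Room Specification for the actual response format.

```typescript
import * as http from "http";

// Hypothetical helper: send a GET over plain http and resolve with the
// parsed body. Assumes the server replies with JSON.
function getJson(url: string): Promise<unknown> {
	return new Promise((resolve, reject) => {
		http
			.get(url, (res) => {
				const chunks: Buffer[] = [];
				// The body arrives in pieces; collect them until "end".
				res.on("data", (chunk: Buffer) => chunks.push(chunk));
				res.on("end", () => {
					try {
						resolve(JSON.parse(Buffer.concat(chunks).toString("utf8")));
					} catch (err) {
						reject(err); // body was not valid JSON
					}
				});
			})
			.on("error", reject); // network-level failure
	});
}
```

Wrapping the callback API in a Promise lets the caller await the result alongside the rest of the asynchronous dataset processing.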
Encoding the Address
To encode the address, use the function encodeURIComponent() (documentation link).
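As a small illustration, the address can be encoded and appended to the request URL as follows. The function name buildGeoUrl, the base URL, and the sample address are placeholders; substitute your team's URL.

```typescript
// Hypothetical helper: append a URL-encoded address to a base URL.
function buildGeoUrl(baseUrl: string, address: string): string {
	// encodeURIComponent escapes spaces and other reserved characters,
	// e.g. " " becomes "%20".
	return baseUrl + encodeURIComponent(address);
}

// Placeholder base URL and address, for illustration only.
const exampleUrl = buildGeoUrl("http://example.com/geo/", "6245 Agronomy Road V6T 1Z4");
// exampleUrl is "http://example.com/geo/6245%20Agronomy%20Road%20V6T%201Z4"
```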
Because TypeScript/JavaScript numbers are represented as floating point numbers, performing this arithmetic can return different values depending on the order in which the operations take place. Certain operations must therefore be handled with care.
Follow these steps exactly when implementing AVG and SUM:
AVG: Must use the Decimal package (already included in your package.json).
Convert each of your values to a Decimal:
e.g., new Decimal(num)
Add the numbers being averaged using Decimal's add() method, building up a variable called total.
Calculate the average. numRows should not be converted to a Decimal:
e.g., let avg = total.toNumber() / numRows
Round the average to the second decimal digit with toFixed(2) and cast the result back to a number type. When casting to a number, you may appear to "lose" decimal places, for instance Number("2.00") will display as 2. This is okay.
e.g., let res = Number(avg.toFixed(2))
SUM: Use toFixed(2) to round to two decimal places.
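Put together, the steps above might look like the following sketch. It assumes the Decimal package is decimal.js, and the function names computeAvg and computeSum are hypothetical. Casting the SUM result back to a number is also an assumption, made by analogy with AVG.

```typescript
import Decimal from "decimal.js";

// AVG: accumulate with Decimal, divide as a plain number, round to 2 places.
// Assumes values is non-empty (a group always has at least one row).
function computeAvg(values: number[]): number {
	let total = new Decimal(0);
	for (const value of values) {
		total = total.add(new Decimal(value));
	}
	// numRows (values.length) stays a plain number, as the steps require.
	const avg = total.toNumber() / values.length;
	return Number(avg.toFixed(2));
}

// SUM: plain addition, then round to 2 places with toFixed(2).
function computeSum(values: number[]): number {
	let total = 0;
	for (const value of values) {
		total += value;
	}
	return Number(total.toFixed(2));
}
```

For example, computeSum([0.1, 0.2]) yields 0.3 rather than the raw floating point result 0.30000000000000004.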
Sorting should be according to the < operator in TypeScript/JavaScript, not by localeCompare.
localeCompare is significantly slower than the < operator, which can lead to performance issues. It is also highly configurable, which can cause hard-to-diagnose differences between local tests in your development environment and AutoTest.
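A comparator built only on the < operator could look like this sketch (compareWithLessThan is a hypothetical name). Note that < compares strings by character code, so uppercase letters sort before lowercase ones.

```typescript
// Comparator that relies only on the < operator.
function compareWithLessThan<T extends string | number>(a: T, b: T): number {
	if (a < b) {
		return -1;
	}
	if (b < a) {
		return 1;
	}
	return 0;
}

const sortedNames = ["b", "a", "B"].sort(compareWithLessThan);
// sortedNames is ["B", "a", "b"]; localeCompare may order these differently.
```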
There are several ways to get started. Some possible options that could be pursued, in any order:
Watch/read the recommended videos and tutorials. These really help the ideas of HTML parsing and working with async code sink in.
Start by looking at the rooms kind dataset we have provided and understanding what kind of data you will be analyzing and manipulating. It is crucial to understand that index.htm and the building files have different structures. You will need to extract different, though complementary, information from each of them. You can open up the HTML files in your browser to inspect them and use the parse5 online playground to understand them.
Ignoring the rest of the dataset parsing, consider writing a method to get a building's geolocation along with tests for this helper method.
Ignoring the provided dataset, create a mock dataset with fewer rows. Write the portion of the system that would perform the GROUP and APPLY operations on this small dataset.
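One way to start on the mock-dataset option is sketched below. The field names (shortname, seats), the mock rows, and the helper groupRows are all made up for illustration; the real keys and query shape come from the specification.

```typescript
type Row = Record<string, string | number>;

// Bucket rows by the values of the given keys.
function groupRows(rows: Row[], keys: string[]): Map<string, Row[]> {
	const groups = new Map<string, Row[]>();
	for (const row of rows) {
		// Serialize the key values so multi-key groups get a single map key.
		const groupKey = JSON.stringify(keys.map((key) => row[key]));
		const bucket = groups.get(groupKey);
		if (bucket) {
			bucket.push(row);
		} else {
			groups.set(groupKey, [row]);
		}
	}
	return groups;
}

// Tiny mock dataset with hypothetical field names.
const mockRooms: Row[] = [
	{ shortname: "AAA", seats: 40 },
	{ shortname: "AAA", seats: 60 },
	{ shortname: "BBB", seats: 25 },
];

// Apply step: one result row per group, summing the seats in each building.
const seatsPerBuilding = Array.from(groupRows(mockRooms, ["shortname"]).values()).map(
	(rows) => ({
		shortname: rows[0].shortname,
		totalSeats: rows.reduce((sum, row) => sum + Number(row.seats), 0),
	})
);
```

With a dataset this small, the expected groups can be checked by hand, which makes it easier to write tests before touching the full dataset.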
Trying to keep all of the requirements in mind at once can be overwhelming. Tackling a single task that you can accomplish in an hour is going to be much more effective than worrying about the whole specification at once. Iteratively growing your project from small task to small task is going to be the best way to make forward progress.
As with C1, you will want to create your own zip files for testing. However, the rooms zip does not contain a root folder, so be careful with how you create your zip file to not include a root folder. The index.htm file should exist at the root of the zip.
The following resources have been created by course staff to assist you with the project.
HTML Parsing tips: Reviews the structure of HTML and how to search for an HTML element
Async Cookbook: learn about promises and the differences between synchronous and asynchronous code.
GitHub Projects can be a helpful way to share progress with your teammates and your TA. In your GitHub repo, you can create a New Project by navigating to the Projects tab.
Once your project is created, check out the Board view (this can be chosen when creating the project, or changed afterwards). This board should look very familiar... Scrum or Kanban board, anyone?