Initial version.
In the last checkpoint, you built a Data Processor to manage datasets and a Query Engine to handle queries on those datasets. In this checkpoint, you will extend both the data processing and query capabilities.
Grade: Your grade for Checkpoint 2 is calculated by AutoTest on the main branch of your repo as follows:
your grade = (number of grader tests passing against your implementation) / (total number of tests)
See AutoTest Feedback for details on getting feedback (and your grade), what your feedback looks like, and how you should use it. In particular, note that during the checkpoint, your grade is reported as a bucket.
The specification for the rooms dataset is on the Room Specification page.
The room dataset contains HTML files which will need to be parsed.
There is a provided package called parse5 that you should use to parse the HTML files into a more convenient-to-traverse JSON format (you should only need the parse method). Parse5 also has an online playground where you can visualize the structure of a Document, which is the output of a parsed HTML file. You must traverse this document in order to extract the buildings/rooms information.
There are many ways to structure an HTML file to display the same information, so it is important not to hardcode the shape of the HTML tree in your parser. Instead, search the document tree for nodes that match the specification. For example, there can be many <table> elements in the index.htm file, so your code should search for all <table>s and find the one that satisfies the specification (i.e., has valid building/rooms data). If you find yourself looking up Document nodes by hardcoded positions (e.g., children[0].children[1].children[0].text), you'll want to change your approach!
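The search described above can be sketched as a recursive walk. This is a minimal sketch, not the required implementation: the HtmlNode interface is declared locally to mirror the fields (nodeName, attrs, childNodes) found on nodes produced by parse5's default tree format, the tree is built by hand so the example runs without parse5, and the "contains at least one <td>" check is a hypothetical stand-in for whatever the specification defines as a valid table.

```typescript
// Minimal local shape of a parsed HTML node (assumption: mirrors the fields
// used by parse5's default output). Declared here so the sketch is self-contained.
interface HtmlNode {
  nodeName: string;
  value?: string; // set on "#text" nodes
  attrs?: Array<{ name: string; value: string }>;
  childNodes?: HtmlNode[];
}

// Depth-first search: collect every node with the given tag name,
// wherever it sits in the tree.
function findAll(node: HtmlNode, tagName: string): HtmlNode[] {
  const found: HtmlNode[] = node.nodeName === tagName ? [node] : [];
  for (const child of node.childNodes ?? []) {
    found.push(...findAll(child, tagName));
  }
  return found;
}

// Hand-built stand-in for a parsed document with two tables at different depths.
const doc: HtmlNode = {
  nodeName: "#document",
  childNodes: [
    {
      nodeName: "body",
      childNodes: [
        { nodeName: "table", attrs: [{ name: "id", value: "layout" }], childNodes: [] },
        {
          nodeName: "div",
          childNodes: [
            {
              nodeName: "table",
              attrs: [{ name: "id", value: "data" }],
              childNodes: [
                {
                  nodeName: "tr",
                  childNodes: [
                    { nodeName: "td", childNodes: [{ nodeName: "#text", value: "ABC" }] },
                  ],
                },
              ],
            },
          ],
        },
      ],
    },
  ],
};

const tables = findAll(doc, "table");
// Hypothetical validity check: the first table that contains any <td>.
const dataTable = tables.find((table) => findAll(table, "td").length > 0);
console.log(tables.length, dataTable?.attrs?.[0].value);
```

Because the search never depends on a node's position, it keeps working if the page wraps the table in extra elements.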
Sending the Request
To send these requests, you must use Node's http module.
Although the request is a GET, you cannot test the response by pasting the URL into your browser's address bar (e.g., in Chrome). The browser will automatically convert the http to https, and the request will be rejected.
The best way to test the geolocation request locally is with the curl command from your terminal. For example, you can use the following command, replacing google.com with your team's URL.
curl -i http://google.com
Encoding the Address
To encode the address, use the function encodeURIComponent() (documentation link).
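A minimal sketch of sending the request with Node's http module and an encoded address might look like the following. The base URL is a hypothetical placeholder, appending the encoded address directly to it is an assumption about the URL layout, and treating the response body as JSON is also an assumption; check the specification for the real URL and response format.

```typescript
import * as http from "http";

// Hypothetical placeholder; replace with your team's geolocation URL.
const BASE_URL = "http://example.com/geo/";

// Encode the address so spaces and punctuation are safe inside a URL.
function buildGeoUrl(address: string): string {
  return BASE_URL + encodeURIComponent(address);
}

// Send a GET request and resolve with the parsed body.
// Assumption: the service answers with a JSON document.
function fetchGeo(address: string): Promise<unknown> {
  return new Promise((resolve, reject) => {
    http
      .get(buildGeoUrl(address), (res) => {
        let body = "";
        res.setEncoding("utf8");
        res.on("data", (chunk: string) => {
          body += chunk;
        });
        res.on("end", () => {
          try {
            resolve(JSON.parse(body));
          } catch (err) {
            reject(err);
          }
        });
      })
      .on("error", reject);
  });
}

console.log(buildGeoUrl("123 Main Street"));
```

Keeping the URL construction in its own function makes it possible to unit test the encoding without touching the network.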
TypeScript/JavaScript numbers are represented as floating point numbers, so arithmetic on them can return different values depending on the order in which the operations take place. Certain operations must therefore be handled with care.
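As a small illustration of the order dependence, adding the same three values with different grouping gives two different results:

```typescript
// Same three numbers, different grouping.
const leftToRight = (0.1 + 0.2) + 0.3;
const rightToLeft = 0.1 + (0.2 + 0.3);

console.log(leftToRight === rightToLeft); // false
```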
Follow these steps exactly when implementing AVG and SUM:
AVG: Must use the Decimal package (already included in your package.json).
Convert each of your values to a Decimal:
e.g., new Decimal(num)
Add the numbers being averaged using Decimal's add() method (and building up a variable called total).
Calculate the average. numRows should not be converted to a Decimal:
e.g., let avg = total.toNumber() / numRows
Round the average to the second decimal digit with toFixed(2) and cast the result back to a number type. When casting to a number, you may appear to "lose" decimal places, for instance Number("2.00") will display as 2. This is okay.
e.g., let res = Number(avg.toFixed(2))
SUM: Use toFixed(2) to round to two decimal places.
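The steps above can be sketched as follows. This assumes the Decimal package is decimal.js imported with a default import; the function names are hypothetical, and accumulating SUM with plain number addition is an assumption, since the steps only state how SUM is rounded.

```typescript
import Decimal from "decimal.js";

// AVG: accumulate with Decimal, divide as a plain number, round to 2 places.
function average(values: number[]): number {
  let total = new Decimal(0);
  for (const num of values) {
    total = total.add(new Decimal(num));
  }
  const numRows = values.length; // deliberately not converted to a Decimal
  const avg = total.toNumber() / numRows;
  return Number(avg.toFixed(2));
}

// SUM: round the result to two decimal places with toFixed(2).
// Assumption: the values are added as plain numbers.
function sum(values: number[]): number {
  let total = 0;
  for (const num of values) {
    total += num;
  }
  return Number(total.toFixed(2));
}

console.log(average([90.5, 80.25, 70]), sum([0.1, 0.2]));
```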
Sorting should be according to the < operator in TypeScript/JavaScript, not by localeCompare.
localeCompare is significantly slower than the < operator, which can lead to performance issues. It is also highly configurable, which can cause hard-to-diagnose differences between local tests in your development environment and AutoTest.
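A comparator built only on the < operator could look like this minimal sketch; the function names are hypothetical.

```typescript
// Compare two strings using only the < operator (no localeCompare).
function compareStrings(a: string, b: string): number {
  if (a < b) return -1;
  if (b < a) return 1;
  return 0;
}

// Same idea for numeric values.
function compareNumbers(a: number, b: number): number {
  if (a < b) return -1;
  if (b < a) return 1;
  return 0;
}

const sortedNames = ["b", "a", "B"].sort(compareStrings);
const sortedSeats = [120, 40, 60].sort(compareNumbers);
console.log(sortedNames, sortedSeats);
```

With <, strings are compared by character code, so the uppercase "B" sorts before the lowercase "a". A locale-aware comparison may order these differently.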
There are several ways to get started. Some possible options that could be pursued, in any order:
Refer to the recommended videos and tutorials below. They help build an understanding of HTML parsing and working with async code.
Start by looking at the rooms kind dataset we have provided and understanding what kind of data you will be analyzing and manipulating. It is crucial to understand that index.htm and the building files have different structures. You will need to extract different, though complementary information, from each one of them. You can open up the HTML files in your browser to inspect them and use the parse5 online playground to understand them.
Ignoring the rest of the dataset parsing, consider writing a method to get a building's geolocation along with tests for this helper method.
Ignoring the provided dataset, create a mock dataset with fewer rows. Write the portion of the system that would perform the GROUP and APPLY operations on this small dataset.
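One way to start on the mock-dataset option is a sketch like the one below. The row shape, the field names (building, seats), and the output key (totalSeats) are all hypothetical, and the exact GROUP and APPLY semantics must come from the specification. The sketch only shows the general shape: bucket rows by their group-key values, then compute one value per bucket.

```typescript
type Row = Record<string, string | number>;

// GROUP: bucket rows that share the same values for every group key.
function groupRows(rows: Row[], keys: string[]): Map<string, Row[]> {
  const groups = new Map<string, Row[]>();
  for (const row of rows) {
    const id = JSON.stringify(keys.map((key) => row[key]));
    const bucket = groups.get(id);
    if (bucket) {
      bucket.push(row);
    } else {
      groups.set(id, [row]);
    }
  }
  return groups;
}

// APPLY (SUM shown as one example): total a numeric field, rounded with toFixed(2).
function sumField(rows: Row[], field: string): number {
  let total = 0;
  for (const row of rows) {
    total += Number(row[field]);
  }
  return Number(total.toFixed(2));
}

// Hypothetical mock dataset with only a few rows.
const mockRows: Row[] = [
  { building: "A", seats: 40 },
  { building: "A", seats: 120 },
  { building: "B", seats: 60 },
];

const grouped = Array.from(groupRows(mockRows, ["building"]).values());
const results = grouped.map((group) => ({
  building: group[0].building,
  totalSeats: sumField(group, "seats"),
}));
console.log(results);
```

A dataset this small makes it easy to work out the expected result by hand and turn it into a unit test.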
Trying to keep all of the requirements in mind at once can be overwhelming. Tackling a single task that you can accomplish in an hour is going to be much more effective than worrying about the whole specification at once. Iteratively growing your project from small task to small task is going to be the best way to make forward progress.
As with C1, you will want to create your own zip files for testing. However, the rooms zip does not contain a root folder, so take care not to include one when you create your zip file. The index.htm file should exist at the root of the zip.
Browser Development Tools
HTML is much harder to read than JSON. Every browser comes with development tools to view and interact with the HTML for the displayed page. A great way to familiarize yourself with the structure of campus.zip is to open the index.htm file in your browser and inspect the HTML elements using the browser development tools. You can use the inspector to move through the HTML tree and click on the links to open up the building files.
Chrome Developer Tools: Open the index.htm file in Chrome, then open the developer tools to inspect elements. You can click links to open building files.
The following resources have been created by course staff to assist you with the project.
HTML Parsing tips: Reviews the structure of HTML and how to search for an HTML element.
Async Cookbook: Covers promises and the differences between synchronous and asynchronous code.