Initial version
You will extend InsightUBC to provide a way for users to manage and query data about rooms on campus. Also, you will extend the query language to handle aggregation computations and new sorting options for both sections and rooms.
All of the changes in this checkpoint are to the addDataset and performQuery endpoints as described below. The other endpoints (listDatasets and removeDataset) are unchanged from the prior Checkpoint.
The API remains unchanged from the insightUBC Section Specification.
Very important: do not alter the given API (IInsightFacade.ts interface) in any way, as it is used to grade your project!
The same addDataset method defined in the IInsightFacade.ts interface file is used to add a rooms kind of dataset.
The definition of a valid ID has not changed. Refer to the insightUBC Section Specification.
Same as for sections, the content parameter is a zip file, in the format of a base64 string. All the data you need is contained within this zip file. You should use the JSZip module to unzip, navigate through, and view the files inside this file.
However unlike sections, the rooms data is contained within HTML files (.htm), not JSON (.json). Also, unlike the sections database, the information for a single room is spread between two files: the index.htm file (which contains a room's building information) and a building HTML file (like AAC.htm which contains the room information). A single building can have multiple rooms.
At the root of the zip file is the index.htm file. The index.htm file contains a table with building information, where one column of that table contains a link (file path) to each building's file. Below is an example of the file structure of a rooms dataset where the building files are found within campus/discover/buildings-and-classrooms/. This is the same file structure as the given room database: campus.zip.
.
├── campus/│ └── discover/│ └── buildings-and-classrooms/│ ├── AAC.htm│ ├── ACEN.htm│ ├── ACU.htm│ └── ...└── index.htm
A valid dataset:
Is a zip file.
Contains at least one valid room.
A valid index.htm file:
Is an HTML-formatted file. If index.htm exists, it is safe to assume that it will always be a well-formatted HTML file.
Contains a table that lists and links to building data files.
The index.htm file can contain many tables, but only one will be the valid building list table. How to find the correct table is explained later in this section. The index.htm might also contain no table, in which case it would be invalid.
Each row in the table represents a building and the row will contain a column that links to the building's data file within the zip. The building file might not exist or it could contain no valid rooms.
An example link to a building file (ALRD.htm) looks like this:
<a href="./campus/discover/buildings-and-classrooms/ALRD.htm" title="Building Details and Map">...</a>
All building file links will contain link elements (<a>) in the href property.
You can assume that the link (<a>) will be contained in the cell element with the class views-field-title, elaborated on below.
A valid building file:
Is an HTML-formatted file. If the building file exists, it is safe to assume that it will always be a well-formatted HTML file.
Is linked from the index.htm file (as explained above).
Contains a table with valid rooms.
The building file can contain many tables, but only one will be the valid room table. How to find the correct table is explained below this section.
The building file might contain no rooms table or may contain a table with no valid rooms, in which case that building has no rooms.
A valid room:
Contains every field which can be used in a rooms query (see the Valid Query Keys section)
Note: If a field is present in the HTML (ie. the <td> cell exists) but is empty or contains something counter-intuitive like an empty string, it is still valid.
The requested room's geolocation request returns successfully (i.e., there is no error). Geolocation is described below.
When adding a rooms kind dataset, the dataset kind will be InsightFacade.Rooms. The InsightFacade.Sections kind is also valid but only when adding a sections dataset.
All the room data is contained within HTML tables. A building table within the index.htm file and a room table within the building's HTML file. An HTML table ( <table> element) contains rows (<tr> elements), and rows contain cells (<td> elements). HTML elements can have classes. Below is an example of an HTML table, which is a simplified version of the table found within the index.htm file:
<table>
<thead>...</thead>
<tbody>
<tr>
<td class="views-field views-field-title">
<a href="./discover/buildings-and-classrooms/ACU.htm" title="Building Details and Map">Acute Care Unit</a>
</td>
<td class="views-field views-field-field-building-address">
2211 Wesbrook Mall
</td>
....
</tr>
</tbody>
</table>
In the above example, the first cell element (<td>) has two CSS classes (views-field and views-field-title). The second cell has the same class as the first (views-field) and a second, unique class (views-field-field-building-address).
The classes found on table cells (<td>) will be the same across all valid tables. For example, all valid index.htm tables will have the views-field and views-field-field-building-address classes on their address cells.
To find the table within an HTML file with the room information, you will need to look at the classes found on the <td> elements. As soon as you find one <td> element with a valid class, then you have found the room data table. Once the room data table has been found, it will need to be validated to ensure it contains all the required information.
The same procedure can be applied to find the building table. You will need to look at the classes found on the <td> elements, and as soon as you find one <td> element with a valid class, then you have found the building table.
To find valid classes, unzip the given room dataset, campus.zip, open the .htm files and look at the classes on the <td> elements.
As seen in the above example, the classes views-field and views-field-field-building-address can be used to find a room's building address.
The building link from the index.htm file to the building's file will be contained in the table cell element with the class views-field-title.
An example valid rooms kind dataset is the UBC Building and classrooms listing from a few years ago: campus.zip. To find the number of valid rooms inside the campus.zip, you will need to query it using the Reference UI (construct a query with no filter!).
For a building that contains a valid room, you will need to fetch the building's latitude and longitude.
This is usually performed using online web services. To avoid problems with us spamming external geolocation providers, we will be providing a web service for you to use for this purpose. To obtain the geolocation of an address, you must send a GET request to:
http://cs310.students.cs.ubc.ca:11316/api/v1/project_team<TEAM NUMBER>/<ADDRESS>
Where <ADDRESS> should be the URL-encoded version of an address (e.g., "6245 Agronomy Road V6T 1Z4" should be represented as 6245%20Agronomy%20Road%20V6T%201Z4). Addresses should be given exactly as they appear in the dataset files, or an HTTP 404 error code will be returned.
The response will match the following interface (either you will get lat & lon, or error, but never both):
interface GeoResponse {
lat?: number;
lon?: number;
error?: string;
}
Since we are hosting this service it could be killed by a DOS attack, so please try not to overload the service. You should only need to query this service when you are processing the initial dataset zips, not when you are answering queries.
Regarding the Query Engine, the primary objective of this checkpoint is two-fold:
Extend the query language to accommodate queries to a new dataset kind, i.e., Rooms; and
Enable more comprehensive queries about the datasets, i.e., aggregate results, directional sorts.
A valid query is the same as before:
Is based on the given EBNF (defined below)
Only references one dataset (via the query keys).
Has less than or equal to 5000 results. If this limit is exceeded the query should reject with a ResultTooLargeError
At a high level, the new query functionalities added are:
GROUP: Group the list of results into sets using some matching criteria.
APPLY: Perform calculations across a set of results (i.e., across a GROUP).
SORT: Order results by one or more columns.
QUERY ::='{' BODY ', ' OPTIONS '}' | '{' BODY ', ' OPTIONS ', ' TRANSFORMATIONS '}'
// Note: a BODY with no FILTER (i.e., WHERE:{}) matches all entries.
BODY ::= 'WHERE:{' FILTER? '}'
FILTER ::= LOGICCOMPARISON | MCOMPARISON | SCOMPARISON | NEGATION
LOGICCOMPARISON ::= LOGIC ':[' FILTER_LIST ']'
MCOMPARISON ::= MCOMPARATOR ':{' mkey ':' number '}'
SCOMPARISON ::= 'IS:{' skey ': "' [*]? inputstring [*]? '" }' // Asterisks at the beginning or end of the inputstring should act as wildcards.
NEGATION ::= 'NOT :{' FILTER '}'
FILTER_LIST ::= '{' FILTER '}' | '{' FILTER '}, ' FILTER_LIST // Comma separated list of filters containing at least one filter
LOGIC ::= 'AND' | 'OR'
MCOMPARATOR ::= 'LT' | 'GT' | 'EQ'
OPTIONS ::= 'OPTIONS:{' COLUMNS '}' | 'OPTIONS:{' COLUMNS ', ' SORT '}'
SORT ::= 'ORDER: { dir:' DIRECTION ', keys: [ ' ANYKEY_LIST '] }' | 'ORDER: ' ANYKEY
DIRECTION ::= 'UP' | 'DOWN'
TRANSFORMATIONS ::= 'TRANSFORMATIONS: {' GROUP ', ' APPLY '}'
GROUP ::= 'GROUP: [' KEY_LIST ']'
APPLY ::= 'APPLY: [' APPLYRULE_LIST? ']'
APPLYRULE_LIST ::= APPLYRULE | APPLYRULE ', ' APPLYRULE_LIST
APPLYRULE ::= '{' applykey ': {' APPLYTOKEN ':' KEY '} }'
APPLYTOKEN ::= 'MAX' | 'MIN' | 'AVG' | 'COUNT' | 'SUM'
COLUMNS ::= 'COLUMNS:[' ANYKEY_LIST ']'
// Comma-separated list of keys containing at least one key
KEY_LIST ::= KEY | KEY ', ' KEY_LIST
ANYKEY_LIST ::= ANYKEY | ANYKEY ', ' ANYKEY_LIST
ANYKEY ::= KEY | applykey
KEY ::= mkey | skey
mkey ::= '"' idstring '_' mfield '"'
skey ::= '"' idstring '_' sfield '"'
mfield ::= 'avg' | 'pass' | 'fail' | 'audit' | 'year' | 'lat' | 'lon' | 'seats'
sfield ::= 'dept' | 'id' | 'instructor' | 'title' | 'uuid' | 'fullname' | 'shortname' | 'number' | 'name' | 'address' | 'type' | 'furniture' | 'href'
idstring ::= [^_]+ // One or more of any character, except underscore.
inputstring ::= [^*]* // Zero or more of any character, except asterisk.
applykey ::= [^_]+ // One or more of any character, except underscore.
The query language now supports performing calculations across a group of results.
The types of calculations supported are:
MAX: Find the maximum value of a field.
Returns the same number that is in the originating dataset.
MIN: Find the minimum value of a field.
Returns the same number that is in the originating dataset.
AVG: Find the average value of a field.
Returns a number rounded to two decimal places.
SUM: Find the sum of a field.
Returns a number rounded to two decimal places.
COUNT: Count the number of unique occurrences of a field.
Returns whole numbers.
Requirements:
MAX/MIN/AVG/SUM should only be requested for numeric keys. COUNT can be requested for all keys.
The applykey in an APPLYRULE should be unique, so no two APPLYRULEs should share an applykey with the same name.
If GROUP is present, all COLUMNS keys must correspond to one of the GROUP keys or to applykeys defined in the APPLY block.
The query language now supports sorting by:
A single column as in C1, e.g. "ORDER": "sections_avg"
Creating an object to sort by ascending or descending order and by multiple columns
e.g., "ORDER": {"dir": "DOWN", "keys": ["maxSeats"]}
"dir"
The order of the sorting is set by the direction ("dir") :
"dir": "UP": Sort results ascending.
"dir": "DOWN": Sort results descending.
"keys"
The "keys" field allows for sorting by multiple keys (i.e., columns), where each additional key resolves ties for the previous key.
For example:
"keys": ["sections_avg"]: sorts by a single key
"keys": ["sections_year", "sections_avg"]: sorts by multiple keys. In this case, the section average should be used to resolve ties for sections in the same year
Requirements:
All SORT keys must also be in the COLUMNS.
Valid query keys follow the same format as specified in the Sections Specification.
First, note that WHERE is completely independent of GROUP/APPLY. WHERE filtering happens first, then GROUP/APPLY are performed on those filtered results.
GROUP: [term1, term2, ...] signifies that a group should be created for every unique set of all N-terms. For example, GROUP: [sections_dept, sections_id] would create a group for every unique (department, id) pair in the sections dataset. Every member of a group will always have the same values for each key in the GROUP array (e.g., in the previous example, all members of a group would share the same values for sections_dept and sections_id).
As an example, suppose we have the following courses dataset (for the sake of simplicity, some keys are omitted):
[
{ "sections_uuid": "1", "sections_instructor": "Jean", "sections_avg": 90, "sections_title" : "310"},
{ "sections_uuid": "2", "sections_instructor": "Jean", "sections_avg": 80, "sections_title" : "310"},
{ "sections_uuid": "3", "sections_instructor": "Casey", "sections_avg": 95, "sections_title" : "310"},
{ "sections_uuid": "4", "sections_instructor": "Casey", "sections_avg": 85, "sections_title" : "310"},
{ "sections_uuid": "5", "sections_instructor": "Kelly", "sections_avg": 74, "sections_title" : "210"},
{ "sections_uuid": "6", "sections_instructor": "Kelly", "sections_avg": 78, "sections_title" : "210"},
{ "sections_uuid": "7", "sections_instructor": "Kelly", "sections_avg": 72, "sections_title" : "210"},
{ "sections_uuid": "8", "sections_instructor": "Eli", "sections_avg": 85, "sections_title" : "210"}
]
We want to query the above dataset to aggregate sections by their title and obtain their average. Our aggregation query would look like this:
{
"WHERE": {},
"OPTIONS": {
"COLUMNS": ["sections_title", "overallAvg"]
},
"TRANSFORMATIONS": {
"GROUP": ["sections_title"],
"APPLY": [{
"overallAvg": {
"AVG": "sections_avg"
}
}]
}
}
For this query, there are two groups: one that matches "sections_title" = "310" and one other that matches "210". At some point you will likely need to have an intermediate data structure to create/hold on your groups; use whatever structure that feels natural to you.
Continuing with our example, we have these groups:
310 group = [
{ "sections_uuid": "1", "sections_instructor": "Jean", "sections_avg": 90, "sections_title" : "310"},
{ "sections_uuid": "2", "sections_instructor": "Jean", "sections_avg": 80, "sections_title" : "310"},
{ "sections_uuid": "3", "sections_instructor": "Casey", "sections_avg": 95, "sections_title" : "310"},
{ "sections_uuid": "4", "sections_instructor": "Casey", "sections_avg": 85, "sections_title" : "310"}
]
210 group = [
{ "sections_uuid": "5", "sections_instructor": "Kelly", "sections_avg": 74, "sections_title" : "210"},
{ "sections_uuid": "6", "sections_instructor": "Kelly", "sections_avg": 78, "sections_title" : "210"},
{ "sections_uuid": "7", "sections_instructor": "Kelly", "sections_avg": 72, "sections_title" : "210"},
{ "sections_uuid": "8", "sections_instructor": "Eli", "sections_avg": 85, "sections_title" : "210"}
]
The last step is fairly simple, we execute the apply operation in each group. The average of "310" group is (90 + 80 + 95 + 85)/4 = 87.5 whereas for the "210" group the average is 77.25. Our final result for the above query would be:
[
{ "sections_title" : "310", "overallAvg": 87.5},
{ "sections_title" : "210", "overallAvg": 77.25}
]
Notice that we can have more elaborate groups, such as discovering if a specific instructor of a section has a better average than other instructors (i.e., "GROUP": ["sections_instructor", "sections_title"]). In that case, we would have four groups: (310, Jean), (310, Casey), (210 , Kelly), and (210, Eli).
Below is another example of a valid query and its results. The query is asking to look for all rooms that contain tables and more than 300 seats. Then, it asks to group the rooms by their building (shortname) and to find the room with the maximum capacity. The query will return the room with the maximum capacity per building in descending order.
{
"WHERE": {
"AND": [{
"IS": {
"rooms_furniture": "*Tables*"
}
}, {
"GT": {
"rooms_seats": 300
}
}]
},
"OPTIONS": {
"COLUMNS": [
"rooms_shortname",
"maxSeats"
],
"ORDER": {
"dir": "DOWN",
"keys": ["maxSeats"]
}
},
"TRANSFORMATIONS": {
"GROUP": ["rooms_shortname"],
"APPLY": [{
"maxSeats": {
"MAX": "rooms_seats"
}
}]
}
}
Response:
[
{
"rooms_shortname": "OSBO",
"maxSeats": 442
},
{
"rooms_shortname": "HEBB",
"maxSeats": 375
},
{
"rooms_shortname": "LSC",
"maxSeats": 350
}
]
This section is identical to the insightUBC Section Specification.
Just like sections, room datasets should be accessible to new InsightFacade instances, so the room datasets will need to be saved to disk in the <PROJECT_DIR>/data directory.