Section Specification

insightUBC Section Specification

1. API

2. Managing course section data

2.1 Adding a dataset

2.2 Removing a dataset

2.3 Listing all datasets

3. Querying the data for insights

Valid Query argument to performQuery

Complex example query

4. IInsightFacade.ts

5. Caching Progress (Persistence)

Change Log

UBC has a wide variety of courses. Manually viewing information about the courses is painful and slow. Students and Professors (users) would like to be able to query information about courses to gain insights into UBC. InsightUBC will provide a way for users to manage their course section data and query this data for insights.

1. API

Users will interact with your project through a fixed API, defined through a provided interface, IInsightFacade.ts. Very important: do not alter the given API (interface) in any way, as it is used to grade your project!

The interface provides four methods: addDataset, listDatasets, removeDataset, and performQuery. Users will manage their course section data through the methods addDataset, listDatasets and removeDataset, and will query their data using the method performQuery.

The contents of the API file, IInsightFacade.ts, are given below in the IInsightFacade.ts section. Read the entire file carefully - it contains details about the expected parameters, what the methods should do and what specific error types to throw for failures.

For example, a user might write the following code to use your API:

import * as fs from "fs-extra";

function getNumberOfSectionsInUBCDataset(): Promise<number> {
    // Read the zip, encode it as base64, add it, then look up its row count.
    return fs.readFile("src/resources/archives/ubc-sections.zip")
        .then((buffer) => buffer.toString("base64"))
        .then((content) => new InsightFacade().addDataset("ubc", content, InsightDatasetKind.Sections))
        .then(() => new InsightFacade().listDatasets())
        .then((datasets) => datasets.find((dataset) => dataset.id === "ubc"))
        .then((dataset) => dataset!.numRows)
        .catch((error) => -1);
}

2. Managing course section data

We allow users to perform three actions for managing their data:

Adding a dataset, so it is available for querying.
Listing all datasets that are available to query.
Removing a dataset, so it is no longer available for query.

Each of these actions has a corresponding API method defined in IInsightFacade.ts.

2.1 Adding a dataset

Without data, there is nothing to search through for insights! Before a user can query, they will need to add data to the system. All valid course sections should be extracted from the dataset and stored such that they can later be queried.

The following method is defined in the IInsightFacade.ts interface file:

addDataset(id: string, content: string, kind: InsightDatasetKind): Promise<string[]> adds a dataset to the internal model, providing the id of the dataset, the string of the content of the dataset, and the kind of the dataset. Any invalid inputs should be rejected.

Each of the three arguments to addDataset is described below:

Valid ID argument to addDataset

A user can add multiple datasets to your project and they will be identified by the ID provided by the user. A valid id is an idstring, defined in the EBNF (see below). In addition, an id that is only whitespace is invalid.
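The two id rules above can be checked in a few lines. This is a minimal sketch rather than a required design; the function name isValidDatasetId is our own and is not part of the provided interface.

```typescript
// Returns true when `id` is a valid idstring ([^_]+) and is not only whitespace.
function isValidDatasetId(id: string): boolean {
    if (id.length === 0) {
        return false; // idstring requires at least one character
    }
    if (id.includes("_")) {
        return false; // underscores are reserved for separating id and field in query keys
    }
    return id.trim().length > 0; // reject ids made only of whitespace
}

console.log(isValidDatasetId("ubc"));          // a normal id
console.log(isValidDatasetId("ubc_sections")); // contains an underscore
console.log(isValidDatasetId("   "));          // only whitespace
```

The same check applies to the id passed to removeDataset.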

Valid Content argument to addDataset

The content parameter is the entire zip file, in the format of a base64 string. All the data you need is contained in that one string. You should use the JSZip module to unzip, navigate through, and view the files inside.

A valid dataset:

- Is structured as a base64 string of a zip file.
- Contains at least one valid section.

A valid course:

- Is a JSON formatted file.
- Contains one or more valid sections.
  - Within a JSON formatted file, valid sections will be found within the "result" key.
- Is located within a folder called courses/ in the zip's root directory.

A valid section:

- Contains every field which can be used by a query (see the "Valid query keys" section below).
  - If a field you use in a section is present in the JSON but contains something counter-intuitive like an empty string, it is still valid.
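To illustrate the rules above, here is a hedged sketch of pulling the valid sections out of one course file's text. The function name extractValidSections and the RawSection type are our own, and the list of required JSON keys is passed in by the caller because it depends on the field table in the "Valid query keys" section. The key names "Subject" and "Avg" in the usage lines are placeholders for illustration only.

```typescript
type RawSection = Record<string, unknown>;

// Returns the entries under "result" that contain every required key.
// A file that is not JSON, or has no "result" array, contributes no sections.
function extractValidSections(fileText: string, requiredKeys: string[]): RawSection[] {
    let parsed: unknown;
    try {
        parsed = JSON.parse(fileText);
    } catch {
        return []; // not a JSON formatted file
    }
    if (typeof parsed !== "object" || parsed === null) {
        return [];
    }
    const result = (parsed as Record<string, unknown>)["result"];
    if (!Array.isArray(result)) {
        return [];
    }
    return result.filter((entry): entry is RawSection =>
        typeof entry === "object" && entry !== null && !Array.isArray(entry) &&
        // Presence is what matters: an empty string value is still a valid field.
        requiredKeys.every((key) => Object.prototype.hasOwnProperty.call(entry, key)));
}

// Placeholder key names, for illustration only.
const courseFile = JSON.stringify({ result: [{ Subject: "cpsc", Avg: 80 }, { Subject: "math" }] });
console.log(extractValidSections(courseFile, ["Subject", "Avg"]).length);
```

A dataset is then valid only if the sections collected across all of its course files number at least one.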

An example of a valid dataset which contains 64,612 valid UBC course sections can be found here. This data has been obtained from UBC PAIR and has not been modified in any way. The data is provided as a zip file: inside of the zip you will find a file for each of the courses offered at UBC. Each of those files contains a JSON object containing the information about each section of the course.

Unzip the example valid dataset to see what a valid JSON formatted file looks like. You can use an online JSON formatter to more easily view the JSON file contents.

NOTE: PAIR is a great example to understand the expected format of the data, and to learn how EBNF works. However, it is much larger than required for your unit tests, and as such may be opaque and cumbersome to deal with. We recommend that you create smaller zip files for your own tests.

Valid Kind argument to addDataset

For c0 and c1, the only valid argument for kind will be sections. For c2 and beyond, rooms will also be valid.

2.2 Removing a dataset

Users would like to be able to remove datasets that were previously added successfully.

The following method is defined in the IInsightFacade.ts interface file:

- removeDataset(id: string): Promise<string> removes a dataset given the id. Removing a dataset results in the same behaviour as if it were never added in the first place.

Valid ID argument to removeDataset

A valid id is an idstring, and follows the same rules as a valid id for addDataset. In addition, an attempt to remove an id that does not match the id of any available dataset should be rejected.
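The two rejection cases map onto different error types (see the comments in IInsightFacade.ts). The sketch below shows only that decision, against an in-memory Map. The two error classes are local stand-ins for the ones in the interface file, and removeFromMemory is a hypothetical helper; a real removeDataset also returns a Promise and has to delete the copy cached on disk.

```typescript
// Local stand-ins so the sketch runs on its own; use the classes from IInsightFacade.ts in the project.
class InsightError extends Error {}
class NotFoundError extends Error {}

// Removes `id` from the map and returns it, or throws the matching error.
function removeFromMemory(datasets: Map<string, unknown>, id: string): string {
    if (id.includes("_") || id.trim().length === 0) {
        throw new InsightError(`Invalid dataset id: "${id}"`);
    }
    // Map.delete returns false when the key was not present.
    if (!datasets.delete(id)) {
        throw new NotFoundError(`No dataset with id "${id}" has been added`);
    }
    return id;
}

const added = new Map<string, unknown>([["ubc", {}]]);
console.log(removeFromMemory(added, "ubc")); // the removed id
```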

2.3 Listing all datasets

Users would like to be able to list all available datasets for querying.

The following method is defined in the IInsightFacade.ts interface file:

- listDatasets(): Promise<InsightDataset[]> returns an array of currently added datasets. Each element of the array should describe a dataset following the InsightDataset interface which contains the dataset id, kind, and number of rows.

3. Querying the data for insights

After a user has added a dataset, they should be able to query that dataset for insights.

The following method is defined in the IInsightFacade.ts interface file:

- performQuery(query: unknown): Promise<InsightResult[]> performs a query on the dataset. It first should parse and validate the input query, then perform semantic checks on the query and evaluate the query only if it is valid.
  - Since the type for the query is unknown, technically anything could be passed. A valid query will be an object type matching the requirements below (hint: your first check should be rejecting query if it is not an object type).
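As a sketch of that first check: in JavaScript, typeof null is "object" and arrays are objects too, so a plain typeof test is not enough. The type guard below is one possible way to write it; the name isQueryObject is our own.

```typescript
// Narrows an unknown query to a plain (non-null, non-array) object.
function isQueryObject(query: unknown): query is Record<string, unknown> {
    return typeof query === "object" && query !== null && !Array.isArray(query);
}

console.log(isQueryObject({ WHERE: {}, OPTIONS: {} })); // an object
console.log(isQueryObject(null));                        // typeof null is "object", still rejected
console.log(isQueryObject(["WHERE"]));                   // arrays are rejected
```

After this guard succeeds, TypeScript lets you read query["WHERE"] and query["OPTIONS"] as unknown values, which you then validate further.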

Valid Query argument to performQuery

A valid query:

- Is based on the given EBNF (defined below).
- References exactly one dataset in its query keys.
- Has at most 5000 results. If this limit is exceeded, the query should reject with a ResultTooLargeError.
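The boundary of the result limit is easy to get wrong: exactly 5000 results is valid, 5001 is not. A minimal sketch follows, in which ResultTooLargeError is a local stand-in for the class in IInsightFacade.ts and enforceResultLimit is a hypothetical helper name.

```typescript
// Local stand-in so the sketch runs on its own; use the class from IInsightFacade.ts in the project.
class ResultTooLargeError extends Error {}

const MAX_RESULTS = 5000;

// Returns the results unchanged, or throws when there are more than MAX_RESULTS of them.
function enforceResultLimit<T>(results: T[]): T[] {
    if (results.length > MAX_RESULTS) {
        throw new ResultTooLargeError(
            `Query matched ${results.length} results; the limit is ${MAX_RESULTS}`);
    }
    return results;
}

console.log(enforceResultLimit(new Array(5000).fill(0)).length); // exactly at the limit is accepted
```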

Query EBNF

EBNF (Extended Backus-Naur Form) is a notation for describing the syntax of a language. Every queryable data system has a query syntax, and you will implement a system that accepts the one defined by the EBNF grammar below. Queries to the system should be JavaScript objects structured according to that grammar, which has three main parts:

- WHERE defines which sections should be included in the results.
- COLUMNS defines which keys should be included in each result.
- ORDER defines what order the results should be in.

The full EBNF grammar is shown here. Further below, you can also see an example input/output query for performQuery, as expressed in EBNF.

QUERY ::= '{' BODY ', ' OPTIONS '}'

// Note: a BODY with no FILTER (i.e. WHERE:{}) matches all entries.
BODY ::= 'WHERE:{' FILTER? '}'

FILTER ::= LOGICCOMPARISON | MCOMPARISON | SCOMPARISON | NEGATION

LOGICCOMPARISON ::= LOGIC ':[' FILTER_LIST ']'
MCOMPARISON ::= MCOMPARATOR ':{' mkey ':' number '}'
SCOMPARISON ::= 'IS:{' skey ': "' [*]? inputstring [*]? '" }' // Asterisks at the beginning or end of the inputstring should act as wildcards.
NEGATION ::= 'NOT :{' FILTER '}'

FILTER_LIST ::= '{' FILTER '}' | '{' FILTER '}, ' FILTER_LIST // comma separated list of filters containing at least one filter

LOGIC ::= 'AND' | 'OR'
MCOMPARATOR ::= 'LT' | 'GT' | 'EQ'

OPTIONS ::= 'OPTIONS:{' COLUMNS '}' | 'OPTIONS:{' COLUMNS ', ORDER:' key '}'
COLUMNS ::= 'COLUMNS:[' KEY_LIST ']'
KEY_LIST ::= key | key ', ' KEY_LIST // comma separated list of keys containing at least one key

key ::= mkey | skey
mkey ::= '"' idstring '_' mfield '"'
skey ::= '"' idstring '_' sfield '"'
mfield ::= 'avg' | 'pass' | 'fail' | 'audit' | 'year'
sfield ::= 'dept' | 'id' | 'instructor' | 'title' | 'uuid'
idstring ::= [^_]+ // One or more of any character, except underscore.
inputstring ::= [^*]* // Zero or more of any character, except asterisk.
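Because FILTER is defined recursively, a recursive function is a natural way to evaluate it. The sketch below covers LOGICCOMPARISON, MCOMPARISON and NEGATION only; SCOMPARISON is left out to keep it short, so an IS filter falls through to the "unsupported" branch. It is one possible shape, not a required design. It assumes each section is stored as an object keyed by full query keys such as "sections_avg", and all names (Section, asObject, matchesFilter, matchesWhere) are our own. It also does not check that the key in an MCOMPARISON is really an mkey.

```typescript
type Section = Record<string, string | number>;

// Narrow an unknown value to a plain object, or fail loudly.
function asObject(value: unknown, what: string): Record<string, unknown> {
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
        throw new Error(`${what} must be an object`);
    }
    return value as Record<string, unknown>;
}

// Evaluate a single FILTER (exactly one operator key) against one section.
function matchesFilter(filter: unknown, section: Section): boolean {
    const entries = Object.entries(asObject(filter, "FILTER"));
    if (entries.length !== 1) {
        throw new Error("FILTER must contain exactly one operator");
    }
    const [operator, argument] = entries[0];
    switch (operator) {
        case "AND":
        case "OR": {
            if (!Array.isArray(argument) || argument.length === 0) {
                throw new Error(`${operator} requires a non-empty array of filters`);
            }
            // Evaluate every child so that malformed children are always reported.
            const outcomes = argument.map((child) => matchesFilter(child, section));
            return operator === "AND" ? outcomes.every(Boolean) : outcomes.some(Boolean);
        }
        case "NOT":
            return !matchesFilter(argument, section);
        case "LT":
        case "GT":
        case "EQ": {
            const comparison = Object.entries(asObject(argument, operator));
            if (comparison.length !== 1) {
                throw new Error(`${operator} must contain exactly one key`);
            }
            const [key, bound] = comparison[0];
            const actual = section[key];
            if (typeof bound !== "number" || typeof actual !== "number") {
                throw new Error(`${operator} compares a numeric field with a number`);
            }
            if (operator === "LT") {
                return actual < bound;
            }
            return operator === "GT" ? actual > bound : actual === bound;
        }
        default:
            throw new Error(`Unsupported operator in this sketch: ${operator}`);
    }
}

// WHERE is either {} (matches every section) or a single FILTER.
function matchesWhere(where: unknown, section: Section): boolean {
    const body = asObject(where, "WHERE");
    return Object.keys(body).length === 0 ? true : matchesFilter(body, section);
}

const example: Section = { sections_dept: "math", sections_avg: 97.5 };
console.log(matchesWhere({ GT: { sections_avg: 97 } }, example));
```

Only the top-level WHERE may be empty; an empty object nested inside AND, OR or NOT is rejected, as the grammar requires.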

Wildcards

Wildcards are the optional asterisks in SCOMPARISON. For example, "IS": {"sections_dept": "C*"} matches any section whose department starts with a C. Because both asterisks are optional, there are four possible configurations:

inputstring: Matches inputstring exactly

*inputstring: Ends with inputstring

inputstring*: Starts with inputstring

*inputstring*: Contains inputstring

Asterisks in the middle of the inputstring, such as input*string, are not allowed.
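The four configurations can be handled by stripping at most one asterisk from each end and then picking the matching string test. The following is a minimal sketch with a hypothetical function name, matchesWildcard. It compares case-sensitively, which is our assumption, since the specification does not mention case.

```typescript
// Tests `value` against an SCOMPARISON pattern with optional leading/trailing asterisks.
// Throws if an asterisk remains anywhere else in the pattern.
function matchesWildcard(pattern: string, value: string): boolean {
    const leading = pattern.startsWith("*");
    // A lone "*" counts as a leading asterisk followed by an empty inputstring.
    const trailing = pattern.length > 1 && pattern.endsWith("*");
    const inputstring = pattern.slice(leading ? 1 : 0, trailing ? -1 : undefined);
    if (inputstring.includes("*")) {
        throw new Error(`Asterisks are only allowed at the start or end: ${pattern}`);
    }
    if (leading && trailing) {
        return value.includes(inputstring);   // *inputstring*
    }
    if (leading) {
        return value.endsWith(inputstring);   // *inputstring
    }
    if (trailing) {
        return value.startsWith(inputstring); // inputstring*
    }
    return value === inputstring;             // inputstring
}

console.log(matchesWildcard("cp*", "cpsc"));
console.log(matchesWildcard("*sc", "cpsc"));
```

Because inputstring may be empty, the patterns "*" and "**" match every string under this reading of the grammar.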

Order

The ORDER's key must be a key found in the COLUMNS's KEY_LIST array.

NOTE: If ORDER is not specified, any ordering of the results is considered valid.

Tie Breaks

Often, you will find cases where the field being sorted on appears multiple times. For example, if you sort by department, many entries may be CPSC. The order within those CPSC rows does not matter (in other words, the specification does not test tiebreaking ordering in these scenarios).
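A hedged sketch of ordering a result array by one key is shown below. The example results later in this section are in ascending order, so the sketch sorts ascending. Numbers are compared numerically; comparing strings with the built-in < operator is our assumption, as the specification does not define string ordering. The names Row and sortByKey are our own.

```typescript
type Row = Record<string, string | number>;

// Returns a new array sorted ascending by `orderKey`; the input array is left untouched.
function sortByKey(rows: Row[], orderKey: string): Row[] {
    return [...rows].sort((a, b) => {
        const left = a[orderKey];
        const right = b[orderKey];
        if (typeof left === "number" && typeof right === "number") {
            return left - right;
        }
        const l = String(left);
        const r = String(right);
        return l < r ? -1 : l > r ? 1 : 0; // 0 leaves tied rows in any acceptable order
    });
}

const rows: Row[] = [
    { sections_dept: "math", sections_avg: 97.25 },
    { sections_dept: "epse", sections_avg: 97.09 },
];
console.log(sortByKey(rows, "sections_avg").map((row) => row.sections_avg));
```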

Valid query keys

In the above EBNF, the query keys are the mkey and skey. As defined in the EBNF, a valid query key has two parts, separated by an underscore: <idstring>_<mfield | sfield>.

idstring is the dataset id provided by the user when they add the dataset (the id parameter).
mfield | sfield is the column key that represents a piece of information about the course.

For example, if a user has added a dataset with the id ubc-courses, then a valid query key is ubc-courses_avg.
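Since neither an idstring nor any field name contains an underscore, a valid key splits into exactly two parts. The sketch below parses a key and then uses that to check the "exactly one dataset" rule. The helper names parseQueryKey and singleDatasetId and the ParsedKey shape are our own; the field lists come from the EBNF.

```typescript
const MFIELDS: readonly string[] = ["avg", "pass", "fail", "audit", "year"];
const SFIELDS: readonly string[] = ["dept", "id", "instructor", "title", "uuid"];

interface ParsedKey {
    datasetId: string;
    field: string;
    kind: "mkey" | "skey";
}

// Splits "<idstring>_<field>" and classifies the field, or throws on a malformed key.
function parseQueryKey(key: string): ParsedKey {
    const parts = key.split("_");
    if (parts.length !== 2 || parts[0].length === 0) {
        throw new Error(`Malformed query key: ${key}`);
    }
    const [datasetId, field] = parts;
    if (MFIELDS.includes(field)) {
        return { datasetId, field, kind: "mkey" };
    }
    if (SFIELDS.includes(field)) {
        return { datasetId, field, kind: "skey" };
    }
    throw new Error(`Unknown field in query key: ${key}`);
}

// Returns the one dataset id used by all keys, or throws if there is not exactly one.
function singleDatasetId(keys: string[]): string {
    const ids = new Set(keys.map((key) => parseQueryKey(key).datasetId));
    if (ids.size !== 1) {
        throw new Error(`A query must reference exactly one dataset, found ${ids.size}`);
    }
    return Array.from(ids)[0];
}

console.log(parseQueryKey("ubc-courses_avg"));
console.log(singleDatasetId(["ubc_dept", "ubc_avg"]));
```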

The following table lists the valid dataset keys (mfield | sfield) and their expected formats. Pay close attention to the desired format and descriptions of each field!

Simple example query

{
    "WHERE": {
        "GT": {
            "sections_avg": 97
        }
    },
    "OPTIONS": {
        "COLUMNS": [
            "sections_dept",
            "sections_avg"
        ],
        "ORDER": "sections_avg"
    }
}

The result for this would look like:

[
    { "sections_dept": "math", "sections_avg": 97.09 },
    { "sections_dept": "epse", "sections_avg": 97.09 },
    { "sections_dept": "math", "sections_avg": 97.25 },
    { "sections_dept": "epse", "sections_avg": 97.29 },
    { "sections_dept": "nurs", "sections_avg": 97.33 },
    { "sections_dept": "epse", "sections_avg": 97.41 },
    { "sections_dept": "cnps", "sections_avg": 97.47 },
    { "sections_dept": "math", "sections_avg": 97.48 },
    { "sections_dept": "educ", "sections_avg": 97.5 },
    { "sections_dept": "nurs", "sections_avg": 97.53 },
    { "sections_dept": "epse", "sections_avg": 97.67 },
    { "sections_dept": "epse", "sections_avg": 97.69 },
    { "sections_dept": "epse", "sections_avg": 97.78 },
    { "sections_dept": "crwr", "sections_avg": 98 },
    { "sections_dept": "epse", "sections_avg": 98.08 },
    { "sections_dept": "nurs", "sections_avg": 98.21 },
    { "sections_dept": "epse", "sections_avg": 98.36 },
    { "sections_dept": "epse", "sections_avg": 98.45 },
    { "sections_dept": "nurs", "sections_avg": 98.5 },
    { "sections_dept": "nurs", "sections_avg": 98.58 },
    { "sections_dept": "epse", "sections_avg": 98.58 },
    { "sections_dept": "epse", "sections_avg": 98.7 },
    { "sections_dept": "nurs", "sections_avg": 98.71 },
    { "sections_dept": "eece", "sections_avg": 98.75 },
    { "sections_dept": "epse", "sections_avg": 98.76 },
    { "sections_dept": "epse", "sections_avg": 98.8 },
    { "sections_dept": "spph", "sections_avg": 98.98 },
    { "sections_dept": "cnps", "sections_avg": 99.19 },
    { "sections_dept": "math", "sections_avg": 99.78 },
    { "sections_dept": "math", "sections_avg": 99.78 }
]

Complex example query

{
    "WHERE": {
        "OR": [
            {
                "AND": [
                    {
                        "GT": {
                            "ubc_avg": 90
                        }
                    },
                    {
                        "IS": {
                            "ubc_dept": "adhe"
                        }
                    }
                ]
            },
            {
                "EQ": {
                    "ubc_avg": 95
                }
            }
        ]
    },
    "OPTIONS": {
        "COLUMNS": [
            "ubc_dept",
            "ubc_id",
            "ubc_avg"
        ],
        "ORDER": "ubc_avg"
    }
}

The result of this query would be:

[
    { "ubc_dept": "adhe", "ubc_id": "329", "ubc_avg": 90.02 },
    { "ubc_dept": "adhe", "ubc_id": "412", "ubc_avg": 90.16 },
    { "ubc_dept": "adhe", "ubc_id": "330", "ubc_avg": 90.17 },
    { "ubc_dept": "adhe", "ubc_id": "412", "ubc_avg": 90.18 },
    { "ubc_dept": "adhe", "ubc_id": "330", "ubc_avg": 90.5 },
    { "ubc_dept": "adhe", "ubc_id": "330", "ubc_avg": 90.72 },
    { "ubc_dept": "adhe", "ubc_id": "329", "ubc_avg": 90.82 },
    { "ubc_dept": "adhe", "ubc_id": "330", "ubc_avg": 90.85 },
    { "ubc_dept": "adhe", "ubc_id": "330", "ubc_avg": 91.29 },
    { "ubc_dept": "adhe", "ubc_id": "330", "ubc_avg": 91.33 },
    { "ubc_dept": "adhe", "ubc_id": "330", "ubc_avg": 91.48 },
    { "ubc_dept": "adhe", "ubc_id": "329", "ubc_avg": 92.54 },
    { "ubc_dept": "adhe", "ubc_id": "329", "ubc_avg": 93.33 },
    { "ubc_dept": "sowk", "ubc_id": "570", "ubc_avg": 95 },
    { "ubc_dept": "rhsc", "ubc_id": "501", "ubc_avg": 95 },
    { "ubc_dept": "psyc", "ubc_id": "501", "ubc_avg": 95 },
    { "ubc_dept": "obst", "ubc_id": "549", "ubc_avg": 95 },
    { "ubc_dept": "nurs", "ubc_id": "424", "ubc_avg": 95 },
    { "ubc_dept": "musc", "ubc_id": "553", "ubc_avg": 95 },
    { "ubc_dept": "mtrl", "ubc_id": "599", "ubc_avg": 95 },
    { "ubc_dept": "mtrl", "ubc_id": "564", "ubc_avg": 95 },
    { "ubc_dept": "math", "ubc_id": "532", "ubc_avg": 95 },
    { "ubc_dept": "kin", "ubc_id": "500", "ubc_avg": 95 },
    { "ubc_dept": "kin", "ubc_id": "499", "ubc_avg": 95 },
    { "ubc_dept": "epse", "ubc_id": "682", "ubc_avg": 95 },
    { "ubc_dept": "epse", "ubc_id": "606", "ubc_avg": 95 },
    { "ubc_dept": "edcp", "ubc_id": "473", "ubc_avg": 95 },
    { "ubc_dept": "econ", "ubc_id": "516", "ubc_avg": 95 },
    { "ubc_dept": "crwr", "ubc_id": "599", "ubc_avg": 95 },
    { "ubc_dept": "cpsc", "ubc_id": "589", "ubc_avg": 95 },
    { "ubc_dept": "cnps", "ubc_id": "535", "ubc_avg": 95 },
    { "ubc_dept": "bmeg", "ubc_id": "597", "ubc_avg": 95 },
    { "ubc_dept": "adhe", "ubc_id": "329", "ubc_avg": 96.11 }
]

4. IInsightFacade.ts

The high-level API you must support is shown below; these declarations should be in your project in src/controller/IInsightFacade.ts.

/*
 * This is the primary high-level API for the project. In this folder there should be:
 * A class called InsightFacade, this should be in a file called InsightFacade.ts.
 * You should not change this interface at all or the test suite will not work.
 */

export enum InsightDatasetKind {
    Sections = "sections",
    Rooms = "rooms",
}

export interface InsightDataset {
    id: string;
    kind: InsightDatasetKind;
    numRows: number;
}

export interface InsightResult {
    [key: string]: string | number;
}

export class InsightError extends Error {
    constructor(message?: string) {
        super(message);
        Error.captureStackTrace(this, InsightError);
    }
}

export class NotFoundError extends Error {
    constructor(message?: string) {
        super(message);
        Error.captureStackTrace(this, NotFoundError);
    }
}

export class ResultTooLargeError extends Error {
    constructor(message?: string) {
        super(message);
        Error.captureStackTrace(this, ResultTooLargeError);
    }
}

export interface IInsightFacade {
    /**
     * Add a dataset to insightUBC.
     *
     * @param id The id of the dataset being added.
     * @param content The base64 content of the dataset. This content should be in the form of a serialized zip file.
     * @param kind The kind of the dataset
     *
     * @return Promise <string[]>
     *
     * The promise should fulfill on a successful add, reject for any failures.
     * The promise should fulfill with a string array,
     * containing the ids of all currently added datasets upon a successful add.
     * The promise should reject with an InsightError describing the error.
     *
     * An id is invalid if it contains an underscore, or is only whitespace characters.
     * If id is the same as the id of an already added dataset, the dataset should be rejected and not saved.
     *
     * After receiving the dataset, it should be processed into a data structure of
     * your design. The processed data structure should be persisted to disk; your
     * system should be able to load this persisted value into memory for answering
     * queries.
     *
     * Ultimately, a dataset must be added or loaded from disk before queries can
     * be successfully answered.
     */
    addDataset(id: string, content: string, kind: InsightDatasetKind): Promise<string[]>;

    /**
     * Remove a dataset from insightUBC.
     *
     * @param id The id of the dataset to remove.
     *
     * @return Promise <string>
     *
     * The promise should fulfill upon a successful removal. Reject on any error.
     * A removed dataset behaves as if it never existed in the system (i.e. it was never added).
     * Attempting to remove a dataset that hasn't been added yet counts as an error.
     *
     * An id is invalid if it contains an underscore, or is only whitespace characters.
     *
     * The promise should fulfill with the id of the dataset that was removed.
     * The promise should reject with a NotFoundError (if a valid id was not yet added)
     * or an InsightError (invalid id or any other source of failure) describing the error.
     */
    removeDataset(id: string): Promise<string>;

    /**
     * Perform a query on insightUBC.
     *
     * @param query The query to be performed.
     *
     * If a query is incorrectly formatted, references a dataset not added (in memory or on disk),
     * or references multiple datasets, it should be rejected with an InsightError.
     * If a query would return more than 5000 results, it should be rejected with a ResultTooLargeError.
     *
     * @return Promise <InsightResult[]>
     *
     * The promise should fulfill with an array of results.
     * The promise should reject with an InsightError describing the error.
     */
    performQuery(query: unknown): Promise<InsightResult[]>;

    /**
     * List all currently added datasets, their types, and number of rows.
     *
     * @return Promise <InsightDataset[]>
     * The promise should fulfill an array of currently added InsightDatasets, and will only fulfill.
     */
    listDatasets(): Promise<InsightDataset[]>;
}

More information about IInsightFacade file contents:

- InsightDatasetKind is an enum specifying the two possible dataset types.
- InsightDataset is an interface for a simple object storing metadata about an added dataset.
- InsightError, NotFoundError, ResultTooLargeError are Error subtypes potentially returned by the API.

Important: InsightDataset and InsightResult interfaces should be treated as final! In other words, they should not be extended. (Sorry, TypeScript does not have language support for this restriction.) This kind of extension is unnecessary (those types are already quite general) and our assertions depend on the exact type signatures provided.

// Yes
const myDataset: InsightDataset = {
    id: "foo",
    kind: InsightDatasetKind.Sections,
    numRows: 1
};

// No
class DatasetClass implements InsightDataset { ... }
const myDataset: DatasetClass = new DatasetClass(...);

5. Caching Progress (Persistence)

Dataset processing is computationally expensive. As such, we wish to build a system that is resilient to unexpected changes (i.e. crashes) without data loss. To do this, we will cache the added datasets on persistent storage (disk). Once a dataset has been added to the system via addDataset, any future instance of InsightFacade should be able to access it, to avoid having to re-process the data. In other words, when a user creates a new instance of InsightFacade, they should be able to query/list/remove datasets that had been added to "old" instances, so long as they had not been removed since. To simplify your implementation of this feature, once a new InsightFacade has been created, the behaviour of all "old" instances is undefined. This means that you will only have to reason about having one concurrently active InsightFacade at a time.

const facade = new InsightFacade();
await facade.addDataset("ubc", dataset, InsightDatasetKind.Sections);

// later...
const newInstance = new InsightFacade();

// this should still work, and work the same as facade.removeDataset("ubc")
await newInstance.removeDataset("ubc");

As mentioned, you must cache correctly added datasets on disk. Disk behaves differently than memory: disk persists between InsightFacade instances, but memory does not. For example, storing an array in a class variable (e.g. private arrayInMemory = ["a", "b", "c"];) stores that array in memory, whereas saving that array in a file stores it on disk. Disk is more persistent than memory, and this is intentional: if the code crashed, data in memory would be lost, but data on disk could be safely recovered.

The valid dataset files should be saved to the <PROJECT_DIR>/data directory. You can store the data on disk in whatever way works for you. You just need to be able to (a) read it, and (b) remove it if removeDataset is called with the appropriate id. This can be a single file, a folder full of files, text file(s), json file(s), etc. Hint: You may be tempted to read all the data when constructing a new InsightFacade instance, but this will cause performance issues. We recommend that you load datasets lazily.

The dataset files should be read and written using the fs-extra package.
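As an illustration of one possible on-disk layout (one JSON file per dataset), here is a hedged sketch. It uses Node's built-in fs/promises only so that it runs without extra packages; in your project these reads and writes should go through fs-extra as required above. The StoredDataset shape, the file naming, and the helper names are all our own assumptions, not part of the specification.

```typescript
import * as fs from "fs/promises";
import * as path from "path";

// Hypothetical on-disk shape: the metadata plus the already-processed rows.
interface StoredDataset {
    id: string;
    kind: string;
    rows: Array<Record<string, string | number>>;
}

// encodeURIComponent keeps ids containing "/" from pointing outside dataDir.
function datasetPath(dataDir: string, id: string): string {
    return path.join(dataDir, `${encodeURIComponent(id)}.json`);
}

async function saveDataset(dataDir: string, dataset: StoredDataset): Promise<void> {
    await fs.mkdir(dataDir, { recursive: true }); // no error if the directory already exists
    await fs.writeFile(datasetPath(dataDir, dataset.id), JSON.stringify(dataset), "utf8");
}

// Loads one dataset on demand (lazily); resolves to undefined if it cannot be read.
async function loadDataset(dataDir: string, id: string): Promise<StoredDataset | undefined> {
    try {
        const text = await fs.readFile(datasetPath(dataDir, id), "utf8");
        return JSON.parse(text) as StoredDataset;
    } catch {
        return undefined; // a fuller version would distinguish "missing" from "corrupt"
    }
}

async function deleteDataset(dataDir: string, id: string): Promise<void> {
    await fs.rm(datasetPath(dataDir, id), { force: true }); // force: no error if already gone
}
```

With this layout, a new InsightFacade instance can answer listDatasets by reading the directory and can defer loadDataset until a query actually references that id.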

Make sure not to commit any files in <PROJECT_DIR>/data to git, as this may cause unexpected test failures.