Can model different data (a dataset, a query) into suitable data structures and reason about alternative representations.
Can understand and work with external libraries (fs-extra, JSZip) by following documentation.
Able to reason about and debug programs with asynchrony (Promises, async/await).
Able to handle exceptions properly for programs involving asynchronicity.
Can parse and validate a query based on the given EBNF grammar.
Can extract a set of glass box tests for a particular implementation.
In Checkpoint 0, you created a test suite against the insightUBC Section specification. In Checkpoint 1, you will design and build an implementation of that specification using unit tests and code coverage metrics to ensure the quality of your implementation.
Repo: All of your work for C1 onwards will take place in a new GitHub repository that you and your partner will share. To have this repository provisioned, you must register your team in Classy. Details about when registration opens and when your repo will be provisioned will be communicated to you.
Grade: Your grade for Checkpoint 1 is calculated by AutoTest on the main branch of your repo as follows:
your grade = (number of grader tests passing against your implementation) / (total number of tests)
See AutoTest Feedback for details on getting feedback (and your grade), what your feedback looks like, and how you should use it. In particular, note that during the checkpoint, your grade is reported as a bucket.
You cannot use any library package that is not already specified in this document or required for TypeScript compilation (i.e., type declaration packages).
Your implementation must be in TypeScript.
You are not allowed to store the data in any external database, only disk storage is permitted.
Do not store your datasets as static or global variables; keep them as members of a class. This is important because we try to clear datasets between tests. If your datasets are stored globally, then one misbehaving test may cause all subsequent tests to fail.
We've updated our reference implementation to no longer return results in the same order as the Reference UI. What does this mean?
When running your tests against our implementation, tests that used to pass might now fail. The order of the query results your test expected might differ from the order of the results our implementation returns, even though both contain the same items. If the assertions in your tests are too strict (i.e., they depend on the order being identical to that of the Reference UI), your tests will fail.
You will want to update your test assertions to not rely on order when it is not explicitly set.
This checkpoint involves implementing the insightUBC project! Your implementation should follow the insightUBC Section Specification.
There are two main parts to implementation in C1: the Dataset Processor and the Query Engine. The Dataset Processor roughly corresponds to the addDataset method of InsightFacade, and the Query Engine to performQuery.
In order for insightUBC to be able to answer all kinds of questions about datasets, it must first load and process the data from the given zip files. You will take the dataset zips you've seen in C0, check that they are indeed valid datasets, and convert them into data model(s) of your choice. There are many good ways to model a section. For example, you could represent each section with its fields as a TypeScript class. Try to reason about different representations, keeping in mind that your Query Engine will be working with this representation when answering queries.
insightUBC also needs a Query Engine, so that it can answer questions about the datasets processed by the Dataset Processor. The Query Engine takes a JSON query, parses it, and validates that it is both syntactically and semantically correct. You will also implement the code to find the desired subset of all your datasets that matches a query.
Modeling Queries
As with a Section in a dataset, you will want to give a query a representation within your code. Coming up with a good model might take a couple of tries, but try to reduce a query into smaller parts that are more manageable.
For example, let's consider this sample query in the original JSON format. One way you could model a query is as a recursive tree structure (aka AST). One benefit of this representation is that it naturally converts from the EBNF used to specify the grammar of a query.
At the top level, we have a query. A query consists of two sub-components: a WHERE block and an OPTIONS block. The WHERE block, in turn, consists of a single sub-component, the MComparator. Similarly, the OPTIONS block further decomposes into the COLUMNS and ORDER components.
This kind of decomposition is nice because it achieves separation of concerns. You could have a function that only cares about handling of the WHERE block, while another function is responsible for handling the OPTIONS block. The goal here is to make each function easier to reason about, code, and test.
Tip: If your design uses this kind of query representation, your old friend recursion might come in handy!
This specification might seem intimidating, but keep in mind that this project has the same interaction mechanism as most software systems:
It consumes input data (the zip file).
It transforms the data (according to the query).
It returns a result.
There is no best way to get started, but you can consider each of these in turn. Some possible options that could be pursued in any order (or skipped entirely):
Start by looking at the data file we have provided and understand what kind of data you will be analyzing and manipulating. This will help you think through the types of data structures you may want to create (this is a precursor to step 1 above).
Look at the sample queries in the specification. From these queries, figure out how you would want the data arranged so that you can answer these queries (this is the precursor to step 2 above).
Ignoring the provided data, create some fake data (maybe for one section of one course). Write the portion of the system that queries this fake data (this is step 2 above).
Like the above, using some fake data and a fake query processor, write code that would return the fake data correctly and with the correct error codes (this is step 3 above).
Trying to keep all of the requirements in mind at once is going to be overwhelming. Tackling a single task that you can accomplish in an hour is going to be much more effective than worrying about the whole specification at once. Iteratively growing your project from one small task towards the next small task is going to be the best way to make forward progress.
One successful pattern we have observed for decomposing the specifications is writing actual user stories for the parts of the spec you have identified so you can keep track of what you are doing in your own words. When you tap 'new issue' in your repository you will see a User Story issue template that can be extremely helpful for tracking and coordinating the work your team will undertake.
It is common to misuse the fs-extra methods when reading and writing files to disk. Unfortunately, misuse can cause timing issues which may appear as failing tests on AutoTest (not locally) on every fifth run or only in future checkpoints when things begin to slow down. These issues are tricky to diagnose but easy to fix and prevent!
For this reason, we have blocked the use of fs-extra (and node:fs) synchronous methods within your implementation (e.g., you will not be able to use writeJSONSync).
To avoid all of this pain please make sure to read the documentation carefully: fs-extra documentation.
Below is an example of how not to use the package:
function createFile() {
    fs.writeJSONSync('./package.json', { name: 'fs-extra' });
}
A testing anti-pattern is to only have integration tests (e.g., tests that directly evaluate addDataset, removeDataset, listDatasets and performQuery). A much more robust testing strategy that makes it easier to implement new features and isolate failures is to write unit tests against the individual methods in your implementation. Your C0 tests are actually integration tests as you are testing the top-level API methods, but not directly invoking the dozens of methods these top-level methods invoke.
To implement the API you will likely have to create your own additional methods and classes.
The best way to test your system is via your own unit test suite. This will be the quickest and easiest way to ensure your system is behaving correctly and to make sure regressions are not introduced as you proceed further in the project. Additionally, testing these individual methods and classes (without invoking the top-level InsightFacade APIs) will make your suite faster, better at detecting faults, and better at isolating faults so you can more easily fix them.
Writing the code will not be the hardest part about C1! It is important to communicate well with your partner, make sure both of you have read the specification, and plan what responsibilities each person will undertake.
One way to "split" C1 is into the Dataset Processor (addDataset, removeDataset, listDatasets) and the Query Engine (performQuery). An issue you may experience while doing so is that the Query Engine depends on datasets loaded by the Dataset Processor to produce query results. If you find yourself waiting for your partner to implement addDataset, you can first work on validating the query structure, as this task does not depend on any datasets being loaded. Your team can also agree ahead of time on what the Sections data will look like (after being processed by the Dataset Processor), so you can test performQuery against some mocked Sections data.
We also recommend pair programming, especially when implementing complex algorithms, or while debugging. It is a great way to ensure that both of you have a shared understanding of the entire program, and to catch mistakes that will otherwise get missed.
In your package.json, there is a new script which allows you to view the coverage of your test suite on your implementation.
"scripts": {
...
"cover": "nyc --reporter text --reporter html yarn run test",
...
}
After running yarn cover, you will see some coverage stats in the console. Running yarn cover will also create a coverage directory located at the root of your project. Open the coverage/index.html file in the browser to view more details about your coverage.
The coverage directory is a build directory, so it should not be committed or pushed to GitHub. You can ignore the coverage directory by updating your .gitignore file to include it.
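For example, one added line in .gitignore keeps the directory out of version control:

```
# build output from `yarn cover`
coverage/
```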
The following resources have been created by course staff to assist you with the project.
TypeScript: An introduction to TypeScript.
Promises: An introduction to promises and their asynchronous properties.
Project Overview: An overview of the project.
Git Cookbook: Learn the basics of Git.
Async Cookbook: Learn about promises and the differences between synchronous and asynchronous code.
Pull Request Cookbook: Learn about Pull Requests in GitHub.
Collaboration: Document about how to collaborate on the project.
Before posting to Piazza, please follow the instructions in the IDE setup cookbook and restart your IDE so that Prettier and lint issues are displayed right in your IDE.
Take a look over here. When you use the Object type in TypeScript, the compiler does not like not knowing what types are expected from the key/value pairs. The best solution is to use an interface (first solution in the link), but if you need a quick one time solution you can also do a cast (second solution in the link). Keep in mind that 'one time fixes' often end up being more than one time, doing it right the first time can save you a lot of hassle in the long run. Do not use the third solution (editing tsconfig).
You can also create quick inline interfaces for objects:
let stringToNumMap: {[id: string]: number} = {}; // TypeScript may also ask for the new cast syntax
let maybeNum: any = 3;
let num: number = 1;
let sum = num + (maybeNum as number); // Note this 'cast' is only compile time; it does not actually cast/convert the value
Make sure TypeScript is enabled by going to Preferences > Languages & Frameworks > TypeScript and checking that the 'Enable TypeScript compiler' checkbox is ticked. If it is not, tick it and you should be good to go.
Also if you have set up Mocha in WebStorm, you can enable the Compile TypeScript option as a 'Before launch' setting to make sure it has always compiled when you run your tests. If it's still not compiling, you can always run yarn build manually, and check with a TA during lab or in office hours.
The command we wrote for you to execute your test suite, yarn test, has a timeout parameter that is set to 10 seconds. You can find where the command is defined in your package.json:
"test": "mocha --require ts-node/register --timeout 10000 --extension .spec.ts --recursive test",
When you execute your tests within IntelliJ, by pressing the green arrow with "Run <Your Test/Test Suite>", it uses a different configuration. If you want to keep the increased timeout, you'll need to manually update your Mocha configuration.
Open the Run/Debug Configuration Dialog as shown here.
Click "Edit Configuration templates..." which appears at the bottom left of the dialog.
Click on the Mocha Template.
Add --require ts-node/register --timeout 10000 to the "Extra Mocha options" and click "Apply".
Delete all old Mocha configurations. Now, when you create a new configuration, it should use this increased timeout.