Can model different data (a dataset, a query) into suitable data structures and reason about alternative representations.
Can understand and work with external libraries (fs-extra, JSZip) by following documentation.
Able to reason about and debug programs with asynchrony (Promises, async/await).
Able to handle exceptions properly for programs involving asynchronicity.
Can parse and validate a query based on the given EBNF grammar.
Can extract a set of glass box tests for a particular implementation.
N/A
In Checkpoint 0, you created a test suite against the insightUBC Section specification. In Checkpoint 1, you will design and build an implementation of that specification, and probably write more (glass box) tests for your implementation.
Teams. This checkpoint and all future checkpoints will be completed in teams of two. You must use the same partner for the duration of the project; no changes will be permitted. If you do not have a partner, the TAs will help you find one during the first lab after Checkpoint 0. Your partner must be in your lab.
Labs. Labs are now mandatory and will be for the remainder of the term. During the labs, you and your partner will meet with your TA to discuss progress on the project and any issues you have encountered.
We will check for copied code on all submissions, so please make sure your work is your own.
Your grade for Checkpoint 1 will be from AutoTest. It will be assessed according to the bucketed grading described on the Project Grading page, up to a maximum quality of Proficient.
Attending labs is required for the remainder of the term, regardless of your progress on the project.
See the Grading Page for details on how Lab Attendance affects your Project Grade.
You cannot use any library package that is not already specified in this document or required for TypeScript compilation (i.e., type declarations).
Your implementation must be in TypeScript.
You are not allowed to store the data in any external database; only disk storage is permitted.
Do not store your datasets as static or global variables; keep them as members of a class. This is important because we try to clear datasets between tests. If your datasets are stored globally, then one misbehaving test may cause all subsequent tests to fail.
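To make the point concrete, here is a minimal sketch of instance-level dataset storage. The class name and map shape are illustrative, not part of the spec; the point is that state lives on the instance rather than at module scope.

```typescript
// Hypothetical sketch: dataset state as instance members, not globals.
class DatasetStore {
	// Each instance gets its own map, so tests that create fresh
	// instances cannot pollute each other through shared state.
	private datasets: Map<string, unknown[]> = new Map();

	public add(id: string, rows: unknown[]): void {
		this.datasets.set(id, rows);
	}

	public ids(): string[] {
		return [...this.datasets.keys()];
	}
}

// Two instances do not share state.
const a = new DatasetStore();
const b = new DatasetStore();
a.add("sections", []);
```

If `datasets` were instead a static or module-level variable, adding a dataset in one test would be visible in every later test, which is exactly the cross-test interference described above.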
We will grade every commit on the main branch of your repository. Your grade will be the maximum grade you received from all commits on the main branch made before the checkpoint deadline.
From C1 onwards, the main branch of your repo will be treated differently from other branches. Only commits merged to main are eligible to be assessed by the full private Client Test Suite used to register a grade. Your project is automatically graded every time you merge a pull request (PR) to the main branch.
See AutoTest Feedback for details on what your feedback looks like and how you should use it.
This checkpoint involves implementing the insightUBC project! Your implementation should follow the insightUBC Section's Specification outlined in Checkpoint 0.
You will be provisioned a new repository to use for C1 and onwards. You and your partner will share this common repository. A team creation tool will be available in Classy after the add/drop deadline. Once available, please specify your team through Classy at https://cs310.students.cs.ubc.ca/. We will create project repositories daily for the first week, so you will be provisioned your team repository within 24 hours of creating your team. All work must take place in this repo (including all future checkpoints).
Now that you are working as a team, you will need to use git branches to coordinate implementation. The main branch will be protected so that you cannot commit or push to it directly; you can only merge changes via pull requests. Each pull request must also be approved by your partner. Course staff cannot approve your PRs; your partner _must_ do this.
Using version control branches is a great way to make it easier to work with your partner, but it is important that you merge your branches into main periodically. Having more than 3 branches is considered an anti-pattern, and stale branches should be deleted. When merging a branch into main, please use only the default merge option. If you merge using squash or rebase, the bot will not see a new commit on main and will only provide feature/dev branch feedback.
We've updated our reference implementation to no longer return results in the same order as the Reference UI. What does this mean?
When running your tests against our implementation using the command @310-bot #check, tests that used to pass might now fail. The order of the query results your test expected might differ from the order our implementation returns, even though both contain the same items. If the assertions in your tests are too strict (i.e., they depend on the order being identical to that from the Reference UI), your tests will fail.
You will want to update your test assertions to not rely on order when it is not explicitly set.
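For example, with chai a common approach is an order-insensitive matcher such as `expect(actual).to.have.deep.members(expected)`. The dependency-free sketch below illustrates the same idea by comparing sorted serializations; it assumes each result row serializes with the same key order, which is typical for rows built by one code path.

```typescript
// Order-insensitive comparison of two arrays of query result rows.
// Assumption: rows with equal content serialize to identical JSON strings.
function sameMembers(actual: object[], expected: object[]): boolean {
	const key = (rows: object[]) => rows.map((r) => JSON.stringify(r)).sort();
	const a = key(actual);
	const e = key(expected);
	return a.length === e.length && a.every((v, i) => v === e[i]);
}
```

A test using this helper passes whether or not the implementation happens to return rows in the Reference UI's order, while still failing when the row contents differ.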
There are two main parts to implementation in C1: the Dataset Processor and the Query Engine. The Dataset Processor roughly corresponds to the addDataset method of InsightFacade, and the Query Engine to performQuery.
In order for insightUBC to be able to answer all kinds of questions about datasets, it must first load and process the data from the given zip files. You will take the dataset zips you've seen in C0, check that they are indeed valid datasets, and convert them into data model(s) of your choice. There are many good ways to model a section. For example, you could represent each section with its fields as a TypeScript class. Try to reason about different representations, keeping in mind that your Query Engine will be working with this representation when answering queries.
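As a starting point for that reasoning, here is one possible shape for the model. The field names below are illustrative, not prescribed; your spec defines the actual keys a section must expose.

```typescript
// One possible data model for a section; field names are illustrative.
interface Section {
	uuid: string;
	dept: string;
	id: string;
	avg: number;
	year: number;
}

// A dataset is then just an id plus its parsed sections.
interface Dataset {
	id: string;
	sections: Section[];
}
```

Whether you use interfaces like these, classes with methods, or plain parsed JSON, the trade-off to weigh is how cheaply the Query Engine can read fields and filter rows from the representation.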
insightUBC also needs a Query Engine, so that it can answer questions about the datasets processed by the Dataset Processor. The Query Engine takes a JSON query, parses it, and validates that it is both syntactically and semantically correct. You will also implement the code to find the desired subset of all your datasets that matches a query.
Modeling Queries
As with a Section in a dataset, you will want to give a query a representation within your code. Coming up with a good model might take a couple of tries, but try to reduce a query into smaller parts that are more manageable.
For example, let's consider this sample query in the original JSON format. One way you could model a query is as a recursive tree structure (aka AST). One benefit of this representation is that it naturally converts from the EBNF used to specify the grammar of a query.
At the top level, we have a query. A query consists of two sub-components, a WHERE block and an OPTIONS block. And the WHERE block, again, consists of a single sub-component, the MComparator. Similarly, the OPTIONS block further decomposes into the COLUMNS and ORDER components.
This kind of decomposition is nice because it achieves separation of concerns. You could have a function that only cares about handling of the WHERE block, while another function is responsible for handling the OPTIONS block. The goal here is to make each function easier to reason about, code, and test.
Tip: If your design uses this kind of query representation, your old friend recursion might come in handy!
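A minimal sketch of that recursive idea is below. The node shapes are assumptions loosely modeled on the grammar's names (MComparator leaves, AND/OR logic nodes); a full design would cover the rest of the EBNF.

```typescript
// Minimal sketch of a recursive query tree; node shapes are illustrative.
type Filter =
	| { kind: "GT" | "LT" | "EQ"; field: string; value: number } // comparator leaves
	| { kind: "AND" | "OR"; children: Filter[] };                // logic nodes recurse

// Recursion mirrors the tree: leaves decide, logic nodes combine children.
function matches(row: Record<string, number>, f: Filter): boolean {
	switch (f.kind) {
		case "GT": return row[f.field] > f.value;
		case "LT": return row[f.field] < f.value;
		case "EQ": return row[f.field] === f.value;
		case "AND": return f.children.every((c) => matches(row, c));
		case "OR": return f.children.some((c) => matches(row, c));
	}
}
```

Notice how each case is small and independently testable, which is exactly the separation of concerns described above.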
It is common to misuse the fs-extra methods when reading and writing files to disk. Unfortunately, misuse can cause timing issues which may appear as failing tests on AutoTest (not locally) on every fifth run or only in future checkpoints when things begin to slow down. These issues are tricky to diagnose but easy to fix and prevent!
For this reason, we have blocked the use of fs-extra (and node:fs) synchronous methods within your implementation (e.g. you will not be able to use writeJSONSync).
To avoid all of this pain please make sure to read the documentation carefully: fs-extra documentation.
Below is an example of how not to use the package:
function createFile() {
    fs.writeJSONSync('./package.json', {name: 'fs-extra'})
}
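The correct pattern is to use the promise-returning variants and await them. The sketch below uses Node's built-in fs/promises so it is self-contained; with fs-extra you would await fs.writeJson(path, rows) instead, which behaves analogously. The helper name and path are hypothetical.

```typescript
import * as fs from "fs/promises";

// Serialization is synchronous and cheap; only the disk I/O needs awaiting.
function serializeDataset(rows: object[]): string {
	return JSON.stringify(rows);
}

// Hypothetical helper: the write must be awaited so that addDataset
// does not resolve before the dataset is actually on disk.
async function saveDataset(path: string, rows: object[]): Promise<void> {
	await fs.writeFile(path, serializeDataset(rows), "utf-8");
}
```

Forgetting the await (or using the Sync variants) is what produces the intermittent, AutoTest-only timing failures described above: the method resolves while the write is still in flight.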
A testing anti-pattern is to only have integration tests (e.g., tests that directly evaluate addDataset, removeDataset, listDatasets and performQuery). A much more robust testing strategy that makes it easier to implement new features and isolate failures is to write unit tests against the individual methods in your implementation. Your C0 tests are actually integration tests as you are testing the top-level API methods, but not directly invoking the dozens of methods these top-level methods invoke.
To implement the API you will likely have to create your own additional methods and classes.
The best way to test your system is via your own unit test suite. This will be the quickest and easiest way to ensure your system is behaving correctly and to make sure regressions are not introduced as you proceed further in the project. Additionally, testing these individual methods and classes (without invoking the top-level InsightFacade APIs) will make your suite faster, better at detecting faults, and better at isolating faults so you can more easily fix them.
Writing the code will not be the hardest part about C1! It is important to communicate well with your partner, make sure both of you have read the specification, and plan what responsibilities each person will undertake.
One way to "split" C1 is into the Dataset Processor (addDataset, removeDataset, listDataset) and the Query Engine (performQuery). An issue you may experience while doing so is that the Query Engine depends on datasets loaded by the Dataset Processor to produce query results. If you find yourself waiting for your partner to implement addDataset, you should first work on validating the query structure, as this task does not depend on any datasets being loaded. Your team can also discuss ahead of time what the Sections data will look like (after being processed by the Dataset Processor), so you can test performQuery against some mocked Sections data.
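Mocking that data can be as simple as a hand-written array. The field names and helper below are illustrative, not part of the spec; the point is that query logic can be exercised before any zip parsing exists.

```typescript
// Hand-made "sections" so query logic can be tested before addDataset works.
interface MockSection {
	dept: string;
	avg: number;
}

const mockSections: MockSection[] = [
	{ dept: "cpsc", avg: 85 },
	{ dept: "cpsc", avg: 72 },
	{ dept: "math", avg: 90 },
];

// The kind of helper a Query Engine might call once real data is loaded.
function sectionsAbove(rows: MockSection[], threshold: number): MockSection[] {
	return rows.filter((s) => s.avg > threshold);
}
```

Once your partner's Dataset Processor produces real Section objects, swapping the mock array for loaded data should require no change to the query logic itself.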
We also recommend pair programming, especially when implementing complex algorithms, or while debugging. It is a great way to ensure that both of you have a shared understanding of the entire program, and to catch mistakes that will otherwise get missed.
In your package.json, there is a new script which allows you to view the coverage of your test suite on your implementation.
"scripts": {
...
"cover": "nyc --reporter text --reporter html yarn run test",
...
}
After running yarn cover, you will see some coverage stats in the console. Running yarn cover will also create a coverage directory located at the root of your project. Open the coverage/index.html file in the browser to view more details about your coverage.
The coverage directory is a build directory, so it should not be committed or pushed to Github. You can ignore the coverage directory by updating your .gitignore file to include it.
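For example, appending a line like this to your .gitignore:

```
# build output from "yarn cover"; regenerated on every run, so keep it out of Git
coverage/
```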
You may have noticed we've included three Github issue templates in your initial repository: Office Hour Question, Schedule, and User Story. When you create a new issue on your repo, you can choose to create one of these three types of issues, or a custom issue. The templates are stored within the .github/ISSUE_TEMPLATE directory within your project.
In open source software, issue templates are frequently used to standardize the information provided in, and the format of, issues. This provides a consistent process for the community and ensures each issue has adequate information to make it actionable. For example, a bug report will include all the information needed to reproduce the bug (versions, dependencies, platform, steps to reproduce). You can check out the issue templates used by NodeJS, the runtime environment we use for the project.
We've provided an Office Hour Question template that must be used when asking TAs for debugging help during Office Hours (aka when you want to show the TA your code). It contains three sections: the Expected Behaviour (required), the Current Behaviour (required), and Additional Information (optional) like steps to reproduce. The expected and current behaviour should be specific to the problem and should isolate the issue; writing "adding a dataset doesn't work" is not sufficient. There needs to be enough detail to make the issue actionable. For example, "The sections object is empty after processing my files. After using the debugger, I found my code returns before processing the sections is complete."
A provided template so you can make a schedule with your partner.
A provided template so you can create user stories with your partner. You can combine these user stories into the schedule, and/or create a Github Project with them.
This specification might seem intimidating, but keep in mind that this project has the same interaction mechanism as most software systems:
It consumes input data (the zip file).
It transforms the data (according to the query).
It returns a result.
There is no best way to get started, but you can consider each of these in turn. Some possible options that could be pursued in any order (or skipped entirely):
Start by looking at the data file we have provided and understand what kind of data you will be analyzing and manipulating. This will help you think through the types of data structures you may want to create (this is a precursor to step 1 above).
Look at the sample queries in the specification. From these queries, figure out how you would want the data arranged so that you can answer these queries (this is the precursor to step 2 above).
Ignoring the provided data, create some fake data (maybe for one section of one course). Write the portion of the system that queries this fake data (this is step 2 above).
Like the above, using some fake data and a fake query processor, write code that would return the fake data correctly and with the correct error codes (this is step 3 above).
Trying to keep all of the requirements in mind at once is going to be overwhelming. Tackling a single task that you can accomplish in an hour is going to be much more effective than worrying about the whole specification at once. Iteratively growing your project from one small task towards the next small task is going to be the best way to make forward progress.
One successful pattern we have observed for decomposing the specification is writing actual user stories for the parts of the spec you have identified, so you can keep track of what you are doing in your own words. When you click 'New issue' in your repository, you will see a User Story issue template that can be extremely helpful for tracking and coordinating the work your team will undertake.
The following resources have been created by course staff to assist you with the project.
TypeScript: An introduction to TypeScript.
Promises: An introduction to promises and their asynchronous properties.
Project Overview: An overview of the project.
Bucket Grading: An overview of bucket grading.
Git Cookbook: Learn the basics of Git.
Async Cookbook: Learn about promises and the differences between synchronous and asynchronous code.
Pull Request Cookbook: Learn about Pull Requests in Github
Collaboration: Document about how to collaborate on the project.
Please follow the instructions in the IDE setup cookbook and restart your IDE so that prettier and lint issues are displayed directly in the editor before posting to Piazza.
Take a look over here. When you use the Object type in TypeScript, the compiler does not like not knowing what types are expected from the key/value pairs. The best solution is to use an interface (first solution in the link), but if you need a quick one time solution you can also do a cast (second solution in the link). Keep in mind that 'one time fixes' often end up being more than one time, doing it right the first time can save you a lot of hassle in the long run. Do not use the third solution (editing tsconfig).
You can also create quick inline interfaces for objects:
let stringToNumMap: {[id: string]: number} = {}; // TypeScript may also ask for the new cast syntax
let maybeNum: any = 3;
let num: number = 1;
let sum = num + (maybeNum as number); // Note this 'cast' is only compile time, does not actually cast/convert
Make sure TypeScript is enabled by going to Preferences > Languages & Frameworks > TypeScript and check if the checkbox 'enable TypeScript compiler' is checked. If it is not, check it and you should be good to go.
Also if you have set up Mocha in WebStorm, you can enable the Compile TypeScript option as a 'Before launch' setting to make sure it has always compiled when you run your tests. If it's still not compiling, you can always run yarn build manually, and check with a TA during lab or in office hours.
The command we provide for you to execute your test suite, yarn test, has a timeout parameter that is set to 10 seconds. You can find where the command is defined in your package.json:
"test": "mocha --require ts-node/register --timeout 10000 --extension .spec.ts --recursive test",
When you execute your tests within IntelliJ, by pressing the green arrow with "Run <Your Test/Test Suite>", it uses a different configuration. If you want to keep the increased timeout, you'll need to manually update your Mocha configuration.
Open the Run/Debug Configuration Dialog as shown here.
Click "Edit Configuration templates..." which appears at the bottom left of the dialog.
Click on the Mocha Template.
Add --require ts-node/register --timeout 10000 to the "Extra Mocha options" and click "Apply".
Delete all old Mocha configurations. Now, when you create a new configuration, it should use this increased timeout.