Assignment 1 - Learning to Implement and Fit KT in MATLAB
PROJECT GOAL
To understand a MATLAB implementation of a Bayesian Network model called Knowledge Tracing (KT), and to demonstrate that understanding by modifying the code and reporting its inputs and outputs.
If you have any questions, please email saadjei@wpi.edu. Remember to add CS565 to the subject of the email; this will ensure a quicker response.
This project may be done in pairs.
MATLAB
The project code is written in MATLAB using Kevin Murphy's Bayes Net Toolbox. You will need to download the toolbox from: http://code.google.com/p/bnt/.
A thorough HOWTO is available on that site explaining how the toolbox works in general.
The MATLAB application is installed on most CCC machines and can be installed on your PC via the rivet wpi share.
Instructions for installing BNT and running MATLAB via ssh on a CCC machine:
Use your favorite ssh client (such as PuTTY) to ssh to ccc.wpi.edu
Download the bnt package to your home directory:
$ wget -nd http://bnt.googlecode.com/files/FullBNT-1.0.7.zip
Unzip the package:
$ unzip FullBNT-1.0.7.zip
Start MATLAB:
$ matlab -nodisplay
Change to the directory of the unzipped BNT package and add all the subdirectories to your path:
>> cd ./bnt/
>> addpath(genpathKPM(pwd));
To run the project code, change directory to the directory that contains your code and then type the function names. Three options for getting the project code into your ccc home directory are:
Map your ccc home directory to a drive on your computer. Use \\filer.wpi.edu\home as the resource and STUDENT\username as the username.
Download the project files to your computer and then scp them to ccc.wpi.edu with an scp client like WinSCP.
Use: wget -nd -O filename.m "copy/paste code link from this project page" (the -O option and double quotes are necessary)
MATLAB can also be accessed by connecting through Windows Remote Desktop to windows.wpi.edu. To use this option, your machine must be connected either to the WPI network (wifi included) or to the WPI VPN.
THE CODE AND INPUT FILE
You are given three MATLAB files and one data file, which can be found at the bottom of this page:
make_knowledge_model.m
DESCRIPTION: Creates the topology for the Knowledge Tracing model and sets values for the conditional probability tables
INPUTS: none
OUTPUTS: A Bayesian network object
get_data.m
DESCRIPTION: Reads data from a text input file
INPUTS: The name of an input file and a Bayesian network object
OUTPUTS: A cell array of data in the form that can be processed by BNT
fit_parameters.m
DESCRIPTION: Learns the conditional probability tables of a Bayesian network from sampled data
INPUTS: A Bayesian network object and a cell array of data
OUTPUTS: none
assistments_2009_2010_skill47_for_matlab.txt
DESCRIPTION: To make this very first homework easier, I preprocessed the original data set to remove all the irrelevant information and leave only the data required by the MATLAB code. The remaining data all come from the same skill: #47 Conversion of Fraction Decimals Percents. The first column is the user_id. The second column is the correctness. The third column is the opportunity count, which represents how many opportunities the student has had to practice this skill. The first two columns are copied from the original data, and the third column is computed from it.
The big idea is that the code (1) creates a model, (2) gets data, and (3) tries to learn the parameters of the model.
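To make the three-column layout concrete, here is a minimal sketch of how such rows could be regrouped into one response sequence per student. It is not taken from get_data.m. The rows are made-up values typed in directly, because the delimiter and header layout of the file are not described here, and it assumes the opportunity count starts at 1 for each student.

```matlab
% Hypothetical rows in the documented column order:
% [user_id, correctness, opportunity_count]. The values are made up.
rows = [ 101 0 1
         101 1 2
         202 1 1
         101 1 3
         202 0 2 ];

% Map each user_id to a compact index 1..numUsers.
[userIds, ~, userIdx] = unique(rows(:, 1));
numUsers = numel(userIds);
maxOpp   = max(rows(:, 3));

% One row per student, one column per opportunity.
% NaN marks an opportunity the student never reached.
responses = nan(numUsers, maxOpp);
for r = 1:size(rows, 1)
    responses(userIdx(r), rows(r, 3)) = rows(r, 2);
end

disp(responses);
```

Student 101 ends up with the sequence 0, 1, 1 and student 202 with 1, 0 followed by a NaN, since that student only has two opportunities in this made-up sample.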
ASSIGNMENT
Please write your answers to these questions in a Word document.
1. The make_knowledge_model.m file defines the topology of the model in a directed acyclic graph. Please draw this model using circles to represent nodes and arrows to represent causal links from one node to another.
2. Investigate the parameter learning results
Run bnt = make_knowledge_model with the default parameters of prior = 0.30, learn = 0.14, forget = 0, guess = 0.20 and slip = 0.08.
Run sampdata = get_data('assistments_2009_2010_skill47_for_matlab.txt', bnt) to read in the data.
Run fit_parameters(bnt, sampdata). This is slow: with the whole file, it took Heffernan about 20 seconds on a nice machine to finish the first EM iteration. Start with a much smaller version of the file. Expect output like this as EM works through its iterations:
>> fit_parameters(bnt2, sampdata)
EM iteration 1, ll = -7907.3624
EM iteration 2, ll = -6451.9997
EM iteration 3, ll = -6279.5132
EM iteration 4, ll = -6244.4542
EM iteration 5, ll = -6232.3814
What were the learned parameters reported?
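For intuition about what the five parameters in step 2 mean, the sketch below applies the standard Knowledge Tracing prediction and update equations to one student by hand. It is only an illustration under stated assumptions: the response sequence obs is made up, and the project code does its inference through BNT rather than through these closed-form equations.

```matlab
% Default parameters listed in step 2.
prior = 0.30; learn = 0.14; forget = 0; guess = 0.20; slip = 0.08;

% Hypothetical responses for one student (1 = correct, 0 = incorrect).
obs = [1 0 1];

pKnow    = prior;             % P(student knows the skill) before opportunity 1
pCorrect = zeros(size(obs));  % predicted P(correct) at each opportunity

for t = 1:numel(obs)
    % Prediction: knows and does not slip, or does not know and guesses.
    pCorrect(t) = pKnow * (1 - slip) + (1 - pKnow) * guess;

    % Bayes' rule: condition the knowledge estimate on the observed response.
    if obs(t) == 1
        posterior = pKnow * (1 - slip) / pCorrect(t);
    else
        posterior = pKnow * slip / (1 - pCorrect(t));
    end

    % Transition to the next opportunity (learning and forgetting).
    pKnow = posterior * (1 - forget) + (1 - posterior) * learn;
end

fprintf('%.4f ', pCorrect);
fprintf('\n');
```

With the default parameters, the first predicted value is 0.30 * 0.92 + 0.70 * 0.20 = 0.416. Comparing such hand-computed predictions against the learned parameters can help when interpreting what fit_parameters reports.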
3. Investigate parameter learning with less student data
Try different numbers of rows (3, 6, 10, 100, etc.) of sampdata instead of all of them. Since there is randomization in each run, you should repeat each setting many times (I suggest 10). What were the learned parameters reported? Is there any trend in the results?
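One possible way to organize these repeated runs is sketched below. Everything in it is an assumption made for illustration: the stand-in sampdata is generated on the spot, each row is treated as one case (check the orientation of the array that get_data actually returns), rows are drawn at random without replacement, and the call to fit_parameters is left commented out.

```matlab
% Stand-in for the output of get_data: a cell array with 12 made-up rows.
% ASSUMPTION: each row is one case; adapt the indexing if yours uses columns.
sampdata = num2cell(magic(12));

numRows = 6;    % e.g. 3, 6, 10, 100
numRuns = 10;   % number of repetitions suggested above

for run = 1:numRuns
    % Choose numRows distinct rows at random, never more than exist.
    k = min(numRows, size(sampdata, 1));
    picked = randperm(size(sampdata, 1));
    picked = picked(1:k);
    subset = sampdata(picked, :);

    % fit_parameters(bnt, subset);   % uncomment when running the project code
    fprintf('run %d: using %d of %d rows\n', run, size(subset, 1), size(sampdata, 1));
end
```

Recording the learned parameters from each run in a table, one row per run and one block per value of numRows, makes any trend easier to see.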
DELIVERABLES
Zip and submit all your project files to saadjei@wpi.edu, including:
Any code you have modified. You must at least submit the modified get_data.m and fit_parameters.m.
A document with your experiment steps and results, as well as lessons you learned and difficulties you encountered in this homework.
PROJECT SUBMISSION
Due Thursday, September 11th, one hour BEFORE class.