Assignment 2 - KT Prediction
THE CODE AND INPUT FILE
You are given four MATLAB files and one input data file [the files have changed since project 1; please download the files attached to this project]:
make_knowledge_model.m (modified)
DESCRIPTION: Creates the topology for the Knowledge Tracing model and sets values for the conditional probability tables
INPUTS: none
OUTPUTS: A Bayesian network object
CHANGES (since part 1): The parameters are no longer set for the model.
get_data.m (original)
DESCRIPTION: Reads data from a text input file
INPUTS: The name of an input file and a Bayesian network object
OUTPUTS: A cell array of data in the form that can be processed by BNT
fit_parameters.m (modified)
DESCRIPTION: Learns the conditional probability tables of a Bayesian network from sampled data
INPUTS: A Bayesian network object and a cell array of data
OUTPUTS: A Bayesian network object set with the learned parameters
CHANGES (since part 1): The initial parameter values are no longer random but are set to constants; the forget parameter is set to 0.
predict_data.m
DESCRIPTION: Predicts data points of the hidden and observed variables from the sampled data based on the network parameters
INPUTS: A Bayesian network object and a cell array of data
OUTPUTS: text
assistments_2009_2010_skill47_for_matlab.txt (original)
DESCRIPTION: The data in this file are all from one skill: #47 Conversion of Fraction Decimals Percents. The first column is the user_id, the second column is the correctness, and the third column is the opportunity count, which represents how many opportunities the student has had to practice this skill. The first two columns are copied from the original data, and the third column is computed from it.
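As an illustration of what the third column means, the sketch below computes a running opportunity count per student from a column of user ids. It is only a sketch under stated assumptions: the ids are made up, the rows are assumed to be in chronological order, and the count is assumed to start at 1. The handout does not say exactly how the column in the data file was computed.

```matlab
% Made-up user ids, one row per response, assumed to be in time order
user_id = [101; 101; 102; 101; 102];

% Map each id to a compact index 1..K so it can be used as a counter slot
[~, ~, idx] = unique(user_id);

counts = zeros(max(idx), 1);           % responses seen so far, per student
opportunity = zeros(size(user_id));    % running opportunity count, per row
for i = 1:numel(user_id)
    counts(idx(i)) = counts(idx(i)) + 1;
    opportunity(i) = counts(idx(i));
end

disp([user_id opportunity]);           % id next to its opportunity count
```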
The big idea is to (1) create a model, (2) generate data from that model, and (3) try to learn the parameters of the model. New data can then be predicted by (4) using the parameters learned in (3). This is a common scenario in Intelligent Tutoring Systems, where the parameters for a model of knowledge are trained using a previous year's student data, and the current year's students' knowledge and performance are predicted online based on those parameters as the students progress through the system.
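To make the prediction step concrete, here is a minimal sketch of the standard Knowledge Tracing update for one student, written in plain MATLAB without BNT. The parameter values and the response sequence are made up for illustration, the variable names (prior, learn, guess, slip) are chosen for this sketch, and predict_data.m may compute its predictions differently (for example, through BNT inference). Forgetting is left out, matching the forget parameter of 0 noted above.

```matlab
% Illustrative (made-up) Knowledge Tracing parameters, not fitted values
prior = 0.4;    % P(skill known) before the first opportunity
learn = 0.1;    % P(unknown -> known) between opportunities
guess = 0.2;    % P(correct | skill unknown)
slip  = 0.1;    % P(incorrect | skill known)

responses = [0 1 1 1];              % made-up correctness sequence

pL   = prior;                       % current estimate of P(skill known)
pred = zeros(size(responses));      % predicted P(correct) per opportunity
for t = 1:numel(responses)
    % Predict the observed node before seeing the response
    pred(t) = pL * (1 - slip) + (1 - pL) * guess;

    % Condition the knowledge estimate on the actual response (Bayes rule)
    if responses(t) == 1
        post = pL * (1 - slip) / pred(t);
    else
        post = pL * slip / (1 - pred(t));
    end

    % Apply the learning transition (no forgetting)
    pL = post + (1 - post) * learn;
end

% Mean absolute error between the responses and the predictions
mae = mean(abs(responses - pred));
fprintf('MAE over %d opportunities: %.4f\n', numel(responses), mae);
```

With these made-up values the first prediction is 0.4 * 0.9 + 0.6 * 0.2 = 0.48; later predictions depend on the responses seen so far.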
ASSIGNMENT
Note: Please email saadjei@wpi.edu this week to schedule a weekly 20 min meeting; time slots just before or just after class time are preferable. If you are doing the project in a group of 2 to 4, please include the names of your team members in the email as well. Thanks.
Please write your answers in a document.
1. Investigate the accuracy of observed node prediction given different learned parameters. For this experiment you will learn different parameters from some of the data, then use those parameters to predict the rest of the data. This is a typical training and testing set paradigm where the data that is being predicted is not used (held out) in training. The analogue to the real world is that you may have data from last year's class which you could use to train the parameters of your model and then predict the current year's class responses and knowledge using those parameters.
Run bnt = make_knowledge_model.
Run sampdata = get_data('assistments_2009_2010_skill47_for_matlab.txt', bnt) to read in the data.
Run training_data = sampdata(n:m, :); to select rows n through m of the dataset as training data. Note that n and m are integer variables.
Run test_data = sampdata(r:s, :); to select rows r through s of the dataset as test data. r and s are integers; choose them so that the test rows do not overlap the training rows.
Run bnet_with_learned_parameters = fit_parameters(bnt, training_data). This output will contain the learned parameters.
Run predict_data(bnet_with_learned_parameters, test_data) and take note of the errors.
Repeat the experiment using different numbers of rows (20, 50, 100, 500, etc.) as training data, and observe whether that influences the error.
Extra: Modify predict_data so that it prints out the average MAE (mean absolute error) for each opportunity, and observe whether there is any trend as the opportunity count increases.
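The steps above can be scripted as a loop over training-set sizes. The sketch below uses only the functions named in this handout and assumes that the course .m files, the data file, and BNT are on the MATLAB path. Holding out every row after the largest training set is a choice made for this sketch, not something the assignment prescribes, and it assumes the dataset has more than 500 rows.

```matlab
% Requires the course files, the data file, and BNT on the MATLAB path
bnt = make_knowledge_model;
sampdata = get_data('assistments_2009_2010_skill47_for_matlab.txt', bnt);

num_rows    = size(sampdata, 1);
train_sizes = [20 50 100 500];        % sizes suggested in the assignment
max_train   = max(train_sizes);

% Use the same held-out rows for every run so the errors are comparable
% and the test rows never overlap any of the training rows.
assert(num_rows > max_train, 'Not enough rows for a held-out test set.');
test_data = sampdata(max_train + 1:num_rows, :);

for k = 1:numel(train_sizes)
    n = train_sizes(k);
    training_data = sampdata(1:n, :);  % first n rows as training data

    % Learn parameters from the training rows only
    bnet_with_learned_parameters = fit_parameters(bnt, training_data);

    % predict_data prints its results; label each run for the write-up
    fprintf('--- trained on rows 1 to %d ---\n', n);
    predict_data(bnet_with_learned_parameters, test_data);
end
```

Keeping the test rows fixed means any change in the printed error comes from the amount of training data rather than from a different test set.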
DELIVERABLES
Bring to class a printed paper with your experiment steps and results, as well as the things you learned and any difficulties you encountered in this homework.
PROJECT SUBMISSION
At the beginning of the class on Monday, Sept 15th.