Machine Learning Edition
Apart from its main purpose, which is to simulate HOU's biology lab for training our students, Onlabs has also been used for Machine Learning research. Specifically, three ML methods have been implemented: a Genetic Algorithm (GA), an Artificial Neural Network (ANN) and Reinforcement Learning (RL).
What are they about?
The Onlabs student edition contains, among others, an evaluation mode, in which the human user is evaluated by the computer (our rater) with respect to a particular experimental procedure, using a scoring mechanism we have designed for that purpose. The user performs various actions (e.g., for the microscoping procedure, they plug the microscope cable into the socket, put the test specimen on the microscope stage, etc.) and receives the respective individual scores, each concerning a particular sub-task of the procedure at hand. Those individual scores are then combined into a weighted average, which constitutes the user's progress rate. We use a weighted average because the various individual scores are not of equal importance (e.g., plugging the microscope cable into the socket is more important than testing the various microscope knobs), so each has its own weight. The resulting progress rate represents to what extent the experimental procedure has been completed.

However, the progress rate does NOT consider the order in which those actions were performed. For evaluating the order, we have designed another scoring mechanism involving penalties, which are assigned when an action is done in the wrong order. Again, not all actions are equally important, this time in terms of order, so different penalties are assigned depending on the mistake made (e.g., looking through the microscope oculars before configuring the light intensity incurs a higher penalty than testing the specimen holder knob before testing the stage knob). The assignment of different penalties is achieved through a penalty weight corresponding to each action, similarly to the progress rate weights.
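To make the weighted-average idea concrete, here is a minimal sketch in Python. The sub-task names and weight values are purely illustrative, not the ones actually used in Onlabs:

```python
# Illustrative sketch of a weighted-average progress rate. All sub-task
# names, scores and weights below are hypothetical examples.

def progress_rate(scores, weights):
    """Weighted average of individual sub-task scores (each in [0, 1])."""
    total_weight = sum(weights.values())
    return sum(scores[task] * weights[task] for task in scores) / total_weight

# Hypothetical sub-tasks of the microscoping procedure, with plugging the
# cable weighted higher than testing the knobs:
weights = {"plug_cable": 3.0, "place_specimen": 2.0, "test_knobs": 1.0}
scores  = {"plug_cable": 1.0, "place_specimen": 1.0, "test_knobs": 0.0}

print(progress_rate(scores, weights))  # (3 + 2 + 0) / 6 ≈ 0.833
```

Completing the two most important sub-tasks while skipping the least important one thus still yields a high progress rate.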
The progress rate and the accumulated penalties make up the aggregate score. (Two other factors also contribute to the aggregate score: the resetting rate, that is, a rate measuring to what extent the various instruments used in the procedure were reset to their original state, and the time spent on the procedure; however, these are not taken into account in our ML implementation.)
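The exact combination formula is not spelled out here, but the idea, under the simplifying assumption that the accumulated penalties are subtracted from the progress rate, can be sketched as:

```python
# Hypothetical combination of progress rate and order penalties into an
# aggregate score; the actual Onlabs formula (which also involves the
# resetting rate and the time spent) is not reproduced here.

def aggregate_score(progress_rate, penalties):
    """Progress rate minus the accumulated order penalties, floored at zero."""
    return max(0.0, progress_rate - sum(penalties))

print(aggregate_score(0.9, [0.1, 0.2]))  # ≈ 0.6
```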
Nevertheless, both the weighted-average weights and the penalty weights are intuitive, that is, they have been defined by us on speculation. So, this is where ML comes in: we use a GA to calibrate the weighted-average weights and RL for the penalty weights. And what about the ANN? Well, we use a three-layer ANN as an alternative measure of the progress rate, into which the individual scores are passed as inputs, and we train it with Back-Propagation. In this case, instead of the weighted-average weights, the weights being calibrated are those of the ANN.
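For intuition, the forward pass of a three-layer network of this kind, with the individual scores as inputs and a single output playing the role of the progress rate, can be sketched as follows (the layer sizes, weights and choice of sigmoid activation are our assumptions for the example, not necessarily those of Onlabs):

```python
import math

def sigmoid(x):
    """Standard logistic activation, squashing any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(scores, hidden_weights, output_weights):
    """Forward pass of a three-layer ANN: the input layer holds the
    individual scores, one hidden layer follows, and a single output
    neuron yields the alternative progress rate."""
    hidden = [sigmoid(sum(w * s for w, s in zip(ws, scores)))
              for ws in hidden_weights]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Two individual scores, two hidden neurons, one output in (0, 1);
# the weight values are arbitrary placeholders:
rate = forward([1.0, 0.5], [[0.2, -0.1], [0.4, 0.3]], [0.6, -0.2])
```

Back-Propagation then adjusts `hidden_weights` and `output_weights` to minimize the error between this output and the experts' evaluations.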
We should mention here that our ML implementation concerns only the microscoping procedure for now, and not the far more complex electrophoresis procedure included in the latest student edition.
So, let us summarize for now:
Our basic scoring system, which measures the user's progress rate, is a weighted average of the various individual scores. We calibrate the weights of the weighted average with the use of a GA.
Our alternative progress rate measure is an ANN receiving the individual scores as inputs. We tune the weights of the ANN with Back-Propagation training.
Our scoring method for the order of the actions made by the user is based on penalties assigned to actions made in the wrong order, according to the respective penalty weights. We reconfigure the penalty weights with the use of RL.
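As a rough illustration of the first point, here is a toy GA that calibrates weighted-average weights against expert evaluations. The session data, fitness function and GA parameters are all our own simplifications, not the actual Onlabs setup:

```python
import random

def fitness(weights, sessions):
    """Negated mean squared error of the weighted-average rater against
    the experts; each session is (individual scores, expert evaluation)."""
    err = 0.0
    for scores, expert in sessions:
        pred = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
        err += (pred - expert) ** 2
    return -err / len(sessions)

def evolve(sessions, n_weights, pop_size=20, generations=50, mutation_rate=0.1):
    """Evolve a population of weight vectors toward the expert evaluations."""
    random.seed(0)  # fixed seed for reproducibility of the toy example
    pop = [[random.random() for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, sessions), reverse=True)
        parents = pop[:pop_size // 2]           # keep the fitter half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_weights)  # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutation_rate:   # occasional mutation
                child[random.randrange(n_weights)] = random.random()
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, sessions))

# Toy data: (individual scores, expert's evaluation) pairs.
sessions = [([1.0, 0.0, 1.0], 0.7), ([0.0, 1.0, 1.0], 0.5)]
best = evolve(sessions, n_weights=3)
```

In this toy instance the weight vector [0.5, 0.3, 0.2] reproduces both expert evaluations exactly, so a well-tuned run should approach it.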
But how is this implemented?
Apart from Instruction and Evaluation modes, Onlabs ML edition has an extra mode, that of Computer Training. The latter has two sub-modes, 'Training the Rater' and 'Training the Bot'.
In 'Training the Rater', the training data are collected by having the human user carry out the microscoping procedure several times and be evaluated by several (human) experts. What is collected is a number of data sets, each consisting of the individual scores of a play session along with an expert's (intuitive) evaluation of that particular session, as well as the classification of the user's performance in that session as 'High', 'Medium' or 'Low'. The training data are then either fed into our GA or used for the training of our ANN with Back-Propagation, with the various parameters set by the (human) trainer, such as the number of generations, the fitness function, the mutation rate, the number of epochs, the bias, etc. Upon completion of the training, a .csv file is produced in the /MLData folder of the app, containing the Mean Squared Error across the various generations or epochs respectively, for several training and testing groups (re-substitution, cross-validation among experts, etc.), along with a .txt file holding the fittest weight vector for the weighted average or the post-training weights of the ANN. Furthermore, the user is asked whether they want to save the produced weights for use in Evaluation Mode instead of the intuitive ones defined by us.
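The 'cross-validation among experts' testing group can be pictured as a leave-one-expert-out scheme. The sketch below is our own simplified rendering of that idea, not Onlabs code:

```python
# Simplified leave-one-expert-out cross-validation for a trainable rater.
# `train` is any function that takes (scores, evaluation) pairs and returns
# a predictor; both the data layout and the scheme are our assumptions.

def mse(predict, data):
    """Mean squared error of a rater against expert evaluations."""
    return sum((predict(scores) - expert) ** 2 for scores, expert in data) / len(data)

def cross_validate(train, data_by_expert):
    """Train the rater on all experts but one, then measure its error on
    the held-out expert's evaluations; average over all hold-outs."""
    errors = []
    for held_out in data_by_expert:
        train_data = [d for e, ds in data_by_expert.items()
                      if e != held_out for d in ds]
        predict = train(train_data)
        errors.append(mse(predict, data_by_expert[held_out]))
    return sum(errors) / len(errors)
```

Re-substitution, by contrast, would simply train and test on the same pooled data.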
[NOTICE 1: To train the rater, the number of experts used in the data collection process must be between 2 and 6. Each expert must also have provided the same number of classifications ('High', 'Medium' or 'Low'), again between 2 and 6; otherwise the training does not run. But don't worry! We have included a MIC_training_data.txt file in the /MLData folder of our app. Those data satisfy the afore-mentioned restrictions and you may use them to experiment with.]
In 'Training the Bot', a Non-Playable Character (NPC) does all the work by playing on its own and being trained with RL in how to correctly perform microscoping. The user only needs to set the parameters for training, such as the number of episodes, the steps per episode, the discount rate, etc. In our RL implementation, the rewards are the negatives of the penalties received by the NPC while playing. Upon completion of training, a .csv file is produced with the final and maximum aggregate scores for each episode, as well as the Mean Difference Analogue (MNDif) for it, representing how often the NPC has followed the same state-action pairs. If the user chooses variable rewards instead of fixed ones (the variation being done according to our MNDif-inspired algorithm, described in our forthcoming publications), they are asked, once training is over, whether to save the new penalty weights that have been produced.
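At its core, this kind of training can be pictured as tabular Q-learning in which the reward for an action is the negative of its order penalty. The sketch below is our own simplification (states are just the sets of actions already performed, and the parameters are placeholders), not the actual Onlabs implementation:

```python
import random

def q_learning(actions, penalty, n_episodes=200, steps=10,
               alpha=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning where taking `action` in `state` yields a reward
    equal to minus its order penalty. A state is modelled, simplistically,
    as the frozenset of actions already performed."""
    random.seed(0)  # fixed seed for reproducibility of the toy example
    Q = {}
    for _ in range(n_episodes):
        state = frozenset()
        for _ in range(steps):
            remaining = [a for a in actions if a not in state]
            if not remaining:
                break
            if random.random() < epsilon:        # epsilon-greedy exploration
                action = random.choice(remaining)
            else:
                action = max(remaining, key=lambda a: Q.get((state, a), 0.0))
            reward = -penalty(state, action)      # reward = negative penalty
            next_state = state | {action}
            next_remaining = [a for a in actions if a not in next_state]
            best_next = max((Q.get((next_state, a), 0.0)
                             for a in next_remaining), default=0.0)
            q = Q.get((state, action), 0.0)
            Q[(state, action)] = q + alpha * (reward + gamma * best_next - q)
            state = next_state
    return Q

# Toy penalty: looking through the oculars before setting the light costs 1.
def look_penalty(state, action):
    return 1.0 if action == "look" and "light" not in state else 0.0

Q = q_learning(["light", "look"], look_penalty, epsilon=0.3)
```

After training, the Q-values at the initial state favor setting the light before looking, i.e. the NPC has learned the correct order.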
[NOTICE 2: If you have saved the weighted-average, ANN or penalty weights but decide that you don't want any of them anymore, simply delete the respective .txt file created in the /Specs/Scorings folder of the app. The original weights are then restored.]
You may download OnlabsML v. 1.0 from here. It is currently available only for Windows, as we have implemented the automatic cursor movement in the 'Training the Bot' sub-mode with the use of Windows libraries. It is also compiled under "quality level: fastest", for performance reasons.
In case you have any questions about the app or the ML theory behind it, please email us and we'll be happy to answer! And if you have any suggestions for improvement, please let us know, too.