Post date: Sep 23, 2018 12:39:43 PM
Feeding all of the training or testing data through a deep learning model in a single pass can cause Out Of Memory (OOM) problems. In such situations, the solution is to divide the data into batches. This is an example of how testing is executed using batches. To measure the accuracy of a trained deep learning model, the following code is used:
prediction = tf.equal(tf.argmax(fc2, 1), tf.argmax(lbl_holder, 1))
accuracy = tf.reduce_mean(tf.cast(prediction, tf.float32))
overall_accuracy = sess.run(accuracy, feed_dict={img_holder: test_data,
                                                 lbl_holder: test_labels, train: False})
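The snippet above assumes the rest of the graph already exists: img_holder and lbl_holder are the input placeholders, fc2 is the final layer's logits, and train is a flag fed to layers such as dropout. As a rough sketch only (the shapes here are my assumption for an MNIST-style 10-class problem, not the actual network), those pieces might look like this:
import tensorflow as tf
img_holder = tf.placeholder(tf.float32, [None, 784])         # flattened input images (assumed shape)
lbl_holder = tf.placeholder(tf.float32, [None, 10])          # one-hot labels, 10 classes (assumed)
train = tf.placeholder(tf.bool)                              # switch for dropout/batch norm at test time
W = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))  # stand-in weights for the real network
b = tf.Variable(tf.zeros([10]))
fc2 = tf.matmul(img_holder, W) + b                           # stand-in for the final fully connected layer's logits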
Running the whole test set as a single batch may cause OOM problems, so the test set is divided into smaller batches. First, initialize some numbers for the batch processing.
test_batch_size = 1000
num_test_batches = int(10000 / test_batch_size)
test_batch_sum = 0
In this instance, the entire test set has 10000 examples, and my PC with a 3GB GPU couldn't hold the resulting matrices and generated an OOM error. The test_batch_size of 1000 is the largest round number I could use without the error; I tried 2000 and it still errored out. The test_batch_sum variable accumulates the number of correct predictions across batches. Note that in the first block of code, which ran the entire 10000-example batch, accuracy was calculated with tf.reduce_mean() over the whole batch. Since we are breaking the data into smaller batches, we cannot use the mean directly and instead count correct predictions per batch with tf.reduce_sum().
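To see why summing per-batch correct counts and dividing by the total gives the same number as a single reduce_mean over the whole set, here is a quick plain-NumPy check (the array here is made-up data, just to illustrate the arithmetic):
import numpy as np
correct_flags = np.random.rand(10000) < 0.9                  # pretend per-example correct/incorrect flags
mean_all = correct_flags.mean()                              # what tf.reduce_mean over the full set computes
batch_sums = [correct_flags[i:i + 1000].sum() for i in range(0, 10000, 1000)]
mean_from_batches = sum(batch_sums) / 10000.0                # same value as mean_all
In TensorFlow, the per-batch count of correct predictions becomes: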
prediction = tf.equal(tf.argmax(fc2, 1), tf.argmax(lbl_holder, 1))
correct = tf.reduce_sum(tf.cast(prediction, tf.float32))
import random   # for the random batch start index used below
for test_batch in range(num_test_batches):
    # each pass scores a random 1000-example slice of the test set
    test_batch_start = random.randint(0, test_batch_size*(num_test_batches-1)-1)
    test_batch_end = test_batch_start + test_batch_size
    test_img_batch = test_data[test_batch_start:test_batch_end, :]
    test_lbl_batch = test_labels[test_batch_start:test_batch_end, :]
    single_batch_sum = sess.run(correct, feed_dict={img_holder: test_img_batch,
                                                    lbl_holder: test_lbl_batch, train: False})
    test_batch_sum += single_batch_sum
overall_accuracy = test_batch_sum / 10000
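One thing to note about the loop above: random.randint picks a random starting index for each batch, so slices can overlap and some test examples may never be scored, which makes the result an estimate rather than an exact test-set accuracy. If you want every example counted exactly once, a sequential sweep is a small change (just a sketch, reusing the same names as above):
test_batch_sum = 0
for test_batch in range(num_test_batches):
    test_batch_start = test_batch * test_batch_size           # contiguous, non-overlapping slices
    test_batch_end = test_batch_start + test_batch_size
    test_img_batch = test_data[test_batch_start:test_batch_end, :]
    test_lbl_batch = test_labels[test_batch_start:test_batch_end, :]
    test_batch_sum += sess.run(correct, feed_dict={img_holder: test_img_batch,
                                                   lbl_holder: test_lbl_batch, train: False})
overall_accuracy = test_batch_sum / 10000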
And this is where I got the idea from:
https://stackoverflow.com/questions/50111438/tensorflow-validate-accuracy-with-batch-data