Part of this lab is making sure you can write the BinaryLabels and BagOfWordsFeatures classes and use them to extract the X and y matrices. For that reason, there is less guidance on how the sample output below was produced. The command-line arguments used to run the program are always shown, however.
python3 pcl_main.py /cs/cs159/data/patronize/patronize_full.xml /cs/cs159/data/patronize/patronize_full.xml /cs/cs159/data/patronize/vocab.txt -o predictions.txt -x 2 --train_size 1000 -s 100 -v 1000
>>> np.sum(X_train, axis=0)
[[56 57 55 62 54 43 55 49 48 57 51 45 39 30 39 45 43 39 33 42 42 53 36 34
34 37 41 40 27 29 30 31 40 39 39 26 28 35 26 36 23 31 23 40 31 22 45 39
29 31 32 30 20 30 25 29 26 36 27 26 27 35 32 35 34 33 28 33 24 24 28 23
21 30 24 24 24 29 25 33 25 30 25 28 19 33 23 20 25 25 29 20 22 22 28 23
16 23 29 25 20 23 27 25 20 28 29 18 21 23 19 23 19 16 26 15 27 19 17 17
19 14 17 26 18 13 14 27 26 17 21 19 20 21 28 23 20 23 19 23 18 20 18 11
26 11 18 17 17 21 18 13 17 29 22 19 21 16 17 15 17 19 14 21 15 21 15 15
17 17 14 15 17 20 10 18 17 14 17 8 20 18 22 13 17 16 15 17 22 15 16 23
10 15 21 9 16 21 10 20 11 10 20 17 19 15 15 11 27 17 17 13 11 13 15 16
16 17 19 19 19 9 12 26 11 11 17 14 11 13 18 18 10 8 11 10 12 9 10 18
10 18 14 24 17 18 11 15 8 14 11 11 18 13 16 13 9 12 14 15 16 11 15 6
8 16 21 6 20 7 24 11 5 8 15 13 20 15 11 8 13 11 9 11 9 8 9 10
17 12 20 16 16 20 12 7 7 11 10 11 13 7 10 16 9 13 17 11 13 7 14 11
17 10 12 12 10 8 10 6 18 11 6 14 8 12 12 21 10 8 7 15 8 14 17 9
8 14 12 9 14 10 18 16 12 12 11 10 6 17 11 12 6 18 10 10 6 4 6 12
6 8 10 15 16 7 9 9 7 17 13 9 8 11 19 17 12 9 13 11 9 7 7 10
5 13 12 7 17 5 12 9 6 8 9 6 10 11 13 16 16 11 8 13 7 5 9 6
8 7 4 10 16 18 8 5 6 9 8 13 5 10 13 9 9 7 2 5 8 9 10 10
7 12 9 11 4 8 7 9 3 12 10 8 11 9 9 10 9 9 13 10 11 5 5 9
10 9 6 10 10 5 14 10 9 9 6 7 9 12 5 7 6 12 4 7 4 7 7 4
2 5 9 13 9 7 7 12 10 8 9 9 9 1 11 16 4 7 9 5 15 7 7 6
9 5 6 10 4 5 9 9 5 12 5 8 6 10 7 6 9 10 7 7 9 6 6 9
5 15 7 2 11 7 9 11 11 9 10 11 6 10 8 5 5 5 6 11 10 9 12 10
5 9 5 8 8 9 10 10 7 11 10 4 12 5 8 7 2 11 10 10 4 13 7 6
10 6 5 5 7 10 10 20 6 6 7 9 5 8 5 6 6 9 11 13 12 3 5 5
6 8 11 11 9 4 6 5 8 8 5 4 5 7 4 6 8 10 7 6 4 8 8 6
4 3 7 8 8 9 6 9 4 2 8 8 5 5 9 7 6 8 9 9 10 9 4 7
4 12 2 1 7 7 5 9 8 2 5 6 6 7 9 10 12 6 5 3 5 9 9 5
5 14 7 14 4 3 8 6 3 8 6 3 7 6 3 7 6 10 9 5 8 5 4 6
6 6 13 9 4 3 8 7 7 6 6 7 4 6 1 7 7 4 5 7 2 8 6 3
3 3 5 10 5 4 4 11 4 5 5 9 7 3 11 11 6 10 4 7 1 3 9 7
9 8 10 7 6 4 13 7 8 6 6 4 8 5 9 5 8 6 8 7 7 7 10 0
7 7 5 8 8 3 5 4 5 6 5 4 4 5 4 7 2 7 5 7 5 2 5 2
6 7 8 2 7 9 8 6 2 6 8 6 9 9 6 5 7 6 6 7 4 7 7 5
2 4 2 2 6 6 8 7 8 5 6 7 6 5 5 6 6 7 3 8 9 4 2 9
6 6 3 10 5 4 2 1 7 7 7 9 6 4 10 4 7 10 2 9 5 3 7 6
8 7 5 6 3 9 6 4 7 8 7 8 2 6 5 3 8 2 4 3 2 7 8 7
5 4 4 5 3 1 2 3 2 12 7 3 6 6 4 4 5 5 6 9 7 6 3 6
3 2 3 7 4 4 0 6 7 9 8 4 4 6 3 7 2 6 7 5 4 5 7 7
5 5 3 1 2 2 3 3 9 7 5 7 4 4 8 3 6 1 7 4 7 3 3 3
8 2 4 4 4 3 6 5 6 7 3 3 2 6 3 10 6 2 3 7 3 6 2 2
9 7 5 4 5 6 5 6 3 5 5 3 7 2 1 3]]
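As a reading aid for the output above: summing with axis=0 collapses the rows of X_train, leaving one total per column (one per feature). The sketch below uses a small made-up matrix, not the lab data, and makes no assumption about how your BagOfWordsFeatures class builds X. The double brackets in the sample output suggest the result there is a 2-D, single-row object, as a sparse or matrix-typed X_train would produce. That is an inference from the printout, and keepdims=True is used here only to imitate that shape with a plain NumPy array.

```python
import numpy as np

# Toy feature matrix: 3 documents (rows) by 4 features (columns).
# These values are invented for illustration; they are not from the lab data.
X_toy = np.array([
    [1, 0, 2, 0],
    [0, 1, 1, 0],
    [3, 0, 0, 0],
])

# axis=0 sums down each column, giving one total per feature.
column_totals = np.sum(X_toy, axis=0)
print(column_totals)        # [4 1 3 0]

# keepdims=True keeps the result 2-D (one row), which prints with
# double brackets like the sample output above.
row_vector = np.sum(X_toy, axis=0, keepdims=True)
print(row_vector)           # [[4 1 3 0]]
print(row_vector.shape)     # (1, 4)
```

Comparing your own column totals against the sample output is a quick way to check that your features line up column by column with the expected ones.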
>>> class_probs
[[0.52974589 0.47025411]
[0.95817766 0.04182234]
[0.87683086 0.12316914]
...
[0.62481496 0.37518504]
[0.48022029 0.51977971]
[0.95869947 0.04130053]]
>>> y_pred
[0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
1 0 1 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1
0]
>>> sum(y_pred)
101
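The visible rows of class_probs are consistent with y_pred being the index of the larger probability in each row: the first row favors column 0 and the first prediction is 0, while the second-to-last row favors column 1 and the second-to-last prediction is 1. The sketch below illustrates that relationship on three rows copied from the output above. Taking the argmax is an assumption about how the predictions were derived, not a statement of how pcl_main.py is required to compute them.

```python
import numpy as np

# Three rows copied from the sample class_probs output above.
class_probs = np.array([
    [0.52974589, 0.47025411],
    [0.95817766, 0.04182234],
    [0.48022029, 0.51977971],
])

# Assumption: the predicted label is the column with the higher probability.
y_pred = np.argmax(class_probs, axis=1)
print(y_pred)               # [0 0 1]

# Because the labels are 0 or 1, summing them counts the positive predictions.
print(int(np.sum(y_pred)))  # 1
```

Under this reading, the value 101 reported by sum(y_pred) is the number of test examples predicted as class 1.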
python3 pcl_main.py /cs/cs159/data/patronize/patronize_full.xml /cs/cs159/data/patronize/patronize_full.xml /cs/cs159/data/patronize/vocab.txt -o predictions.txt -t women --train_size 1000 -s 100 -v 1000
>>> class_probs
[[9.39143815e-01 6.08561848e-02]
[9.76073660e-01 2.39263399e-02]
[9.57087837e-01 4.29121628e-02]
[9.87554338e-01 1.24456622e-02]
[9.57310745e-01 4.26892550e-02]
...cut for brevity...
[9.30012512e-01 6.99874879e-02]
[9.99696735e-01 3.03265411e-04]
[9.58561680e-01 4.14383201e-02]
[9.99962867e-01 3.71334548e-05]
[9.90246136e-01 9.75386428e-03]]
>>> y_pred
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>>> sum(y_pred)
3