Install libraries and import packages.
Load dataset.
Data preprocessing and preparation.
The features, target, test subsample size, and total sample size are defined. The raw data is preprocessed so the models can interpret it correctly: rows with missing values are removed, and the target's text class labels are converted to numeric format. A stratified subsample of 5,000 points is drawn to keep QKNN runtimes manageable while preserving the original class proportions. Because KNN is distance-based, the features are scaled so that each contributes equally; this scaling preserves the underlying relationships between data points.
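The preprocessing steps above can be sketched as follows. The DataFrame here is a synthetic stand-in (the notebook's real dataset and column names are not given), so `f1`–`f3` and `label` are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical dataset: numeric feature columns plus a text class label.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f1": rng.normal(size=10_000),
    "f2": rng.normal(size=10_000),
    "f3": rng.normal(size=10_000),
    "label": rng.choice(["benign", "malignant"], size=10_000),
})

df = df.dropna()                               # drop rows with empty parameters
y = LabelEncoder().fit_transform(df["label"])  # text class labels -> numeric
X = df[["f1", "f2", "f3"]].to_numpy()

# Stratified subsample of 5,000 points, preserving class proportions.
X_sub, _, y_sub, _ = train_test_split(
    X, y, train_size=5000, stratify=y, random_state=42
)

# Standardize features: KNN is distance-based, so each feature
# should contribute on the same scale.
X_scaled = StandardScaler().fit_transform(X_sub)
```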
Training the KNN model.
Begin by finding the best k in the range 1–50 using 5-fold cross-validation: for each candidate k, a temporary KNN model is instantiated, the training set is split into 5 folds, and the model is trained on 4 folds and evaluated on the remaining one. The k with the highest mean accuracy is kept as best_k.
best_k is then used to train the final KNN model on the full training set of 4,000 samples.
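A sketch of this search, using synthetic data in place of the preprocessed subsample (the `make_classification` stand-in and the 80/20 split are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data standing in for the preprocessed 5,000-point subsample.
X, y = make_classification(n_samples=5000, n_features=3, n_informative=3,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 5-fold cross-validation over k = 1..50: each candidate trains on 4 folds
# and is scored on the held-out fold; the highest mean accuracy wins.
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                       X_train, y_train, cv=5).mean()
    for k in range(1, 51)
}
best_k = max(scores, key=scores.get)

# Retrain the final model with best_k on the full 4,000-sample training set.
knn = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)
```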
Testing the model and outputting metrics.
Calculating quantum distance using fidelity.
Set up a quantum device using PennyLane's CPU simulator default.qubit. The device uses 7 wires split across an ancilla, Register A, and Register B: 3 qubits per register (one per feature, 6 wires total) plus 1 ancilla qubit. Encoding then maps each numerical feature value to an angle in radians on its assigned qubit, applied with PennyLane's RY and RZ rotation gates.
The SWAP test measures the similarity between two points. Points x and z are encoded into Register A and Register B respectively, and the two registers are compared via controlled-SWAP operations conditioned on the ancilla. The test returns the probability that the ancilla is measured in |0⟩, which indicates how strongly the two states overlap (their fidelity). This similarity metric is then converted into a quantum distance.
Training the QKNN model.
Quantum distance is calculated between every test point and every training point and stored in the matrix Q_DIST. The elapsed time is returned.
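A sketch of filling Q_DIST with timing; the placeholder Euclidean metric stands in for the SWAP-test distance, which is too slow to demonstrate here:

```python
import time
import numpy as np

def quantum_distance(x, z):
    # Placeholder metric; in the notebook this is the SWAP-test quantum distance.
    return float(np.linalg.norm(x - z))

def distance_matrix(X_test, X_train):
    """Fill Q_DIST[i, j] with the distance between test point i and training
    point j, and report the elapsed wall-clock time."""
    start = time.perf_counter()
    Q_DIST = np.empty((len(X_test), len(X_train)))
    for i, x in enumerate(X_test):
        for j, z in enumerate(X_train):
            Q_DIST[i, j] = quantum_distance(x, z)
    elapsed = time.perf_counter() - start
    return Q_DIST, elapsed
```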
Making predictions and testing the model: iterate across the rows of the matrix, taking each test point's k nearest training neighbors and predicting by majority vote.
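A sketch of predicting from the precomputed matrix; majority voting is the standard KNN rule, assumed here since the notebook does not spell out its tie-breaking:

```python
import numpy as np
from collections import Counter

def predict_from_matrix(Q_DIST, y_train, k):
    """For each test row, take the k nearest training points by precomputed
    distance and predict the majority class among them."""
    preds = []
    for row in Q_DIST:
        nearest = np.argsort(row)[:k]          # indices of k smallest distances
        votes = Counter(y_train[nearest])
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)
```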
Rendering the bar graph using Matplotlib.
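A minimal bar-graph sketch comparing the two models; the accuracy numbers are placeholders, not results from the notebook:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

models = ["KNN", "QKNN"]
accuracies = [0.93, 0.91]  # hypothetical values for illustration only

fig, ax = plt.subplots()
ax.bar(models, accuracies, color=["steelblue", "darkorange"])
ax.set_ylabel("Accuracy")
ax.set_ylim(0, 1)
ax.set_title("KNN vs. QKNN accuracy")
fig.savefig("comparison.png")
```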