1. Environment Setup: Install the required Qiskit libraries for quantum machine learning.
2. Importing Libraries: Import all required libraries including NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn, and Qiskit modules for building and evaluating classical and quantum classifiers.
3. Data Loading: Mounting Google Drive and loading the soil dataset stored there.
4. Target Variable Creation: Creating a binary target variable by splitting counties into high and low cultivated land based on the median of the CULT_LAND column.
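The median split above can be sketched in a few lines. This is a minimal illustration on toy numbers; the column name `CULT_LAND` comes from the notebook, while the values and the target column name `TARGET` are placeholders:

```python
import pandas as pd

# Toy stand-in for the soil dataset; CULT_LAND is the real column name.
df = pd.DataFrame({"CULT_LAND": [10.0, 25.0, 3.0, 40.0, 18.0, 7.0]})

# Counties above the median cultivated land are "high" (1), the rest "low" (0).
median_cult = df["CULT_LAND"].median()
df["TARGET"] = (df["CULT_LAND"] > median_cult).astype(int)
```

A median split guarantees roughly balanced classes, which keeps accuracy and balanced accuracy comparable later.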
5. Feature Selection: Selecting four features (NVG_LAND, GRS_LAND, FOR_LAND, SQ3) identified through forward sequential feature selection. Four features are used because each feature maps to one qubit in the quantum models.
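Forward sequential feature selection of the kind described above can be reproduced with scikit-learn's `SequentialFeatureSelector`; this sketch uses synthetic data in place of the soil features:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

# Synthetic stand-in: 8 candidate features, of which 4 will be kept.
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=0)

# Forward selection: greedily add the feature that most improves CV score,
# stopping at 4 features (one per qubit in the quantum models).
selector = SequentialFeatureSelector(SVC(kernel="rbf"),
                                     n_features_to_select=4,
                                     direction="forward", cv=3)
selector.fit(X, y)
selected = selector.get_support(indices=True)
```

With the real data, the selected indices would correspond to NVG_LAND, GRS_LAND, FOR_LAND, and SQ3.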
6. Preprocessing: Splitting the data into training and testing sets and scaling the features using StandardScaler. The scaler is fit only on training data to prevent data leakage.
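The leakage-safe scaling pattern is the key detail here: the scaler's statistics come from the training split only. A minimal sketch with placeholder data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(size=(100, 4))  # placeholder features
y = (X[:, 0] > 0).astype(int)                       # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Fit on training data only, then apply the same transform to both splits,
# so no test-set statistics leak into training.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```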
7. Training Subsets for Quantum Models: Creating a reduced 800-sample training subset for the quantum models using stratified subsampling. This is standard practice in QML research due to the O(n^2) cost of quantum kernel computation.
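Stratified subsampling to a fixed size can be done with `train_test_split` by treating the subset as the "train" side of a split; the array sizes here are placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 4))            # placeholder training data
y_train = rng.integers(0, 2, size=2000)

# train_size=800 with stratify preserves the class ratio of y_train.
X_sub, _, y_sub, _ = train_test_split(
    X_train, y_train, train_size=800, stratify=y_train, random_state=42)
```

The quadratic cost comes from the kernel Gram matrix: 800 samples already means 800 × 800 = 640,000 pairwise fidelity evaluations.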
8. Evaluation Metrics Helper: Defining a reusable helper function that computes the same standardized set of metrics (Accuracy, Precision, Recall, F1 Score, AUC-ROC, Balanced Accuracy, MSE, training time, inference time, parameter count) for every model.
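A helper of this shape could look as follows; the function name and signature are illustrative, not the notebook's exact code:

```python
import time
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, balanced_accuracy_score,
                             mean_squared_error)

def evaluate(name, model, X_test, y_test, train_time, n_params=None):
    """Compute the standardized metric set for one fitted model (sketch)."""
    t0 = time.perf_counter()
    y_pred = model.predict(X_test)
    infer_time = time.perf_counter() - t0
    return {
        "Model": name,
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred),
        "Recall": recall_score(y_test, y_pred),
        "F1 Score": f1_score(y_test, y_pred),
        "AUC-ROC": roc_auc_score(y_test, y_pred),
        "Balanced Accuracy": balanced_accuracy_score(y_test, y_pred),
        "MSE": mean_squared_error(y_test, y_pred),
        "Training Time (s)": train_time,
        "Inference Time (s)": infer_time,
        "Parameters": n_params,
    }
```

Returning a plain dict per model makes the final comparison step a one-line `pd.DataFrame(rows)`.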
9. Classical SVM: Our baseline is a Support Vector Classifier (SVC) with an RBF kernel. It finds a decision boundary that maximizes the margin between classes, using the kernel trick to implicitly map data into a higher-dimensional space. We first train it on the full 2,487 samples to show what classical ML achieves at scale.
We then train the same SVM on the reduced 800-sample subset for a fair comparison with the quantum models. Both runs use the same test set, so all accuracy numbers are directly comparable.
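The baseline itself is a few lines of scikit-learn; this sketch uses two synthetic Gaussian blobs in place of the scaled soil features:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Two well-separated blobs as a stand-in for the scaled 4-feature data.
X = np.vstack([rng.normal(-2, 1, size=(100, 4)),
               rng.normal(2, 1, size=(100, 4))])
y = np.array([0] * 100 + [1] * 100)

# RBF kernel: an implicit infinite-dimensional feature map via the kernel trick.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
acc = accuracy_score(y, clf.predict(X))
```

Running the identical estimator twice, once on the full 2,487 samples and once on the 800-sample subset, isolates the effect of sample size from the effect of the kernel.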
10. Quantum Circuit Components: Setting up the quantum circuit components: ZZFeatureMap with reps=1 for data encoding, RealAmplitudes ansatz with reps=3 for trainable rotations, and COBYLA optimizer with 200 iterations for training.
11. Hybrid Quantum-Classical (QSVC): Training the Hybrid Quantum Support Vector Classifier using a FidelityStatevectorKernel and grid search over the C parameter to find the best SVM configuration.
12. Quantum VQC (Fully Quantum): Training the Variational Quantum Classifier on the reduced dataset using the COBYLA optimizer with 200 iterations and a convergence callback to track loss. This cell may take 8-12 minutes to run.
13. Final Comparison: All four model results (Classical SVM at full data, Classical SVM at 800 samples, Hybrid QSVC at 800 samples, and Quantum VQC at 800 samples) are assembled into a single comparison DataFrame. All models are evaluated on the same test set, making every metric directly comparable.
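Assembling the comparison table is straightforward if each model's metrics come back as a dict; the values below are placeholders, not the notebook's actual results:

```python
import pandas as pd

# Placeholder rows (illustrative only); in the notebook each row is the dict
# returned by the metrics helper for one model.
rows = [
    {"Model": "Classical SVM (full)", "Accuracy": None, "F1 Score": None},
    {"Model": "Classical SVM (800)",  "Accuracy": None, "F1 Score": None},
    {"Model": "Hybrid QSVC (800)",    "Accuracy": None, "F1 Score": None},
    {"Model": "Quantum VQC (800)",    "Accuracy": None, "F1 Score": None},
]
comparison = pd.DataFrame(rows).set_index("Model")
```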