The dataset in this module consists of CAPTCHA images and is used for CAPTCHA character recognition.
The dataset can be found on the Kaggle website: https://www.kaggle.com/datasets/fournierp/captcha-version-2-images?resource=download
Or
https://drive.google.com/file/d/1nD5OQR6NOrrkBjzUZw7RfFwWKehgjhOG/view?usp=sharing
In this module, we will implement a Convolutional Neural Network (CNN) for CAPTCHA bypass in a new Google Colab notebook. The dataset contains a large number of CAPTCHA images for training and testing the network.
Convolutional Neural Network for CAPTCHA Bypass
The complete, ready-to-run Google Colab notebook can be found at this link: https://colab.research.google.com/drive/1ic4fxhJ11ZlEDDzYJ0gs3_MRYNl_DCPO#scrollTo=l_9Jqw_1RD-a
Alternatively, to build the notebook yourself, open Google Colab in your browser.
Then click File --> New notebook
Click the notebook name at the top of the page and rename the file (for example, CNN for CAPTCHA Bypass.ipynb).
Next, click Runtime --> Change runtime type and set Hardware accelerator to GPU (training runs much faster on a GPU than on a CPU).
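To confirm that the GPU runtime is active, you can optionally run a quick check; it should list at least one GPU device:
import tensorflow as tf
# Lists the GPUs visible to TensorFlow; an empty list means the
# runtime is still on CPU.
print(tf.config.list_physical_devices("GPU"))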
Let us first import all the necessary libraries:
import os
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
from collections import Counter
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
Now, we will make the dataset available to Google Colab. The image data comes as a folder, so we will mount Google Drive and then use the dataset's directory path for further analysis.
Step 1: Download the dataset from the Kaggle website, then unzip it on your local machine.
Step 2: Upload the data to your Google Drive. For simplicity, you can place the image folder directly in My Drive.
Step 3: Mount Google Drive in your Colab session by running the following code:
# Mount Google Drive
from google.colab import drive
drive.mount("/content/gdrive")
Step 4: Connect to Google Drive --> choose the account where you uploaded the data --> allow Colab to mount it.
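Once the mount succeeds, you can optionally confirm that your uploaded folder is visible (a quick sanity check; the folder name is whatever you chose in Step 2):
import os
# The Drive root appears under this path after mounting; your image
# folder should show up in this listing.
print(os.listdir("/content/gdrive/My Drive"))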
Step 5: Find the directory path of your dataset and run the following code:
data_dir = Path("/content/gdrive/My Drive/your-data-folder-name/")
# Get list of all the images
images = sorted(list(map(str, list(data_dir.glob("*.png")))))
labels = [img.split(os.path.sep)[-1].split(".png")[0] for img in images]
characters = set(char for label in labels for char in label)
print("Number of images found: ", len(images))
print("Number of labels found: ", len(labels))
print("Number of unique characters: ", len(characters))
print("Characters present: ", characters)
# Batch size for training and validation
batch_size = 16
# Desired image dimensions
img_width = 200
img_height = 50
# Factor by which the image is downsampled by the convolutional
# blocks: two blocks, each ending in a 2x2 max-pooling layer,
# hence the total downsampling factor is 4.
downsample_factor = 4
# Maximum length of any captcha in the dataset
max_length = max([len(label) for label in labels])
After running the code, the output indicates that the dataset contains 1040 CAPTCHA files as PNG images, each labelled by its 5-character filename.
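As a quick sanity check on these constants, the downsampling factor tells us how many horizontal time steps the downstream sequence model will see (this assumes the two-pooling-layer architecture described in the comment above):
# Each 2x2 pooling halves the width, so two pooling layers shrink
# 200 pixels to 200 // 4 = 50 time steps.
print("Time steps after downsampling:", img_width // downsample_factor)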
We will now preprocess the data using the following code:
# Mapping characters to integers
char_to_num = layers.StringLookup(
    # sorted() makes the character-to-integer mapping deterministic across runs
    vocabulary=sorted(characters), mask_token=None
)
# Mapping integers back to original characters
num_to_char = layers.StringLookup(
    vocabulary=char_to_num.get_vocabulary(), mask_token=None, invert=True
)
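To see what these two lookup layers do, here is a minimal round-trip sketch: it encodes one label from the dataset into integers and decodes it back (the exact integer values depend on your dataset's vocabulary):
# Encode a sample label to integers, then invert the mapping.
sample_label = labels[0]
encoded = char_to_num(tf.strings.unicode_split(sample_label, input_encoding="UTF-8"))
decoded = tf.strings.reduce_join(num_to_char(encoded)).numpy().decode("utf-8")
print(sample_label, "->", encoded.numpy(), "->", decoded)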
def split_data(images, labels, train_size=0.9, shuffle=True):
    # 1. Get the total size of the dataset
    size = len(images)
    # 2. Make an indices array and shuffle it, if required
    indices = np.arange(size)
    if shuffle:
        np.random.shuffle(indices)
    # 3. Get the size of training samples
    train_samples = int(size * train_size)
    # 4. Split data into training and validation sets
    x_train, y_train = images[indices[:train_samples]], labels[indices[:train_samples]]
    x_valid, y_valid = images[indices[train_samples:]], labels[indices[train_samples:]]
    return x_train, x_valid, y_train, y_valid
# Splitting data into training and validation sets
x_train, x_valid, y_train, y_valid = split_data(np.array(images), np.array(labels))
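As a quick check, print the split sizes; with the 1040-image dataset and a 90/10 split, this should report 936 training and 104 validation samples:
# Sizes follow from int(1040 * 0.9) = 936 training samples,
# leaving 104 for validation.
print("Training samples:", len(x_train))
print("Validation samples:", len(x_valid))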
def encode_single_sample(img_path, label):
    # 1. Read image
    img = tf.io.read_file(img_path)
    # 2. Decode and convert to grayscale
    img = tf.io.decode_png(img, channels=1)
    # 3. Convert to float32 in [0, 1] range
    img = tf.image.convert_image_dtype(img, tf.float32)
    # 4. Resize to the desired size
    img = tf.image.resize(img, [img_height, img_width])
    # 5. Transpose the image because we want the time
    # dimension to correspond to the width of the image.
    img = tf.transpose(img, perm=[1, 0, 2])
    # 6. Map the characters in label to numbers
    label = char_to_num(tf.strings.unicode_split(label, input_encoding="UTF-8"))
    # 7. Return a dict as our model is expecting two inputs
    return {"image": img, "label": label}
Now we will create tf.data.Dataset objects using the following code:
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = (
    train_dataset.map(
        encode_single_sample, num_parallel_calls=tf.data.AUTOTUNE
    )
    .batch(batch_size)
    .prefetch(buffer_size=tf.data.AUTOTUNE)
)
validation_dataset = tf.data.Dataset.from_tensor_slices((x_valid, y_valid))
validation_dataset = (
    validation_dataset.map(
        encode_single_sample, num_parallel_calls=tf.data.AUTOTUNE
    )
    .batch(batch_size)
    .prefetch(buffer_size=tf.data.AUTOTUNE)
)
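Before visualizing, you can inspect the structure of the batched dataset; the leading None in each shape is the batch dimension:
# Expect something like {"image": (None, 200, 50, 1) float32,
# "label": (None, 5) int64}, matching encode_single_sample's output.
print(train_dataset.element_spec)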
Before building and training the model, let us verify the input pipeline by visualizing one batch of training samples together with their decoded labels. (After training, the same num_to_char decoding is used to read off the model's prediction for each of the 5 characters in a CAPTCHA.)
Visualize the data:
_, ax = plt.subplots(4, 4, figsize=(10, 5))
for batch in train_dataset.take(1):
    images = batch["image"]
    labels = batch["label"]
    for i in range(16):
        img = (images[i] * 255).numpy().astype("uint8")
        label = tf.strings.reduce_join(num_to_char(labels[i])).numpy().decode("utf-8")
        ax[i // 4, i % 4].imshow(img[:, :, 0].T, cmap="gray")
        ax[i // 4, i % 4].set_title(label)
        ax[i // 4, i % 4].axis("off")
plt.show()
Finally, the resulting grid shows 16 preprocessed CAPTCHA images alongside their ground-truth labels, confirming that images and labels are correctly paired before the model is built and trained.