Figure 1
Concrete is the most widely used construction material in the world and arguably the most important material in civil engineering. Strength is one of the main reasons concrete has been used in construction for many decades. It withstands compressive stresses very well, although it is comparatively weak in tension. It is exceptionally durable and can survive harsh weather conditions and disasters for a long time. It is rigid and resists deformation. These characteristics, however, leave concrete structures without the flexibility to move in response to environmental or volume changes. Cracking is usually the first sign of distress in concrete [2]. It is, however, possible for deterioration to exist before cracks appear.
Surface cracks are a major defect in concrete civil structures (buildings, bridges, etc.). If they are not treated in time, they not only harm the structural health and longevity of the structure but can also cause large-scale disasters, which have happened on many occasions in the past and claimed thousands of lives. An example is the Dhaka garment factory collapse in Bangladesh, which occurred on 24 April 2013, killed 1134 people and injured approximately 2500 [4]. To reduce the likelihood of such disasters, civil structures should undergo structural inspection on a regular basis. Structural inspection evaluates the rigidity and tensile strength of the structure, usually by checking the concrete for cracks [2]. Crack detection therefore plays a major role in the inspection process: finding the cracks and determining the health of the building. The cracked concrete is then replaced. As a data scientist, I want to build a deep learning system that can predict whether a piece of concrete is cracked or normal.
The goal is to create a deep learning model, using a convolutional neural network (CNN), that predicts with an accuracy of at least 99% whether a concrete surface is cracked (in other words, alerts you when a crack is detected) or normal. A convolutional neural network is a class of deep neural network most commonly applied to analyzing visual imagery. I will use Keras, an open-source software library that provides a Python interface for artificial neural networks.
This is an image classification problem: an image is given as input to a model, which outputs the class, or the probability of each class, that the image belongs to. Because the model learns from images that have already been labeled with their class, this is a supervised learning task. The algorithm is a convolutional neural network, which is mostly used for visual imagery. A CNN model consists of two main parts. The first part consists of the convolutional layers and the pooling layers, in which the main feature extraction takes place. In the second part, the fully connected (dense) layers perform several non-linear transformations on the extracted features and act as the classifier [3]. The model is explained and illustrated in more detail below.
The metrics I will focus on are predictive metrics, that is, metrics that measure how well the model predicts cracks.
The performance metrics are precision, recall, accuracy and F1-score.
A crack detection system was built on a similar dataset, the Structural Defects Network dataset (SDNET2018), which contains over 56000 images of cracked and non-cracked concrete bridge decks, walls, and pavements. Most of the images were captured at Utah State University. The system has an accuracy of 76%.
Milind Raj worked on similar data using ResNet50 and obtained an accuracy of 95.3%.
The projects I have come across that used a similar dataset, as seen above, obtained accuracies in the range of 76% to 95.3%. My goal is to obtain an accuracy of at least 99%.
The dataset comes from Mendeley Data. The images were collected from various Middle East Technical University (METU) campus buildings. The dataset was published on 23 July 2019 by Çağlar Fırat Özgenel [1].
My dataset is unstructured data. It is divided into two classes for image classification:
20000 negative images representing normal concrete
20000 positive images representing cracked concrete
There is a total of 40000 images of 227 x 227 pixels with RGB channels.
These images are best represented in a data frame with two columns, one for the File path and the other for the label
Figure 2
As seen in Figure 2, my data frame has two columns, 'Filepath' and 'Label'.
It has a total of 40000 observations or rows representing 40000 images
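A table like the one in Figure 2 could be assembled roughly as follows. This is a minimal sketch using only the standard library, not the project's actual code. The folder names (Positive, Negative), the example file names and the helper rows_from_paths are assumptions made for illustration. Only the column names 'Filepath' and 'Label' come from the text.

```python
from pathlib import PurePosixPath


def rows_from_paths(filepaths):
    """Build one {'Filepath', 'Label'} row per image path.

    Assumes (hypothetically) that each image sits in a folder named after
    its class, e.g. data/Positive/00001.jpg, so the parent folder is the label.
    """
    rows = []
    for filepath in filepaths:
        label = PurePosixPath(filepath).parent.name.lower()
        rows.append({"Filepath": filepath, "Label": label})
    return rows


# Two made-up paths, one per class.
example = rows_from_paths([
    "data/Negative/00001.jpg",
    "data/Positive/00001.jpg",
])
for row in example:
    print(row["Filepath"], row["Label"])
```

A list of rows in this form can be handed to a data frame constructor, giving one row per image as described above.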
The data is split into training and test sets in an 80:20 ratio. This ratio gives us a total of 32000 images reserved for training and 8000 images for testing.
Image preprocessing
Normalization is a crucial step in image preprocessing. It refers to rescaling the pixel values so that they lie within a confined range. Our original images consist of RGB coefficients in the range 0-255, but such values are too large for the model to process efficiently, so we target values between 0 and 1 instead by scaling with a factor of 1/255.
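The rescaling itself is simple arithmetic, as the sketch below shows on three illustrative pixels. It is a plain-Python illustration, not the project's code, and the helper name rescale_pixels is hypothetical. In Keras the same effect is usually obtained by passing rescale=1/255 to ImageDataGenerator.

```python
def rescale_pixels(pixels, factor=1 / 255):
    """Scale every channel value of every pixel by `factor`."""
    return [[channel * factor for channel in pixel] for pixel in pixels]


# Three illustrative RGB pixels: black, mid-grey and white.
raw = [(0, 0, 0), (128, 128, 128), (255, 255, 255)]
scaled = rescale_pixels(raw)
for before, after in zip(raw, scaled):
    print(before, "->", [round(value, 3) for value in after])
```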
Generators
Three generators are created: Train_gen, validation_gen and test_gen, each containing validated image filenames belonging to two classes.
In the preprocessing phase, validation_split is set to 0.1, so 10% of my train dataset is held out for validation.
The two classes are positive and negative. The three generators are:
Train_gen: 28800 validated image filenames in two classes
Validation_gen: 3200 validated image filenames in two classes
Test_gen: 8000 validated image filenames in two classes
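The three generator sizes follow from the split arithmetic, sketched below. The validation fraction of 0.1 is stated in the text. The test fraction of 0.2 is an assumption inferred from the 32000/8000 figures, and split_counts is a hypothetical helper.

```python
def split_counts(total, test_fraction=0.2, validation_fraction=0.1):
    """Return (train, validation, test) sizes.

    The test set is carved out first; the validation set is then taken
    as a fraction of what remains for training.
    """
    n_test = round(total * test_fraction)
    n_train_full = total - n_test
    n_validation = round(n_train_full * validation_fraction)
    n_train = n_train_full - n_validation
    return n_train, n_validation, n_test


n_train, n_validation, n_test = split_counts(40000)
print(n_train, n_validation, n_test)  # 28800 3200 8000
```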
Data visualization involves understanding my dataset by placing it in a visual context so that patterns, trends and correlations that might not otherwise be detected can be exposed.
Figure 3
Label count
From Figure 3:
There are two classes of labels, positive and negative
20000 images have a positive label and the other 20000 have a negative label
Preview of images in train data
Fig 4: Preview of images in train data
Preview of images in test data
Fig 5: Preview of images in test data
Image augmentation applies different types of transformations to the original images, producing altered copies of the same image. This helps to train deep learning models on more image variations than are present in the original dataset.
Random rotation
Image rotation helps the model cope with different orientations by generating rotated copies of the images. The ImageDataGenerator class in Keras supports this through its rotation_range argument, which generates randomly rotated images; the angle can range from 0 degrees to 360 degrees.
Fig 6: Random rotation
Random shift
Random shifts help the model handle images in which the subject is not properly positioned. The Keras ImageDataGenerator uses the height_shift_range argument for vertical shifts and the width_shift_range argument for horizontal shifts.
Horizontal shift
For horizontal shift, we are using width_shift_range argument.
Fig 7: Horizontal shift
Vertical shift
For the vertical shift data augmentation technique, we are using height_shift_range argument.
Fig 8: Vertical shift
Random flips
Another augmentation technique is the flipping of images. ImageDataGenerator can flip images randomly, either horizontally or vertically.
Horizontal flip
For horizontal flip operation, we are using horizontal_flip argument.
Fig 9: Horizontal flip
Vertical flip
For vertical flip operation, we are using vertical_flip argument.
Fig 10: Vertical flip
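To make the two flip operations concrete, the sketch below applies them to a tiny made-up image stored as a nested list. It illustrates the transformation only; it is not how Keras implements horizontal_flip or vertical_flip internally, and the helper names are hypothetical.

```python
def horizontal_flip(image):
    """Mirror the image left-to-right by reversing each row."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Mirror the image top-to-bottom by reversing the order of the rows."""
    return [row[:] for row in image[::-1]]


# A tiny 2x3 single-channel "image" used purely for illustration.
image = [[1, 2, 3],
         [4, 5, 6]]
flipped_h = horizontal_flip(image)
flipped_v = vertical_flip(image)
print(flipped_h)  # [[3, 2, 1], [6, 5, 4]]
print(flipped_v)  # [[4, 5, 6], [1, 2, 3]]
```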
Random brightness
Random brightness is a useful technique because many real-world images are taken in low or almost no lighting. Training on copies of the images with varied brightness helps the model handle such cases. The brightness is controlled using the brightness_range argument.
Fig 11: Random brightness
Random zoom
The zooming in and zooming out operation is implemented using zoom_range argument.
Fig 12: Random zoom
Creating the convolutional base
A sequential model allows us to create models layer by layer in a step-by-step fashion.
This model takes an input of shape (120, 120, 3): the image height is 120, the image width is 120 and the number of channels is 3. This implies that the 227 x 227 source images are resized before being fed to the model.
The model has multiple convolutional layers, each followed by a max pooling layer:
Convolutional layer: it is defined by a number of filters, a kernel size and an activation function. This is the stage in which most of the base features, such as sharp edges and curves, are extracted from the image, and hence this layer is also known as the feature extractor layer.
Pooling layer: The pooling operation is also known as downsampling, where the spatial size of the image is reduced. If we perform a pooling operation with a stride of 2 on an image with dimensions 28×28, the image is reduced to 14×14, that is, to half of its original height and width.
Activation function: a mathematical gate between the input feeding the current neuron and its output going to the next layer. It decides whether the neuron should be activated or not.
The ReLU activation function is widely used and is a common default choice, as it often yields good results.
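The pooling and activation steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Keras implementation: the 2x2 window, the even-sized input and the helper names relu and max_pool_2x2 are all choices made for the example.

```python
def relu(x):
    """ReLU passes positive values through and maps everything else to 0."""
    return max(0.0, x)


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list (even height and width assumed)."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


# A 28x28 feature map with arbitrary values, passed through ReLU first.
feature_map = [[relu(float(r - c)) for c in range(28)] for r in range(28)]
pooled = max_pool_2x2(feature_map)
print(len(pooled), len(pooled[0]))  # 14 14
```

Each output value is the maximum of one 2x2 block, which is why the 28×28 input comes out as 14×14.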
Convolutional base summary
As seen in the figure, the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer.
Fig 13: Convolutional base summary table
Adding Dense layers on top
To complete the model, feed the last output tensor from the convolutional base (of shape (7, 7, 64)) into one or more Dense layers to perform classification. Dense layers take vectors as input (which are 1D), while the current output is a 3D tensor. First, flatten (or unroll) the 3D output to 1D, then add one or more Dense layers on top. My image dataset has 2 output classes, so I will use a final Dense layer with 2 outputs.
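As a rough illustration of the flattening step, the sketch below unrolls a dummy tensor of the (7, 7, 64) shape mentioned above into a 1D vector. The nested-list representation and the helper flatten are assumptions for the example; in Keras this step is handled by a Flatten layer.

```python
def flatten(tensor):
    """Unroll a nested (height, width, channels) list into a flat 1D list."""
    return [value for row in tensor for pixel in row for value in pixel]


# A dummy tensor with the (7, 7, 64) shape mentioned in the text.
height, width, channels = 7, 7, 64
tensor = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
vector = flatten(tensor)
print(len(vector))  # 3136 = 7 * 7 * 64
```

The Dense layers therefore receive a vector of 3136 values per image.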
Model summary
Fig 14: Dense layer summary table
Compiling the model
To compile our model, the following parameters are needed: optimizer, loss function and metrics.
Optimizer
The Adam optimizer is used. It is a popular default choice that tends to perform well across a wide range of problems.
Loss function
The loss function used to compile our model is categorical cross-entropy. It is used as a loss function for multi-class classification models, that is, when there are two or more output labels.
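For intuition, categorical cross-entropy for one sample is the negative log of the probability the model assigns to the true class, and the loss over a batch is the mean of these values. The sketch below computes it in plain Python. The sample values, the [negative, positive] column order, the eps clipping constant and the helper name are assumptions for illustration, not the Keras implementation.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy over one-hot targets and predicted probabilities."""
    total = 0.0
    for target, probs in zip(y_true, y_pred):
        # Clip probabilities away from 0 so that log() is always defined.
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))
    return total / len(y_true)


# Two illustrative samples with classes ordered as [negative, positive].
y_true = [[1, 0], [0, 1]]
y_pred = [[0.9, 0.1], [0.2, 0.8]]
loss = categorical_cross_entropy(y_true, y_pred)
print(round(loss, 4))  # 0.1643
```

The more probability the model puts on the correct class, the closer the loss gets to 0.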
Metrics
The metric is accuracy.
Training the model
The model is trained with the Keras fit() function for 15 epochs. The fit() function returns a History object. By storing this result, we can plot the accuracy and loss curves for the training and validation data, which helps to visualize our model's performance.
To train our model, the following parameters are needed: the training data, the validation data and the number of epochs.
Epoch
An epoch represents one complete pass through the entire training dataset.
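One epoch is made up of many batch-level update steps, one per batch of training images. The sketch below shows the arithmetic. The 28800 training images and the 15 epochs come from the text; the batch size of 32 is an assumption (the report does not state it), and steps_per_epoch is a hypothetical helper.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


# 28800 training images come from the text; batch size 32 is an assumption.
steps = steps_per_epoch(28800, 32)
total_updates = steps * 15  # 15 epochs, as stated in the text
print(steps, total_updates)  # 900 13500
```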
Fig 15: Training the model
From the figure above, after 15 epochs a train accuracy of 99.67% was obtained
Model evaluation on test data
Accuracy: 99.721% , Loss: 1.19%
An accuracy of 99.721% looks impressive!
Visualizing the accuracy and loss of my model
To put my model evaluation into perspective, I plot the accuracy and loss curves of the training and validation data.
Fig 16: Visualizing the accuracy and loss of model
PHASE III
Predicting Labels
The model's predictions on the test data are floating-point values, so they cannot be compared directly with the true test labels. So, I will round off the output, which converts the float values into integers. Furthermore, I will use np.argmax() to select the index of the highest value in each row.
NumPy argmax() is a built-in function that returns the index of the maximum element of an array (a one-dimensional array) or of each row or column of a multi-dimensional array.
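The row-wise selection can be sketched in plain Python as below. This mirrors what np.argmax(predictions, axis=1) would return for the same data, but it is a stand-in written for illustration. The prediction values and the [negative, positive] column order are made up.

```python
def argmax(row):
    """Index of the largest value in a 1D sequence (first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)


# Illustrative model outputs: one row per image, columns = [negative, positive].
predictions = [
    [0.98, 0.02],
    [0.10, 0.90],
    [0.45, 0.55],
]
predicted_labels = [argmax(row) for row in predictions]
print(predicted_labels)  # [0, 1, 1]
```

The resulting integer labels can then be compared one-to-one with the true test labels.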
Fig 17: Predicting test label
Confusion matrix
A confusion matrix is a predictive analytics tool. Specifically, it is a table that displays and compares actual values with the model's predicted values.
Fig 17: Confusion matrix
True Positive (TP)
True positive represents the number of correct positive predictions out of the actual positive cases. Out of 4003 actual positives, 3991 are correctly predicted as positive. Thus, the value of true positive is 3991.
False Positive (FP)
False positive represents the number of incorrect positive predictions, that is, the number of actual negatives (out of 3997) that are falsely predicted as positive. Out of 3997 actual negatives, 6 are falsely predicted as positive. Thus, the value of false positive is 6.
True Negative (TN)
True negative represents the number of correct negative predictions out of the actual negative cases. Out of 3997 actual negatives, 3991 are correctly predicted as negative. Thus, the value of true negative is 3991.
False Negative (FN)
False negative represents the number of incorrect negative predictions, that is, the number of actual positives (out of 4003) that are falsely predicted as negative. Out of 4003 actual positives, 12 are incorrectly predicted as negative. Thus, the value of false negative is 12.
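The four cells can be counted directly from paired true and predicted labels, as in the sketch below. The six labels are made up for illustration (they are not the project's test data), and confusion_counts is a hypothetical helper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN and FN for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


# Six made-up labels (1 = positive/cracked, 0 = negative/normal).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)  # 2 1 2 1
```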
Performance metrics
Precision
It represents the model's ability to correctly predict the positives out of all the positive predictions it made. It is the ratio of the number of positive samples correctly classified to the total number of samples classified as positive (either correctly or incorrectly).
Precision score = TP/(TP+FP)
= 3991/(3991+6)
= 1.00 (0.9985 before rounding to two decimal places)
Recall
The recall score represents the model's ability to correctly predict the positives out of the actual positives. It is calculated as the ratio of the true positives to the actual positives.
Recall score = TP/(TP+FN)
= 3991/(3991+12)
= 1.00 (0.9970 before rounding to two decimal places)
Accuracy score
It represents the model's ability to correctly predict both the positives and the negatives out of all the predictions.
Accuracy score = (TP+TN)/(TP+FP+TN+FN)
= (3991+3991)/(3991+6+3991+12)
= 1.00 (0.99775 before rounding to two decimal places)
F1-Score
It represents the model's score as a function of the precision and recall scores. The F1-score is a way of combining the precision and recall of the model. It is also known as the harmonic mean of the model's precision and recall.
F1-score = 2 x Precision score x Recall score / (Precision score + Recall score)
= (2 x 1.00 x 1.00)/(1.00 + 1.00)
= 1.00
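The four formulas above can be checked with a few lines of Python. This is a minimal sketch using the TP, FP, TN and FN counts reported in the confusion matrix section; the helper classification_metrics is hypothetical and is not the library routine that produced the report.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Counts taken from the confusion matrix discussed above.
metrics = classification_metrics(tp=3991, fp=6, tn=3991, fn=12)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

At two decimal places every score prints as 1.00; the unrounded values all lie slightly below 1.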
Classification Report
The classification report gives us a summary table containing the precision, recall and F1-score of each class, which makes it easy to observe which class performs better.
Fig 18: Classification report
Pros and cons of the CNN model
Pros
•In terms of performance, CNN models are very effective on image tasks relative to other models
•They are simple to implement and, because filter weights are shared across the image, require fewer parameters than a fully connected network of comparable size
•They are used in various fields and perform major tasks such as facial recognition, document analysis, climate analysis, image recognition and object identification
Cons
•Overfitting is a common problem when training the model, especially when we don't have enough data to train our model with. This problem can be mitigated by increasing the data through augmentation.
•Another common issue is high loss during the training process. This can be reduced by increasing the number of epochs and increasing the amount of data through data augmentation.
•If the CNN has several layers, the training process takes a lot of time if the computer does not have a good GPU.
Conclusion
Both classes performed the same in terms of precision, recall and F1-score.
My model has a test accuracy of 99.721%, which is better than my initial goal of 99%. Therefore, my project is successful.
[1] Özgenel, Ç. F. (2019, July 23). Concrete Crack Images for Classification. Retrieved from Mendeley Data: https://data.mendeley.com/datasets/5y9wdsg2zt/2
[2] Giatec Scientific. (2019, August 17). Evaluating Cracking in Concrete: Procedures. Retrieved from Giatec: https://www.giatecscientific.com/education/cracking-in-concrete-procedures/
[3] Vadapalli, P. (2021, February 25). Image Classification in CNN: Everything You Need to Know. Retrieved from upGrad blog: https://www.upgrad.com/blog/image-classification-in-cnn/
[4] Wikipedia. (2021, February 13). 2013 Dhaka garment factory collapse. Retrieved from Wikipedia: https://en.wikipedia.org/wiki/2013_Dhaka_garment_factory_collapse
[5] GeeksforGeeks. (2020, May 18). Keras.Conv2D Class. Retrieved from GeeksforGeeks: https://www.geeksforgeeks.org/keras-conv2d-class/
[6] Lathiya, A. (9, September 5). Numpy Argmax: How To Use Np Argmax() Function In Python. Retrieved from https://appdividend.com/2020/03/28/python-numpy-argmax-function-example/
[7] ScienceDirect. (2021). Convolutional Layer. Retrieved from ScienceDirect: https://www.sciencedirect.com/topics/engineering/convolutional-layer