- The initial step is to apply filtering to the images so that we can mimic the saliency detection mechanism the human visual system uses to identify features of an image.
- Frequency-tuned saliency detection is used in this project to extract the required features.
- The processing pipeline consists of acquiring the image from the system, transforming its color space, applying filters, and rescaling the pixel range.
- The pipeline outputs a grayscale saliency map along with binary maps obtained through adaptive thresholding.
- The primary decision in image processing is choosing which color space to work in; here, the Lab color space is selected.
- The Lab color space is based on the principle of color opponency: L carries luminance information, while a and b carry the green-red and blue-yellow opponent channels, respectively.
- Color opponency says that opponent colors cannot be perceived together; for example, there is no such color as greenish red or yellowish blue.
- The importance of Lab is its gamut: about 90% of visible colors can be represented in Lab, which is not possible with RGB and other color models. Lab is also independent of the platform or device used to create the image (the snippet below illustrates the opponent channels).
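As an illustration of the opponent channels, a small sketch using OpenCV; the swatch values are just for demonstration:

```python
import cv2
import numpy as np

# Lab values of pure color swatches. In OpenCV's 8-bit Lab representation,
# a and b are offset by 128, so 128 is the neutral point of each opponent axis.
swatches = np.array([[[0, 0, 255], [0, 255, 0], [255, 0, 0], [0, 255, 255]]],
                    dtype=np.uint8)  # BGR order: red, green, blue, yellow
lab = cv2.cvtColor(swatches, cv2.COLOR_BGR2LAB)[0]
for name, (L, a, b) in zip(["red", "green", "blue", "yellow"], lab):
    # a > 128 leans red, a < 128 leans green; b > 128 leans yellow, b < 128 leans blue.
    print(f"{name:6s} L={L} a={a} b={b}")
```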
- The method introduced is purely computational and is not based on biological vision principles.
- The mean of each of the three components L*, a*, b* is computed over the image; these mean values are then used to calculate the saliency map.
- To obtain the low-frequency components and discard the highest frequencies, Gaussian filtering is used.
- The low frequencies yield a uniformly highlighted saliency map, while the retained higher frequencies give well-defined boundaries to the salient regions; a minimal sketch of the computation follows.
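Below is a minimal sketch of the saliency computation described above, using OpenCV and NumPy; the file name and kernel size are illustrative assumptions:

```python
import cv2
import numpy as np

# Load the image and convert to the Lab color space chosen above.
bgr = cv2.imread("input.jpg")  # placeholder file name
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float64)

# Mean Lab vector over the whole image.
mean_lab = lab.mean(axis=(0, 1))

# Gaussian blur keeps the low/mid frequencies and suppresses high-frequency noise.
blurred = cv2.GaussianBlur(lab, (5, 5), 0)

# Saliency: Euclidean distance between the mean vector and each blurred pixel.
saliency = np.linalg.norm(blurred - mean_lab, axis=2)

# Rescale the pixel range to [0, 255] for a grayscale saliency map.
saliency = cv2.normalize(saliency, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```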
- Adaptive thresholding is then used to obtain the best binary mask from the saliency map.
- A good mask is needed to obtain the desired precision and recall values, which in turn give a better ROC curve and AUC value for the selected image (see the sketch below).
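Continuing the sketch, one simple adaptive choice is to threshold at twice the image's mean saliency (an assumption here, not the only possible scheme):

```python
import numpy as np

# `saliency` is the grayscale map from the previous snippet.
# Adaptive per-image threshold: twice the mean saliency value.
threshold = 2.0 * saliency.mean()

# Binary map: pixels at or above the threshold are marked salient.
binary_map = (saliency >= threshold).astype(np.uint8) * 255
```

Given a ground-truth mask, precision, recall, and the ROC/AUC can then be computed, e.g. with sklearn.metrics.roc_auc_score on the flattened saliency values.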
Deep Learning - Unsupervised Learning
- After obtaining the saliency and binary maps, we can train a machine learning model to learn the patterns they encode.
- In this way the model can reproduce the algorithm's results without actually running it, reducing computation and improving efficiency.
Tools - TensorFlow (Python), CUDA
- For the implementation of the deep learning model, the TensorFlow package by Google is used.
- For better performance, TensorFlow-GPU can be used along with CUDA by NVIDIA.
- Images have to be of the same dimensions (square images are preferred) so that training the model is faster.
- Image datasets available on the Internet (e.g., the COCO dataset) provide all images in a standard size.
- However, in this project the images are of varying dimensions, so we need to transform them to standard square dimensions.
- Resizing is also necessary because it reduces the size of the training data, improving the efficiency of the model; a resizing sketch follows.
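A minimal resizing sketch using Pillow; the 224x224 target size and directory names are placeholders:

```python
from pathlib import Path
from PIL import Image

TARGET_SIZE = (224, 224)  # assumed square training size
out_dir = Path("resized_images")
out_dir.mkdir(exist_ok=True)

# Resize every image in a placeholder input directory to the same square size.
for path in Path("raw_images").glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    img.resize(TARGET_SIZE).save(out_dir / path.name)
```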
- After getting all images into a standard size, we can apply statistical tools to understand the data available to us.
- This helps us to tweak the model parameters to obtain better results.
- Statistics such as the mean and standard deviation are informative and have low computation costs (see the snippet below).
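For example, the per-channel mean and standard deviation over the resized dataset can be computed with NumPy (same placeholder paths as above):

```python
import numpy as np
from pathlib import Path
from PIL import Image

# Stack all resized images into one array of shape (N, 224, 224, 3).
images = np.stack([
    np.asarray(Image.open(p), dtype=np.float64)
    for p in Path("resized_images").glob("*.jpg")
])

# Per-channel mean and standard deviation over the whole dataset,
# commonly used to normalize inputs before training.
channel_mean = images.mean(axis=(0, 1, 2))
channel_std = images.std(axis=(0, 1, 2))
```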
Deep Learning - Supervised Learning
- Keras is a deep learning library for Python that is simple, modular, and extensible.
- A sequential Keras model is used for the implementation; a minimal sketch follows.
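A minimal sketch of such a model; the input shape, layer sizes, and class count are illustrative assumptions, not the project's actual architecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumed: 224x224 RGB inputs and 10 output classes (placeholders).
model = keras.Sequential([
    keras.Input(shape=(224, 224, 3)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),    # ReLU hidden layer (see below)
    layers.Dense(10, activation="softmax"),  # softmax output layer (see below)
])
```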
- The softmax function squashes the output of each unit to lie between 0 and 1, just like a sigmoid function, but it also divides each output so that all the outputs sum to 1.
- The output of the softmax function is therefore equivalent to a categorical probability distribution: it gives the probability that each class is the true one (see the snippet below).
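A small NumPy sketch of softmax, with the usual max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax(z):
    # Subtracting the max does not change the result but avoids overflow.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)              # approx. [0.659, 0.242, 0.099]
assert np.isclose(probs.sum(), 1.0)  # the outputs sum to 1
```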
- The ReLU function is f(x) = max(0,x).
- This is applied element-wise to the output of a function, such as a matrix-vector product.
- In multilayer perceptrons (MLPs), rectifier units are commonly used in place of other activation functions such as sigmoid and tanh.
- One way ReLUs improve neural networks is by speeding up training.
- The gradient computation is very simple: it is either 0 or 1, depending on the sign of the input.
- The forward computation of a ReLU is also cheap: negative elements are set to 0.0, with no exponents and no multiplication or division operations (see the sketch below).
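A NumPy sketch of ReLU and its gradient, matching the description above:

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): negative elements become 0.0.
    return np.maximum(0.0, x)

def relu_grad(x):
    # The gradient is 1 where the input is positive and 0 elsewhere.
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))       # [0.  0.  0.  1.5]
print(relu_grad(x))  # [0. 0. 0. 1.]
```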
- Cross-entropy is used to quantify the difference between two probability distributions.
- Using it as the loss pushes the predicted distribution of pixel labels toward the true distribution, which supports better prediction and visualization (see the snippet below).
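A minimal NumPy sketch of cross-entropy between a one-hot true distribution and a predicted softmax output:

```python
import numpy as np

def cross_entropy(p_true, p_pred, eps=1e-12):
    # H(p, q) = -sum_i p_i * log(q_i); eps guards against log(0).
    return -np.sum(p_true * np.log(p_pred + eps))

y_true = np.array([0.0, 1.0, 0.0])    # one-hot: class 1 is correct
y_pred = np.array([0.2, 0.7, 0.1])    # predicted distribution
print(cross_entropy(y_true, y_pred))  # approx. 0.357; lower is better
```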
- Adam, the optimizer used here, is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
- The method is straightforward to implement, computationally efficient, has small memory requirements, and is invariant to diagonal rescaling of the gradients.
- It is well suited to problems that are large in terms of data and/or parameters, as is the case with images; a compilation sketch follows.
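A minimal sketch of compiling and training the model from the earlier snippet with Adam and cross-entropy; the training-data names are placeholders:

```python
# Assumes `model` is the sequential Keras model sketched earlier.
model.compile(
    optimizer="adam",                  # Adam, as described above
    loss="categorical_crossentropy",   # the cross-entropy loss discussed above
    metrics=["accuracy"],
)

# x_train: (N, 224, 224, 3) images; y_train: (N, 10) one-hot labels (placeholders).
# model.fit(x_train, y_train, epochs=10, batch_size=32)
```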