Simple neural network version 1
Simple neural network version 2
Small Visual Geometry Group (SmallVGG)
Visual Geometry Group (VGG)
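For readers unfamiliar with the names above: SmallVGG refers to a compact VGG-style network of stacked 3x3 convolutions. The sketch below is illustrative only; the layer counts, filter sizes, and input shape are assumptions, not the project's exact architecture.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def small_vgg(width=96, height=96, depth=3, classes=3):
    # two VGG-style blocks of stacked 3x3 convolutions followed by pooling,
    # then a small fully connected head (all sizes here are illustrative)
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same',
                     input_shape=(height, width, depth)))
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(classes, activation='softmax'))
    return model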
The reason the project focused on continuing to develop the three classes is that this design can be expanded to more languages easily, without major changes to the primary model. If we used the model for two-category classification instead, we would need different methods in Keras and a lot of modification for any future expansion.
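To make the distinction concrete, here is a minimal sketch (the layer sizes and input shape are illustrative assumptions, not the project's configuration): a multi-class model ends in a softmax layer trained with categorical cross-entropy, so adding a language only changes the size of that final layer, whereas a two-category model in Keras would typically use a single sigmoid unit with binary cross-entropy.

from keras.models import Sequential
from keras.layers import Dense, Flatten

NUM_CLASSES = 3  # Arabic, English, Japanese; grows as languages are added

model = Sequential([
    Flatten(input_shape=(96, 96, 3)),          # illustrative input size
    Dense(128, activation='relu'),
    Dense(NUM_CLASSES, activation='softmax')   # only this layer changes per added language
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # a binary model would instead use a
              metrics=['accuracy'])             # 1-unit sigmoid + binary_crossentropy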
For example: $ googleimagesdownload --keywords "playground" --limit 20 --color red
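The same download can also be scripted; a minimal sketch, assuming the google_images_download package and its Python interface are installed:

from google_images_download import google_images_download

# mirror the CLI call above: 20 red "playground" images
response = google_images_download.googleimagesdownload()
arguments = {"keywords": "playground", "limit": 20, "color": "red"}
paths = response.download(arguments)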
import os
import numpy as np
import PIL.ImageOps
from PIL import Image

def normapro(folder, f):
    # f is the file name of one image inside folder
    i = Image.open(folder + f)
    # split the image name and extension
    fn, fext = os.path.splitext(f)
    # rotate by 45 degrees and crop the image
    out1 = i.rotate(45)
    box = (350, 0, 650, 400)  # alternative crop: box = (30, 30, 410, 410)
    region1 = out1.crop(box)
    region1.save(folder + '{}_prepro1.jpg'.format(fn))
    # rotate by 90 degrees and crop the image
    out2 = i.transpose(Image.ROTATE_90)
    box = (30, 100, 410, 600)
    region2 = out2.crop(box)
    region2.save(folder + '{}_prepro2.jpg'.format(fn))
    # flip the image left to right
    out3 = i.transpose(Image.FLIP_LEFT_RIGHT)
    out3.save(folder + '{}_prepro3.jpg'.format(fn))
    # flip the image top to bottom
    out4 = i.transpose(Image.FLIP_TOP_BOTTOM)
    out4.save(folder + '{}_prepro4.jpg'.format(fn))
    # invert black to white and vice versa (invert requires a plain RGB image)
    out5 = PIL.ImageOps.invert(i.convert('RGB'))
    out5.save(folder + '{}_prepro5.jpg'.format(fn))
    # recolor the image: replace every non-white pixel with red, green, or blue
    orig_color = (255, 255, 255)
    replacement_color1 = (200, 0, 0)
    replacement_color2 = (0, 200, 0)
    replacement_color3 = (0, 0, 200)
    img = i.convert('RGB')
    data1 = np.array(img)
    data2 = np.array(img)
    data3 = np.array(img)
    # mask pixels that differ from white in every channel, then recolor them
    data1[(data1 != orig_color).all(axis=-1)] = replacement_color1
    data2[(data2 != orig_color).all(axis=-1)] = replacement_color2
    data3[(data3 != orig_color).all(axis=-1)] = replacement_color3
    out6 = Image.fromarray(data1, mode='RGB')
    out7 = Image.fromarray(data2, mode='RGB')
    out8 = Image.fromarray(data3, mode='RGB')
    out6.save(folder + '{}_prepro6.jpg'.format(fn))
    out7.save(folder + '{}_prepro7.jpg'.format(fn))
    out8.save(folder + '{}_prepro8.jpg'.format(fn))
def bnwpro(folder, f):
    # f is the file name of one image inside folder
    i = Image.open(folder + f)
    # split the image name and extension
    fn, fext = os.path.splitext(f)
    # invert black and white, overwriting the original jpg
    inverted_image = PIL.ImageOps.invert(i.convert('RGB'))
    inverted_image.save(folder + '{}.jpg'.format(fn))
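A minimal driver for these two helpers might look like the sketch below. The folder path and the cap value are assumptions; the bk "breaker" counter is inferred from a stray comment in the original code, since the actual calling loop is not shown.

import os

folder = 'dataset/english/'  # hypothetical path; one folder per language
bk = 0                       # breaker counter hinted at in the original comments
for f in os.listdir(folder):
    # only process original jpg files, not ones produced by an earlier run
    if f.lower().endswith('.jpg') and '_prepro' not in f:
        normapro(folder, f)
        bk += 1
        if bk >= 500:  # assumed cap on how many originals to augment
            break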
Dataset version 1 for the Arabic language
Dataset version 1 for the English language
Dataset version 2 for the English language
Dataset version 3 Pillow image processing for the Japanese language
Dataset version 3 with the Keras image data generator applied
Classification report for data with several transformations
Classification report for data with inverted background
The pictures above show the classification report on the validation data during training, which indicates excellent results. However, when testing the simple neural network models on 24 random images never seen by the model, the results are less promising: the model classified 14 of the 24 samples correctly and 10 incorrectly, for an overall accuracy of 58.33%.
Classification report for data with several transformations
Classification report for data with inverted background
Surprisingly, the SNN performed better than VGG, and that could be for several reasons. Pillow was used to modify the data, which was then fed into the VGG model that already included a Keras image data generator; the generator transforms the data a second time and adds noise to it, and this second transformation may have confused VGG. As for the CNN, it seems to classify almost every image as Japanese; perhaps after rotation, the Arabic and English data become similar to the Japanese data, which confused the CNN.
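To illustrate the double-transformation problem, a Keras ImageDataGenerator along the lines below would rotate, shift, and zoom images that Pillow had already rotated, flipped, and recolored; the parameter values here are assumptions, not the project's exact configuration.

from keras.preprocessing.image import ImageDataGenerator

# on-the-fly augmentation applied during training; when the input images
# were already transformed with Pillow, these transforms stack on top of
# that first pass, which may be the source of the confusion described above
datagen = ImageDataGenerator(
    rotation_range=25,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')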
In addition to being inexpensive and easy to work with, being able to run the project on a Raspberry Pi (RPi) helps in combining it with other interesting RPi projects, and that was part of the motivation: the project could be integrated into many other applications, e.g., a Google-style image translator.
It seems that no one has done this project before: classifying different languages' fonts as objects (different language font recognition, DLFR) rather than extracting the text from the image. Overall, this project could be beneficial for others who want to integrate it into their applications and to try different methods or improve the current ones. The DLFR project focused on two deep learning model families: simple neural networks and convolutional neural networks. Moreover, the project's dataset was self-created, with approximately 2,100 images per language. There were three different ways to build the datasets. The first method used completely random pictures with different fonts, positions, and backgrounds. The second and third methods used lists of words taken randomly from articles and news, and these two datasets performed best. To improve the datasets further, they must represent every letter and character of each language's alphabet, which requires a lot of work to accomplish. In the future, we could use a region-based convolutional neural network for instance segmentation (R-CNN). R-CNN would mask each character of each language along with its classification, which should lead to better predictions.