Identifying artist from artwork

Vikram Mohanty
Department of Electrical and Computer Engineering, Virginia Tech

Abstract: For a non-expert in paintings and related art forms, it is very difficult to identify the artist behind an artwork merely by looking at the style of the painting, the brush strokes, or a sculpture. Usually, the person has to resort to contextual information in the painting or meta-tags about the painting to guess who the artist is. This project is an attempt to build a system that learns the style of an artist from his/her paintings, without using any form of contextual information, such as a lady in black with a mysterious smile, or image meta-tags, such as "Mona Lisa"/"Louvre"/"16th Century". Had the project turned out as successful as expected, its final form would have been a tool that takes an artist's name as input and retrieves all images most likely to have been made by that artist. An extension of this project would have been to transfer the artist's style to any painting or photograph. I used the "Neural Style Transfer" method to extract style features from paintings, and trained weak classifiers, such as SVMs and Mixtures of Gaussians, on these features to check for any pattern leading up to success. The results for the artist identification task, however, were not as good as one would hope. For identifying painting styles such as "Impressionism" or "Abstract", the classifiers showed some improvement.

Figure 1. Project Outline

Introduction

In a time when CNNs have seeped into every form of Computer Vision application, it was difficult to find a simple problem with real-world relevance that had not yet been solved by the advent of Deep Learning. Back in 2015, Gatys et al. published their paper "A neural algorithm of artistic style" [1], which led to many cool applications like Prisma, where anyone can pick a content image, apply the style from a template image such as Van Gogh's "Starry Night", and have their own image in the "Starry Night" style. These applications were limited to using an arbitrary image as the style template, and could not be used to transfer a particular artist's overall style. This was the primary motivation behind the project:

  • Is it possible to find the link between the styles of two paintings by the same artist? In other words, does the artist have a trademark style or signature?
  • Can a computer recognize this link/style?

This led me to devise a simple platform to extract styles using the Neural Style Transfer approach, and then to try to learn the artist from these style images. In terms of applications, this project has a lot of potential. If successfully implemented with a large database, it could be used by museums to identify the artist behind an unidentified painting, the school of art to which it belongs, or the era in which it was most likely made: essentially, some value-generating information about an unidentified painting. Once an artist's style is learnt, it can be applied to any painting, and the result would be "Da Vinci painting your portrait" in the true sense, and not "your portrait in the Mona Lisa style."

The project was inspired by Kaggle's "Painter by Numbers", a competition to verify whether two paintings are by the same artist, and used the same database. The database comes with original paintings extracted from wikiart.org, and each image has style and artist information attached to it.

To my knowledge, existing work in this field such as [2] exploits low-level local features such as HOG and SIFT, along with high-level CNN features, and shows promising results that are still far from near-perfect accuracy. In that light, this project can be considered an investigation into using the style templates extracted from the Neural Style Transfer pipeline, which seem to work well (qualitatively) for transferring the style of one image onto the content of another, as high-level features for classifying artists. Since the content aspect of images does not come into play in this pipeline, the approach relies solely on the style aspect of an image, which may constitute texture and color information, and not on the context of the image, for the artist or art-style classification task.

Approach

1. Data Pre-processing

The dataset obtained from Kaggle's Painter by Numbers comprised ~100k images, each with artist and style information attached. The first task was to use the CNN architecture described in [1] to extract the style features. I followed the TensorFlow implementations described in [3] and [4], and the Torch implementation in [5]. As described in the original paper, an image is generated from white noise by the CNN architecture, and with loss functions computed against a style template and a content image, the generated image learns the style and content separately from these two images, thus achieving the required output. The style aspect of the "Style Template" image is described by the Gram matrices, defined as the inner products between the vectorized feature maps of a layer. These Gram matrices correspond to different layers and come in different sizes. For example, a 64 x 64 matrix corresponds to the first activation layer and represents fine-grained activations; as we move towards deeper layers, the matrices grow (up to 512 x 512) and capture progressively coarser, more global style structure.
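As a concrete illustration, the Gram matrix of a layer can be computed from its vectorized feature maps as in the minimal sketch below (NumPy only; the shapes and names are illustrative, not the project's code):

    import numpy as np

    def gram_matrix(feature_maps):
        """feature_maps: (channels, height, width) activations from one CNN layer."""
        c, h, w = feature_maps.shape
        F = feature_maps.reshape(c, h * w)  # vectorize each feature map
        return F @ F.T                      # (c, c) matrix of inner products

    # A 64-channel layer (e.g., the first activation layer) yields a 64 x 64 Gram matrix.
    activations = np.random.rand(64, 224, 224).astype(np.float32)
    print(gram_matrix(activations).shape)   # (64, 64)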


Figure 2. Method proposed in Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "Image style transfer using convolutional neural networks."

I modified the code in order to extract only the style features, which was quite a challenge. Since this was not as straightforward as I had hoped, I first tried a different approach: extracting only the Gram matrices from all levels (64x64, 128x128, 256x256, 512x512).

It took ~11 hours to generate these features on an NVIDIA Tesla K80 GPU.

Issue: Gram matrix features were extracted for only 6458 images, belonging to 1278 classes. This made for a poorly posed classification task, due to the non-uniform distribution of images per class.

These matrices were flattened into vectors and standardized to zero mean and unit variance. t-SNE was then used to reduce the dimensions to 2.
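A minimal sketch of this preprocessing with scikit-learn (the array shapes are placeholders for the extracted Gram matrices):

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.manifold import TSNE

    grams = np.random.rand(6458, 64, 64)          # stand-in for the 64x64 Gram matrices
    X = grams.reshape(len(grams), -1)             # flatten each matrix to a 4096-dim vector
    X = StandardScaler().fit_transform(X)         # zero mean, unit variance per dimension
    X_2d = TSNE(n_components=2).fit_transform(X)  # reduce to 2 dimensions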

Once again, I revisited the original problem of using the style features directly.

The original code was modified to nullify the weight of the content image and use only the style weights. As a result, the image generated from white noise "learns" only the style aspect. The number of iterations was set to 500 for the network to backpropagate and learn the style features. The final generated image, having no content aspect, constitutes the style features of a painting.
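Conceptually, this amounts to zeroing the content term in the two-term loss of [1]. The sketch below shows the idea with an illustrative per-layer style loss; it is not the project's exact code:

    import numpy as np

    def gram(F):                         # F: (channels, pixels) vectorized feature maps
        return F @ F.T / F.shape[1]

    def style_loss(gen_feats, style_feats):
        # mean squared error between Gram matrices, summed over layers
        return sum(np.mean((gram(g) - gram(s)) ** 2) for g, s in zip(gen_feats, style_feats))

    CONTENT_WEIGHT = 0.0                 # nullify the content image's influence
    STYLE_WEIGHT = 1.0
    NUM_ITERATIONS = 500                 # iteration budget used in this project

    def total_loss(gen_feats, style_feats, content_term):
        return STYLE_WEIGHT * style_loss(gen_feats, style_feats) + CONTENT_WEIGHT * content_term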

It took over 2 minutes to generate one image, and almost ~11 days to extract these style images from 7538 paintings, belonging to 1582 artist classes and 136 style classes. This was a poorly posed classification problem in itself, considering the non-uniform distribution of the data.

Some examples of the style classes are "Romanticism", "Baroque", "Realism", etc. Since these are traditionally based on the context of the image, I wanted to explore whether the extracted image style is correlated with these contextual style classes in any way.


Figure 3. Overall Project Pipeline

2. Modifying Labels / Artists into Schools of Art / Style Schools

Since the data was non-uniformly distributed in terms of the number of images per class, the labels needed to be consolidated. I binned certain labels together and changed the classification problem into a simpler one: classifying the label bins. Had the project shown promising results, these artist and style label bins could have been termed "Schools of Art" and "Style Schools" respectively. Essentially, the problem was reduced to determining which school of art, or which school of style, a painting belongs to.

The classification was tried for a number of bin sizes. For the artist classification task, the number of schools varied from 4 to 78, while for the style classification task, it varied from 2 to 7. These numbers were chosen at random, keeping computational time in mind and in the hope of seeing some good results, which, however, was not the case.
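A minimal sketch of such random binning (the helper name and seed are illustrative, not the project's code):

    import random

    def bin_labels(labels, n_schools, seed=0):
        unique = sorted(set(labels))
        random.Random(seed).shuffle(unique)
        school_of = {lab: i % n_schools for i, lab in enumerate(unique)}
        return [school_of[lab] for lab in labels]

    artists = ["Monet", "Manet", "Goya", "Vermeer", "Klimt", "Dali"]
    print(bin_labels(artists, n_schools=2))  # each artist assigned to one of 2 "schools"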

3. Feature Engineering

Local Binary Patterns (description adapted from a blog post)

To quantify the color and texture information in these style images, I wrote code to extract Local Binary Patterns [6], which compute a local representation of texture.

The first step in constructing the LBP texture descriptor is to convert the image to grayscale. For each pixel in the grayscale image, we select a neighborhood of size r surrounding the center pixel. An LBP value is then calculated for this center pixel and stored in an output 2D array with the same width and height as the input image. This process of thresholding, accumulating binary strings, and storing the output decimal value in the LBP array is repeated for each pixel in the input image. The last step is to compute a histogram over the output LBP array. Since a 3 x 3 neighborhood has 2^8 = 256 possible patterns, the LBP 2D array has a minimum value of 0 and a maximum value of 255, allowing us to construct a 256-bin histogram of LBP codes as the final feature vector. (The 26-dimensional histogram used later in this pipeline suggests the "uniform" variant of [6] with P = 24 sampling points, which yields P + 2 = 26 bins rather than 256.)

The LBP implementation from the scikit-image package was used.
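A sketch of the extraction using scikit-image's local_binary_pattern; the parameters P = 24, R = 8 with the "uniform" method are an assumption inferred from the 26-bin histogram used below:

    import numpy as np
    from skimage.color import rgb2gray
    from skimage.feature import local_binary_pattern

    def lbp_histogram(image_rgb, P=24, R=8):
        gray = rgb2gray(image_rgb)                                 # step 1: grayscale
        lbp = local_binary_pattern(gray, P, R, method="uniform")   # values in [0, P + 1]
        hist, _ = np.histogram(lbp.ravel(), bins=np.arange(P + 3), density=True)
        return hist                                                # (P + 2,) = 26-dim descriptor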

Figure 4. Local Binary Pattern value computed for one neighborhood and stored in a 2D array


Color Histograms

R, G, and B color histograms were generated for these images, spanning all 256 color intensity values for each channel. This results in a 256 × 3 = 768-dimensional vector.

Concatenating this with the 26-dimensional LBP histogram gives a 794-dimensional vector that captures the color and texture information of the training images.
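A minimal NumPy sketch of these color histograms and the concatenation (assuming 8-bit RGB images; lbp_histogram is the texture descriptor sketched earlier):

    import numpy as np

    def color_histogram(image_rgb):
        # one 256-bin histogram per channel -> 768-dimensional vector
        return np.concatenate([
            np.histogram(image_rgb[..., c].ravel(), bins=256, range=(0, 256), density=True)[0]
            for c in range(3)
        ])

    # combined descriptor: 768 (color) + 26 (LBP) = 794 dimensions
    # feature = np.concatenate([color_histogram(img), lbp_histogram(img)])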

Since a 794-dimensional feature vector would make things computationally more expensive, PCA was used for dimensionality reduction. I finally used a 300-dimensional feature vector; the resulting loss of information might be one reason behind the poor results.
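For instance, with scikit-learn (X_794 is a placeholder for the matrix of 794-dimensional features):

    import numpy as np
    from sklearn.decomposition import PCA

    X_794 = np.random.rand(7538, 794)                   # placeholder feature matrix
    X_300 = PCA(n_components=300).fit_transform(X_794)  # reduce to 300 dimensions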

Figure 5. Style Image converted to a 794-dimensional histogram/feature

4. Classification

Using packages from scikit-learn, I tried out some weak classifiers: Naive Bayes (more of a baseline), Support Vector Machines, an AdaBoost classifier, and a clustering algorithm, the Bayesian Mixture of Gaussians. The initial idea was to do only clustering, in the hope that each cluster would correspond to a unique school. The reason for using weak classifiers rather than a CNN-based classifier was that the data itself was generated using complicated, computationally expensive CNN architectures, and I did not want to repeat that expense for the classifier in terms of hyper-parameter tuning. On top of that, the data generated was not enough to employ Deep Learning approaches. If the weak classifiers showed promising results, it would suggest that a strong classifier could improve them further.
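A sketch of this comparison with scikit-learn (the feature matrices and school labels below are placeholders):

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.mixture import BayesianGaussianMixture

    X_train, y_train = np.random.rand(7000, 300), np.random.randint(0, 10, 7000)
    X_test, y_test = np.random.rand(538, 300), np.random.randint(0, 10, 538)

    for clf in (GaussianNB(), SVC(), AdaBoostClassifier()):
        print(type(clf).__name__, clf.fit(X_train, y_train).score(X_test, y_test))

    # clustering alternative: one mixture component per school
    clusters = BayesianGaussianMixture(n_components=10).fit(X_train).predict(X_test)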

Experiments and Results

1. Gram Matrix experiment

For the first approach, experiments were conducted using the Gram matrices, followed by t-SNE dimensionality reduction, with an SVM classifier.


Figure 6. Classification accuracy using Gram matrices with an SVM classifier for 1292 test images, classifying among 1278 classes

Since there were 1278 classes, it would have been interesting to report a top-5 accuracy as well; however, the results were demoralizing enough in the first place that trying out more tests would have been a stretch.

The 512x512 Gram matrix result is not shown here, as the computation took too long (~6500 × 512 × 512 data points reduced to ~6500 × 2 for training, followed by testing on ~1300 images). If the trend across the smaller matrices is any indication, the results might have improved further with the 512x512 Gram matrix and a stronger classifier.

2. Using Style Features

7538 style images were generated, and each was encoded into a 794-dimensional feature vector. These were split into a training set of 7000 images and a test set of 538 images. (They were initially split into 6000 training and 1538 test images, but the results were close to negligible, so I tried a larger training set.) Since the numbers of style and artist classes were both large, label bins were created at random.

The classifiers used were the Mixture of Gaussians, an AdaBoost ensemble classifier, SVM, and a Naive Bayes classifier.

The results for the Mixture of Gaussians are shown separately, as the clusters initialized (one per school) do not each correspond to a unique school (bin); we therefore end up with fewer effective schools than the initial conditions specify. For example, if the system is initialized with 10 schools and the GMM is set to form 10 clusters from the data points, cluster 1 may correspond to school 1, cluster 2 to school 2, and cluster 3 again to school 1. This leads to fewer schools than we began with.
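A sketch (with a hypothetical helper) of mapping clusters to schools by majority vote, which is where this collapse occurs:

    import numpy as np

    def map_clusters_to_schools(cluster_ids, true_schools):
        mapping = {}
        for c in np.unique(cluster_ids):
            members = true_schools[cluster_ids == c]
            mapping[c] = np.bincount(members).argmax()  # majority school in cluster c
        return mapping

    cluster_ids = np.array([0, 0, 1, 1, 2, 2])
    true_schools = np.array([0, 0, 1, 1, 0, 0])
    # clusters 0 and 2 both map to school 0, so only 2 effective schools remain
    print(map_clusters_to_schools(cluster_ids, true_schools))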


Figure 7.a Artist classification results using SVM, Naive Bayes, and AdaBoost, for different numbers of schools (X-axis). Accuracy (Y-axis) in %

As can be seen, the results are not very promising. SVM performs best among the three classifiers across the different tests. However, the results drop as we increase the number of schools (which is inversely proportional to the number of artists in each school).

Figure 7.b Artist classification results using GMM for different numbers of schools (X-axis). Accuracy (Y-axis) in %

The results are not quite promising in this case either, and the clustering is not definitive: multiple clusters correspond to the same label.

Figure 8.a Style classification results using SVM, Naive Bayes, and AdaBoost, for different numbers of schools (X-axis). Accuracy (Y-axis) in %

The results are better than in the artist classification task, but considering the small number of schools (many style labels clubbed together), this is not a conclusive take-away. SVM performs better than the others. However, as observed in the artist classification case, the accuracy drops as the number of schools increases (fewer style labels clubbed together).

Figure 8.b Style classification results using GMM for different numbers of schools (X-axis). Accuracy (Y-axis) in %

The results, even though better than for artist classification, are still not commensurate with the small number of schools.

Figure 9. Original images and the corresponding style images generated

Conclusion and Future Work

The results were not as good as one would hope for, so it would be bold to conclude anything from them.

Trying different, more robust methods to quantify the texture and color information in the style images could be worthwhile.

Using a strong classifier such as a CNN on top of these CNN-generated images might yield interesting results. Testing the classifiers on the entire ~100k dataset would also be fruitful, considering the large number of class labels.

Some of the style images generated seemed erroneous, i.e., the number of iterations used to generate them was not enough, which may have corrupted the dataset. In retrospect, this was a difficult topic to choose, but I think the problems I faced during the course of this project were a nice learning experience.

It would be interesting to extract features from these style images in the same way as discussed in [2].

References

1. Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "A neural algorithm of artistic style." arXiv preprint arXiv:1508.06576 (2015).

2. Saleh, Babak, and Ahmed Elgammal. "Large-scale classification of fine-art paintings: Learning the right metric on the right feature." arXiv preprint arXiv:1505.00855 (2015).

3. https://github.com/anishathalye/neural-style

4. https://github.com/robertomest/neural-style-keras/

5. https://github.com/jcjohnson/fast-neural-style

6. Ojala, Timo, Matti Pietikainen, and Topi Maenpaa. "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns." IEEE Transactions on pattern analysis and machine intelligence 24.7 (2002): 971-987.