Peter Robert Keil

The Artist

Peter Robert Keil (born 1942) is a German artist best known for his colorful and playful painting style. Although frequently associated with the “Neue Wilden” movement of the early 1980s and well steeped in the work of the great artists of the 19th and 20th centuries, Mr. Keil has consistently pursued his own vision. Over the course of a long career, his expressive style has evolved to reflect changes in his own life and the artistic zeitgeist, but it always retains the spontaneous and ludic quality that communicates his joy in the process of painting.

The impressive size of Mr. Keil’s oeuvre – over 20,000 paintings, as well as works in other media – attests to his passion.  And those who encounter his work respond to it enthusiastically: it’s appealing and accessible, and thus widely collected.


But these characteristics – the work’s popular appeal, reasonable price point, and the sheer number of paintings – make Mr. Keil’s style a tempting target for forgers. Since 2013, the artist has worked with the Keil Collection Heidelberg to remove forged Keils from the art market. Art historian Kristina Hoge collaborates with Mr. Keil to authenticate his works and compile a catalogue raisonné, the first volume of which has been published, with a second volume forthcoming.


Mr. Keil and Dr. Hoge also advise collectors on the authenticity of purported Keil works they have purchased. Currently, the authentication process requires the transport of the work in question to the Keil Collection Heidelberg on specified dates for visual inspection.  Our objective has been to explore the ability of artificial intelligence to assist in the authentication process and allow for remote evaluation.


The A-Eye™ for Peter Robert Keil

The objective in all of our studies is to train a convolutional neural network (CNN) to distinguish the works of an artist from those of imitators and forgers.  Typically, the comparative works we use for training are drawn from works exhibiting varying degrees of visual similarity to the artist under study.  This enables the trained CNN to make fine as well as coarse distinctions and to generalize beyond the training images.  Ideally, that generalization will make the CNN capable of distinguishing genuine work from close forgeries.  In practice, however, few established forgeries are available for training and testing, limiting our ability to fully explore the capabilities of The A-Eye in detecting them.


In this case, our collaborators, ARTTRD Fine Art Trading, Consulting & Collecting, were able to supply numerous examples of Keil forgeries identified as such by the artist himself.  This enhanced our training efforts and enabled us to assess success in distinguishing genuine works from those consciously, and often skillfully, made to pass as genuine.


Our “Salient Slices” technique, described in detail here, first divides a source image into overlapping tiles of a given size.  These tiles are then sifted in accordance with a discriminator that identifies the tiles likely to contribute meaningfully to classification.  Our discriminator computes image entropy, which corresponds to the degree of visual diversity exhibited by a tile, and retains only those tiles whose image entropy equals or exceeds that of the entire source image.
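
To make the sifting concrete, here is a minimal sketch of this kind of entropy-based tiling.  The 50% overlap and the use of Shannon entropy over an 8-bit grayscale histogram are our assumptions for illustration; the article specifies only that the tiles overlap and that a retained tile must match or exceed the entropy of the full image.

```python
import numpy as np
from PIL import Image

def shannon_entropy(gray):
    """Shannon entropy (in bits) of an 8-bit grayscale histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def salient_tiles(path, tile=350, stride=175):
    """Return the (x, y) corners of overlapping tiles whose entropy
    equals or exceeds that of the entire source image."""
    gray = np.asarray(Image.open(path).convert("L"))
    threshold = shannon_entropy(gray)
    h, w = gray.shape
    keep = []
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            if shannon_entropy(gray[y:y + tile, x:x + tile]) >= threshold:
                keep.append((x, y))
    return keep
```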


The A-Eye analyzes each qualifying tile of a test image and assigns it a probability between zero and one:  values closer to zero correspond to classification as Keil, while values approaching one correspond to a non-Keil classification.  For this project, we classify a work based on the average probability across all tiles.
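
In code, the work-level score is simply the mean of the per-tile outputs; a sketch (the decision boundary is left as a parameter because, as discussed below, it is tuned after training):

```python
import numpy as np

def classify_work(tile_probs, boundary):
    """Average the per-tile probabilities; scores below the
    boundary are classified as Keil, the rest as not Keil."""
    score = float(np.mean(tile_probs))
    return ("Keil" if score < boundary else "not Keil"), score
```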

We trained and tested our CNN model at tile sizes ranging from 150 x 150 to 600 x 600 pixels.  We obtained maximum classification accuracy using 350 x 350-pixel tiles.

We trained and tested using various CNN architectures, including several versions of EfficientNet, DenseNet, and ResNet.  We found here, as we have in other studies, that our rather simple five-layer model described in our first published article worked best.
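
For reference, here is a sketch of a compact five-layer network of the kind described, written in PyTorch; the specific layer widths are our assumptions, since the article specifies only the depth:

```python
import torch.nn as nn

# Illustrative five-layer binary classifier for 350 x 350 RGB tiles:
# three convolutional layers followed by two fully connected layers.
# The sigmoid output is a probability: near 0 = Keil, near 1 = not Keil.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)
```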


The Dataset

The dataset we employed to train The A-Eye for Peter Robert Keil consisted of high-resolution images of the artist’s two-dimensional works, selected to be representative of the varying styles he has used throughout his career; known forgeries of Keil’s work; and paintings by other artists, intended to span a range of visual similarity to Keil’s oeuvre – from very close to evocative but readily distinguishable.  Our objective was to train The A-Eye to make fine distinctions with good generalization properties and without overfitting to the training set.

The “other artists” whose work comprises the remainder of the dataset include older artists whom Keil admired and learned from, or whose work shares visual qualities with his own (Joan Miró, Pablo Picasso, Arshile Gorky, and German Expressionists such as Max Beckmann and Ernst Ludwig Kirchner); contemporaries (Karel Appel, Georg Baselitz, Corneille, A. R. Penck, Andy Warhol, and others); and close associates, including members of the “Junge Wilde” group (Markus Lüpertz, Rainer Fetting, Salomé, Luciano Castelli, Elvira Bach, and Barbara Quandt). A list of all 43 artists whose work is included in the training set is appended below.


Training, Testing, and Analysis


We trained using 300 images – 150 Keil works and 150 comparatives, including 30 forgeries.  Our test set contained 181 images, including 35 authentic Keils, 22 forgeries, and 124 comparative works by other artists.


We used a “decision boundary” of 0.5 during training; that is, our training process scores predictions less than 0.5 as corresponding to Keil classifications, while predictions equal to or greater than 0.5 are scored as not Keil.  Over many repetitions, the CNN’s performance improves as its errors are “backpropagated” through its internal layer structure to adjust the analysis.
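
A sketch of one training step under this regime, continuing the PyTorch model sketched above (the optimizer choice and learning rate are our assumptions):

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()  # binary cross-entropy on the sigmoid output
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(tiles, labels):
    """One gradient update; labels are 0.0 for Keil tiles, 1.0 otherwise."""
    optimizer.zero_grad()
    preds = model(tiles).squeeze(1)   # per-tile probabilities in (0, 1)
    loss = criterion(preds, labels)
    loss.backward()                   # backpropagate the errors
    optimizer.step()
    return loss.item()
```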


The decision boundary used for training, however, is not necessarily the one used for ultimate testing of the trained CNN.  Even though the neural network initially learns right from wrong at a benchmark of 0.5, the optimal decision boundary is a “hyperparameter” found during testing.  Usually it is closer to the outer limit corresponding to the artist (in this case, zero).  We attribute this to the fact that our training sets are “imbalanced”:  the artist images are more visually homogeneous than the comparative images.  This is by design, since visually diverse comparatives are essential in training the neural network to recognize a wide range of possible forgeries.  The imbalance usually pulls the working decision boundary closer to the extreme value associated with the artist.


The optimal decision boundary minimizes erroneous classifications.  In a perfect world, we would identify a number between zero and one that perfectly divides the predictions between artist and comparative works with 100% accuracy.  In the real world, particularly the world of art classification, there are almost always some errors, so we compromise.  We can bias the decision boundary to favor false positives or false negatives.  We believe it is far more important to avoid de-attributing an authentic Keil than misclassifying a forgery as genuine, so we bias toward false positives.
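
One straightforward way to realize this bias is to sweep candidate boundaries over the test predictions and minimize a weighted error count that penalizes false negatives (de-attributed Keils) more heavily than false positives; the weight below is illustrative, not a value from the article:

```python
import numpy as np

def choose_boundary(keil_preds, non_keil_preds, fn_cost=3.0):
    """Return the boundary minimizing weighted errors.  Predictions below
    the boundary are classified as Keil.  fn_cost > 1 penalizes a genuine
    Keil scored at or above the boundary (a false negative) more heavily
    than a non-Keil scored below it (a false positive)."""
    keil_preds = np.asarray(keil_preds)
    non_keil_preds = np.asarray(non_keil_preds)
    candidates = np.linspace(0.0, 1.0, 1001)
    costs = [fn_cost * (keil_preds >= b).sum() + (non_keil_preds < b).sum()
             for b in candidates]
    return float(candidates[int(np.argmin(costs))])
```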


Using our trained CNN to analyze the test set, we found that the authentic Keils have predictions ranging from just under 0.23 down to nearly zero, while the forgeries have predictions ranging from 0.025 to nearly 1.  If we model the predictions as “normal” or Gaussian probability distributions, we can estimate the mean and standard deviation of each distribution from the test predictions.
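
Estimating the two Gaussians is a one-line fit per class, e.g. with scipy (the variable names are ours; keil_preds and forgery_preds hold the per-work average predictions from the test set):

```python
from scipy.stats import norm

mu_k, sigma_k = norm.fit(keil_preds)     # authentic Keils
mu_f, sigma_f = norm.fit(forgery_preds)  # known forgeries
```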

The low mean and small standard deviation for predictions corresponding to authentic Keils tell us that the distribution is very tight around 0.11.  The forgery mean and standard deviation describe a much broader, more diffuse distribution.  That the Keil and forgery means are so far apart demonstrates that the CNN classifies them quite distinctly and decisively; plotted, the two distributions show only minor overlap.

We discovered an outlier among the Keil works we tested – a mixed-media work with a substantial proportion of collage elements relative to painted regions, which was strongly classified, incorrectly, as a forgery.  Since our CNN classified all other mixed-media works correctly, it appears that classification accuracy does not suffer so long as the painted regions of a work predominate visually.


The best overall classification accuracy we can achieve on the test set, using any decision boundary, is 98%.  Because the Keil and forgery prediction distributions are so well separated (with only minor overlap), we can select a decision boundary within a band of values that will retain this overall accuracy level but with different error mixes.  If we choose a decision boundary at 0.11, which as shown above is the mean prediction value among Keils in the test set, we find one false positive and three false negatives (ignoring the above-noted outlier). 

If, on the other hand, we set the decision boundary at 0.23, we have only one false negative but three false positives. 

In assigning a classification probability (i.e., a certainty level) to a candidate work, our earlier studies have taken a relatively coarse interpolation approach, assigning a final probability based on where the raw prediction falls between the decision boundary and the closest extreme.  With a decision boundary of 0.5 and an artist extreme of zero, for example, a raw prediction of 0.25 would correspond to a 75% probability that the candidate work is by the artist under study, while a raw prediction of 0.75 would correspond to a 75% probability that the author was someone else.  If the decision boundary is instead 0.25, a raw prediction of 0.125 (halfway between the decision boundary and zero) would correspond to a 75% probability that the candidate work is by the artist under study, while a raw prediction of 0.625 (halfway between the decision boundary and one) would correspond to the opposite probability.
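
This linear interpolation can be written directly; a short sketch (the function name is ours):

```python
def interpolated_probability(pred, boundary):
    """Map a raw prediction to (label, certainty) by interpolating
    linearly between the decision boundary (50%) and the nearest
    extreme (100%).  E.g., pred=0.25 with boundary=0.5 gives
    ('artist', 0.75); pred=0.75 gives ('not artist', 0.75)."""
    if pred < boundary:
        return "artist", 1.0 - 0.5 * pred / boundary
    return "not artist", 0.5 + 0.5 * (pred - boundary) / (1.0 - boundary)
```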


In this study, the abundance of known forgeries allows us to take a more sophisticated statistical approach to estimating probabilities.  In particular, we can analyze the estimated probability distributions for Keil works and forgeries using the cumulative distribution function, which gives the probability that a value drawn from a distribution falls at or below (or, equivalently, at or above) a given prediction value.  This allows us to better estimate the certainty level associated with a prediction.


Consider first a prediction of 0.11, which corresponds to the mean Keil prediction value.  If that also represents our decision boundary, then a linear interpolation would assign a probability of 50% – i.e., equally likely to be a genuine Keil or a forgery.  But this fails to consider the very different distributions of Keil and forgery predictions over the test set.  Using the cumulative distribution function for the Keil distribution, we find the probability that a genuine Keil would receive a prediction at least this high; and using the cumulative distribution function for the forgery distribution, we find the probability that a forgery would receive a prediction at least this low.  The ratio of these likelihoods tells us which is more probable – i.e., whether the painting is more likely a Keil or more likely a forgery – and by how much.  For a prediction of 0.11, we find that the painting is actually 91.2% likely to be a Keil.  On the other hand, a painting with a prediction of 0.23 is 81.1% likely to be a forgery.
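
A sketch of this likelihood computation, using the Gaussian parameters fitted earlier.  Note that the article does not report the fitted values numerically (only the Keil mean of 0.11 appears in the text), so the 91.2% and 81.1% figures cannot be reproduced exactly from what is given here:

```python
from scipy.stats import norm

def p_keil(pred, mu_k, sigma_k, mu_f, sigma_f):
    """Probability the work is a Keil: the chance a genuine Keil scores
    at or above `pred`, weighed against the chance a forgery scores at
    or below it, using the two fitted Gaussians."""
    k = 1.0 - norm.cdf(pred, mu_k, sigma_k)   # Keil mass at/above pred
    f = norm.cdf(pred, mu_f, sigma_f)         # forgery mass at/below pred
    return k / (k + f)
```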

These likelihoods are not affected by where we set the decision boundary.


It should be noted that we are estimating the probability distributions based on a relatively small number of samples (35 Keils and 22 forgeries in our test set).  If we had many more samples, we would have greater certainty about the true probability distributions and, hence, the likelihoods we compute based on them.  But the means of the two distributions are sufficiently far apart that the error in our likelihood estimates should not be consequential.


In conclusion, we adopt a decision boundary of 0.23, classifying paintings with predictions at or above this figure as forgeries and everything else as genuine.  We expect to maintain an accuracy level of 98% with very few genuine Keil paintings misclassified as forgeries; in fact, most Keils should have decisive predictions very close to the zero limit.

Note: The "comparative" artists whose work is included in the training set (and the number of works by each artist) are Karel Appel (8); Elvira Bach (5); Georg Baselitz (3); Max Beckmann (3); Luciano Castelli (8); Corneille (3); Walter Dahn (1); Willem de Kooning (5); André Derain (3); Rainer Fetting (3); Sam Francis (3); Arshile Gorky (4); Alexej von Jawlensky (2); Asger Jorn (4); Wassily Kandinsky (3); Martin Kippenberger (1); Ernst Ludwig Kirchner (3); Bernd Koberling (3); Lee Krasner (1); Markus Lüpertz (4); August Macke (1); Franz Marc (1); Henri Matisse (3); Helmut Middendorf (3); Jean Mirre (1); Joan Miró (3); Gabriele Münter (2); Emil Nolde (1); Albert Oehlen (3); A. R. Penck (4); Max Pechstein (3); Pablo Picasso (4); Jackson Pollock (2); Barbara Quandt (3); Salomé (Wolfgang Ludwig Cihlarz) (4); Karl Schmidt-Rottluff (3); Georges Rouault (1); Emil Schumacher (2); Walter Stöhrer (1); Stefan Szczesny (1); Victor Thall (1); Maurice de Vlaminck (1); and Andy Warhol (2).