I wanted to build and maintain my own server, and not have to rely on Amazon or Google or Oracle clouds. This was my goal, not out of any philosophical objections to those platforms, but because I am a geek, and I wanted control over the machine from boot to shutdown and everything in-between.
My server is for hosting an AI player of the game of go. Yours is likely for providing content or services of some other type, but the principles are the same.
Emulate the playing styles of other players whose games have been observed.
"Guess the Pro", even when presented with a previously unseen game by that pro.
"Guess the Next Move" when observing a previously unseen game where the identity of the players is known.
A fingerprint is to behavior (individual move), as a snapshot is to identifier (a player's ID matrix).
It may be said, of moves selected by experts to play in a game of go, that they have a quality called "balance" that is lacking in the play of amateurs, and even more lacking in that of beginners. But... "It may be said" that the moon is made of unripe cheese. "It may be said" that I am Marie, Queen of Romania. Saying so does not make those statements true.
Saying "go players should seek balance in their moves" could be a meaningless platitude, or it could be the truth, or it could be somewhere in-between, e.g., true in general with some exceptions and so on. A proof would require a precise definition of balance. In the absence of such rigor, we ask for the reader's patience as we present at least some evidence of the truth of the following assertion (which we ourselves regard as not only obvious, but also as axiomatic):
Most behaviors in which pros actually indulge have a high degree of "balance".²
— Potter Stewart (1915–1985), associate justice of the U. S. Supreme Court from 1958 to 1981; frequently remembered for his famous non-definition of obscenity
— Whatever that may be, and however we may measure it.
In go, as in life, thought and behavior range "from the sublime to the ridiculous." Thought and behavior are two forms of action. The former is focused inwardly, while the latter is focused outwardly. Sometimes people think one thing, but do a different thing, causing a conflict between the inward thought and the outward behavior. Generally speaking, thought and deed should be in harmony, or balance, with one another. "Balanced" is one of several adjectives (properties, really) that we use without defining rigorously. Several adjectival phrases also come to mind now: "Inwardly focused." "Outwardly focused." "In harmony." "In conflict."
Cowardly ones, please line up to the left, on the "Meek" side of the mean line. Bullies (hey, Buddy, I'm talking to you!), line up to the right, on the "Bold" side of the mean line. The more of a coward you are, the further left you go; conversely, the more of a bully you are, the further right you go. And what does the "mean line" mean? It is the sublime, stable behavior in the middle: the behavior in whose correctness we have the greatest confidence. So the y-axis above represents the percent win-rate, or the "strength", of the behavior, where strong means a high degree of balance. That balance is learned by the network from training data; the win-rate is the output of the network's evaluation of a particular behavior. Every behavior has such a win-rate. The question, of course, is how to measure it.
It cannot be stressed too strongly that we are not speaking of "left" and "right" as in political ideologies, but of the left and right sides of the bell curve shown in the graphic above. There are cowards on both sides of the political aisle, just as there are bullies on both sides.
The measuring occurs during the feature-extraction phase of training the neural network.
Students of statistics will remember that the width of a bell curve is determined by the standard deviation: 68% of the data points are within one standard deviation of the mean, 95% are within two standard deviations, and 99.7% are within three standard deviations of the mean. So, in the figure above, the Pathologically Defensive moves comprise 0.15% of the data, and the Pathologically Aggressive moves represent another 0.15%. These extremes represent Ridiculous behaviors that comprise only 0.3% of the data: Pros and strong amateurs rarely play such moves, but it happens.
The remaining data points lie between these two extremes, comprising the 99.7% of behaviors that lie within three standard deviations from the Sublime mean. Assuming a Gaussian distribution, as depicted above, we can cover the various intervals with the categories shown:
Pros: [Light blue above] Cautious, but not overly so, hopeful but not overly so. Pro moves are at times somewhat reserved, but rarely timid, and never (overly) generous. At other times they are somewhat ambitious, but rarely eager and never (overly) greedy. (68% of the data.)
Strong Amateurs: [Light blue and purple above] Can range from moderately reserved, often timid, and at times even generous, to moderately ambitious, often eager, and at times even greedy. (Another 27% of the data. Along with the Pro data, that sums to 95% of the data.)
Beginners: [Light blue, purple, and maroon above] Beginner behavior ranges from defensive and generous all the way to greedy and aggressive. (Another 4.7% of things that Pros and Strong Amateurs almost never do, bringing us to an accumulation of 99.7% of the data.)
Trolls, wing-nuts, and mistakes: Troll moves include even the pathologically defensive and pathologically aggressive behaviors, that is, the entire bell curve, including the ridiculous 0.3% described above. (The remaining 0.3% of the data.)
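The band percentages above can be checked with a few lines of Python. This is only a sketch under the Gaussian assumption already stated: the function names (`mass_within`, `band_shares`, `band_for`) are ours and purely illustrative, and the one-, two-, and three-sigma boundaries are simply the ones used in the categories above.

```python
import math

def mass_within(k: float) -> float:
    """Fraction of a Gaussian lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

# (label, inner edge, outer edge) in standard deviations from the mean.
# Labels follow the categories in the text; the names are illustrative only.
BANDS = [
    ("Pros", 0.0, 1.0),
    ("Strong Amateurs", 1.0, 2.0),
    ("Beginners", 2.0, 3.0),
]
OUTLIERS = "Trolls, wing-nuts, and mistakes"

def band_shares() -> dict[str, float]:
    """Share of all behaviors that falls in each band, both tails combined."""
    shares = {name: mass_within(hi) - mass_within(lo) for name, lo, hi in BANDS}
    shares[OUTLIERS] = 1.0 - mass_within(3.0)
    return shares

def band_for(z: float) -> str:
    """Narrowest band containing a behavior z standard deviations from the mean.

    Negative z is the Meek side, positive z the Bold side.
    """
    distance = abs(z)
    for name, _lo, hi in BANDS:
        if distance < hi:
            return name
    return OUTLIERS

if __name__ == "__main__":
    for name, share in band_shares().items():
        print(f"{name}: {share:.2%}")
```

The exact shares come out at roughly 68.3%, 27.2%, 4.3%, and 0.3%, so the rounded 68/95/99.7 rule used above slightly overstates the Beginner band (4.7% by subtraction of the rounded figures).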
The Sheep is opposite the Ox, the Monkey opposite the Tiger, and so on. Say hello to some of my little friends!
It's not our intent to discriminate against nor to denigrate any person, living or dead, regardless of the year of their birth!
TIME
An important way in which said balance reveals itself is in the timing of a move. The best players always seem to play moves neither "too early" nor "too late", but "right on time", "just in time", or "not a moment too soon". For example, endgame moves that could conceivably be played at any time during the game are left until a later time, after attending to other, more urgently pressing matters. So, it matters not only where a move is played, but when it is played, amidst a global context.
To put it another way, a move that is either too early or too late is either too aggressive or too submissive for that time, for that moment, of the game. Thus the timeliness, or punctuality, of the better players' moves is (again) balanced.
The Endgame
Ultimately, might there also be a way in which to capture, to describe, to emulate, the behavioral style of a particular individual? To view go through the unique lens through which that individual views go, and to play go according to that style?
We think so.
As we stated at the beginning, our goals include not only "a bot that plays go", but also bots which are able to:
Emulate the playing styles of other players whose games have been observed.
"Guess the Pro", even when presented with a previously unseen game by that pro.
"Guess the Next Move" when observing a previously unseen game where the identity of the players is known.
Toward that end, we present the following machine-learning architecture for classifying go-behavior.
Above: Top-level overview of Gozillago program architecture.
Q. How many styles of go-playing are there?
The trivially true answer, not to be glib or dismissive of the question, really is:
A. There are at least as many styles of playing go as there are go-players.
Every go-player has his or her own unique style. The odds against any game being identical to another, that is, comprising exactly the same series of individual moves, become astronomically huge after about forty moves. Furthermore, because an individual's style is always changing, there are actually more styles of play than there are players! It is not too far off the mark to suggest that every individual move, by each and every player, in every game, is a behavior which, likewise, characterizes a style of play.
We assert "at least" in our answer to the question above, because we may also include imaginary players, such as "bots", whose characteristics may be set to arbitrary values, by means of "sliders". For example, think of potentiometers and tone-controllers in audio equipment, which are used to control electrical resistance, measured in Ohms (Ω).
We may set the values arbitrarily, measured instead in "horsiness" or "rattiness" or "dragoniness", etc. to create an infinite number of player profiles. These may be considered "random variables" for purposes of statistical analysis.
Of course, these behaviors do not occur in a vacuum, but rather within a context that has been created by previous behaviors by Black and White. How a player reacts in a given situation is also indicative of that player's style, or type of behavior, within that specific context. A part of that context can include "What do I, the bot, know about this specific opponent's past behavior?"
And time marches on.
We further -- somewhat arbitrarily -- define eight stages of the game:
Early fuseki, Deep fuseki, Late fuseki, Early middle-game, Deep middle-game, Late middle-game, Early endgame, and Late endgame.
Are seven enough? Are nine too many? At this point, the reader may be wondering, "where is all this leading?"
Time series: 8 phases of the game, 12 animals. Ninety-six pieces of pie.
Now, as promised, we present the "Player Identification Matrix", which describes the "style" of a player at a specific point in that player's history, which is to say that it comprises, or encapsulates, "what we know about this player so far". Not only (like a fingerprint) is it unique to that player, but it is also unique to the game itself, and as such can also be useful as a hash-value to identify a specific game (indeed, to index a specific play within a specific game by a specific player and a specific opponent!). The odds that two of these matrices will share identical values (after a few moves into a game) are infinitesimal for, and astronomical against. [As Dave Dyer has shown, a go game quickly -- that is, after about forty moves -- becomes unique, not being required to share its hash-value with any other game of go.]
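As a rough illustration of the hash-value idea, here is a minimal Python sketch. It is not the Gozillago implementation: the function name `matrix_key`, the choice of SHA-256, and the rounding to four decimal places are all our own assumptions, and the matrix is taken to be 8 rows (game phases) of 12 values (animals) each.

```python
import hashlib

def matrix_key(matrix, decimals=4):
    """Hex digest identifying a matrix of slice sizes.

    Values are rounded to `decimals` places first (an arbitrary choice),
    so two matrices that agree to that precision get the same key.
    """
    # Fixed-format text keeps the key stable across runs and platforms.
    text = ";".join(
        ",".join(f"{value:.{decimals}f}" for value in row) for row in matrix
    )
    return hashlib.sha256(text.encode("ascii")).hexdigest()

if __name__ == "__main__":
    # Hypothetical snapshot: every phase split evenly among the 12 animals.
    uniform = [[1 / 12] * 12 for _ in range(8)]
    print(matrix_key(uniform))
```

Rounding before hashing is a design choice: without it, two snapshots that differ only by floating-point noise would receive different keys.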
Eight pies × twelve pieces per pie = 96 pieces of pie
Each pie is a 12-dimensional vector, and there are eight of them. The size of the slices may vary wildly. Since their sum within the pie must always be 1 (that is, 100% of the pie), their sizes may be expressed as percentages, to the degree of precision desired. Each pie represents one of the eight stages of the game defined above, i.e., early, deep, and late fuseki; early, deep, and late middle-game; and early and late endgame.
So, for example, a player could play like a tiger in the fuseki stages, tending toward horse-like play in the early middle-game and to some other zodiac animals for the endgame stages. There are as many styles of play as there are players. The values 0 through 7 represent each stage, but notice that some games may end before entering the endgame phases at all, while every game of go has an early fuseki (zero) stage.
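To make the shape of the matrix concrete, here is a small Python sketch. Several things in it are assumptions rather than anything stated above: that each observed move has already been labeled with exactly one animal by some upstream classifier, that the twelve animals are listed in the conventional zodiac order, and the helper names `build_matrix` and `dominant_animal`. A stage the game never reached is kept as `None` rather than given an invented pie.

```python
STAGES = [
    "Early fuseki", "Deep fuseki", "Late fuseki",
    "Early middle-game", "Deep middle-game", "Late middle-game",
    "Early endgame", "Late endgame",
]
# Conventional zodiac order (an assumption; the text names only some animals).
ANIMALS = [
    "Rat", "Ox", "Tiger", "Rabbit", "Dragon", "Snake",
    "Horse", "Sheep", "Monkey", "Rooster", "Dog", "Pig",
]

def build_matrix(observations):
    """Build eight pies from (stage_index, animal_index) pairs.

    Each pie is a list of 12 fractions summing to 1, or None if no move
    was observed in that stage.
    """
    counts = [[0] * len(ANIMALS) for _ in STAGES]
    for stage, animal in observations:
        counts[stage][animal] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row] if total else None)
    return matrix

def dominant_animal(pie):
    """Name of the largest slice in one pie (first one wins on a tie)."""
    return ANIMALS[max(range(len(pie)), key=pie.__getitem__)]

if __name__ == "__main__":
    # Hypothetical labeled moves: mostly Tiger in stage 0, Horse in stage 3.
    moves = [(0, 2), (0, 2), (0, 6), (3, 6)]
    matrix = build_matrix(moves)
    for name, pie in zip(STAGES, matrix):
        print(name, "->", dominant_animal(pie) if pie else "not reached")
```

With these sample moves the early-fuseki pie is two-thirds Tiger and one-third Horse, the early middle-game pie is all Horse, and the remaining six stages are reported as not reached.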
This section is still under construction.
More amazing developments to follow.
Please stay tuned for further results.