A generator project I made to revamp my programming mindset in Java, as well as learn some more JavaScript/HTML/CSS.
Working copy found here:
https://googledrive.com/host/0B9f3hv6KkjXcS19ZMmxxdGE5bUE/THM/tht-gen.html
Generates the Japanese characters "ta"(た), "i"(い), "he"(へ), and "n"(ん) randomly, and stops when a consecutive sequence generates either of the following:
I've added Twitter integration to the working copy, so that you can tweet your result if anything interesting occurs, such as a very short or a very long output result.
This work was inspired by similar projects, which are linked to from the working copy above.
Any queries please e-mail me at
, or tweet me @kitamiya .
とりあえず公開。「た」「い」「へ」「ん」をランダムに連続表示して、「たいへん」か「へんたい」が出たら終了する、お遊びプログラム。(一部建設中) https://t.co/7mpz129guo pic.twitter.com/tn6nVVAjGl
— 北宮 (@kitamiya) September 8, 2014
.
Below are some statistical analyses.
With four letters in the letter set, each occurring (pseudo-)randomly per output of the next letter, any letter has a 1 in 4 chance of appearing. So, after four letters have been output, the chance of a specific target string (say たいへん) appearing is
(1/4)^4 = 1/256 .
So naively thinking, we would expect the target string to appear approximately once in 256 output letters.
.
A slightly more sophisticated approach is to apply one of the approaches discussed on this page about the expected number of coin tosses to get five consecutive heads; I choose the method using "Let e be the expected number of tosses.", since the resulting linear equation is simple to solve.
Let e be the expected number of output letters needed to get the target string たいへん. Also, let f be the expected number of output letters needed to get the target string たいへん given we obtained the letter た .
Then
and after given a certain た,
The expected length e is equal to the sum of each (Probability)*(Expected Length) product set from the first table, which gives
Furthermore, a similar concept for f using the second table gives the following:
Taking the difference of equations (2) and (1'), we can obtain
and substituting this into (1) gives
(and f=253 also comes out, if it matters)
The resulting answer of e=256 is equivalent to that obtained by the naive thinking earlier, in this case.
Since I wasn't 100% certain that my methodology was correct, let alone efficient, I've posed a question at Maths Stack Exchange:
http://math.stackexchange.com/questions/931839/expected-number-of-output-letters-to-get-desired-word
There are other approaches mentioned there; in particular I like the Markov Chain - Transition Matrix-like approach.
Empirically, from statistics gathering over a large number of outputs via programming, we obtain a mean value around 256:
.
With our actual application, targeting EITHER たいへん OR へんたい, we can do a similar analysis, though the work involves three linear simultaneous equations.
Before we start, I show you the empirical result for the mean letter output length for this simulation.
Naively, we would expect half of the above example, since we have double the number of possible target combinations.
So we expect a value of once every 128 letters...
...apparently, from empirical data, the expected value is an unintuitive once every 136 letters.
Let us check this by calculation.
Let e be the expected number of output letters needed to get one of target strings たいへん OR へんたい.
Also, let f be the expected number of output letters needed to get one of the target strings given we obtained the letter た , and let g be the expected number of output letters needed to get one of the target strings given we obtained the letter へ .
Then
and after given a certain た,
and similarly after given a certain へ. [table omitted]
Similar to the first example, the expected lengths e, f, and g are equal to the sum of each (Probability)*(Expected Length) product set from the first, second, and omitted tables. The second and omitted tables give the following:
,
where the second table gives equation (1), and the omitted table gives equation (2).
Combining the two, we (unsurprisingly) get the same expression (3) for f and g in terms of e.
Now, the first table gives
and by substituting the expressions for f and g in (3) gives the final answer of e=136 and f=g=135.
This matches the empirical data given earlier; the expected number of letters output until one of the target strings is obtained is 136.
We can also use the transition matrix method mentioned earlier for this as well - an idea of how this would look:
なにこれ、マルコフ連鎖(ってかグラフとトランジション・マトリックスだけだけど)面白い! どっかに教科書埋まってる筈だし、マルコフ連鎖勉強し直してみるかな…? pic.twitter.com/KhpCi6bL7g — 北宮 (@kitamiya) September 16, 2014