English Letter Frequency Counts
Note, April 28, 2022
The highly non-random patterns in English text are one source of statistical improbability. However, the statistical significance of text patterns is elevated when chunks of text are linked with 'objects' in the real world. For example, WSM radio, M/ac/ron, Madrid, Bryn Mawr college, mun/ich, Milky Way, moon, mars, etc.
Today, August 13, 2020, I found some missing data. Previously, the best data I had for cryptographic letter frequencies was on this web page:
Letter Frequency of the Most Common Last Letter in Words
e s t d n r y f l o g h a k m p u w
Today, I found better data on this web page.
Letter Counts by Position Within Word
In the chart, the bottom row, prefixed by -1, provides the letter frequency for the last letter in words.
All 26 letters are displayed, and a numeric tool tip gives each letter's frequency of occurrence by position within a word. The tool tips are displayed on the Norvig web page, though I will add that tool tip information to this SETI Nuggets web page.
http://norvig.com/mayzner.html
Task definition with search target: 'm' 'y' 'r' 'o' 'n'
The task definition is a logical subset of searching for nonrandom patterns.
The text to be searched is the leftmost and rightmost columns of any book or magazine page.
Process down each column, which averages about 25 lines per page.
Determine if 'm' 'y' 'r' 'o' 'n' is found one or more times with the letters in the ordered 'm' 'y' 'r' 'o' 'n' sequence.
Determine if 'm' 'y' 'r' 'o' 'n' is found one or more times with the letters in a non-ordered 'm' 'y' 'r' 'o' 'n' sequence.
Non-myron letters (e.g. b, f, c, s, j) are ignored.
While processing down a column, if any target letter is repeated, the processing is restarted.
Example: myreeejtwr duplicates 'r', so for an ordered match the processing restarts at the next 'm' (if one occurs on the same text page), and for an unordered match it restarts at any of the other target letters.
Another example: miufrpqyccpwqlralkswy duplicates 'y' which restarts the search.
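The matching rules above can be sketched in Python. This is only a minimal sketch under stated assumptions, not a definitive implementation: the function names ordered_matches and unordered_matches are hypothetical, an out-of-order target letter is treated like a repeat during an ordered search, the repeated letter itself opens the new window during an unordered search, and the reported distance is the number of lines between the first and last matched letter.

```python
TARGET = "myron"

def ordered_matches(column, target=TARGET):
    """Return the span of every in-order match found going down the column."""
    spans, pos, start = [], 0, 0
    for i, ch in enumerate(column.lower()):
        if ch not in target:
            continue  # non-target letters are ignored
        if ch == target[pos]:
            if pos == 0:
                start = i  # an ordered match always starts at the first target letter
            pos += 1
            if pos == len(target):
                spans.append(i - start)
                pos = 0
        else:
            # Assumption: a repeated or out-of-order target letter restarts the search.
            pos = 0
            if ch == target[0]:
                start, pos = i, 1
    return spans

def unordered_matches(column, target=TARGET):
    """Return the span of every match whose letters appear out of target order."""
    spans, seen = [], {}
    for i, ch in enumerate(column.lower()):
        if ch not in target:
            continue  # non-target letters are ignored
        if ch in seen:
            seen = {}  # Assumption: the repeated letter opens the new window.
        seen[ch] = i
        if len(seen) == len(target):
            # Dicts keep insertion order, so joining the keys gives the order found.
            if "".join(seen) != target:
                spans.append(i - min(seen.values()))
            seen = {}
    return spans

print(ordered_matches("mbyfrcosn"))     # one ordered match spanning 8 lines
print(unordered_matches("myreeejtwr"))  # no match: 'r' repeats before 'o' and 'n' appear
```

Keeping the ordered and unordered passes separate lets each one use its own restart rule, which a single shared window would not allow.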
For more information see:
Algorithms and Pseudo Code
My intent was to publish the SETI signal I discovered in H. Norman Schwarzkopf's book, It Doesn't Take a Hero.
When I bought that book in the 1990s, I could immediately see it was chock full of SETI signals. The signals were highly non-random patterns of text. I published my SETI theory on my Geocities website and in my SETI Linguistics blog.
I am jotting down some ideas on proving my SETI theory in sources other than HNS's book.
Algos required:
Fill N arrays, each holding 25 elements (the number of lines in a page of text). The number of arrays and the number of elements can be assigned arbitrarily. Default: 1,000 arrays, each with 25 elements.
Populate each array with random letters drawn according to the Norvig.com letter count frequencies for the last position in a word. Do this to two decimal places. I came up with an algo to do this that runs in O(n) time.
On each page process from top down searching for 'm' 'y' 'r' 'o' 'n' as ordered and unordered matches as defined above in this web page and evidenced in the other data tabs of this SETI Nuggets website.
Record the number of ordered and unordered matches and the distance between the first and last letter in a sequence or unordered group.
The result should calculate the probability of certain types of matches.
The SETI proof is to determine whether the occurrence of matches in text is the same as, or substantially different from, that in the computer generated data.
The data in text is not computer generated. It is typed by a human author, processed by a book publisher, and found by a human reader.
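The steps listed above can be sketched roughly as follows. This is a hedged illustration, not my O(n) algo, which is not shown here: it swaps in Python's random.choices for the weighted draw, the names random_column, has_ordered, has_unordered and simulate are hypothetical, the unlabeled 0.16 row of the frequency table is assumed to be 'x', and the restart rules are the same assumptions as in the task definition (an unordered match here means all five letters once each in any order).

```python
import random

# Last-letter percentages copied from the table on this page. The unlabeled
# 0.16 row is ASSUMED to be 'x', the only letter not otherwise listed.
LAST_LETTER_PCT = {
    "e": 20.134, "t": 8.971, "a": 2.819, "o": 4.177, "i": 0.752,
    "n": 9.310, "s": 12.903, "r": 5.899, "h": 2.712, "l": 3.465,
    "d": 9.981, "c": 0.603, "u": 0.389, "m": 1.656, "f": 4.714,
    "p": 0.541, "g": 2.939, "w": 0.821, "y": 6.002, "b": 0.123,
    "v": 0.055, "k": 0.802, "x": 0.16, "j": 0.022, "q": 0.013,
    "z": 0.033,
}
TARGET = "myron"

def random_column(rng, lines=25):
    """One simulated page column: `lines` letters drawn by last-letter frequency."""
    letters = list(LAST_LETTER_PCT)
    weights = list(LAST_LETTER_PCT.values())
    return "".join(rng.choices(letters, weights=weights, k=lines))

def has_ordered(column, target=TARGET):
    """True if the target letters occur in order, ignoring non-target letters."""
    pos = 0
    for ch in column:
        if ch not in target:
            continue
        if ch == target[pos]:
            pos += 1
            if pos == len(target):
                return True
        else:
            # Assumption: any other target letter restarts the ordered search.
            pos = 1 if ch == target[0] else 0
    return False

def has_unordered(column, target=TARGET):
    """True if every target letter occurs once, in any order, before a repeat."""
    seen = set()
    for ch in column:
        if ch not in target:
            continue
        if ch in seen:
            seen = set()  # Assumption: the repeated letter opens the new window.
        seen.add(ch)
        if len(seen) == len(target):
            return True
    return False

def simulate(pages=1000, lines=25, seed=0):
    """Fraction of random pages that contain each kind of match."""
    rng = random.Random(seed)
    ordered = unordered = 0
    for _ in range(pages):
        column = random_column(rng, lines)
        ordered += has_ordered(column)
        unordered += has_unordered(column)
    return {"pages": pages, "ordered": ordered / pages, "unordered": unordered / pages}

print(simulate())
```

The fractions returned by simulate() are the computer generated baseline; the same two checks run on real page columns would give the figures to compare against it. A fixed seed keeps a run repeatable.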
Letter Frequencies For Last Letter In Words
The data is from the tool tips in the Norvig chart. The rightmost column has a precision of two decimal places. I trimmed the data by deleting, not rounding, all digits in that column past two decimal places.
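Trimming without rounding can be illustrated with a short Python sketch. The helper name truncate and the sample inputs are hypothetical and are not values taken from the Norvig chart; the number of places is left as a parameter.

```python
from decimal import Decimal, ROUND_DOWN

def truncate(value, places=2):
    """Drop every digit past `places` decimals instead of rounding."""
    step = Decimal(1).scaleb(-places)  # 0.01 when places is 2
    return Decimal(str(value)).quantize(step, rounding=ROUND_DOWN)

print(truncate("20.1349"))    # 20.13
print(truncate("8.9719", 3))  # 8.971, where rounding would give 8.972
```

Passing the number as a string avoids binary floating point error before the digits are dropped.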
Letter Frequencies For Last Letter In Words (from Norvig.com tool tips)
Rank  Letter  Frequency (%)
1.    e       20.134
2.    t       8.971
3.    a       2.819
4.    o       4.177
5.    i       0.752
6.    n       9.310
7.    s       12.903
8.    r       5.899
9.    h       2.712
10.   l       3.465
11.   d       9.981
12.   c       0.603
13.   u       0.389
14.   m       1.656
15.   f       4.714
16.   p       0.541
17.   g       2.939
18.   w       0.821
19.   y       6.002
20.   b       0.123
21.   v       0.055
22.   k       0.802
23.   x       0.16
24.   j       0.022
25.   q       0.013
26.   z       0.033
Total         100.0
Added November 2, 2021
I am fully aware that my SETI code may be incorrectly interpreted as the product of ordinary end-of-word cryptographic letter frequencies. Similarly, I know poker probabilities are not applicable from the perspective of combinatorics.
Based on the a priori rules I have applied in my data from my early SETI research to now, I have good reasons to believe I discovered a SETI signal.
See: English Letter Frequency Counts.
My intent was to prove the existence of SETI signals in Time and Newsweek hardcopy magazines. Then, after the HNS paperback was published (It Doesn't Take a Hero: The Autobiography of General H. Norman Schwarzkopf, paperback, September 1, 1993), my goal became to show a SETI signal in that book. I can easily see that the HNS paperback is chock full of SETI signals.
Amazon paperback (not PDF or Kindle) edition:
https://www.amazon.com/Doesnt-Take-Hero-Autobiography-Schwarzkopf/dp/0553563386
A Note on Cryptographic Letter Frequencies and Real-world linkage:
The highly non-random patterns in English text are one source of statistical improbability. However, the statistical significance of text patterns is elevated when chunks of text are linked with 'objects' in the real world.
For example, re in car nation, WSM radio, M/ac/ron, Madrid, Bryn Mawr college, mun/ich, Milky Way, moon, mars, MSW, Moscow, Mir, Yad Vashem in Hebrew right to left, Saudi Arabia, sdi, Cardiff, Meron, detroit, ttt, Carnegie Mellon, MH370, myron cope, ibm, challenger and columbia space shuttle disasters.
Adding: 2001: A Space Odyssey: Moon Watcher.