I perform maths experiments using Python programs, and recently completed a strange project connected to the Champernowne constant Cₙ. I'd written a Python program to generate arbitrary chunks of C₃₆, the constant in the base-36 number system, which enables a simple encoding of English text (all the letters of the alphabet become valid digits). I calibrated its virtual execution time, in Pydroid 3 on my Intel 12th-gen Core i3 CPU, and if all successive digits are printed to the screen my own name ‘dick_pountain’ would take around 20,000 years to arrive, while Hamlet’s Soliloquy would take 2 million years. Obviously I didn’t actually run it for such times, but calculated the position of the text in C₁₀ using the formula $P(n) = 1 + \sum_{k=1}^{d-1} 9k \cdot 10^{k-1} + d\,(n - 10^{d-1})$, where d is the number of digits in n, then extrapolated from the timing of shorter test samples.
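The position formula is easy to check by brute force for small n. What follows is a minimal sketch, not the program described above: the helper names `digits`, `position` and `brute_position` are made up for illustration, and carrying the formula over to base b (replacing 9 with b−1 and 10 with b) is an assumed generalisation for C₃₆.

```python
def digits(n, base=10):
    """Digits of a positive integer n in the given base, most significant first."""
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(r)
    return out[::-1]

def position(n, base=10):
    """1-indexed position of the first digit of n in the base-`base` Champernowne constant."""
    d = len(digits(n, base))
    # digits consumed by all numbers that have fewer than d digits
    shorter = sum((base - 1) * k * base ** (k - 1) for k in range(1, d))
    return 1 + shorter + d * (n - base ** (d - 1))

def brute_position(n, base=10):
    """Same result, found by counting the digits of every smaller number."""
    return 1 + sum(len(digits(m, base)) for m in range(1, n))

# English words are valid base-36 numerals, so int() can encode them directly
word = int('be', 36)
print(word, position(word, 36), brute_position(word, 36))
print(position(12), brute_position(12))
```

For n = 12 in base 10 both functions give 14: the nine single digits come first, then ‘10’ and ‘11’ occupy positions 10 to 13. The brute-force version is only practical for small n, which is the whole point of having a closed formula.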
This set me thinking that the position of a text fragment in C₃₆ (or C₁₀ for that matter) says something about the uniqueness of that fragment. An earlier project of mine was a ‘cheat’ for Wordle (to which I’m addicted) that ranked the uniqueness of 5-letter words using the frequency distribution of alphabet letters in English. With such frequency tables still at hand I decided to combine the two projects. For any text T, find its position in C₁₀ (using a simpler ASCII encoding for convenience), then calculate the sum of the uniquenesses of all the separate words it contains and take the ratio of those two numbers. They get very large, so a lot of ‘log10ing’ is required, even given Python's arbitrarily large integers. This ratio, which I’ve dubbed the text’s ‘ntropy’, may or may not have some significance, but I can't figure out exactly what…
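A short experiment shows why the logarithms are needed. This is an illustrative sketch only, and the helper `ascii_number` is a hypothetical stand-in for the ASCII concatenation step. Python integers can grow without limit, but floats stop near 1.8e308, so an integer with much more than 308 digits cannot be converted to a float, whereas `math.log10` accepts such an integer directly.

```python
import sys
from math import log10

def ascii_number(text):
    """Concatenate the decimal ASCII codes of the characters into one integer."""
    return int(''.join(str(ord(c)) for c in text))

# repeat a short phrase so the encoded number is comfortably past the float limit
n = ascii_number('To be or not to be that is the question' * 4)
print(len(str(n)), 'digits')
print(sys.float_info.max)          # largest representable float

try:
    float(n)
except OverflowError as err:
    print('float() fails:', err)

print(round(log10(n), 2))          # log10 works on the integer itself
```

So any arithmetic that mixes these positions with floating-point values has to go through `log10` first, which is presumably where all the ‘log10ing’ comes from.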
Here's the code in Python 3.6:
from math import log10, ceil
from champerknown import champn
# turn string into base36 number
def n36(s): return int(s, base=36)
# turn base36 number into string: x=lambda n,b:(int(n/b) and x(int(n/b),b)or'')+chr(48+n%b+39*(n%b>9))
def sn36(n):
    b = 36
    # recurse on the higher-order digits, then append the lowest digit (0-9, a-z)
    return (n//b and sn36(n//b) or '') + chr(48 + n%b + 39*(n%b>9))
# log base 36
def log36(n):
    return log10(n)/log10(36)
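# calculate position of n in C₁₀ using formula P(n) = 1 + sum_{k=1}^{d-1} 9*k*10^(k-1) + d*(n - 10^(d-1))
def champdist(n):
    r = 0
    d = len(str(n))             # number of decimal digits in n (exact, unlike int(log10(n)+1))
    for k in range(1, d):       # k = 1 .. d-1, as in the formula
        r += 9*k*10**(k-1)      # digits used up by all the k-digit numbers
    return r + d*(n-10**(d-1)) + 1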
# pretty-print long time periods (d is a duration in seconds)
def duration(d):
    r = log10(d)
    if r > 8:
        t, l = d/31536000, ' years'
    elif r > 5:
        t, l = d/86400, ' days'
    elif r > 4:
        t, l = d/3600, ' hours'
    elif r > 2:
        t, l = d/60, ' minutes'
    else:
        t, l = d, ' seconds'
    return str(round(t,1))+l
# compute the sum of the probabilities of occurrence of successive characters composing the English words in text s
def pval(s):
    l, val, pos, wl, wn = 1,0,0,1,1
    # freq[p] ranks the letters from most to least common at position p of a word (p = 5 covers all later positions)
    freq = (('S','C','B','T','P','A','F','G','D','M','R','L','W','E','H','O','V','N','I','U','Q','J','K','Y','Z','X',' '),
            ('A','O','R','E','L','I','U','H','N','T','P','W','C','M','Y','D','S','B','V','X','G','K','F','Q','J','Z',' '),
            ('A','I','O','E','U','R','N','L','T','S','D','G','P','M','C','B','V','Y','W','F','K','X','Z','H','J','Q',' '),
            ('E','N','S','A','L','I','R','C','T','O','U','G','D','M','K','P','V','F','H','W','B','Z','Y','X','J','Q',' '),
            ('E','Y','T','R','L','H','N','D','K','A','O','P','M','G','S','C','F','W','B','I','X','Z','U','V','Q','J',' '),
            ('A','E','I','O','U','S','R','N','L','T','C','H','D','G','Y','B','P','M','K','W','V','F','X','Z','J','Q',' '))
    for i in s:
        c = ' ' if i == '\r' else i.upper()    # a carriage return counts as a word break (assumed intent)
        if c in freq[0]:                       # skip digits, punctuation and anything else not in the tables
            l = 26-freq[min(pos,5)].index(c)   # 26 for the commonest letter at this position, 0 for a space
            if l == 0:                         # space: close the current word
                wn += 1
                wl, pos = pos+1, 0
            else:
                val += l*10**pos//wl
                pos += 1
    return val//wn
# turn string into *sum* of its ASCII codes
def nasc(s):
    n = 0
    for i in s:
        n += ord(i)
    return n
# convert string to number by *concatenating* ASCII codes of chars
def nASC(s):
    num = ''
    for c in s:
        num += str(ord(c))
    return int(num)
# ntropy(t) is position in Champernowne Constant C₁₀ of whole text, divided by sum of individual word probabilities
def ntropy(s):
    dst = champdist(nASC(s))  # position within C₁₀
    return round(log10(dst)*log10(pval(s)/log10(dst)),2)
#----------------------------------------------------------------
# test samples
shk = 'To be or not to be that is the question'
skh = 'Otrb eonto oted tta tis het uen a qti os'
dum = 'what the hell is goin on here?'
idl = "I love books. I read books I've written books&. I review books. My house has bookshelves in every room, holding more than 1000 volumes that often spill over onto the floor. And its become increasingly clear that this makes me some sort of bibliographic dinosaur, just waiting for my asteroid to land. A recent GallupWalton study reveals a dramatic drop-off in book reading among GenZ and younger generations. Only around a third of youngsters between 8% to 18% report they enjoy reading and fewer than 20% actually read daily in their free time: 35% of Gen Z students actively dislike reading and 43% said they rarely or never read for fun. And what do they read: fewer fiction and print books, a strong bias toward digital text and over 60% say they read song lyrics on their screens. Digital culture shifts them away from text toward sound and moving pictures even that minimal text ability required to operate phones may dwindle as voice input increasingly takes over."
rnd = 'fsdf lk;sej $neg oi-we nb3 eogi vdsvlk;ss dvsdG Wwe qwe8 9fdft m,ulvET v qlf lrld f gg gglkn s9dkl nvsl47"%& kd n bvlksdf poedbvs5 dkl bnsd;l00k bn kl s4dn b;ksd jbn!<>L:os;kd bnsk;ddfs fkl0 bmxc ewe wo potjW re&&$ib 1VVWwv wng143 bbweERG bsbe;kl sdn bs;kl dbns l;dgep ob dmfdg dfegb dm vw-wen b3eogi vdsvlk;ss dvsdG Wweq we8 9fd ftm,ulvET v qlflrld f gg gglkns 9dk lnvsl47"%&'
rnd2 = "fsdf lk;sej $neg oi-we nb3 eogi vdsvlk;ss dvsdG Wwe qwe8 9fdft mpvET v qlf lrld f gg gglkn s9dkl nvsl47%& kd n bvlksdf poedbvs5 dkl bnsd;l00k bn kl s4dn b;ksd jbn!<>L:os;kd bnsk;ddfs fkl0 bmxc ewe wo potjW re$ib 1VVWwv wng143 bbRG bsbe;kl sdn bs;kl dbns l;dgep ob fdg dfegb dm vw-wen b3eogi vdsvlk;ss dvsdG Wweq we8 9fd ftm,ulvET v qlflrld f gg gglkn wo potjW re&&$ib 1VVWwv wng143 bbweERG bsbe;kl sdn bs;kl dbns l;dgep ob dmfdg dfegb dm vw-wen b3eogi vdsvlk;ss dvdG Wweq we8 9fd ftm,ulvET v qlflrld f gg gglkns 9dk lnvsl47%s 9dk lnvsl47%&dsvlk;ss dvsdG yop e qwe8 9fdft m,ulvET v qlf lrld f gg gglkn s9dkl nvsl47%& kd n bvlksdf poedbvs5 dkl bnsd;l00k bn kl s4dn b;ksd jbn!<>L:os;kd bnsk;ddfs fkl0 bmxc ewe wo potjW re&&$ib 1VVWwv wng143 eERG bsbe;kl sdn bs;kl dbns l;dgep ob dmfdg dfegb dm vw-wen b3eogi vlk;ss dvsdG Wweq we8 9fd ftm,ulvET v qlflrld f gg gglkn wo potjW re&&$ib 1VVWwv wng143 bbweERG bsbe;kl sdn bs;kl dbns l;dgep ob dmfd sme ty"
# alphabets
alph = 'abcdefghijklmnopqrstuvwxyz'
alphsp = 'a b c d e f g h i j k l m n o p q r s t u v w x y z'
print(pval(idl)/pval(rnd2),'\n')
print(ntropy(idl)/ntropy(rnd2))
print(ntropy(skh)/ntropy(shk))
print(ntropy('t is n'))