Orthography Statistics (Part 3: Alphabet Pairings)

This is Part 3 of the series describing the statistics I used to generate better words in the Orthographic Password Creator. Part 1 looked at the pairing of all English orthography components (using the Wikipedia definition of English orthography). Part 2 showed count plots for the count tables.

I wrote a C program that stepped through every possible pairing of the letters of the alphabet and counted how many times each pair occurs in the dictionary. The program loaded the entire 109,462-word dictionary used by the Random Word Password Creator into memory, then counted every occurrence of every possible pair of letters (26 x 26 = 676 pairs).
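The original program is not shown in this post, so the following is only a minimal sketch of how such a pair counter might look in C. The names count_pairs and letter_index are hypothetical, and the sketch assumes ASCII letters and a caller that feeds it one dictionary word at a time.

```c
#include <stddef.h>

#define ALPHABET 26

/* Hypothetical helper: map an ASCII letter to 0..25, or -1 for anything else.
   Assumes 'a'..'z' and 'A'..'Z' are contiguous, as they are in ASCII. */
static int letter_index(char c)
{
    if (c >= 'a' && c <= 'z') return c - 'a';
    if (c >= 'A' && c <= 'Z') return c - 'A';
    return -1;
}

/* Add the adjacent-letter pairs of one word to the 26 x 26 table.
   counts[a][b] is how many times letter a is immediately followed by b.
   The index advances by one each step, so consecutive pairs overlap. */
void count_pairs(const char *word, unsigned long counts[ALPHABET][ALPHABET])
{
    for (size_t i = 0; word[i] != '\0' && word[i + 1] != '\0'; i++) {
        int cur = letter_index(word[i]);
        int next = letter_index(word[i + 1]);
        if (cur >= 0 && next >= 0) {
            counts[cur][next]++;
        }
    }
}
```

Calling count_pairs once per dictionary word on a zero-initialized table fills all 676 counters. Non-letter characters such as hyphens or apostrophes simply break the pair in this sketch, which may or may not match what the original program did.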

This scan works differently from the previous tests (Part 1), which used an editor's search function. For the same pairings, the counts from this scan are a little higher than the Part 1 counts because of the difference in method. The program starts at the top of the dictionary, looks at the current and next character, then increments the appropriate counter. The pointer is then incremented by one and the process repeats. The search function used in the previous test advances by two after each match, so matches never overlap.

For example, take xxxy. The search function method sees the first two letters as an xx pair, then advances by two. The third x is then paired with y as xy, so xx gets one count and xy gets one count.

The C program increments by one, so for the same example, xxxy, it sees the first xx pair, then the xx pair formed by the second and third characters, then the xy pair. So xx gets two counts and xy gets one count.
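The difference between the two methods can be reproduced with the standard strstr function. This is a sketch for illustration only: count_search_style and count_sliding are hypothetical names, and the first function only approximates how an editor's search behaves.

```c
#include <string.h>

/* Editor-style search: after a match, resume just past the matched pair,
   so two matches can never share a character. */
unsigned long count_search_style(const char *text, const char *pair)
{
    unsigned long n = 0;
    size_t len = strlen(pair);
    if (len == 0) return 0;
    const char *p = text;
    while ((p = strstr(p, pair)) != NULL) {
        n++;
        p += len;   /* skip the whole match */
    }
    return n;
}

/* Sliding scan: after a match, advance by a single character,
   so overlapping occurrences are all counted. */
unsigned long count_sliding(const char *text, const char *pair)
{
    unsigned long n = 0;
    if (pair[0] == '\0') return 0;
    for (const char *p = text; (p = strstr(p, pair)) != NULL; p++) {
        n++;
    }
    return n;
}
```

For the text xxxy, count_search_style gives 1 for xx and 1 for xy, while count_sliding gives 2 for xx and 1 for xy, matching the two walkthroughs above.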

The reason I did it this way was to get statistics about how many times a particular letter is followed by another specific letter. Because the scan covers the entire dictionary, these numbers are about as accurate as this word list allows.
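One way to read "how often a letter is followed by another" is as a fraction of all pairs that start with the first letter. The sketch below is an assumption about how the counts could be used, not a description of what the password creator actually does; follow_fraction is a hypothetical name.

```c
#define ALPHABET 26

/* Fraction of all pairs starting with letter index `first` (0..25)
   that continue with letter index `second` (0..25).
   Returns 0.0 when `first` never starts a pair. */
double follow_fraction(unsigned long counts[ALPHABET][ALPHABET],
                       int first, int second)
{
    unsigned long total = 0;
    for (int b = 0; b < ALPHABET; b++) {
        total += counts[first][b];
    }
    if (total == 0) {
        return 0.0;
    }
    return (double)counts[first][second] / (double)total;
}
```

With the counts from the table in this post, the q row sums to 1789 pairs, of which 1772 are qu, so the fraction for q followed by u comes out at about 0.99.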

The plots are shown first, followed by the data table. The first plot uses a linear scale for the counts. As you can see, the curve looks exponential. The second plot shows the natural logarithm of each count.

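Note on the logarithm graph/table entries: the plotting program ignores the infinite values that come from taking the logarithm of zero, whereas the table function in the blog barfs at infinity. In the table, I replaced the infinite values for zero counts (from ln(0)) with zero. Pairs with a count of one also show a logarithm of zero, since ln(1) = 0, so check the count column to tell the two cases apart.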

[Plot: Alphabet Pairs Linear]

[Plot: Alphabet Pairs Log]

Letters  Occurrences  Logarithm
bq  0.0  0.0
bz  0.0  0.0
cf  0.0  0.0
cj  0.0  0.0
cv  0.0  0.0
cx  0.0  0.0
fq  0.0  0.0
fv  0.0  0.0
fx  0.0  0.0
fz  0.0  0.0
gq  0.0  0.0
gv  0.0  0.0
gx  0.0  0.0
hx  0.0  0.0
hz  0.0  0.0
jb  0.0  0.0
jd  0.0  0.0
jf  0.0  0.0
jg  0.0  0.0
jh  0.0  0.0
jl  0.0  0.0
jm  0.0  0.0
jp  0.0  0.0
jq  0.0  0.0
jr  0.0  0.0
js  0.0  0.0
jt  0.0  0.0
jv  0.0  0.0
jw  0.0  0.0
jx  0.0  0.0
jy  0.0  0.0
jz  0.0  0.0
kq  0.0  0.0
kx  0.0  0.0
kz  0.0  0.0
mx  0.0  0.0
mz  0.0  0.0
pq  0.0  0.0
pv  0.0  0.0
px  0.0  0.0
qb  0.0  0.0
qc  0.0  0.0
qd  0.0  0.0
qf  0.0  0.0
qg  0.0  0.0
qh  0.0  0.0
qj  0.0  0.0
qk  0.0  0.0
ql  0.0  0.0
qm  0.0  0.0
qn  0.0  0.0
qp  0.0  0.0
qq  0.0  0.0
qv  0.0  0.0
qw  0.0  0.0
qx  0.0  0.0
qy  0.0  0.0
qz  0.0  0.0
sx  0.0  0.0
tq  0.0  0.0
vb  0.0  0.0
vf  0.0  0.0
vh  0.0  0.0
vj  0.0  0.0
vk  0.0  0.0
vm  0.0  0.0
vp  0.0  0.0
vq  0.0  0.0
vw  0.0  0.0
vx  0.0  0.0
wq  0.0  0.0
wv  0.0  0.0
wx  0.0  0.0
xd  0.0  0.0
xj  0.0  0.0
xk  0.0  0.0
xr  0.0  0.0
xz  0.0  0.0
yq  0.0  0.0
yy  0.0  0.0
zf  0.0  0.0
zr  0.0  0.0
zx  0.0  0.0
bx  1.0  0.0
cg  1.0  0.0
dx  1.0  0.0
fk  1.0  0.0
hv  1.0  0.0
jc  1.0  0.0
jk  1.0  0.0
pz  1.0  0.0
qe  1.0  0.0
qr  1.0  0.0
qs  1.0  0.0
vg  1.0  0.0
wj  1.0  0.0
zd  1.0  0.0
zj  1.0  0.0
zs  1.0  0.0
cw  2.0  0.6931471805599453
fj  2.0  0.6931471805599453
hq  2.0  0.6931471805599453
lj  2.0  0.6931471805599453
lx  2.0  0.6931471805599453
qo  2.0  0.6931471805599453
qt  2.0  0.6931471805599453
sz  2.0  0.6931471805599453
tx  2.0  0.6931471805599453
vc  2.0  0.6931471805599453
vd  2.0  0.6931471805599453
vz  2.0  0.6931471805599453
zc  2.0  0.6931471805599453
zg  2.0  0.6931471805599453
zn  2.0  0.6931471805599453
cb  3.0  1.0986122886681096
jn  3.0  1.0986122886681096
mg  3.0  1.0986122886681096
mj  3.0  1.0986122886681096
qa  3.0  1.0986122886681096
vl  3.0  1.0986122886681096
xg  3.0  1.0986122886681096
xn  3.0  1.0986122886681096
xq  3.0  1.0986122886681096
zq  3.0  1.0986122886681096
gj  4.0  1.3862943611198906
jj  4.0  1.3862943611198906
mq  4.0  1.3862943611198906
vt  4.0  1.3862943611198906
zb  4.0  1.3862943611198906
zk  4.0  1.3862943611198906
zt  4.0  1.3862943611198906
cm  5.0  1.6094379124341003
cp  5.0  1.6094379124341003
dz  5.0  1.6094379124341003
gk  5.0  1.6094379124341003
rx  5.0  1.6094379124341003
tk  5.0  1.6094379124341003
zh  5.0  1.6094379124341003
fg  6.0  1.791759469228055
fm  6.0  1.791759469228055
fw  6.0  1.791759469228055
gc  6.0  1.791759469228055
hg  6.0  1.791759469228055
lq  6.0  1.791759469228055
xb  6.0  1.791759469228055
zv  6.0  1.791759469228055
zw  6.0  1.791759469228055
hj  7.0  1.9459101490553132
lz  7.0  1.9459101490553132
qi  7.0  1.9459101490553132
uq  7.0  1.9459101490553132
yj  7.0  1.9459101490553132
bk  8.0  2.0794415416798357
fn  8.0  2.0794415416798357
gz  8.0  2.0794415416798357
kg  8.0  2.0794415416798357
kv  8.0  2.0794415416798357
tj  8.0  2.0794415416798357
uw  8.0  2.0794415416798357
wz  8.0  2.0794415416798357
xm  8.0  2.0794415416798357
dq  9.0  2.1972245773362196
fc  9.0  2.1972245773362196
fp  9.0  2.1972245773362196
iy  9.0  2.1972245773362196
kj  9.0  2.1972245773362196
mk  9.0  2.1972245773362196
xv  9.0  2.1972245773362196
zm  9.0  2.1972245773362196
pj  10.0  2.302585092994046
uh  10.0  2.302585092994046
vn  10.0  2.302585092994046
xf  10.0  2.302585092994046
zp  10.0  2.302585092994046
hk  11.0  2.3978952727983707
tv  11.0  2.3978952727983707
vs  11.0  2.3978952727983707
bg  12.0  2.4849066497880004
cd  12.0  2.4849066497880004
fd  12.0  2.4849066497880004
kc  12.0  2.4849066497880004
wg  12.0  2.4849066497880004
dk  13.0  2.5649493574615367
md  13.0  2.5649493574615367
ww  13.0  2.5649493574615367
yk  13.0  2.5649493574615367
xs  14.0  2.6390573296152584
cn  15.0  2.70805020110221
xw  15.0  2.70805020110221
pg  17.0  2.833213344056216
xx  17.0  2.833213344056216
yv  17.0  2.833213344056216
gp  18.0  2.8903717578961645
xl  18.0  2.8903717578961645
sj  19.0  2.9444389791664403
fb  21.0  3.044522437723423
mh  21.0  3.044522437723423
uj  21.0  3.044522437723423
fh  22.0  3.091042453358316
mv  22.0  3.091042453358316
sv  22.0  3.091042453358316
uu  22.0  3.091042453358316
vv  22.0  3.091042453358316
uy  23.0  3.1354942159291497
hc  24.0  3.1780538303479458
rz  24.0  3.1780538303479458
wu  24.0  3.1780538303479458
bp  25.0  3.2188758248682006
ij  25.0  3.2188758248682006
mt  25.0  3.2188758248682006
cz  26.0  3.258096538021482
mw  26.0  3.258096538021482
wc  26.0  3.258096538021482
gd  27.0  3.295836866004329
bw  28.0  3.332204510175204
iw  28.0  3.332204510175204
yx  28.0  3.332204510175204
mc  29.0  3.367295829986474
pk  29.0  3.367295829986474
nx  30.0  3.4011973816621555
bh  31.0  3.4339872044851463
bv  32.0  3.4657359027997265
bf  33.0  3.4965075614664802
kd  33.0  3.4965075614664802
vr  33.0  3.4965075614664802
kk  35.0  3.5553480614894135
mr  35.0  3.5553480614894135
wm  35.0  3.5553480614894135
hp  36.0  3.58351893845611
pd  36.0  3.58351893845611
aa  38.0  3.6375861597263857
wt  39.0  3.6635616461296463
gf  40.0  3.6888794541139363
rq  40.0  3.6888794541139363
hh  41.0  3.713572066704308
oj  41.0  3.713572066704308
bn  42.0  3.7376696182833684
kp  42.0  3.7376696182833684
pf  43.0  3.7612001156935624
vy  43.0  3.7612001156935624
yh  44.0  3.784189633918261
rj  45.0  3.8066624897703196
zu  45.0  3.8066624897703196
hd  48.0  3.8712010109078907
dt  51.0  3.9318256327243257
ih  51.0  3.9318256327243257
wy  51.0  3.9318256327243257
td  52.0  3.9512437185814275
wp  52.0  3.9512437185814275
kf  53.0  3.970291913552122
ii  54.0  3.9889840465642745
wf  54.0  3.9889840465642745
pw  56.0  4.02535169073515
pc  57.0  4.04305126783455
yz  57.0  4.04305126783455
yu  58.0  4.060443010546419
kt  59.0  4.07753744390572
lr  59.0  4.07753744390572
gb  60.0  4.0943445622221
pb  60.0  4.0943445622221
yf  61.0  4.110873864173311
ao  63.0  4.143134726391533
gw  64.0  4.1588830833596715
pn  64.0  4.1588830833596715
cq  65.0  4.174387269895637
wk  66.0  4.189654742026425
bj  67.0  4.204692619390966
aj  68.0  4.219507705176107
lh  68.0  4.219507705176107
bc  69.0  4.23410650459726
pm  69.0  4.23410650459726
xh  69.0  4.23410650459726
bm  70.0  4.248495242049359
gt  70.0  4.248495242049359
oq  71.0  4.2626798770413155
lw  73.0  4.290459441148391
sr  76.0  4.330733340286331
km  77.0  4.343805421853684
aq  80.0  4.382026634673881
mf  80.0  4.382026634673881
sg  80.0  4.382026634673881
bd  83.0  4.418840607796598
kh  86.0  4.454347296253507
tg  88.0  4.477336814478207
zy  89.0  4.48863636973214
hf  90.0  4.499809670330265
tp  90.0  4.499809670330265
kw  91.0  4.51085950651685
sd  93.0  4.532599493153256
wb  93.0  4.532599493153256
dp  94.0  4.543294782270004
nz  94.0  4.543294782270004
xy  97.0  4.574710978503383
kb  99.0  4.59511985013459
ku  101.0  4.61512051684126
uv  101.0  4.61512051684126
ux  102.0  4.624972813284271
hb  103.0  4.634728988229636
kr  104.0  4.6443908991413725
df  109.0  4.6913478822291435
wd  111.0  4.709530201312334
xu  111.0  4.709530201312334
dc  112.0  4.718498871295094
yw  112.0  4.718498871295094
yg  114.0  4.736198448394496
uz  117.0  4.762173934797756
uk  118.0  4.770684624465665
dh  119.0  4.77912349311153
ml  120.0  4.787491742782046
hw  121.0  4.795790545596741
ej  122.0  4.804021044733257
dj  124.0  4.820281565605037
xo  127.0  4.844187086458591
yb  135.0  4.90527477843843
zl  136.0  4.912654885736052
dv  139.0  4.9344739331306915
ji  140.0  4.941642422609304
oh  142.0  4.955827057601261
bt  143.0  4.962844630259907
sb  149.0  5.003946305945459
nj  154.0  5.0369526024136295
lb  162.0  5.087596335232384
oz  163.0  5.093750200806762
tz  163.0  5.093750200806762
iq  171.0  5.14166355650266
ez  172.0  5.147494476813453
db  179.0  5.187385805840755
nq  182.0  5.204006687076795
by  184.0  5.214935757608986
ko  184.0  5.214935757608986
hn  185.0  5.220355825078325
yd  185.0  5.220355825078325
tb  187.0  5.231108616854587
dw  188.0  5.236441962829949
gm  188.0  5.236441962829949
vu  190.0  5.247024072160486
xa  197.0  5.2832037287379885
yt  199.0  5.293304824724492
lg  203.0  5.313205979041787
xc  211.0  5.351858133476067
ek  215.0  5.3706380281276624
tn  215.0  5.3706380281276624
fs  219.0  5.389071729816501
sf  225.0  5.41610040220442
mn  227.0  5.424950017481403
dm  230.0  5.438079308923196
rw  240.0  5.480638923341991
tf  244.0  5.497168225293202
ky  249.0  5.517452896464707
ix  256.0  5.545177444479562
tm  259.0  5.556828061699537
sq  263.0  5.572154032177765
rh  266.0  5.583496308781699
zz  266.0  5.583496308781699
hm  269.0  5.594711379601839
ln  270.0  5.598421958998375
nw  278.0  5.627621113690637
wl  279.0  5.631211781821365
lp  280.0  5.634789603169249
dn  281.0  5.638354669333745
py  283.0  5.645446897643238
my  285.0  5.652489180268651
hl  295.0  5.68697535633982
uo  296.0  5.69035945432406
yr  296.0  5.69035945432406
lf  298.0  5.697093486505405
ax  305.0  5.720311776607412
kn  307.0  5.726847747587197
ah  309.0  5.733341276897746
eq  311.0  5.739792912179234
lm  312.0  5.7430031878094825
yn  313.0  5.746203190540153
zo  315.0  5.752572638825633
fy  324.0  5.780743515792329
ik  328.0  5.793013608384144
je  330.0  5.799092654460526
nb  330.0  5.799092654460526
xt  339.0  5.82600010738045
lc  343.0  5.83773044716594
lv  343.0  5.83773044716594
yc  349.0  5.855071922202427
nm  350.0  5.857933154483459
ae  352.0  5.863631175598097
az  359.0  5.883322388488279
lk  364.0  5.8971538676367405
dy  367.0  5.905361848054571
ny  367.0  5.905361848054571
iu  376.0  5.929589143389895
oy  381.0  5.942799375126701
ja  382.0  5.945420608606575
jo  394.0  5.976350909297934
nh  394.0  5.976350909297934
yl  399.0  5.988961416889864
yo  402.0  5.996452088619021
xp  406.0  6.0063531596017325
gy  413.0  6.023447592961033
ox  417.0  6.0330862217988015
ws  417.0  6.0330862217988015
uf  418.0  6.035481432524756
eh  427.0  6.056784013228625
wn  438.0  6.082218910376446
ym  441.0  6.089044875446846
cs  446.0  6.100318952020064
dg  448.0  6.104793232414985
rf  448.0  6.104793232414985
kl  464.0  6.139884552226255
hs  473.0  6.159095388491933
nr  477.0  6.1675164908883415
np  480.0  6.173786103901937
ka  494.0  6.202535517187923
wr  497.0  6.208590026096629
xe  498.0  6.210600077024653
ft  500.0  6.214608098422191
xi  500.0  6.214608098422191
rv  517.0  6.248042874508429
ya  523.0  6.259581464064923
cy  527.0  6.267200548541362
ju  530.0  6.272877006546167
nl  531.0  6.274762021241939
tw  535.0  6.282266746896006
sw  543.0  6.297109319933935
yp  544.0  6.298949246855942
eu  582.0  6.366470447731438
nv  586.0  6.373319789577012
yi  611.0  6.415096959171596
ok  615.0  6.421622267806518
sy  618.0  6.42648845745769
bs  623.0  6.434546518787453
of  648.0  6.473890696352274
za  648.0  6.473890696352274
oe  651.0  6.478509642208569
af  656.0  6.486160788944089
sk  659.0  6.490723534502507
tc  675.0  6.51471269087253
hr  683.0  6.52649485957079
ey  689.0  6.535241271013659
cc  703.0  6.555356891810665
ye  710.0  6.565264970035361
bb  727.0  6.588926477533519
dd  731.0  6.594413459749778
gn  757.0  6.6293632534374485
rp  767.0  6.642486801367256
aw  798.0  6.682108597449809
rk  816.0  6.704414354964107
eb  822.0  6.71174039505618
hu  830.0  6.721425700790643
ug  838.0  6.731018100482083
ht  848.0  6.742880635791903
rl  849.0  6.744059186311348
sn  858.0  6.754604099487962
ld  872.0  6.77078942390898
eo  874.0  6.773080375655535
nu  889.0  6.790097235513905
ys  894.0  6.795705775173515
zi  896.0  6.79794041297493
ew  903.0  6.805722553416985
rb  910.0  6.813444599510896
ak  912.0  6.815639990074331
hy  920.0  6.824373670043086
gg  928.0  6.833031732786201
nk  938.0  6.843749949006225
rg  946.0  6.852242569051878
fr  971.0  6.878326468291325
ud  987.0  6.894670039433482
wh  1013.0  6.920671504248683
nf  1021.0  6.928537818164665
ks  1048.0  6.954638864880987
vo  1051.0  6.957497370876951
gu  1061.0  6.966967138613983
pt  1085.0  6.98933526597456
du  1086.0  6.990256500493881
mu  1102.0  7.00488198971286
ei  1114.0  7.01571242048723
oa  1137.0  7.036148493750536
mm  1138.0  7.037027614686276
pu  1138.0  7.037027614686276
dl  1140.0  7.038783541388542
gh  1141.0  7.039660349862076
go  1142.0  7.040536390215956
lt  1144.0  7.042286171939743
ob  1148.0  7.045776576879511
ms  1156.0  7.052721049232323
ay  1162.0  7.057897937411856
gs  1198.0  7.088408778675395
dr  1214.0  7.101675971619444
ue  1228.0  7.113142108707088
nn  1230.0  7.114769448366463
ui  1230.0  7.114769448366463
fa  1234.0  7.1180162044653335
ff  1236.0  7.119635638017636
ev  1243.0  7.1252830915107115
av  1246.0  7.1276936993473985
ps  1261.0  7.139660335964919
tl  1272.0  7.148345743900068
eg  1277.0  7.152268856032539
ub  1286.0  7.1592919047975645
rc  1307.0  7.175489713624222
ua  1322.0  7.186901020411631
ib  1334.0  7.195937226475569
ef  1336.0  7.197435354096591
oi  1336.0  7.197435354096591
rn  1350.0  7.2078598714324755
au  1359.0  7.214504414151143
mb  1385.0  7.233455418621439
fu  1413.0  7.253470382684528
uc  1423.0  7.260522598089852
sl  1426.0  7.262628600974241
fl  1452.0  7.280697195384741
up  1456.0  7.283448228756631
bu  1493.0  7.30854279753919
wo  1524.0  7.329093736246591
pp  1553.0  7.347943823148687
ov  1572.0  7.360103972989152
cl  1613.0  7.385851078125209
sm  1623.0  7.392031567514591
ds  1625.0  7.393263094763838
rr  1625.0  7.393263094763838
ex  1627.0  7.394493107219038
gl  1632.0  7.397561535524052
br  1656.0  7.412160334945205
rm  1664.0  7.416979621381154
fo  1668.0  7.419380582918692
ry  1682.0  7.427738840532894
fe  1687.0  7.430707082545968
ki  1696.0  7.4360278163518485
od  1702.0  7.43955930913332
ls  1739.0  7.461065514354283
ip  1763.0  7.47477218239787
qu  1772.0  7.4798641311650265
da  1777.0  7.482681828154651
rd  1809.0  7.500529485395295
do  1845.0  7.520234556474628
if  1849.0  7.522400231387125
we  1863.0  7.529943370601589
ow  1872.0  7.534762657037537
lu  1901.0  7.550135342488429
ty  1902.0  7.550661243105336
va  1937.0  7.5688956634069955
ga  1972.0  7.586803535162581
ep  2004.0  7.602900462204755
wi  2021.0  7.611347717403621
oc  2058.0  7.629489916393995
ph  2065.0  7.632885505395133
ru  2068.0  7.63433723562832
cu  2075.0  7.637716432664798
pl  2109.0  7.653969180478774
tt  2142.0  7.669495251007694
gr  2189.0  7.691200097522863
ze  2200.0  7.696212639346407
so  2219.0  7.704811922932594
bo  2254.0  7.720461694599722
cr  2266.0  7.725771441587952
sp  2295.0  7.738488122494646
gi  2312.0  7.7458682297922685
um  2322.0  7.7501841622578365
og  2354.0  7.763871287820222
sa  2358.0  7.765569081097317
ig  2392.0  7.779885115070522
ir  2395.0  7.781138509845015
ai  2411.0  7.787796878181171
wa  2445.0  7.801800401908972
bi  2446.0  7.802209316247118
mp  2446.0  7.802209316247118
ba  2468.0  7.811163385025279
ag  2485.0  7.818027938530729
op  2494.0  7.821643126239982
su  2511.0  7.828436359157585
tu  2514.0  7.829630389150193
fi  2566.0  7.850103545175582
ke  2607.0  7.865955413933502
iv  2623.0  7.872073979866873
be  2635.0  7.876638460975463
sc  2641.0  7.878912912297132
im  2648.0  7.881559917056899
rt  2656.0  7.884576510596324
iz  2711.0  7.905072849498666
vi  2746.0  7.917900586327916
ck  2751.0  7.919719760924575
id  2757.0  7.921898411023797
oo  2757.0  7.921898411023797
ee  2806.0  7.939515260662406
mo  2828.0  7.947325027016463
ut  2850.0  7.955074273262696
ap  2853.0  7.9561263512135
pi  2874.0  7.96346006663897
am  2891.0  7.969357742016346
ot  2918.0  7.978653729082731
ct  2975.0  7.99799931797973
pa  2981.0  8.000014093678072
ad  2988.0  8.002359546252707
po  3007.0  8.008698182988528
em  3023.0  8.014004994779459
no  3087.0  8.034955024502159
ci  3097.0  8.038189179973203
os  3204.0  8.07215530818825
ho  3447.0  8.145259566516865
ha  3449.0  8.145839612936841
th  3526.0  8.167919362957816
ia  3594.0  8.187021067343505
om  3632.0  8.197538739721184
ul  3650.0  8.202482446576537
hi  3704.0  8.21716859576607
ec  3726.0  8.223090551161533
pr  3767.0  8.23403420769204
ge  3781.0  8.237743803890933
ab  3961.0  8.284251797621916
sh  4015.0  8.297792626380861
ts  4071.0  8.31164394850298
bl  4123.0  8.3243363327069
ce  4133.0  8.326758814511733
nc  4139.0  8.328209491748732
ur  4300.0  8.366370301681654
ac  4314.0  8.369620826949102
ol  4391.0  8.387312270561717
us  4452.0  8.401108712395436
et  4511.0  8.414274137408396
na  4526.0  8.417593826193484
mi  4574.0  8.428143374582726
ma  4590.0  8.431635303054591
lo  4634.0  8.441175704992322
si  4725.0  8.460622839927844
ch  4733.0  8.462314529906248
pe  4787.0  8.473659189392508
to  4830.0  8.482601746646619
ou  4911.0  8.499232866039245
as  4969.0  8.51097389160232
il  4973.0  8.511778558714738
nd  4985.0  8.514188682395938
ea  5053.0  8.527737405291909
me  5194.0  8.555259392222693
ni  5283.0  8.572249397164315
el  5418.0  8.59748202264504
di  5420.0  8.597851094433691
he  5508.0  8.613956859848546
ta  5535.0  8.618846845142738
ll  5564.0  8.624072553043334
ca  5885.0  8.680162019694377
ve  6085.0  8.713582005421628
tr  6209.0  8.733755131364893
ly  6221.0  8.73568594451502
ss  6221.0  8.73568594451502
se  6259.0  8.7417757069247
io  6273.0  8.744009988096742
la  6305.0  8.74909824839902
ro  6612.0  8.796641458940915
co  6686.0  8.807771066980044
ns  6720.0  8.812843433517195
it  6775.0  8.820994645747902
ie  6978.0  8.85051762174672
ic  7428.0  8.913011922472597
rs  7560.0  8.930626469173578
de  7583.0  8.933664178700935
un  8092.0  8.998631198287637
nt  8146.0  9.005282288208358
or  8246.0  9.017483513266844
ne  8439.0  9.040619097157967
ar  8557.0  9.054504940418054
is  8772.0  9.079320109537779
li  9116.0  9.117786390365575
ra  9694.0  9.17926241640532
al  9719.0  9.181838011503471
ri  9750.0  9.185022563991893
an  10166.0  9.226804098006848
le  10956.0  9.301642530382969
st  11079.0  9.312806703520673
en  11086.0  9.31343832997975
on  11881.0  9.382695764458287
at  12503.0  9.433723894495
te  13058.0  9.477156251746578
ng  13273.0  9.493487175626202
ed  13527.0  9.512442967089195
re  13568.0  9.515469358031684
ti  14364.0  9.572480355345974
es  20561.0  9.931151356518601
in  22393.0  10.016503689004832
er  23156.0  10.050009205198895

This entry was posted in NousRandom, Orthography.