Orthography Statistics (Part 2: Count Plots)

In Part 1, I posted tables showing how many times each orthographic element occurs in a dictionary of 109,462 English words. Now let us look at those numbers. The goal of this discussion is to find a way to pare out letter combinations that are rarely used, or that create unusual or unlikely groupings. With those removed, the Orthographic Password Creator should produce more pronounceable, less complicated words that are simpler to remember.
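For readers who want to produce similar counts from their own word list, here is a minimal Python sketch. It is not the code used for Part 1: the function name `count_groups` is hypothetical, and it assumes a "count" means the number of times a letter group appears as a substring anywhere in any word (overlapping matches included). The real counting rules may differ.

```python
from collections import Counter


def count_groups(words, groups):
    """Count how often each letter group occurs across a list of words.

    Assumption: an occurrence is any substring match, overlaps included.
    """
    counts = Counter({g: 0 for g in groups})  # keep zero-count groups visible
    for word in words:
        w = word.lower()
        for g in groups:
            start = w.find(g)
            while start != -1:
                counts[g] += 1
                start = w.find(g, start + 1)  # step by one to allow overlaps
    return counts
```

Groups that never occur stay in the result with a count of zero, which matters later when taking logarithms.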

The elements are divided into consonant and vowel orthography groups. Within each, the elements are grouped by length, from 1 to 5 letters. Single letters will always be used, so there is no need to examine those. As the count tables show, groups of 4 and 5 letters occur much less frequently. Additionally, when one 4- or 5-letter group is randomly paired with another, the resulting word becomes significantly more complex. One goal is to use a set of orthography elements such that each consonant/vowel pairing results in a single syllable. Tests show that the larger groups tend to add syllables.

Therefore, I am removing all 4 and 5 letter groups from the lists.
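As a rough illustration, removing the longer groups amounts to a simple length filter. The function name `drop_long_groups` and the sample 4- and 5-letter strings below are hypothetical, used only to show the idea.

```python
def drop_long_groups(groups, max_len=3):
    """Keep only letter groups of at most max_len letters."""
    return [g for g in groups if len(g) <= max_len]


# "e", "ea", and "igh" appear in the tables; "eigh" and "ought" are
# made-up stand-ins for 4- and 5-letter groups.
kept = drop_long_groups(["e", "ea", "igh", "eigh", "ought"])
print(kept)
```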

This leaves groups with 1 to 3 letters per group. Since all of the single letters will always be used, we need only examine the 2-letter and 3-letter groups. The following graphs show the counts in sorted order. The X-axis is an index number that refers to the associated table row. I could not find an uncluttered way to put the letters along the X-axis, so use the index number to cross-reference the data tables at the end of this post.

There are two graphs each for vowels and consonants, for both pairs and triplets, so eight graphs in total. As you can see, the sorted count values rise roughly along an exponential curve. Each count graph is followed by a logarithmic graph of the same values, for which I simply took the natural log of each count. So without further ado, here are the graphs:

[Graphs: Vowel Two Linear, Vowel Two Log, Consonant Two Linear, Consonant Two Log, Vowel Three Linear, Vowel Three Log, Consonant Three Linear, Consonant Three Log]
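The values behind each linear/log pair of graphs can be sketched in a few lines of Python. This is an illustration, not the original tooling: `log_table` is a hypothetical name, and it assumes a zero count is reported as negative infinity, which is how the "vv" row appears in the consonant table.

```python
import math


def log_table(counts):
    """Return (letters, count, ln(count)) rows sorted by ascending count."""
    rows = []
    for letters, count in sorted(counts.items(), key=lambda kv: kv[1]):
        # ln(0) is undefined, so a zero count is reported as -infinity
        ln = math.log(count) if count > 0 else float("-inf")
        rows.append((letters, count, ln))
    return rows


# Three rows taken from the tables in this post
for row in log_table({"aa": 38, "vv": 0, "uy": 22}):
    print(row)
```

The position of each row in the sorted list corresponds to the index number on the X-axis.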

 

PAIRED VOWEL TABLE

Vowel Letters  Vowel Count  Natural Log
uy             22           3.0910424534
aa             38           3.6375861597
ii             38           3.6375861597
ao             63           4.1431347264
ez             140          4.9416424226
oh             140          4.9416424226
uo             261          5.5645204073
yr             263          5.5721540322
ah             281          5.6383546693
ae             338          5.8230458955
oy             352          5.8636311756
eh             389          5.9635793436
eu             569          6.3438804341
ye             578          6.3595738687
oe             602          6.4002574453
ey             639          6.4599044544
aw             733          6.5971457019
eo             822          6.7117403951
ew             842          6.7357800142
ei             1013         6.9206715042
wo             1018         6.9255951971
gh             1057         6.9631899859
ay             1072         6.9772813416
oa             1080         6.9847163201
ue             1153         7.0501225203
ui             1158         7.0544496581
eg             1219         7.1057861295
ua             1231         7.1155821262
oi             1232         7.1163941441
au             1303         7.1724245771
ow             1715         7.4471683596
ir             2155         7.6755460025
ig             2167         7.6810990015
ai             2205         7.6984827879
oo             2455         7.8058820402
ee             2561         7.8481530862
ut             2664         7.8875840317
ot             2784         7.9316440215
ia             3321         8.1080212214
ur             3846         8.2547889261
et             4207         8.3445050836
ou             4500         8.4118326758
ea             4588         8.4311994782
ie             6374         8.759982495
ic             6893         8.8382616829
or             7630         8.9398431243
ar             7866         8.970304953
is             8144         9.0050367388
al             8993         9.1042017759
es             18963        9.8502449911
er             21015        9.9529917474

PAIRED CONSONANT TABLE

Consonant Letters  Consonant Count  Natural Log
vv                 0                -Infinity
zh                 5                1.6094379124
xs                 14               2.6390573296
cn                 15               2.7080502011
mh                 16               2.7725887222
cz                 26               3.258096538
kk                 27               3.295836866
bh                 31               3.4339872045
pb                 49               3.8918202981
lh                 63               4.1431347264
cq                 64               4.1588830834
pn                 64               4.1588830834
kh                 79               4.3694478525
dh                 117              4.7621739348
dj                 118              4.7706846245
tz                 129              4.8598124044
bt                 133              4.8903491282
gm                 178              5.1817835503
xc                 200              5.2983173665
mn                 223              5.4071717715
ln                 225              5.4161004022
zz                 249              5.5174528965
dn                 253              5.5333894887
rh                 254              5.537334267
lf                 260              5.560681631
kn                 288              5.6629604801
wr                 292              5.6767538023
lm                 302              5.7104270174
nh                 304              5.7170277014
lk                 327              5.7899601709
tw                 351              5.8607862235
dg                 418              6.0354814325
xe                 429              6.0614569189
cs                 431              6.0661080901
sw                 516              6.2461067655
wh                 551              6.3117348092
ve                 578              6.3595738687
cc                 653              6.4815771293
dd                 674              6.5132301109
bb                 694              6.5424719605
gn                 728              6.5903010482
ld                 746              6.6147256002
zi                 759              6.6320017774
gg                 846              6.7405193596
ks                 962              6.8690144507
gu                 1006             6.9137373507
pt                 1013             6.9206715042
gh                 1057             6.9631899859
mm                 1081             6.9856418176
nn                 1092             6.9957661563
ps                 1127             7.027314514
ff                 1180             7.0732697175
mb                 1293             7.1647203788
we                 1328             7.19142933
pp                 1398             7.2427979228
rr                 1529             7.3323692059
fe                 1570             7.3588308983
qu                 1690             7.4324838079
ph                 1945             7.5730172561
ze                 1962             7.5817196401
tt                 1963             7.5822291943
cu                 1974             7.58781722
gi                 2164             7.67971364
mp                 2288             7.7354333525
ke                 2372             7.7714887601
rt                 2437             7.7985230536
be                 2497             7.8228452903
sc                 2497             7.8228452903
ck                 2534             7.8375543609
ct                 2800             7.9373746962
ci                 2915             7.9776250988
th                 3225             8.0786882292
ge                 3504             8.1616604521
nc                 3643             8.200562797
sh                 3694             8.2144651608
ts                 3791             8.2403851155
ce                 3859             8.2581633615
nd                 4178             8.3375879421
si                 4447             8.3999849905
ch                 4457             8.4022311729
pe                 4458             8.4024555139
me                 4913             8.4996400322
ll                 5071             8.5312933158
di                 5150             8.5467519937
ss                 5721             8.6518988943
se                 5807             8.6668193654
de                 6913             8.8411589759
ne                 7635             8.9404982177
le                 9900             9.2002900361
st                 10327            9.2425171037
ed                 11866            9.3814324468
ng                 12011            9.3935781756
te                 12027            9.3949094013
re                 12695            9.4489634941
ti                 13497            9.5102227175

TRIPLET VOWEL TABLE

Vowel Letters  Vowel Count  Natural Log
aae            1            0
eie            1            0
iee            1            0
aoh            2            0.6931471806
uye            3            1.0986122887
aah            4            1.3862943611
aow            4            1.3862943611
aar            5            1.6094379124
aie            5            1.6094379124
aue            5            1.6094379124
oea            5            1.6094379124
oeh            7            1.9459101491
oeu            8            2.0794415417
eah            9            2.1972245773
aor            10           2.302585093
awy            11           2.3978952728
uoy            13           2.5649493575
ieu            21           3.0445224377
ooe            37           3.6109179126
eir            38           3.6375861597
iew            44           3.7841896339
aig            47           3.8501476017
oye            55           4.0073331852
awe            57           4.0430512678
eor            71           4.262679877
oup            81           4.3944491547
aer            84           4.4308167988
eau            84           4.4308167988
uet            90           4.4998096703
aur            91           4.5108595065
oul            106          4.6634390941
oor            116          4.7535901911
ewe            145          4.9767337424
ais            149          5.0039463059
eou            149          5.0039463059
eye            162          5.0875963352
aye            164          5.0998664278
eur            168          5.1239639794
uar            176          5.170483995
irr            177          5.1761497326
eig            180          5.1929568509
oar            189          5.2417470151
urr            237          5.4680601411
eer            256          5.5451774445
air            339          5.8260001074
owe            360          5.8861040315
err            411          6.0185932145
ach            460          6.1312264895
arr            464          6.1398845522
our            490          6.1944053911
are            522          6.2576675879
oll            538          6.2878585602
ert            572          6.3491389914
ete            671          6.508769137
igh            686          6.5308776277
ear            744          6.6120410348
ore            753          6.6240652278
ure            774          6.6515718736
olo            935          6.8405465293
ere            1445         7.2758646005
ier            1449         7.2786289423
ers            5161         8.5488856381

TRIPLET CONSONANT TABLE

Consonant Letters  Consonant Count  Natural Log
xsc                1                0
xsw                2                0.6931471806
pph                8                2.0794415417
tth                8                2.0794415417
cht                36               3.5835189385
ngh                38               3.6375861597
rrh                40               3.6888794541
tsh                41               3.7135720667
chs                46               3.8286413965
lks                52               3.9512437186
cqu                64               4.1588830834
chm                89               4.4886363697
sth                98               4.5849674787
lve                163              5.0937502008
kes                176              5.170483995
gne                181              5.1984970313
gue                181              5.1984970313
sci                216              5.3752784077
sch                236              5.463831805
mbe                261              5.5645204073
dge                291              5.6733232672
gge                291              5.6733232672
phe                311              5.7397929122
sce                315              5.7525726388
ffe                318              5.7620513828
mme                357              5.8777357818
rre                420              6.0402547113
nne                421              6.0426328337
sne                427              6.0567840132
cks                459              6.1290502101
que                501              6.2166061011
ppe                503              6.2205901701
tch                517              6.2480428745
ght                671              6.508769137
shi                796              6.6795991858
she                848              6.7428806358
sse                871              6.7696419769
chi                888              6.788971743
ssi                909              6.8123450942
tte                910              6.8134445995
lle                933              6.8384052008
the                1090             6.9939329752
che                1234             7.1180162045

 

 
