The Voynich Cipher - Anagram Analysis

Several people, notably Robert Teague and Philip Neal, have theorized that the cipher is based on anagrams. The idea is that to decipher the Voynich Manuscript (VMs) text, you first convert the VMs symbols to alphabet letters, and then find an anagram of those letters that forms a valid plaintext word. Variations on this theme are that, e.g., some of the plaintext letters in the target word may be missing and can be inferred from "context", and/or that the VMs words are anagrams of the plaintext words with the letters arranged alphabetically. For this analysis, we look at Folio 68r3 in particular:
The labels on the stars, in the Voyn 101 encoding, are as follows (starting from the ten o'clock pie slice and going anti-clockwise):

<68r3.pieslice_1>61oe7a9.8oay9.oae1coe=
<68r3.pieslice_2>oh1o89.8ayaee.ok98*.oK98=
<68r3.pieslice_3>ohoe19.1G9.179,9h9=
<68r3.pieslice_4>ohos.okoy9=

Anagrams: Combinations/Permutations

If you make the reasonable assumption that all the labels on the star shapes in f68r3 are star names, and you assume a target language, then this puts severe constraints on the cipher, since there are only so many star names and only so many cipher schemes that can be consistent across all the stars. For example, the first label yields Aldebaran with this mapping:

S: 6 1 o e 7 a 9
R: d r e ba an a l

61oe7a9 -> drebaanal -> aldebaran
(Since writing this, Don Latham has sent me a link to his list of star names at www.sixmilesystems.com/voynich. I have simplified the list by removing duplicates and non-alphabetic symbols, and put it here: http://pcbunn.cacr.caltech.edu/Voynich/StarNamesLatham.txt )
Choosing Alcyone (for reasons which will be obvious to some), you now have an extended mapping that includes 8 and y:

S: 6 1 o e 7 a 9 8 y
R: d r e ba an a l cy on

61oe7a9 -> drebaanal -> aldebaran
8oay9 -> cyeaonl -> alcyone
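A mapping like this is easy to check mechanically: expand each symbol into its letter(s), then compare letter counts with the star name. The Python sketch below does that for the two labels above. The names `MAPPING`, `decipher` and `is_anagram` are illustrative only, not part of any existing tool.

```python
from __future__ import annotations

from collections import Counter

# Mapping from the Aldebaran/Alcyone example: one VMs symbol -> one or more letters.
MAPPING = {
    "6": "d", "1": "r", "o": "e", "e": "ba", "7": "an",
    "a": "a", "9": "l", "8": "cy", "y": "on",
}


def decipher(label: str, mapping: dict[str, str]) -> str:
    """Replace each VMs symbol by its plaintext letter(s), keeping label order."""
    return "".join(mapping[symbol] for symbol in label)


def is_anagram(letters: str, word: str) -> bool:
    """True if both strings contain exactly the same letters, ignoring order."""
    return Counter(letters) == Counter(word)


for label, star in [("61oe7a9", "aldebaran"), ("8oay9", "alcyone")]:
    plain = decipher(label, MAPPING)
    print(label, "->", plain, "->", star, is_anagram(plain, star))
```

Comparing `Counter`s rather than sorted strings makes the multiset nature of an anagram explicit, which is also what the fuzzy matching later on needs.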
Allowing Missing Characters

Here again are the 12 star labels on f68r3:

61oe7a9 8oay9 oae1coe oh1o89 8ayaee ok98* oK98 ohoe19 1G9 179,9h9 ohos okoy9

There are 16 different VMs symbols used:

o 9 1 e a 8 h y 7 k 6 c * K G s
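The symbol count, and the size of the search space it implies, can be checked with a few lines of Python. This sketch drops the comma in 179,9h9, matching the 1799h9 spelling used in the deciphered tables further down. `Counter.most_common()` sorts by count and keeps first-seen order for ties, which reproduces the symbol order listed above.

```python
import math
from collections import Counter

# The 12 star labels on f68r3 (comma in "179,9h9" dropped).
LABELS = ["61oe7a9", "8oay9", "oae1coe", "oh1o89", "8ayaee", "ok98*",
          "oK98", "ohoe19", "1G9", "1799h9", "ohos", "okoy9"]

counts = Counter("".join(LABELS))
# Most frequent first; equal counts stay in order of first appearance.
symbols = [symbol for symbol, _ in counts.most_common()]
print(len(symbols), " ".join(symbols))

# One-to-one mappings of these symbols onto a fixed set of 16 letters: 16!
print(f"{math.factorial(len(symbols)):,}")
```

This prints `16 o 9 1 e a 8 h y 7 k 6 c * K G s` followed by `20,922,789,888,000`.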
Let's assume that each VMs symbol maps to a single plaintext letter. Which 16 of the 26 letters of the alphabet should we choose? Let's look at our list of star names and find the 16 most used letters (shown in order of frequency):

a i r e l s h n u t b k m d o c

How many different ways can we map the 16 VMs symbols to these 16 letters? That is 16 factorial, 16! = 20,922,789,888,000, which is about 21 trillion (give or take), and is quite a lot. To explore this huge space of possibilities, we can use a Monte Carlo method. Basically, we write a program to shuffle the 16 letters, evaluate how well that mapping works, then shuffle again, and so on. As we go, we keep track of the best mapping found so far. Suppose we have the following mapping (cipher):

S: o 9 1 e a 8 h y 7 k 6 c * K G s
R: a e n c i l h r o s u d k t b m
Let's go through the f68r3 star labels and convert them to plaintext using the cipher. After converting them, we look the plaintext letters up in our list of star names using a "fuzzy match". Here are the results:

61oe7a9 -> unacoie -> ?
8oay9 -> laire -> albireo (bo)
oae1coe -> aicndac -> ?
oh1o89 -> ahnale -> alhena ()
8ayaee -> liricc -> ?
ok98* -> aselk -> sheliak (hi)
oK98 -> atel -> elnath (nh)
ohoe19 -> ahacne -> achernar (rr)
1G9 -> nbe -> deneb (de)
1799h9 -> noeehe -> ?
ohos -> aham -> hamal (l)
okoy9 -> asare -> antares (nt)
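The conversion and lookup in the table above can be sketched in Python. The exact fuzzy-match rules are not spelled out on this page, so this sketch assumes a rule inferred from the examples: every deciphered letter must occur in the star name, and the name may have at most N = 2 letters left over (the bracketed letters). When several names qualify, it assumes the one with the fewest missing letters wins, with ties going to the earlier name in the list. `STARS` is a short stand-in for the full star list, and all function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Cipher from the table above: S (VMs symbol) -> R (plaintext letter).
S = "o 9 1 e a 8 h y 7 k 6 c * K G s".split()
R = "a e n c i l h r o s u d k t b m".split()
CIPHER = dict(zip(S, R))


def missing_letters(plain: str, name: str) -> str | None:
    """Letters of `name` left over after using up every letter of `plain`.

    Returns None if `plain` has a letter that `name` cannot supply.
    """
    unused = Counter(plain)
    missing = []
    for ch in name:
        if unused[ch] > 0:
            unused[ch] -= 1
        else:
            missing.append(ch)
    if +unused:  # some deciphered letter was never found in the name
        return None
    return "".join(missing)


def fuzzy_match(plain: str, names: list[str], max_missing: int = 2):
    """Return (name, missing) with the fewest missing letters, or None."""
    best = None
    for name in names:
        missing = missing_letters(plain, name)
        if missing is not None and len(missing) <= max_missing:
            if best is None or len(missing) < len(best[1]):
                best = (name, missing)
    return best


# A few names from the results above, standing in for the full star list.
STARS = ["albireo", "alhena", "sheliak", "elnath", "achernar",
         "deneb", "hamal", "antares"]

for label in ["8oay9", "oh1o89", "1G9", "8ayaee"]:
    plain = "".join(CIPHER[symbol] for symbol in label)
    print(label, "->", plain, "->", fuzzy_match(plain, STARS))
```

With this stand-in list, `8oay9` gives `('albireo', 'bo')`, `oh1o89` gives `('alhena', '')`, `1G9` gives `('deneb', 'de')` and `8ayaee` gives `None`, in line with the table.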
Taking the second label as an example, it is converted to the plaintext characters "laire". The fuzzy match looks up "laire" in the star names list and finds a match with the star called "albireo".

Fuzzy Match Rules

There is a match between the deciphered plaintext word and a dictionary word if
In the example shown above, N is 2, so "laire" fuzzy matches "albireo", with the letters "bo" missing (shown in brackets).

Solutions

In the above example, 8 of the 12 VMs star labels have been matched to valid star names. The application continues exploring the 16! possible arrangements, trying to improve on the number of matches. After looking at around 70 million arrangements (i.e. about 3 millionths of them), it finds this:

Iteration 67334517 Deciphered=9/12
S: o 9 1 e a 8 h y 7 k 6 c * K G s
R: a e l b r h c d o t i u n s k m
61oe7a9 -> ilabore -> borealis (s)
8oay9 -> harde -> schedar (sc)
oae1coe -> arbluab -> ?
oh1o89 -> aclahe -> alphecca (pc)
8ayaee -> hrdrbb -> ?
ok98* -> atehn -> elnath (l)
oK98 -> aseh -> scheat (ct)
ohoe19 -> acable -> cebalrai (ri)
1G9 -> lke -> alkes (as)
1799h9 -> loeece -> ?
ohos -> acam -> almach (lh)
okoy9 -> atade -> tarazed (rz)

Just how interesting, plausible or believable is this? We can run a control experiment using a dictionary of dog breeds of about the same size: does mapping the VMs labels on f68r3 to star names produce a better fit than mapping them to dog breeds? We run the program for a few million mappings, first with the star names, then with the dog breed names. After about 4 million mappings, there is a slightly better mapping to dog breeds (9 out of 12) than to star names (8 out of 12):

Dogs

Iteration 3779182 Deciphered=9/12
S: o 9 1 e a 8 h y 7 k 6 c * K G s
R: e r o n h l b s d c a p t u i g
61oe7a9 -> aoendhr -> rhodesian (si)
8oay9 -> lehsr -> charles (ca)
oae1coe -> ehnopen -> ?
oh1o89 -> eboelr -> boerboel (bo)
8ayaee -> lhshnn -> ?
ok98* -> ecrlt -> central (na)
oK98 -> eurl -> tulear (ta)
ohoe19 -> ebenor -> redbone (d)
1G9 -> oir -> corgi (cg)
1799h9 -> odrrbr -> ?
ohos -> ebeg -> beagle (al)
okoy9 -> ecesr -> crested (td)
Stars

Iteration 4208178 Deciphered=8/12
S: o 9 1 e a 8 h y 7 k 6 c * K G s
R: a i e c o l b s t h n u r d k m
61oe7a9 -> neactoi -> ?
8oay9 -> laosi -> polaris (pr)
oae1coe -> aoceuac -> ?
oh1o89 -> abeali -> algieba (g)
8ayaee -> losocc -> ?
ok98* -> ahilr -> alphirk (pk)
oK98 -> adil -> alkaid (ka)
ohoe19 -> abacei -> cebalrai (lr)
1G9 -> eki -> keid (d)
1799h9 -> etiibi -> ?
ohos -> abam -> markab (rk)
okoy9 -> ahasi -> nashira (nr)
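The shuffle-and-score search, and the dog-breed control, can be sketched as a simple Monte Carlo loop. This is not the original program, only a minimal Python sketch under stated assumptions. It assumes the fuzzy rule inferred from the tables (every deciphered letter occurs in the name, with at most two letters left over). It also assumes the 16 target letters are the 16 most frequent letters of whichever dictionary is used; the dog run's R row uses a different letter set from the star runs, which suggests this. `top_letters`, `fits`, `score` and `monte_carlo` are hypothetical names.

```python
from __future__ import annotations

import random
from collections import Counter

# The 12 f68r3 labels, with the comma in "179,9h9" dropped.
LABELS = ["61oe7a9", "8oay9", "oae1coe", "oh1o89", "8ayaee", "ok98*",
          "oK98", "ohoe19", "1G9", "1799h9", "ohos", "okoy9"]


def top_letters(names: list[str], k: int) -> list[str]:
    """The k most frequent letters in a dictionary of names."""
    return [ch for ch, _ in Counter("".join(names)).most_common(k)]


def fits(plain: str, name: str, max_missing: int) -> bool:
    """Assumed fuzzy rule: every letter of `plain` occurs in `name`,
    and `name` has at most `max_missing` letters left over."""
    if Counter(plain) - Counter(name):  # a deciphered letter is unavailable
        return False
    return len(name) - len(plain) <= max_missing


def score(cipher: dict[str, str], names: list[str], max_missing: int = 2) -> int:
    """How many labels decipher to something that fits at least one name."""
    hits = 0
    for label in LABELS:
        plain = "".join(cipher[symbol] for symbol in label)
        if any(fits(plain, name, max_missing) for name in names):
            hits += 1
    return hits


def monte_carlo(names: list[str], iterations: int, seed: int = 0):
    """Shuffle the target letters repeatedly and keep the best-scoring mapping."""
    rng = random.Random(seed)
    symbols = sorted(set("".join(LABELS)))
    letters = top_letters(names, len(symbols))
    if len(letters) < len(symbols):
        raise ValueError("dictionary uses fewer distinct letters than there are symbols")
    best_score, best_cipher = -1, {}
    for _ in range(iterations):
        rng.shuffle(letters)
        cipher = dict(zip(symbols, letters))
        current = score(cipher, names)
        if current > best_score:
            best_score, best_cipher = current, cipher
    return best_score, best_cipher
```

For the control experiment you would call `monte_carlo(star_names, n)` and `monte_carlo(dog_breeds, n)` with the same `n` and compare the two best scores. Here `star_names` and `dog_breeds` are placeholder lists you would load yourself, for example from the star-name file linked above.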
Of course, this doesn't disprove that the labels on f68r3 are in fact star names: it may well be that one of the 16! arrangements produces a perfect fit, with "61oe7a9" deciphered to "aldebaran" and "8oay9" deciphered to "alcyone", etc.

Character Position Based Cipher

Looking just at the star labels in Folio 68r, we can extract the letter/symbol frequencies as a function of position in the label:

1: o 1 8 4 9 k 2 A c y 7 W s ? G e 6 g ▐ h
2: o h k 1 c a 8 e y j C 9 K H 7 2 s u m J + f G ┘ W
3: o c 1 9 h a C e y 8 k K d U + 2 * s H n ª I 6 m Z J
4: o c 9 e a 8 y 1 C h i s k 7 A Q H 2 m J 3 ? d
5: 9 o 8 e y 1 c a m h s K * C 7 S 5 2
6: 9 e 8 a 1 y o 7 c s m i
7: 9 e a c y I o m p 1
8: 9 8 e y 7
9: a 9 e
1: a m s k h d r t e n b f z j p u c g i w v l o y
2: a l u i e h c r s d n o t z k w b g j m f y x
3: a r h i b n s m l t u d k g c e f z w j y p o v
4: a i e r h b d m l n u s k c f o t z w g p j y v
5: a i r l n e t h k b s m u d c f g y o z j w p
6: a h r i e t l n m b s d u c k z o y g p f j v
7: a e h n i r l b s m t o y d c u k z g f w v
8: a i h n e c t l s b o u k r f z m g y p
9: h a t n i e s u r o g z k d p c v m b
1T: a al m s k h d sa r t ha e n ma b ka f mi z j
1S: o 1 8 4 9 k 2 A c y 7 W s ? G e 6 g ▐ h
2T: a l u i e h ar ha c r ab la al ai s d lg ub as n
2S: o h k 1 c a 8 e y j C 9 K H 7 2 s u m J
3T: a r h i b n s m l t u d k g c e ra ha f z
3S: o c 1 9 h a C e y 8 k K d U + 2 * s H n
4T: a i e r h b d m l n u s ra k at ar al ha en c
4S: o c 9 e a 8 y 1 C h i s k 7 A Q H 2 m J
5T: a i r l n e t h k ah b s m u ar ra d c at f
5S: 9 o 8 e y 1 c a m h s K * C 7 S 5 2
6T: a h r i e t l n m b ah s d u c k an la ab ar
6S: 9 e 8 a 1 y o 7 c s m i
7T: a e h n i r l b s m t ah o y d c u k at an
7S: 9 e a c y I o m p 1
8T: a i h n e c t l s in b o u ch k r f et at z
8S: 9 8 e y 7
9T: h a t n i e s u r ha he o to g z ze ab th ra k
9S: a 9 e
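Per-position rankings like the ones above can be produced with a short Python sketch. It assumes, as with the letter list earlier on this page, that each row lists characters most frequent first. It is run here only on the 12 f68r3 labels, since the full set of Folio 68r star labels behind the tables is not reproduced on this page, so its output will not match the tables. It also ranks single characters only, whereas the T rows above include letter pairs such as "al" and "ha". `positional_ranking` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def positional_ranking(words: list[str]) -> dict[int, list[str]]:
    """For each 1-based position, the characters seen there, most frequent first."""
    by_position: dict[int, Counter] = defaultdict(Counter)
    for word in words:
        for pos, ch in enumerate(word, start=1):
            by_position[pos][ch] += 1
    # most_common() keeps first-seen order among equal counts.
    return {pos: [ch for ch, _ in counter.most_common()]
            for pos, counter in sorted(by_position.items())}


# The 12 f68r3 labels (comma in "179,9h9" dropped).
LABELS = ["61oe7a9", "8oay9", "oae1coe", "oh1o89", "8ayaee", "ok98*",
          "oK98", "ohoe19", "1G9", "1799h9", "ohos", "okoy9"]

for pos, ranking in positional_ranking(LABELS).items():
    print(f"{pos}: {' '.join(ranking)}")
```

For these 12 labels the first line printed is `1: o 8 1 6`.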
03/17/2009 by Julian Bunn, email: Julian.Bunn@caltech.edu