The Voynich Manuscript

This is a collection of various computational methods I have used in an attempt to get some understanding of the cipher used in the Voynich manuscript.

Strong's Cipher

Supposing that Strong's conjecture about the structure of the VMs cipher is correct, we use a computational method, with the help of a large Latin dictionary, to try to infer the alphabets and their sequence.

Details are here.

Mapping the Voynich Manuscript to Music

A Java Media Framework MIDI application processes the Voynich text and maps each character to a note on the MIDI scale in a two-octave range: Note = 48 + i mod 24

Each word is played as one or more chords of these notes, using the Grand Piano instrument.

If a gallows character ("h" or "k") appears in the word, the notes preceding the gallows character are played as a chord, followed shortly by the notes after the gallows character, shifted by either one octave ("h") or two octaves ("k"). If the character "9" is the last in the word, the delay before the next word is played is reduced.
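The rules above can be sketched in a few lines of Python (this is not the original Java Media Framework code). Several details are assumptions made purely for illustration: the glyph ordering in ALPHABET that supplies the index i, the reading of the formula as 48 + (i mod 24), the upward direction of the octave shift, the choice that a gallows character sounds no note of its own, and the pause lengths.

```python
# Assumed glyph ordering; the index i used by the original application is not documented.
ALPHABET = "o91a8cheykm24s7,KpCnHgA3jz(f"

# Gallows characters and the number of octaves by which the notes after them are shifted.
GALLOWS_OCTAVES = {"h": 1, "k": 2}


def note_for(glyph):
    """MIDI note for one glyph, read as 48 + (i mod 24), i.e. notes 48..71."""
    i = ALPHABET.index(glyph)  # raises ValueError for a glyph outside ALPHABET
    return 48 + (i % 24)


def word_to_chords(word):
    """Split a word at its gallows characters into a list of chords (lists of notes)."""
    chords = []
    current = []
    shift = 0
    for glyph in word:
        if glyph in GALLOWS_OCTAVES:
            if current:
                chords.append(current)
            current = []
            # Assumption: the shift is upwards, 12 semitones per octave.
            shift = 12 * GALLOWS_OCTAVES[glyph]
        else:
            current.append(note_for(glyph) + shift)
    if current:
        chords.append(current)
    return chords


def pause_after(word, normal=0.5, reduced=0.25):
    """Delay in seconds before the next word; both values are hypothetical."""
    return reduced if word.endswith("9") else normal


for w in ("8am", "okoe", "4ohc89"):
    print(w, word_to_chords(w), pause_after(w))
```

With this assumed ordering, "8am" becomes a single chord, while "okoe" becomes one note followed by a two-note chord two octaves higher.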

Using this method, here is what the common Voynich word "8am" sounds like: 8am

The word "okoe": okoe

The word "4ohc89": 4ohc89

And the complete Folio f1r (a long pause is taken after each paragraph on the folio): Folio f1r

And Folio f3r

 

Folio Similarities

More details here.

A Phonetic Attack

This study compares the phonetic content of various texts with VMs "words", using "Soundex" and "Double Metaphone".

For details, go here.

Phrase Analysis

This study is based on the idea that the Voynich "Words" are in fact codes for parts of plaintext words. The Voynich symbol "9" is used to delimit the words. A Genetic Algorithm is used to explore the cipher, using a large body of common Latin phrases (as opposed to single words).

For details, go here.

Vowel/Consonant Group Encryption Scheme

For details, go here.

Transitions between Glyphs

This study looks at the probabilities associated with glyph sequences in the VM. E.g. which is the most likely glyph to find at the beginning of a word in the herbal section?

For details, go here.
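As a rough illustration of the kind of statistic involved, the sketch below estimates word-initial glyph probabilities and glyph-to-glyph transition probabilities from a word list. It assumes that each transcription character stands for one glyph, and the five sample words are simply taken from elsewhere on this page rather than from a real count of the herbal section.

```python
from collections import Counter, defaultdict


def first_glyph_probabilities(words):
    """Relative frequency of each glyph in word-initial position."""
    counts = Counter(w[0] for w in words if w)
    total = sum(counts.values())
    return {glyph: c / total for glyph, c in counts.items()}


def transition_probabilities(words):
    """P(next glyph | current glyph), estimated from adjacent pairs inside words."""
    pair_counts = defaultdict(Counter)
    for w in words:
        for current, following in zip(w, w[1:]):
            pair_counts[current][following] += 1
    return {
        current: {g: c / sum(followers.values()) for g, c in followers.items()}
        for current, followers in pair_counts.items()
    }


# Illustrative sample only, not a real corpus.
sample = ["8am", "1oe", "1oy", "8ay", "4ohc89"]
first = first_glyph_probabilities(sample)
trans = transition_probabilities(sample)
print(max(first, key=first.get), first)
print(trans["8"])
```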

Anagram Analysis of Star Labels etc.

This investigation was inspired by Robert Teague's deciphering of e.g. f68r3.

For details, go here.

Analysis of the Prefixes, Stems and Suffixes in the Voynich Manuscript

Extract:

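For N=3, looking at the Herbal folios f1v-f20v inclusive: 1331 different words.

Confirmed valid prefix/stem/suffix counts: 99 prefixes, 252 stems, 111 suffixes.
Prefix/Stem/Suffix frequency, normalised (each row gives a prefix, a stem and a suffix, each followed by its frequency):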
4ok     0.1010101               o89     0.05952381              o89     0.09009009 
4oh     0.07070707              1oe     0.055555556             8am     0.09009009 
1oe     0.060606062             4ok     0.055555556             1c9     0.054054055 
1oh     0.04040404              8am     0.04761905              1oy     0.054054055 
ok1     0.04040404              4oh     0.04761905              1oe     0.045045044 
8oe     0.030303031             1oy     0.03968254              coe     0.036036037 
1oy     0.030303031             1c9     0.031746034             cc9     0.027027028 
1co     0.030303031             1co     0.023809524             e89     0.027027028 
1ok     0.030303031             8oe     0.023809524             ham     0.027027028 
4oj     0.030303031             coe     0.01984127              2c9     0.027027028 

For more results, go here.

Useful Links

This is by no means an exhaustive list, but these are some of the Voynich sites/pages I have found useful, interesting and/or amusing.

Rene Zandbergen's site

http://www.voynich.nu/

Voynich Central

http://www.voynichcentral.com/

Elmar Vogt's blog

http://voynichthoughts.wordpress.com/

H.R. Santa Coloma's site

http://www.santa-coloma.net/voynich_drebbel/voynich.html

Philip Neal's webpages

http://voynichcentral.com/users/philipneal/

Jan Hurych's VM Letter Frequency Analysis

http://hurontaria.baf.cz/CVM/b2.htm

Voynich Transcription

http://voynichcentral.com/transcriptions/Voynich-101/index.html
 
Edith Sherwood's site
 
http://www.edithsherwood.com/voynich_botanical_plants/
 
A list of Latin herb names compiled from various sources

http://pcbunn.cacr.caltech.edu/Voynich/LatinHerbs.txt
 
Don Latham's site (with lists of Star names, constellations etc.)
 
http://www.sixmilesystems.com/voynich
 
Francois Almaleh's site
 
http://www.almaleh.com/ms-voynich/index.html
 
 

Attempts to crack the cipher with Genetic Algorithms

Some Assumptions about the VM text

My assumption in the work with Genetic Algorithms (GAs) is that the Voynich text is a terse, compressed, abbreviated n-Gram conversion of the plaintext language. The scribe would have consulted a table of n-Grams while converting the Latin (or whatever) to the VM text, perhaps even writing the enciphered words out on a separate work sheet before copying them into the VM.

For example, assume the following extract from a set of possible n-Gram conversion rules (or "mappings") used by the scribe:

Source n-Gram Voynich n-Gram
pot oc
he c
is 9
h f
s 8
y a

then, to convert the English word "hypothesis" to Voynichese, we do as follows:

h => f

y => a

pot => oc

he => c

s => 8

is => 9

so that "hypothesis" => "faocc89"

The inverse mapping can be used to decode the Voynich word (perhaps not unambiguously ...).
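The worked example can be reproduced with a short Python sketch. The rule table is the one given above; the greedy longest-match strategy used for encoding is an assumption (the way a scribe would choose between overlapping n-Grams is not specified), and the extra rule "t" => "c" in the second table is purely hypothetical, added only to show how decoding becomes ambiguous.

```python
RULES = {"pot": "oc", "he": "c", "is": "9", "h": "f", "s": "8", "y": "a"}


def encode(word, rules):
    """Encode by repeatedly taking the longest source n-Gram that matches (assumed strategy)."""
    max_len = max(len(src) for src in rules)
    out = []
    pos = 0
    while pos < len(word):
        for n in range(min(max_len, len(word) - pos), 0, -1):
            chunk = word[pos:pos + n]
            if chunk in rules:
                out.append(rules[chunk])
                pos += n
                break
        else:
            raise ValueError(f"no rule covers {word[pos:]!r}")
    return "".join(out)


def decodings(cipher, rules):
    """Return every plaintext that the inverse mapping allows for a cipher word."""
    inverse = {}
    for src, dst in rules.items():
        inverse.setdefault(dst, []).append(src)

    def expand(pos):
        if pos == len(cipher):
            return [""]
        results = []
        for dst, sources in inverse.items():
            if cipher.startswith(dst, pos):
                for tail in expand(pos + len(dst)):
                    results.extend(src + tail for src in sources)
        return results

    return expand(0)


print(encode("hypothesis", RULES))
print(decodings("faocc89", RULES))

# Hypothetical extra rule: two source n-Grams now share the Voynich n-Gram "c".
AMBIGUOUS_RULES = dict(RULES, t="c")
print(decodings("faocc89", AMBIGUOUS_RULES))
```

With the six rules from the table the decoding is unique; as soon as two source n-Grams share a Voynich n-Gram, every occurrence multiplies the number of candidate plaintexts.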

Clearly, there are a very large number of possible n-Gram mappings!

Candidate Decipherings of f27v

Go here to look at folio 27v, and see the various Genetic Algorithm attempts at deciphering the text.

Voynich Herbs

Edith Sherwood has a web site where she details compelling possible identifications for the plants depicted in the "herbal" pages of the VM.

Dana Scott's page also has plausible identifications for the plants.

As has often been pointed out, if we look at the first Voynich "word" that appears on each page of the herbal part of the VM, we find that those words are unique, or appear elsewhere very rarely. It thus seems reasonable that the words may be the names of the plants depicted.

The GA was set up to find a set of n-Gram mappings that would convert a list of 111 Voynich first herbal words into Latin/English or Spanish. For this, dictionaries of Latin, English and Spanish herb/plant names were used.

The GA sought a mapping that would convert all the Voynich words for herbs/plants into as many valid plaintext (Spanish, English, Latin) words as possible. The best result was for a mixed English/Latin dictionary (see table): 31 of the 111 Voynich words were converted, a success rate of about 28%.

(One should never expect 100% success, due to missing names in the dictionary, transcription errors, missing n-Grams, incomplete n-Grams etc.)
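The objective described above, the fraction of Voynich words that a candidate mapping turns into dictionary words, might be scored along the following lines. This is only a sketch of a fitness function, not the GA itself: the greedy longest-match decoding, the toy mapping and the toy dictionary are all assumptions for illustration.

```python
def decode_greedy(cipher, inverse):
    """Decode with longest-match lookup; return None if some part has no rule."""
    max_len = max(len(key) for key in inverse)
    out = []
    pos = 0
    while pos < len(cipher):
        for n in range(min(max_len, len(cipher) - pos), 0, -1):
            chunk = cipher[pos:pos + n]
            if chunk in inverse:
                out.append(inverse[chunk])
                pos += n
                break
        else:
            return None
    return "".join(out)


def fitness(inverse, voynich_words, dictionary):
    """Fraction of Voynich words whose decoding is found in the dictionary."""
    hits = sum(1 for w in voynich_words if decode_greedy(w, inverse) in dictionary)
    return hits / len(voynich_words)


# Toy data (hypothetical): a Voynich-to-plaintext mapping, three words, a tiny dictionary.
toy_inverse = {"oc": "pot", "c": "he", "9": "is", "f": "h", "8": "s", "a": "y"}
toy_words = ["faocc89", "c9", "fa"]
toy_dictionary = {"hypothesis"}

score = fitness(toy_inverse, toy_words, toy_dictionary)
print(score)

# The best result reported above, expressed as the same kind of fraction.
print(round(31 / 111, 3))
```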

The results are shown below in tabular form, together with Dana Scott's and Edith Sherwood's identifications. The first column shows the folio in the VM, the second shows the first Voynich word on that folio. For the GA identification columns (3 and 4) the Voynich mapped word is shown, in quotation marks if not found in the associated dictionary, and in bold if found in the dictionary.

Note that, probably unsurprisingly, the IDs from the GA (Spanish and English/Latin) and those from Scott and Sherwood nowhere agree! NOT YET, anyway :-)

(What amuses me about this mapping technique is that it tends to produce words that sound plausible in the target language. E.g. for f4r the Latin/English word "paptise" sounds like a valid word.)

Folio; Voynich 1st Word; Candidate GA ID, Spanish; Candidate GA ID, Latin/English; Dana Scott ID, English; Dana Scott ID, Latin; Sherwood ID, Latin; Sherwood ID, English
f1r fa19s costa "greica"        
f1v h1s9 rabo geum Deadly Nightshade Atropa belladonna Hyoscyamus niger Solanum nigrum Solanum dulcamara Atropa belladonna Deadly Nightshade
f2r h98an9 "jzba" "ariapha" Cornflower Centaurea cyanus Centaurea diffusa Diffuse Knapweed
f2v hoom "meic" "padi" Water Lily Nymphaea candida Nymphoides Nymphoides
f3r k2cos chinita (Impatiens) arnica     Celosia argentea Feathery amaranth
f3v hoam menta (mint) paris     Helleborus foetidus Dungwort
f4r ho8ae19 "mezirn" "paptise"     Saxifraga cespitosa Alpine Saxifrage
f4v j1oom pastora (Poinsettia) "oigle"     Campanula rapunculus Rampion
f5r h2o89 "piyn" "hicse"     Arnica montana Wolfs Bane
f5v hA1coy malanga (Malanga) cirsium Tennis Racket Plant Agrimonia eupatoria Malva sylvestris Mallow
f6r foay "oote" "erk"     Acanthus mollis Bear Breeches
f6v hoay9say1Chay "meotendoteisedh" "pakpikrtsst"     Eryngium maritimum Sea Holly
f7r f1o8am "saynta" acris     Trientalis europea Starflower
f7v joe29 "rden" anise     Myrica gale Bog Myrtle
f8r g2oe "dno" "miv"     Pisum sativum Green Pea
f8v Ko8 "anop" "amot"     Symphytum officinale Comfrey
f9r k98eo "uardna" "cernur"     Ricinus communis Castor oil
f9v fo1oy "oveh" "erut" Heartsease, Wild Pansy Viola tricolor Violaceae Viola
f10r g1oK9 "pohon" "apryse"     Cichorium pumilum Chicory Endive
f10v gam tora (Tora Tree) gale     Linnaea borealis Twinflower
f11r k2oe chino (Chinese Hat Plant) "arv"     Rosmarinus officinalis Rosemary
f11v goe81o89 "albaveaca" "maadud"     Curcuma longa Turmeric
f13r koy3oy "lenga" "mdoium"     Banana Banana
f13v hoaiy "memh" "paft"     Lonicera periclymenum Honeysuckles Woodbines
f14r g1o8am "poynta" "apcris"     Scorzonera Black Salsify Vipers Grass
f14v g891om "uomic" "gesdi"     Stachys monnieri Wood Betony Heal-all Sel-heal Woundwort
f15r k2oy "chiga" "arium"     Sonchus oleraceus Sow Thistles
f15v gayoy "t8h" "gabt"     Paris quadrifolia Herb Paris
f16r go1co89 "alblanyn" "marscse"     Cannabis Cannabis
f16v g1yAm "potoora" "aptule"     Chrysanthemum Chrysanthemum
f17r f2o89 "hayn" "ulcse"     Catananche caerulea Cupids Dart
f17v g1o8oe "poyno" "apcv"     Dioscorea Yams
f18r g8yaz89 "ullngn" "gmeagse"     Aster alpinus Aster
f18v koe8 la (?) mad     Telfairia Fluted pumpkin
f19r g1oy "poga" apium     Polemonium coeruleum Greek Valerian
f19v go1am "albbora" mantle     Draba nivalis Nailwort
f20r h81o89 "caveaca" woud     Astragalus hypoglottis Milk vetch
f20v faIsay "crrote" greek     Cynara cardunculus Cardoon
f21r g1oy "poga" apium     Anagallis arvensis Pimpernel
f21v koe829 "laol" "madpe"     Dictamnus albus Burning bush False Dittany White Dittany Gas Plant
f22r goe "albv" "maus"     Verbena officinalis Common Vervain Holy Herb
f22v g9samoy "..dah" "hnshot"     Tulip Tulip
f23r g9818op ".fhilo" "hsthlo"     Pulsatilla vulgaris Pasque flower
f23v go8azoe "albzucv" "mapacus"     Borago officinalis Borage Star Flower
f24r goyoy9 "alb.." "maby"     Cucumis sativus Cucumber
f24v k1o8ay coyote (wild) rock     Ficus religiosa Sacred Fig Bo Tree
f25r f1oe89 "sanoaca" "avd"       Wild Thyme
f25v goCam "albcuora" "malile"     Isatis tinctoria Woad
f26r g%coh9 "spnij" lunaria     Prunella vulgaris Self heal
f26v g1c8ay pochote (Pochote) "apgok"     Lens culinaris Lentil
f27r hsoy manga (Mango) "veium"     Spinacia oleracea Spinach
f27v fo1ou oveja (?) eruca French Marigold Tagetes patula Dianthus superbus Dianthus
f28r g1o8ay "poyote" "apck"     Aristolochia Smearwort Birthwort Pipevine
f28v h2oe pino (Pine) "hiv" Dahlia Dahlia imperialis Rhododendrons Rhododendrons
f29r gosam "alb.ora" "mansle"     Lactuca sativa longifolia Romaine Cos Lettuce
f29v hoom "meic" "padi"     Nigella sativa Roman coriander
f30r oh1cs9 "elanbo" "inrsum"     Prunella vulgaris Healall
f30v Ks1an rubia (Madder) montana     Cuscuta europaea Dodder
f31r hcc8c9 lichi (Lychee) "rgoio"     Erigeron acris Fleabane
f31v go8az "albzon" "mapnn" Fernleaf yarrow Achillea filipendulina Valerian Valerian
f32r f1am santa (?) "aris"     Veronica triphyllos Speedwell
f32v h1co8am "ranizora" "genple"     Campanula rotundifolia Harebell
f33r k28ay "chizh" "arpt"     Silene vulgaris Bladder Campion
f33v kayay "qllh" "opmet" Masterwort Astrantia major Tanacetum parthenium Feverfew
f34r g1cocj19 "ponianos" "apnbie"     Anemone hortensis  
f34v hs189 "mansn" "vewse"     Lunaria annua Honesty Money Plant
f35r Koo anona (Custard Apple) amur     Cichorium intybus Radicchio
f35v gay1oy "trtga" galium     Ribes nigrum Blackcurrant
f36r j1af8aN "pa.nzti" "onupfl"     Delphinium staphisagria Delphinium
f36v g1ayos9 "pooteesn" "apksise"     Lamium amplexicaule Henbit
f37r koGoe "luiv" malus     Mentha longifolia Mint
f37v h2o89 "piyn" "hicse"   fedtschenkoi englerii Emilia fosbergii Tassel flower
f38r koeoy "lilh" "mmut"        
f38v oh1oj "eveet" inula     Euphorbia myrsinites Myrtle Spurge
f39r kc7o128 "goguadp" "gienmpot"        
f39v g7aiy "inmh" "naft"        
f40r g1c9 "poi" apio     Erodium malacoides Storks bill
f40v j1c7an "pagmo" "oospo"   Epiphyllum oxypetalum Crocus vernus Crocus
f41r j2c9hc8aecc9 "roilizrii" "ediorpcuio"     Origanum vulgare Wild Marjoram
f41v hcSo8ae "lirbzv" "riupus"     Coriandrum sativum Coriander Cilantro
f42r 2o "ah" st        
f42v k1o˛ cola (?) rosa     Aquilegia vulgaris Columbine Culverwort
f43r kayo8am "q.zora" "opbple"     Stellaria media Chickweed
f43v g8saiy9 "u.lbn" "gnsicse"     Elytrigia repens Couch grass
f44r k2o8g9 "chiy." arch     Mandragora officinarum Mandrake
f44v k2o china (Impatiens) "arur"     Apium graveolens Celery
f45r g9h98ae ".jzv" "hariapus"     Atriplex hortensis Orach Saltbush
f45v hosay9 "me.." pansy     Lavandula angustifolia Lavender
f46r g1coJ9 "ponitr" "apnta"     Leucanthemum vulgare Oxeye Daisy
f46v jo79e3c7 "rimvig" "andretos"   Tanacetum parthenium, Chrysanthemum parthenium Inula conyza Ploughmans Spikenard Great Fleabane
f47r g1aiy "pomh" "apft" Lady's Mantle, Lion's Foot Alchemilla vulgaris Rosaceae Sempervivum tectorum Houseleek
f47v g2cok "dnier" minor   Arnica montana Pulmonaria officinalis Lungwort
f48r g28am "dzora" "miple"     Adonis Vernalis False Hellebore
f48v g1co819 "ponifn" "apnsse"     Ruta graveolens Rue Herb of Grace
f49r gA2oe "ceahv" costus     Nymphaea caerulea Blue Nile Lotus
f49v g he wort        
f50r g2coy "dnih" mint     Astrantia major Masterwort
f50v k19 con (?) rose   Telopea speciosissima Gentiana frigida Stiff Gentian
f51r k2oe819 "chinofn" "arvsse"     Cakile maritima Searocket
f51v go2o89 albahaca (Basil) "mastd"     Salvia officinalis Sage
f52r k8oh1F9 "queacn" "toinnise"     Anemone coronaria Poppy Anemone
f52v g1oy "poga" apium     Polystichum setiferum Fern
f53r hA8ap "mazlo" "ciplo"     Achillea Ptarmica Sneezewort
f53v k2oy3c9 "chigamin" "ariumocse"     Hieracium aurantiacum Hawkweed
f54r go8am "albzora" maple     Cirsium oleraceum Cabbage thistle
f54v g1co8ay "ponizh" "apnpt" Bittersweet Nightshade Solanum dulcamara Perovskia atriplicifolia Russian Sage
f55r go8am "albzora" maple     Fumaria officinalis Fumitory
f55v h1C8189 "raecsn" "geriwse" Forest lily Veltheima bracteata Broccoli Broccoli
f56r ok1ae "tebv" "trntus"     Drosera  Sundews
f56v h1cok "ranier" "genor"     Cycas revoluta Sago Palm
f57r joccoHc9 "riopei" "anomiaio"     Sherardia arvensis Blue Field Madder
f65r           Alchemilla vulgaris Ladies Mantle
f65v           Centaurea cyanus Cornflower
f66v           Satureja montana Winter Savory
f87r           Satureja hortensis Summer Savory
f87v         Senecio Primula vulgaris Primrose
f87v         Kleinia Pedicularis flammea Lousewort Wood Bettony
f89v           Actaea spicata Baneberry
f90r           Conyza bonariensis Fleabane
f90v           Eruca vesicaria Arugula Rocket
f93r           Cynara cardunculus Artichoke
f93v           Lupinus Lupin
f94r         Botrychium lunaria Botrychium lunaria Moonwort Moonfern
f94v           Agrostemma Githago Corncockle Red Campion
f94v           Glycyrrhiza glabra Liquorice
f94v           Plantago lanceolata Ribwort Plantain Kemps
f95r         Berberis Sambucus nigra Elderberry
f95v           Althaea Rosea Hollyhock
f96r           Angelica archangelica Garden Angelica
f96v           Tamus communis Black Bryony

 

Character Analysis

See the section "Bibliography" at the end for the provenance of the various texts used.

  • Voynich. The Herbal folios from the Voynich are processed, generating frequency tables for single characters, dual characters (digraphs), trigraphs and quadgraphs (1/2/3/4). These are also called n-Grams. Frequency tables for word lengths and common words are also generated.
  • Latin. A few books from Augustinus and other places (see Bibliography)
  • A German cookbook from 1553
  • The Book of the Courtier, in English from 1561
  • French from 1367
  • A Latin herb garden description
  • Spanish from the 16th century

Here is the ranking of the most popular words in each text (full tables are available in the Excel spreadsheet)

The 1/2/3/4 n-Grams are calculated as follows. Suppose the word "sesame" appears in the text. The following counts are made:

  • "s" 2
  • "e" 2
  • "a" 1
  • "m" 1
  • "se" 1
  • "es" 1
  • "sa" 1
  • "am" 1
  • "me" 1
  • "ses" 1
  • "esa" 1
  • "sam" 1
  • "ame" 1
  • "sesa" 1
  • "esam" 1
  • "same" 1

Then, each count is weighted by the length of the group, yielding s=2, e=2, a=1, m=1, se=2, es=2, sa=2, am=2, me=2, ses=3, etc. The counts are added to running sums for each distinct group found, and normalised at the end of processing.
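The counting and weighting steps can be sketched as follows. The weighting matches the "sesame" example above; the normalisation (dividing every running sum by the grand total) is an assumption, since the exact method is not stated here.

```python
from collections import Counter


def weighted_ngram_counts(word, max_n=4):
    """Count every 1..max_n character group in a word, weighted by group length."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for start in range(len(word) - n + 1):
            # Adding n per occurrence is the same as multiplying the raw count by n.
            counts[word[start:start + n]] += n
    return counts


def normalise(counts):
    """Assumed normalisation: divide each running sum by the grand total."""
    total = sum(counts.values())
    return {group: value / total for group, value in counts.items()}


running = Counter()
for word in ["sesame"]:  # in practice, every word of the text being processed
    running.update(weighted_ngram_counts(word))

print(running["s"], running["se"], running["ses"], running["sesa"])
frequencies = normalise(running)
print(frequencies["ses"])
```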

The three tables below give, in order: a comparison of the most popular "words", a comparison of the most popular characters, and a comparison of the most popular 1/2/3/4 character combinations.

Voynich Latin German English French Herbs
8am et vnnd the de et
1oe in ain and et in
1oy non jn of que non
s te das to la si
19 est die in le quæ
oy me darnach that a Et
am sed es a en Hoc
8ay mihi so it les per
89 ut den not ou tibi
K9 enim I soit Hæc
2oe cum mit he & quam
8an quod thú be par tamen
1c9 ad woll is Roy cum
ay deus nim for ne dum
2o qui machen as lour forte
oe a so si sub
oham quam jst have ceo vires
oh9 quia ainem with pour genus
4ok19 aut lasß other nostre est
8ae de darein this dit tum
9 quae wenig they ditz Si
Koe quo der his nul quoque
4oh19 tu nit And est quod
2oy quid oder all soient ut
7am si man them des se
Koy illa von then terre huius
1H9 atque daran you come etiam
ok9 per wie but Item satis
Voynich Latin German English French Herbs
o e n e e i
9 i e t s e
1 a a a n a
a t i o t t
8 u s n r r
c s r h i u
h n d s o s
e m t i a o
y r l r u n
k o h d l m
m c c l d c
2 d m u c l
4 l ú m p p
s p o y m d
7 b g c g v
, q f f f b
K v v w q g
p g b g v f
C f w p z q
n h z b h æ
H x j v y h
g E k k b x
A I p I E P
3 S ß A x H
j Q ý T & S
z y u C R E
( C A L I A
f A N x S I
Voynich Latin German English French Herbs
o e e e e i
9 i n t s e
1 a a n n a
a t r a r r
8 u s r i t
c s i s t u
h r t o o s
e n l i a n
k m d h u o
y o h l l m
1o c c d c c
oe d en u es l
oh l ch c nt p
4 p er th en d
4o er m m d er
m b in y re is
ok is f er m re
8a ti g g p ti
89 nt ú p is v
am en b he g b
1c re o w er ri