Qokeey and qokeedy in the stars section
Rene Zandbergen has pointed out that the word qokeey is
very common on some pages of the stars section (ff 103r-116r) and
rare on others. I believe the pattern discovered by him can be
summarised thus:
- If a paragraph contains the word qokeey, there is a 38% chance that
  the next paragraph will contain the word qokeey.
- If a paragraph contains the word qokeey, there is a 40% chance that
  qokeey occurs more than once.
- If the current word is qokeey, there is a 6% chance that the next
  word will be qokeey.

High or low frequency of qokeey appears to be a feature of
pages and sheets within the quire. In the following table, pages in
the same row are on the same side of the same sheet of vellum. The
string of numerals after each page number represents the paragraphs
on that page and the occurrences of qokeey in each one (i.e.
the first paragraph of f 103r has one occurrence of qokeey,
the second has one, the third none and the final paragraph has four
occurrences).
| Page | qokeey per paragraph | Page | qokeey per paragraph |
|---|---|---|---|
| f103r | 1100110004012211344 | f116v | (not relevant) |
| f103v | 40010012121000 | f116r | 0020010020 |
| f104r | 0000000000001 | f115v | 0100000000000 |
| f104v | 0000020010011 | f115r | 0000000010000 |
| f105r | 210000000000 | f114v | 000000000000 |
| f105v | 0000000000 | f114r | 00000000000000 |
| f106r | 000000100000000 | f113v | 010000001000001 |
| f106v | 000000001001200 | f113r | 00100100000000011 |
| f107r | 001000000000000 | f112v | 1002101000000 |
| f107v | 010200001000134 | f112r | 12501101000001 |
| f108r | 0200100100010301 | f111v | 1111000000010001210 |
| f108v | 01032010030233120 | f111r | 21210222120012111 |

| Row | Statistic |
|---|---|
| A | Number of paragraphs: 331 |
| B | Number containing qokeey: 99 |
| C | Number containing qokeey more than once: 37 (as percentage of B: 37) |
| D | Number containing qokeey.qokeey: 10 |
| E | Cases where previous paragraph contained qokeey: 44 (as percentage of B: 44) |
| F | Occurrences of qokeey: 156 |
| G | Occurrences of qokeey.qokeey: 10 (as percentage of F: 6) |

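As a rough cross-check, rows A, B, C and F can be recomputed directly from the digit strings in the page table. The following is a minimal Python sketch, not the program originally used: the function name `summarise` and the dictionary layout are illustrative assumptions, and only two pages are entered as sample data.

```python
def summarise(pages):
    """Recompute rows A, B, C and F from per-page digit strings.

    Each character of a string is the number of occurrences of the
    word in one paragraph (single digits only, as in the page table).
    """
    counts = [int(ch) for digits in pages.values() for ch in digits]
    containing = sum(1 for c in counts if c > 0)   # row B
    repeated = sum(1 for c in counts if c > 1)     # row C
    return {
        "A": len(counts),
        "B": containing,
        "C": repeated,
        "C_pct_of_B": round(100 * repeated / containing) if containing else 0,
        "F": sum(counts),
    }


# Sample data: two pages copied from the qokeey table (assumed layout).
sample = {"f103r": "1100110004012211344", "f103v": "40010012121000"}
print(summarise(sample))
```

Rows D, E and G cannot be recovered this way: D and G need the running text, and E needs the reading order of the pages across sheets.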
The word qokeedy conforms to a similar but not identical
distribution.
| Page | qokeedy per paragraph | Page | qokeedy per paragraph |
|---|---|---|---|
| f103r | 0000000200002210210 | | |
| f103v | 11101100000000 | f116r | 0011000010 |
| f104r | 1100000000001 | f115v | 1000013100000 |
| f104v | 2010000000000 | f115r | 0110000010001 |
| f105r | 110000001000 | f114v | 100000000000 |
| f105v | 1000000000 | f114r | 00100001000000 |
| f106r | 000000210000001 | f113v | 010100000110000 |
| f106v | 000000000100001 | f113r | 00000100000000000 |
| f107r | 011000000000010 | f112v | 2022021002100 |
| f107v | 000000001010001 | f112r | 11010001000000 |
| f108r | 1100010101011225 | f111v | 0100000000000000100 |
| f108v | 12130001323521220 | f111r | 01021310121011010 |

| Row | Statistic |
|---|---|
| A | Number of paragraphs: 331 |
| B | Number containing qokeedy: 97 |
| C | Number containing qokeedy more than once: 27 (as percentage of B: 27) |
| D | Number containing qokeedy.qokeedy: 9 |
| E | Cases where previous paragraph contained qokeedy: 41 (as percentage of B: 42) |
| F | Occurrences of qokeedy: 135 |
| G | Occurrences of qokeedy.qokeedy: 10 (as percentage of F: 7) |

One possible explanation of this is that qokeey and qokeedy
are names or reflect the occurrence of names in the underlying text.
Internal structure of the 'words'
Some 90 percent of the words in the B section can be generated from
the regular expression
[dklprst]{0,1}[oa]{0,1}[lr]{0,1}[fkpt]{0,1}[SC]{0,1}[eE]{0,1}[dFKPT]{0,1}[ao]{0,1}[mnlM]{0,1}y{0,1}
subject to certain restrictions:
- q may only be followed by o or e
- the sequences pe and fe are forbidden
- the sequences le and re are forbidden
- the sequence rk is forbidden
- the sequences em en el em en er are forbidden
- only aiin ain al am an ol y (and very occasionally d) may be final
The transcription used here is a modified version of EVA, with
S for sh, C for ch, E for ee, F for cfh, K for ckh, P for cph,
V for cvh, m for iin, n for in and M for m.
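
The transcription and the pattern can be tried out with a short Python sketch. The pattern is copied exactly as printed, so a word is accepted only if every letter falls in one of its classes, and the listed restrictions are not enforced. The names `to_modified` and `fits_pattern` and the order of the substitution table are assumptions made for illustration.

```python
import re

# Pattern copied as printed above; the listed restrictions are NOT enforced.
WORD_PATTERN = re.compile(
    r"[dklprst]{0,1}[oa]{0,1}[lr]{0,1}[fkpt]{0,1}[SC]{0,1}"
    r"[eE]{0,1}[dFKPT]{0,1}[ao]{0,1}[mnlM]{0,1}y{0,1}"
)

# EVA -> modified transcription. Order matters: EVA m must become M
# before iin is rewritten to m, and iin must be handled before in.
SUBSTITUTIONS = [
    ("cfh", "F"), ("ckh", "K"), ("cph", "P"), ("cvh", "V"),
    ("sh", "S"), ("ch", "C"), ("ee", "E"),
    ("m", "M"), ("iin", "m"), ("in", "n"),
]


def to_modified(eva_word):
    """Rewrite an EVA word in the modified transcription."""
    for old, new in SUBSTITUTIONS:
        eva_word = eva_word.replace(old, new)
    return eva_word


def fits_pattern(eva_word):
    """True if the whole transcribed word is generated by the pattern."""
    return WORD_PATTERN.fullmatch(to_modified(eva_word)) is not None


for word in ["okeedy", "chedy", "daiin", "ydo"]:
    print(word, to_modified(word), fits_pattern(word))
```

`fullmatch` is used rather than `search` because every slot is optional, so an unanchored search would succeed trivially on any input.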
It is almost as if the Voynich language only contained words whose letters
are in alphabetical order.
The pattern emerges very clearly if consecutive words of the manuscript
are printed in vertical columns with gaps inserted to indicate the
null occurrence of a character. I have generated a vertical version of
the first 25 lines of f 103r (the beginning of the stars section). The
program was instructed to ignore word divisions but has mostly
restored them from the regular expression.
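
The column layout can be illustrated with a small Python sketch. It is not the program that produced the vertical version: it takes words that are already divided, uses the ten character classes of the regular expression as columns, and marks a null occurrence with a dot. The names `SLOTS` and `align` are assumptions for illustration.

```python
# One column per character class of the regular expression, in order.
SLOTS = ["dklprst", "oa", "lr", "fkpt", "SC", "eE", "dFKPT", "ao", "mnlM", "y"]


def align(word, gap="."):
    """Place each letter in the earliest column that can hold it.

    Returns a 10-character row, or None if the word does not fit
    the column order. Words are in the modified transcription.
    """
    row = [gap] * len(SLOTS)
    slot = 0
    for ch in word:
        # Skip columns whose class does not contain this letter.
        while slot < len(SLOTS) and ch not in SLOTS[slot]:
            slot += 1
        if slot == len(SLOTS):
            return None
        row[slot] = ch
        slot += 1
    return "".join(row)


for w in ["okEdy", "Cedy", "dam", "ol"]:
    print(align(w))
```

Taking the earliest possible column for each letter is sufficient here, because each column holds at most one letter and an earlier placement never blocks a later one.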
The letter m (the one which resembles the numeral '8' with a tail) is almost always the last letter
in a line of text. There are several possible explanations of this.
- m is a variant of another cipher letter. am looks much the same as al and has much the same distribution as ar al aiin ain (particularly the last two, which are never initial in a word).
  - But ar al aiin ain also occur finally in a line, and on the same page as am. It therefore seems not to be a variant of these four finals, but another letter in the same group.
- m is an abbreviation used where space is tight at the end of a line. am has a similar distribution to ali (common at the end of a line) and might be taken as an abbreviation of it.
  - But it occurs at the end of a line on f. 81r 19, where there is no illustration in the margin and ample space for the line to continue without abbreviation.
- m is a distinct individual letter which may appear anywhere in the plain text but is forced to the end of a line by the encipherment.
  - If the letters of the Voynich ms are encipherments of individual letters of a plaintext in the Roman alphabet, it is not possible that they have been enciphered in the original order. I speculate that m is an enciphering of a rare plaintext letter such as z and that the encipherment called for this letter to be entered into the ciphertext last of all.
Is it an anagram cipher?
Here is a transformation of plaintext into ciphertext which
explains certain features of the Voynich "language".
- Divide a plaintext into lines
- Sort the words of each line into alphabetical order
- Sort the letters of each word into alphabetical order

Applied to a one-line plaintext, the three steps give:

- one thing led to another thing last night
- another last led night one thing thing to
- aehnort alst del ghint eno ghint ghint ot

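
The transformation is short enough to state as code. This is a minimal Python sketch, assuming lower-case plaintext with words separated by spaces; the function names are illustrative.

```python
def encipher_line(line):
    """Sort the words of one line, then the letters of each word."""
    words = sorted(line.split())
    return " ".join("".join(sorted(word)) for word in words)


def encipher(text):
    """Apply the transformation to each line of a plaintext."""
    return "\n".join(encipher_line(line) for line in text.splitlines())


print(encipher_line("one thing led to another thing last night"))
# prints: aehnort alst del ghint eno ghint ghint ot
```

Note that the words are sorted before their letters are, so the ciphertext words of a line are ordered by their plaintext spelling, not by their own.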
The result has some of the statistical properties of the
Voynich text.
- The frequency distribution of words and letters is the same
  as in the natural language plaintext, but the distribution of
  two-letter groups and two-word groups is significantly altered.
- Words at the beginning of a ciphertext line tend to start with
  letters at the beginning of the alphabet. Compare the high
  frequency of Voynich "d" at the beginning of a line.
- If a letter near the end of the alphabet has a tendency to
  be word-initial in the plaintext (e.g. German "w"), the words it
  begins will have a strong tendency to come last in a line. Compare
  the high frequency of Voynich "m" at the end of a line.
- The ciphertext versions of frequent words will tend to cluster
  together in a line. That is, where a word such as "thing" occurs
  twice in the plaintext line (as in the above example) the two-word
  sequence "ghint ghint" will occur, but "ghint" may also occur
  elsewhere in the line as an anagram of "night".
- A one-letter word of ciphertext can only be an anagram of
  a single word of plaintext ("a" can only be an anagram of "a")
  and a two-letter word of ciphertext can only be an anagram of two
  possible words of plaintext ("et" can only be an anagram of "et"
  and "te"). This means that you cannot have a ciphertext line of
  the pattern "... i ... i ... " or of the pattern "... et ... et ...
  et ...". This principle largely holds good in the Voynich text:
  there are only six exceptions in the corpus of Currier's language
  B.

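
The last property can be checked mechanically. The Python sketch below counts the separate clusters of each one-letter and two-letter word in a line and flags any word with more clusters than it has distinct letter orderings (one for "i", two for "et"). Treating the number of orderings as the bound is my reading of the argument above, and the names `max_runs` and `violations` are assumptions.

```python
from collections import Counter
from itertools import groupby
from math import factorial


def max_runs(word):
    """Number of distinct orderings of the word's letters."""
    n = factorial(len(word))
    for repeats in Counter(word).values():
        n //= factorial(repeats)
    return n


def violations(line):
    """Short words that occur in more separate clusters than allowed."""
    words = line.split()
    # groupby collapses adjacent repeats, so each key is one cluster.
    runs = Counter(word for word, _ in groupby(words))
    return sorted(
        word for word, count in runs.items()
        if len(word) <= 2 and count > max_runs(word)
    )


print(violations("a i b i c"))        # "i" appears in two clusters
print(violations("et x et y et"))     # "et" appears in three clusters
print(violations("i i eglnor no"))    # adjacent repeats are one cluster
```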
Obviously there are difficulties with the idea.
- Voynichese words do not conform to a strict alphabetical ordering
  of letters (there are quite a lot of words of the pattern dshedy).
- Voynichese words have a strong tendency to contain only one
  instance of a given letter, unlike any obvious candidate language for
  the plaintext.
- The enciphering described is not unambiguously reversible (however
  I think it would work as a private aide-memoire, or as a means of
  establishing priority like Galileo's well known anagram announcing his
  discovery of the phases of Venus).

Here is an extract from a well known English novel modified in the way
I have described:
adn as cddeeirt for efnortu adforrw i em my now aprt dehpsu amsw asw
adn adn bmoott by cdlou dopr eefl i egls elt my no efnot deit dinw
abel almost adn btu dfnou egno i i eglnor no egglrstu ot asw ehnw
aabdet adn by dehpt chmu my eflmsy morst eht hist eimt asw hiintw
a beefor cdeiiltvy got i i eilm aenr allms os ahtt eht ot adeklw asw
abotu accklo 'ccdejnortu eghit eeginnv i in ehors eht eht asw chhiw