For many years, computational linguists have studied the statistical behavior of English-language text: the distribution of the number of letters in words, the distribution of the number of words in sentences, and so on. Similarly, cryptanalysts have long been interested in the distribution of the different letters as they occur in English-language text, since these frequencies aid them in solving substitution ciphers. In developing the science of information theory during the 1940s, Claude Shannon and others attempted to find out to what extent the properties of English-language text can be modeled by a random process in which the choice of each letter depends upon the letters that immediately precede it.
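The kind of random process described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used by Shannon or in the article: the function names `build_model` and `generate`, the sample sentence, and the choice of a two-letter context are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def build_model(text, order=2):
    """Map each `order`-letter context to a Counter of the letters that follow it."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += 1
    return model


def generate(model, seed, length, rng=None):
    """Extend `seed` one letter at a time, sampling each letter by observed frequency.

    Assumes `seed` is non-empty and as long as the context used to build `model`.
    """
    rng = rng or random.Random(0)
    order = len(seed)
    out = seed
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:  # context never seen in the sample text: stop early
            break
        letters = list(followers)
        weights = list(followers.values())
        out += rng.choices(letters, weights=weights)[0]
    return out


# Hypothetical sample text; any English passage could be substituted.
sample = "the theory of the thing is that the other theme thrives there"
model = build_model(sample, order=2)

# Single-letter frequencies, the statistic of interest to cryptanalysts.
print(Counter(sample.replace(" ", "")).most_common(3))

# Pseudo-text in which each letter depends on the two letters before it.
print(generate(model, "th", 40))
```

With a larger sample and a longer context, the generated strings tend to look more like English, which is one informal way to gauge how much of the language's structure such a model captures.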
Eckler, A. Ross. "Letter-Distributions of Words," Word Ways: Vol. 7, Article 5.
Available at: http://digitalcommons.butler.edu/wordways/vol7/iss4/5