Information theorists maintain that typical English-language text is approximately 75 per cent predictable; for example, in a text-reconstruction experiment carried out by Claude Shannon, a subject presented with various incomplete sentences was able to guess the next letter correctly in 79 out of 102 attempts. However, if one attempts to retain the sense of a sentence after weeding out letters according to some predetermined rule, one cannot achieve such an extreme limit; in fact, the removal of even half the text is likely to introduce severe difficulties. Although Shannon cites an experiment in which six subjects restored an average of 93 per cent of the 50-per-cent-deleted FCTSSTRNGRHNFCTN, surely the triteness of the phrase made it more easily recognizable than unfamiliar text would have been. It is the purpose of this article to shed light on the question of just how much can be trimmed from text without losing the meaning.
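The two quantities in this abstract can be checked with a few lines of Python. This is only a sketch: the abstract does not say which deletion rule is meant, so dropping every vowel and every space is an assumption made here for illustration, and the function names and sample sentence are hypothetical.

```python
VOWELS = set("AEIOU")


def delete_vowels_and_spaces(text: str) -> str:
    """Apply one fixed deletion rule (assumed here): drop vowels and spaces."""
    return "".join(ch for ch in text.upper() if ch not in VOWELS and ch != " ")


def fraction_deleted(original: str, compressed: str) -> float:
    """Share of the original characters that the rule removed."""
    return 1 - len(compressed) / len(original)


# Hypothetical sample sentence, not taken from the article.
sample = "THE QUICK BROWN FOX"
compressed = delete_vowels_and_spaces(sample)
print(compressed)                                      # THQCKBRWNFX
print(round(fraction_deleted(sample, compressed), 2))  # 0.42

# Success rate in the guessing experiment cited above: 79 correct out of 102.
print(round(79 / 102, 3))                              # 0.775
```

On this sample the rule removes 8 of 19 characters, somewhat less than half, and the 79-of-102 guessing rate comes to roughly 77 per cent, in line with the "approximately 75 per cent" figure.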
Eckler, A. Ross. "Compression of English Text," Word Ways: Vol. 15, Article 20.
Available at: http://digitalcommons.butler.edu/wordways/vol15/iss2/20