If we want to detect and exploit patterns in data, we should use the best tool for the job. That tool is the human perceptual system. In this section, I attempt to demonstrate three things: the contributions our perceptual systems can and do make to pattern recognition; a research methodology that can be used to develop better pattern recognition systems; and that the essential pattern recognizer within such a system will often be a human being.
Here is a string presented so as to circumvent some of our innate and acquired pattern recognition abilities.

There are regularities in this pattern, but they are not obvious because (1) the building blocks are unfamiliar, (2) most of us are not attuned to these stimuli, and (3) the visual presentation is not congruent with the regularities in the building blocks.
Here is the same string in a more familiar font: “peas porridge hot peas porridge cold peas porridge in the pot nine days old some like it hot some like it cold some like it in the pot nine days old.”
Suddenly (and effortlessly) we recognize the fundamental units (English characters), map them onto a representational framework with which we are familiar (English words and English sounds), and by the time we get to the end of the sentence we perceive additional levels of structure: rhyme and prosody. Read the sentence one more time. That’s what pattern detection should feel like.
Because this data pattern has known structure, it’s a good testbed for our methodology. Our goal is to extract a comparably salient visual pattern from this experience without the benefit of domain-specific inside information.
Let’s start at the end. Here is a naively “graspable” visual pattern whose spatial structure is isomorphic to (at least some aspects of) the structure of the ditty. It has a shape which can be perceived even at a glance, and it has a symmetry that makes the structure compelling and revealing[1]. Let’s see how such a visualization might be developed without inside information.
1) We used a computational method that identifies, without prior knowledge, an appropriate level of analysis.
Nevill-Manning’s Sequitur algorithm doesn’t know about words as natural units, but it does a good job of detecting repeated sequences of symbols. As illustrated here, the algorithm decomposes a sequence of characters into building blocks from which the larger pattern is built. (Thus, “pease porridge,” “hot,” “cold,” “in the pot nine days old,” and “some like it” emerge as separate clumps from which the larger pattern can be assembled.)
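The kind of decomposition described above can be sketched in a few lines. Note that this is not Sequitur itself: it is a simpler greedy pair-replacement scheme (in the style of Re-Pair) that repeatedly replaces the most frequent adjacent pair of symbols with a new rule. The function names `infer_rules` and `expand` and the `R0`, `R1`, … rule labels are illustrative assumptions, not part of any published implementation. Like Sequitur, it is given no knowledge of words.

```python
from collections import Counter

def infer_rules(text, min_count=2):
    """Greedy pair replacement (Re-Pair style), a stand-in for Sequitur.

    Returns the compressed symbol sequence and a dict mapping each
    invented rule name to the pair of symbols it replaces.
    """
    seq = list(text)   # start with one symbol per character
    rules = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break      # no adjacent pair repeats any more
        name = f"R{len(rules)}"   # hypothetical rule label
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):       # left-to-right, non-overlapping replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(symbol, rules):
    """Recursively expand a symbol back into the characters it stands for."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)

ditty = ("peas porridge hot peas porridge cold peas porridge in the pot "
         "nine days old some like it hot some like it cold some like it "
         "in the pot nine days old")
sequence, rules = infer_rules(ditty)
for name in rules:
    print(name, repr(expand(name, rules)))
```

Printing the expanded rules shows which character clumps the procedure treats as building blocks; whether they coincide with word or phrase boundaries depends on the input and on the greedy choices made along the way.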
2) We then sought a heuristic which would provide a visual rendering of the structure and would make the patterning of the information as obvious in the visual domain as it is in the linguistic/acoustic domain (for which it was originally designed).

2a. Consider a dendrogram motif commonly used in biological and computational studies. This visualization motif is a good example of bad organic information design. Intuitive opportunities are squandered through the use of “objective” but distracting perpendiculars and by tolerance of arbitrary and unnecessary line-crossings.
2b. Replacing perpendiculars with straight lines reveals that there are three double repetitions and two triple repetitions.
2c. Avoiding line crossings makes this clearer still:
2d. But the integrity of the whole form is considerably clearer if we allow the whole structure to achieve a symmetry. (I did this manually, but a tension-minimizing algorithm in 2-D or 3-D could have done it computationally, by modeling the connectors as springs, as per the pseudo-animation at left.)
2e. And finally, the structure is rotated so that its greatest symmetry is along the vertical axis.
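Steps 2d and 2e can be sketched computationally. What follows is a minimal illustration under stated assumptions, not the procedure actually used for the figures (step 2d was done by hand). It assumes a connected graph, treats every pair of nodes as joined by a spring whose rest length is their hop distance (a Kamada-Kawai-style model), relaxes the layout by plain gradient descent, and then searches whole-degree rotations by brute force for the most mirror-symmetric orientation about the vertical axis. All function names, the toy tree, and the parameter values are hypothetical.

```python
import math
import random
from collections import deque

def graph_distances(nodes, edges):
    """Hop counts between all pairs of nodes (assumes a connected graph)."""
    nbrs = {n: [] for n in nodes}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    dist = {}
    for src in nodes:
        seen, queue = {src: 0}, deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in nbrs[cur]:
                if nxt not in seen:
                    seen[nxt] = seen[cur] + 1
                    queue.append(nxt)
        dist[src] = seen
    return dist

def stress(pos, dist):
    """Total spring energy: mismatch between drawn and ideal distances."""
    names = list(pos)
    return sum(0.5 * (math.dist(pos[a], pos[b]) - dist[a][b]) ** 2 / dist[a][b] ** 2
               for i, a in enumerate(names) for b in names[i + 1:])

def relax(pos, dist, steps=300, rate=0.05):
    """Repeatedly move every node a small step along its net spring force."""
    pos = {n: list(p) for n, p in pos.items()}
    names = list(pos)
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in names}
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
                drawn = math.hypot(dx, dy) or 1e-9
                # Positive when the spring is stretched, negative when compressed.
                pull = (drawn - dist[a][b]) / dist[a][b] ** 2 / drawn
                force[a][0] += pull * dx
                force[a][1] += pull * dy
                force[b][0] -= pull * dx
                force[b][1] -= pull * dy
        for n in names:
            pos[n][0] += rate * force[n][0]
            pos[n][1] += rate * force[n][1]
    return pos

def rotate(points, degrees):
    """Rotate 2-D points about their centroid; the result is centred on (0, 0)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [((x - cx) * c - (y - cy) * s, (x - cx) * s + (y - cy) * c)
            for x, y in points]

def mirror_mismatch(points):
    """Sum of squared distances from each mirrored point to its nearest original.
    Zero means exact mirror symmetry about the vertical axis x = 0."""
    return sum(min((-x - px) ** 2 + (y - py) ** 2 for px, py in points)
               for x, y in points)

def align_symmetry_vertical(points, step=1):
    """Brute-force search for the rotation that best aligns a mirror axis vertically."""
    best = min(range(0, 180, step), key=lambda a: mirror_mismatch(rotate(points, a)))
    return rotate(points, best)

# Toy tree standing in for the parsed ditty (hypothetical data).
nodes = ["root", "a", "b", "a1", "a2", "b1", "b2"]
edges = [("root", "a"), ("root", "b"), ("a", "a1"), ("a", "a2"),
         ("b", "b1"), ("b", "b2")]
rng = random.Random(0)
start = {n: (rng.uniform(-1, 1), rng.uniform(-1, 1)) for n in nodes}
hops = graph_distances(nodes, edges)
layout = relax(start, hops)
upright = align_symmetry_vertical([tuple(p) for p in layout.values()])
```

The search cannot distinguish a structure from its upside-down twin, since both are equally symmetric about the vertical axis; choosing between them would need a further rule, such as placing the root at the bottom.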

It’s important to note that the structure thus revealed does not exploit our domain-specific knowledge of prosody, semantics, or linguistics. We used our domain-specific knowledge and intuitions only to ensure that we had a “solvable” stimulus and that we’d recognize success when we saw it. This approach is an important part of our methodology: develop our techniques in knowledge domains where we can be confident both that we will recognize success when we find it and that we will not confuse “absence of pattern” with “failure to detect pattern.”
Returning to the example, it is also important to acknowledge that the visualization here is still very primitive. One could (and would) go farther, iteratively developing methods and testing their efficacy. Here is a next iteration:
Recent work by Pourang Irani illustrates that there is genuine gold in these hills. Based on empirical psychological research which shows that redundancy increases recognition, I believe less simple, more organic forms would be even more effective than these OpenGL-style images. But our point now is simply to illustrate the direction, viability, and potential of the methodology proposed here.
Having illustrated the first two elements of our proposed methodology, I want to argue (1) that this is not just a lucky break but an appropriate exploitation of a natural visual motif (the leaf); (2) that there are vast amounts of computer-accessible data which will yield to such methods; and (3) that sharpening our tools on these troves of already-accessible, already-valuable data will let us determine, when we subsequently apply our tools to data of unknown richness, whether data mining successes and failures reflect the richness or poverty of the data or the frailty of our instruments.
While many of the patterns we seek will not have the “crystalline” structure of this example, there is reason to believe that 2- and 3-dimensional leaf and branch patterns will play to our best suit.
When distinguishing one species of tree from another, we (easily and naturally) pick up on (1) leaf patterns, which often do have a “crystalline” structure, (2) branching patterns, whose statistical properties are less regular but nonetheless diagnostic and often visually apparent, (3) colors, (4) inter-tree spacing patterns, (5) trunk textures, (6) wood grains, etc.
If comparable structural characteristics are present in the “foliation” of many information structures, it may well be possible to perceptually recognize patterns of digital interaction and communication by mapping them to visual space.
Web sites have this character[i]. Whether because of the authoring software used or because of the webmaster’s design aesthetic, most web sites are comprised of HTML pages that…
(1) partake of only a few, often crystalline, site-specific page patterns.

In contrast to the standard representation at left, we should explore equally valid “organic” visualizations of document structure such as these.
(2) link to other pages on the site with a characteristic fan-out
(3) occupy a particular location in semantic vector-space that could be used to drive the encoding of coloration, or branch dilation.
(4) use a characteristic page layout and color scheme
(5) etc.
If we do our job well, we will have created a “macroscope” that will literally help us see the forest as well as the trees, and allow us to grab the fruit we seek via a massively parallel pre-conscious perceptual search process.
[1] Mach (1886/1959, pp. 129-130) commented on our effortless perceptual ability to detect bilateral symmetry. He identified three types of symmetry perception for amorphic shapes (rotational, bilateral, and translational) and pointed out that bilateral symmetry is the strongest of the three. He also showed that it is influenced by the orientation of the axis of symmetry, and that bilateral symmetry is most easily perceived when the axis of symmetry is vertical. http://www.ensc.sfu.ca/people/grad/brassard/personal/THESIS/node45.html
(An interesting question, which we need not answer, is whether our sensitivity to bilateral symmetry is an accidental by-product of our own bilateral symmetry or whether it is an evolutionary adaptation to the fact that vertebrate bodies (and faces) are also bilaterally symmetrical.)
[i] Sentences have a related character, of course.
ASK MR. LANGUAGE PERSON
Q. Please explain how to diagram a sentence.
A. First spread the sentence out on a clean, flat surface, such as an ironing board. Then, using a sharp pencil or X-Acto knife, locate the "predicate," which indicates where the action has taken place and is usually located directly behind the gills. For example, in the sentence: "LaMont never would of bit a forest ranger," the action probably took place in a forest. Thus your diagram would be shaped like a little tree with branches sticking out of it to indicate the locations of the various particles of speech, such as your gerunds, proverbs, adjutants, etc.
— Dave Barry
Yes, we know. "Would of bit" is an unacceptable spelling of "would have bitten," but Mr. Language Person is not very bright and to change his spelling would be just plain sic.
[ii] This was the promising approach taken by a student in van de Wetering’s lab in Eindhoven, as presented at InfoVis 2001 last month:
Botanical Visualization of Huge Hierarchies
Ernst Kleiberg, Huub van de Wetering, Jarke J. van Wijk
Department of Mathematics and Computer Science
Eindhoven University of Technology