Compound Zope Documents

Overview

 At Cue Media Integration, we've started developing a simple compound document system in Zope. This is based on common experience among recent projects, and a desire to bring some Component Architecture (being developed in Zope 3) functionality into Zope 2 to satisfy our needs.

 This document looks at the annoyances that led to this development, some possible solutions, and the development path of the SCS Framework, and the SWS Product built on that framework.

 This is a living document, being grown from an outline managed by Radio Userland. It will be updated often as the product gets fleshed out over the coming weeks.

The Annoyance

 I've done a lot in the way of Zope-based content/document management solutions over the past couple of years.

 The traditional HTML way of all-these-little-parts (html files, images, etc) is a tough solution for many content management situations, in my view.

 It makes it hard to

 Do workflow - when publishing/reviewing a document with images, like a white paper with diagrams, it's annoying to have to treat each diagram as a separate object.

 Move content - If that white paper was moved, the images should move with it. And smartly.

Possible solutions

 "Just use a folder!". Ugh. No. I think this is only natural to HTML developers. A folder is a metaphor, and it's the wrong metaphor for compound content. It might be the right basic technology to use underneath (just as Mac OS X can represent complex objects, like Applications, as a single object while underneath the covers it's just a file system directory), but it's just the wrong metaphor.

 Create a compound document framework. Ahhh, here we go.

 Why? The default ways of doing this type of stuff in Zope and its CMF have just bothered me for a while. The current (very young) implementation is already yielding results.

S[CW]S 0.9

 Work has started, and gotten a little ways along on a simple compound solution to solve a couple of problems for our little company. One has to do with an upcoming bid we hope to land, the other to do with getting fast-executing jobs.

 SCS takes nods from OpenDoc, based on Parts. The architecture borrows a lot from that, from work done in the past, and from the Zope 3 architecture (which is very Service and Interface (as in software contracts) oriented).

 Document has a

 root part (container), which can then have

 other parts

 and other container parts which can naturally have

 other parts
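 The nesting above can be pictured with a small sketch. The class names and the 'walk' helper here are illustrative stand-ins, not the actual SCS classes:

```python
class Part:
    """Hypothetical leaf part: just an id and a major/minor type."""
    def __init__(self, id, type):
        self.id = id
        self.type = type


class PartContainer(Part):
    """Hypothetical container part: a part that holds other parts."""
    def __init__(self, id, type='container/renderable'):
        Part.__init__(self, id, type)
        self.parts = []

    def add(self, part):
        self.parts.append(part)
        return part


class Document:
    """Hypothetical document: owns exactly one root container."""
    def __init__(self, id):
        self.id = id
        self.root = PartContainer('root')


def walk(part, depth=0):
    """Yield (depth, id) pairs for a part and everything nested under it."""
    yield depth, part.id
    for child in getattr(part, 'parts', []):
        for item in walk(child, depth + 1):
            yield item


doc = Document('whitepaper')
doc.root.add(Part('intro', 'text/plain'))
figures = doc.root.add(PartContainer('figures'))
figures.add(Part('diagram1', 'image/png'))
outline = list(walk(doc.root))
```

 Moving or publishing 'doc' takes everything under 'doc.root' along with it, which is the behavior the white paper example asks for.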

 On some recent websites we've delivered for small shops, we had a Document class which was actually a container (ObjectManager) with text content that mixed in a BinaryHolder class, with lots of methods supporting listing/adding/deleting files and pictures. It also mixed in some LinkHandler functionality, which kept links to other documents in the site and to outside resources. As a result, we ended up with a big fat class. It did its job, but it was hard to adapt to other customers' needs. We'll call this the old basic solution.

 The new solution for similar sites uses a simple compound document class. For the base class, not much work was needed. For a usable one, a little more has gone into it (for reasons to be explained later). But now, instead of a big base class that has extra knowledge about Images and other potential parts, the compound document framework is doing some of the lifting there. We'll call this the new basic solution.

 A quick comparison of the two:

 Old Basic Solution

 Heavy document class with lots of mixed in knowledge of custom data types.

 Three major classes mixed in to a base class, with added functionality coming from a ZClass child. Of the three classes, one dealt with the Text handling (which did use componentized text handlers), one dealt with Image and File handling, and one dealt with Link Handling.

 Pros

 Worked like a champ.

 It was still a pretty clean design.

 Cons

 Inflexible towards other potential customer needs.

 New Basic Solution

 Lightweight document class.

 Medium-weight partcontainer class. The document has a root part (gotten by looking for a PartRegistry and asking for a part of type 'root/default'). The default partcontainer class is actually just a renderable collection - it visits its children and asks them to render themselves to a stream (usually a StringIO instance).

 Text and Image parts (so far). The Image part is basically a subclass of the Zope OFS.Image.Image class, implementing the IPart interface to fit in with the compound system. The TextPart is based off of the old solution's actual Document class (sans mix-ins), and uses the same little TextHandler components (small components that parse text into cacheable HTML and other data).

 Pros

 Adding Image support this time took a quarter of the time it took in the previous solution, since the IPartContainer interface and default implementation do such a good job at locating parts of different types, and since the PartRegistry makes adding an image easy:

 root.addPart(id='', type='image/*', title=title, file=image)

 (the ImagePartFactory, in this case, automatically generates an ID based off of the filename)

 Other objects should be just as easy.
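 The ID generation mentioned above might look something like the following sketch. The cleanup rule, the 'filename' argument, and the 'image_part_factory' function are assumptions for illustration; the real ImagePartFactory may work differently:

```python
import re


def id_from_filename(filename):
    """Derive an object id from an uploaded file's name (hypothetical rule)."""
    # Some browsers send a full client-side path; keep the last component only.
    base = re.split(r'[\\/]', filename)[-1]
    # Replace anything outside a conservative character set with '_'.
    return re.sub(r'[^A-Za-z0-9._-]', '_', base)


class ImagePart:
    """Stand-in for the real image part; it only stores what it is given."""
    def __init__(self, id, title, data):
        self.id = id
        self.title = title
        self.data = data


def image_part_factory(id, title, file, filename):
    """Create an ImagePart, generating the id when the caller passes ''."""
    if not id:
        id = id_from_filename(filename)
    return ImagePart(id, title, file)


part = image_part_factory('', 'Network diagram', b'fake image bytes',
                          r'C:\docs\net diagram.png')
```

 Here 'part.id' ends up as 'net_diagram.png', so the caller never has to invent an ID for an uploaded picture.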

 Cons

 The current solution, to be expedient, still hard codes some specialized knowledge into the document class. This is mostly to do with User Interface issues, as a more general Editor/Renderer mechanism hasn't been made yet. We don't need it. Yet.

 Thus, while being a bit more flexible (adding new Part types is pretty trivial), it's still a rigid system at its current version. But it's still young. As needs for more flexible solutions present themselves, I'll get back to this.

Design and implementation notes

 Services and Registries

 Working on the CMF and some CMF-related consulting work finally showed me how to do service-based, aspect-oriented programming in Zope.

 Formulator and the CMF gave insight on how to do some file system services (namely Registries, usually collections of factories or special handlers).

 Combining the two of them has yielded the current SCS Services architecture.

 Registries

 There are two registries right now, the PartRegistry and the HandlerRegistry.

 The PartRegistry is used to find and instantiate new Part objects.

 PartFactory = PartRegistry.get(type)

 part = PartFactory(...)

 The HandlerRegistry is used to get text handlers - small utility components that can turn incoming text into semi-structured data.

 Handler = HandlerRegistry.get('html')

 result = Handler.handle(content)

 Both are served as Singleton objects and are non-persistent.
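 A minimal sketch of how such a registry could be put together. The 'register' method, the wildcard fallback to 'major/*', and the toy classes are assumptions made for illustration; only the 'get' and 'handle' calls appear in the snippets above:

```python
class Registry:
    """Hypothetical in-memory registry mapping a type string to an object."""
    def __init__(self):
        self._items = {}

    def register(self, type, item):
        self._items[type] = item

    def get(self, type):
        # Exact match first, then fall back to a 'major/*' registration.
        if type in self._items:
            return self._items[type]
        major = type.split('/', 1)[0]
        return self._items[major + '/*']  # raises KeyError if nothing matches


class TextPart:
    pass


class ImagePart:
    pass


class HTMLHandler:
    """Toy handler: returns a small dict instead of real parsed output."""
    def handle(self, content):
        return {'body': content.strip()}


# Module-level instances play the role of the non-persistent singletons.
PartRegistry = Registry()
HandlerRegistry = Registry()

PartRegistry.register('text/plain', TextPart)
PartRegistry.register('image/*', ImagePart)
HandlerRegistry.register('html', HTMLHandler())

PartFactory = PartRegistry.get('image/png')
part = PartFactory()
result = HandlerRegistry.get('html').handle(' <p>hi</p> ')
```

 Because the registries live at module level, they are rebuilt on every process start rather than stored in the object database.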

 The base SCSDocument class, the root of all things compound, is responsible for looking up the part registry. The default implementation just returns the global one.

 Near-future documents (i.e., a CMF Compound Document built on SCS) could return a CMF Tool (the CMF's name for global Service objects) that is persistent and configurable through the web. That tool could fall back on the default global registry if a part lookup fails.
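 The fallback idea could be sketched as below. Both classes and the string stand-ins for factories are hypothetical; nothing like this exists in SCS yet:

```python
class GlobalRegistry:
    """Hypothetical global registry: a plain type -> factory mapping."""
    def __init__(self):
        self._factories = {}

    def register(self, type, factory):
        self._factories[type] = factory

    def get(self, type):
        return self._factories[type]


class LocalRegistry(GlobalRegistry):
    """Hypothetical site-local registry that defers to a fallback on a miss."""
    def __init__(self, fallback):
        GlobalRegistry.__init__(self)
        self._fallback = fallback

    def get(self, type):
        try:
            return self._factories[type]
        except KeyError:
            # Nothing registered locally: ask the global registry instead.
            return self._fallback.get(type)


global_registry = GlobalRegistry()
global_registry.register('text/plain', 'global text factory')
global_registry.register('image/*', 'global image factory')

local_registry = LocalRegistry(fallback=global_registry)
local_registry.register('text/plain', 'site-specific text factory')
```

 A document that returns 'local_registry' gets the site-specific text factory but still finds the global image factory.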

 Interestingly, this seems to be a similar pattern to how Zope 3 development is going, with a concept of "global" and "local" services.

 Other services

 Other primary services used by the core framework include:

 A cataloging service (a common catalog that could know how to traverse a compound document's structure to build a single row of indexable information).

 An associations tool (currently unused, but expected to go into service soon) and unique id generator (for objects that wish to use it).

 All of these are persistent with no global service behind them. They are also very little used at the moment, with the exception of the catalog.
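 To make the "single row" idea from the cataloging service concrete, here is a rough sketch. The field names ('searchable_text', 'part_types') and the simplified Part class are invented for the example and do not describe the current SCS catalog:

```python
class Part:
    """Simplified stand-in for a part; containers just carry children."""
    def __init__(self, id, type, text='', children=()):
        self.id = id
        self.type = type
        self.text = text
        self.children = list(children)


def iter_parts(part):
    """Depth-first walk over a part and all of its sub-parts."""
    yield part
    for child in part.children:
        for sub in iter_parts(child):
            yield sub


def build_catalog_row(doc_id, root):
    """Flatten a whole part tree into one dict of indexable values."""
    parts = list(iter_parts(root))
    return {
        'id': doc_id,
        # Every text part contributes to one searchable text field.
        'searchable_text': ' '.join(
            p.text for p in parts if p.type.startswith('text/')),
        # Sorted, de-duplicated list of the part types in the document.
        'part_types': sorted(set(p.type for p in parts)),
    }


root = Part('root', 'container/renderable', children=[
    Part('intro', 'text/plain', 'Hello'),
    Part('fig', 'image/png'),
    Part('body', 'text/plain', 'world'),
])
row = build_catalog_row('paper', root)
```

 The catalog then indexes one entry per document, no matter how many parts the document contains.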

 The list of these services may grow.

 I do have some worry about the SCS catalog and CMF catalog being buddies. There may be some interesting work involved in joining their respective interfaces. However, the SCS catalog implementation is fairly simple right now and none of the core SCS functionality depends upon it; it only provides some helpful mix-ins to deal with indexing. Those mix-ins may be moot once SCS is moved over to an event channel architecture.

 Clearly defined Interfaces show clear object responsibilities.

 IDocument

 IDocument has the responsibility of finding global services on behalf of its parts. Why? Document is a well known root. Different document implementations could cause different behavior for their parts by doing different implementations of their service lookup.
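 As a sketch of that responsibility: a part walks up to its document and asks it for a service, so swapping the document class changes what the part gets. The method names 'getDocument' and 'getService', the explicit 'parent' attribute, and the string stand-ins for services are assumptions made for the example:

```python
class Document:
    """Hypothetical default document: hands out global services."""
    _services = {'catalog': 'global catalog'}

    def getService(self, name):
        return self._services[name]


class CMFDocument(Document):
    """Hypothetical variant that hands out site-local tools instead."""
    _services = {'catalog': 'site-local catalog tool'}


class Part:
    def __init__(self, parent):
        self.parent = parent  # another Part, or the Document itself

    def getDocument(self):
        # Walk up the containment chain until the well-known root is found.
        node = self.parent
        while not isinstance(node, Document):
            node = node.parent
        return node

    def getService(self, name):
        # Parts never look services up themselves; the document decides.
        return self.getDocument().getService(name)


plain_part = Part(Part(Document()))
cmf_part = Part(Part(CMFDocument()))
```

 The two parts run identical code, yet 'plain_part' receives the global catalog and 'cmf_part' the site-local one.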

 Applying Design Patterns

 Some central methods in SCS are IPartContainer.listParts() and IPart.renderToStream() / IPart.render().

 The method signature for listParts had been slowly growing to accommodate more and more kinds of part and subpart queries. This was unsettling, because I was seeing the need to add querying elsewhere, particularly in the renderToStream() calls, to allow rendering only text parts, for instance.

 I noticed the need for the 'Command' design pattern [GoF 233], wherein a request is encapsulated into a single object, and that object could also execute itself. A PartFilter pulls in common queries (major and minor mode, for example) into a single parameter, and implements a 'match' method (as well as Python's __call__ interface). The match method is passed in an IPart instance, which is matched against the items in the PartFilter query.

 The improvements this adds should be obvious:

 The default listParts() implementation shouldn't have to change to handle more complex queries.

 Clients of listParts() can define and pass in their own IPartFilter objects, or callable objects that accept a single IPart parameter.

 The current implementation also supports the old usage of the listParts method, which is fine enough for simple uses:

 The following will yield the same results:

 root.listParts(major='text', minor='plain')

 root.listParts(match=lambda part: part.Major() == 'text' and part.Minor() == 'plain')

 plainTextFilter = PartFilter(major='text', minor='plain')
 root.listParts(match=plainTextFilter)
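 One way PartFilter and listParts() could fit together is sketched below. The constructor arguments and the 'match' keyword follow the calls above; the internals and the toy Part and PartContainer classes are guesses for illustration:

```python
class Part:
    """Toy part exposing the Major()/Minor() accessors used above."""
    def __init__(self, id, major, minor):
        self.id = id
        self._major = major
        self._minor = minor

    def Major(self):
        return self._major

    def Minor(self):
        return self._minor


class PartFilter:
    """Encapsulates a part query; unset criteria match everything."""
    def __init__(self, major=None, minor=None):
        self.major = major
        self.minor = minor

    def match(self, part):
        if self.major is not None and part.Major() != self.major:
            return False
        if self.minor is not None and part.Minor() != self.minor:
            return False
        return True

    # Calling the filter is the same as calling match().
    __call__ = match


class PartContainer:
    def __init__(self, parts):
        self._parts = list(parts)

    def listParts(self, major=None, minor=None, match=None):
        # Old-style keyword usage is folded into a PartFilter.
        if match is None:
            match = PartFilter(major=major, minor=minor)
        return [part for part in self._parts if match(part)]


root = PartContainer([
    Part('a', 'text', 'plain'),
    Part('b', 'text', 'html'),
    Part('c', 'image', 'png'),
])
by_keywords = root.listParts(major='text', minor='plain')
by_lambda = root.listParts(
    match=lambda part: part.Major() == 'text' and part.Minor() == 'plain')
by_filter = root.listParts(match=PartFilter(major='text', minor='plain'))
```

 Since listParts() only ever calls 'match(part)', new kinds of queries can be added without touching the container.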


 Rendering and Querying

 One of the primary goals of SCS was that documents can be rendered with complex parts without intimate knowledge of those parts. So far, I'm achieving mixed results now that an actual implementation is being built on top of SCS.

 There's a dual rendering strategy - render and renderToStream. Here's (roughly) how they work:

 'render' is the public method. For example, to render a document, you could render its root part: 'doc.root.render()' You could also render a part individually using this method.

 When 'doc.root.render()' is called, it creates a stream (in the current implementation, a python StringIO object). Then it calls 'renderToStream(stream)' on itself.

 'renderToStream' is the internal method. For renderable containers, what they do is:

 for part in self.listParts():
     part.renderToStream(stream)

 All a text part might do when it receives the message is:

 stream.write(self.render())

 Then, the root part's render() method gets the value of the stream, closes it, and returns the value.

 In quick summary, when something (like a template) wishes to render a part, it calls that part's 'render' method. If that part is a container, it can open a stream (like a Python StringIO object), and tell its sub parts to render to that stream. If those sub-parts have sub-parts, they can do the same thing.

 'render' can be simple, like return self.Content(), or it can call into something on the outside to render itself. Template based composites, for example, might do this.
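 Putting the two methods together, the whole flow can be sketched roughly as follows. The class names are illustrative, and io.StringIO stands in for the StringIO object mentioned above:

```python
from io import StringIO


class TextPart:
    """Leaf part: render() returns its content directly."""
    def __init__(self, content):
        self._content = content

    def Content(self):
        return self._content

    def render(self):
        return self.Content()

    def renderToStream(self, stream):
        stream.write(self.render())


class RenderableContainer:
    """Container part: renders by asking each child to write to a stream."""
    def __init__(self, parts=()):
        self._parts = list(parts)

    def listParts(self):
        return list(self._parts)

    def render(self):
        # Public entry point: open a stream, fill it, return its value.
        stream = StringIO()
        try:
            self.renderToStream(stream)
            return stream.getvalue()
        finally:
            stream.close()

    def renderToStream(self, stream):
        # Internal method: nested containers write to the same stream.
        for part in self.listParts():
            part.renderToStream(stream)


sidebar = RenderableContainer([TextPart('<p>aside</p>')])
root = RenderableContainer([TextPart('<h1>Title</h1>'), sidebar,
                            TextPart('<p>Body</p>')])
html = root.render()
```

 Only the part on which 'render' is called opens a stream; everything below it writes into that one stream in document order.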

Glossary

 Part

 An editable and/or renderable component. Parts are the primary force in SCS. They are registered and found by their major/minor type ('text/plain', 'image/jpg', 'container/renderable'). Parts can handle multiple major and minor types, or be dedicated to just one. Also signified by their interface, IPart.

 Part Container

 A part container does just what its name says - contains other parts. It's the equivalent of the ObjectManager in Zope, and the core implementation is in fact based on that class. A Part Container is responsible for creating new parts in itself, querying part instances contained within, and defining some rules (like what parts are allowed to be added to that container). Also signified by the interface IPartContainer.

 A component may implement both the IPart and IPartContainer interfaces. For containers to be really usable, this is their default behavior.

 Compound Document

 Or just Document. The Document is a container for a part container. Its responsibilities are to locate certain services on behalf of its contained parts, and be a well-known root.

 The Document contains a single IPartContainer+IPart instance as its root. All other parts that make up a document go inside that root. This keeps the Document's interface (IDocument) relatively clean, allowing implementations of that interface to implement interfaces and responsibilities typical to a specific application or framework, like Zope's CMF.

 Service

 Services are components that provide behaviour and data on behalf of components. A catalog is a service. An associations tool that tracks relations between objects is a service. An events tool that can pass events from one object to another is a service.

 Typically, services exist so that a component may get richer functionality from the world around it without extra details being added to the component itself. Using an associations service, for example, could allow rich relationships to be formed between objects that would otherwise have no knowledge of each other. That relationship adds to the application by enriching its objectspace in ways that an application developer gluing components together can do without requiring component modifications.
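 As a rough illustration of the associations example, the service below keeps relationships outside the objects it relates. The class, its method names, and the 'illustrates' relation are hypothetical; the SCS associations tool is not yet in use and may look quite different:

```python
class AssociationService:
    """Hypothetical service storing (source, relation, target) triples."""
    def __init__(self):
        self._links = set()

    def associate(self, source_id, relation, target_id):
        self._links.add((source_id, relation, target_id))

    def targets(self, source_id, relation):
        """Ids that source_id points at through the given relation."""
        return sorted(t for (s, r, t) in self._links
                      if s == source_id and r == relation)

    def sources(self, target_id, relation):
        """Ids that point at target_id through the given relation."""
        return sorted(s for (s, r, t) in self._links
                      if t == target_id and r == relation)


associations = AssociationService()
associations.associate('diagram1', 'illustrates', 'whitepaper')
associations.associate('diagram2', 'illustrates', 'whitepaper')
illustrations = associations.sources('whitepaper', 'illustrates')
```

 Neither the diagrams nor the white paper store anything about each other; the relationship lives entirely in the service.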