Email conversation between Michael Fergusson and Dethe Elza in December 2001

Michael and Dethe are the brains behind Burning Tiger, which will have something to show for itself before long. Dethe also shares a blog at livingcode.ca. Michael has daughters instead of a website (fair enough, though I notice that Dethe manages to maintain both children and a blog). Update: Michael does have a website (again).

The discussion began after Michael forwarded an article to Dethe.

Michael: Obviously, I think he [the author of the forwarded article] is wrong about XML not being a good way to represent semi-structured data, but otherwise it's an excellent description of our value proposition.


Dethe: Well, I agree with him to a degree. Real-world documents of any sort don't really fit well with any fixed schema or template. That's one of the reasons I often prefer well-formed XML to valid XML: you can use the schema as a guideline, but you can violate it as needed.
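To make the well-formed versus valid distinction concrete, here is a minimal Python sketch (an illustration added here, not something from the exchange). It checks only that a document parses, then compares it with a "guideline" and reports deviations instead of rejecting the document. The `GUIDELINE` dict and the element names are hypothetical stand-ins for a schema; they are not DTD or W3C XML Schema syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical "guideline" for a business-plan document: the child elements
# we usually expect under <plan>. This plain dict stands in for a schema;
# it is not DTD or W3C XML Schema syntax.
GUIDELINE = {"plan": {"summary", "market", "finances"}}


def is_well_formed(text: str) -> bool:
    """True if the text parses as XML at all, independent of any schema."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def guideline_warnings(text: str) -> list[str]:
    """Compare a well-formed document with GUIDELINE and report deviations
    as warnings rather than rejecting the document."""
    root = ET.fromstring(text)
    expected = GUIDELINE.get(root.tag, set())
    present = {child.tag for child in root}
    warnings = [f"missing <{tag}>" for tag in sorted(expected - present)]
    warnings += [f"unexpected <{tag}>" for tag in sorted(present - expected)]
    return warnings


# A plan that breaks the template (one section is a poem) is still usable.
doc = "<plan><summary>Short version</summary><poem>Roses are red</poem></plan>"
print(is_well_formed(doc))                       # True
print(is_well_formed("<plan><summary></plan>"))  # False: mismatched tags
print(guideline_warnings(doc))  # ['missing <finances>', 'missing <market>', 'unexpected <poem>']
```

Strict validation against a schema that required all three sections would refuse `doc`; this sketch accepts it and only flags where it departs from the template.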


Michael: There are lots of real-world documents that would benefit from the constraints a schema could provide. There are also cases where a schema (even a very loose one) would be overkill, but I can't agree with the statement that real-world documents of "any sort" don't fit with a schema.

We use a (very simple and flat) schema to good effect in the construction of the business plan, for instance.


Dethe: Sure, our business plan fits into a very simple schema. But would the same schema work for a non-profit such as a private school, or a worker-owned restaurant? (Just to name some examples where I've actually worked on their business plans.)

I think schemas are great, as guidelines. Rigid enforcement is useful at times, but vastly over-used, IMHO. There are thousands of Americans who had to change their names because the DMV's schema didn't allow accented characters or other non-WASPish naming punctuation. Schemas are an attempt to impose order on an inherently organic process (people's interactions). Resumes are a very structured and organized datatype, but some of the best resumes I've seen broke that structure (one was a poem).


Michael: Presumably, you would start with a base schema for a business plan and derive the private-school business plan from it, and so on. In an important respect (at least in this case) the schema simply codifies what's already true and makes sure no one makes stupid mistakes. I'm certainly not arguing for unnecessary constraints, only for sufficient constraint: you and I, for instance, agree to constrain this conversation to the structure of [name] followed by a posting, with responses following the text they respond to, in English, etc. Maybe we don't need much more than that, but in your preamble to the message, you were wishing for more structure, a clearer interaction between our postings... in short: a schema, along with a user interface optimized for that schema. Hey, someone should build something like that.


Dethe: I'm not trying to argue that there's no place for schemas or organization, just that rigid one-size-fits-all templates don't fit as many as their creators tend to think.


Michael: Sure. Then we're agreed. Of course, the problem with the DMV wasn't having a schema; it was having a poorly constructed one. If they had designed their schema more effectively, it could have helped identify non-English characters, proper pronunciation, etc. Unicode is a schema, just as the English language is.

Without a schema of any sort, communication is impossible. The more you loosen the constraints of your schema, the more expressive, yes, but also the more ambiguous your communication becomes. The DMV had unintentionally constrained themselves too far. Why? Maybe they didn't have sufficient tools for the task (no Unicode, a simple flat-file database, etc.), or maybe (just maybe ;-) they didn't understand the problem sufficiently.


Dethe: Obviously there are benefits to both ways, but not everything can be shoehorned into a schema, and trying too hard to do that creates a) schemas that are overly complex and unwieldy, and b) frustration when even that isn't enough.


Michael: Of course. This is why XML doesn't require a DTD or schema for every document.

There are two axes here, right? Granularity x Predictability. It may be overkill to mark up every character in a book, but underkill to have only one long string for each chapter. You may need the predictability of a two-dimensional template, or the expressiveness of "mixed content" content types. Sure, there are even more axes than that, but let's not go there.
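The granularity axis can be illustrated with a small Python sketch (added here, not part of the exchange): the same sentence stored once as a single coarse string and once as mixed content, where text is interleaved with child elements. The element names and the sample sentence are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same sentence at two granularities (illustrative markup, not a real
# vocabulary). COARSE keeps one opaque string; MIXED interleaves text with
# child elements ("mixed content").
COARSE = "<chapter>Call Ana at 555-0100 before Friday.</chapter>"
MIXED = ("<chapter>Call <person>Ana</person> at <phone>555-0100</phone> "
         "before <date>Friday</date>.</chapter>")


def plain_text(xml_text: str) -> str:
    """All character data in document order, with the markup stripped."""
    return "".join(ET.fromstring(xml_text).itertext())


def marked_up_fields(xml_text: str) -> list[tuple[str, str]]:
    """(tag, text) pairs for the elements nested inside the root."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text) for el in root.iter() if el is not root]


# Both versions read identically to a human...
print(plain_text(COARSE) == plain_text(MIXED))  # True
# ...but only the finer-grained one exposes fields a program can use.
print(marked_up_fields(COARSE))  # []
print(marked_up_fields(MIXED))   # [('person', 'Ana'), ('phone', '555-0100'), ('date', 'Friday')]
```

Finer markup costs the author more effort but buys predictability for the machine; where to stop on that scale depends on who will consume the document.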


Dethe: I think of well-formed XML as core to a notion I'm beginning to form, called "Chaos Tools." Chaos tools are the things that help us manage our lives in a free-form, organic, people-oriented world, as opposed to the table-driven, math-focused, rigid tools currently populating the world's computers.


Michael: OK, sounds good, and maybe I'm violently agreeing... but we humans spend most of our time classifying and constraining: this is a fax number and not a cell phone; this is a press release and not an airplane manual. The free-form world is made up of primitives that we assemble to construct more complex systems. The challenge, as I see it, is that these primitives are themselves made up of yet smaller things, and so forth, fractally. You could go to the nth degree of detail, but that's not how humans work. We get to an acceptable level of detail for our purpose and then approximate the rest. For any given actor in a particular context there is an acceptable level of detail, and it can be substantially different for a different actor or even after subtle changes in context.


Dethe: Absolutely. I don't want to limit our ability to organize or classify, I want to broaden it. Just because I've tagged this bit of data as "boss' phone number" doesn't mean I don't want to also tag it as "friend's phone number" or even "work fax number". My address book may come with slots for home, work, cell, and fax, but what happens when the new astral phones come out and I need an identifier for that number? I want to be able to classify the same data in different ways, and new data that was never anticipated by the schema.


Michael: So your schema needs to be extensible. This continues to be the main problem with DTDs. On the other hand, if you were exchanging address listings with the others cc'd on this email and you arbitrarily changed the way you describe phone numbers you would potentially be hampering, rather than aiding, communication. The fact that we have agreed on a common set of constraints is itself useful. I agree that you should be able to add "astral phone" without messing up the other entries that we all agreed upon. The right way to do this is with...


Dethe: XML Namespaces help with this, but how they work in *real world* applications is still a bit up in the air. The W3C has punted on really defining Namespaces properly, which has left them in a bit of a mess.


Michael: Ahhh, namespaces. The W3C should be forced to wear pink pants and purple shoes for a year as punishment for that mess. It is, in concept, a good approach to the problem, though.
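As a rough illustration of the namespace approach (a sketch added here, not part of the exchange), the Python below uses the standard library's ElementTree to read an address-book entry in which the "astral phone" lives in its own namespace. Both namespace URIs and all element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs invented for this example.
CORE = "http://example.com/ns/addressbook"
ASTRAL = "http://example.com/ns/astral"

# One shared entry: the agreed-upon core fields plus a private extension
# element in its own namespace.
ENTRY = (
    f'<entry xmlns="{CORE}" xmlns:astral="{ASTRAL}">'
    "<name>My boss</name>"
    '<phone type="work">555-0100</phone>'
    '<phone type="fax">555-0101</phone>'
    "<astral:phone>7-ASTRAL-9</astral:phone>"
    "</entry>"
)


def core_phones(xml_text: str) -> list[tuple[str, str]]:
    """(type, number) pairs for phones in the core namespace only.
    Elements from other namespaces are skipped rather than treated as errors."""
    root = ET.fromstring(xml_text)
    return [(p.get("type"), p.text) for p in root.findall(f"{{{CORE}}}phone")]


def extension_tags(xml_text: str, namespace: str) -> list[str]:
    """Local names of the root's children that belong to `namespace`."""
    root = ET.fromstring(xml_text)
    prefix = f"{{{namespace}}}"
    return [c.tag[len(prefix):] for c in root if c.tag.startswith(prefix)]


print(core_phones(ENTRY))             # [('work', '555-0100'), ('fax', '555-0101')]
print(extension_tags(ENTRY, ASTRAL))  # ['phone']
```

A reader that knows only the core vocabulary still gets the work and fax numbers, while the astral number travels along without colliding with the agreed-upon `phone` element, even though both share the local name `phone`.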


Michael: So to bring this back to documents and schemas: having the detail there is not a problem if we give the actor the ability to "blur the edges" where necessary and not get overwhelmed with complexity, while being tolerant of the errors of approximation likely to result. The computers running in the companies that employ the authors we're talking about want finer granularity and greater predictability. I think we can give authors greater ability to express themselves by making the tools they work with less general. I further think that applying these constraints helps the "machine readability" of the content they create at the same time. None of this specifically recommends any particular means of constraint, but schemas would be one useful tool.


Dethe: Yes.


Michael: What do you think? This is an interesting topic; should we invite Stewart and Derek into the discussion?


Dethe: Yes, let's bring them in.