Note: the title should make sense by the end of the post…
Kevin Dangoor’s recent announcement of
TurboGears has resulted in a dramatic
increase of interest in Kid. Kid was first
announced on November 30, 2004 as a Pythonic XML-based
Templating Language. I remember thinking I was going to do a series
of articles on why I wanted this specific combination of features in a
library. I never did and Kid progressed into its current form, growing a
small community along the way.
Although it’s now used primarily for HTML templating, that wasn’t the
initial goal of the project. What I really wanted to do was to
illustrate a different way of thinking about “Web Services”.
Not what you were expecting, eh?
But it’s true - Kid was supposed to be a simple device that I would use
to start a narrative exploring a variety of topics related to building
distributed systems atop the web (i.e. “Web Services”). I had decided
that the direction being taken by the industry mainstream was incorrect
and that web services would languish until they became more like the
existing, working, proven, web.
There was a lot of talk about “Web Services Infrastructure” and
framework and that talk continues today. The assumption by nearly
everyone was (and still is) that web services would require a whole new
set of tools and a whole new paradigm. SOAP/WS was still very much RPC oriented
and so the focus was on language bindings, typing, discovery, and the
like. Web Services programming had almost no resemblance to existing web
programming.
This has only accelerated in the time since I was heavily involved. Now
it seems the industry is calling for some mysteriously unspecified SOA
toolkit, ESB, or some other insignificant combination of alphanumerics
to save Enterprise IT.
I was in the middle of the situation a year ago or so and it troubled me
deeply. Long days and nights at work were spent combining technologies
like SOAP, WSDL, WS-Security, and BPML with massive Java
infrastructure. I ate it all up and knew it like the back of my hand. It
is not impossible for someone dedicating 8-12 hours a day to
understanding this stuff to love it. I loved it. It’s all very
intriguing for someone with my particular constitution.
But so is masturbation.
I’m not sure what happened. I guess I had what alcoholics refer to as “a
moment of clarity”. I distinctly remember going over to grab a co-worker
for lunch. He was on the phone and so I was talking to some guy in an
adjacent cube who did “services work”. The services people were
developers but were considered a different department than product
development. They dealt with specific customer needs and were tasked
with using our platform to craft actual real solutions for actual real
business problems. Their world is much different from product
development, whose main customers were marketing and the executive team.
Anyway, I noticed a very large stack of printed material on this guy’s
desk, entitled “Introduction to Web Services”. Development had collected
a set of introductory materials that were to be distributed to all of
the services people in the field and, I assume, eventually to external
developers using our platform. I had a small part in compiling these
materials and had reviewed them at various stages in electronic form.
The hard copy put me on my ass. (I say that metaphorically but if it
had hit me with any significant velocity it very well could have
literally put me on my ass.) I picked up the tome (an “Introduction”,
mind), slapped it on my co-worker’s desk, who was now off the phone, and
whispered, “this isn’t going to work.”
The services guy doesn’t love this stuff - at all. I’m sure he would be
completely capable of digesting all of it had he the time and interest
of, say, someone in my position, but he doesn’t. He has a customer that
wants a bunch of systems to talk to each other and they all have very
large and amazingly different framework and infrastructure. More
framework and infrastructure is the last thing he needs. The services
guy was a wake-up call.
At the time this was happening I had been nursing a long-time interest
in true web architecture. This didn’t have anything to do with work - a
very long time ago I threw together a little download tool thingy that
was capable of resuming failed downloads (with 14.4 kbps modems this was
a big deal and none of the browsers had native support for
resuming). This required that I read bits of RFC 2616 and
implement a basic HTTP client. I remember being amazed at some of the
capabilities of HTTP because I had assumed that it was mostly a simple
file transfer protocol (closer to FTP than, say, CORBA). The spec ended
up staying with me and I continued to explore different ways people were
using the web and HTTP to do new and exciting things.
At some point, I became convinced that existing, basic web architecture
solved many of the problems we were seeing with Web Services
adoption. The problem wasn’t that WS was incapable of solving the
technology issues (it’s quite adequate), the problem was that it was
incapable of solving the social issues. It far exceeded the threshold of
acceptable complexity for true adoption by this large community of
people whose primary goal is to solve business problems.
The web, on the other hand, was invented to solve the same basic
integration problems businesses are experiencing now. Tim Berners-Lee’s
dilemma is not so different than our own: a bunch of systems with a
bunch of closed data formats and processing capabilities that should be
universally accessible. Berners-Lee realized early on that solving this
problem would require, above all else, a virus: something that was so
simple and lightweight that it would be hard not to adopt. And that’s
what the web is: a virus. Just like C and UNIX, Windows, Visual Basic,
and a slew of other technologies that kept, as a primary requirement,
the ability for real people to solve real problems.
I shifted my thinking drastically and tried to imagine how the
integration problems we were seeing would be solved using existing web
architecture, which we know to have the traits necessary for mass
adoption. I should mention here that by “web architecture”, I mean W3C
TAG Web Architecture but also all of the tools and techniques that
have evolved above and below it. The servers, proxies, template
languages, mod_rewrite, sessions, cookies, load balancing, liberal feed
parsers, virtual hosts, MVC, MultiViews/content negotiation, monitoring
systems, automated testing tools, dynamic languages, view source, and
on and on. All of these little tricks and techniques add up to an
extremely powerful and yet fairly simple and understandable toolkit for
building distributed systems. What’s more is that there are an unmatched
number of people who understand how to build these systems using all
variations of platform and language.
Which brings us to Kid as “Web Services Infrastructure”. The concept is
simple: for whatever reason, template languages (PHP, ASP, JSP, CFML,
ERB, Cheetah, Tapestry, Velocity, etc.) have become a fundamental tool
for web development and their usefulness is in no way limited to
presentational data (HTML). Templates are simple, templates are
cool. You throw some junk in there and look at the result. If the result
isn’t right, you tweak your template until it does look right. There are
no layers of magic to get in your way. When something doesn’t come out
right, you change the template. There’s no “management” or “container”
involved.
There’s no rule that says templates must only be used to generate
HTML. Indeed, many of the RSS and Atom feeds in the wild are generated
from some form of template. They are never
automatically-generated-behind-the-scenes using language bindings and
are very rarely generated using some kind of DOM/SAX API.
<rant>
RSS is the most successful web services data format in existence (after
HTML, of course ;). Successful web services in the future are likely to
resemble it. Is there a use-case for RSS or Atom in WS-* land? There are
thousands of pages of spec text and no one could throw together a simple
use-case for the most successful web service in existence? That’s
irresponsible.
</rant>
The point I’m trying to make is that template based web services are a
reality and that we should be thinking about making incremental
improvements to the general model to facilitate more machine-centric
data formats instead of creating whole new paradigms.
There are a variety of really important issues with using templates for
general purpose web services programming, most having to do with the
first part of Postel’s Law:
“Be conservative in what you do; be liberal in what you accept from
others.”
The problem with using templates to produce XML (including XHTML) is
that it is exceedingly hard to be conservative in what you do. Most
template engines are text based, making it easy to miss well-formedness
errors. There are also a range of character encoding issues that
template languages could ease but often simply ignore and sometimes make
worse.
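
To make the well-formedness problem concrete, here is a minimal sketch
using only Python's standard library. The feed item template, the
sample titles and links, and the is_well_formed helper are all made up
for illustration; this is not how any particular template engine works.
It only shows how a text-based template will happily emit broken XML
the moment the data contains an ampersand.

```python
import string
import xml.etree.ElementTree as ET

# A text-based template: it knows nothing about XML, only about strings.
# (Hypothetical feed item layout, for illustration only.)
ITEM_TEMPLATE = string.Template(
    "<item><title>$title</title><link>$link</link></item>"
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Harmless data produces well-formed output...
good = ITEM_TEMPLATE.substitute(
    title="Kid released", link="http://example.org/kid"
)

# ...but a bare "&" in the data slips straight through the template.
bad = ITEM_TEMPLATE.substitute(
    title="Q&A with the author", link="http://example.org/?a=1&b=2"
)

print(is_well_formed(good))
print(is_well_formed(bad))
```

Nothing in the template changed between the two calls. Only the data
did, which is why this class of error is so easy to miss in testing.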
Kid is a simple attempt at building features that aid in
conservativeness into the template engine. I actually considered
tag-lining it The Ultra-Conservative Template Engine as a play on Mark
Pilgrim’s Ultra-Liberal Feed Parser whose name comes
from the second part of Postel’s law. This is, of course, the whole
point of being conservative in the first place: so that we don’t need an
Ultra-Liberal parser for each variation of “web service”.
I think some of these features are compelling and would like to see them
pursued in other tools that are in common use for web development. For
instance, one of the most important and least talked about features of
XML is that it provides a reasonable system for representing the entire
range of Unicode code points in any character encoding, including ASCII. A
template engine with a basic understanding of XML could process
templates authored in UTF-8, interpolate data encoded in ISO-8859-1, and
output in 7-bit ASCII — if that’s what Postel demanded. Kid does that.
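
As a rough illustration of that encoding round trip, the sketch below
uses Python's standard xml.etree.ElementTree rather than Kid itself.
The template text and the name being interpolated are invented for the
example. The template bytes are UTF-8, the data bytes are ISO-8859-1,
and the serialized output is pure ASCII, with everything outside ASCII
written as numeric character references.

```python
import xml.etree.ElementTree as ET

# Template authored in UTF-8 (hypothetical content).
template_bytes = "<p>Grüße, </p>".encode("utf-8")

# Data arriving from some other system in ISO-8859-1.
data_bytes = "José".encode("iso-8859-1")

# Parse the template as XML; bytes without a declaration are read as UTF-8.
paragraph = ET.fromstring(template_bytes)

# Decode the data using its own encoding before interpolating it as text.
paragraph.text += data_bytes.decode("iso-8859-1")

# Serialize as 7-bit ASCII. Characters that do not fit are emitted as
# numeric character references instead of being dropped or mangled.
output = ET.tostring(paragraph, encoding="us-ascii")

print(output)

# Any XML parser recovers the original characters from the ASCII output.
round_tripped = ET.fromstring(output).text
print(round_tripped)
```

The important part is that the engine decodes each input with its own
encoding and only picks an output encoding at serialization time.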
One of the most important features of XSLT is that well-formed templates
are guaranteed to yield well-formed output (with some well understood
exceptions). If the template runs, you know it will provide a basic
level of conservativeness. Kid does that. We’ve wontfix’d feature
requests because they would break this contract.
Most template languages require you to explicitly encode data that may
contain reserved characters. Kid takes the opposite approach and assumes
that all content is textual and should be encoded unless you explicitly
state that something is XML (in which case it must be well-formed).
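
The escape-by-default idea can be sketched with the standard library's
ElementTree. This is not Kid's API; render_comment and the comment
element are hypothetical names picked for the example. Plain strings
are treated as text and escaped on output, while markup has to arrive
as an already parsed element, which means it has to be well-formed
before it gets anywhere near the output.

```python
import xml.etree.ElementTree as ET


def render_comment(body):
    """Wrap body in a <comment> element and serialize it.

    A plain string is treated as text and escaped on output. An
    ElementTree element is treated as markup and inserted as-is.
    """
    comment = ET.Element("comment")
    if isinstance(body, ET.Element):
        comment.append(body)
    else:
        comment.text = str(body)
    return ET.tostring(comment, encoding="unicode")


# Default: everything is text, so reserved characters are escaped.
escaped = render_comment("<b>AT&T</b>")
print(escaped)

# Explicit: the caller states "this is XML" by parsing it first.
markup = render_comment(ET.XML("<b>AT&amp;T</b>"))
print(markup)

# Markup that is not well-formed never makes it into the output.
try:
    render_comment(ET.XML("<b>AT&T</b>"))
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)
```

Forgetting to mark something as XML gives you visible angle brackets in
the output, which is an annoying bug but not a broken document.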
There are also some interesting features around serializing the resulting
XML infoset with different variations. For example, you can author
templates in XHTML 1.0 and output in HTML 4.01. The output
serializer takes care of all the little quirks for things like empty
elements, non-escaping of SCRIPT and STYLE content, boolean
attributes, etc. The result is a clean authoring environment and an
ultra conservative output format. I bring this up in the context of web
services infrastructure only to show that the ability to filter template
output can be very useful when dealing with different types of user
agents.
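
Python's standard ElementTree has a small version of this idea built
in, which makes for a quick sketch. The markup below is made up for the
example, and this is the stdlib serializer, not Kid's. Note that the
stdlib HTML method handles empty elements and SCRIPT/STYLE content but,
as far as I know, leaves boolean attributes written out in full.

```python
import xml.etree.ElementTree as ET

# Authored as well-formed XML: empty elements are closed and the script
# body escapes "<" and "&" the way XML requires.
source = (
    "<div>"
    "<br />"
    "<script>if (a &lt; b &amp;&amp; c) { run(); }</script>"
    '<input type="checkbox" checked="checked" />'
    "</div>"
)

root = ET.fromstring(source)

# Same tree, two serializations.
as_xml = ET.tostring(root, encoding="unicode", method="xml")
as_html = ET.tostring(root, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

In the HTML output the br and input elements lose their trailing slash
and the script body comes out unescaped, which is what an HTML 4 user
agent expects. The template author never has to think about it.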
All this to say that if you’re looking for “Web Services Infrastructure”
for exposing processes and information, you’re probably looking too
hard. If you have a database, templating, and a web server, you have
most of the infrastructure and framework required to begin exposing
information from each of your systems in a proven and established way.
What you want to be on the lookout for are small and specific
enhancements to these existing pieces that allow you to interact with
other machines in a more predictable manner or in new and different
ways.
The next time someone is selling you infrastructure for Web Services, or
SOA, or ESB, or whatever they’ll call it next, make sure you ask “Why?”
After they tell you, make sure you understand, agree, and have the
problems they propose to solve. If not, ask again. New framework and
infrastructure is extremely expensive in more ways than one: you have to
ramp people up on it and then deploy, manage, monitor, and support
it. Sometimes new infrastructure is unavoidable but when it overlaps a
large portion of your existing infrastructure, you should make sure it’s
bringing back a significant return.