Here’s a slightly elided log line from a system I work with:
WARN 2005-09-28 14:38:38 StandardContext[/pmr] {”mid”:”123″, “type”:”XYZ”, “msg”:”foo”, “T”:”2005-09-28 14:38″, “L”:”warn”}
It identifies an XML document passing through the system using some keys from the envelope. It may seem a strange looking way to log an event, but between the angle brackets lies a useful side effect - it’s a Python dictionary. Hence this (vastly oversimplified) example log processing code, that runs eval() on each log line to load the dictionaries into memory:
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
Type “help”, “copyright”, “credits” or “license” for more information.
>>> dicts =[]
>>> import re
>>> p = re.compile(”.*({.*?}).*”)
>>> file = open(”log.txt”).readlines()
>>> for line in file:
… m = p.match(line)
… if m:
… d = eval(m.group(1))
… dicts.append(d)
…
>>> for d in dicts:
… print d
…
{’L': ‘warn’, ‘mid’: ‘123′, ‘type’: ‘XYZ’, ‘T’: ‘2005-09-28 14:38′, ‘msg’: ‘frobnizer found’}
{’L': ‘warn’, ‘mid’: ‘124′, ‘type’: ‘XYZ’, ‘T’: ‘2005-09-28 14:38′, ‘msg’: ‘frobnizer found’}
{’L': ‘warn’, ‘mid’: ‘125′, ‘type’: ‘XYZ’, ‘T’: ‘2005-09-28 14:38′, ‘msg’: ‘frobnizer found’}
{’L': ‘warn’, ‘mid’: ‘126′, ‘type’: ‘XYZ’, ‘T’: ‘2005-09-28 14:38′, ‘msg’: ‘frobnizer found’}
>>>
What’s nice about doing this is that it’s cheap to write the log line in a way that maximises usefulness later on - logs are often an afterthought, but good logging is critical to operating production systems. And it’s one example where creating metadata is cheap.
As well as programming data structures (changing the above to a Ruby hash is straightforward), you can get log lines to output something like Turtle, which is a greppable RDF format. What’s good about Turtle in particular is that you can dump or stream a log file straight into an RDF query engine, such as ARQ without any transform step and start writing queries against the data. Or how about writing down inferencing rules that help indicate the business consequences of a software error? The log lines themselves don’t carry enough information, but combined with rules and some context as to where the log event came from you can do interesting and useful things - the biggest issue with any monitoring software is gathering up all those machine level events and contextualising them so that data (noise) can be turned it into information (signal). Think of a machine readable data stucture inside logfiles as just another microformat.