Logging microformats
By Bill de hÓra under Python, Ruby, microformats on 13. December 2005

Here’s a slightly elided log line from a system I work with:
WARN 2005-09-28 14:38:38 StandardContext[/pmr] {"mid": "123", "type": "XYZ", "msg": "foo", "T": "2005-09-28 14:38", "L": "warn"}
It identifies an XML document passing through the system using some keys from the envelope. It may seem a strange-looking way to log an event, but between the curly brackets lies a useful side effect: it’s a Python dictionary. Hence this (vastly oversimplified) example log processing code, which runs eval() on each log line to load the dictionaries into memory:
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> dicts = []
>>> import re
>>> p = re.compile(".*({.*?}).*")
>>> file = open("log.txt").readlines()
>>> for line in file:
...     m = p.match(line)
...     if m:
...         d = eval(m.group(1))
...         dicts.append(d)
...
>>> for d in dicts:
...     print d
...
{'L': 'warn', 'mid': '123', 'type': 'XYZ', 'T': '2005-09-28 14:38', 'msg': 'frobnizer found'}
{'L': 'warn', 'mid': '124', 'type': 'XYZ', 'T': '2005-09-28 14:38', 'msg': 'frobnizer found'}
{'L': 'warn', 'mid': '125', 'type': 'XYZ', 'T': '2005-09-28 14:38', 'msg': 'frobnizer found'}
{'L': 'warn', 'mid': '126', 'type': 'XYZ', 'T': '2005-09-28 14:38', 'msg': 'frobnizer found'}
>>>
What’s nice about doing this is that it’s cheap to write the log line in a way that maximises usefulness later on - logs are often an afterthought, but good logging is critical to operating production systems. And it’s one example where creating metadata is cheap.
As well as programming data structures (changing the above to a Ruby hash is straightforward), you can get log lines to output something like Turtle, which is a greppable RDF format. What’s good about Turtle in particular is that you can dump or stream a log file straight into an RDF query engine, such as ARQ, without any transform step and start writing queries against the data. Or how about writing down inferencing rules that help indicate the business consequences of a software error? The log lines themselves don’t carry enough information, but combined with rules and some context as to where the log event came from, you can do interesting and useful things. The biggest issue with any monitoring software is gathering up all those machine-level events and contextualising them so that data (noise) can be turned into information (signal). Think of a machine-readable data structure inside logfiles as just another microformat.
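To make the Turtle idea concrete, here is a minimal sketch of a logger that emits one triple block per event. The namespace, predicate names, and `to_turtle` helper are invented for this example; they are not from the system described above.

```python
# Sketch: render an event dict as a greppable Turtle snippet.
# The http://example.org/ namespace and predicate names are
# hypothetical, invented purely for illustration.
def to_turtle(event):
    subject = "<http://example.org/msg/%s>" % event["mid"]
    lines = ["%s a <http://example.org/LogEvent> ;" % subject]
    for key, value in sorted(event.items()):
        lines.append('    <http://example.org/%s> "%s" ;' % (key, value))
    # the last statement in a Turtle block ends with a full stop
    lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines)

event = {"mid": "123", "type": "XYZ", "L": "warn", "msg": "frobnizer found"}
print(to_turtle(event))
```

A stream of such blocks can be concatenated and handed to an RDF parser as-is, which is the "no transform step" property the paragraph above describes.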
l.m.orchard:
Of course… if you’re going to eval log lines, make sure no end-user is able to drop a payload of their own in there. But other than that, neat trick!
comment at 14. December 2005
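The injection risk is real: eval() will execute any expression embedded in a log line. A small sketch of the difference, using ast.literal_eval, which only accepts Python literals (it arrived in later Python releases, so this is a present-day alternative rather than something available in 2005):

```python
import ast

safe_line = "{'mid': '123', 'msg': 'foo'}"
hostile_line = "__import__('os').getcwd()"  # attacker-controlled payload

# literal_eval parses literals only, so the hostile line is rejected
# with a ValueError instead of being executed
print(ast.literal_eval(safe_line))
try:
    ast.literal_eval(hostile_line)
except ValueError:
    print("rejected hostile input")
```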
Bill de hOra:
Les, oh sure; v2 will have a ‘strict’ mode ;)
comment at 14. December 2005
Paul:
I was going to post the same thing — q.v. August’s AWStats hole, which involved Perl code injected into log lines (via referrer) to be subsequently ingested, and inadvertently executed, by an eval() call in AWStats.
comment at 15. December 2005
Jochem:
Why not directly log to XML? And when the file is closed, write out the closing root element. You could then use (c)ElementTree or any other XML parser to parse the log lines later.
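For comparison, a sketch of what Jochem’s suggestion might look like; the element and attribute names here are invented for the example:

```python
# Sketch of logging to XML and parsing it back with ElementTree.
# Element/attribute names are hypothetical, chosen to mirror the
# dict keys used in the post.
import xml.etree.ElementTree as ET

log = """<log>
  <event mid="123" type="XYZ" L="warn">frobnizer found</event>
  <event mid="124" type="XYZ" L="warn">frobnizer found</event>
</log>"""

root = ET.fromstring(log)
for event in root.findall("event"):
    print(event.get("mid"), event.text)
```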
comment at 20. December 2005
Bill de hÓra:
“Why not directly log to XML? “
Jochem: grep.
comment at 20. December 2005
Wade Leftwich:
How about JSON for an eval’able microformat that’s a bit safer? I’ve been doing something similar to the O.P. for a while, saving a web request as str(dict) (and storing it in a db text field). Recently switched over to reading & writing it with json-py.
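Wade’s suggestion in sketch form. This uses the json module that later became part of the standard library (json-py was a separate package in 2005); the regex mirrors the one in the original post:

```python
import json
import re

line = 'WARN 2005-09-28 14:38:38 {"mid": "123", "type": "XYZ", "msg": "foo"}'

# json.loads parses the data but never executes it, which closes
# the injection hole that eval() leaves open
match = re.search(r"(\{.*\})", line)
if match:
    event = json.loads(match.group(1))
    print(event["mid"])
```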
comment at 20. December 2005
Mike:
Good to see someone else doing this… serialized dicts have been my logging format for years! It’s just so easy to add lines in your code like:
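The code snippet in this comment did not survive; it presumably looked something like the following. This is a hedged reconstruction, not Mike’s actual code, and the `event_line` helper name is invented:

```python
import logging

logging.basicConfig(format="%(asctime)s %(message)s")

def event_line(mid, event_type, msg, data):
    """Serialise an event dict to a single, eval()-able line."""
    # repr() escapes newlines and binary garbage in `data`, so each
    # record is guaranteed to occupy exactly one line in the log
    return repr({"mid": mid, "type": event_type, "msg": msg, "data": data})

logging.warning(event_line("123", "XYZ", "frobnizer found", "binary\x00\n"))
```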
Python’s serializing code ensures that I’ll only get one record per line, regardless of what binary garbage is in the ‘data’. And processing existing logs is as easy as grep or:
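The processing snippet is also missing from the comment; a plausible reconstruction, assuming the same regex-plus-eval approach as the original post (the sample lines are made up):

```python
import re

# Pull the dict literal out of each line and eval it back into memory
# (safe only because we generated these logs ourselves)
pattern = re.compile(r".*({.*?}).*")
lines = ["WARN {'mid': '123', 'msg': 'foo'}",
         "WARN {'mid': '124', 'msg': 'bar'}"]
events = [eval(m.group(1)) for m in map(pattern.match, lines) if m]
print([e["mid"] for e in events])
```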
Granted, there are security problems if you eval someone else’s data, but for logs you generate, this is as easy & powerful as it gets.
comment at 20. December 2005
Maciej:
wrap logging into a function, and call it with the event fields at each log site.
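The code for Maciej’s suggestion was lost from the comment; a sketch of what it likely looked like (the function name and signature are guesses):

```python
import logging

logging.basicConfig()

def log(logger, level, **fields):
    """Wrap logging so every record carries an eval()-able dict."""
    fields["L"] = level
    getattr(logger, level)(repr(fields))

# usage: each call site just names the fields it cares about
log(logging.getLogger("pmr"), "warning",
    mid="123", type="XYZ", msg="frobnizer found")
```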
comment at 22. December 2005
Masklinn:
Because it doesn’t have any advantage over using dicts, is completely unreadable to most human beings, is extremely verbose, and requires specific packages/programs to be parsed?
comment at 20. February 2006
The mass amateurisation of everything, in code at Suttree, Elixir for Immortal Baboon:
[…] Turning log files into JSON formatted code to be eval’ed by a Python script (how insecure is that!). Again, do it in PHP and you be laughed at, deservedly. […]
pingback at 07. March 2006