lesscode.org


Verbal Communication  

By Alex Bunardzic under Rails, Theory on 26. October 2005

I started writing a blurb for lesscode.org on some of the fundamental axioms of web information processing, but my pen took me down a rambling path and I ended up with a longish article on my hands. So instead of clogging lesscode.org’s bandwidth, I’ve posted it on my blog.

If you’re interested in examining certain long-standing challenges in web computing, and how Ruby and Rails approach solving them, you may find some meat in there.

P.S. I’ll be blunt and admit right away that I’m slamming the role of RDBMS in the web architecture, so I’m not really expecting that most people will agree with my analysis. Oh well, c’est la vie…

9 Responses to “Verbal Communication”

  1. hxa:

    How about ‘lesswords’? — Delete the first three sections and get to the point straight away. Forget the circumstantial inferences about storage size. Why things are like that is irrelevant. Just say how they should be.

    The point is an interesting and appropriate one: Data entered as text can mostly be stored unprocessed as text. All type enforcement or relational conversion is redundant and wasted effort.

    But what about dates, addresses, card numbers? Is no treatment of those needed?

    You don’t describe what or how Ruby on Rails does its derivative processing of text data…

    comment at 27. October 2005

  2. Alex Bunardzic:

    hxa wrote:

    How about ‘lesswords’?

    Sorry, I don’t subscribe to that philosophy. To me, ‘less mass’ applies only to technology, not to other forms of communication (like, I wouldn’t go for ‘less music’, ‘less art’, ‘less poetry’, etc.)

    *Why* things are like that is irrelevant. Just say how they should be.

    There is genesis to it, and it would be foolish to ignore it, lest we repeat our mistakes.

    But what about dates, addresses, card numbers? Is no *treatment* of those needed?

    Dates etc. must be treated as special cases of text. The human mind treats them that way when parsing text, so the human mind’s servant (i.e. software) must do the same. But the database is not the right place for that kind of treatment. Historically (and for the reasons I’ve discussed in my post), we’ve ended up painting ourselves into a corner by marrying certain products (i.e. RDBMSs), but now’s the time for the divorce.
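
    As a rough illustration of that idea (a sketch using Ruby's standard `date` library, not code from the article; the sample string is made up), the value can be kept exactly as typed and interpreted as a date only at the point where the application needs one:

```ruby
require "date"

# The value is kept exactly as the user typed it (sample input, invented here)...
raw = "26 October 2005"

# ...and interpreted as a date only when the application needs one.
begin
  parsed = Date.parse(raw)
  puts parsed.iso8601
rescue ArgumentError
  # Date.parse raises an ArgumentError (or a subclass) on unparseable text.
  puts "not recognisable as a date: #{raw}"
end
```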

    You don’t describe what or how Ruby on Rails does its derivative processing of text data…

    I’m working on a followup article where I need to introduce the concepts that Ruby uses in order to act as a smart servant that would do the appropriate parsing.

    Stay tuned (and thanks for your very useful comments!)

    comment at 27. October 2005

  3. Alex Bunardzic:

    You don’t describe what or how Ruby on Rails does its derivative processing of text data…

    Some details just posted here.

    comment at 27. October 2005

  4. Aristotle Pagaltzis:

    How about ‘lesswords’?

    I don’t subscribe to that philosophy. To me, ‘less mass’ applies only to technology, not to other forms of communication (like, I wouldn’t go for ‘less music’, ‘less art’, ‘less poetry’, etc.)

    Being wordy and longwinded is no virtue.

    comment at 28. October 2005

  5. Alex Bunardzic:

    Aristotle Pagaltzis:

    Being wordy and longwinded is no virtue.

    Exactly. That’s why all the good books worth reading are not longer than two-three pages.

    comment at 29. October 2005

  6. Bob:

    Misc Nitpicks:

    Binary coded decimal isn’t a particularly compact representation for numbers. It is admittedly more compact than representing each digit as an ASCII (or EBCDIC) character. But the real motivation for using BCD - right down to the machine level - was to simplify the routines needed for conversions between the computer and human representations. Disk (and tape) storage may have been a scarce resource, but core memory came at a much higher premium.
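
    To make the conversion argument concrete, here is a small Ruby sketch. It is an illustration only, and the helper names `to_packed_bcd` and `from_packed_bcd` are made up for this example. Packed BCD stores one decimal digit per 4-bit nibble, so turning it back into printable digits is a nibble-by-nibble mapping rather than the repeated division a pure binary number needs.

```ruby
# Hypothetical helpers illustrating packed BCD: one decimal digit per nibble.
def to_packed_bcd(digits)
  raise ArgumentError, "decimal digits only" unless digits =~ /\A\d+\z/
  padded = digits.length.odd? ? "0" + digits : digits # pad to whole bytes
  [padded].pack("H*") # each pair of digits becomes one byte
end

def from_packed_bcd(bytes)
  bytes.unpack("H*").first # each nibble maps straight back to a digit
end

ascii  = "19991231"
packed = to_packed_bcd(ascii)
puts ascii.bytesize           # 8 bytes as ASCII characters
puts packed.bytesize          # 4 bytes as packed BCD
puts from_packed_bcd(packed)  # 19991231
```

    A 32-bit binary integer holds values up to 4,294,967,295 in the same four bytes, which is why BCD is more compact than text but not the most compact option.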

    There were probably many reasons for the inertia regarding 2 digit years, but one that Alex overlooks is data entry. If dates are always given with 2 digit years, that reduces the motivation for storing them otherwise. It’s odd that Alex misses this, since a possible implication of his comment about dates is that they should be stored in the form they were given by the user.

    Here’s an oldie but a goodie: Alex is getting strong-vs-weak typing mixed up with static-vs-dynamic typing. Ruby is strongly typed. From the fact that Alex recommends its use - rather than that of, say, Forth or BCPL - it is evident that he likes using a strongly typed language. Perhaps Alex intends to say that a good language can have various aggregate types (like arrays and hashes) but should only have one basic type: string. This, of course, is the case with Perl. Contrast:

    $ perl -e '$two = "2"; print $two * $two, "\n"'
    4

    with:

    $ ruby -e 'two = "2"; puts two * two'
    -e:1:in `*': can't convert String into Integer (TypeError)
    from -e:1
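
    For comparison, a short Ruby sketch of what strong-but-dynamic typing means in practice: the conversion from text has to be asked for explicitly. This only illustrates standard `String` and `Integer` behaviour; it is not taken from Alex's article, and the exact wording of the error message varies between Ruby versions.

```ruby
two = "2"

# Ruby will not guess that a String should act as a number...
begin
  two * two
rescue TypeError => e
  puts "TypeError: #{e.message}"
end

# ...so the conversion is spelled out.
puts two.to_i * two.to_i          # 4
puts Integer(two) * Integer(two)  # 4; Integer() is strict and raises on "2x"

# String#* with an Integer is repetition, not arithmetic.
puts two * 2                      # 22
```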

    It also isn’t very accurate to suggest that there’s a necessary fit between dynamically-typed languages and the use of plain text. There is a long-standing tradition in Lisp and Smalltalk of saving memory images to disk. The emphasis on using plain text wherever possible is a UNIXism.

    The reason for other “dynamic” languages being good with text is that they grew up in a UNIX-influenced world. More specifically, these languages, either directly or via Perl, show the influence of AWK. They are good with text because, whatever else they might be, they have to be “better AWK”. But AWK isn’t the primary language of UNIX: C is. And C is statically typed. AWK was created because of the heavy use of text in UNIX; not the other way around.

    [As an aside, given that Java’s remit was to be “easy C++” it’s not that surprising that it’s not so hot on this score, eg taking 20 lines (or whatever) to open a file and read it line-by-line. But that’s a problem with the libraries, not the language per se.]

    I won’t mount any defence of relational databases: but it makes more sense to me to dislike them for forcing all structure into a tabular straight-jacket than for having types for columns. All I can say is that if you’re using an RDBMS as a fairly dumb persistence engine, it’s probably best to be aware that this is what you are doing.

    Lastly, it is odd to praise XML - given the context - not least because XML is plain text for such small values of “plain”.

    comment at 30. October 2005

  7. Aristotle Pagaltzis:

    Bob:

    Perhaps Alex intends to say that a good language can have various aggregate types (like arrays and hashes) but should only have one basic type: string. This, of course is the case with Perl. Contrast:

    $ perl -e '$two = "2"; print $two * $two, "\n"'
    4

    Actually, Perl has several basic types, and they’re strongly typed.

    $ perl -le'$a = []; print $a->{foo}'
    Can't coerce array into hash at -e line 1.
    $ perl -le'$a = {}; print $a->[0]'
    Not an ARRAY reference at -e line 1.

    There are also a couple more types—but absolutely, positively no way you can turn one of them into another.

    comment at 31. October 2005

  8. Alex Bunardzic:

    Bob wrote:

    There were probably many reasons for the inertia regarding 2 digit years, but one that Alex overlooks is data entry. If dates are always given with 2 digit years, that reduces the motivation for storing them otherwise. It’s odd that Alex misses this, since a possible implication of his comment about dates is that they should be stored in the form they were given by the user.

    That’s a valid point. End-user convenience is a decisive factor, of course, and typing 2-digit years instead of 4-digit is a major issue, no doubt about it.

    However, there is absolutely no reason why we can’t have our cake and eat it too: we can retain the end-user convenience (i.e. ask them to type only 2 digits for the year), while prepending the appropriate leading digits to the supplied year behind the scenes (there are algorithms, based on the system clock, that can do that for us). Then the year would be stored properly, with all four digits, and any future confusion would thus be easily avoided.
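
    One way to sketch such an algorithm in Ruby is a sliding 100-year window around the current year. The method name `expand_two_digit_year` and the 20-year look-ahead are assumptions chosen for illustration, not something prescribed in the article.

```ruby
# Hypothetical helper: expand a two-digit year to four digits using a
# sliding 100-year window that ends `lookahead` years after `current_year`.
def expand_two_digit_year(yy, current_year = Time.now.year, lookahead = 20)
  raise ArgumentError, "expected 0..99" unless (0..99).cover?(yy)
  latest = current_year + lookahead         # newest year we will accept
  candidate = latest - (latest % 100) + yy  # same century as `latest`
  candidate -= 100 if candidate > latest    # too far ahead: previous century
  candidate
end

puts expand_two_digit_year(5, 2005)   # 2005
puts expand_two_digit_year(99, 2005)  # 1999
puts expand_two_digit_year(25, 2005)  # 2025
puts expand_two_digit_year(26, 2005)  # 1926
```

    Passing the current year as a parameter, with the system clock only as the default, keeps the window testable. The look-ahead is a policy choice: birth dates might want zero, expiry dates rather more.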

    I also agree with your critique that I’ve allowed sloppiness to enter my argument when I muddled static-vs-dynamic with strong-vs-weak typing. Excellent points, and thanks for the clarification!

    But my basic argument remains: languages should bend more toward the human propensity to view information as text. Despite the fact that the underlying machinery only understands 1s and 0s, it is its duty to present a legible face to its human users (in this case, programmers). In that regard, static languages are very unfriendly; actually, they are quite rude.

    Dynamic languages fare better when it comes to letting us express ourselves in textual format by lowering the noise and amplifying the signal.

    I won’t mount any defence of relational databases: but it makes more sense to me to dislike them for forcing all structure into a tabular straight-jacket than for having types for columns. All I can say is that if you’re using an RDBMS as a fairly dumb persistence engine, it’s probably best to be aware that this is what you are doing.

    You’re again absolutely right. The long ingrained propensity to offload a lot of legwork to an RDBMS must be held in check.

    The issue with tabular data representation is not necessarily that it is a straight-jacket constraint. What is more problematic is the normalization of the tabular representation. Humans tend to like viewing information in a tabular format, but are totally confused when this tabular format gets broken down into a cluster of tables.

    If we compare OLTP systems with OLAPs, the first thing we’ll notice is that business analysis gravitates toward denormalizing OLTP tables, and ending up with one giant spreadsheet.
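
    The denormalizing step can be pictured with a tiny Ruby sketch. The `customers` and `orders` data below are invented for illustration: two normalized collections are joined into the single flat, spreadsheet-like set of rows that an analyst would rather read.

```ruby
# Invented sample data: two normalized "tables" linked by customer_id.
customers = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Grace" }
]
orders = [
  { id: 10, customer_id: 1, total: 25 },
  { id: 11, customer_id: 2, total: 40 },
  { id: 12, customer_id: 1, total: 15 }
]

# Index customers by id, then copy the customer name onto every order row.
customers_by_id = customers.each_with_object({}) { |c, h| h[c[:id]] = c }
flat_rows = orders.map do |o|
  {
    order_id: o[:id],
    customer: customers_by_id[o[:customer_id]][:name],
    total:    o[:total]
  }
end

# Print one tab-separated line per order, spreadsheet style.
flat_rows.each { |row| puts row.values.join("\t") }
```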

    comment at 31. October 2005

  9. Anonymous:

    “Exactly. That’s why all the good books worth reading are not longer than two-three pages.”

    So you’re writing a book here, or trying to convey meaning?

    As Einstein said: “If you can’t explain something simply, you don’t understand it well.”

    It would clarify your arguments if you cut out the excessive leaky abstractions and didn’t use so many strawmen.

    comment at 03. November 2005