lesscode.org


Code Snippets and Systems of Ends  

By Ryan Tomayko under Theory on 30. August 2005

This post is a bit all over the place, sorry.

I’ve stumbled across Code Snippets at least five times in the past couple of days. It’s basically del.icio.us for small pieces of code. Each snippet gets a title, a description, the code itself, a set of tags, and, most importantly, a URL. The result is excellent search engine indexability; as I said, I’ve happened upon the site at least five times now through ordinary Google searches.

What’s interesting is that, as far as I can tell, they’ve placed NO limitations on what type of snippet can be posted. There’s a quick bash two-liner for Automatically adding a bunch of stuff to CVS next to a Generic XHTML Template next to a Python snippet for generating MIDI tones on a Series 60 phone.

Considering this from a more abstract level, you might call this a demonstration of a few bits of theory laid out by Doc Searls and David Weinberger in World of Ends. Here we have two ends (Google and Code Snippets) that work well together because they share an understanding of what’s desirable in the larger system they both operate in. The value of each end seems to increase with each new end it touches. I think this basically follows Metcalfe’s Law, which states that “the value of a network equals approximately the square of the number of users of the system (n²).” Only “users” in this context can mean “other systems”. In this case, Code Snippets enhances Google and Google enhances Code Snippets. You might also say that Code Snippets gets more value from Google than Google gets from Code Snippets, and that the actual value each obtains is close to that predicted by Metcalfe (or not).
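As a toy illustration of where the n² intuition comes from (made-up numbers, nothing measured): the number of possible pairwise connections among n participants is n(n-1)/2, which grows on the order of n².

# A quick sketch of Metcalfe's Law: potential pairwise connections
# among n participants grow roughly as the square of n.
for n in (10, 100, 1000):
    pairs = n * (n - 1) // 2
    print("%d participants -> %d possible connections" % (n, pairs))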

Anyway, the enhanced searchability this style of organization facilitates got me thinking about the quality of metadata at del.icio.us proper… Did you know that Joshua disallows access to / for all robots, and that Google is therefore not spidering the amazing set of collaborative metadata available there? Having the Googlebot run through del.icio.us on a regular basis would be insanely expensive for del.icio.us; Technorati did it for about two weeks and then was cut off, if I remember correctly.

But why shouldn’t Google/Yahoo/whoever purchase the bandwidth and other resources necessary to run their spiders over del.icio.us? Even if you did pay, spidering probably wouldn’t be the best method of getting at the meat of del.icio.us. If I were a search engine, I would try to convince Joshua to lease me a basically persistent stream from this RSS 1.0 feed. With the quality of data in that stream, you should be able to put something together that improves your search results dramatically.

Finally, it’s interesting to note that “/rss” is the only resource robots are allowed to access:

http://del.icio.us/robots.txt:

User-agent: *
Disallow: /
Allow: /rss

If del.icio.us were so inclined, I think it’s reasonable to believe that they could start picking up revenue by leasing high-quality access to that URL.

10 Responses to “Code Snippets and Systems of Ends”

  1. Small Company CTO » Code Snippets:

    […] Full Post […]

    pingback at 30. August 2005

  2. Peter Hoven:

    Thanks for the great link. I immediately sent it to all my developers. Making a code library so easily searchable is great.

    And the more use it gains, the more useful it becomes. A great example of a Web 2.0 app.

    comment at 30. August 2005

  3. Ian Bicking:

    I think trying to lease del.icio.us is probably a good way to keep it from being indexed for a long while: leasing involves negotiation and monetary transactions, and those are slow. Maybe it’s just as fast as working through the architectural/technical issues of indexing it for free, but it’s still a relatively slow process.

    But anyway, isn’t access to /rss good enough? That contains 100% of the information del.icio.us carries; everything else is UI that would be distracting to search engines, not helpful. I suppose it would allow Google to point to http://del.icio.us links — but is that really useful? The whole point is to point to the real page, and while del.icio.us links can have a little commentary, it isn’t enough to make it a destination.

    comment at 30. August 2005

  4. Danno:

    Maybe the del.icio.us pages themselves aren’t important, but it’s hard to deny the usefulness of millions of hand-picked links sorted with human-verified metadata.

    The Big G (or others) could probably find space for that sort of knowledge in their ranking algorithms.

    comment at 30. August 2005

  5. Ryan Tomayko:

    But anyway, isn’t access to /rss good enough? That contains 100% of the information del.icio.us carries; everything else is UI that would be distracting to search engines

    Right. The only issue is that del.icio.us/rss moves so fast. You would need to poll it every 10-15 seconds to be sure you’re getting everything. Maybe you don’t need everything, or maybe something like mnot’s feed history IETF memo could help, but even then del.icio.us might not be able to allow this level of access without incurring significant cost.
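    To make that concrete, here’s a minimal sketch of the kind of aggressive polling loop I mean, using the Universal Feed Parser; the 10-second interval and the dedup-by-link strategy are my assumptions, not anything del.icio.us actually supports:

        # Naive high-frequency feed poller (sketch). The seen set grows
        # without bound; a real indexer would need smarter bookkeeping.
        import time
        import feedparser

        FEED_URL = "http://del.icio.us/rss"
        seen = set()

        while True:
            for entry in feedparser.parse(FEED_URL).entries:
                if entry.link not in seen:
                    seen.add(entry.link)
                    # hand the new bookmark off to the indexer here
                    print(entry.title, entry.link)
            time.sleep(10)  # any slower and items could scroll past unseen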

    comment at 30. August 2005

  6. BillSaysThis:

    (Self-interest alert) How about a site similar to del.icio.us but with a full-text (not tag-only) search engine built in? That, plus a few other features, would be one way to describe RawSugar, a just-out-of-stealth company where I work. By this time tomorrow we will have surfaced a del.icio.us import function so you can compare the relative experience for yourself.

    comment at 30. August 2005

  7. Aristotle Pagaltzis:

    Danno:

    it’s hard to deny the usefulness of millions of hand-picked links sorted with human-verified metadata.

    True, but it’s also hard to deny that del.icio.us has zero spam protection built in. I believe that is the biggest issue in this Google + del.icio.us discussion, and one I’d consider a real stumbling block: if Google were to try to derive value from del.icio.us, it’s hard to imagine the spammers taking longer than you need to say “potato” before they’d be all over it.

    In that sense I would actually prefer that Google stay away from it.

    Code Snippets (awesome link, btw, thanks Ryan) is very different, because it has actual content that stands on its own. I think Ian hit on the right point in saying that del.icio.us is, itself, not a destination.

    comment at 31. August 2005

  8. Ryan Tomayko:

    Aristotle said:

    it’s also hard to deny that del.icio.us has zero spam protection built in.

    All complex ecosystems have parasites. (Audio)

    comment at 31. August 2005

  9. Aristotle Pagaltzis:

    Sure, and the parasites survive as long as the ecosystem can fend them off sufficiently to sustain both itself and the parasites. If it can’t, the parasites overpower it, wring it dry, and kill both the system and, ultimately, themselves. I don’t see a lot in the concept of del.icio.us that would allow the former scenario, and I’d rather not see the latter happen.

    comment at 31. August 2005

  10. Alec Reed:

    Regarding search engine spidering and tag-based organization, I recently learned an important mathematics lesson. Two weeks after I started up my weblog, I had used about twelve tags across about that many posts. The blogging application (it’s custom; not sure if I’m going to release it) has a prominent tag cloud page and then lets you browse tag intersections, adding or removing tags via automatically generated links. You can also access an RSS 1.0 or 2.0 feed for any intersection.

    Thing is, it doesn’t stop you from browsing, and continuing to browse, tag intersections that don’t match any content (after all, there might be some content there later, right?). It just keeps giving you new intersection URLs and new feeds.

    The math lesson I learned was … well, I’m no good with permutations and combinations and factorials, but the lesson was that you can intersect twelve tags in a whole lot of ways. I guess since my blogging software doesn’t enforce any ordering of the tags when it generates the intersection URL, you could theoretically have:

    12! + 12!/1! + 12!/2! + 12!/3! + ... + 12!/11!
    

    Something like that. Anyway, it’s a lot. And Google started crawling them all, plus the RSS feeds. (Actually, not all of the permutations were accessible because a given tag has to be related to one in the current intersection in order to show up as a choice to add. Still: a lot.) Combine that with the fact that I was caching the RSS feeds even when they were empty, and I had a problem.
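    (Evaluating that sum is easy enough; here’s a quick sketch that counts the ordered, duplicate-free tag sequences of every length from one to twelve. It comes to 1,302,061,344, so “a lot” means roughly 1.3 billion theoretical URLs.)

        # Sum over intersection lengths k = 1..12 of the number of
        # ordered, duplicate-free tag sequences: 12! / (12 - k)!
        from math import factorial

        total = sum(factorial(12) // factorial(12 - k) for k in range(1, 13))
        print(total)  # 1302061344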

    I worked it out by adding a meta “noindex,nofollow” tag to all the pages with no actual content, blocking Googlebot from all the tag RSS feeds it had already found via robots.txt, deleting the tens of thousands of cached files for the RSS feeds Googlebot had already hit, and turning off RSS caching going forward. All told, Googlebot crawled almost 200,000 pages, using about one gigabyte of bandwidth, before it stopped.
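    In case it helps anyone hit by the same thing, those first two fixes look roughly like this (the /tagfeeds/ prefix is made up; use whatever path your feeds actually live under):

        <!-- emitted on every intersection page with no matching posts -->
        <meta name="robots" content="noindex,nofollow" />

        # robots.txt: keep Googlebot out of the already-discovered tag feeds
        User-agent: Googlebot
        Disallow: /tagfeeds/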

    comment at 31. August 2005