Accentuating the Positive in Metadata and Folksonomies

I just wrote a very short, introductory piece on folksonomies for Capulet’s newsletter (which, if you’re so inclined, you can sign up for here). You can read the full spiel in the newsletter or refer to something more authoritative in Wikipedia. Here’s an excerpt of what I wrote:

Originally coined by Thomas Vander Wal, “folksonomy” is a blend of the terms ‘folk’ and ‘taxonomy’. Where a taxonomy is a rigid, top-down organizational structure, a folksonomy is an improvised, bottom-up approach to classification.

I was recently thinking about Flickr tags, a very common example of a folksonomy. I’d just read an interview with Flickr’s Stewart Butterfield, in which he describes the attraction of tags for the average user:

The complaint that [tagging is] uncontrolled and it’s not going to be captured in a consistent way to me is really irrelevant. Because tags are first and foremost for people to organize their own photos–and if they weren’t, it wouldn’t work. It’s a happy accident that the whole global collection emerges. And let’s say it’s only 50 percent accurate and complete and let’s say right now we have 10,000 photos tagged “Italy;” it might actually be 20,000 photos that should have been tagged “Italy,” but who cares? No one is going to look at all 10,000 photos, let alone 20,000 photos. And in six months, it will be 50,000 photos instead of 100,000 photos.

Then I thought about the five-star rating system in iTunes, which is a similar kind of metadata. I don’t share it (though I no doubt could), and others can’t modify it, but it’s useful to me for separating great songs from good ones, and good ones from lousy ones. In my largish iTunes song library, I sorted my songs by rating and paged through them.

That’s when it occurred to me: my metadata skews toward the positive. Check out this chart, which shows the ratings for songs in my library (I’ve only actually assigned a rating to a tenth of my collection):

As you can see, I’ve rated nearly twice as many songs above average as below average. Under normal conditions, shouldn’t those numbers be roughly even?
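To make the skew concrete, here’s a minimal sketch in Python. The rating counts are invented for illustration (they are not the actual numbers from my library), but the comparison is the same: tally songs rated above the three-star midpoint against those rated below it.

```python
# Hypothetical rating counts (star rating -> number of songs).
# These figures are made up to illustrate the positive skew.
ratings = {1: 40, 2: 160, 3: 300, 4: 350, 5: 150}

# Songs rated above the three-star midpoint vs. below it.
above_average = sum(n for stars, n in ratings.items() if stars > 3)
below_average = sum(n for stars, n in ratings.items() if stars < 3)

print(above_average, below_average)   # 500 200
print(above_average / below_average)  # 2.5 -- skewed toward the positive
```

If ratings were assigned evenly, the ratio would hover near 1; anything well above that suggests the rater, like me, mostly bothers to rate the songs they like.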

The same is true of my Flickr photos. When I upload an unusual or interesting image, I tag the hell out of it. If it’s something ordinary, my number of tags (and effort spent thinking about them) decreases.

To return to Stewart’s example, say 2,000 people each upload 10 photos of Italy. They each put more effort into tagging their best one. We’re therefore likelier to see the best 2,000 photos, and disregard the rest.

What’s the conclusion? Maybe folksonomies and metadata are self-filtering. If everyone spends more time and effort describing good things than bad ones, will we end up consuming fewer bad things? Is there anything wrong with that?


  1. I agree! The unspoken peer pressure in sharing systems like Flickr (and blogs that allow comments) will increasingly mean that only the best (self-perceived) gets commented. There will still be 99% crud out there, but it will be invisible, the Dark Matter of the internet.

    The comparison in your iTunes case is not to the 200 1- and 2-star songs, but to the 90% that were not worth rating.

    Good post.

  2. Very nice way to think about the potential quality of an item — the effort spent tagging as represented by the number of tags.

    Though, as spam hits, it’s clear that there is an upper limit of tags beyond which we can assume that the link is not useful.

    I do think that this might be a way of finding more authoritative users in an effort to filter for quality as well.

    Very thought-provoking. Thanks.

  3. I’ve pondered the folksonomy system myself. I’m somewhat of a perfectionist, so it’s hard for me to commit to tags in such a way. I tend to want a finite set of keywords to choose from. To that end, I think one should be able to choose from available tags in a relational view. Flickr gives you the ability to choose from your own tags, which is pretty cool.

Comments are closed.
