Thursday, August 16, 2007

Six Degrees of Brian Sletten

Anyone who has ever sent me a LinkedIn invitation knows that I turn my nose up at centralized social-networking blivets (military usage) like that. The Internet is bigger than any single site, and there are very few endgames for these companies that don't involve selling this information. While I don't imagine it is yet a viable replacement, I prefer something like FOAF, as it is decentralized, is based on open standards and makes the networks available to you (something LinkedIn is only now considering).

I do believe that within the next few years there will be real pressure to assert yourself into a social network in order to get a job, get hired as a babysitter off of Craigslist or go out on a date. It is simply too easy to spider the Web and find connections: we know the same people, we went to the same school, we are interested in the same topics, etc.

While Dave and I have worked on some cool tools to passively harvest RDF while browsing (using NetKernel as a proxy and, after grabbing pages, asynchronously running them through a GRDDL filter), I still considered FOAF a niche technology.

I needed an application for my next DevX article, which is going to be on Mulgara. I was originally going to demonstrate using ZeroConf to advertise RESTful services and storing RDF metadata about said services (an immensely better approach than lame technologies like U Don't Dare Invest (UDDI)), but I decided that might be a little too adventurous for a 2,000-word article.

So, I decided to write a FOAF explorer. I start with my FOAF file and then do a query for everyone I know:

alias <> as foaf;
select $foafurl from <rmi://localhost/server1#foaf> where $someone <foaf:knows> $someoneelse and $someoneelse <> $foafurl;

For every person that comes back, I load their FOAF file into my Mulgara instance:

load <> into <rmi://localhost/server1#foaf>;

I keep track of everyone I've processed and repeatedly query Mulgara until I run out of data.
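The loop above is essentially a breadth-first crawl with a visited set, run until nothing new turns up. Here is a minimal sketch of that shape in Python rather than Mulgara's iTQL. It assumes a hypothetical `fetch_knows` callable that stands in for the "load the file, then query for the FOAF URLs of everyone it knows" round trip. As a bonus it records how many hops each file is from the starting point, which is where the "degrees" in the title come from.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def crawl_foaf(seed: str, fetch_knows: Callable[[str], Iterable[str]]) -> Dict[str, int]:
    """Breadth-first crawl of FOAF files starting from `seed`.

    `fetch_knows` is a hypothetical stand-in for loading a FOAF file into
    the store and querying it for the FOAF URLs of the people it knows.
    Returns each discovered FOAF URL mapped to its hop count from `seed`.
    """
    degrees = {seed: 0}      # doubles as the "already processed" set
    queue = deque([seed])    # discovered but not yet expanded
    while queue:             # stop when we run out of new data
        url = queue.popleft()
        try:
            neighbours = list(fetch_knows(url))
        except OSError:
            # broken link (someone who moved on): keep it, but it cannot be expanded
            continue
        for nxt in neighbours:
            if nxt not in degrees:
                degrees[nxt] = degrees[url] + 1
                queue.append(nxt)
    return degrees


if __name__ == "__main__":
    # hypothetical in-memory "web" of FOAF files
    web = {
        "me.rdf": ["a.rdf", "b.rdf"],
        "a.rdf": ["b.rdf", "c.rdf"],
        "b.rdf": ["me.rdf"],
        "c.rdf": [],
    }
    print(crawl_foaf("me.rdf", lambda u: web[u]))
    # prints {'me.rdf': 0, 'a.rdf': 1, 'b.rdf': 1, 'c.rdf': 2}
```

Because every URL goes into the visited map the moment it is discovered, cycles like the one back to `me.rdf` cannot cause the crawl to loop forever.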

Now, admittedly, I don't have very many people in my FOAF file at the moment: the management team at Zepheira (and señor Uche, who knows everyone, hasn't even updated his file yet!), some mavens/connectors like Danny, some folks I've worked with in the past, a few NFJS speakers and some Mulgara collaborators. Also, my approach didn't use threads and was pretty simple but stupid, so there is plenty of room for performance improvements.
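One obvious improvement over a single-threaded loop is to fetch each level of the crawl (the current frontier) concurrently. Below is a minimal sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`. It again assumes the hypothetical `fetch_knows` helper, and it assumes that helper is safe to call from several threads at once. This is an illustration of the idea, not the code I actually ran.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def crawl_foaf_threaded(
    seed: str,
    fetch_knows: Callable[[str], Iterable[str]],
    max_workers: int = 8,
) -> Dict[str, int]:
    """Level-by-level BFS that fetches each frontier in parallel.

    `fetch_knows` is a hypothetical load-and-query helper; it must be
    thread-safe because several calls may run at the same time.
    """

    def safe_fetch(url: str) -> List[str]:
        try:
            return list(fetch_knows(url))
        except OSError:
            return []  # broken link: nothing to expand

    degrees = {seed: 0}
    frontier = [seed]
    depth = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            depth += 1
            next_frontier: List[str] = []
            # pool.map runs the fetches concurrently but yields results in
            # input order, so `degrees` is only ever updated from this thread
            for neighbours in pool.map(safe_fetch, frontier):
                for nxt in neighbours:
                    if nxt not in degrees:
                        degrees[nxt] = depth
                        next_frontier.append(nxt)
            frontier = next_frontier
    return degrees
```

Network latency dominates a crawl like this, so threads help even though only the fetching, not the bookkeeping, runs in parallel.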

Despite these caveats, I am astonished at how many results I got back. The same spidering session is still going. Interestingly, I've hit a LiveJournal trough that I don't expect to break out of to non-LiveJournal sites. I expect it to stop as soon as I reach the transitive friendship closure of the LiveJournal folks who got me in there.
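A trough like that is easy to spot by tallying the crawled URLs per site. Here is a small sketch; `site_breakdown` is a hypothetical helper, and collapsing hostnames to their last two DNS labels is a deliberately naive heuristic. It groups `alice.livejournal.com` with `bob.livejournal.com`, but it would get suffixes like `.co.uk` wrong.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlparse


def site_breakdown(urls: Iterable[str]) -> Counter:
    """Count crawled FOAF URLs per site (naive last-two-labels grouping)."""
    counts: Counter = Counter()
    for url in urls:
        host = urlparse(url).hostname or ""
        # "alice.livejournal.com" -> "livejournal.com"; naive, see lead-in
        site = ".".join(host.split(".")[-2:])
        counts[site] += 1
    return counts
```

Feeding it the keys of the crawl result would show at a glance how lopsided the result set has become.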

It's pretty cool, though, that there is that much FOAF out there and that sites like LiveJournal have committed to folding it into their system so that every user has a FOAF file. As the explorer prints out the FOAF files it is spidering, I've enjoyed copying the links and seeing the random people I am connected to.

I was going to wait until it finished to report how many people I was ultimately connected to, but I'm not sure where this is going to finish. (Update: Oooh! Just got some non-LiveJournal files!)

This exercise has also inspired me to make progress on my goal of creating good tools that lower the bar to FOAF usage. I am going to leverage the PURL work that we are doing for OCLC, which will allow us to create permanent, resolvable names for ourselves that transcend where we currently hang our hats*. That, in turn, will make the networks more resilient: as many links as I am finding, there have been a ton of broken links (presumably people who have moved on) that would have enriched the result set even further!
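The resilience comes from one level of indirection. The published name stays fixed, and only its target changes when you move. A PURL server answers a request for the name with an HTTP redirect (typically a 302) to the current location. The toy in-memory model below illustrates the idea only. `PurlRegistry` and the example URLs are hypothetical, and this is not the API of the PURL server itself.

```python
from typing import Dict


class PurlRegistry:
    """Toy model of a PURL service: a stable name maps to a current location.

    A real PURL server answers with an HTTP redirect instead of a dict lookup.
    """

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def register(self, purl: str, target: str) -> None:
        self._targets[purl] = target

    def move(self, purl: str, new_target: str) -> None:
        # only the target changes; every link that uses `purl` keeps working
        if purl not in self._targets:
            raise KeyError(purl)
        self._targets[purl] = new_target

    def resolve(self, url: str, max_hops: int = 5) -> str:
        # follow name -> target hops; a URL that is not a registered PURL
        # resolves to itself
        for _ in range(max_hops):
            if url not in self._targets:
                return url
            url = self._targets[url]
        raise RuntimeError("too many redirects")


if __name__ == "__main__":
    purls = PurlRegistry()
    name = "http://purl.example.org/people/someone/foaf"
    purls.register(name, "http://old-host.example.com/foaf.rdf")
    purls.move(name, "http://new-host.example.com/foaf.rdf")
    print(purls.resolve(name))
    # prints http://new-host.example.com/foaf.rdf
```

If people referenced each other by names like this in their FOAF files, a crawler would follow the redirect to the new home instead of hitting a dead link.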

This is going to be fun!

*This work is being built on NetKernel too!