Tag Central
I’ve already run this idea by a couple of people, but I thought I’d write about it so there’s some record that it was my damn idea. If you’ve seen someone write about this, or actually do it already, let me know in the comments.
Tag, You’re It
Many of you are already familiar with tagging, or folksonomies. Popular services use this categorization/organization paradigm to help users manage their content. For example, there’s del.icio.us for bookmarks, Flickr for pictures, 43 Things for goals, and even Technorati for blog posts such as this one. Some of these services are even beginning to list content from the others when you browse by tag.
Reselling
My bright idea is that instead of relying on any of these services to show the relevant data from other sites, you use an independent tag aggregator. That’s where I step in. Exact implementation details aside, I want to provide an interface for browsing tagged data from every site possible. I don’t want to restrict the data to the handful of already popular sites; I also want to include results from lesser-known services, such as Tagsurf, the tagged message board.
It could provide a common interface, allow for different stylesheets, maybe even have user profiles to retain certain browsing/searching preferences. All these ideas are very free form right now, but I’d like some feedback on what you’d like to see.
What’s the Problem, Bub?
I have several hangups about this idea. They all basically relate to popularity, if this does catch on like I expect it would.
First, bandwidth could be a real concern. My site barely uses any now, but then I don’t have the largest readership (I still haven’t found a good way to measure it, so maybe I’m wrong). Many sites that serve even small content (like plain text) run into bandwidth problems once they get popular. Take a look at DrunkenBlog.
The second issue is that of implementation. Should I use PHP? Maybe Python? What about caching? I don’t know. It also depends on how hard it is to interact with the various APIs that these services provide. There is no universal API for these types of services (which is in itself a shame). XML-RPC, REST, SOAP… I’ll be knee deep in alphabet soup. I could (and will) take on help should anyone offer. The problem then becomes how to manage the software. I’ll be honest: I’ve never set up a version control repository. Do I want to make it open source, or do I want to be greedy and keep the code to myself?
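One way to keep the alphabet soup contained is to hide each service behind the same small interface. What follows is only a rough sketch in Python under my own assumptions: the `TaggedItem` fields, the `Fetcher` type and the `aggregate` function are all names made up for illustration, and none of them come from any of these services’ real APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TaggedItem:
    """One tagged object, normalized across services (hypothetical shape)."""
    source: str       # which service it came from, e.g. "del.icio.us"
    title: str
    url: str
    author: str = ""  # not every service will expose an author


# A fetcher takes a tag and returns normalized items. Each service would get
# its own fetcher wrapping whatever protocol it speaks (XML-RPC, REST, SOAP).
Fetcher = Callable[[str], List[TaggedItem]]


def aggregate(tag: str, fetchers: Dict[str, Fetcher]) -> List[TaggedItem]:
    """Ask every registered service for a tag and merge the results."""
    results: List[TaggedItem] = []
    for name, fetch in fetchers.items():
        try:
            results.extend(fetch(tag))
        except Exception as exc:
            # One broken service should not take down the whole page.
            print(f"skipping {name}: {exc}")
    return results
```

The point of the design is that the page-rendering code only ever sees `TaggedItem` objects, so adding a new service means writing one more fetcher and nothing else.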
The third issue really troubling me is that of caching. Most of these services are concerned about bandwidth (and processor time) as well, and ask that anyone using their services not abuse them. They kindly, and rightly, request that some kind of caching be used. The question becomes how this should be done. I could store a cache file for each tag (up to a certain predetermined number of items), but will that horribly slow the site? Could a MySQL database help with that? I don’t know. Another issue is how much space it will take up. Could I run out of space at my host?
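To make the cache-file-per-tag option concrete, here is a minimal sketch in Python. Everything specific in it is an assumption on my part: the `tag_cache` directory, the cap of 50 items and the 15 minute lifetime are placeholder values, not anything the services ask for.

```python
import json
import time
from pathlib import Path
from urllib.parse import quote

CACHE_DIR = Path("tag_cache")  # hypothetical location for the cache files
MAX_ITEMS = 50                 # assumed cap on items stored per tag
MAX_AGE = 15 * 60              # assumed lifetime of a cache file, in seconds


def cache_path(tag, cache_dir=CACHE_DIR):
    # Percent-encode the tag so slashes and odd characters stay in one filename.
    return Path(cache_dir) / (quote(tag, safe="") + ".json")


def read_cache(tag, max_age=MAX_AGE, cache_dir=CACHE_DIR):
    """Return the cached items for a tag, or None if missing or too old."""
    path = cache_path(tag, cache_dir)
    if not path.exists():
        return None
    if time.time() - path.stat().st_mtime > max_age:
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def write_cache(tag, items, max_items=MAX_ITEMS, cache_dir=CACHE_DIR):
    """Store at most max_items items for a tag, replacing any older file."""
    path = cache_path(tag, cache_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(items[:max_items]), encoding="utf-8")
```

Capping the items per tag also puts a rough ceiling on disk use: the total is about the number of tags seen so far times the size of one capped file.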
A fourth issue is that of funding. Ideally, I’d add something like Google’s AdSense in a sidebar. However, will all the services I’ll be tapping be cool with that? Will they want some kind of service charge or kickback?
I have lots of questions and no real answers, so please leave any thoughts you have.
Tangentially Related
A recent Robot Co-Op post broached the issue of API access to their service. Should they provide facilities for members to manipulate their data externally, I’d like to make a Mac OS X client for it, a la Cocoalicious or Flickr Export for iPhoto. It’ll be good practice.

Technorati already is a taggregator. They are showing data from Flickr, del.icio.us and blogs. Oddiophile also has a taggregator, and numerous people around the web have mentioned the idea.
http://oddiophile.com/taggregator/index.php?tag=Hawaii
The technical issues behind building something like this are fairly mundane. You’d definitely want to use a DB. You’d have to write some code which would suck in the RSS feed data for each tag. One issue which would arise is how to get a list of all available tags. I don’t think any of the tag-based services will provide a full list through their API. What you could do, though, is this: when the first request comes in for a particular tag, you keep a record of that tag’s existence and suck in the data right then and there. The next time someone asks for the tag, you will already have the cached data. You could have a process running which would check for updates on each tag periodically and update your local copy. This is how Bloglines seems to work for RSS aggregation.
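The fetch-on-first-request approach described above could look roughly like this. It is a sketch under stated assumptions: it uses SQLite from the Python standard library as a stand-in for whatever DB you pick, the `tags` table layout and the 30 minute refresh interval are invented for illustration, and `fetch` is a placeholder for your own function that downloads and parses the feed for one tag.

```python
import json
import sqlite3
import time

REFRESH_AFTER = 30 * 60  # assumed: re-fetch tags older than 30 minutes


def open_cache(path=":memory:"):
    """Open (or create) the cache database with one row per known tag."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tags ("
        "tag TEXT PRIMARY KEY, items TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    return db


def store(db, tag, items, now=None):
    """Insert or overwrite the cached items for a tag."""
    now = time.time() if now is None else now
    db.execute(
        "INSERT OR REPLACE INTO tags (tag, items, fetched_at) VALUES (?, ?, ?)",
        (tag, json.dumps(items), now),
    )
    db.commit()


def get_tag(db, tag, fetch):
    """Serve a tag from the cache; on first sight, fetch and record it."""
    row = db.execute("SELECT items FROM tags WHERE tag = ?", (tag,)).fetchone()
    if row is not None:
        return json.loads(row[0])
    items = fetch(tag)  # first request for this tag: go get it right now
    store(db, tag, items)
    return items


def refresh_stale(db, fetch, max_age=REFRESH_AFTER, now=None):
    """Re-fetch every known tag older than max_age; return the tags refreshed."""
    now = time.time() if now is None else now
    stale = db.execute(
        "SELECT tag FROM tags WHERE fetched_at < ?", (now - max_age,)
    ).fetchall()
    for (tag,) in stale:
        store(db, tag, fetch(tag), now)
    return [tag for (tag,) in stale]
```

`refresh_stale` is meant to be called from a periodic job such as cron. The table doubles as the list of known tags, which sidesteps the problem of no service publishing a full tag list.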
Anyhow, good luck.
Sincerely, Anthony Eden
Thanks for the info, Anthony. Oddiophile is along the lines of what I was thinking. It would probably be simple enough to use RSS, but I was worried about the granularity of control and the data presented. Using other methods might make it easier to utilize other data associated with the objects (like author, etc.). That might also make it possible to pull less data in (leave the default number of pulled-in objects at 5, and only get more when necessary). That might be trying too hard to optimize for a problem not yet encountered, though.
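For what it’s worth, here is a rough Python sketch of how far plain RSS gets you on both counts. It assumes an RSS 2.0 feed with un-namespaced `item` elements (an RSS 1.0/RDF feed would need namespace handling), and the `parse_feed` name and the dictionary keys are my own invention. It looks for an author in either the `author` element or the Dublin Core `dc:creator` element, and keeps 5 items by default.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core namespace, for dc:creator


def parse_feed(xml_text, limit=5):
    """Return up to `limit` items from an RSS 2.0 feed as plain dictionaries."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        if len(items) >= limit:
            break
        # Feeds disagree on where the author lives, so try both common spots.
        author = item.findtext("author") or item.findtext(DC + "creator") or ""
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "author": author,
        })
    return items
```

Note that the limit here only controls how much gets kept and displayed. The whole feed still has to be downloaded first, so pulling less data over the wire would depend on the service offering some way to ask for fewer items.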
At any rate, thanks for the feedback, it’s extremely helpful.
The taggregator thing I did was laughably easy… Anthony has it basically right. When someone requests a tag that I haven’t seen before, I grab the RSS feed right then and cache it. I’d be happy to share the code with you if you want. It’s in PHP.