Monday, June 03, 2002
RSS auto-discovery and meta-linking
I've implemented Mark Pilgrim's final version of the RSS auto-discovery technique, inspired by Matt Griffith's proposal. Mark's Radio bookmarklet seemed to need one small fix before I could use it to subscribe, in Radio, to sites using this technique: indexOf(application/rss+xml') had to become indexOf('application/rss+xml'), which restores the missing opening quote.
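Here is how the mechanics work. A page advertises its feed with a `<link rel="alternate" type="application/rss+xml" href="...">` element in its head, and a client scans the page for that element. Below is a minimal Python sketch of the client side, using only the standard library's `html.parser`. The names `FeedLinkFinder` and `find_rss_feeds` are hypothetical. The comparison against the MIME type string is the same test that the corrected `indexOf('application/rss+xml')` performs in the bookmarklet.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collects hrefs of <link rel="alternate" type="application/rss+xml"> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values may be None.
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        is_rss = a.get("type", "").strip().lower() == "application/rss+xml"
        if "alternate" in rels and is_rss and a.get("href"):
            self.feeds.append(a["href"])


def find_rss_feeds(html_text):
    """Return every advertised RSS feed URL, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.feeds
```

For example, `find_rss_feeds('<link rel="alternate" type="application/rss+xml" title="RSS" href="http://example.com/rss.xml">')` returns `['http://example.com/rss.xml']`.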
Mark writes:
It has been surprisingly painless and friction-free. Together, we have come up with a new standard that is useful, elegant, forward-thinking, and widely implemented. In 4 days.
Amen!
DJ's meta-link is a great idea too. An excellent way to facilitate the kinds of social network analysis that we're all buzzing about lately.
1:55:03 AM
Thursday, May 23, 2002
Postel's dictum applied to HTML in RSS
In the commentary attached to Ben Hammersley's RSS discussion, someone (whose name got truncated) cites Jon Postel's famous dictum regarding the use of HTML in RSS:
Escaped html in the tag is very common and isn't going to disappear any time soon. So let's make it a convention as RSS creators to be sparing and deliberately strip the more aggressive tags when we create it. As consumers, we should be writing code that strips the tags we don't like and not complain too much when people throw incomplete tables at us. Be lenient with what you consume, be pedantic and accurate with what you create.
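The consumer half of that advice ("strip the tags we don't like") can be sketched with Python's standard `html.parser`. This is a minimal illustration, not any aggregator's actual policy, and the allowlist below is an assumption. Unknown tags are dropped while their text is kept, and script and style contents are discarded. As a result, a stray FONT tag or an incomplete table fragment never reaches the aggregating page.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist; a real aggregator would choose its own.
ALLOWED = {"p", "a", "b", "i", "em", "strong", "br", "blockquote", "ul", "ol", "li"}
SKIP_CONTENT = {"script", "style"}  # drop these tags *and* their contents


class LenientSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED:
            # Keep only href on links; drop style, size and every other attribute.
            href = dict(attrs).get("href") if tag == "a" else None
            extra = f' href="{escape(href)}"' if href else ""
            self.out.append(f"<{tag}{extra}>")
        # Any other tag (font, table, tr, td, ...) is silently dropped.

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(fragment):
    """Lenient in what it accepts: unbalanced markup is tolerated, not rejected."""
    parser = LenientSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```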
I took a run at this problem last week on a plane trip. For a long time now, I've intended to rip out the MS DHTML edit control in Radio and pop in Ektron's slick eWebEditPro, which I wrote a column about last year. It should be a straightforward swap, and it would make Radio's WYSIWYG editor able to emit either clean HTML or XHTML, rather than the hideous markup the MS control spews out now.
Given the amount of writing I do here, I'd be quite willing to pay something for an add-in that would enable me to honor Postel's dictum -- that is, "to be conservative in what you send."
For some reason, though, I went round in circles. Couldn't seem to find the right combination of UserTalk, HTML, and JavaScript to get that Ektron control working in place of the default WYSIWYG control. I'll take another run at it one of these days, but I just thought I'd report the idea in case somebody else wants to go there first.
9:08:52 PM
What is an RSS description?
Ben Hammersley has taken note of a flock of RSS-related rumblings. Here's some commentary on his commentary on my rumblings.
<fullitem> a sub-element of <item>
<fullitem> does for RSS0.9x what mod_content does for RSS1.0. It allows the entire item text to be included in the feed, including entity-encoded HTML markup.
This, to my mind, is where one possible weakness comes in: allowing formatting HTML markup, such as FONT tags, within an RSS feed does allow feed providers to royally mess up aggregating sites. A misplaced <tr> might break much layout, as would a missized font, and so on. My suggestion is to include an attribute that points to a suggested stylesheet:
This fits nicely with the push toward XHTML in ordinary webpages, and seems more elegant. To me at least.
I guess I'd say that <fullitem> does for RSS 0.9x what <description> would otherwise do, in situations where (as is now true for my primary feed) <description> is truncated to less than the entire item.
Sending escaped HTML markup can cause all sorts of trouble, for sure. It's really embarrassing to break other people's aggregators with a bum feed, as I've discovered for myself. It'd be wonderful if XHTML writing tools were common. But they're not, and until or unless they become so, I guess the safeguard is the immune system of blogspace, which quickly punishes offenders.
<blurb> a sub-element of <item>
The blurb element contains a precis of the item, halfway in size between a description and a full text. I would suspect that this might only be used for very long pieces, where the full text is much too full, and the description too high-level.
As I'm using it (in my secondary feed), <blurb> is the truncated item which, in my primary feed, is called <description>.
This is goofy, of course. I did it in part to see if the sky would fall if I added extra tags into my non-modularized feed. (It didn't.)
I'm aiming to offer choice. Rather than always truncating (which disappoints people who like to read whole items in aggregators), or always sending whole items (which disappoints people who like to scan and decide whether to click through), I offer both styles. At the moment, I do this in parallel feeds:
primary feed: <description> = truncated, <fullitem> = nontruncated
secondary feed: <blurb> = truncated, <description> = nontruncated
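Here is a minimal sketch of how a primary-feed item might be assembled with Python's standard `xml.etree.ElementTree`. The function name and arguments are hypothetical. `<fullitem>` is the experimental element discussed above, not part of RSS 0.9x.

```python
import xml.etree.ElementTree as ET


def build_primary_item(title, link, short_html, full_html):
    """Primary-feed layout: truncated <description>, complete <fullitem>."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # Assigning HTML to .text makes ElementTree entity-encode the markup
    # on output (< becomes &lt;, and so on), i.e. escaped HTML.
    ET.SubElement(item, "description").text = short_html
    ET.SubElement(item, "fullitem").text = full_html
    return item
```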
I'm sure nobody is using either <blurb> or <fullitem>. Personally, I'd rather combine these functions like so:
<p class="lead">The lead...</p> <p>The rest...</p>
In principle, the algorithm used to truncate (for me: first paragraph; for Eric Snowdeal, the first 500 characters) could be applied within a single instance of the item, without the duplication I've introduced. In practice, I doubt such intra-item coding would work reliably.
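The two truncation rules and the lead-paragraph markup can each be sketched in a few lines of Python. This illustrates the idea under stated assumptions and is not the code behind either feed:

- "first paragraph" is taken to mean the first `<p>...</p>` element;
- the 500-character rule is applied to plain text;
- the matching uses regular expressions, which are only safe on markup you generated yourself.

All function names are hypothetical.

```python
import re

_P = re.compile(r"<p\b[^>]*>.*?</p>", re.IGNORECASE | re.DOTALL)
_LEAD = re.compile(r"<p\b[^>]*\bclass=\"lead\"[^>]*>.*?</p>", re.IGNORECASE | re.DOTALL)


def first_paragraph(html_body):
    """Truncate to the first <p>...</p>; fall back to the whole body."""
    m = _P.search(html_body)
    return m.group(0) if m else html_body


def lead_paragraph(html_body):
    """Prefer an author-marked <p class="lead">, else the first paragraph."""
    m = _LEAD.search(html_body)
    return m.group(0) if m else first_paragraph(html_body)


def first_n_chars(text, n=500):
    """Truncate plain text to at most n characters, backing off to a space."""
    if len(text) <= n:
        return text
    cut = text[:n]
    space = cut.rfind(" ")
    return (cut[:space] if space > 0 else cut) + "..."
```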
I expect that current practice -- either truncating items or not -- will continue. A few people (like me) may bother to offer a choice, in the form of parallel versions. The overhead is no big deal, really; XSLT happily transforms one into the other. While aggregators could offer users the choice, within a single feed, of long or short variants of that overloaded thing we call <description>, I doubt this will matter to enough people to get off the ground.
12:30:20 AM
Monday, May 20, 2002
Basic and advanced RSS
Sam continues to provide an excellent perspective on moving RSS forward:
I really would like to see the day where compatibility to the spec was as important as "works with the current version of the software provided by the spec author"
... as a contributor to a SOAP toolkit, I don't want SOAP to be defined in this way.
Meanwhile, here is the perspective of an author of another aggregator. [Sam Ruby]
Thanks Sam, especially for that pointer to Aggie.
In Radio, the path of least resistance at this moment is to clone the RSS writer and add in experimental tags. The better way (in my view) is to replace the RSS writer with a modularized writer that compartmentalizes experimentation according to the RSS 1.0 spec, which was designed for that purpose. A Radio hacker is free to do either or both of those things. I've done the first, and plan to do the second unless (hopefully) somebody else gets there first.
In either case, it seems to me, the gating factor is whether and how the UI exposes custom tag creation to the user, a la the Categories feature in Radio now, and whether and how the UI enables the user of the aggregator to work with extended metadata.
Coming back to the lawyers-and-asbestos angle, it's worth noting that the ability to solve the problem already exists. The .92 spec supports categories; categories can be created in the UI; categories can be routed as complete XML feeds. Although I completely agree with the points being made about extensibility and namespaces, I don't think these issues are what's preventing lawyers from organizing their communication around topical feeds today. Rather, I think what's stopping them is what Sam observed: a general lack of appreciation for the most basic uses of RSS.
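Routing items into topical feeds needs nothing beyond the `<category>` element in RSS 0.92. The Python sketch below assumes plain-text category names and a simple `<rss><channel>` layout. `split_by_category` is a hypothetical helper, not how Radio builds its category feeds.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_by_category(rss_xml):
    """Return {category name: <rss> Element}, one channel per category."""
    channel = ET.fromstring(rss_xml).find("channel")
    buckets = defaultdict(list)
    for item in channel.findall("item"):
        for cat in item.findall("category"):
            if cat.text and cat.text.strip():
                buckets[cat.text.strip()].append(item)

    feeds = {}
    for name, items in buckets.items():
        rss = ET.Element("rss", version="0.92")
        ch = ET.SubElement(rss, "channel")
        # Derive the per-category channel metadata from the source channel.
        ET.SubElement(ch, "title").text = f"{channel.findtext('title', '')}: {name}"
        ET.SubElement(ch, "link").text = channel.findtext("link", "")
        ET.SubElement(ch, "description").text = f"Items in category {name}"
        ch.extend(items)  # an item with several categories lands in several feeds
        feeds[name] = rss
    return feeds
```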
There's a mom-and-apple-pie aspect to metadata collection. Everybody's for it -- at least insofar as they imagine a neatly-tagged semantic web made available for them to use. When they realize it's their job to do all that neat tagging, though, enthusiasm (rightly) wanes. If we had lots of people using software that made metadata collection seem natural and effortless, I don't think it'd be hard to sort out issues of extensibility and namespace collisions. Those would be good problems to have, in other words.
11:11:12 AM
Sunday, May 19, 2002
Extending RSS: first things first
On the always-thorny question of how to extend RSS, I guess I'm for a first-things-first approach.
Rory Perry:
If the WV court has an XML feed for recent opinions (which we do), the lawyer in New Orleans could subscribe to that feed and watch for orders and opinions regarding asbestos mass litigation.
Sam Ruby:
If we want Internet-scale standards (whereby the likes of Rory Perry can create discipline-specific extensions), we need to get to the point where everybody has equal opportunity to create modules.
I am not religious about this stuff. I see no problem with Rory and his legal pals agreeing on some tags (like <asbestos>) which they'll use by mutual consent. The immediate bottleneck is getting software into their hands that enables one user to pop such a tag into a feed, and another user to discriminate based on the tag. And then getting them to the point where they can actually experience that.
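On the consuming side, filtering on such a tag takes very little code. The Python sketch below assumes that the agreed-upon element is a bare `<asbestos>` child of `<item>` in an un-namespaced RSS 0.9x feed. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def items_tagged(rss_xml, tag):
    """Titles of items carrying an agreed-upon extra element, e.g. <asbestos/>."""
    channel = ET.fromstring(rss_xml).find("channel")
    return [
        item.findtext("title", "")
        for item in channel.findall("item")
        if item.find(tag) is not None
    ]
```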
For a while now I've been sending out RSS channels with things like <pubDate>, <blurb>, and <fullitem>. Nobody's complained, so apparently it's not breaking any existing aggregator. Thanks to the new ability, in Radio, to replace the RSS writer, I can -- and indeed will -- replace my feed with RSS 1.0, using Dublin Core metadata and possibly defining a module to account for the variant elements (long vs short description) in my feed.
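In RSS 1.0, added metadata such as Dublin Core lives in its own XML namespace. A reader therefore asks for `dc:date` by namespace URI rather than by bare tag name, which keeps elements from different vocabularies apart. Here is a minimal Python sketch. The namespace URIs are the standard RSS 1.0, RDF and Dublin Core ones. `read_rss10` is a hypothetical name, and no module for the long and short description variants is shown.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_rss10(xml_text):
    """Return (title, dc:date) pairs; in RSS 1.0, items are children of rdf:RDF."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("rss:title", "", NS), item.findtext("dc:date", None, NS))
        for item in root.findall("rss:item", NS)
    ]
```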
The phrase "Internet-scale" always worries me a bit, though. Until and unless the likes of Rory and friends can start bootstrapping the process of creating and consuming customized feeds, there's no scaling issue to worry about. If and when namespace collisions start to become a problem, then people will be in a position to see the value in modularized readers and writers. At which point, it shouldn't be hard to transition to them.
But do people need modularized readers and writers to even get to first base? Or do they just need to find out what it feels like to hit a few singles, using the simplest tools? Given Sam's (and my) surprise that even the most basic use of RSS is still relatively new to many bloggers, I'm inclined to take things one step at a time.
11:41:45 PM
Friday, May 10, 2002
Daylight ahead for RSS writers and readers
Jenny writes:
What I do see as the bigger issue is that this provides a path for further news aggregator development in Radio. I'm not knowledgeable enough to know what it means for RSS news aggregation in general, but I believe quite strongly that some form of aggregation will become part of our everyday information lives in the future, so I welcome any and all roads that lead to that day. [The Shifted Librarian]
I believe the same. It's important to note that Radio's aggregator is one way forward, but not the only way. I just grabbed a copy of AmphetaDesk and -- omigosh -- it's written in Perl! I had no idea! It ships as a compiled executable for Windows, Mac, and Linux, but the source is all Perl and is available. The app runs Radio-style, shoveling script-written pages into a local webserver for browser consumption.
This presents a bit of a dilemma for me. I've made my peace with UserTalk, but I'm much faster and more competent in Perl. So it's tempting to do aggregator experimentation in AmphetaDesk. But then, it can't be so easily shared with the Radio community, many of whom will (rightly) prefer not to download and install an extra kit. But this is a good problem to have: a choice between two viable options.
In any case, the point is that you're right, Jenny. All sorts of different groups will have reasons to tweak both the production and the consumption of RSS. For both writers and readers of RSS, there's daylight ahead.
9:46:51 AM
Thursday, May 09, 2002
Radio's RSS writer is now user-extensible
The RSS writer in Radio is now officially user-extensible. "Before generating the RSS, we check user.radio.callbacks.writeRssFile," Dave writes today. Excellent. This will open the floodgates for all sorts of useful metadata experimentation. We'll see Radio UserLand sites emitting RSS 1.0, and others extending RSS .9x. It's not the format that matters to me, it's the experimentation.
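Leaving UserTalk aside, the underlying pattern is a simple hook. The writer looks up an optional user callback and defers to it, falling back to its built-in output otherwise. Here is a rough Python analogue. The `callbacks` table and the function names are assumptions for illustration. So is the rule that a callback returning `None` falls through to the default; this is not Radio's actual semantics.

```python
from html import escape

# Hypothetical stand-in for a user callback table like user.radio.callbacks.
callbacks = {}


def default_rss_writer(channel):
    """Built-in writer: a bare-bones RSS 0.92 document."""
    return (
        '<rss version="0.92"><channel>'
        f"<title>{escape(channel['title'])}</title>"
        "</channel></rss>"
    )


def write_rss_file(channel):
    """Consult the user's hook first; use the default writer if it declines."""
    hook = callbacks.get("writeRssFile")
    if hook is not None:
        result = hook(channel)
        if result is not None:
            return result
    return default_rss_writer(channel)
```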
In that vein, I've heard from a few folks who are working the other side of the street, looking for ways to enhance the aggregators that read RSS channels. I think this is fertile ground for innovation. Personal aggregators are still quite new, and we have a lot to learn about how we want to use them.
12:06:33 PM
Wednesday, May 08, 2002
Two flavors of RSS channel
Jenny's poll reminded me that RSS truncation shouldn't be an either/or choice. So I'm experimenting with some extra tags in my RSS feed.
The basic feed continues to send truncated descriptions. It adds a [fullitem] tag that has the complete text of the item. Of course no newsreaders use this yet. But I want to make sure that sending this extra tag won't cause problems. In Radio it seems not to, but I want to see if AmphetaDesk, NewsIsFree, Meerkat, and others are OK with it. If not, please let me know!
I've also added a [pubtime] tag, just to see if I can. As Sam Ruby keeps pointing out in another context, extra information shouldn't be a problem so long as the required core elements are present.
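Sam's point translates directly into reader code: pull out the core elements you understand and ignore everything else. Here is a minimal Python sketch, assuming an un-namespaced RSS 0.9x feed. The function name is hypothetical, and no particular aggregator is claimed to work this way.

```python
import xml.etree.ElementTree as ET

CORE = ("title", "link", "description")


def read_core_items(rss_xml):
    """Keep only core item elements; extras like fullitem or pubtime are ignored."""
    channel = ET.fromstring(rss_xml).find("channel")
    return [
        {name: item.findtext(name, "") for name in CORE}
        for item in channel.findall("item")
    ]
```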
There's also a long-format feed which is just an XSLT transform of the basic feed. In that version, [description] is the complete item, and [blurb] is the truncated description. So now there's choice. You can subscribe to either the basic feed with short descriptions, or the long-format feed with long ones.
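The long-format feed is produced by an XSLT transform. For anyone without an XSLT processor at hand, the same renaming can be sketched with Python's standard `xml.etree.ElementTree`. This swaps in a different technique and is not the stylesheet actually used. `to_long_format` is a hypothetical name, and items lacking a [fullitem] are assumed to pass through unchanged.

```python
import xml.etree.ElementTree as ET


def to_long_format(rss_xml):
    """Basic feed -> long-format feed: description -> blurb, fullitem -> description."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        desc = item.find("description")
        full = item.find("fullitem")
        if desc is None or full is None:
            continue  # no full text to promote; leave the item as-is
        desc.tag = "blurb"
        full.tag = "description"
    return ET.tostring(root, encoding="unicode")
```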
The next step would be to experiment with the news aggregator. Given a feed containing both short and long items, it ought to be possible to let the user toggle between them, perhaps even on a per-channel basis.
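Here is one way such a per-channel toggle could work, sketched in Python. It assumes the aggregator has already parsed each item into a dict holding `description` and, when the feed provides it, `fullitem`. The preference table and the function name are hypothetical.

```python
def choose_text(item, channel_url, prefer_long):
    """Return the long or short text for an item, per channel preference.

    prefer_long maps a channel URL to True (show full items) or False;
    channels not listed default to the short description.
    """
    if prefer_long.get(channel_url, False) and item.get("fullitem"):
        return item["fullitem"]
    return item.get("description", "")
```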
8:46:07 AM
© Copyright 2002 Jon Udell.