Saturday, January 08, 2005
Predicting Tsunamis
Prediction: No word will gain popularity in 2005 as much as the word "tsunami".
We're headed towards a tsunami of tsunamis, so to speak.
Saturday, March 06, 2004
Social Software Lock-in
Dare Obasanjo writes a follow-up to Adam Bosworth's "What is the platform?" post. Adam's point (and Dare agrees) is that we're seeing a shift from software platforms to platforms that are about community access, collaboration, and content. The implication appears to be that web-based service providers such as Amazon, Google, and Yahoo will rise in importance, as the actual software platforms of old (Linux, Mac and Windows) lose their relevance and become mere access points to the web.
Dare makes the following interesting point:
The interesting thing about the rise of social software is that this data lock-in is migrating from local machines to various servers on the World Wide Web.
One should not underestimate the consequences of this. Over the last few years, people have complained about application data lock-in: they're locked into an application and can't migrate to another because they can't port their data to the other application's format. In the next few years, people will realize that the web has the potential for much greater lock-in.
With software lock-in, you own the data bits. You can do whatever you want with them, including porting them to another format. If you can't do it yourself, you can probably find someone who can. In fact, this has already happened for the most successful applications: all word processors, for example, come equipped with an import facility that lets them read and use the competition's data formats.
On the other hand, if all your data lives on the web, you're at the mercy of your service providers. If Yahoo goes down tomorrow, all your mail messages that they keep for you on their servers will be lost. Same for your IM contact list, the stock portfolio you track, etc.
Consider this: many people use web services because they're free. It's nice that you can get a 1GB mailbox from Google without paying anything. Some providers (such as MSN) will let you back up some of this data locally (for example, by using IMAP4), but only if you pay them a fee. What happens if providers decide that advertising doesn't make enough money for them, and start charging twice as much for letting you take your data elsewhere? Will you still be glad that you made a "deal" with the provider in which you paid it nothing, and so it is under no obligation to give you back your data?
Innovation: less important than it's made out to be
If you haven't already, go read Clemens Vasters' excellent article "Free as in Freedom". He makes lots of good points that I won't repeat here.
One point he makes, however, is something I disagree with:
If someone is really interested in stopping them [Microsoft] from legitimately dominating every aspect of the software market (market as in money) in the long run, they need to compete with them on the innovation front.
Innovation means squat. It's the execution that counts. This is why, at Microsoft, people are often judged by the number of times they've shipped a product. Whether the product actually proved successful matters less (though it still counts, of course). What's important is that you've made it to the finish line. Shipping a product is what separates the men from the boys.
Monday, March 01, 2004
Saturday, February 28, 2004
Rationalize Away, Brother!
I saw this on Green Hat Journal:
Eric Raymond lambasted open-source hackers for their pathetic user-interfaces: "This kind of fecklessness [in UI design] is endemic in open-source land. And it's what's keeping Microsoft in business — because by Goddess, they may write crappy insecure overpriced shoddy software, but on this one issue their half-assed semi-competent best is an order of magnitude better than we usually manage."
One thing we (Microsoft) have on them (Eric Raymond and friends) is that we write software people actually buy. Hell, in their case they find it hard to give it away.
I wonder how the Mac fits into Eric's rationalization. It can't be the UI...
Tuesday, August 19, 2003
Sleepless in Seattle
For the next two weeks I'll be visiting the mother ship in Redmond. People who want to meet can drop me a line via this weblog, or at my Microsoft alias (ZivC).
UPDATE: (Radio, for some unknown reason, has decided I wrote this in August.) Just to make it clear, this will be the weeks of Christmas and New Year's Eve (Sylvester).
A Comment on XML Namespaces and RDF/XML
In response to my previous post, Danny writes:
Looking at your ugliness criticism, it all seems to be directed at the use of XML namespaces rather than RDF per se. It has pretty well been decided that Atom will support these, so it's hardly RDF's fault. Note too that this is a maximal example - practically all the elements used in practice would be those in the Atom namespace, rather than their counterparts in DC etc. Any equivalence would be stated at a schema level. I'm not entirely sure I understand your X:Y:Z namespacing, but it does sound rather like architectural forms, an alternative to XML namespaces that crops up on xml-dev periodically.
Yes, I think the XML Namespaces spec leaves a lot to be desired. It only goes half-way toward making it easy for authors to manually create XML documents composed of elements from multiple namespaces. If you have a long document to author, and there are some elements from different namespaces that you use constantly, your only reasonable option is to declare all namespaces at the top, invent prefixes for each, and then constantly juggle these in your mind (and in your document) as you write.
As a designer, when I'm faced with the choice between creating my own vocabulary and using a vocabulary made from a mix of several pre-existing ones, Create-My-Own suddenly becomes the simpler option. This is bad. It should have been easier to go with what I already have, but it isn't. Not if the goal is KISS.
This being an XML issue, why am I picking on RDF/XML? Because I can easily create an Atom vocabulary that has everything in its own namespace (nice and simple for users), but I have to abandon that if I'm to go RDF/XML.
Bringing in yet-another-spec (architectural forms) doesn't help -- you see, it's the RDF people who have to tell us if there's a way to keep the syntax simple, and still manage to feed our documents to RDF parsers. I'm just a lowly XML guy.
(To anyone contending that once you have an Atom-RDF/XML template, everything is easy: yes, but Atom should support people whose main business is generating notifications from toasters, and XML is difficult enough for these guys.)
Thursday, August 14, 2003
A Useless Comment on Atom 0.2, RDF Style
Some people have suggested that Atom could be made more RDF-friendly; others object. Most helpfully, we now have a suggested RDF version of the Atom 0.2 example. After looking at it, all I can say is how ugly.
It's not that I don't like RDF. I actually do.
RDF/XML, OTOH, ITD [*].
Two things are apparent:
- Syntax-wise, XML namespaces are seriously flawed. In every reasonable language (programming language, but the principle works the same for languages people speak) one has a way of pulling names from multiple namespaces into another namespace. In C++, I can say "using A::B::C; using D::E::F;" and later refer to A::B::C and D::E::F as C and F, respectively. This simplifies handling such names considerably. XML Namespaces doesn't let you do this -- it insists you write fully qualified names every time (except for one default namespace, which obviously is not enough in our case).
- Although RDF by its very nature is meant to weave together different namespaces into a single model, RDF/XML does nothing to alleviate the problem. It doesn't provide a way to locally create a namespace X and then map X::C to A::B::C and X::F to D::E::F. If you wonder why the Atom <id> element had to be replaced with the rdf:about attribute, this is the reason: while both mean the same thing, RDF demands using its own namespace, for no good reason.
Ugly.
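To make the missing mechanism concrete, here is a minimal Python sketch of a C++-style "using" table for XML element names. The standard `xml.etree.ElementTree` module expands these names into the `{uri}local` form it uses internally. Neither XML Namespaces nor RDF/XML has anything like the `USING` table. It is a hypothetical illustration, and the Atom namespace URI below is a placeholder.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ATOM = "http://example.org/atom#"        # placeholder, not Atom's actual namespace URI

# Hypothetical "using" table, declared once at the top of a document or schema:
# unprefixed name -> (namespace URI, local name in that namespace).
USING = {
    "entry": (ATOM, "entry"),
    "id": (ATOM, "id"),
    "title": (DC, "title"),
    "creator": (DC, "creator"),
}

def expand(root):
    """Rewrite every unprefixed tag into ElementTree's '{uri}local' form.

    A name missing from USING raises KeyError: the author used a name
    that no "using" declaration pulled in.
    """
    for node in root.iter():
        if not node.tag.startswith("{"):
            uri, local = USING[node.tag]
            node.tag = f"{{{uri}}}{local}"
    return root

# The author writes plain names; no prefixes appear in the body.
doc = expand(ET.fromstring(
    "<entry><id>tag:example.org,2003:1</id>"
    "<title>Atom 0.2</title><creator>Mark Pilgrim</creator></entry>"
))
for node in doc.iter():
    print(node.tag)
```

Once expanded, the tree carries the same qualified names that a fully prefixed document would have produced. Only the authoring syntax differs.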
BTW -- Looking at the sample feed, I believe there needs to be an rdf:about attribute on <foaf:Person>. Otherwise, if Mark Pilgrim were to write two entries, an RDF parser would not be able to tell they are the same person.
Update: Morten Frederiksen comments:
Re 1: "Fully qualified" would seem to imply that only complete namespace URIs with local names attached would work. That is of course not true, as qnames are used extensively. Also, you don't have to keep the same default namespace throughout a document.
I wrote "fully qualified" without regard to the meaning it has in XML. Sorry. What I meant was that you have to qualify names with their namespaces, except for names in the default namespace. The fact that the default can change helps a little, but it doesn't solve the problem: it doesn't let you create a document in which namespaces appear only in the "header". I don't want to juggle namespaces constantly in my documents.
Re 2: Why define an atom-id in the first place? The rdf:about is part of the syntax. In other cases subPropertyOf etc. could be used.
Atom-id is defined because (1) when it was defined, nobody considered reusing names from RDF a good thing, and (2) people wanted (and some of us still do) the document structure to be simple. As a result, we have our own "union" namespace with all the interesting stuff we think Atom needs.
Re BTW: You should look at the FoaF spec to see that this is also not true. FoaF makes extensive use of owl:InverseFunctionalProperty to be able to identify persons across mentions.
I must admit OWL is something I have avoided learning for quite some time... Thanks. (BTW -- if it can do that, can it also make an out-of-band association of the element atom:id with the attribute rdf:about?)
Update 2: Samuel writes:
However, in Java (and as I recall, C++, Perl, and Ruby), if you have X::Foo and Y::Foo, and you want to use them both in the Z namespace, you still have to call them X::Foo and Y::Foo regardless of what you use, import, or require.
That's the whole point of namespaces: avoid collisions. With code it's much easier to do because if there's a collision, it breaks and you fix it. But, XML is *not* code. It is data. You have absolutely *no* control over where your data might end up, and so it is imperative that the namespace delimiter stay with it to prevent possible collisions.
I think this is missing the issue:
- Local-name collision happens when the namespaces you want to unify have colliding names. Since what we're trying to create is a new namespace built from elements from known namespaces, we already can tell, at design time, whether their local names will collide or not. In our case, they do not, so this is not an issue.
- Even if it were, it should have been trivial to map names during the unification process. In C++, for example, if you have both A::x and B::x, you can pull both names into a single namespace by typedef-ing one as Ax and the other as Bx.
- Our intention in Atom is to create a core spec that provides all the essentials, as well as some extension mechanism. The core itself is completely "static" in this view -- it identifies the vocabulary we intend to use. Colliding names from other namespaces come by way of extensions, and they can still use XML Namespaces. It's about the core that we're talking about now, and how to make it as simple as we can.
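The first two points can be sketched as a design-time check. The chosen source vocabularies are flattened into one unprefixed vocabulary, and the check refuses to proceed while any local name is claimed twice, unless that name has an explicit alias (the typedef trick). The function, the namespace URIs, and the sample names are hypothetical. This is a minimal Python sketch, not part of any Atom draft.

```python
def build_union(sources, renames=None):
    """Flatten {namespace_uri: [local names]} into one unprefixed vocabulary.

    `renames` maps (namespace_uri, local_name) to an alias, like typedef-ing
    A::x as Ax. Any collision left unresolved is reported at design time.
    """
    renames = renames or {}
    union = {}
    for uri, names in sources.items():
        for name in names:
            alias = renames.get((uri, name), name)
            if alias in union:
                raise ValueError(f"{alias!r} maps to both {union[alias]} and {(uri, name)}")
            union[alias] = (uri, name)
    return union

A = "http://example.org/a"  # placeholder namespace URIs
B = "http://example.org/b"

# Without aliases, the two x's collide, and the problem surfaces before any document exists.
try:
    build_union({A: ["x", "y"], B: ["x"]})
except ValueError as err:
    print("design-time error:", err)

# With explicit aliases, the union is clean.
union = build_union({A: ["x", "y"], B: ["x"]},
                    renames={(A, "x"): "Ax", (B, "x"): "Bx"})
print(union)
```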
[*] I truly dislike.
Saturday, August 09, 2003
The Economics of Application Installation
Sean McGrath writes in ITWorld:
In my mind's eye, I see an installation system based on Unix's chroot concept (for establishing virtual hierarchies for applications) and Unix's symbolic link concept (for managed duplication). I see a world in which every Java application has its own JVM, its own JDK, its own copy of *everything* all in a nice tidy directory - a truly self contained world.
Why not? It would waste a few gigabytes? In the time it has taken you to read this article you have probably been paid the equivalent of many gigabytes of disk space.
The sad reality is that while CPUs are getting faster, main memory and disks lag behind. By a long shot. So if each application you install duplicates all the libraries it depends on, it will take longer to install, longer to load, and (because modern CPUs rely totally on their caches to keep up their maximum pace) longer to execute. The assumption that we should stop optimizing for size, popular as it is among supporters of dynamic languages, is plain wrong. In fact, it gets farther from the truth as CPUs keep getting faster while memories and disks don't.
URI != URL
In a comment to my post on Atom 0.2, Sam notes that the <link> element is a URI, not a URL.
Thanks, Sam, I didn't notice it. Now that I have, it looks wrong to me.
There's a tendency in our industry to treat URIs and URLs as if they're the same thing, or at least very similar. There's also a common thinking that "a URI is everything a URL is, only a bit more general, so let's use that instead". I agree with neither.
On the Difference Between URIs and URLs
To explain why, here are the two important differences between URIs and URLs:
- A URI represents identity. As such, a resource's URI doesn't change. A URL, on the other hand, is a name. Just like people can change their name, the name(s) of a resource may change.
- A resource's URI tells you nothing about the resource. Its URL gives you a closure of actions you can apply to it. (For example, if you have an http: URL, you can do GET, PUT, POST, etc.; if you have a mailto: URL, you can send mail to the resource, etc.)
The point is that while these differences make little difference to us humans, tools that process them (should) behave differently. If U is declared to be a URI, a tool that processes it cannot in general apply any action to U. (There was a long discussion in XML circles about what you would find at the "end" of namespace URIs when they are URLs; as far as I know, the proposal I liked most -- RDDL -- never got anywhere.)
When U is declared to be a URL, the tool can in general rely on its semantics beforehand, for example try to retrieve it for offline reading. (This applies to URLs whose protocol has some "GET" action, like http:; this isn't the case for mailto:, for example.) Even if the tool does not understand the semantics of the URL, it can still offer a hyperlink to it (punting the work to the OS, if you're working in Windows).
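Here is a minimal Python sketch of how a tool might act on the two kinds of declarations, following the distinction drawn above. `urlsplit` is the standard-library parser. The scheme-to-action table and the action names are assumptions made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical per-scheme action table; a real tool would define its own.
SCHEME_ACTIONS = {
    "http": frozenset({"GET", "PUT", "POST"}),
    "https": frozenset({"GET", "PUT", "POST"}),
    "mailto": frozenset({"SEND_MAIL"}),
}

def allowed_actions(value, declared_as):
    """Actions a tool may apply to `value`, given whether the format calls it a URI or a URL."""
    if declared_as == "URI":
        # Identity only: compare it with other identifiers, never dereference it.
        return frozenset()
    scheme = urlsplit(value).scheme.lower()
    # A URL with an unfamiliar scheme can still be offered as a hyperlink
    # and handed off to the operating system.
    return SCHEME_ACTIONS.get(scheme, frozenset({"HAND_OFF_TO_OS"}))

def can_prefetch(value, declared_as):
    """Offline prefetch only makes sense for URLs whose scheme has a GET action."""
    return "GET" in allowed_actions(value, declared_as)

for value, kind in [("http://example.org/feed", "URI"),
                    ("http://example.org/feed", "URL"),
                    ("mailto:someone@example.org", "URL"),
                    ("bla:1234.0987", "URL")]:
    print(kind, value, sorted(allowed_actions(value, kind)))
```

The URI branch ignores the scheme entirely. To this tool, even an http: string declared as a URI is only an identifier.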
<link> Should be a URL
Coming back to the original issue, a <link> element serves human readers, because tools cannot rely on the resource it represents to mean anything. As such, making <link> a URI makes little sense to me: what is the utility of allowing a <link> to be "bla:1234.0987"?
Friday, August 08, 2003
Atom 0.2
Atom 0.2 is out. Although not an official spec, it looks like it's now solid enough to comment on, for those of us who are, shall we say, wiki-shy. So here's my take.
Choice of Top-Level Element
Atom 0.2, unlike RSS 2.0, has <feed> as its top element. While this is fine if all you want to use Atom for is blog notifications, it's too restricting for future growth.
For example, it doesn't handle cases in which several feeds are held in a single container. (For example, one can think of an Atom replacement for the ugly and restrictive OPML format, in which you have a single XML tree that only holds feed elements, with no content.)
I think we need a top-level element that would hold the <feed> element, as well as any @version information we'd care to put.
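Here is a sketch of what such a wrapper might look like, built and parsed with Python's `xml.etree.ElementTree`. The element name `feeds`, its `version` value, and the feed titles are hypothetical. Atom 0.2 defines none of them.

```python
import xml.etree.ElementTree as ET

# "feeds" is a hypothetical container name, and "0.2" is a placeholder version.
container = ET.Element("feeds", version="0.2")
for title in ("Ziv Caspi's weblog", "Sam Ruby's weblog"):
    feed = ET.SubElement(container, "feed")
    # Feeds only, no entries: an OPML-like subscription list.
    ET.SubElement(feed, "title").text = title

xml_text = ET.tostring(container, encoding="unicode")
parsed = ET.fromstring(xml_text)
print(parsed.get("version"), [f.findtext("title") for f in parsed.findall("feed")])
```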
The <link> Element
The spec mandates one <link> element per <feed> and per <entry> element. It says this about the element:
[T]he link to the website described by this feed
and:
[p]ermanent link to a representation of this entry
Here's what it doesn't say, but is implied: the definition of a <link> element in this manner means that it is useful mostly to humans. Of course, one may build tools that would make good use of this element, but in general, such links cannot be relied upon.
Why am I saying that? After all, this is what RSS 2.0 does today, so it is the proven way of doing things, right?
Well, I don't think so. Let's consider the use of the permalink in weblogs. Most weblogs today fall into one of two camps: "Take That" weblogs provide the entire content of each entry in the feed. "E.T. Phone Home" weblogs provide only a teaser, and readers who want to actually read the entry are forced to do so with their browsers.
Now, consumers who prefer the first type of weblog (TTW) rarely click on the <link> element, because they have no reason to. There are some weblogs that I couldn't recognize on sight, simply because I read them entirely using Aggie, without ever navigating to the site itself. For them, the element is mostly useless.
Other consumers rather like that they get only a teaser, and they get to decide if they want to navigate to the site or not. They also like getting to the site, to see everything in original colors, etc. For them, the link is everything.
Note, however, that in both cases the element is not used by the aggregator itself. To the aggregator, there's little difference between the <link> element and the <tagline> element -- they're both for consumption by humans.
So what? I have no problems with having a <link> element. I do have a problem with (1) having this element mandatory, and (2) the apparent thinking this mechanism is enough.
Don't Make <link> Mandatory
<link> should not be mandatory. It assumes a web presence, which is not always there. Suppose your printer delivers "job-done" notifications back to you in an Atom feed. What should it put in the <link> element? The URL of HP?
<link-to-atom>
Why is today's <link> element not enough? Suppose the producer of a feed does not want to provide full content (for example, because it takes too much bandwidth), but has some readers (me!) who would like information to come to them rather than the other way around. With Atom 0.2, one side has to give.
IMHO, this is exactly the type of limitation Atom was born to solve. The solution would be to provide another type of link element, <link-to-atom>, whose content is a URL to a resource in Atom format itself. When attached to a feed, it provides the URL of the feed in Atom format, thus making the feed declare where it is located. (This has the pleasing property of letting aggregators track changes in feed location naturally, and of making producers who can't generate redirect pages happier.) When attached to an entry, it provides the URL of the Atom representation of the entry itself, this time with the whole content.
(Bandwidth-aware producers might also note that such a mechanism reduces their bandwidth consumption even more, because clients don't need to download all the "fluff" that entries usually contain in their web page.)
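Here is a minimal Python sketch of both uses of the proposed element on the consumer side. It follows a feed that declares a new location, and it collects the full-content URLs of individual entries. `<link-to-atom>` is this post's proposal, not part of Atom 0.2. The sample feed (namespaces omitted for brevity) and the `example.org` URLs are placeholders.

```python
import xml.etree.ElementTree as ET

def next_poll_url(polled_url, feed_xml):
    """URL to poll next time: the feed's self-declared location, if it has one."""
    feed = ET.fromstring(feed_xml)
    declared = feed.findtext("link-to-atom")  # direct child of <feed> only
    return declared.strip() if declared else polled_url

def full_entry_urls(feed_xml):
    """URLs of the full-content Atom representations of individual entries."""
    feed = ET.fromstring(feed_xml)
    urls = []
    for entry in feed.findall("entry"):
        url = entry.findtext("link-to-atom")
        if url:  # entries without the element stay teaser-only
            urls.append(url.strip())
    return urls

SAMPLE = """<feed>
  <title>Example weblog</title>
  <link-to-atom>http://example.org/new-home/atom.xml</link-to-atom>
  <entry>
    <title>Atom 0.2</title>
    <link-to-atom>http://example.org/new-home/entries/42.xml</link-to-atom>
  </entry>
  <entry><title>Teaser only</title></entry>
</feed>"""

print(next_poll_url("http://example.org/old-home/atom.xml", SAMPLE))
print(full_entry_urls(SAMPLE))
```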
<generator>
This is a cosmetic remark. If we want the <generator> element to provide both a URL and a display name, then the URL should be an attribute and the element's content the display name, not the other way around. This is how anchor elements work in HTML, and I see no reason not to keep that format.
<author/url>
There must be a good reason why this is called <url> and not <link>, but I just don't see it.
<entry/id>
Yes, YES, YES
I can't stress this enough: a mandatory <id> is the most important feature in Atom, and it alone is sufficient to justify the whole effort.
Here's a simple use model that is not currently supported: Suppose I write a short article about (say) Atom 0.2. I want people to read this article, so I post it on my weblog. I want more than two people to read it, so I also post it to the Atom mailing list, the Wiki, as a remark at Sam Ruby's weblog (I'm sure he won't mind), and any other place I can spam. Now how can someone who reads several of these "water fountain" sources tell that he's already seen the entry? How can he comment and make sure the comment propagates everywhere the original went?
Today, we have poor connections between distribution channels: people who leave comments in other people's weblogs don't post them to their own weblogs. Remarks I make in a mailing list remain confined there unless I repeat them on my weblog, and there's no way a smart universal client could weave them all together.
<id> will allow us to change all that. Technically, it's as old as SMTP and NNTP, which put it to good use. As a concept, it's as old as Adam's naming of all the animals. 3000+ years later, we still use those names [*].
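With a mandatory `<id>`, a client can collapse the copies it sees on different channels and gather replies from all of them. Here is a minimal Python sketch, assuming entries arrive as dictionaries with an `id` key. The channel names, the placeholder ids, and the `in_reply_to` field for comments are hypothetical.

```python
from collections import defaultdict

def merge_by_id(channels):
    """Collapse copies of the same entry seen on several channels.

    `channels` maps a channel name to a list of entries (dicts with an "id").
    Returns (entries keyed by id, list of channels each id was seen on).
    """
    entries, seen_on = {}, defaultdict(list)
    for channel, items in channels.items():
        for entry in items:
            entries.setdefault(entry["id"], entry)  # first copy wins
            seen_on[entry["id"]].append(channel)
    return entries, seen_on

def thread_comments(comments):
    """Group comments from any channel under the id of the entry they reply to."""
    threads = defaultdict(list)
    for comment in comments:
        threads[comment["in_reply_to"]].append(comment["text"])  # hypothetical field
    return threads

ARTICLE = "tag:example.org,2003:atom-0.2-review"  # placeholder id
channels = {
    "weblog": [{"id": ARTICLE, "title": "Atom 0.2"}],
    "mailing-list": [{"id": ARTICLE, "title": "Re: Atom 0.2"}],
    "wiki": [{"id": "tag:example.org,2003:other", "title": "Something else"}],
}
entries, seen_on = merge_by_id(channels)
threads = thread_comments([
    {"in_reply_to": ARTICLE, "text": "Nice summary."},             # left on the weblog
    {"in_reply_to": ARTICLE, "text": "What about content/mode?"},  # sent to the list
])
print(len(entries), seen_on[ARTICLE], threads[ARTICLE])
```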
content/mode
If this attribute is optional, the spec should call out what the default mode is.
[*] If you speak Hebrew, that is.
Monday, July 28, 2003
Ziv
Google is losing the war against weblogs. Consider this:
- On the first results page for Ziv Caspi, all links are about me
- According to Google, I am the most important Caspi on the net
- Not only that, yours truly is the number 2 Ziv out there (a trait I share with Sam)
This is just insane.
(Yes, the title of this post is designed to improve my position on that last point, because titles matter...)
Friday, July 11, 2003
Quick 'n Dirty
A) Time to read Simon Fell's announcement: <1min.
B) Time to download resource to cache for testing: <1min.
C) Time to get Aggie to read his new Necho feed: <10min.
D) Time to write this in Radio (twice, because things didn't work on the first try): >(A+B+C)
Saturday, June 14, 2003
Antibiotic Days
Tim Bray:
[...] there was a time when being a Web Guy was like being Gandalf the wizard and James Herriot the country vet all rolled into one.
Full Content RSS Fragments
In the RSS world, opinions differ on the important issue of whether or not to provide full content RSS feeds (that is, whether at least one of item/description, item/dc:content, or item/xhtml:body has the full "information" content of the item).
Ignoring for the moment conceptual aspects (for example, is the RSS feed a notification channel to draw people to the web site or a "first-class" content distribution means), there are significant "down-to-earth" aspects to this issue: both producers and consumers would like to cut down their bandwidth costs.
Downloading a 15-item full-content feed when only one or two items change per download is wasteful. This has driven many producers to offer only "lightweight" feeds, in which the content is some "lossy compression" of the full content, and the full content itself is not provided as RSS. Sometimes the "compression" is done by providing just the first N words (or sentences) of the content; in other feeds the author provides an abstract or uses a "tease" catch-phrase. In all cases, consumers have to manually go back to the original publisher's site to get the full monty.
Problem is, this makes reading RSS feeds unpleasant for the large group of people who read them offline. Imagine this: you're on the train, happily reading all the RSS feeds you've collected in your aggregator, when you read something interesting on Sam Ruby's weblog. Sam, however, just switched to short-form feeds, so you're out of luck; you have to wait until you get home to get it all.
It doesn't have to be this way.
Here's the idea: In every RSS <item>, provide a link to a resource that holds the item's full content in RSS form. For example:
http://bla.bla.bla.com/blog/12345.rss
(Hopefully, Joe would allow me to use his well-formed web namespace.) The resource indicated by link-to-rss is a valid RSS feed, probably (but not mandatorily) including only a single <item>, with full content. Aggregators are already quite good at detecting when items in an RSS feed change (by hashing title/link/description like Aggie does, or by looking into the dc:date/pubDate, or via similar means), so all an aggregator has to do is detect that an RSS item has been added/modified, and then download the item itself, this time with full content included.
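The aggregator side of this idea is small. Here is a minimal Python sketch, assuming each parsed item is a dictionary carrying `title`, `link`, `description`, and an optional `link-to-rss` URL. The hashing follows the title/link/description approach attributed to Aggie above. SHA-1 and the sample URLs are assumptions of this sketch.

```python
import hashlib

def fingerprint(item):
    """Hash title/link/description to detect new or modified items."""
    h = hashlib.sha1()
    for field in ("title", "link", "description"):
        h.update(item.get(field, "").encode("utf-8"))
        h.update(b"\x00")  # separator, so ("ab", "c") and ("a", "bc") hash differently
    return h.hexdigest()

def plan_fetches(seen, items):
    """Decide which full-content resources to download after polling a short feed.

    `seen` maps each item's link to its fingerprint from the previous poll.
    Returns (URLs to fetch, updated fingerprints). Only items that are new or
    changed, and that carry a "link-to-rss" URL, trigger a download.
    """
    to_fetch, updated = [], {}
    for item in items:
        fp = fingerprint(item)
        updated[item["link"]] = fp
        if seen.get(item["link"]) != fp and item.get("link-to-rss"):
            to_fetch.append(item["link-to-rss"])
    return to_fetch, updated

first = [
    {"title": "Atom 0.2", "link": "http://example.org/blog/1.html",
     "description": "Atom 0.2 is out...", "link-to-rss": "http://example.org/blog/1.rss"},
    {"title": "URI != URL", "link": "http://example.org/blog/2.html",
     "description": "In a comment...", "link-to-rss": "http://example.org/blog/2.rss"},
]
urls1, seen = plan_fetches({}, first)
# Second poll: only the second item's teaser was edited.
second = [dict(first[0]), dict(first[1], description="In a comment, Sam notes...")]
urls2, seen = plan_fetches(seen, second)
print(urls1)
print(urls2)
```

On the second poll only the edited item triggers a download. The unchanged one costs nothing beyond the short feed itself.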
Comments?
Friday, June 06, 2003
Lisp Syntax Matters
This is bound to come up every now and then. Graham Glass writes:
LISP was an incredible work of art. so simple and so reflexive. but an absolutely crap syntax that doomed it.
In response, he got some of the expected "stupid, the syntax is what makes Lisp so powerful" comments (which I won't link to), and some interesting ones (see his comments page). This debate comes up every so often, with Lisp gurus telling everybody else not to worry about the syntax and that they'll get used to it, and everybody else saying how they like Lisp, except for its syntax.
Come to think of it, not unlike the RDF scene (see what Joe, Tim, and yours truly had to say about that one).
Monday, June 02, 2003
IE and the OS
Joe Gregorio writes:
Microsoft, trying to squeeze more revenue from operating system sales, looks to leverage it's monopoly in the browser market to force people to upgrade to the latest version of Windows.
Why do you say that? As you (I think correctly) point out, the current monopoly Microsoft enjoys in the browser arena [*] essentially provides no leverage.
IMHO, a decision to only add value to IE on future operating systems is simply making a good business move: there's no point in putting high-paid developers on a product that makes no income, right?
The Microsoft revenue stream is built almost entirely on selling bits and papers [**]: you buy bits from us (in the form of a CD, a DVD, an Internet download, or an activation number), and you buy software licenses. To keep the employees paid, we need to either increase market penetration or improve our products so that people will want to upgrade. This mechanism is well known.
So improving products is how Microsoft pays the bills (and Bill). Nobody forced people to install IE 6 (as so many people have pointed out, it offers few features beyond IE 4!), yet they did. Similarly, nobody will force Joe to upgrade his Windows 98. Perhaps if he sees enough value in it, he will [***]. The fact that he continues running Windows 98 (IMHO the worst OS release Microsoft has made in the last 10 years) clearly shows that despite Microsoft being a monopoly in the desktop OS market (and the browser arena, and probably office suites), it still can't force customers to do what it wants.
If you look at all the products/technologies that started life on their own and were later incorporated into the OS, I believe you'll see a giant jump in customer value, both in quality and in features. I don't doubt that the same could be said of IE.
* It certainly isn't a market, because most of us don't pay to get a browser, at least not directly.
** This is how Steve Wasserman explained it to me a month after my company (Peach Networks) was bought by Microsoft in early 2000.
*** Joe, here's an offer you can't refuse: I'll buy you a Windows XP Professional as a birthday present if you only dump that junk they call 98.
Disclaimer: I work for Microsoft. I own (very few) Microsoft shares. My opinions do not reflect those of my employer. I am not privy to any internal discussions or decisions made by Microsoft on the future of IE (if I were, I wouldn't be posting). Everything I say here is based on stuff that has been publicly available on the Internet, and my own speculations.
Friday, May 23, 2003
Friday, March 21, 2003
Capacity of Ad Hoc Wireless Networks
Ad-hoc wireless networks (AHWN) are digital communication networks built from a set of small, independent, wireless devices. Unlike more "traditional" wireless networks, in which end devices communicate with some type of a fixed server, in AHWNs end devices talk to each other directly. As devices move from one place to another, or communication patterns change, connections between devices are changed accordingly, thus the "ad-hoc" aspect.
By their nature, AHWNs do not require a centralized architecture, and all devices on the network can be regarded on equal footing. In principle, this means that the entire population of a large geographical region can be equipped with such devices, and never pay their local service provider to communicate. If device A wants to talk with device B, it can do so directly provided they are close enough.
What if the two devices are too far apart? Here comes the "neat" part: the devices talk to other devices that are close enough, thus establishing a multi-hop route between them. In terms of routing, it's just like how the Internet works (a message starting from end device A travels through multiple routers until it reaches end device B), except that each end device can also act as a router, the routes are ad hoc, and no one needs to pay any bills.
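Finding such a multi-hop route can be sketched as a breadth-first search over a unit-disk model, in which two devices can talk directly when they are within radio range of each other. This Python sketch is centralized for clarity, whereas real ad hoc routing protocols discover routes in a distributed way. The device positions and the 100-meter range are made-up numbers.

```python
from collections import deque
import math

def neighbors(devices, name, radio_range):
    """Devices within radio range of `name` (simple unit-disk model, an assumption)."""
    x0, y0 = devices[name]
    return [other for other, (x, y) in devices.items()
            if other != name and math.hypot(x - x0, y - y0) <= radio_range]

def route(devices, src, dst, radio_range):
    """Fewest-hop relay path from src to dst via breadth-first search, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back along the recorded predecessors
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in neighbors(devices, node, radio_range):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical positions (in meters) of four devices strung out along a street.
devices = {"A": (0, 0), "C": (90, 0), "D": (180, 10), "B": (260, 0)}
print(route(devices, "A", "B", radio_range=100))
```

A and B are 260 meters apart and cannot talk directly, but C and D relay the message between them.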
If it's so good, why don't we all dump our Internet providers tomorrow? Well, some people think this is exactly what the future looks like, and they lobby for their position quite regularly these days. Personally, I have quite a few reservations about this sort of "free-lunch" architecture.
An interesting paper I read today provides some practical evidence (as opposed to theoretical arguments) that such networks -- while they might work perfectly for local regions -- do not scale to Internet sizes. In their paper Capacity of Ad Hoc Wireless Networks (which is recommended reading to anyone interested in the subject), the authors conclude:
[...] We find that, in general, 802.11 does a reasonable job of scheduling packet transmissions in ad hoc networks. 802.11 is more efficient for orderly local traffic patterns, such as a lattice network with only horizontal flows. 802.11 is also able to approach the theoretical maximum capacity of O(1/sqrt(n)) per node in a large random network of n nodes with random traffic.
We argue that the key factor deciding whether large ad hoc networks are feasible is the locality of traffic. We present specific criteria to distinguish traffic patterns that allow scalable capacity from those that do not.
[Li, Blake, De Couto, Lee, and Morris; Capacity of Ad Hoc Wireless Networks]
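A back-of-envelope calculation shows the gap between the two traffic patterns. Assume, as the usual scaling argument for this bound does, that the network's total one-hop capacity grows linearly with the number of nodes n, and that each packet is relayed over L hops. Each node's own end-to-end share is then about C*n / (n*L) = C / L. With random destinations, L grows like sqrt(n), which gives the O(1/sqrt(n)) figure quoted above. With local destinations, L stays constant. Here is a minimal Python sketch that ignores constant factors; the `radius_hops` value is an arbitrary assumption.

```python
import math

def per_node_throughput(n, mean_hops, hop_capacity=1.0):
    """End-to-end throughput each node gets, in units of one-hop capacity.

    Total one-hop capacity is assumed to be hop_capacity * n; every packet
    consumes mean_hops of it, so n nodes share it as
    hop_capacity * n / (n * mean_hops).
    """
    return hop_capacity * n / (n * mean_hops)

def random_traffic(n):
    # Random destinations: paths span the network, about sqrt(n) hops.
    return per_node_throughput(n, mean_hops=math.sqrt(n))

def local_traffic(n, radius_hops=3):
    # Local destinations: path length stays bounded as the network grows.
    return per_node_throughput(n, mean_hops=radius_hops)

for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}  random={random_traffic(n):.4f}  local={local_traffic(n):.4f}")
```

A 100-fold increase in n cuts the random-traffic share tenfold, while the local-traffic share does not move. This matches the locality point made in the quoted conclusion.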
Saturday, February 15, 2003