If there’s one thing the history of technology shows us, it’s that people will do what they want regardless of what technologists decide. It doesn’t really matter if we say "I’ve got this great vision for 3D interaction." Make it useful. Make it interesting. Make it ubiquitous. Or people will stay away.
That’s what’s happened with 3D and the web so far. Commonly installed graphics cards have been powerful enough for a 3D web experience for at least 5 years now. We even had Google Earth (its predecessor) running reasonably well on older machines with no 3D acceleration back in 1999. User interfaces like Apple’s Aqua have used 3D acceleration for special effects for a few years already, and Microsoft’s Vista is rolling out soon. A small army of "3D browsers" already exists, most of them pretty good at taking static 2D web pages and shuffling them around in 3D to various ends.
But maybe that’s the problem. Web pages are still 2D, so the best you can hope for in manipulating them is 2.5D, in other words, stacking them like papers on your desk (which is the basis for the current desktop metaphor, even if you let the windows "rotate"). In real life, do we take our 2D pieces of paper and spin them around our desks like a swarm of frozen butterflies? Is it useful?
People have tried adding 3D to web pages, mostly in the form of plug-ins. The limitation here is that the 3D content must live in its own implicit window, disconnected from the rest of your experience (namely, the nice 2D text and graphics we know and love). Essentially, it’s a side application that happens to be rendered onto the same page as your web content, but it suffers from flaws such as ActiveX security problems and needing to restart every time you navigate to a new page. Ignoring the flaws, are the positive aspects of 3D in a little window all that interesting? Can you think of many examples (Shockwave3D apps, etc.) that have really blown you away?
Perhaps the problem is that the 3D elements are left isolated from the rest of the Web. One obvious solution is to make it so that 2D elements like text and pictures can float together with 3D elements. Some approaches add text and graphics to 3D markup languages, but they lose the quality of 2D rendering, especially for text. What we’d really need is a hybrid language that incorporates fast and fine 2D rendering with 3D objects and navigational metaphors. For it to be ubiquitous, or at least backwards compatible, it would need to work for all current web content and allow enabled browsers to use the extended functionality without missing a beat.
That’s non-trivial, especially given a standards body that understandably adds new HTML tags at a glacial pace, and a web development community that can and should be wary of going it alone. But it’s what I’ve certainly needed on several 3D projects already, and because of the limitations of browser technology, they wind up being stand-alone networked apps. I’m sure I’m not alone in wanting a version of Firefox with the ability to freely mix 2D and 3D. And I’m sure, given the typical speed of 2D layout engines, it’s not going to happen anytime soon.
Now, the thing about the Metaverse, as it’s commonly conceived, is that it goes even further than just 3D layout, to include a concept of "self," or point of view, which, in the desktop metaphor, is locked looking straight down at the desktop. In a 3D environment, that can obviously change. It’s curious, then, that most 3D desktop prototypes I’ve seen have focused on manipulating objects, mostly windows, in 3D while leaving the user’s point of view relatively limited. Our brains may be better suited to remembering where objects are placed in a large environment (in context) than remembering where on the desktop we put each window or icon. Navigation can be a problem for big spaces, but I think Google Earth shows that you can fly around the world in a few seconds if you handle the zooming right. [btw, did you know that when zooming in from space, you start out going significantly faster than the speed of light?]
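For the curious, here is a back-of-the-envelope sketch of that faster-than-light zoom. It assumes the camera's altitude shrinks exponentially over the course of the zoom, which is my guess at what "handling the zooming right" means and not a description of how Google Earth actually does it. The function names, the starting altitude, the final altitude, and the durations are all made-up illustration values.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def altitude(t, start_alt_m, end_alt_m, duration_s):
    """Camera altitude at time t for an exponential zoom (assumed model)."""
    frac = t / duration_s
    return start_alt_m * (end_alt_m / start_alt_m) ** frac


def descent_speed(t, start_alt_m, end_alt_m, duration_s):
    """Magnitude of d(altitude)/dt at time t, in metres per second."""
    # For h(t) = h0 * (h1/h0)^(t/T), the derivative is -h(t) * ln(h0/h1) / T.
    rate = math.log(start_alt_m / end_alt_m) / duration_s
    return rate * altitude(t, start_alt_m, end_alt_m, duration_s)


def initial_speed_in_c(start_alt_m, end_alt_m, duration_s):
    """Speed at the very start of the zoom, as a multiple of light speed."""
    return descent_speed(0.0, start_alt_m, end_alt_m, duration_s) / SPEED_OF_LIGHT_M_S


if __name__ == "__main__":
    # Hypothetical zoom: from 60,000 km up down to 1 km up.
    for seconds in (1.0, 10.0):
        ratio = initial_speed_in_c(6.0e7, 1.0e3, seconds)
        print(f"{seconds:>4.0f} s zoom starts at {ratio:.2f} c")
```

Under these assumptions, the one-second zoom starts out above light speed and the ten-second zoom does not, so whether the claim holds depends on how high you start and how fast the animation runs. Either way, the speed drops in proportion to the altitude, which is why the trip feels smooth all the way down.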
The Metaverse also normally carries with it the concept of community, which an application like Second Life enjoys. Web browsing is a fairly solipsistic activity, though not always so (message boards and chat are good exceptions). What happens, though, if we turn the web into a visibly shared space? What does it look like? And how do we interact? If that space, full of avatars and objects, is a faithful reflection of the real world, is that more useful than just using Google Earth? If that space is more abstract, but big and empty, is it still interesting? Can we come up with a world that’s as rich as Earth but unconstrained and content-driven, like the topology of the web?
Ultimately, as a technologist, I look for solutions in the fundamental protocols. The Web didn’t develop because someone said "let’s make portals or search engines." Those developed because the Web existed and there was a perceived need. What the web really is, deep down, is a set of protocols, languages for defining hyper-linked 2D content. The Web is not your browser app. It’s not the server on the other end. It’s not even the Internet that shifts bits back and forth between the two. It’s the language that makes it so that any browser can talk to any server and any link in any document can and should bring more information.
I think the same thing goes for 3D. Once we develop a true language for describing objects (not just points and polygons) and for handling interaction between people in a 3D space (not just windows), then we’re at least halfway there. The fact that people have tried in the past (VRML, etc.) and it hasn’t taken off can mean only one of two things: either the Metaverse is the wrong concept entirely, or the specifics of what they tried didn’t pass the tests of usefulness, interest, and ubiquity. Perhaps it’s time to assume that those attempts went wrong somehow, learn from them, and try again with something new.