<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>RealityPrime &#187; Featured</title>
	<atom:link href="http://www.realityprime.com/category/featured/feed" rel="self" type="application/rss+xml" />
	<link>http://www.realityprime.com</link>
	<description>Advanced Technology Research</description>
	<lastBuildDate>Sun, 14 Feb 2010 07:46:35 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Best Use of Augmented Reality, Ever</title>
		<link>http://www.realityprime.com/featured/best-use-of-augmented-reality-ever</link>
		<comments>http://www.realityprime.com/featured/best-use-of-augmented-reality-ever#comments</comments>
		<pubDate>Sun, 14 Feb 2010 01:25:34 +0000</pubDate>
		<dc:creator>avi</dc:creator>
				<category><![CDATA[Featured]]></category>

		<guid isPermaLink="false">http://www.realityprime.com/?p=410</guid>
		<description><![CDATA[I worked with Blaise last year, starting at about the time he took over as architect on Virtual Earth (now Bing Maps). I claim no credit for this work, but I&#8217;m proud just the same.

]]></description>
			<content:encoded><![CDATA[<p>I worked with Blaise last year, starting at about the time he took over as architect on Virtual Earth (now Bing Maps). I claim no credit for this work, but I&#8217;m proud just the same.</p>
<p><object width="446" height="326"><param name="movie" value="http://video.ted.com/assets/player/swf/EmbedPlayer.swf"></param><param name="allowFullScreen" value="true" /><param name="wmode" value="transparent"></param><param name="bgColor" value="#ffffff"></param><param name="flashvars" value="vu=http://video.ted.com/talks/dynamic/BlaiseAguerayArcas_2010-medium.mp4&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/BlaiseAgueraYArcas-2010.embed_thumbnail.jpg&#038;vw=432&#038;vh=240&#038;ap=0&#038;ti=766&#038;introDuration=16500&#038;adDuration=4000&#038;postAdDuration=2000&#038;adKeys=talk=blaise_aguera;year=2010;theme=the_creative_spark;theme=a_taste_of_ted2010;theme=new_on_ted_com;event=TED2010;&#038;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" /><embed src="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" pluginspace="http://www.macromedia.com/go/getflashplayer" type="application/x-shockwave-flash" wmode="transparent" bgColor="#ffffff" width="446" height="326" allowFullScreen="true" flashvars="vu=http://video.ted.com/talks/dynamic/BlaiseAguerayArcas_2010-medium.mp4&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/BlaiseAgueraYArcas-2010.embed_thumbnail.jpg&#038;vw=432&#038;vh=240&#038;ap=0&#038;ti=766&#038;introDuration=16500&#038;adDuration=4000&#038;postAdDuration=2000&#038;adKeys=talk=blaise_aguera;year=2010;theme=the_creative_spark;theme=a_taste_of_ted2010;theme=new_on_ted_com;event=TED2010;"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://www.realityprime.com/featured/best-use-of-augmented-reality-ever/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Unauthorized History of Virtual Worlds</title>
		<link>http://www.realityprime.com/articles/the-unauthorized-history-of-virtual-worlds</link>
		<comments>http://www.realityprime.com/articles/the-unauthorized-history-of-virtual-worlds#comments</comments>
		<pubDate>Tue, 24 Mar 2009 04:12:00 +0000</pubDate>
		<dc:creator>avi</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>

		<guid isPermaLink="false">http://www.realityprime.com/?p=297</guid>
		<description><![CDATA[I wrote the following essay to help us get going crafting a review paper for a major comp-sci journal. &#8216;Us&#8217; in this case was Blaise Aguera y Arcas, one of the founders of PhotoSynth and Virtual Earth&#8217;s new architect, and Jaron Lanier, one of the pioneers of VR, who thought of pretty much everything before [...]]]></description>
			<content:encoded><![CDATA[<p>I wrote the following essay to help us get going crafting a review paper for a major comp-sci journal. &#8216;Us&#8217; in this case was Blaise Aguera y Arcas, one of the founders of PhotoSynth and Virtual Earth&#8217;s new architect, and Jaron Lanier, one of the pioneers of VR, who thought of pretty much everything before I became conscious of the world.</p>
<p>Now, I should caution that Blaise didn&#8217;t ultimately want to use this text and Jaron equally had issues with it. The tone is all wrong for an academic journal, plus Jaron disputes some of the dates I recorded from my research (he may well know better). But I felt it might at least be entertaining to RP readers, so I&#8217;m posting it for you to enjoy. Still, don&#8217;t take any of it as official, just me being a smart-ass.</p>
<p>&nbsp;</p>
<p><span id="more-297"></span></p>
<p>&nbsp;</p>
<p class="MsoNormal"><font face="Arial" size="3">October 20<sup>th</sup>, 2008 marked the 30<sup>th</sup> anniversary of the MUD <a title="" name="_ednref1" href="#_edn1" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[1]</span></span><!--[endif]--></span></span></a> or Multi User Dungeon, widely recognized as the world&rsquo;s first multi-participant text-based virtual world. Only three years later, a somewhat less interactive work, <em style="">True Names,</em><a title="" name="_ednref2" href="#_edn2" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[2]</span></span><!--[endif]--></span></span></a> by Vernor Vinge, imagined full multi-sensory worlds with millions of participants. The film <em style="">TRON</em> debuted only a year after that, popularizing, if not actually monetizing, computer-mediated virtual worlds as full-on alternate realities &#8212; places with lives unto themselves. But before any of these were even conceived, <em style="">The Veldt<a title="" name="_ednref3" href="#_edn3" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><strong style=""><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[3]</span></strong></span><!--[endif]--></span></span></a></em>, by Ray Bradbury, envisioned &ldquo;The HappyLife Home,&rdquo; a fully immersive CAVE-like space, consuming parents and kids alike, way back in 1950.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">The history of virtual worlds is a complex mesh of fact and fiction, weaving pioneers, dreamers, authors, and critics in a quest to define a grand vision and to meet an ancient need, dating back to the days of burnt charcoal on cold cave walls. That need is to communicate, to share and persist what is otherwise ephemeral, isolated, and ultimately bounded to the lifespan of memory: our thoughts, our ideas, and our stories about life, real and imagined. It is perhaps fitting that these visionary fictions are themselves conveyed to audiences, new and old, as print- and film-mediated virtual worlds<a title="" name="_ednref4" href="#_edn4" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[4]</span></span><!--[endif]--></span></span></a>, just as the CAVE acronym<a title="" name="_ednref5" href="#_edn5" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[5]</span></span><!--[endif]--></span></span></a> itself is a recursive allusion to &ldquo;Plato&rsquo;s Illusion&rdquo; playing on those same torch-lit walls.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">That grand vision, however, goes beyond mere communication. The desire for ubiquitous virtual worlds is perhaps an understandable manifestation of our collective (if not universal) longing to overcome the rules that bind us, to propel our minds over the limits of matter, space, and even death, of which the &ldquo;mortality of ideas&rdquo; is just one example. The end goal, for many, is to construct alternate realities so malleable, so perfectly adapted to our innate desires, that we could fairly call the results <em style="">magic.</em><a title="" name="_ednref6" href="#_edn6" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[6]</span></span><!--[endif]--></span></span></a> In those brave new worlds, there need be no scarcity, no ugliness, nor pain. We can be whatever we imagine or wish ourselves to be, and suffer none of the limits of ordinary, mundane life.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">Then again, in reality, as in fiction, the stories never quite seem to work out that way. Reality has a way of always winning in the end. And, as with the previous allegories, this grand quest of ours is so cyclical, so inflated and self-referential, as to often resemble the construction of a Klein bottle. The true history of virtual worlds is one of visionary and often impervious genius, promises made (and made again), and the search for the most elusive human-computer interface ever envisioned &ndash; the one that disappears.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">One of the early visionary geniuses in the field was the legendary ACM Fellow, Ivan Sutherland. His ground-breaking Sketchpad system plotted the lines along which future computer graphics interfaces would be drawn. He invented the head-mounted display, heavy as it was, to better couple the virtual imagery to our actual head motions, and in so doing, laid the foundation for untold future systems that place one or more virtual interactive cameras in a computer-mediated reality.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">But just 10 years earlier and only a few years after Bradbury&rsquo;s fictional exploration of virtual family values, Walt Disney more literally broke ground on a site in Anaheim that would become, for a significant period of time, the world&rsquo;s largest <em style="">physical</em> virtual world. Until then, the best example of a &ldquo;fantasy land&rdquo; was the haunted house, a grotesque and distorted reflection of reality, not entirely unlike some early experiments in VR. Disney&rsquo;s theme parks provided a comfortable level of immersion for many people &mdash; not nearly as interactive as, say, <em style="">Korea</em> or <em style="">Vietnam</em>, and not nearly as rich as the Renaissance Pleasure Faire and &ldquo;RenFaires&rdquo; since, but safe and fun for the whole family.</font></p>
<p class="MsoNormal"><font face="Arial" size="3"><span style="">The 1960s also saw an explosion of experimentation in the media of the time, including Mort Heilig&rsquo;s Sensorama, which included stereo visuals, smells, sounds and even haptics. Filmic experiments in 3D and surround visuals abounded, with Disney&rsquo;s Circle*Vision<a title="" name="_ednref7" href="#_edn7" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[7]</span></span><!--[endif]--></span></span></a> being one of the most widely viewed. The addition of sturdy handrails to keep guests from falling over (in a stationary theater) is a testament to the power of these experiences to move and consume us, for better or worse.</span></font></p>
<p class="MsoNormal"><font face="Arial" size="3">While we&rsquo;re not sure if Sutherland knew Disney or Heilig in his day, we do know that Evans and Sutherland computers were used to help animate Disney&rsquo;s TRON in 1981. And ten years after Tron failed to make a dent in any critical or commercial sense, Disney Imagineering&rsquo;s VR Studio would similarly begin their real-time interactive VR experiments on Evans and Sutherland image generators.<a title="" name="_ednref8" href="#_edn8" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[8]</span></span><!--[endif]--></span></span></a> The obvious choice of movie content gave way to a failed <em style="">Rocketeer</em> attraction and later <em style="">Aladdin&rsquo;s Magic Carpet Ride, </em>sporting heavy HMDs, counterbalanced much like Sutherland&rsquo;s Ultimate Display. One of Sutherland&rsquo;s students, Ed Catmull, also found inspiration in Disney animation. He embarked on a 30-year quest to reinvent the art and science of animation, finally coming full circle in the new millennium, as the head of Disney&rsquo;s animation studios after Disney swallowed Pixar.<a title="" name="_ednref9" href="#_edn9" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[9]</span></span><!--[endif]--></span></span></a></font></p>
<p class="MsoNormal"><font face="Arial" size="3">The hard road towards better computer-mediated storytelling (in HMDs and on the silver screen) merely proved that obstacles are there to surmount. Randy Pausch, most famous for his &ldquo;last lecture&rdquo; at CMU, observed that obstacles &ldquo;give us a chance to show how badly we want something&rdquo; &#8212; or, perhaps, to make us take the time to understand <em style="">why</em>. Dr. Pausch worked with the same Disney VR Studio, CMU&rsquo;s Alice software, and other members of the story, all of whom badly wanted to solve the problems of more easily building and experiencing computer-mediated virtual worlds.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">But reinventing the world takes time, often a generation or two. Ivan Sutherland was reportedly influenced by Vannevar Bush&rsquo;s 1945 atomic-age essay &ldquo;As We May Think,&rdquo;<a title="" name="_ednref10" href="#_edn10" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[10]</span></span><!--[endif]--></span></span></a> which is also widely credited as inspiration for the World Wide Web. That it took 50+ years to produce a Google and an MSN to make the web more tractable is merely a reflection of the difficulty we have in taking grand ideas and rendering them in a form that works for ordinary people &ndash; the proper convergence of money and market, timing and technology; but more importantly, a better understanding of just what it is we should be asking for in the first place.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">In the early period, well before we had too many cogent glimpses of that sort of revelation, the 1970s saw the exploration, largely using university mainframes, of true multi-user environments. Maze War, in 1974, was the first known multi-user environment, and graphical to boot. But it was very limited in its influence. MUD, in 1978, had much more success. The lack of graphics proved to be no deterrent, and in fact, arguably improved interactivity and depth, simplifying some very hard problems to a more manageable scope.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">The 1980s saw a new push into real-time graphical interfaces, with cheaper and more available commodity hardware. On the desktop, Apple and Microsoft pushed 2D windows, icons, and mice to great effect. In movies, CGI went from niche to boutique to a mainstay of visual effects. And VPL Research pushed the envelope in full-sensory virtual reality (and even the term itself), providing visual programming elements and a grab bag of more natural user interfaces. Their Reality-Built-For-Two was the first commercial VR system, shipped in 1989.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">Alas, the Achilles heel of commercialization is the requirement of making money. VPL ultimately sold itself and its patents to Sun in 1998. Still, in 1989, Mattel released a PowerGlove that simplified the VPL DataGlove concept into something mass-marketable. It unfortunately failed to spawn the kind of kinetically addictive games that the Nintendo Wii presently enjoys, despite many similar capabilities. And while the Visual Programming Language VPL developed broke new ground, it fell to Lego and Microsoft&rsquo;s Robotics Studio, years later, to truly push some of these concepts to wider audiences.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">The 1980s can be thought of, in a sense, as the early adolescence of virtual worlds, in which the core technological concepts came to be expressed, tested, and propagated to anyone who would listen. Habitat, for example, emerged in 1988 on the popular Commodore 64 platform, and was arguably the first mass-market virtual world, presaging mega hits like Habbo Hotel, Club Penguin, and even Everquest by over a decade, but never quite gaining the market validation its creators sought.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">If the 1980s are the early teens, the 1990s represent the often-raucous teen to adult transition. Unfortunately, as difficult as it is for any child actor to grow up with the hype and spotlight of Hollywood, so too were the overwhelming expectations for VR technology a part of its downfall. Movies like <em style="">The Lawnmower Man</em> promised an effectively <em style="">supernatural</em> decoupling of VR&rsquo;s espoused benefits from any actual truth. Publications like Wired treated VR visionaries (or indeed, anyone who seemed to have a good futuristic idea) like rock stars. In fact, it&rsquo;s the expression of difficult computer science concepts in natural human metaphors that is both the greatest strength and the ultimate weakness of VR &ndash; anyone describing it to lay audiences can use language that evokes sweeping images and expectations that are currently impossible to meet.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">Meanwhile, with all of the attention and potential of VR technologies for computer-mediated virtual worlds, research funding increased dramatically. Among the efforts, the Human Interface Technology Lab at the University of Washington pushed new technologies to solve some of the core interface challenges in VR, leading to devices like the Virtual Retinal Display, a laser-based display that is today most closely related to pocket-sized low-power projectors. UNC pushed the envelope on haptics and core technologies, while universities like Utah, Ohio State, Brown, and Stanford pushed ahead primarily on 3D rendering. Carolina Cruz-Neira and others pushed the envelope in projected virtual environments, culminating in 1992 with the CAVE at the University of Illinois Electronic Visualization Laboratory. Meanwhile, continued military investment in visual simulation drove investment in graphics supercomputers, ultimately leading to cheap commodity 3D hardware acceleration for PCs.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">Virtual Reality went, more or less, like the space program. Massive investments didn&rsquo;t result in any influx of civilians going into space, no personal robots nor flying cars. It certainly got a lot of public attention, anxiety, and perhaps disappointment after some early climaxes. But the technologists and dreamers marched on. And core technologies produced did, in fact, wind up in everyday consumer devices, from cell phones to gaming consoles.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">Indeed, while true Virtual Reality devices failed to take over the world in any meaningful way, the offshoots of the very same work wedged themselves into our daily lives. While Disney Imagineers worked on million dollar SGI hardware to build an immersive Aladdin attraction in 1994, they played Doom and Quake in the office for fun. Giant dinosaurs gave way to nimble mammals, just waiting for their chance. John Carmack&rsquo;s games simultaneously spawned an entire genre of blood-splattering Demon/Nazi/Zombie shattering realism and helped blast PC-level 3D graphics into the mainstream.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">The gaming community can be credited with seeing some of the most lucrative uses of 3D graphics and interactivity, and of turning it all into a sustainable, even thriving business<a title="" name="_ednref11" href="#_edn11" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[11]</span></span><!--[endif]--></span></span></a>. Gaming has turned into such a serious business that a whole branch of Serious Games has emerged<a title="" name="_ednref12" href="#_edn12" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[12]</span></span><!--[endif]--></span></span></a>.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">But while game companies were having fun and making money, the internet boom took full effect, with a much stronger emphasis on the latter over anything else. SGI famously launched Cosmo, an offshoot of VRML 2.0, to take the &ldquo;Metaverse&rdquo; by storm. The &ldquo;dot com&rdquo; bubble dwarfed by orders of magnitude any investment in games or VR, pushed home broadband adoption, but ultimately left US network infrastructure and networked virtual worlds bobbing in the wake of the April 2000 tsunami.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">Philip Rosedale was one survivor of that bubble. He was one of the inventors of a video codec that made Real Video hum, having sold his company to Progressive Networks in 1996. He founded Linden Lab in 1999 with the proceeds and some help from friends in physics and video games. And while the evolution of massively multiplayer on-line games can be traced from Maze War to XPilots to Meridian 59 and World of Warcraft, Second Life differentiated itself in several important ways: first, there was to be only one shared world, not many shards or instances to split the load; second, the world is malleable and subject to users&rsquo; whims; and third, it&rsquo;s not a game, it&rsquo;s an alternate existence unto itself, tied into real world currency and the web itself, but separate just the same. Second Life also rejected the standard VRML view of the world, of scenegraphs and polygons, in favor of a completely dynamic soup of what are effectively virtual Legos. The end result is something that is both powerful, and still, eight years later, limited by its own design to a walled-garden and only linearly scalable topology.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">Meanwhile, another survivor of the dot-com implosion was busy building a fundamentally different kind of virtual world, right in the bubble&rsquo;s wake. Keyhole was a spinoff of the defunct game technology company, Intrinsic Graphics, involving many former engineers from SGI, who had worked with VR early adopters from Disney Imagineering to the NIMA and NGA<a title="" name="_ednref13" href="#_edn13" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[13]</span></span><!--[endif]--></span></span></a>. The key technology could stream an entire planet worth of visual information over a standard network connection and render the relevant window of it in real time on any PC. But the true advantage over contemporary systems, GIS and otherwise, was in how simple and intuitive the interface was. In 2004 Google acquired the company and renamed its product Google Earth. And today, it defines what a mirror world is supposed to be &ndash; an accurate reflection of the real world, equally navigable from your desktop or mobile phone.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">However, there was yet another kind of mirror bubbling to the surface around this time, one much more grass-roots. Bloggers, with unfettered access to the Web, began posting about their daily lives, their interests, and their friends. Social networks began to take root in 2003, in a second wave of web startups, the so-called Web 2.0. They developed &ldquo;what&rsquo;s new&rdquo; feeds meant to keep people in the loop. Sites like Twitter emerged later to simplify the process of posting one&rsquo;s status to the net. And more recently, efforts like Loopt, Whrrl, and others seek to leverage real-time location on mobile devices to build up a view of the mirror world that is both personal and reflective of our individual realities.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">A distinct kind of virtual world was just coming into focus around the new millennium as well. The earliest examples of Augmented Reality can be traced to systems like Myron Krueger&rsquo;s <em style="">Videoplace</em></font> in 1975. These consisted of combining video streams of real people and real or artificial places to simulate a sort of detached immersion. Much as with a meteorologist on the local news, participants could see themselves acting in a virtual environment in the third person only, on monitors for example, vs. the first person points of view obtained with HMDs and CAVEs. Eventually, technology was added to accurately track and emulate camera lenses and to seamlessly register computer imagery into the video. Today, augmented reality typically refers to live video captured from a person&rsquo;s point of view, overlaid with relevant CGI and fed back to some display device, perhaps even a head-mounted display.</p>
<p class="MsoNormal"><font face="Arial" size="3">The promise of augmented reality is best epitomized in Vernor Vinge&rsquo;s novel, <em style="">Rainbows End</em>, where VR contact lenses provide continuous access to the augmented world both inside and out. Haptic interfaces provide the missing sense of touch to manipulate and make these virtual objects real. But the state of the art is nowhere near this threshold. Problems with tracking and registration of objects still remain, but the principal challenge remains the display device, making it small, unobtrusive, and yet high enough fidelity to enable a person to walk around and actually use it without getting hurt by the real world, which hasn&rsquo;t gone away.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">In the development of Virtual Worlds technology, we&rsquo;ve seen displays go from a few vectors to thousands, millions, and lately billions of pixels, and from a few dozen to a few billion polygons per second as well. We&rsquo;ve seen haptic interfaces shrink from room-sized devices, reminiscent of the Inquisition, to gloves and even direct neural stimulation with no mass at all.<a title="" name="_ednref14" href="#_edn14" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 11pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[14]</span></span><!--[endif]--></span></span></a> We&rsquo;ve seen input devices go from mechanical arms to wired magnetic sensors to wireless and even optical motion capture. And we&rsquo;ve seen networks go from slow drizzles of bits to full-on torrents, with sophisticated methods of prediction to cover latency and errors.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">But yet Virtual Worlds, despite such advancements, and despite the adoption of its technology and methodology in many fields, is still widely seen as a <em style="">future</em> technology, not relevant to our everyday lives, a walk down the street, a trip to the mall, a day in the office. Web 3D is derided as unnecessary and indeed cumbersome, given the success and simplicity of the current 2D Web. Second Life is seen as niche, despite many businesses trying, for a time, to set up shop in one of the best Metaverses around.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">Clearly, something is missing in the equation.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">After forty years of research and development, since the ground-breaking days of Ivan Sutherland, we&rsquo;re finally beginning to come to a realization. People like escapism and fantasy worlds. Kids love to play in the modestly dimensional clubs of penguins and hotels. And companies are rushing in to tame the wild 3D west and pan their virtual gold. But the vast majority of human beings still spend the vast majority of their time immersed in a much more compelling and less escapable 3D environment, which we tend to call &ldquo;real life.&rdquo;</font></p>
<p class="MsoNormal"><font face="Arial" size="3">Computer-mediated virtual worlds are just beginning to catch up with what we do out here in the real world. We buy and sell things. We get and share information. We communicate and &ldquo;get stuff done.&rdquo; We can increasingly do all of that in a virtual world too. But we can&rsquo;t do it nearly as well.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">And though we bemoan the limitations of 2D displays and table-top mice that have overstayed their welcome by a decade or more, the reason we don&rsquo;t typically do these things in a virtual world is not about the interface, or the speed of CPUs or GPUs. It&rsquo;s because we live out here, in the real world. No one likes to commute across an international &#8212; or inter-dimensional &#8212; border on their way to work.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">A significant push in modern immersive environments research, then, focuses not as much on how to bring us to Neverland, but on how to bring Neverland to us &ndash; to make The Virtual a part of our lives in a way that benefits us beyond the hype of paperless offices and social (read: peer pressure) networks.</font></p>
<p class="MsoNormal"><font face="Arial" size="3">What the new Virtual is all about is reality but augmented, mirrored, and hyper-realistic, giving benefits beyond VR, AR, and even AI (which has similarly vanished into the lattice of modern life).</font></p>
<p class="MsoNormal"><font face="Arial" size="3">But what does it take to make this vision real? How do we mirror the real world in a way that we can make it interactive, turn it into a trellis for overlaying whatever whimsical and/or beneficial fictions we can dream up? Once we have that trellis, how do we interact with these hybrid real/virtual objects? How do we trade them, when their substance costs no more than the electricity they consume?</font></p>
<p class="MsoNormal"><font face="Arial" size="3">Only one thing is certain: the story of the next 25 years will be written in the full light of day.</font></p>
<div style=""><!--[if !supportEndnotes]--></p>
<p><font face="Arial" size="3"><br clear="all" /><br />
</font></p>
<hr width="33%" size="1" align="left" />
<!--[endif]--></p>
<div id="edn1" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn1" href="#_ednref1" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[1]</span></span><!--[endif]--></span></span></a> MUD was written by Roy Trubshaw at Essex University in 1978. It was the first adventure game to permit multiple users.</font></p>
</div>
<div id="edn2" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn2" href="#_ednref2" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[2]</span></span><!--[endif]--></span></span></a> <a name="ss.true.names"><em><span style="font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">True names</span></em> (&copy;1981) first appeared in <em><span style="font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">Dell Binary Star #5</span></em>, and again </a><a name="anth.true.names">in <em><span style="font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">True names and other dangers</span></em> (&copy;1987, ISBN 0-671-65363-6)</a></font></p>
</div>
<div id="edn3" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn3" href="#_ednref3" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[3]</span></span><!--[endif]--></span></span></a> Published 23 September, 1950, in <em style="">The Saturday Evening Post</em>, and again in the anthology <em style="">The Illustrated Man</em> in 1951</font></p>
</div>
<div id="edn4" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn4" href="#_ednref4" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[4]</span></span><!--[endif]--></span></span></a> A matter of definition, virtual worlds can be said to exist in our minds first and foremost, as we never <u>directly</u> experience any objective reality &ndash; everything is mediated by our senses, and our memories in large part. Computer-mediated virtual worlds are a novel extension of the same idea, adding a level of depth that film often lacks. Textual virtual worlds (e.g., books), on the other hand, are still the world&rsquo;s most effective form of conveyance, considering price/performance and effective bandwidth, though the worlds they create in our minds can be said to be a highly lossy decompression of whatever the author had in mind.</font></p>
</div>
<div id="edn5" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn5" href="#_ednref5" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[5]</span></span><!--[endif]--></span></span></a> CAVE in fact stands for CAVE Automated Virtual Environment &ndash; the recursion has indeed been attributed to a Platonic influence.</font></p>
</div>
<div id="edn6" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn6" href="#_ednref6" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[6]</span></span><!--[endif]--></span></span></a> From Arthur C. Clarke&rsquo;s third law: &ldquo;Any sufficiently advanced technology is indistinguishable from magic.&rdquo;</font></p>
</div>
<div id="edn7" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn7" href="#_ednref7" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[7]</span></span><!--[endif]--></span></span></a> O<span style="">riginally named Circarama, and renamed to Circle*Vision in 1967.</span></font></p>
</div>
<div id="edn8" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn8" href="#_ednref8" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[8]</span></span><!--[endif]--></span></span></a> E&amp;S systems were used to prototype the Rocketeer virtual reality ride, which was never released, and were later replaced with more powerful and programmable supercomputers from SGI for the Aladdin ride. In a strange repetition of history, the VR studio chose to suspend their HMD from the ceiling, much as Sutherland&rsquo;s first HMD in 1968 had done. (International Conference on Computer Graphics and Interactive Techniques, 1996, Proceedings)</font></p>
</div>
<div id="edn9" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn9" href="#_ednref9" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[9]</span></span><!--[endif]--></span></span></a> Or vice-versa.</font></p>
</div>
<div id="edn10" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn10" href="#_ednref10" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[10]</span></span><!--[endif]--></span></span></a> As We May Think, The Atlantic Monthly, July 1945.</font></p>
</div>
<div id="edn11" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn11" href="#_ednref11" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[11]</span></span><!--[endif]--></span></span></a> Insert $xxxB estimate</font></p>
</div>
<div id="edn12" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn12" href="#_ednref12" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[12]</span></span><!--[endif]--></span></span></a> This market estimated at $9B alone.</font></p>
</div>
<div id="edn13" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn13" href="#_ednref13" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[13]</span></span><!--[endif]--></span></span></a> The National Geospatial Intelligence Agency, known by various other acronyms over its several lifetimes, has been the prime consumer of digital earth imagery. In conjunction with Al Gore&rsquo;s Digital Earth Initiative in the early 1990s, it&rsquo;s responsible for the exponential increase in availability (and reduction in price) of aerial imagery which makes Google Earth and Virtual Earth possible.</font></p>
</div>
<div id="edn14" style="">
<p class="MsoEndnoteText"><font face="Arial" size="3"><a title="" name="_edn14" href="#_ednref14" style=""><span class="MsoEndnoteReference"><span style=""><!--[if !supportFootnotes]--><span class="MsoEndnoteReference"><span style="font-size: 10pt; line-height: 115%; font-family: &quot;Calibri&quot;,&quot;sans-serif&quot;;">[14]</span></span><!--[endif]--></span></span></a> Presently limited to surgical limb-replacement procedures, the non-invasive stimulation of muscles and sensory receptors is still quite nascent.</font></p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.realityprime.com/articles/the-unauthorized-history-of-virtual-worlds/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>The Singularity is Nigh</title>
		<link>http://www.realityprime.com/articles/ieee-spectrum-special-report-the-singularity</link>
		<comments>http://www.realityprime.com/articles/ieee-spectrum-special-report-the-singularity#comments</comments>
		<pubDate>Tue, 03 Jun 2008 17:38:13 +0000</pubDate>
		<dc:creator>avi</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>

		<guid isPermaLink="false">http://www.realityprime.com/articles/ieee-spectrum-special-report-the-singularity</guid>
		<description><![CDATA[IEEE Spectrum: Special Report: The Singularity
I&#8217;ll post more when I get a second, but it&#8217;ll take some time to digest.
For what it&#8217;s worth, my present take on the Singularity is a cross of Vinge&#8217;s, something Stross said at a WorldCon party (or elsewhere), and Kurzweil&#8217;s, despite some inherent contradictions:

1. The future beyond a singularity [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.spectrum.ieee.org/singularity">IEEE Spectrum: Special Report: The Singularity</a></p>
<p>I&#8217;ll post more when I get a second, but it&#8217;ll take some time to digest.</p>
<p>For what it&#8217;s worth, my present take on the Singularity is a cross of Vinge&#8217;s, something Stross said at a WorldCon party (or elsewhere), and Kurzweil&#8217;s, despite some inherent contradictions:</p>
<blockquote>
<p>1. The future beyond a singularity is fundamentally unknowable. That&#8217;s the whole point. If we can accurately describe what&#8217;s past a so-called singularity, then it&#8217;s just your basic run-of-the-mill evolution, revolution or &quot;disruptive&quot; sea-change, which happen all the time.</p>
<p>2. People are good at extrapolating linearly, not exponentially. We can predict a few years out, but after that, reality diverges wildly from our naturally limited mental models.</p>
<p>3. We&#8217;ve already gone through multiple &quot;singularities&quot; throughout history, though perhaps increasing in frequency. Singularities are never the end of anything, but a new platform on which to complain about our current ways of life and ponder the color of the pasture on the far side of the next singularity.</p>
</blockquote>
<p>Before their introduction, could people have predicted how the world would change with Writing? Or Computers? Or Corporations? Could they have even predicted the invention itself? If not, then these may also be singularities, points in history that we can only understand by looking back, not forward, like the approaching event horizon of a black hole.</p>
<p>That is not to say that some visionaries don&#8217;t imagine a world past that event horizon or see the event coming. But it&#8217;s all speculation, cautionary or wishful fiction at best.</p>
<p>Even the inventor of the mechanical computer, beyond genius for his day, could not have predicted word processors, virtual reality, AI, or even the CAD software that would have unquestionably helped design his mechanical computer.</p>
<p>One could argue that the <em>One True Singularity </em>will occur only when we (our heirs or errs) become smart enough to see through to the future beyond, i.e., the real Singularity is the last Singularity we will ever know.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.realityprime.com/articles/ieee-spectrum-special-report-the-singularity/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Future of Virtual Worlds</title>
		<link>http://www.realityprime.com/articles/the-future-of-virtual-worlds</link>
		<comments>http://www.realityprime.com/articles/the-future-of-virtual-worlds#comments</comments>
		<pubDate>Tue, 06 May 2008 18:45:05 +0000</pubDate>
		<dc:creator>avi</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>

		<guid isPermaLink="false">http://www.realityprime.com/articles/the-future-of-virtual-worlds</guid>
		<description><![CDATA[So my friend Cory Ondrejka (co-creator of Second Life) started an interesting thread last week that I didn&#8217;t see covered as widely as it should have been. Here are his slides &#8212; alas I didn&#8217;t get to hear the narration that went with them, but I can guess.



What he seems to [...]]]></description>
			<content:encoded><![CDATA[<p>So my friend Cory Ondrejka (co-creator of Second Life) started an <a href="http://ondrejka.blogspot.com/2008/04/apoc-week-13-aka-future-of-virtual.html">interesting thread</a> last week that I didn&#8217;t see covered as widely as it should have been. Here are his slides &#8212; alas I didn&#8217;t get to hear the narration that went with them, but I can guess.</p>
<div id="__ss_370385" style="width: 425px; text-align: left;"><object style="margin:0px" height="355" width="425"><param name="movie" value="http://static.slideshare.net/swf/ssplayer2.swf?doc=uscfaculty7-1209046555308190-8"/><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slideshare.net/swf/ssplayer2.swf?doc=uscfaculty7-1209046555308190-8" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object></p>
<div style="font-size: 11px; font-family: tahoma,arial; height: 26px; padding-top: 2px;"><a href="http://www.slideshare.net/?src=embed"><img alt="SlideShare" style="border: 0px none ; margin-bottom: -5px;" src="http://static.slideshare.net/swf/logo_embd.png" /></a> | <a title="View this slideshow on SlideShare" href="http://www.slideshare.net/CoryOndrejka/usc-faculty-seminar-42208">View</a> | <a href="http://www.slideshare.net/upload">Upload your own</a></div>
</div>
<p>What he seems to be describing is apparently not too far from what I&#8217;ve been <a href="http://www.realityprime.com/news/enter-the-vr-contact-lens" target="_blank">writing</a> <a href="http://www.realityprime.com/articles/googles-virtual-world-redux" target="_blank">about</a> <a href="http://www.realityprime.com/articles/the-augmented-world" target="_blank">for a while</a>. The part I&#8217;m still skeptical about is the life-logging, and probably because of my own preference for privacy. You&#8217;ll notice I don&#8217;t twitter. I have a hard time believing anyone would even care to follow what I do from moment to moment. And I think <em>careful editing</em> is the secret to any compelling narrative. I just don&#8217;t want to put gigabytes of sub-standard, often mundane, prose out there into the digital firmament.</p>
<p>But putting that aside, the germ (and/or gem) of what he&#8217;s saying, and the part I totally agree with, is this notion of a pervasive synthesis of augmented, mirror, and alternate realities &#8212; no need to distinguish between those arbitrary categories. Turns out, there&#8217;s an old word for this which I think we can now safely revive to summarize the intent:</p>
<p align="center"><strong>magic<br />
</strong></p>
<p><span id="more-223"></span>Practical Magic (not the <a href="http://practicalmagic.warnerbros.com/" target="_blank">movie</a> that has usurped the term) is ultimately what&#8217;s driving the vast majority of virtual worlds interest. When I enter a virtual world, I can suddenly do and be things I wouldn&#8217;t even try in real life. If that kind of magic invades the real world (and it already has, as I&#8217;ll get to in a moment), I get the best of both worlds. I don&#8217;t need to &quot;log on&quot; or even &quot;go anywhere,&quot; but I can now do things that would have been inconceivable to people just 100 years ago, at least outside of science fiction or fantasy genres.</p>
<p>What would I do? Well, for one thing, in an augmented/mirror world, I can gain a degree of omniscience, knowing anything that has been modeled with meta-data. I can point to objects and see their history, their meaning, their &#8216;DNA.&#8217; But why limit ourselves to overlaying information onto boring real world objects? When important objects exist only as bits of data, piped to my brain via (pick: <a target="_blank" href="http://www.realityprime.com/articles/the-future-of-display">desk/wall-mounted</a>, <a target="_blank" href="http://www.realityprime.com/articles/toshibas-head-mounted-washing-machine">head-mounted</a>, <a target="_blank" href="http://www.realityprime.com/news/enter-the-vr-contact-lens">eye-mounted</a>, or <a target="_blank" href="http://www.realityprime.com/articles/the-augmented-world">cortical interface</a>), I can now manipulate those in any way I choose, as if by magic. I can morph them according to my needs or whims. I can <a target="_blank" href="http://www.realityprime.com/articles/second-life-and-the-post-scarcity-world">clone</a> them. I can make them do seemingly magical things &#8212; fly, dance, interact with real and virtual objects, do work for me, entertain me, and so on.</p>
<p>Is this sort of magic really so new? No. Remember the telephone? It&#8217;s a device for putting people, who may be on the other side of the world, right beside your ear. That&#8217;s pretty damn magical, considering the ordinary limitations of time, space, and airport security. Cut the wires and walk around while doing so and it&#8217;s even more magical. Television would be even more magical still, if only the content weren&#8217;t designed to turn your brain to consumer mush (quality of TV is indeed in the eye of the beholder, and by that I mean the <a target="_blank" href="http://en.wikipedia.org/wiki/Beholder">D&amp;D monster</a> that can paralyze you with a glance). All in all, we have been increasingly bypassing limits of the real world through communication technology.</p>
<p>What this all comes down to is not just the future of virtual worlds, but the future of <a target="_blank" href="http://www.realityprime.com/articles/web-3d-part-5">communication</a> itself, of which pure virtual worlds are only the most natural embodiment. Virtual Worlds are not a <strong>place</strong>, it turns out. They&#8217;re not a novel <strong>software application</strong> either. They are a key component of human-to-human communication, as old as humanity itself. Virtual worlds are what sit in our brains to reflect (i.e., model) the world around us. They are what sit in the bits in a computer&#8217;s memory to do much the same, but without as much intelligence or understanding. And they are what &quot;virtually&quot; sit between us when we try to communicate what&#8217;s in our heads &#8212; they are <em>concepts in context, encapsulated in content</em>.</p>
<p>The <em>real world &#8212; </em>physics, biology, even time &#8212; has very little to do with it, except as it serves as the <em>funnel </em>through which we must inevitably pass such information in order to communicate (see &quot;limits&quot; above). Virtual worlds widen that funnel to a full-on wind tunnel. They permit all sorts of sensory, conceptual, and sometimes even factual information to pass between us more easily. And what the future holds for that goes way beyond the &quot;metaversal,&quot; almost cartoony manifestations we see today. Virtual worlds have the potential to literally open my mind to yours, as directly as possible, to allow the free flow of information back and forth: ideas and expressions moving between people in a massive increase of both bandwidth <em>and </em>understanding. That&#8217;s what we&#8217;re dealing with here, and nothing less.</p>
<p>So let&#8217;s get back to what it will look like. The easiest thing to imagine is the virtual meeting &#8212; that&#8217;s being built already. Seven people spread across seven cities can meet around a common table. Now, forget the large projection screens and make it a cafe table in Paris, or an office anywhere. The easiest way to make that work is to give everyone their own virtual view overlaid on the real world. It&#8217;s not the only way. But it&#8217;s certainly the most compatible with our current business models. [If that cafe has to shell out $30k (even US dollars) for a VR-enabled table+room, it&#8217;s much less likely to happen.]</p>
<p>So now we walk around our daily fog with displays that let us see things that aren&#8217;t really there. What then? Well, we&#8217;ll need to interact with them. 3D cameras are becoming good enough that we can skip the data gloves of my youth. They don&#8217;t give much force feedback, but there are better ways of doing both on the horizon &#8212; if we can help <a target="_blank" href="http://www.msnbc.msn.com/id/11103352/">paralyzed people walk via electrical stimulation</a>, we will eventually be able to simulate sensation and force feedback as neural impulses with no mechanical linkage required. So-called mind-reading rigs may also do a better job deciphering our motions with zero latency and even some predictive abilities to infer intent.</p>
<p>What does it mean, then, if I can not only twitter my friends constantly, or in turn be-twitted, but I can now see my friends around me all day long, as virtual &quot;ghosts&quot; not quite haunting my active life? What happens when I&#8217;m sitting at work and my mother (who is in Florida in RL) strolls into my office to talk to me about a recipe for chicken soup? Or better yet, my son plays with his friends in an entirely virtual game of cops and robbers, running through our real neighborhood, shooting at imaginary (to everyone else) targets?</p>
<p>There&#8217;s not much left to hold this back except the display and sensory technology, and that&#8217;s almost ready. Laser retinal scanners will be mass-marketed in the next 2-3 years, in the way Bluetooth headsets are becoming ubiquitous today. Give it 3-5 years to solve capturing your facial expressions from such a headset, just to be safe.</p>
<p>Rendering, by itself, will be more than good enough in 3 years. For an ever-diminishing price, we will have completely photo-realistic views of objects in real-time, overlaid and indistinguishable from the material ones. And within 5 years, even virtual humans will cross the uncanny valley and come up the other side &#8212; <strike>especially</strike> at least when those virtual humans are driven by real ones at the other end, meaning that teleconferencing can finally do away with video feeds and go 100% virtual in the next 5 years. That alone frees up a few degrees of freedom in terms of interaction.</p>
<p>So that&#8217;s most of what Cory was referring to, I imagine. I don&#8217;t even think it&#8217;ll take as long as he does, except to find the compelling applications. And the part that obviously interests me the most is the issue of communication &#8212; inventing new ways to share ideas that transcend the limits of the real world we&#8217;ve come to know and overcome.</p>
<p>Now, back to life-logging. In all this, the one element that should be clear is how much <em>more control </em>I can exert over my environment, real and virtual. It&#8217;s all about adding magic and power over your world. So I still am not convinced I will ever choose to give up so much power by giving out the stream of information (most of it useless) generated by my daily activities, where I look, who I see, etc. Using that in a sort of &#8216;local augmented memory&#8217; is fine. But sending it up to Google, or Lifepress (a fictional WordPress for lifelogging), etc., is where I&#8217;d draw the line. (Note: we already give away a lot of info for free in our daily credit transactions, though it is slightly less revealing.)</p>
<p>In fact, I&#8217;d even be concerned about simply outsourcing any visual enhancement of my perceptual space to some seemingly benevolent company. What happens, for example, when that company modifies what I see to subtly persuade me to act a different way? (buy something, vote some way, etc.)&nbsp;</p>
<p>Well, I can certainly imagine a few useful tools that would at least offer to give me more power (more magic) through recording everything I see or do. But those recordings would inevitably get out or be used by someone to mine me for profit. And so I&#8217;m still waiting to see the compelling case for such a potential power loss. It may yet happen, but on this point, I remain to be convinced.</p>
<p>What do you think?</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.realityprime.com/articles/the-future-of-virtual-worlds/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Google Earth, for the Human Body</title>
		<link>http://www.realityprime.com/articles/google-earth-for-the-human-body</link>
		<comments>http://www.realityprime.com/articles/google-earth-for-the-human-body#comments</comments>
		<pubDate>Sun, 30 Sep 2007 18:55:50 +0000</pubDate>
		<dc:creator>avi</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>

		<guid isPermaLink="false">http://www.realityprime.com/articles/google-earth-for-the-human-body</guid>
		<description><![CDATA[Researchers at IBM Zurich Research Laboratory claim a novel approach to accessing patient medical records &#8212; using the human body as the 3D framework in the same way that Google Earth uses the Earth as a framework to fuse and navigate geospatial information. Spin the body, click on a body part, and zoom in closer [...]]]></description>
			<content:encoded><![CDATA[<p><a target="_blank" href="http://www.zurich.ibm.com/images/news/asme/Asme1_300_RGB.jpg"><img align="right" alt="" src="http://www.zurich.ibm.com/images/news/asme/Asme1_300_RGB_tn.jpg" /></a>Researchers at <a target="_blank" href="http://www.zurich.ibm.com/news/07/asme.html">IBM Zurich Research Laboratory</a> claim a novel approach to accessing patient medical records &#8212; using the human body as the 3D framework in the same way that Google Earth uses the Earth as a framework to fuse and navigate geospatial information. Spin the body, click on a body part, and zoom in closer to get more information.</p>
<p><span id="more-202"></span></p>
<p>&nbsp;</p>
<hr width="100%" size="2" />
<p>Sections:</p>
<ol>
<li><strong>The Big Picture</strong></li>
<li><strong><a href="#Open">Towards an Open Future</a></strong></li>
<li><strong><a href="#Conclusion">Conclusion</a></strong></li>
</ol>
<hr width="100%" size="2" />
<p>
<strong>The Big Picture</strong></p>
<p>Just to be clear, I&#8217;m going to move the discussion beyond the IBM system above to talk about the &quot;big picture&quot; of what&#8217;s coming in the future. Despite the press release, the IBM system is not quite &quot;Google Earth&quot; just yet, because GE is much more than spinning a globe with URLs to click &#8212; it&#8217;s really a framework for fusing just about any geospatial data together, especially user-generated content, as well as a 3D search engine for the same. But building a true &quot;Google Earth&quot; for the human body turns out to be a challenge many times harder than building the actual Google Earth itself (which was not easy either). Any naive implementation is doomed to be a flash in the pan, a cute but limited toy, if it works at all.</p>
<p>How can I be so bold in my prediction? Because I&#8217;ve been researching this beast for a number of years and I have a pretty good idea of what it will take to build it.</p>
<p>For starters, the human body is volumetric, Google Earth isn&#8217;t &#8212; it&#8217;s only <em>mostly </em>3D &#8212; most things live on the surface of our lumpy oblate spheroid, divided up into essentially 2D zones, with altitude added on top. You can&#8217;t (yet) fly inside the Earth and see the layers: crust, mantle, and so on or add any data there. You can&#8217;t even see a cross-section, though at least that would be easy enough to add with some graphics engine tricks.</p>
<p>But the biggest problem, beyond dimensionality, scalability, data storage and streaming, and even beyond 3D navigation metaphors (which are all hard enough on their own) is the one most people don&#8217;t even think about until they sit down and try to actually build something like this &#8212; <strong>topology, geography, cartography</strong> &#8212; coming up with the equivalent of <em>latitude </em>and <em>longitude or even X,Y,Z, </em>for the human body. Oh, that.</p>
<p>Think about it. Cartography in the geographic context has been developing for many centuries, first in a crude approximation (&quot;here, there be monsters&#8230;&quot;) and growing closer and closer to an accurate representation of the real world (&quot;here, there be McDonalds&#8230;&quot;).</p>
<p>Cartography has had time to mature, to work out solutions for problems such as: <em>how do we best project our lumpy oblate spheroid onto a 2D piece of paper to most accurately convey relative size and distance?</em> And, it turns out, there are a dozen or so significant coordinate systems to answer that very question, for various purposes and to varying degrees of success. And now 3D virtual globes bring us back to a more spherical real-time reality and the field evolves&nbsp; (some would say, revolutionizes) yet again.</p>
<p>But &quot;human body mapping&quot; has a technically much harder problem to solve: there is only one Earth, but there isn&#8217;t even<em> one official human body to refer to&nbsp; </em>&#8211; more like <em>seven billion </em>unique ones, and not just superficially, in terms of height, weight, or sex&#8230; The location, orientation, shape and size of body parts, organs, even blood vessels, can vary even within a single person over time, as when we move or sit, are injured or sick, pregnant or tense, and especially as we age. The biospatial mapping problem is effectively <strong>four dimensional</strong>, not three.</p>
<p>The general solution medicine has come up with to deal with that dynamism and individuality is an overly crude and ambiguous mapping system using words like <em>anterior </em>and <em>posterior, medial </em>and <em>lateral </em>with simple counting systems (C1, D6), paired with names that are the stuff of nightmares for first year med students, patients, and software engineers alike. Anatomy textbooks typically resort to artistically rendered, idealized drawings and a few sample photos to help teach what goes where. And then students spend time with actual cadavers getting their gloves dirty to really understand.</p>
<p>Here&#8217;s an analogy to help you understand how coarse and confusing the current mapping system can be. It would be like trying to find a specific apartment at the <em>plexus </em>of the <em>posteromedial canal of the Islets of </em><font size="-1"><em>Manna-hata</em> and the </font><em>posterior broad main artery of New Amsterdam. </em>Sure, you could show up at the corner of Broadway and Canal St. in Manhattan, given that description&#8230; and a Latin-to-English dictionary&#8230; and perhaps a history book&#8230; But without an actual street address and apartment number &#8212; or the equivalent of latitude, longitude, and altitude &#8212; you&#8217;d still have to look around for a while until you found it, and then you&#8217;d have to remember its location for next time.</p>
<p>However, that and a magic marker are how many surgeons specify where they&#8217;ll place incisions on your body after looking at a few 2D x-rays and doing some hefty mental gymnastics. That&#8217;s one reason surgeons get paid so much, and also why there are as many dumb mistakes as there are (which goes back to cost). Just as a map of the coast could save your royal armada from doom, a correct digital dynamic map of the human body would save many lives and who knows how much money.</p>
<p>And the problem is inordinately harder when you talk about the human nervous system. Compared to the brain, most of the body is relatively uniform from person to person, even accounting for size &#8212; and simple too. The cortical folds, or convolutions (known as <em>gyri </em>and <em>sulci</em>), are as unique as any fingerprint and apparently their topology is functionally significant as well. And if there&#8217;s any place where precision <em>really </em>matters, it&#8217;s in the brain. We don&#8217;t want neurosurgeons stomping around our brains like conquistadors, exploring if you will, to determine if they&#8217;re planting the national flag in the correct tissue.</p>
<p><strong><a name="Open"></a>Towards the Open Future<br />
</strong></p>
<p>So researchers have been trying to come up with ways to map the brain that can apply to any number of people, despite our many differences and the complexity of the information. It&#8217;s an active area of research, with the ultimate goal of solving that important problem for the whole body too, from birth to death and moment to moment. It can and will be done &#8212; and hopefully, in an <strong>open </strong>way, especially if the goal is to unify doctor-patient communication, medical teaching curricula, and even scientific discourse under one framework and with one free-to-use application.</p>
<p>Now, that won&#8217;t stop a few eager <a target="_blank" href="http://www.ehuman.com/">startups</a> from offering up solutions under the dream of being &quot;The Next Google Earth,&quot; hoping to be snatched up by Google or go public. That can&#8217;t be avoided, I suppose. But when we start talking about mapping your MRIs, CAT scans, x-rays, surgeries, dental work, and so on, managing all of your health information in a way that works from person to person and time to time, we really need to do it right from the get-go. Lives are on the line, and trillions of dollars of vested interest are looking on with intense interest. The problem is just too big, and too complex for any single entity to solve or especially <em>own</em>, even Google, Microsoft and IBM &#8212; and they know it, or at least they say they do. Time will tell.</p>
<p>Remember, Keyhole and later Google only patented certain key technical features of how it works, not the fundamental &quot;how to map the Earth&quot; solution. And Google&#8217;s goal is now to make KML an open standard, which is the right approach to building such a comprehensive framework to unify all geospatial data. The same kind of approach would need to apply to biospatial data, with even more work on filtering and privacy controls for the personal specifics.</p>
<p>But when the time comes that researchers and hobbyists alike find easy access to that fully-annotatable Google-Earth-like application for the human body, on which they can post <strong>HML </strong>(&quot;human markup language,&quot; which has so far been focused on superficial, external, or expressed traits) or find answers, you&#8217;ll really have something special. You&#8217;ll see the same kind of emergence and new market potential we see with Google Earth and the like.</p>
<p>So, for example&#8230; The software might show you and your doctor your virtual body, with your own medical history of course, but now with all sorts of added information re-mapped from external data sources to match your personal details. Click on your liver, and you&#8217;ll bring up expert systems to help diagnose concerns, or reveal the latest experimental treatments, drug suggestions, or links to Alcoholics Anonymous. Medical students would have access to the standard course-work as well as the latest research, contradicting the 30-year old information that&#8217;s still being taught (and occasionally killing people) around the world.</p>
<p>I know several of the leading academic researchers who are working on the biospatial problem (outside of Google, of course) &#8212; especially in the brain &#8212; and it&#8217;s been a keen area of interest for me and my wife (who is a neuroscientist and neuroanatomy lecturer, as it turns out). But it will take time and patience and hard work to see the results from the various teams and companies and work through the inevitable phases of secrecy and hype to get to the other side.</p>
<p><strong><a name="Conclusion"></a>Conclusion</strong></p>
<p>While it&#8217;s powerful and timely, the IBM Zurich project is not exactly a new idea. Their claim is somewhat of a stretch, as is the comparison to Google Earth just yet, though their progress thus far is still impressive. In reality, some of the same people who were thinking ahead about using the Earth as the most intuitive interface for geospatial data ten, fifteen years ago were well aware of (and even doing research on) how the human body would also make the ideal interface to things like&#8230; say, patient medical records&#8230;</p>
<p>And that was before the general public started seeing the benefits of a 3D spatial browser like Google Earth. But what&#8217;s described from IBM research is only a sliver of the potential win here. I would consider this demo to be the &quot;low hanging fruit&quot; compared to what&#8217;s possible and coming in the next 3-5 years&#8230;</p>
<p>Keep in mind, when I wrote the old code to draw names on curving roads and near placemarks for the very first version of Keyhole&#8217;s software, I could read a few basic textbooks on cartography and apply my engineering skills to make it work in a dynamic, real-time context. When I wrote the early precursor to KML (for adding UI elements and new features mostly), I chose to make up a simple lexical grammar because XML was barely known and not yet standard. In the case of mapping the human body, we&#8217;re still in 1492, very little is settled, and it&#8217;s still a whole new world we&#8217;re trying to conquer and understand.</p>
<p>_________</p>
<p><sub>P.S. apologies to Native-Americans for being on the metaphorical butt-end of my Columbus references.</sub></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.realityprime.com/articles/google-earth-for-the-human-body/feed</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>The [Predicted] Future of Google&#8217;s Street View</title>
		<link>http://www.realityprime.com/articles/the-predicted-future-of-googles-street-view</link>
		<comments>http://www.realityprime.com/articles/the-predicted-future-of-googles-street-view#comments</comments>
		<pubDate>Mon, 13 Aug 2007 13:32:01 +0000</pubDate>
		<dc:creator>avi</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>

		<guid isPermaLink="false">http://www.realityprime.com/articles/the-predicted-future-of-googles-street-view</guid>
		<description><![CDATA[In guessing what Google and the marketplace have in store for us, I&#8217;m taking into account both what&#8217;s technologically feasible, now and on the horizon, and what I think people will demand.
I blogged a while back about virtualization and privacy issues for both Street View and Maps/Earth, which, for starters, implies to a [predicted] future [...]]]></description>
			<content:encoded><![CDATA[<p>In guessing what Google and the marketplace have in store for us, I&#8217;m taking into account both what&#8217;s technologically feasible, now and on the horizon, and what I think people will demand.</p>
<p>I <a target="_blank" href="http://www.brownianemotion.org/2007/06/08/g-privacy/">blogged</a> a while back about virtualization and privacy issues for both Street View and Maps/Earth, which, for starters, implies a [predicted] future version of Street View that erases people and even cars from the imagery you see. So let&#8217;s start there.</p>
<p><span id="more-190"></span>The main reason for removing people and cars is the same reason you&#8217;d want the base Google Earth imagery to lack clouds:</p>
<ul>
<li>These things tend to block your view. You can&#8217;t really look behind them in a simple (essentially 2D) panoramic image.</li>
<li>They only represent one snapshot in time vs. a broader/more virtualized essence of the place.</li>
<li>They make it confusing to add dynamic versions of the same things on top of what&#8217;s permanently baked into the imagery. You can already see this with 3D buildings on top of 2D ones. (similarly, adding dynamic lighting and shadows is also hard when there are already shadows in the imagery)</li>
<li>And in the case of people, we tend to not like being included in commercial imagery without our permission &#8212; perhaps cars and clouds feel the same way, but they don&#8217;t represent any known marketing demographic, nor do they sue.</li>
</ul>
<p>Now, that doesn&#8217;t mean there shouldn&#8217;t ever be people or cars or clouds in Google&#8217;s version(s) of the Earth. It just means those should only be included for a much better purpose than &quot;they just happened to get caught in the camera&#8217;s lens&quot; and they should be first removed, and then re-added as needed.</p>
<p>For example, dynamic weather overlays make a lot of sense, once you remove the baked-in clouds. Moving 3D avatars also make sense in an opt-in application, e.g., in some 3D augmented-reality social network. And why not represent the real-time traffic layer with some corresponding density of cars?</p>
<p>All of those issues are solvable, if you first remove the items from the imagery and then add them back. The simplest method of removal is <em>oversampling. </em>Imagine taking multiple photos of the same place over a number of days. Most of the time, pixel for pixel, transient objects will show up in only one of the images, so you can basically vote on each pixel &#8212; majority wins. But this requires multiple passes over each city, when Google is presently racing to get coverage everywhere, and there are alignment issues, so give it some time (and perhaps some public pressure).</p>
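<p>The per-pixel voting idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the photos are already perfectly aligned, they are grayscale, and a per-pixel median stands in for the &quot;majority wins&quot; vote. The name <code>remove_transients</code> and the sample data are made up for the example and are not taken from anything Google has described.</p>

```python
from statistics import median

def remove_transients(photos):
    """Per-pixel median across aligned grayscale photos (each a list of rows).

    Assumption: every photo has the same size and is already aligned.
    The median is used here as a simple stand-in for a majority vote.
    """
    height = len(photos[0])
    width = len(photos[0][0])
    return [
        [median(photo[y][x] for photo in photos) for x in range(width)]
        for y in range(height)
    ]

# Three tiny 2x3 "photos" of the same street taken on different days.
# The value 255 marks a passing car that shows up in only one photo per pixel.
day1 = [[10, 10, 10], [20, 255, 20]]
day2 = [[10, 255, 10], [20, 20, 20]]
day3 = [[10, 10, 10], [20, 20, 255]]

clean = remove_transients([day1, day2, day3])
print(clean)
```

<p>With only three passes, any pixel covered by a transient object in two of them would still survive the vote, which is one reason more passes (and good alignment) matter.</p>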
<p>One obvious thing that Google can do right away is better integrate Street View with its generally-strong (though not always right) driving directions &#8212; now with impressive &quot;route-dragging.&quot; It turns out, there&#8217;s a very compelling reason for integrating Street View with driving directions: <em>most people can&#8217;t read maps.</em></p>
<p>Basically, there&#8217;s two kinds of people in the world &#8212; those who maintain re-orientable 2D/3D mental maps of the world around them, and those who navigate mainly by visual cues, landmarks, i.e., what they can see from their direct 1st-person perspective.</p>
<p>The people who can re-orient their mental maps have what we call high spatial cognition. The vast majority of the human race, however, has just enough spatial cognition to reach for a bag of potato chips and avoid walking into walls. That doesn&#8217;t mean those people are dumb &#8212; until recently, high spatial cognition was generally only useful for throwing spears. There are many other kinds of intelligence. And those with high spatial intelligence are simply more cut out for jobs in 3D graphics, architecture, industrial design, and perhaps billiards. People who <a target="_blank" href="http://www.techcrunch.com/2007/06/14/the-3d-realvirtual-world-hybrid-how-far-away/">can&#8217;t tell the difference</a> between panoramic 2D images and actual 3D graphics might be really good at business or marketing, for example, but still get lost in a cul-de-sac.</p>
<p>The point is that driving directions, for the vast majority of the world, would do much better with a series of 1st person images or video that shows you where to get off the highway with an actual image of the exit sign and off-ramp, saying, &quot;turn right here&quot; in the same way that one might say to a driver, &quot;turn where that blue car just went.&quot; That, and as I discovered this weekend, the Home Depot on Route 17 is not quite where Google Maps says it is. Having images of my route would be a nice &quot;sanity check.&quot;</p>
<p>Frankly, I&#8217;d expect this feature way before true virtualization as described above. But the closest they come to it right now is that you can use Street View when interactively examining your driving directions &#8212; drag the little yellow AOL-like guy around to see just one Street View image at a time. Perhaps they&#8217;ll soon offer to replace or combine the series of overhead street maps (which they include when you print) with 1st person views of each stage of navigation along the route.</p>
<p>What might come after that? The main thing to do, I&#8217;m guessing, is integrate the vast quantities of Street View data with Google Earth in a more 3D fashion, solving these 2D panoramas for depth and turning everything into textures and polygons.</p>
<p>The ultimate goal would be a view of the Earth not only with basic buildings, but with well detailed street-level views so you could zoom down and even [virtually] walk around. Go one step further, and Google could take their $10 per store photography project and recreate virtual interiors of semi-public places, perhaps with links or embedded shopping when you go inside.</p>
<p><em>That&#8217;s</em> when I think we might start to see avatars in Google Earth &#8212; when there&#8217;s some actual reason for people to have a sense of their own body and of other people in the virtual world, not that a Second Life / GE combination is likely, for reasons I&#8217;ve <a href="http://www.realityprime.com/articles/second-earth" target="_blank">already</a> outlined.</p>
<p>What else might there be? Well, integrating live street cameras would be a neat trick. All it really takes is knowing the position and orientation of the camera and coming up with a flash video player that can project 2D movies on an arbitrary 3D polygon, rather than always in the plane of your screen.&nbsp;It&#8217;s even simpler if the video is in a separate window.</p>
<p>Having the density of image capture points (aka &quot;nodes&quot;) be high enough to resemble moving video is also on the horizon. The cameras Google is reportedly driving around with are capable of capturing and processing 30 FPS and 360-degrees [edit: see the first comment below for a cool example of <strong>panoramic video</strong>]. The only real limitations are server-side storage for the higher density of nodes (not a big deal for Google) and bandwidth to your browser. Caveat: virtualizing the scene to a real-time 3D model is still preferable to streaming video because we don&#8217;t always want to move in the exact path (and direction) that Google took. Just imagine trying to drive down a street backwards &#8212; with Street View Video, time would seem to run backwards.</p>
<p>Clickable store-front information is an obvious must-have feature &#8212; show menus for restaurants, and so on. And like Google Earth, we&#8217;ll want to turn on information layers beyond the mere street names. Given the vast store of information already in Google Earth, adding this to Street View shouldn&#8217;t be too hard &#8212; even rendering KML should be doable, given enough time and energy.</p>
<p>The only really hard problem is the one I mentioned earlier &#8212; turning essentially 2D panoramic Street View images into 3D models. The state of the art still requires some manual labor, which kind of hampers any planet-scale effort. But new camera technology for recording distance per pixel could soon save the day, as well as an interesting <a href="http://www.realityprime.com/articles/more-on-google-acquires-imageamerica" target="_blank">recent acquisition</a>.</p>
<p>At that point, and as browsers are more and more capable of 3D, I expect to see Google Street View, Maps, and Earth become one truly integrated product, offering different perspectives on the same digital earth.</p>
<p>As always, I&#8217;m merely an observer. Google doesn&#8217;t tell me their plans. But let&#8217;s check back in a year or so and see how much of this comes to pass.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.realityprime.com/articles/the-predicted-future-of-googles-street-view/feed</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>How Google Earth [Really] Works</title>
		<link>http://www.realityprime.com/articles/how-google-earth-really-works</link>
		<comments>http://www.realityprime.com/articles/how-google-earth-really-works#comments</comments>
		<pubDate>Tue, 03 Jul 2007 18:01:32 +0000</pubDate>
		<dc:creator>avi</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>

		<guid isPermaLink="false">http://www.realityprime.com/articles/how-google-earth-really-works</guid>
		<description><![CDATA[Introduction
After reading an article called &#34;How Google Earth Works&#34; on the great site HowStuffWorks.com, it became apparent that the article was more of a &#34;how cool it is&#34; and &#34;here&#8217;s how to use it&#34; than a &#34;how Google Earth [really] works.&#34;
So I thought there might be some interest, and despite some valid intellectual property concerns, [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>After reading an article called <a target="_blank" href="http://travel.howstuffworks.com/google-earth.htm">&quot;How Google Earth Works&quot;</a> on the great site <a target="_blank" href="http://www.howstuffworks.com/">HowStuffWorks.com</a>, it became apparent that the article was more of a &quot;how cool it is&quot; and &quot;here&#8217;s how to use it&quot; than a &quot;how Google Earth [really] works.&quot;</p>
<p>So I thought there might be some interest, and despite some valid intellectual property concerns, here we are, explaining how at least part of Google Earth works.<br />
<span id="more-151"></span><br />
Keep in mind, those IP issues are real. Keyhole (now known as Google Earth) was attacked once already with claims that they copied someone else&#8217;s inferior (IMO) technology. The suit was completely <a href="http://www.brownianemotion.org/2007/03/08/skyline-patent-infringement-suit-agains-google-earth-dismissed/">dismissed</a> by a judge, but only after many years of pain. Still, it highlights one problem of even talking about this stuff. Anything one says could be fodder for some troll to claim <em>he</em><strong> </strong>invented what you did because it &quot;sounds similar.&quot; The judge in the <em>Skyline v. Google</em> case understood that &quot;sounding similar&quot; is not enough to prove infringement. Not all judges do.</p>
<p>Anyway, the solution to discussing &quot;How Google Earth [Really] Works&quot; is to stick to information that has already been disclosed in various forms, especially in Google&#8217;s own patents, of which there are relatively few. Fewer software patents is better for the world. But in this case, more patents would mean we could talk more openly about the technology, which, btw, was one of the original goals of patents &#8212; a trade of limited monopoly rights in exchange for a real public benefit: <em>disclosure. </em>But I digress&#8230;</p>
<p>For the more technically inclined, you may want to read these patents directly. Be warned: lawyers and technologists sometimes emulsify to form a sort of linguistic mayonnaise, a soul-deadening substance known as Patent English, or <em>Painglish </em>for short. If you&#8217;re brave, or masochistic, here you go:</p>
<p>1. <a href="http://www.google.com/patents?id=J4YOAAAAEBAJ&amp;dq=6618053">Asynchronous Multilevel Texture Pipeline </a><br />
2. <a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&amp;Sect2=HITOFF&amp;d=PALL&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&amp;r=1&amp;f=G&amp;l=50&amp;s1=7,225,207.PN.&amp;OS=PN/7,225,207&amp;RS=PN/7,225,207">Server for geospatially organized flat file data</a></p>
<p>There are also a few more loosely related Google patents. I don&#8217;t know why these are shouting, but perhaps because they&#8217;re very important to the field. I&#8217;ll hopefully get to these in more detail in future articles:</p>
<p>3. <a href="http://www.google.com/patents?id=Y5h-AAAAEBAJ&amp;dq=inassignee:google&amp;as_drrb_ap=q&amp;as_minm_ap=1&amp;as_miny_ap=2007&amp;as_maxm_ap=1&amp;as_maxy_ap=2007&amp;as_drrb_is=q&amp;as_minm_is=1&amp;as_miny_is=2007&amp;as_maxm_is=1&amp;as_maxy_is=2007">DIGITAL MAPPING SYSTEM</a><br />
4. <a href="http://www.wipo.int/pctdb/en/fetch.jsp?FORM=SEP-0%2FHITNUM%2CB-ENG%2CDP%2CMC%2CPA%2CABSUM-ENG&amp;LANG=ENG&amp;C=10&amp;DBSELECT=PCT&amp;SORT=1183873-KEY&amp;IDB=0&amp;IDOC=1171142&amp;QUERY=%28FP%2Fmap%29+AND+%28PA%2FGoogle%29+&amp;TOTAL=12&amp;SERVER_TYPE=19&amp;LANGUAGE=ENG&amp;IA=US2005009538&amp;TYPE_FIELD=256&amp;DISP=25&amp;START=1&amp;SEARCH_IA=US2005009538&amp;RESULT=11&amp;DISPLAY=DESC">GENERATING AND SERVING TILES IN A DIGITAL MAPPING SYSTEM</a><br />
5. <a href="http://www.wipo.int/pctdb/en/fetch.jsp?DISP=25&amp;IDB=0&amp;SORT=1188101-KEY&amp;LANG=ENG&amp;LANGUAGE=ENG&amp;SERVER_TYPE=19&amp;FORM=SEP-0%2FHITNUM%2CB-ENG%2CDP%2CMC%2CPA%2CABSUM-ENG&amp;IA=US2006046782&amp;TOTAL=261&amp;C=10&amp;SEARCH_IA=US2006046782&amp;START=1&amp;QUERY=%28PA%2Fgoogle%29+&amp;DBSELECT=PCT&amp;TYPE_FIELD=256&amp;RESULT=8&amp;IDOC=1386818&amp;DISPLAY=DESC">DETERMINING ADVERTISEMENTS USING USER INTEREST INFORMATION AND MAP-BASED LOCATION INFORMATION</a> <br />
6. <a target="_blank" href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&amp;r=1&amp;p=1&amp;f=G&amp;l=50&amp;d=PG01&amp;S1=20070143345.PGNR.&amp;OS=dn/20070143345&amp;RS=DN/20070143345">ENTITY DISPLAY PRIORITY IN A DISTRIBUTED GEOGRAPHIC INFORMATION SYSTEM</a> (this one will be huge)</p>
<p>And there is this <a target="_blank" href="http://www.cs.virginia.edu/~gfx/Courses/2002/BigData/papers/Texturing/Clipmap.pdf">more informative technical paper from SGI</a> (PDF) on hardware &quot;clipmapping,&quot; which we&#8217;ll refer to later on. Michael Jones, btw, is one of the driving forces behind Google Earth, and as CTO, is still advancing the technology.</p>
<p>I&#8217;m going to stick closely to what&#8217;s been disclosed or is otherwise common technical knowledge. But I will hopefully explain it in a way that most humans can understand and maybe even appreciate. At least that&#8217;s my goal. You can let me know.</p>
<p><strong>Big Caveat: </strong>the Google Earth code base has probably been rewritten several times since I was involved with Keyhole&nbsp; and perhaps even after these patents were submitted. Suffice it to say, the latest implementations may have changed significantly. And even my explanations are going to be so broad (and potentially out-dated) that no one should use this article as the basis for anything except intellectual curiosity and understanding.</p>
<p><strong>Also note:</strong> we&#8217;re going to proceed in reverse, strange as it may seem, from the instant the 3D Earth is drawn on your screen, and later trace back to the time the data is served. I believe this will help explain why things are done as they are and why some other approaches don&#8217;t work nearly as well.</p>
<h2>Part 1, The Result: Drawing a 3D Virtual Globe</h2>
<p>There are two principal differences between Google Maps and Earth that inform how things should ideally work under the hood. The first is the difference between fixed-view (often top-down) 2D &amp; free-perspective 3D rendering. The second is between real-time and pre-rendered graphics. These two distinctions are fading away as the products improve and converge. But they highlight important differences, even today.</p>
<p>What both have in common is that they begin with traditional digital photography &#8212; lots of it &#8212; basically one giant high-resolution (or multi-resolution) picture of the Earth. How they differ is largely in how they render that data.</p>
<p><strong>Consider: </strong>The Earth is approximately 40,000 km around the waist. Whoever says it&#8217;s a small world is being cute. If you stored only one pixel of color data for every square kilometer of surface, a whole-earth image (flattened out in, say, a <a target="_blank" href="http://en.wikipedia.org/wiki/Mercator_projection">mercator projection</a>) would be about 40,000 pixels wide and roughly half as tall. That&#8217;s far more than most 3D graphics hardware can handle today. We&#8217;re talking about an image of 800 megapixels and 2.4 gigabytes at least. Many PCs today don&#8217;t even have 2GB of main memory. And in terms of video RAM, needed to render, a typical PC has maybe 128MB, with a high-end gaming rig having upwards of 512.</p>
<p>And remember, this is just your basic run-of-the-mill one-kilometer-per-pixel whole-earth image. The smallest feature you could resolve with such an image is about 2 kilometers wide (thank you, <a target="_blank" href="http://en.wikipedia.org/wiki/Nyquist_frequency">Mr. Nyquist</a>) &#8212; no buildings, rivers, roads, or people would be apparent. But for most major US cities, Google Earth deals in resolutions that can resolve objects as small as half a meter or less, at least four thousand times denser, or <em>sixteen million times more storage </em>than the above example. </p>
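<p>For anyone who wants to check the arithmetic, here is a small sketch. It assumes a simple flat map that is twice as wide as it is tall, 3 bytes per pixel (24-bit RGB, uncompressed), and a ground spacing of 0.25 meters per pixel for the high-resolution case (two samples across a half-meter object, per Nyquist). Those figures are assumptions chosen to match the numbers in the text, not a description of how Google actually stores its imagery.</p>

```python
EARTH_CIRCUMFERENCE_KM = 40_000
BYTES_PER_PIXEL = 3  # assumption: uncompressed 24-bit RGB

def whole_earth_image(metres_per_pixel):
    """Return (width, height, pixel count, bytes) for a 2:1 flat whole-earth image."""
    width = int(EARTH_CIRCUMFERENCE_KM * 1000 / metres_per_pixel)
    height = width // 2
    pixels = width * height
    return width, height, pixels, pixels * BYTES_PER_PIXEL

coarse = whole_earth_image(1000)   # one pixel per kilometre
fine = whole_earth_image(0.25)     # assumed spacing to resolve ~0.5 m objects

linear_ratio = fine[0] // coarse[0]   # how many times denser along one axis
storage_ratio = fine[3] // coarse[3]  # how many times more bytes overall

print("coarse:", coarse)
print("linear density ratio:", linear_ratio)
print("storage ratio:", storage_ratio)
```

<p>The storage ratio is simply the linear ratio squared, because the extra density applies along both axes of the image.</p>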
<p>We&#8217;re talking about images that would (and do) literally take many terabytes to store. There is no way that such a thing could ever be drawn on today&#8217;s PCs, especially not in real-time.</p>
<p>And yet it happens every time you run Google Earth.</p>
<p><strong>Consider: </strong>In a true 3D virtual globe, you can arbitrarily tilt and rotate your view to pretty much look anywhere (except perhaps underground &#8212; and even that&#8217;s possible if we had the data). In all 3D globes, there exists some source data, typically, a really high-resolution image of the whole earth&#8217;s surface, or at least the parts for which the company bought data. That source data needs to be delivered to your monitor, mapped onto some virtual sphere or ideally onto small 3D surfaces (triangles, etc..) that mimic the real terrain, mountains, rivers and so on.</p>
<p>If you, as a software designer, decide not to allow your view of the Earth to ever tilt or rotate, then congrats, you&#8217;ve simplified the engineering problem and can take some time off. But then you don&#8217;t have Google Earth.</p>
<p>Now, various schemes exist to allow one to &quot;roam&quot; part of this ridiculously large texture. Other mapping applications solve this in their own way, and often with significant limitations or visual artifacts. Most of them simply cut their huge Earth up into small regular tiles, perhaps arranged in a <a href="http://en.wikipedia.org/wiki/Quadtree" target="_blank">quadtree</a>, and draw a certain number of those tiles on your screen at any given time, either in 2D (like Google Maps) or in 3D, like Microsoft&#8217;s Virtual Earth apparently does. </p>
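<p>To make the &quot;regular tiles in a quadtree&quot; idea concrete, here is a minimal sketch of how a point on a flat longitude/latitude map could be assigned to a tile at a given depth. The child numbering (0 = SW, 1 = SE, 2 = NW, 3 = NE) and the function name <code>tile_path</code> are invented for this example; real products each use their own addressing schemes, and this is not a description of any of them.</p>

```python
def tile_path(lon, lat, levels):
    """Return the quadtree path (one digit per level) of the tile containing a point.

    Assumptions: a flat map spanning -180..180 degrees of longitude and
    -90..90 degrees of latitude, split into four equal children per level.
    """
    west, east = -180.0, 180.0
    south, north = -90.0, 90.0
    path = ""
    for _ in range(levels):
        mid_lon = (west + east) / 2
        mid_lat = (south + north) / 2
        in_east = lon >= mid_lon
        in_north = lat >= mid_lat
        # Hypothetical child numbering: 0 = SW, 1 = SE, 2 = NW, 3 = NE
        path += str(int(in_east) + 2 * int(in_north))
        # Narrow the bounds to the chosen child and descend one level.
        west, east = (mid_lon, east) if in_east else (west, mid_lon)
        south, north = (mid_lat, north) if in_north else (south, mid_lat)
    return path

# Roughly New York City, three levels deep.
nyc = tile_path(-74.0, 40.7, 3)
print(nyc)
print("tiles at level 3:", 4 ** 3)
```

<p>Each extra level quadruples the number of tiles, so a viewer only ever needs the handful of tiles whose paths match what is currently on screen at the current zoom.</p>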
<p>But the way Google Earth solved the problem was truly novel, and worthy of a software patent (and I am generally opposed to software patents). To explain it, we&#8217;ll have to build up a few core concepts. A background in digital signal theory and computer graphics never hurts, but I hope this will be easy enough that that won&#8217;t be necessary.</p>
<p>I&#8217;m not going to explain how <a target="_blank" href="http://en.wikipedia.org/wiki/3D_rendering">3D rendering</a> works &#8212; that&#8217;s covered elsewhere. But I am going to focus on <a target="_blank" href="http://en.wikipedia.org/wiki/Texture_mapping">texture mapping</a> and <em>texture filtering</em> in particular, because the details are vital to making this work. The progression from basic concepts to the more advanced texture filtering will also help you understand why things work this way, and just how amazing this technology really is. If you have the patience, here&#8217;s a very quick lesson in texture filtering.</p>
<h4>The Basics</h4>
<p>The problem of scaling, rotating and warping basic 2D images was solved a long time ago. The most common solution is called <a target="_blank" href="http://en.wikipedia.org/wiki/Bilinear_filtering">Bilinear Filtering</a>. All that really means is that for each new (rotated, scaled, etc..) pixel you want to compute, you take the four &quot;best&quot; pixels from your source image and blend them together. It&#8217;s &quot;bilinear&quot; because it linearly blends two pixels at a time (along one axis), and then linearly blends those two results (along the other axis) for the final answer. </p>
<p>[A &quot;linear blend,&quot; in case it's not clear, is butt simple: take 40% of color A, and 60% of color B and add them together. The 40/60 split is variable, depending on how &quot;important&quot; each contributor is, as long as the total adds up to 100%.]</p>
<p>That functionality is built into your 3D graphics hardware such that your computer can nowadays do literally billions of these calculations per second. Don&#8217;t ask me why your favorite paint program is so slow.</p>
<p>The problem being addressed can be visualized pretty easily &#8212; that&#8217;s what I love about computer graphics. It turns out, whenever we map some source pixels onto different (rotated, scaled, tilted, etc&#8230;) output pixels, visual information is lost.</p>
<p>The problem is called &quot;aliasing&quot; and it occurs because we digitally sampled the original image one way, at some given frequency (aka resolution), and now we&#8217;re re-sampling that digital data in some other way that doesn&#8217;t quite match up.<br />
<strong><br />
</strong></p>
<table height="328" cellspacing="3" cellpadding="3" width="100%" border="0" align="right" summary="">
<tbody>
<tr>
<td>
<p>&nbsp;</p>
<div align="center"><span class="file-link image"><img vspace="10" border="0" src="../../../../../uploads/color4.thumbnail.png" title="color4.png" alt="color4.png" /></span><br />
            <sub>1. A simple low-res (11&#215;11 pixel) image is about to be rotated. (the grid lines are merely to delineate pixels)</p>
<p>            <span class="file-link image"> 			<img hspace="10" height="75" width="79" vspace="30" border="0" src="../../../../../uploads/colorrotzoom.thumbnail.png" title="colorrotzoom.png" alt="colorrotzoom.png" /></span><br />
            3. Close up of one output pixel. Bilinear interpolation averages the &quot;best&quot; four source pixels for each new destination pixel (shown as black border with white dots) based on their relative importance (ideally: fractional area).</sub><a id="file-link-168" href="../../../../../wp-admin/upload.php?style=inline&amp;tab=browse-all&amp;action=view&amp;ID=168&amp;post_id=151&amp;paged" title="colorrot.png" class="file-link image"><br />
            </a></div>
<p>            <font size="1">            </font></td>
<td width="40%">
<div align="center"><span class="file-link image"><img vspace="10" border="0" src="../../../../../uploads/colorrot.thumbnail.png" title="colorrot.png" alt="colorrot.png" /></span><br />
            <sub>2. Each pixel in the destination grid overlaps multiple pixels from the rotating original.<br />
            </sub></div>
<div align="center"><sub><br />
            <span class="file-link image"><img vspace="10" border="0" src="../../../../../uploads/color-smallrot1.thumbnail.png" title="color-smallrot1.png" alt="color-smallrot1.png" /></span><br />
            4. After bilinear interpolation, the resulting rotated image has some clear (or rather blurry) issues.</sub></div>
</td>
</tr>
</tbody>
</table>
<p><strong><br />
</strong>Now, when we talk about output pixels and destinations, it doesn&#8217;t much matter if the destination is a bitmap in a paint program or the 3D application window that shows the Earth. Aliasing happens whenever the output pixels do not line up with the sampling interval (frequency, resolution) of the source image. And aliasing makes for poor visual results. Dealing with aliasing is about half of what texture mapping is all about. The rest is mostly memory management. And the constraints of both inform how Google Earth works.</p>
<p>The mission then is to minimize aliasing through cleverness and good design. The best way to do this is to get as close as possible to a 1:1 correspondence between input and output pixels, or at least to generate so many extra pixels that we can safely down-sample the output to minimize aliasing (also known as &quot;anti-aliasing&quot;). We often do both.</p>
<p><strong>Consider: </strong>for resizing images, it only gets worse &#8212; each pixel in your destination image might correspond to hundreds of pixels of source imagery, or vice-versa. Bilinear interpolation, remember, will only pick the best four source pixels and ignore the rest. So it can therefore skip right over important pixels, like edges, shadows, or highlights. If some such pixel is picked for blending during one frame and skipped over subsequently, you&#8217;ll get an ugly &quot;pixel-popping&quot; or scintillation effect. I&#8217;m sure you&#8217;ve seen it in some video games. Now you know why.</p>
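<p>To make the &quot;best four source pixels&quot; idea concrete, here is a minimal sketch of bilinear sampling in Python. It assumes a grayscale image stored as a list of rows; the function name and the clamping behaviour at the edges are illustrative choices, not anything taken from a particular graphics API.</p>

```python
import math

def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at fractional coordinates (x, y)."""
    height, width = len(image), len(image[0])
    # Clamp so the 2x2 neighbourhood stays inside the image.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two rows, then blend those results vertically.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

image = [[0, 10],
         [20, 30]]
print(bilinear_sample(image, 0.5, 0.5))
```

<p>Only the four texels around the sample point ever contribute. If one destination pixel really covers hundreds of source pixels, everything outside that 2&#215;2 neighbourhood is simply never read, which is where the popping comes from.</p>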
<p>Tilting images (or any 3D transformation) is even more problematic, because now we have elements of scaling and rotation, but also a great variation in pixel density across rendered surfaces. For example, in the &quot;near&quot; part of a scene, your nice high-res ground image might be scaled up such that the pixels look blurry. In the &quot;far&quot; part of the scene, your image might appear scintillated (as above) because simple 2&#215;2 bilinear interpolation is necessarily skipping important visual details from time to time.</p>
<table cellspacing="3" cellpadding="3" width="200" border="0" align="center" summary="">
<tbody>
<tr>
<td><img border="0" src="../../../../../uploads/blurry3.png" alt="blurry.png" /></p>
<div align="right"><font size="1"><sup>Copyright, Microsoft Virtual Earth</sup></font></div>
<div align="center"><font size="1"><br />
            <sub>Here&#8217;s an example of where a certain kind of texture filtering causes poor results. The text labels are hardly readable (why they&#8217;re painted into the terrain image at all is another issue).</sub> </font></div>
</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<h4>Better Filtering, Revealed</h4>
<p>Most consumer 3D hardware already supports what&#8217;s called &quot;tri-linear&quot; filtering. With tri-linear and a closely coupled technique called <a href="http://en.wikipedia.org/wiki/Mip_map">mip-mapping</a>, the hardware computes and stores a series of lower resolution versions of your source image or texture map. Each mip-map is automatically down-sampled by a factor of 2, repeatedly, until we reach a 1&#215;1 pixel image whose color is the average of all source image pixels. </p>
<p>So, for example, if you provided the hardware with a nice 512&#215;512 source image, it would compute and store 9 extra mip-levels for you (256, 128, 64, 32, 16, 8, 4, 2, and 1 pixel square). If you stacked those vertically, you might more easily visualize the &quot;mip-stack&quot; as an upside down pyramid, where each mip-level (each horizontal slice) is always 1/2 the width of the one above.</p>
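<p>The halving is easy to check with a few lines of Python. This is only a sketch for square power-of-two textures, and the function names are made up for illustration.</p>

```python
def mip_chain(size):
    """Return the widths of all mip levels for a square power-of-two texture."""
    levels = [size]
    while levels[-1] > 1:
        levels.append(levels[-1] // 2)
    return levels

def mip_texels(size):
    """Total number of texels stored across the whole chain."""
    return sum(width * width for width in mip_chain(size))

chain = mip_chain(512)
print(chain)
print(len(chain) - 1, "levels below the base image")
print(mip_texels(512) / (512 * 512))
```

<p>The last line shows that the whole chain costs only about one third more storage than the base image alone, which is why mip-mapping is cheap enough to leave on all the time.</p>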
<table cellspacing="1" cellpadding="1" width="200" border="0" align="left" summary="">
<tbody>
<tr>
<td>
<div align="center">&nbsp;</div>
<div align="center"><a id="file-link-171" href="../../../../../wp-admin/upload.php?style=inline&amp;tab=browse-all&amp;action=view&amp;ID=171&amp;post_id=151&amp;paged" title="pyramid.png" class="file-link image">&nbsp; 			</a><a title="Direct link to file" onclick="return false;" href="../../../../../uploads/pyramid2.png"><img height="66" width="128" border="0" alt="pyramid2.png" src="../../../../../uploads/pyramid2.thumbnail.png" /></a></div>
<div align="center"><sub>             Drawing of a mip-map </sub><br />
            <sub>pyramid (not to scale). </sub></p>
<p>            <sub>The X depicts trilinear filtering sampling two mip-levels for an in-between pixel to reduce aliasing.</sub></div>
</td>
</tr>
</tbody>
</table>
<p>During 3D rendering, mip-mapping and tri-linear filtering take each destination pixel, pick the two most appropriate mip-levels, essentially do a bi-linear blend on both, and then blend those two results again (linearly) for the final tri-linear answer. </p>
<p>So for example, say the next pixel would have no aliasing if only the source image had a resolution of 47.5 pixels across. The system has stored power of two mip maps (16, 32, 64&#8230;). So the hardware will cleverly use the 64&#215;64 and 32&#215;32 pixel versions closest to the desired sampling of 47.5, compute a bilinear (4-sample) result for each, and then take those two results and blend them a third time.</p>
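<p>Here is a rough sketch of that selection step. It assumes the blend weight is the fractional part of the base-2 logarithm of the desired width, which is how level-of-detail is commonly computed; real hardware derives this from screen-space derivatives, so treat the function names and the exact weighting as assumptions.</p>

```python
import math

def trilinear_levels(desired_width):
    """Pick the two power-of-two mip widths bracketing desired_width, plus a blend weight."""
    exact = math.log2(desired_width)
    lower = 2 ** math.floor(exact)   # coarser level, e.g. 32
    upper = 2 ** math.ceil(exact)    # finer level, e.g. 64
    # 0.0 means use only the coarser level, 1.0 only the finer one.
    weight_upper = exact - math.floor(exact)
    return lower, upper, weight_upper

def trilinear_blend(sample_lower, sample_upper, weight_upper):
    """Third linear blend: mix the two bilinear results."""
    return sample_lower * (1 - weight_upper) + sample_upper * weight_upper

lower, upper, weight = trilinear_levels(47.5)
print(lower, upper, round(weight, 2))
```

<p>For 47.5 this picks the 32 and 64 pixel levels and weights the finer one slightly more heavily. The two inputs to <code>trilinear_blend</code> would each come from a bilinear lookup in the corresponding level.</p>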
<p>That&#8217;s tri-linear filtering in a nutshell, and along with mip-mapping, it goes a great distance to minimizing aliasing for many common cases of 3D transformations.</p>
<p><strong>Remember: </strong>so far, we&#8217;ve been talking about nice, small images, like 512&#215;512 pixels. Our whole-earth image will need to be millions of pixels across. So one might consider making a giant mip-map of our whole-earth image, at say one meter resolution. No problem, right? But you&#8217;ll realize fairly soon that would require a mip-map pyramid 26 levels deep, where the highest resolution mip-level is some <u>67 million</u> pixels across. That simply won&#8217;t fit on any 3D video card on the market, at least not in this decade.</p>
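<p>The depth of that pyramid can be estimated in a couple of lines. The sketch below assumes an equatorial circumference of roughly 40,075 km and rounds the width up to the next power of two; both the constant and the function name are my own illustrative choices.</p>

```python
import math

EQUATOR_METERS = 40_075_017  # assumed equatorial circumference in meters

def pyramid_for(width_pixels):
    """Return (number of halvings, padded power-of-two width) for a given image width."""
    exponent = math.ceil(math.log2(width_pixels))
    return exponent, 2 ** exponent

halvings, padded_width = pyramid_for(EQUATOR_METERS)  # one pixel per meter
print(halvings, padded_width)
```

<p>At 3 bytes per pixel, the top level alone would be thousands of terabytes, which is the scale of problem the rest of this article is about.</p>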
<p>I&#8217;m guessing Microsoft&#8217;s Virtual Earth gets around this limit by cutting their giant earth texture into many smaller distinct tiles of, say, 256 pixels square, where each gets mip-mapped individually. That approach would work to an extent, but it would be relatively slow and give some of the visual artifacts, like the blurring we see above, and a popping in and out of large square areas as you zoom in and out.</p>
<p>There&#8217;s one last concept about mip-maps to understand before we move on to the meat of the issue. Imagine for a moment that the pixels in the mip-map pyramid are actually color-coded as I&#8217;ve indicated above, with an entire layer colored red, another yellow, etc. Drawing this on a tilted plane (like the Earth&#8217;s ground &quot;plane&quot;) would then seem to &quot;slice through&quot; the pyramid at an interesting angle, using only those parts of the pyramid that are needed for this view.</p>
<p>It&#8217;s this characteristic of mip-mapping that allows Google Earth to exist, as we&#8217;ll see in a minute.</p>
<table cellspacing="6" cellpadding="6" width="200" border="0" align="left" summary="">
<tbody>
<tr>
<td valign="top" align="center"><a href="/uploads/tour_driving4.jpg" onclick="return false;" title="Direct link to file"><img height="64" width="128" border="0" src="../../../../../uploads/tour_driving4.thumbnail.jpg" alt="tour_driving4.jpg" /></a><sub><br />
            A typical tilted Google Earth image (copyright &amp; courtesy Google).</sub><font size="1"><br />
            </font></td>
<td valign="top" align="center"><a id="file-link-167" href="../../../../../uploads/tour_driving5.jpg" title="tour_driving3.jpg" class="file-link image"><img border="0" src="/uploads/tour_driving5.thumbnail.jpg" title="tour_driving3.jpg" alt="tour_driving3.jpg" /></a><sub><br />
            The same view, using color to show which mip levels inform which pixels</sub></td>
</tr>
</tbody>
</table>
<p>The example on the left shows a normal 3D scene from Google Earth, as well as a rough diagram showing from where in the mip-stack a 3D hardware system might find the best source pixels, if they were so colorized.</p>
<p>The nearer area gets filled from the highest-resolution mip-level (red), dropping off to lower and lower resolutions as we get farther from the virtual point of view. This helps avoid the scintillation and other aliasing problems we talked about earlier, and looks quite nice. We get as close as possible to a 1:1 correspondence between source and destination, pixel for pixel, so aliasing is minimized.</p>
<p>Even better still, tri-linear filtering 3D graphics hardware has been improved with something called <a href="http://en.wikipedia.org/wiki/Anisotropic_filtering">anisotropic filtering</a> (a simple preference option in Google Earth) which is essentially the same core idea as the previous examples, but using non-square filters, beyond the basic 2&#215;2. This is very important for visual quality, because even with fancy mip-mapping, if you tilt a textured polygon to a very oblique angle, the hardware must choose a low-resolution mip-level to avoid scintillation on the narrow axis. And that means the whole polygon is sampled at too-low a resolution, when it&#8217;s only one direction that needed to dip down to the low-res stuff. Suffice it to say, if your hardware supports anisotropic filtering, turn it on for best results. It&#8217;s worth every penny.</p>
<h4>Now, to the meat of the issue</h4>
<p>We still have to solve the problem of how to mip-map a texture with millions of pixels in either dimension. <strong>Universal Texture</strong> (in the Google Earth patent) solves the problem while still providing high quality texture filtering. It creates one giant multi-terabyte whole-earth virtual-texture in an extremely clever way. I can say that since I didn&#8217;t actually invent it. Chris Tanner figured out a way to do on your PC what had only ever been done on expensive graphics supercomputers with custom circuitry, called Clip Mapping (see SGI&#8217;s <a target="_blank" href="http://www.cs.virginia.edu/~gfx/Courses/2002/BigData/papers/Texturing/Clipmap.pdf">pdf paper</a>, also by Chris, Michael, et al., for a lot more depth on the original hardware implementation). That technology is essentially what made Google Earth possible. And my very first job on this project was making that work over an internet connection, way back when.</p>
<p>So how does it actually work? </p>
<p>Well, instead of loading and drawing that giant whole-earth texture all at once &#8212; which is impossible on most current hardware &#8212; and instead of chopping it up into millions of tiles and thereby losing the better filtering and efficiency we want, recall from just above that we typically only ever use a narrow slice or column of our full mip-map pyramid at any given time. The angle and height of this virtual column changes quite a bit depending on our current 3D perspective. And this usage pattern is fairly straightforward for a clever algorithm to compute or infer, knowing where you are and what the application is trying to draw.</p>
<table cellspacing="1" cellpadding="1" width="200" border="0" align="right" summary="">
<tbody>
<tr>
<td>
<div align="center"><a id="file-link-170" href="../../../../../uploads/clipmap2.png" title="clipmap.png" class="file-link image"> 			&nbsp;<img src="../../../../../uploads/clipmap2.thumbnail.png" title="clipmap2.png" alt="clipmap2.png" /></a></div>
<div align="center"><sub>A <strong>Universal Texture</strong> is both a mip-map, plus a software emulated clip-stack, meaning it can mimic a mip-map of many more levels and greater ultimate resolution than can fit in any real hardware. <br />
            </sub></div>
<p>            <sub><br />
            </sub></p>
<div align="center"><sub><strong>Note</strong>: though this diagram doesn&#8217;t depict it as precisely as the paper, the clip stack&#8217;s &quot;angle&quot; shifts around to best keep the column centered.</sub></div>
</td>
</tr>
</tbody>
</table>
<p>So this clever algorithm figures out which sections of the larger virtual texture it needs at any given time and pages only those from system memory to your graphics card&#8217;s dedicated texture memory, where it can be drawn very efficiently, even in real-time.</p>
<p>The main modification to basic mip-mapping, from a conceptual point of view, is that the upside down pyramid is no longer just a pyramid, but is now much, much taller, containing a clipped stack of textures, called, oddly enough, a &quot;clip stack,&quot; perhaps 16 to 30+ levels high. Conceptually, it&#8217;s as if you had a giant mip-map pyramid that&#8217;s 16-30 levels deep and millions to billions of pixels wide, but you <em>clipped</em> off the sides &#8212; i.e., the parts you don&#8217;t need right now.</p>
<p>Imagine the Washington monument, upside down and you&#8217;ll get the idea. In fact, imagine that tower leaning this way or that, like the one in Pisa, and you&#8217;ll be even closer. The tower leans in such a way that the pixels inside the tower are what you need for rendering right now. The rest is ignored.</p>
<p>Each clip-level is still twice the resolution of the one &quot;below&quot; it, like all mip-maps, and nice quality filtering still works as before. But since the clip stack is limited to a fixed but roaming footprint, say 512&#215;512 pixels wide (another preference in Google Earth), that means that each clip-level has both twice the effective resolution and half the coverage width of the previous one. That&#8217;s exactly what we want. We get all the benefits of a giant mip-map, with only the parts relevant to any given view.</p>
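<p>One way to picture the roaming footprint is as a fixed-size window that slides over each level of the virtual texture. The sketch below works along a single axis and assumes that level <em>n</em> has a full virtual width of 2<sup>n</sup> texels and that the window is centered on a normalized coordinate between 0 and 1. The function and parameter names are hypothetical, and this is not the patented algorithm itself.</p>

```python
def clip_level_window(level, center_u, footprint=512):
    """Return the (start, end) texel range held in memory for one level, along one axis."""
    full_width = 2 ** level
    # Levels narrower than the footprint are stored whole (the ordinary mip pyramid).
    width = min(footprint, full_width)
    start = int(center_u * full_width) - width // 2
    # Keep the window inside the virtual texture.
    start = min(max(start, 0), full_width - width)
    return start, start + width

for level in (9, 10, 20):
    start, end = clip_level_window(level, 0.5)
    print(level, (start, end), (end - start) / 2 ** level)
```

<p>The last printed column is the fraction of the full width that the window covers. It halves with every level, while the number of texels actually stored stays at 512.</p>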
<p>Put another way, Google Earth cleverly and progressively loads high-res information for what&#8217;s at the focal &quot;center&quot; of your view (the red part above), and resolution drops off by powers of two from there. As you tilt and fly and watch the land run towards the horizon, <strong>Universal Texture</strong> is optimally sending only the best and most useful levels of detail to the hardware at any given time. What isn&#8217;t needed, isn&#8217;t even touched. That&#8217;s one thing that makes it ultra-efficient.</p>
<p>It&#8217;s also very memory-efficient. The total texture memory for an earth-sized texture is now (assuming this 512 wide base mip-map, and say 20 extra clip-levels of data) only about <em>17 megabytes</em>, not the dozens to hundreds of <em>terabytes</em> we were threatened with before. It&#8217;s actually doable, and worked in 1999 on 3D hardware that had only 32 MB or less. Other techniques are only now becoming possible with bigger and bigger 3D cards.</p>
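<p>That figure is easy to sanity-check. The sketch below assumes 3 bytes per texel (plain RGB, no compression); the article does not state the actual texel format, so the byte count is an assumption, as is the function name.</p>

```python
def universal_texture_bytes(footprint=512, clip_levels=20, bytes_per_texel=3):
    """Texture memory for a base mip pyramid plus a stack of fixed-size clip levels."""
    pyramid_texels = 0
    size = footprint
    while size >= 1:
        pyramid_texels += size * size   # 512x512, 256x256, ... down to 1x1
        size //= 2
    clip_texels = clip_levels * footprint * footprint
    return (pyramid_texels + clip_texels) * bytes_per_texel

total = universal_texture_bytes()
print(total, "bytes, or about", round(total / 1_000_000, 1), "million")
```

<p>Under those assumptions the total comes to a little under 17 million bytes, and it grows only linearly with the number of clip levels rather than quadrupling with each one.</p>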
<p>In fact, with only 20 clip-levels (plus 9 mip levels for the base pyramid), we see that 2<sup>29 </sup>yields a virtual texture capable of <em>up to 536 million pixels </em>in either dimension. Multiplying that by half as many pixels vertically gives a virtual image of a <em>few hundred terapixels </em>in area, or enough excess capacity to represent features as small as 0.15 meters (about 5 inches), wherever the data is available. And that&#8217;s not the actual limit. I simply picked 20 clip levels as a reasonable number. And you thought the race for more megapixels on digital cameras was challenging. Multiply that by a million and you&#8217;re in the planetary ballpark.</p>
<p>Fortunately, for now, Google only really has to store a <em>few dozen </em>terapixels of imagery. The other beauty of the system is that the highest levels of resolution need not exist everywhere for this to work. Wherever the resolution is more limited, wherever there are gaps, missing data, etc., the system only draws what it has. If there is higher resolution data available, it is fetched and drawn too. If not, the system uses the next lower resolution version of that data (see mip-mapping above) rather than drawing a blank. That&#8217;s exactly why you can zoom into some areas and see only a big blur, where other areas are nice and crisp. It&#8217;s all about data availability, not any hard limit on the 3D rendering. If the data were available, you could see centimeter resolution in the middle of the ocean.</p>
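<p>The &quot;use the next lower resolution&quot; fallback can be sketched in a few lines. This example assumes imagery is indexed by a hypothetical <code>(level, x, y)</code> key where each coarser level halves the coordinates; that addressing scheme is an illustration only, not a description of Google Earth&#8217;s actual data layout.</p>

```python
def best_available(tiles, level, x, y):
    """Walk toward coarser levels until some imagery covering (x, y) is found."""
    while level >= 0:
        if (level, x, y) in tiles:
            return level, x, y
        # The parent at the next coarser level covers twice the width and height.
        level, x, y = level - 1, x // 2, y // 2
    return None

# Hypothetical data set: whole-earth coverage at level 0, one better region at level 1.
tiles = {(0, 0, 0): "whole earth", (1, 1, 0): "sharper eastern half"}
print(best_available(tiles, 3, 5, 1))
print(best_available(tiles, 2, 0, 3))
```

<p>The first lookup stops at level 1 because sharper data exists there. The second falls all the way back to the level 0 image, which is the big blur you see when zooming into a poorly covered area.</p>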
<p>The key then to making this all work is that, as you roam around the 3D Earth, the system can efficiently page new texture data from your local disk cache and system memory into your graphics texture memory. (We&#8217;ll cover some of how stuff gets into your local cache next time). You&#8217;ve literally been watching that texture uploading happen without necessarily realizing it. Hopefully, now you will appreciate all the hard work that went into making this all work so smoothly &#8212; like feeding an entire planet piecewise through a straw.</p>
<p>Finally, there&#8217;s one other item of interest before we move on. The reason this patent emphasizes asynchronous behavior is that these texture bits take some small but cumulative time to upload to your 3D hardware continuously, and that&#8217;s time taken away from drawing 3D images in a smooth, jitter-free fashion or handling easy user input &#8212; not to mention the hardware is typically busy with its own demanding schedule. </p>
<p>To achieve a steady 60 frames per second on most hardware, the texture uploading is divided into small, thin slices that very quickly update graphics video memory with the source data for whatever area you&#8217;re viewing, hopefully just before you need it, but at worst, just after. What&#8217;s really clever is that the system needs only upload the smallest parts of these textures that are needed and it does it without making anyone wait. That means rendering can be smooth and the user interface can be as fluid as possible. Without this asynchronicity, forget about those nice parabolic arcs from coast to coast.</p>
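<p>A toy model of that slicing might look like the following. It assumes a fixed budget of slices per frame and a simple first-in, first-out queue; the class, its methods, and the 8-row slice height are all invented for illustration, and a real implementation would hand each slice to the graphics API instead of appending it to a list.</p>

```python
from collections import deque

class SliceUploader:
    """Spread texture uploads across frames so that no single frame stalls."""

    def __init__(self, slices_per_frame):
        self.pending = deque()
        self.slices_per_frame = slices_per_frame
        self.uploaded = []  # stand-in for calls into the graphics API

    def request(self, texture_id, rows, slice_rows=8):
        """Cut a texture update into thin horizontal slices and queue them."""
        for start in range(0, rows, slice_rows):
            self.pending.append((texture_id, start, min(start + slice_rows, rows)))

    def on_frame(self):
        """Upload at most the per-frame budget, then return to rendering."""
        for _ in range(min(self.slices_per_frame, len(self.pending))):
            self.uploaded.append(self.pending.popleft())

uploader = SliceUploader(slices_per_frame=2)
uploader.request("clip-level-12", rows=20)
uploader.on_frame()
print(len(uploader.uploaded), "uploaded,", len(uploader.pending), "still pending")
```

<p>The point of the design is that the cost per frame is bounded by the budget, not by how much new imagery was requested, so the frame rate stays steady while detail fills in over the next few frames.</p>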
<p>Now, other virtual globes can also virtualize the whole-earth texture, perhaps by cutting it into tiles, and can even use multiple power-of-two resolutions like GE does. But without the <strong>Universal Texturing</strong> component or something better, they&#8217;ll either be limited to 2D top-down rendering, or they&#8217;ll do 3D rendering with unsatisfying results, blurring, scintillation, and not nearly as good performance for streaming the data from the cache into texture memory for rendering.</p>
<p>And that&#8217;s probably more than you ever wanted to know about how the whole Earth is drawn on your screen each frame.<br />
<sub><br />
</sub></p>
]]></content:encoded>
			<wfw:commentRss>http://www.realityprime.com/articles/how-google-earth-really-works/feed</wfw:commentRss>
		<slash:comments>80</slash:comments>
		</item>
		<item>
		<title>Second Earth</title>
		<link>http://www.realityprime.com/articles/second-earth</link>
		<comments>http://www.realityprime.com/articles/second-earth#comments</comments>
		<pubDate>Sun, 01 Jul 2007 11:34:42 +0000</pubDate>
		<dc:creator>avi</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>

		<guid isPermaLink="false">http://www.realityprime.com/second-earth</guid>
		<description><![CDATA[Technology Review: Second Earth
Now that Linden Labs has open-sourced the Second Life client, if any Google Earth engineers chose to study it, I might no longer be the only person lucky enough to know both the Google Earth and Second Life internals well enough to make a bold statement on a mashup of the two. [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.technologyreview.com/Infotech/18911/">Technology Review: Second Earth</a></p>
<p>Now that Linden Labs has open-sourced the Second Life client, if any Google Earth engineers chose to study it, I might no longer be the only person lucky enough to know both the Google Earth and Second Life internals well enough to make a bold statement on a mashup of the two. It would be great if others (besides me) could do so soberly. Because all I&rsquo;m hearing lately is a lot of &ldquo;wouldn&rsquo;t it be great?&rdquo; and not much &ldquo;here&rsquo;s how&rdquo; and, better yet, &ldquo;here&rsquo;s why&rdquo; practical discussion. </p>
<p><span id="more-17"></span></p>
<p>I&rsquo;ve said before that I don&rsquo;t think any direct integration between SL and GE is wise at this point, at least not in their present forms and with the present missions of each application. I&rsquo;ll try to elaborate on that and see if I can convince you that the best thing overall is for each app to evolve along its own path. But I will point out a few areas where each could benefit from at least mimicking the other.<br /><span id="more-633"></span><strong><br /> [Edit: See Wade&rsquo;s comment below. I want to also make it clear that the issues of a SL/GE mashup are purely hypothetical, and are meant more to serve as a framework for understanding some important technical and usability issues in these two different kinds of virtual worlds.]<br /></strong> </p>
<h3>The issues:</h3>
<p><strong>Mirror Worlds vs. Virtual Worlds </strong>&ndash; Second Life is fictional, whimsical, experimental. Google Earth is a reflection of the real world, and according to its CTO, will remain so in the future. In SL, you can make or do almost anything. GE is meant to be a platform for delivering geo-referenced information that is strictly relevant and useful (as well as fun) for our first lives. </p>
<p>Now, GE bent those rules a little by adding custom layers and 3D import &mdash; there&rsquo;s no clear rule that says your KML file has to represent a real object or building. And KML search will have a hard time distinguishing between real vs. fictional search results (even as they add pagerank-style features). The problem is that fictional results pollute real-world uses, and if they get out of hand, Google will have to somehow segregate those realms. I definitely don&rsquo;t want to search for directions to the Home Depot and find that my path is blocked by a giant robot or a self-replicating pile of poo.</p>
<p>And for SL, the opposite problem exists &mdash; growth of the landscape was designed to be geometric and ad hoc, not mirroring any real-world geography. There are some exceptions to that as well, one of which I suggested to a SL-oriented colleague a while back, and that is that a single SL sim could serve as a stand-in for the entire earth, in miniature, linking via teleports to any number of other 1:1 scale sims to flesh out the real-earth geography, albeit with giant gaps &mdash; meaning you&rsquo;d have to pop back to the &ldquo;hub&rdquo; to get from place to place &mdash; so you can see some of the difficulties.</p>
<p><strong>Direct Integration </strong>&ndash; In this pure hypothetical, we&rsquo;re talking about the two companies working together (e.g., collaboration, or outright purchase/merger) or GE open sourcing their code so a third party can do the mashup. I don&rsquo;t think that&rsquo;ll happen anytime soon. Now, if Google can pay $1.7B for YouTube, I think anything is possible on the merger front. Money seems to be no object, but they&rsquo;re not being irrational. For the price Linden might demand, they&rsquo;ll need to show, say, 10-20M regular visitors and a revenue model that Google thinks will integrate with their core business and scale indefinitely. </p>
<p>I&rsquo;d also guess, purely based on rumors and the engineering personality type, Google would be more likely to try rolling their own social VW before buying an established company like Linden. If Google does that, and they do as well as Google Video did against YouTube, that&rsquo;s when I think buying Linden would make more sense. I don&rsquo;t expect it to happen, if at all, for several years. An IBM-style purchase is more likely, IMO. But what do I know?</p>
<p><strong>Mashups </strong>&ndash; This is a bit less speculative. What Google could do fairly easily in the near term is release a closed-source version of GE that is more like a toolkit or library, closer to the Microsoft model for their Virtual Earth offering, though designed for real-time rendering. A toolkit would allow someone to build a virtual environment that had both the real earth and whatever avatar systems they wanted to throw in. It wouldn&rsquo;t necessarily be SL, but it would be the first social VW based on the real/mirror Earth. More likely than that, though, is some specific licensing deal for the GE technology if the price is right (I don&rsquo;t know what that price is). </p>
<p>It would also be technically feasible for Google to do the reverse &mdash; embed Google Earth&rsquo;s OpenGL rendering code into the open-sourced SL client, such that anyone using that new SL client could create an instance of a GE globe inside SL as an in-world application as opposed to the actual terrain you stand on. The only problem would be licensing, if that&rsquo;s an issue at all. But without the analogous GE source or any license to use GE&rsquo;s umpteen terabytes of data, neither Linden nor any 3rd parties would be in any position to do this from their end. </p>
<p><strong>Client / Server </strong>&ndash; The one mashup that pundits seem to call for the most &mdash; a horde of real, active, chatting Second Life avatars inside one big shared Google Earth &mdash; is unlikely to happen in the near-term for several technical reasons. One is that SL is much more than the PC client you install. Without the requisite number of SL servers arranged into a lat/long map of the earth, or at least the parts we care about, none of your avatars would be able to interact. So while mashing-up the client you&rsquo;d also need to handle both SL and GE servers at the same time, preferably in some coordinated fashion. For example, if GE holds the terrain and building data, SL servers would need to know about those to perform the physical part of the simulation, collision detection and so on. So there&rsquo;s at least a three-way dialog going on, not just an integrated SL/GE client. And that makes us engineers very unhappy.</p>
<p>Putting just the bodies in GE is easier, although we&rsquo;re talking the cheesy, lame, unsupported version with very little interactivity. You can capture your avatar in a static pose from SL using an OpenGL based interceptor, convert that to Collada format and then KML and import it into GE. Animating it is trickier. Forget about dancing or even moving your limbs. The best you can probably hope for without extensive kludge is using KML&rsquo;s &ldquo;network update&rdquo; feature to fetch your avatar&rsquo;s current location (and that of anyone else near you) from a special server you&rsquo;d supply. In this case, you also won&rsquo;t get collision detection with 3D buildings unless you do it yourself. And similarly, you&rsquo;ll have to handle all interactions among avatars. Essentially, you&rsquo;ll be recreating a kludgey version of SL&rsquo;s simulators and using GE + KML pseudo-scripting as the client rendering engine. It&rsquo;s not something I&rsquo;d want to spend my time on.</p>
<p><strong>Scale </strong>&ndash; Given the land area of the earth is reportedly 150,000,000km<sup>2</sup>, and each SL server currently handles 256&times;256m (16 servers per km<sup>2</sup>), a quick calculation reveals roughly 2.4 billion servers would be needed for the land alone. If we just did Manhattan (87.46 km<sup>2</sup>) , it would take about 1400 servers, or roughly 1/4th of their current total just for one densely populated island. And we&rsquo;re not even talking about the limits on concurrent users yet.</p>
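<p>The arithmetic is simple enough to reproduce. The sketch uses the same rounded figure of 16 servers per km<sup>2</sup> as the text; the constant and function names are mine.</p>

```python
LAND_AREA_KM2 = 150_000_000
MANHATTAN_KM2 = 87.46
SERVERS_PER_KM2 = 16  # rounded figure used in the text

def servers_needed(area_km2, per_km2=SERVERS_PER_KM2):
    """Servers required to tile an area with one server per 256 m region."""
    return round(area_km2 * per_km2)

print(servers_needed(LAND_AREA_KM2))
print(servers_needed(MANHATTAN_KM2))
print(1_000_000 / (256 * 256))  # exact regions per square kilometer
```

<p>The last line shows the exact value is closer to 15.3 regions per km<sup>2</sup>, so rounding up to 16 overstates the totals by a few percent, which doesn&rsquo;t change the conclusion.</p>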
<p>Clearly, if GE and SL were ever to mate, SL would need to move from a rigid grid to something more adaptable, for example, a quad-tree that gets subdivided depending on where the people virtually are, or better yet, based on who is interacting with whom at any given time. I worked with a company briefly (they never got their funding) who was pursuing something like this to handle 1 million simultaneous users. It&rsquo;s not a trivial problem, but would be necessary to scale SL up to a full planet, if they ever see the need.</p>
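<p>As a rough illustration of the quad-tree idea, the sketch below splits a square region into four whenever it holds more users than one server could handle. Everything here (the function name, the thresholds, the flat coordinate system) is hypothetical; it is not how Second Life or the company mentioned actually partitioned space.</p>

```python
def subdivide(bounds, users, max_users, min_size):
    """Split a square region until each leaf holds at most max_users.

    bounds is (x, y, size); users is a list of (x, y) positions.
    Returns a list of (bounds, user_count) leaves, one per server.
    """
    x, y, size = bounds
    inside = [(ux, uy) for ux, uy in users
              if ux >= x and x + size > ux and uy >= y and y + size > uy]
    # Stop when the region is quiet enough or already as small as allowed.
    if max_users >= len(inside) or min_size >= size:
        return [(bounds, len(inside))]
    half = size / 2
    leaves = []
    for cx, cy in ((x, y), (x + half, y), (x, y + half), (x + half, y + half)):
        leaves.extend(subdivide((cx, cy, half), inside, max_users, min_size))
    return leaves

users = [(10, 10), (20, 20), (30, 30), (200, 200)]
for leaf in subdivide((0, 0, 256), users, max_users=2, min_size=64):
    print(leaf)
```

<p>Empty stretches of the map stay as a few large leaves, while the crowded corner gets split further, so server count tracks where people are instead of raw land area.</p>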
<p>The next issue of scale is that of &ldquo;levels of detail.&rdquo; GE was designed from the ground up to seamlessly zoom through 20 or more powers of two (2<sup>20</sup> : 1 scaling) when zooming from way out in space down to a spot on the ground. SL was designed to let you walk, teleport, or fly relatively close to the ground. It essentially has four levels of detail &mdash; near, far, and off, plus we&rsquo;ll count the nice 2D overhead map. But SL isn&rsquo;t designed to handle viewing the entire world in so many different scales, smoothly moving through all of them. Without big architectural changes, it would be very difficult to do the kind of flying and zipping around one does in GE.</p>
<p>One final note under the category of &ldquo;scale.&rdquo; The Collada file format may have some built-in features to properly convey the scale of objects, but it&rsquo;s rarely used or even obeyed from what I can tell. All 3D models have built-in assumptions about scale. They&rsquo;re just a bunch of numbers after all, not smart about their context or nature. If someone exports data in centimeters and another program assumes those are feet or meters, you&rsquo;re going to have to write yet another program to convert everything to match up, or your avatar may be as tall as the Empire State Building, or too small to see. </p>
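<p>That conversion program is not hard to write once the source unit is known; the hard part is knowing it. A minimal sketch, assuming the unit has already been identified and using a small hand-written table of conversion factors (the names are illustrative, not part of any file format or toolkit):</p>

```python
METERS_PER_UNIT = {
    "centimeter": 0.01,
    "meter": 1.0,
    "foot": 0.3048,
    "inch": 0.0254,
}

def to_meters(vertices, unit):
    """Rescale a list of (x, y, z) vertices from the given unit into meters."""
    scale = METERS_PER_UNIT[unit]
    return [(x * scale, y * scale, z * scale) for x, y, z in vertices]

# An avatar modeled 180 units tall: sensible in centimeters, a tower in meters.
avatar_top = [(0.0, 180.0, 0.0)]
print(to_meters(avatar_top, "centimeter"))
print(to_meters(avatar_top, "meter"))
```

<p>The same 180-unit model comes out as a 1.8 meter person or a 180 meter giant depending on one assumption, which is exactly the failure mode described above.</p>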
<p><strong>Other issues </strong>&ndash; I haven&rsquo;t even touched on the issues of converting from procedural or parametric objects to polygons, which is something I have a lot of experience with. That&rsquo;s a whole other discussion which I could spend hours on. Suffice it to say, it&rsquo;s a hard problem, but one in which there is an easy solution &mdash; if it&rsquo;s designed in to the system. </p>
<p>But for now, let&rsquo;s worry more about whether or not people want this and then get around to fixing the &ldquo;nitty gritties.&rdquo;</p>
<h3>The How&rsquo;s &mdash; What&rsquo;s really possible?<br /></h3>
<p>1. If Google is so inclined, I&rsquo;d love to see them create a free or licensed DLL that encapsulates the OpenGL rendering code from Google Earth in a form that could be invoked inside of someone else&rsquo;s OpenGL-based 3D engine. It would at a minimum require functions to set the viewing position, layer state, etc&hellip; and probably most things you would already do from the UI. This would at least allow someone to put a Google Earth inside Second Life, as a 3D object one could poke at. It would not be a regular place in SL, but even that could be improved with time. If Google figures out how to monetize GE with ads, such a DLL might require showing those ads as well, or we could see it as a one-off license to some well-funded company.</p>
<p>2. If Linden is so inclined, they could work on a new kind of &ldquo;grid&rdquo; that could scale up to a full earth-sized geography, solving the problem of zooming in and out as a method of managing that new expansive scope. FWIW, I think it&rsquo;s worth their time, because teleporting is a bad way to work around the problems of walking and flying. GE&rsquo;s zooming gets you there just as fast, but without losing your geo-spatial awareness. In other words, the continuous pan across the earth keeps your brain working on the relative positions of the places you visit, as in a big mental map. Teleporting loses the advantages of having one big shared space &mdash; you might as well have a bunch of small connected rooms at that point, which some people are working on. If Linden does this, they could be in a position to build an &ldquo;Earth&rdquo; app like in Snow Crash &mdash; one that they can even zoom into as a method of getting around. This need not be a literal mirror of the earth, but the technology I outlined is still important to make it work, regardless of the source data; satellites or users.</p>
<h3>The Why&rsquo;s &mdash; What will people actually want or need?<br /></h3>
<p>Here&rsquo;s the big question, saved for the end. With all the &ldquo;wouldn&rsquo;t it be cool&rdquo; talk, people seem to miss the fundamental issue with new technology: <em>people do the dumbest things. </em>It&rsquo;s not that people are necessarily dumb. It&rsquo;s more an issue of finding the &ldquo;lowest-energy&rdquo; solutions rather than the most elegant ones. </p>
<p>MySpace is a good example of a &ldquo;low-energy&rdquo; solution, as compared to many more elegant ones that were presented. And there is no doubt that there will be a host of 3D MySpaces within the next 12 months to further test the theory (I&rsquo;ve consulted for several but I&rsquo;m not promoting any). All of them will face a fundamental problem: <em>What is 3D good for?</em> More specifically, <em>why would I want to visit your virtual living room if you&rsquo;re not there?</em> Those questions have real answers, but perhaps not the most obvious ones.</p>
<p>Now, there are also some great reasons to want avatars in Google Earth. The more GE becomes a destination, rather than a tool to pick your destination, the more likely we are to say &ldquo;meet me on 5th avenue for shopping,&rdquo; and we mean the virtual version of each. This requires the avatars to interact. And if we go a little further to an application where I and a paid designer can work on my new house, you can see a need for GE to add in-world building tools to the mix. </p>
<p>As for a virtual economy and social networking tools, I don&rsquo;t know. As long as Google doesn&rsquo;t have to host your content themselves or care about bandwidth for user data, there really is no cost to them, and no big reason to quantify value for 3D objects. The &ldquo;layer&rdquo; approach makes it much easier to simply turn off content that&rsquo;s just too slow. But for all of these, I see them being so tightly integrated with Google&rsquo;s advertising economy and information delivery mission that I can only imagine them doing this in-house, slowly but surely adding more metaverse-like components that are more or less strictly tied to the real world.</p>
<p>Similarly, the path for SL is to make a bigger and bigger world and scale their way towards more and more simultaneous users. The &ldquo;why&rdquo; of making an entire earth in Second Life seems obvious &mdash; but the old joke &ldquo;it&rsquo;s a small world, but I wouldn&rsquo;t want to paint it&rdquo; comes to mind. If SL users really had 2.3 billion servers to populate, could 10 &#8211; 20 million users even do it in one second lifetime? Not by hand, I think.</p>
<p>GE at least has automated tools for capturing a world that&rsquo;s already out there. Imagine if people really had to build a second version of the whole world by hand, with the level of detail SL desires? </p>
<p>It&rsquo;s unlikely. But that&rsquo;s where procedural modeling comes in. It&rsquo;s not going to recreate the real world either, but there are known and waiting methods for building out entire planets full of detail. However, that&rsquo;s clearly a third way &mdash; not user generated, and not mirroring the real world. And that discussion is best left for another day.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.realityprime.com/articles/second-earth/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Scenegraphs: Past, Present, and Future</title>
		<link>http://www.realityprime.com/articles/scenegraphs-past-present-and-future</link>
		<comments>http://www.realityprime.com/articles/scenegraphs-past-present-and-future#comments</comments>
		<pubDate>Sun, 01 Jul 2007 03:59:12 +0000</pubDate>
		<dc:creator>avi</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>

		<guid isPermaLink="false">http://www.realityprime.com/scenegraphs-past-present-and-future</guid>
		<description><![CDATA[Updated: 4/8/2003 for spelling, headers, and added links to &#8220;Scenegraphs Today&#8221; section
Updated: 9/13/2005 updated bio, added link to scenegraphs
Updated: 6/30/2007 moved to wordpress to allow comments &#8212; old URLs should forward here, but please update your links.

Sections:


Scenegraphs: a brief history and evolution,
Scenegraphs Today
Scenegraphs Future




In the Beginning&#8230;
To help understand where scenegraphs came from, it&#8217;s useful to take [...]]]></description>
			<content:encoded><![CDATA[<p><em>Updated: 4/8/2003 for spelling, headers, and added links to &ldquo;Scenegraphs Today&rdquo; section<br />
Updated: 9/13/2005 updated bio, added link to scenegraphs<br />
Updated: 6/30/2007 moved to wordpress to allow comments &#8212; old URLs should forward here, but please update your links.<br />
</em></p>
<p><strong>Sections:<br />
</strong></p>
<ol>
<li>Scenegraphs: a brief history and evolution</li>
<li><a href="#today">Scenegraphs Today</a></li>
<li><a href="#tomorrow">Scenegraphs Future</a></li>
</ol>
<p><span id="more-13"></span></p>
<h3><a name="begin"></a>In the Beginning&hellip;</h3>
<p><span>To help understand where scenegraphs came from, it&rsquo;s useful to take a quick look at the evolution of graphics languages like OpenGL and DirectX. Early on, real-time graphics existed on special image generation (IG) hardware that contained entire visual databases in closed proprietary form. Modellers created their databases and loaded them onto the hardware IG. Programmers were generally limited to modifying elements of these databases, like the position and rotation of a helicopter or setting the time-of-day.</span></p>
<p><span><a href="http://www.sgi.com/">SGI</a> introduced a more open and programmable option for image generation hardware and, along with it, graphical languages that allowed more direct programmability of the image pipeline. <a href="http://www.opengl.org/">OpenGL</a> (from SGI&rsquo;s original &ldquo;GL&rdquo;) consists of a stream of primitive drawing commands (draw polygon, line, point, etc.), state settings (set color, texture, etc.), and matrix manipulations (push/pop to model-view or perspective matrix, etc.). But it contains very little information that allows the system to self-optimize and improve performance.</span></p>
<p><span>This was fine for drawing all sorts of scenes. But polygons that are out of view do consume resources &ndash; the hardware doesn&rsquo;t even know they&rsquo;re out of view until very late in the rendering pipeline. Unnecessary state changes, extra texture loads, and other common graphics procedures are best avoided if they don&rsquo;t contribute to the final image. </span></p>
<h3>Culling</h3>
<p>Culling is the process of removing everything from a scene that will not contribute to the final image, including things that are behind the observer, off-screen, or, in more advanced systems, hidden behind other objects (i.e., occluded). Generalized frustum culling works by comparing each object&rsquo;s spatial boundaries with a viewing frustum &ndash; a truncated pyramid that represents the visible volume of space. OpenGL does this implicitly when you send it polygons &ndash; by default, it transforms and clips all polygons to the edges of the viewing volume (most hardware uses a combination of gross clipping and 2D scissoring, but that&rsquo;s a bit too detailed for this section of the article).</p>
<p>Rather than do the heavy work at the OpenGL and polygon level, scenegraph architects realized they could better perform culling at higher level abstractions for greater efficiency. If we can remove the invisible objects first, then we make the hardware do less work and generally improve performance and the all-important frame-rate.</p>
<table cellspacing="1" cellpadding="1" width="200" align="center" border="1">
<tbody>
<tr>
<td bgcolor="#ffffff"><img class="" height="252" alt="" hspace="20" width="388" align="middle" vspace="20" src="http://www.realityprime.com/bak/scenegraph_files/image001.gif" /></td>
</tr>
</tbody>
</table>
<p>The way it works is fairly straightforward. Any object that is entirely within the viewing frustum is sent on down to the hardware. For objects that are part in/part out, we usually don&rsquo;t bother checking individual polygons on the CPU, but we might break a very complex object into several simpler ones so some of them may be culled in or out individually. Of course, any object that is entirely outside the culling volume is rejected early on.</p>
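<p>As a rough illustration of that three-way test, here is a minimal Python sketch. It is not taken from any of the systems discussed here: the <code>Plane</code> class, the <code>classify_sphere</code> function, and the box-shaped stand-in for a frustum are all hypothetical, and I assume each object is wrapped in a bounding sphere. The sphere test is conservative, so an object near a corner may be reported as partial when it is really outside.</p>

```python
from dataclasses import dataclass


@dataclass
class Plane:
    # Unit normal (nx, ny, nz) and offset d. Points where n.p + d >= 0
    # lie on the inside (visible) side of the plane.
    nx: float
    ny: float
    nz: float
    d: float

    def signed_distance(self, p):
        return self.nx * p[0] + self.ny * p[1] + self.nz * p[2] + self.d


def classify_sphere(planes, center, radius):
    """Return 'outside', 'inside', or 'partial' for a bounding sphere."""
    result = "inside"
    for plane in planes:
        dist = plane.signed_distance(center)
        if dist < -radius:
            return "outside"   # entirely behind one plane: reject early
        if dist < radius:
            result = "partial"  # straddles this plane
    return result


# Hypothetical viewing volume: a simple box (x and y in -10..10, z in 1..100)
# standing in for a real six-plane frustum.
view_planes = [
    Plane(1, 0, 0, 10), Plane(-1, 0, 0, 10),
    Plane(0, 1, 0, 10), Plane(0, -1, 0, 10),
    Plane(0, 0, 1, -1), Plane(0, 0, -1, 100),
]

in_view = classify_sphere(view_planes, (0, 0, 50), 1)
behind = classify_sphere(view_planes, (0, 0, -5), 1)
on_edge = classify_sphere(view_planes, (10, 0, 50), 2)
```

<p>Only the objects classified as inside or partial would be sent on to the hardware; a partial result is where an engine might decide to test smaller sub-objects.</p>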
<h3>Hierarchy</h3>
<p>To efficiently perform this calculation, it&rsquo;s beneficial to organize the objects into a hierarchy or tree, propagating any shared information towards the root of the tree. There are many kinds of trees we could use. But let&rsquo;s keep it to a simple one-parent, N-child hierarchy: a tree, which is the simplest kind of directed acyclic graph (DAG).</p>
<p>Such a basic scenegraph will have a root node, with one or more children. Each child node can in turn contain zero or more children, some of which will be the graphical objects we want to draw. The other nodes are there for structural purposes and can get quite complex, as we&rsquo;ll see later on.</p>
<p>For example, if a building was composed of rooms, a group node of the scenegraph (call it &ldquo;Building&rdquo;) might contain several nodes (called &ldquo;room-0&rdquo;, &ldquo;room-1&rdquo; and so on). The bounding box of the &ldquo;Building&rdquo; node would be defined such that it contains the bounding boxes of all of the rooms. So if the building node was determined to be invisible, then there would be no need to check the child nodes since they would also be invisible.</p>
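<p>The building example can be sketched in a few lines of Python. Everything here is hypothetical (the <code>Node</code> class, the helper names, and the coordinates), and for brevity I assume the viewing volume is just an axis-aligned box rather than a true frustum. The point is only that a rejected parent means its children are never tested.</p>

```python
def union_boxes(boxes):
    """Smallest axis-aligned box containing all of the given boxes."""
    boxes = list(boxes)
    lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
    return (lo, hi)


def boxes_overlap(a, b):
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


class Node:
    def __init__(self, name, box=None, children=()):
        self.name = name
        self.children = list(children)
        # A leaf carries its own box; a group derives its box from its children.
        if box is None:
            box = union_boxes(c.box for c in self.children)
        self.box = box


def cull(node, view_box, visible, tested):
    tested.append(node.name)
    if not boxes_overlap(node.box, view_box):
        return  # the whole subtree is rejected without visiting the children
    if not node.children:
        visible.append(node.name)
    for child in node.children:
        cull(child, view_box, visible, tested)


rooms = [Node("room-0", ((0, 0, 0), (5, 3, 5))),
         Node("room-1", ((5, 0, 0), (10, 3, 5)))]
building = Node("Building", children=rooms)
tree = Node("tree", ((50, 0, 0), (52, 8, 2)))
root = Node("root", children=[building, tree])

visible, tested = [], []
cull(root, ((40, 0, -5), (60, 10, 5)), visible, tested)
```

<p>With this view box, only the root, the building, and the tree are tested; neither room is ever examined because the building&rsquo;s box already failed.</p>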
<p>Another benefit of hierarchy was in ease of manipulation. Given a car containing doors and wheels, it was much easier to move the &ldquo;car&rdquo; node and have the child nodes (doors and wheels) follow automatically. Without a hierarchy, one would have to move each of these sub-objects synchronously each time the car moved. Of course, that could be solved with some clever back-pointers among dependent matrices, but that&rsquo;s exactly what scenegraphs are doing in a more formal fashion.</p>
<p>So for example, consider a tank. It might have the following hierarchical representation:</p>
<p>&nbsp;</p>
<table height="261" cellspacing="2" cellpadding="2" width="433" align="center" border="1">
<tbody>
<tr>
<td align="center" bgcolor="#ffffff"><img class="" alt="" align="middle" src="http://www.realityprime.com/bak/scenegraph_files/image002.gif" /><img class="" alt="" hspace="5" align="middle" vspace="5" src="../../../../../bak/scenegraph_files/image003.gif" /></td>
</tr>
</tbody>
</table>
<p>By splitting the object into &ldquo;nodes&rdquo; and representing the connectivity between these nodes, we can better manipulate the final polygons of the tank.</p>
<p>We can animate pieces separately. We can rotate the turret, fire the gun, and open the hatch. We can animate the left and right tread to simulate turning.</p>
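<p>To make the inheritance of transformations concrete, here is a small Python sketch of a tank whose turret and gun follow the base. It is an illustration under simplifying assumptions: it works in 2D with 3x3 homogeneous matrices, and the <code>TransformNode</code> class and node names are invented for the example.</p>

```python
import math


def make_transform(angle_deg, tx, ty):
    """2D rotation followed by translation, as a 3x3 homogeneous matrix."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def multiply(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


class TransformNode:
    def __init__(self, name, angle_deg=0.0, tx=0.0, ty=0.0):
        self.name = name
        self.angle_deg, self.tx, self.ty = angle_deg, tx, ty
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child


def world_origins(node, parent=None, out=None):
    """Depth-first walk that composes each local transform with its parent's."""
    if parent is None:
        parent = make_transform(0.0, 0.0, 0.0)
    if out is None:
        out = {}
    world = multiply(parent, make_transform(node.angle_deg, node.tx, node.ty))
    out[node.name] = (round(world[0][2], 6), round(world[1][2], 6))
    for child in node.children:
        world_origins(child, world, out)
    return out


tank = TransformNode("tank", tx=10.0)
turret = tank.add(TransformNode("turret", angle_deg=90.0))
gun = turret.add(TransformNode("gun", tx=2.0))

before = world_origins(tank)
tank.tx = 20.0            # drive the tank: turret and gun follow automatically
after_move = world_origins(tank)
turret.angle_deg = 0.0    # rotate only the turret: the gun swings with it
after_turn = world_origins(tank)
```

<p>Only one value changes per animation step, yet every node below it ends up in the right place.</p>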
<h3>Rendering Advantages &mdash; State Sorting</h3>
<p>Scenegraphs showed clear benefits for improving rendering performance and making more optimal use of the available hardware resources. By keeping a &ldquo;retained&rdquo; model of the virtual world, scenegraphs could make additional optimizations, such as running culling and drawing in parallel, and most importantly: state sorting.</p>
<p><strong>State sorting </strong>is a concept whereby all of the objects being rendered are sorted by similarities in state (texture map, lighting values, transparency, and so on). Since changing state is often an expensive operation due to hardware implementations, this is usually a big performance win, even on the newest hardware. A good example of this is turning lighting on and off &mdash; imagine a generic SIMD hardware architecture, executing the same code over four parallel geometry processors. There may be one version of the code for &ldquo;lit&rdquo; objects and one version for &ldquo;unlit.&rdquo; Changing from lit to unlit state can cause all four processors to flush and reload. But if we can try to turn lighting on or off only once per frame instead of once per object, we can improve performance.</p>
<p>For an even stronger example, imagine we were drawing 100 cars, each containing some polygons in metal (state 1), rubber (state 2) and glass (state 3). It might be beneficial to draw all of the metal objects first, then the rubber ones, and then the glass. We can have 3 state changes, or we can have 300. And at least some state sorting is already required if we&rsquo;re depth sorting the windows for correct blending results.</p>
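<p>The 3-versus-300 arithmetic is easy to check with a toy draw list. This Python sketch only counts state switches; the tuple layout and function name are my own, and no real rendering API is involved.</p>

```python
def count_state_changes(draw_list):
    """Count how often the state differs from the previously drawn object."""
    changes, current = 0, None
    for state, _obj in draw_list:
        if state != current:
            changes += 1
            current = state
    return changes


# 100 cars, each drawn part by part in scenegraph order.
draw_list = [(material, f"car{i}-{material}")
             for i in range(100)
             for material in ("metal", "rubber", "glass")]

unsorted_changes = count_state_changes(draw_list)
sorted_changes = count_state_changes(sorted(draw_list, key=lambda item: item[0]))
```

<p>Drawing in traversal order switches state 300 times, while sorting by state first brings that down to 3.</p>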
<p>However, early state sorting was hampered by the fact that if two objects had very different transformations (for example two windshields on two cars in different locations), it was costly to sort these objects by state alone because changing the viewing matrices was also a fairly expensive operation. Today, however, it is usually much cheaper to sort by state first, though exactly which state is the most expensive (and therefore the most important sort key) varies from platform to platform. We might even want our engine to be able to vary how it state sorts depending on the hardware. As we&rsquo;ll see later in the article, this is where scenegraphs can excel.</p>
<h4>State Encapsulation</h4>
<p>Early scenegraphs employed the concept of state encapsulation to facilitate state sorting. This meant each object in the scenegraph would point to a separate state structure&ndash;a set of material colors, texture, lighting, transparency, and so on. The scenegraph could then compare these state objects for similarities or just sort by the pointers. Even so, when switching from one state set to another, the system tried to change only the relevant differences and not blindly apply all state parameters, some of which, like texture loads and binds, could be very expensive time-wise.</p>
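<p>Applying only the differences between two state sets might look something like the following Python sketch. The dictionary keys and the <code>state_delta</code> helper are hypothetical; a real engine would use a more compact structure.</p>

```python
def state_delta(current, target):
    """Return only the parameters that must change to go from current to target."""
    return {key: value for key, value in target.items()
            if current.get(key) != value}


camo = {"texture": "camo.png", "lighting": True, "transparency": 0.0}
metal = {"texture": "metal.png", "lighting": True, "transparency": 0.0}

to_apply = state_delta(camo, metal)
nothing_to_do = state_delta(metal, metal)
```

<p>Switching from camo to metal touches only the texture, and switching between two objects that share a state set touches nothing at all.</p>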
<p>In these systems, state sharing was achieved by having two graphical objects point to the same state set. This had other advantages, such as being able to quickly switch from &ldquo;visible light&rdquo; states to &ldquo;infrared&rdquo; states using simple pointer swaps.</p>
<table cellspacing="1" cellpadding="1" width="200" align="center" border="1">
<tbody>
<tr>
<td bgcolor="#ffffff"><img class="" height="186" alt="" hspace="5" width="426" align="middle" vspace="5" src="../../../../../bak/scenegraph_files/image004.gif" /></td>
</tr>
</tbody>
</table>
<p>In this example, many of the nodes (rectangles) in the Tank hierarchy are assigned states (ovals). When the tank is drawn, we can sort the objects by state and try to minimize the number of state changes. For example, we can draw the left and right tread at the same time and only set the &ldquo;rubber&rdquo; state once. Since depth-first traversal would visit these in that order anyway, we haven&rsquo;t gained much. But we&rsquo;d want to draw the base and turret at the same time too; so state encapsulation sorting can provide the needed information to make this possible.</p>
<h3>Transform Graphs</h3>
<p>Early scenegraphs were primarily transform graphs, representing object hierarchies in terms of inherited parent/child transformation relationships. For example, a car node might have four wheel-nodes that would be specified relative to axle and steering nodes (their center of rotation), which would in turn be specified relative to the car. Or, perhaps, a building might contain walls, floors, windows, and interior rooms, which might contain desks and chairs and so on.</p>
<h3>Dynamic Coordinate Systems</h3>
<p>Dynamic Coordinate Systems (DCS) were added for things like our tank, where we wanted the tank to be able to move around from frame to frame and the turret to rotate independently. DCS nodes were originally more expensive, mainly because there was extra bookkeeping information that could not be pre-computed, but instead needed to be re-computed when the object moved, or at worst each frame.</p>
<p>What bookkeeping? Take culling, for example. It often uses bounding boxes or spheres to contain all of a node&rsquo;s children and their bounding boxes, recursively. If the node&rsquo;s bounding volume is invisible, all of the children are therefore invisible. When a child moves, the bounding box needs to be re-computed. So we might write the logic as: re-compute the bounding box only when a child moves. But what happens when all of the children move? Do we re-compute the bounding box each time or wait till they&rsquo;re all done moving? In that case, it might be better to re-compute the bounding box once per frame, or better yet, store a flag that says if <em>any </em>of the children changed that frame and then re-compute the box at most once per frame. This sort of tradeoff is the kind of thing scenegraphs excel at, where immediate mode rendering does little to help.</p>
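<p>The &ldquo;flag it now, re-compute at most once per frame&rdquo; idea can be sketched as follows. This is a simplified Python illustration with invented names, and it uses one-dimensional intervals in place of real bounding boxes to keep it short.</p>

```python
class Group:
    def __init__(self, child_boxes):
        self.child_boxes = dict(child_boxes)  # name -> (low, high) along one axis
        self.dirty = True
        self.recompute_count = 0
        self._bounds = None

    def move_child(self, name, new_box):
        self.child_boxes[name] = new_box
        self.dirty = True  # cheap: just note that something changed

    def bounds(self):
        # Called by the cull traversal; the union is rebuilt only if needed.
        if self.dirty:
            lows = [box[0] for box in self.child_boxes.values()]
            highs = [box[1] for box in self.child_boxes.values()]
            self._bounds = (min(lows), max(highs))
            self.recompute_count += 1
            self.dirty = False
        return self._bounds


group = Group({"a": (0, 1), "b": (2, 3)})
first = group.bounds()

# Both children move during the same frame...
group.move_child("a", (5, 6))
group.move_child("b", (7, 9))
second = group.bounds()   # ...but the union is rebuilt only once
third = group.bounds()    # nothing moved since: cached value is reused
```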
<h3>Static Coordinate Systems</h3>
<p>In the case of buildings, since they don&rsquo;t move, we could use static coordinate systems (called SCS in Performer). These were simple matrix transformations without a lot of overhead. The main difference was that SCS nodes could pre-compute important information, like bounding boxes and collision information. More importantly, in an MP (multi-process) system, SCS nodes are guaranteed to remain the same from process to process, whereas DCS nodes need to be buffered so that changes in one process don&rsquo;t have immediate effects in another.</p>
<p>Aside: for a quick example of the sort of MP problems that arise, consider two cubes that are being manipulated in one process and drawn in another. If the first process modifies both cubes before either is drawn, things are happy. If the first process moves the cubes after they&rsquo;re drawn, things are okay, but you won&rsquo;t see the change until the next rendered frame, by which time something else might have happened. But if the first process modifies one cube and then both are drawn before it can modify the other, you can see strange artifacts that make the cubes appear to oscillate with respect to each other. Worse still, in a true MP system, the first process can be in the middle of updating one cube while the other is drawn, causing unpredictable results.</p>
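<p>One common way to avoid the half-updated case is to buffer the dynamic values and hand the draw side a consistent snapshot at a frame boundary. The Python sketch below shows the idea in a single thread; the class and method names are made up, and a real multi-process system would also need proper synchronization around the publish step.</p>

```python
class BufferedPositions:
    def __init__(self, initial):
        self._app = dict(initial)    # written by the application side
        self._draw = dict(initial)   # read by the draw side

    def set(self, name, position):
        self._app[name] = position

    def publish(self):
        # Frame boundary: the draw side receives a complete snapshot.
        self._draw = dict(self._app)

    def get_for_draw(self, name):
        return self._draw[name]


cubes = BufferedPositions({"cube-a": 0.0, "cube-b": 1.0})

cubes.set("cube-a", 5.0)   # the app has moved only one cube so far
mid_update = (cubes.get_for_draw("cube-a"), cubes.get_for_draw("cube-b"))

cubes.set("cube-b", 6.0)
cubes.publish()
after_publish = (cubes.get_for_draw("cube-a"), cubes.get_for_draw("cube-b"))
```

<p>A draw that happens mid-update still sees both cubes in their old, mutually consistent positions.</p>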
<p>We may not be used to using multi-threading or multi-processing on Wintel boxes, but it&rsquo;s becoming more and more important, even on single CPU machines. With hyper-threading, AGP bottlenecks, and consoles that contain many independent processors, synchronizing a dedicated &ldquo;draw&rdquo; process with a main application, possibly running at a different frame-rate, is going to be a challenge more and more people will be familiar with.</p>
<h3>Adding Groups, LOD, and other useful nodes</h3>
<p>In addition to coordinate system nodes and basic graphical objects, scenegraphs added other types of nodes to take advantage of the &ldquo;retained mode&rdquo; and frame-to-frame coherence optimizations. Most of these node types derive from the basic group node, which acts as a simple container for any number of children, spatially proximate or not, and imposes no restrictions on them.</p>
<p><strong>Level of Detail </strong>nodes use computations about how far an object is from the observer to &ldquo;dial in&rdquo; the amount of detail shown or switch between two or more child nodes which represent an object at various fidelities. The basic idea is that a far-away object can be rendered at lower fidelity (fewer polygons, smaller textures, etc.). Many schemes have been invented to deal with object switching or fading between LOD states, and the state of the art lies in various so-called continuous level of detail schemes.</p>
<p><strong>Switch </strong>nodes are a form of group node that sets the active child node (zero or one out of N children) based on some key value (e.g., 0 to n-1). Sequence nodes are a form of switch where the key value cycles based on time. Animations can be made with sequence nodes &ndash; each frame of animation is stored as a unique child object and the parent sequence node controls the active frame. A DCS-Sequence is useful for motion-captured joint animation, for example, where an array of transformations is applied in the same way a sequence node iterates through the list of children (it used to require having N SCS nodes under a Sequence, which was wasteful). DCS-Sequences can, for example, be efficiently compressed and stored and take very little CPU time to play back (though their interactivity leaves something to be desired).</p>
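<p>Here is one way these node types could relate to each other, as a hedged Python sketch. The class names mirror the terms above, but the interfaces (an <code>active</code> index, an <code>update</code> method, per-child distance ranges) are my own simplification, not Performer&rsquo;s API.</p>

```python
class Switch:
    def __init__(self, children, active=None):
        self.children = list(children)
        self.active = active  # index of the active child, or None for "none"

    def active_child(self):
        return None if self.active is None else self.children[self.active]


class Sequence(Switch):
    """A switch whose key cycles with time."""

    def __init__(self, children, frame_time):
        super().__init__(children, 0)
        self.frame_time = frame_time

    def update(self, t):
        self.active = int(t / self.frame_time) % len(self.children)


class LOD(Switch):
    """A switch whose key is chosen by distance from the observer."""

    def __init__(self, children, ranges):
        super().__init__(children)
        self.ranges = list(ranges)  # ranges[i] is the farthest distance for children[i]

    def update(self, distance):
        self.active = None  # beyond the last range, draw nothing
        for i, max_distance in enumerate(self.ranges):
            if distance <= max_distance:
                self.active = i
                break


walk = Sequence(["frame-0", "frame-1", "frame-2"], frame_time=0.5)
walk.update(1.2)
frame_at_1_2 = walk.active_child()
walk.update(1.6)
frame_at_1_6 = walk.active_child()

tank = LOD(["high", "medium", "low"], ranges=[10, 50, 200])
tank.update(30)
detail_at_30 = tank.active_child()
tank.update(500)
detail_at_500 = tank.active_child()
```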
<h3>Performer</h3>
<p>SGI&rsquo;s <a target="_blank" href="http://www.sgi.com/software/performer/">Performer</a> was an early example of a scenegraph that was primarily a multi-process transformation graph. Performer had state objects which did not exist in the hierarchy per se, but were referenced by graphical objects. Performer made many advances in the use of MP programming techniques to optimize performance on SGI&rsquo;s multiprocessor systems. Performer did a great job of state sorting, though an early design decision limited state sorting to only under individual DCS nodes &ndash; in other words, objects could not be grouped for similar-state rendering if they had different DCS nodes above them. Performer also made extensive use of traversal masks and per-node callbacks for special effects.</p>
<h3>Adding State Nodes to the Tree</h3>
<p>Later scenegraphs added the notion of state as an actual node type. This had some advantages, especially in terms of being able to aggregate common state. For example, if there were 100 brick objects, we could insert a &ldquo;brick&rdquo; material node as parent to those 100 objects and the scenegraph render process would implicitly render these together. In fact, one of the principal benefits of state nodes is that control over state sorting is given explicitly to the scenegraph modeler. For skilled modelers, this provides more control and more potential for optimization than automatic state sorting. But in the general case, it probably is not a win.</p>
<p>Why? An illustrative example takes 100 tank objects, each with three states (say tread, metal, and camo). But since we want the tanks to each be independently movable, they would be grouped with each tank having its own parent DCS node, plus some more DCS nodes for the turret and tread wheels if desired. Below that top DCS, we&rsquo;d see the three state nodes and below those, the individual geometry (shared or instanced). This means, in practical terms, that we&rsquo;d have 100 tread, metal, and camo nodes and that we&rsquo;d change state at least 300 times during the rendering of the scene. A better scheme might group the graphical objects by the three common states, but that would require each geometry object to have its own DCS and we&rsquo;d run the risk of a turret forgetting to drive on when the base of the tank does.</p>
<h3>VisKit</h3>
<p>Paradigm&rsquo;s VisKit is a good example of this approach. It also added other useful node types like &ldquo;cameras&rdquo; (representing the observer in the scenegraph, rather than as implicitly at 0,0,0 in modelview space). But in other ways, VisKit was very similar to early versions of Performer (not surprisingly, since its designer was the person who had managed the early Performer team at SGI).</p>
<h3>Adding Action or Event Nodes</h3>
<p>Many scenegraphs had the notion of per-node callbacks that the programmer could specify. In Performer, each node could have multiple callbacks, depending on the context. In Cull processing, any cull callbacks (if present) would be invoked to affect the culling result. In Draw processing, any draw callbacks would similarly be invoked for drawing special effects. Since these processes worked in a hierarchical depth-first traversal fashion, pre- and post- traversal callbacks were often provided to let things be done before and/or after traversal of child nodes. Application-side callbacks were also provided to do computation or automation on a node once each frame (e.g., for conditional logic, for animation, to move a DCS, collect statistics, and so on).</p>
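<p>A pre- and post-traversal callback scheme can be sketched like this. The Python below is a generic illustration, not Performer&rsquo;s actual callback interface; in particular, the convention that a pre-callback returning <code>False</code> prunes the subtree is an assumption I&rsquo;ve made for the example.</p>

```python
class CallbackNode:
    def __init__(self, name, children=(), pre=None, post=None):
        self.name = name
        self.children = list(children)
        self.pre, self.post = pre, post


def traverse(node, log):
    # Assumed convention: a pre-callback returning False prunes this subtree.
    if node.pre is not None and node.pre(node) is False:
        return
    log.append("enter " + node.name)
    for child in node.children:
        traverse(child, log)
    if node.post is not None:
        node.post(node)  # runs after all children have been visited
    log.append("leave " + node.name)


effects = []
wheel = CallbackNode("wheel", post=lambda n: effects.append(n.name))
hidden = CallbackNode("hidden", pre=lambda n: False)
car = CallbackNode("car", children=[wheel, hidden])

log = []
traverse(car, log)
```

<p>Note that each callback runs inline with the traversal, which is exactly why a slow one stalls everything behind it.</p>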
<p>However, the main drawbacks of such automatic actions per node are twofold. First, they are very difficult to schedule efficiently, since the application does not know in advance which nodes will be visible or how much time any given callback might consume. They can take an arbitrary amount of time to execute, and generally block further processing of culling or drawing (blocking on draw can cause &ldquo;bubbles&rdquo; or stalls in hardware queues). They are also somewhat scattered in terms of cache coherence and branch prediction&mdash;similar operations are almost never performed in repeated series. In Performer apps, for example, callbacks were sometimes found to cause CPU bottlenecks and non-deterministic behaviors.</p>
<p>The second drawback of callbacks is more complicated. Since app-side callbacks need to be invoked before the culling or drawing traversals begin (since the app can change the positions of objects, moving them in and out of view, for example), the app traversal generally visits <em>every</em> object in the scenegraph, even those that are way off screen. This can be very costly and ultimately defeats the advantages that culling gives over a brute-force immediate mode implementation.</p>
<p>A better system might do some culling first and then do per-node processing based on how close an object was to being in view. Far away objects usually need limited processing, usually just to determine when they will enter the view. And the app process may move an object. So there&rsquo;s still a cyclic dependency between this optimization and culling which needs to be addressed.</p>
<h3>Inventor</h3>
<p>Inventor existed at SGI at roughly the same time as Performer, with a very different approach. The goal there was usability over performance. The result was a very elaborate and highly re-usable set of scenegraph nodes, but at the cost of performance. So much so that Inventor was relegated to academic projects and rapid prototyping and, to my knowledge, was never used for serious (i.e., high performance) real-time efforts. Many people tried to mix Performer and Inventor to get the best of both worlds, but this was almost always a dead end.</p>
<h3>Adding Event Nodes</h3>
<p>Event nodes were a later addition to systems like Inventor and its descendant, VRML. The idea behind a scenegraph event system is fairly clever in theory. If the camera or observer is an object in the scenegraph, we can test to see when this object collides with one or more invisible &ldquo;trigger&rdquo; volumes also in the scenegraph. A trigger or sensor object could be linked to an effector or action object that would animate a node, for example. Events could be mouse or keyboard based too, so if you click on a 3D button, something else happens in the virtual world.</p>
<p>In this way, one could write an entire user-interactive program in a scenegraph. Doors could be opened, lights turned on by flicking virtual switches, and so on. All data driven.</p>
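<p>A trigger volume wired to an action can be sketched in a few lines of Python. The <code>Trigger</code> class, the box-shaped volume, and the door dictionary are all hypothetical, and the trigger fires only on entry so that standing inside the volume does not re-open the door every frame.</p>

```python
class Trigger:
    def __init__(self, box_min, box_max, action):
        self.box_min, self.box_max = box_min, box_max
        self.action = action
        self.inside = False

    def update(self, camera_pos):
        now_inside = all(lo <= p <= hi for lo, p, hi
                         in zip(self.box_min, camera_pos, self.box_max))
        if now_inside and not self.inside:
            self.action()  # fire once, on entry
        self.inside = now_inside


door = {"open": False, "times_opened": 0}


def open_door():
    door["open"] = True
    door["times_opened"] += 1


doorway = Trigger((0, 0, 0), (2, 3, 2), open_door)

doorway.update((5, 1, 1))   # observer is outside the volume
was_open_before = door["open"]
doorway.update((1, 1, 1))   # observer walks in: the action fires
doorway.update((1, 1, 1))   # still inside: no second event
```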
<h3>VRML</h3>
<p>Virtual Reality Modeling/Markup Language was the extension of Inventor, drafted after many competing forces finally came together (led by SGI at the time). It was very similar to Inventor in form and function and suffered from many of the same performance limitations. But the main benefit was that it was highly self-contained and simple to transport across network connections. It also added concepts for extensibility and portability that Inventor largely lacked (being SGI-specific) and is now being further revised in something called Web3D or X3D or VRML200x.</p>
<h3>Body and Facial Animation</h3>
<p>X3D and MPEG-4 add special node types for Body and Facial Animations, since for humans, there are some clever ways to extract differences from a standard (implicit) model for better compression. We can encode phonetic visual expressions (visemes) as well as joint animations for elbows and wrists using many fewer bits than if we were coding these things generically.</p>
<h3>GeoSpatial</h3>
<p>GeoSpatial problems (like drawing the entire earth) require some special nodes to deal with the inherent precision limitations of graphics hardware, namely single precision floating point. True geospatial information requires more than the 23 bits of mantissa that a 32-bit float provides, and scenegraphs are generally built on 32-bit floats, so we add some new GeoNode types to various scenegraph schemes. GeoVRML is one such approach, driven largely by the folks at SRI. Keyhole used its own approach for EarthViewer.</p>
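<p>The precision problem is easy to demonstrate. The Python sketch below rounds values through a 32-bit float using the standard <code>struct</code> module; the choice of the WGS84 equatorial radius and the &ldquo;subtract a local origin in double precision&rdquo; workaround are my own illustration, not a description of how GeoVRML or EarthViewer actually work.</p>

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius, in meters

# Two points 20 cm apart, measured from the center of the earth.
a = to_float32(EARTH_RADIUS_M)
b = to_float32(EARTH_RADIUS_M + 0.2)
collapsed = (a == b)  # at this magnitude, float32 values are 0.5 m apart

# One workaround: subtract a nearby origin in double precision first,
# then hand the small local offset to the 32-bit pipeline.
local_offset = to_float32((EARTH_RADIUS_M + 0.2) - EARTH_RADIUS_M)
```

<p>At earth scale the two points collapse into one, while the locally offset value keeps the 20 cm difference to well under a millimeter.</p>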
<h3>PVS</h3>
<p>&ldquo;Potentially Visible Set&rdquo; is a broad term for a sort of generalized culling technique. In basic culling, we take the entire scene and recursively find which objects fall on or within some bounding volume, usually a frustum (a truncated pyramid, approximating the viewing volume). In generalized culling, we might have pre-computed lists of objects that are spatially grouped (like &ldquo;group&rdquo; nodes, only they need not be hierarchically associated) and probably visible at the same time. Other techniques might make use of shadow or blocker objects that rule out certain regions of space.</p>
<p>The &ldquo;Cell and Portal&rdquo; approach, for example, usually groups the world into rooms or cells, with each cell having a list of objects in it and a list of portals, doors, or other connections to the adjacent (or even distant but connected) cells. When a portal is deemed visible, the culling routine looks at the portal&rsquo;s connected cell and checks all of <em>its </em>portals, and so on and so on recursively, each time adding (ORing) the overall set of visible objects and each time, reducing (ANDing) the frustum to the portal (door) we can see through. In simpler implementations, objects within a single cell are considered visible whenever their cell is culled in. Often traditional spatial culling is used to further narrow the visible set.</p>
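<p>The recursion can be sketched as follows. This Python example makes a large simplification: the view and each portal are reduced to a one-dimensional angular interval, so &ldquo;ANDing the frustum&rdquo; becomes interval intersection. The cell layout and all names are invented for illustration.</p>

```python
def intersect(a, b):
    """Intersection of two (low, high) intervals, or None if they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


def visible_objects(cells, cell_name, window, visited=frozenset()):
    visited = visited | {cell_name}
    cell = cells[cell_name]
    visible = set(cell["objects"])  # simple rule: the whole cell is visible
    for portal_window, target in cell["portals"]:
        narrowed = intersect(window, portal_window)  # AND the view with the portal
        if narrowed is not None and target not in visited:
            # OR in whatever can be seen through this portal.
            visible |= visible_objects(cells, target, narrowed, visited)
    return visible


cells = {
    "hall": {"objects": ["rug"],
             "portals": [((-10, 10), "office"), ((40, 60), "kitchen")]},
    "office": {"objects": ["desk"],
               "portals": [((5, 20), "closet"), ((-30, -15), "vault"),
                           ((-10, 10), "hall")]},
    "kitchen": {"objects": ["stove"], "portals": []},
    "closet": {"objects": ["coat"], "portals": []},
    "vault": {"objects": ["safe"], "portals": []},
}

seen = visible_objects(cells, "hall", (-30, 30))
```

<p>The vault is dropped even though its portal lies inside the original view, because the view has already been narrowed to the office doorway by the time we reach it.</p>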
<p>What&rsquo;s most interesting about Cells and Portals is that it can also generalize the notion of rendering to framebuffers and destinations and make use of standins or impostors. A doorway can be a portal to another room, or it can be implemented as a textured polygon, pre-rendered from an image of that room from the correct perspective. If it&rsquo;s done right, there&rsquo;s no way to tell the difference. Mirrors are implemented in much the same way. A mirror can be rendered by inverting the view matrix and projecting the camera through the mirror, then drawing normally into the framebuffer. Or it can be rendered by projecting the camera through the mirror, rendering the scene to a texture, and applying the texture to the mirror as a painting.</p>
<p>The downside of PVS techniques is that they&rsquo;re usually added to scenegraphs as an afterthought and not built in from the ground up. NetImmerse is/was a game engine that made extensive use of Cells and Portals.</p>
<h3>Inventor Revisited</h3>
<p>Inventor is easy to use. It provides a rich set of node types which make it easy to get something up and running quickly. And it adds some nice 3D GUI types too, which make producing a finished application that much quicker.</p>
<p>However, Inventor is a poor performer. It suffers from some critical design flaws, such as virtualizing all interfaces, even to atomic data members, which doesn&rsquo;t help performance any (even COM objects try not to virtualize member getters and setters). But the biggest flaw is in the execution model: the active nodes in the scenegraph consume CPU time while the scene is being rendered. And since all nodes must be visited, view frustum culling is not common, even at the rendering stage. So richly immersive scenes will be slow unless the programmer makes the effort to optimize them by him or herself.</p>
<p>[note: some people who use Inventor have written in to say many of my criticisms have been addressed more recently.]</p>
<h3>VRML Revisited</h3>
<p>VRML suffers from many of the same performance limitations as Inventor. It&rsquo;s nice to be able to specify what are essentially dataflow programs right in the scenegraph by hooking sensors to effectors using routes or linkages and place active clickable objects in the world with a few lines of text. But VRML suffers from a severe namespace problem, where declared objects can be ambiguously or incompletely defined (via dangling external references) and so on.</p>
<p>Just looking at the dataflow problem gives some sense of how buggy a VRML system can be. If a scenegraph finds an effector node first and then finds a sensor node that drives the effector, what is the proper way to process this? Do we normally process the effector node first, then the sensor, thereby potentially computing the effector again this frame (risking an infinite loop or at least a performance hit to fix the problem)? Or do we wait till next frame where it may be too late? Or, perhaps, do we sort the entire scenegraph to make sure all sensors come before their down-wind effectors (if that is even possible given the cyclic possibilities)? This could bring up problems with state and transform dependencies and make objects go haywire.</p>
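<p>To make the sorting option concrete, here is a small Python sketch that orders nodes so every driver is processed before the node it drives, and reports when a cycle makes that impossible. It uses the standard library&rsquo;s <code>graphlib</code> module (Python 3.9 or later). The node names and the <code>evaluation_order</code> helper are made up for illustration and are not part of VRML.</p>

```python
from graphlib import TopologicalSorter, CycleError


def evaluation_order(routes):
    """Order nodes so every driver is processed before what it drives.

    `routes` maps a node name to the set of nodes that drive it.
    Returns None when the routes form a cycle and no such order exists.
    """
    try:
        return list(TopologicalSorter(routes).static_order())
    except CycleError:
        return None


# A touch sensor drives a timer, which drives a door's rotation.
acyclic = {"door_rotation": {"timer"}, "timer": {"touch_sensor"}}
print(evaluation_order(acyclic))

# Two nodes that drive each other cannot be sorted.
cyclic = {"a": {"b"}, "b": {"a"}}
print(evaluation_order(cyclic))
```

<p>The sort solves the ordering question only for acyclic routes. For the cyclic case the sketch simply gives up, which is exactly where a real system has to pick one of the other, less pleasant, strategies.</p>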
<p>Given global DEF/USE semantics, can we have two objects using the same global name or is this an error? It could be accidental. If so, did we mean to use the first one or the last one? If we try to use the hierarchy to segment the namespace (as is done in Java, for example), what happens when we subtly reorganize the scenegraph because two objects that had been attached now can move independently (for example, a car riding on a moving flatbed train now drives off at the station)? What if we want to reorganize the scenegraph for better state-based performance on different target hardware configurations? We could easily break our nice scenegraph-based program in the process.</p>
<h3><a name="today"></a>Scenegraphs Today</h3>
<table cellspacing="2" cellpadding="2" width="60" align="right" summary="" border="0" padding="4">
<tbody>
<tr>
<td style="z-index: 1; left: 620px; width: 200px; top: 9000px; height: 75px" valign="middle" align="center" width="9%" bgcolor="#333333"><strong>Scenegraphs I&rsquo;m aware of:</strong></td>
</tr>
<tr>
<td style="z-index: 1; left: 620px; width: 200px; top: 9000px; height: 75px" valign="middle" align="center" width="9%" bgcolor="#333333"><strong>OpenSG</strong><br />
            <a target="_blank" href="http://www.opensg.org/features.EN.html">Features</a></td>
</tr>
<tr>
<td style="z-index: 1; left: 620px; width: 200px; top: 9000px; height: 75px" valign="middle" align="center" width="9%" bgcolor="#333333"><sup>(recommended for price/performance)</sup><br />
            <strong>Open Scenegraph</strong><br />
            <a target="_blank" href="http://www.openscenegraph.org/featuresngoals/">Features and Goals<br />
            </a></td>
</tr>
<tr>
<td style="z-index: 1; left: 620px; width: 200px; top: 9000px; height: 75px" valign="middle" align="center" width="9%" bgcolor="#333333"><strong>X3D</strong> &#8211; <a target="_blank" href="http://www.web3d.org/x3d.html">Overview</a><br />
            <strong>Java3D </strong>- <a target="_blank" href="http://java.sun.com/products/java-media/3D/collateral/j3d_clas.pdf">Overview (PDF)</a></td>
</tr>
<tr>
<td style="z-index: 1; left: 620px; width: 200px; top: 9000px; height: 75px" valign="middle" align="center" width="9%" bgcolor="#333333"><strong>Gizmo3D </strong>-<a target="_blank" href="http://www.tooltech-software.com/products.htm">Overview</a></td>
</tr>
<tr>
<td style="z-index: 1; left: 620px; width: 200px; top: 9000px; height: 75px" valign="middle" align="center" width="9%" bgcolor="#333333"><strong>RenderWare</strong><br />
            <a target="_blank" href="http://www.renderware.com/">Main Site </a></td>
</tr>
<tr>
<td style="z-index: 1; left: 620px; width: 200px; top: 9000px; height: 75px" valign="middle" align="center" width="9%" bgcolor="#333333">NetImmerse/<strong>Gamebryo</strong><br />
            <a target="_blank" href="http://www.ndl.com/">Main Site</a></td>
</tr>
<tr>
<td style="z-index: 1; left: 620px; width: 200px; top: 9000px; height: 75px" valign="middle" align="center" width="9%" bgcolor="#333333"><sup>(recommended for Vis-Sim)</sup><br />
            <strong>OpenPerformer</strong> - <a href="http://oss.sgi.com/projects/performer/">Overview</a><br />
            <sub>Note: if a scenegraph is not recommended, it may simply mean I haven&rsquo;t evaluated it yet.</sub></td>
</tr>
</tbody>
</table>
<p>Scenegraphs today are quite sophisticated and quite readily available, even free and open sourced. They&rsquo;re generally well suited for cross-platform game development. But current scenegraphs do have some important weaknesses. One is an overloading of the tree concept with all sorts of bells and whistles that slow things down. Another is that without structural changes, coordinating changes in distributed systems is difficult. Very few of the current crop of scenegraphs were designed with MMOGs in mind.</p>
<p>The heart of the problem is an overloading of what was once a nice, straightforward performance improvement over immediate mode OpenGL. We moved to hierarchies so we could cull and draw more efficiently. Then we added in all this extra stuff, like hanging ornaments on a Christmas tree, except that some of the ornaments are nice juicy steaks and some are whole live cows. They simply don&rsquo;t belong.</p>
<p>Put another way, the original transform-graph concept sought to organize the visual database spatially to take advantage of grouping proximate or linked objects. We propagated shared spatial information up the tree, where we could make earlier traversal decisions and save time in true log-n tree fashion.</p>
<p>But we have more than one way of organizing our visual database. Culling and PVS techniques want to have spatially organized databases for optimum performance. If the scenegraph is instead organized largely by state, then we might need to cull each 3-state tank (in the tank example) three times, once for each articulated part, instead of being able to cull out each tank once and only once. But if we want to get the best hardware performance, we really do want to sort the visible set by the most expensive state changes first. Moreover, since states don&rsquo;t change that often, we don&rsquo;t want to re-sort the scene every frame. But if we start with a spatial view each time and sort only the visible objects, it seems that&rsquo;s what we&rsquo;re stuck with (as was the case with Performer, believe it or not). If we re-sort the whole scenegraph for state optimization (only once, hopefully), we lose the nice spatial coherence we count on for fast culling.</p>
<p>Given that we want to hook some nodes up to other nodes to enable event processing, we&rsquo;d also like a guaranteed consistent way of naming objects that doesn&rsquo;t change after spatial or state sorting or doesn&rsquo;t even change if parts of the scene are currently loaded or not (early scenegraphs were entirely memory resident). We want a logical or semantic naming scheme, like in namespaces. We want handles that persist and reflect structures that may not even be local.</p>
<p>By executing actions at each node during a depth-first traversal, we are most likely invoking bits of code in an arbitrary (almost random) order. This runs counter to the advanced scheduling many compilers try to do to take advantage of CPU branch prediction and pipelining, instruction pre-fetch and high-speed local caching, to name a few. Instruction and data cache misses can affect performance by up to 10x on many systems. So doesn&rsquo;t it make sense that, if we have 100 physics nodes and 100 inverse kinematics animation nodes, we try to process those nodes together, just like we tried to do for state (especially for systems with special vectorizing or SIMD capabilities)? This gives yet another competing organizational approach to optimizing the scenegraph.</p>
<p>Put all of these together and it&rsquo;s easy to see that the current evolution of scenegraphs has taken a wrong turn somewhere. And it will require a change in approach to move past the roadblock.</p>
<h3><a name="tomorrow"></a>Scenegraphs Tomorrow</h3>
<p>Granted, it is probably impossible to find a single perfect organization for a scenegraph that simultaneously optimizes for spatial, state, semantic, and CPU considerations. Some people try to hand-design theirs to straddle the fence and make the best of what they have. But a better idea is to remove one of the fundamental constraints: that there need be a single scenegraph organization for a given visual database.</p>
<p>It is entirely possible that we can have a single set of objects, call it an object soup, but have two, three, four, or more hierarchies linking these objects into independent and complementary organizations. It&rsquo;s been on the wish list of a number of scenegraph designers for years, though it&rsquo;s never been a requirement before distributed databases came along.</p>
<p>But how to implement this is another matter. The solution, it seems, lies in the separation of concepts of scenegraph &ldquo;nodes&rdquo; from the &ldquo;objects&rdquo; they represent. By making shared objects live in a soup, we minimize the amount of waste and miscoordination we might see with four or more simultaneous object hierarchies. This way the &ldquo;node&rdquo; part of an object is just a few bytes &ndash; just enough to point to the object in the soup and to the parent/child/sibling relationships in this particular view. All of the real &ldquo;meat&rdquo; is kept once in the object, which ideally contains back pointers to each node in each graph, limited to a small number like four.</p>
<p>Is this rocket science? Not really. Relational databases have separated indices from data since the dawn of time. And scenegraphs are just one way of indexing into big visual databases. Once scenegraph designers come to grips with that, the rest is downhill.</p>
<p>The second problem is how to correlate among multiple database views (i.e., sets of indices). Since lightweight nodes in two views point back to the same object, it&rsquo;s easy to see how, given a node in one database view, we could find the corresponding node in another view: just follow the back pointers. This lets us cull using the optimized spatial view and render using the hardware-optimized state view.</p>
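<p>A minimal Python sketch of the soup-plus-views idea might look like the following. The class names <code>SceneObject</code> and <code>ViewNode</code> and the helper <code>corresponding_node</code> are hypothetical, and a real implementation would keep the nodes far more compact than Python objects allow. The point is only the shape of the data: one heavy object, several lightweight nodes, and back pointers between them.</p>

```python
class SceneObject:
    """The 'meat': lives once in the soup, with back pointers into each view."""

    def __init__(self, name):
        self.name = name
        self.nodes = {}             # view name -> the ViewNode for this object


class ViewNode:
    """Lightweight index entry: a pointer to the object plus tree links for one view."""

    def __init__(self, view, obj=None, parent=None):
        self.view = view
        self.obj = obj              # None for purely structural group nodes
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)
        if obj is not None:
            obj.nodes[view] = self  # register the back pointer


def corresponding_node(node, other_view):
    """Hop from a node in one view to the same object's node in another view."""
    return node.obj.nodes.get(other_view)


tank = SceneObject("tank")
spatial_root = ViewNode("spatial")
state_root = ViewNode("state")
camo_texture = ViewNode("state", parent=state_root)        # a state group node
spatial_node = ViewNode("spatial", tank, parent=spatial_root)
state_node = ViewNode("state", tank, parent=camo_texture)

# Cull in the spatial view, then find where to draw in the state view.
print(corresponding_node(spatial_node, "state") is state_node)
```

<p>Here the tank is found by spatial culling and then looked up in the state view through its back pointer, without either tree knowing anything about the other&rsquo;s shape.</p>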
<p>The heart of an efficient distributed database implementation, then, is using the spatial view to limit what happens in the other views (rendering, culling, physics, animation, and so on) and distributing changes in the spatial view among disparate systems. The state, semantic, and application views do not generally change, except for visibility and priority per time interval, so the real meat of the task is in synchronizing the spatial views.</p>
<h3>Semantic View</h3>
<p>The semantic or logical view of a visual database is just a convenient way of accessing objects in the object soup. Think of it as the Google (albeit local, not web-wide) of visual databases. The organization is arbitrary and entirely up to the developer. A developer might use the semantic view as a large dictionary of objects, organized by object type, subtype and so on. Or a game may divide objects up by their role in game play. But the main idea is that the leaves of this tree are the actual objects in the world.</p>
<p>What&rsquo;s important is that the logical/semantic structure is well known (published) for all concurrent developers to use. It is a rendezvous point, as well as a convenience.</p>
<p>But it can be used for more elaborate schemes as well. For example, if the semantic view is organized into &ldquo;vehicles&rdquo; and then &ldquo;cars&rdquo; under that, we could perform some operation on all of the game universe&rsquo;s cars at once (perhaps, proximity tracking).</p>
<p>And there is no reason why objects could not be located under more than one branch of the semantic tree. There could be a branch called &ldquo;physical objects&rdquo; as well as the &ldquo;vehicle/car&rdquo; branch. One could set the physics computation to process everything under the &ldquo;physical objects&rdquo; branch automatically.</p>
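<p>As a sketch of how such a semantic view could behave, the Python below files each object under a slash-separated branch path and every ancestor of that path, and lets the same object appear under more than one branch. The <code>SemanticView</code> class, its methods, and the branch names are assumptions for illustration only.</p>

```python
from collections import defaultdict


class SemanticView:
    def __init__(self):
        self._branches = defaultdict(set)    # branch path -> names of objects below it

    def add(self, path, obj):
        """File `obj` under `path` and under every ancestor of `path`."""
        parts = path.split("/")
        for i in range(1, len(parts) + 1):
            self._branches["/".join(parts[:i])].add(obj)

    def under(self, path):
        """Return a copy of the set of objects filed anywhere below `path`."""
        return set(self._branches.get(path, ()))


view = SemanticView()
view.add("vehicles/cars", "taxi")
view.add("vehicles/trucks", "flatbed")
view.add("physical objects", "taxi")      # same object, second branch
view.add("physical objects", "crate")

# Run one operation over a whole branch at once.
for name in sorted(view.under("physical objects")):
    print("stepping physics for", name)
```

<p>Storing names rather than the objects themselves keeps this view a pure index, in the same spirit as the lightweight nodes described earlier.</p>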
<h3>State View</h3>
<p>As discussed earlier, the State View is intended to be a platform-specific state sort and state aggregation view. For a platform on which texture fetching is very expensive, we might see textureIDs as the most significant branches in the tree, thereby minimizing the number of textureID changes. On another platform, lighting mode might be more expensive to change. The State View can generally be computed on the client at load-time and does not change much. Which objects are on or off does, but their fundamental draw order does not.</p>
<p>One exception to that rule is depth-sorted objects, like transparent polygons. Here, we might have a branch of the state view that is somewhat dynamic without slowing down the rest of the system.</p>
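<p>To illustrate the platform-specific sort behind the State View, here is a short Python sketch. The platform names, the state fields, and the cost orderings are invented for the example. It assumes only that each platform can rank its state changes from most to least expensive.</p>

```python
from itertools import groupby

# Hypothetical per-platform orderings: the most expensive state change comes first.
PLATFORM_SORT_KEYS = {
    "texture_bound": ("texture_id", "lighting_mode"),
    "lighting_bound": ("lighting_mode", "texture_id"),
}


def build_state_view(objects, platform):
    """Return a draw order for `platform`, sorted so costly state changes are rare."""
    keys = PLATFORM_SORT_KEYS[platform]
    return sorted(objects, key=lambda o: tuple(o[k] for k in keys))


def count_changes(draw_order, state):
    """Number of times `state` must be set while drawing in this order."""
    return len([value for value, _ in groupby(draw_order, key=lambda o: o[state])])


objects = [
    {"name": "hull",   "texture_id": 2, "lighting_mode": "lit"},
    {"name": "flag",   "texture_id": 1, "lighting_mode": "unlit"},
    {"name": "turret", "texture_id": 2, "lighting_mode": "unlit"},
    {"name": "decal",  "texture_id": 1, "lighting_mode": "lit"},
]

order = build_state_view(objects, "texture_bound")
print([o["name"] for o in order], count_changes(order, "texture_id"))
```

<p>Because the sort depends only on each object&rsquo;s state, not on where it is this frame, it can be done once at load time. Per-frame work is then limited to switching objects on or off.</p>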
<h3>Shading Languages</h3>
<p>One of the latest buzzwords in modern computer graphics is Shading Languages. The main idea is that complex images can be constructed by mathematically combining (adding, subtracting, multiplying, etc.) many simpler images, often through a small assembly language program instead of using actual framebuffer operations. For example, a nice 3D bump mapped brick texture (where the bump mapping provides nice light and shadow cues to make the brick seem more 3D) might be described as a combination of a flat red texture, two or three rendering stages of bump mapping (rendering light and shadows), a light map for global shadows, and perhaps a specular highlight map if the object has little glass or metal bits.</p>
<p>Shaders can be expressed as programs, algorithms, or as a &ldquo;shading tree,&rdquo; where the constituent sub-shaders are broken down in hierarchical fashion, like we see for spatial transformations. This shading tree might be explicit, if the underlying scenegraph supports such advanced concepts, or it might be implicit, as an abstract representation of (for purposes of understanding) some pre-compiled code.</p>
<p>It&rsquo;s important to realize that the shading tree we see could easily vary from hardware platform to hardware platform, depending on the graphics capabilities and other factors. For example, some hardware supports advanced bump mapping in a single operation &ndash; so the state tree node in that case would be one node. Other hardware might not support bump mapping at all, but we can still achieve bump mapping effects by making multiple simpler rendering passes (one for the texture, one for light areas and one for dark areas). So the shading tree in that case might have a parent node with three children, representing the three passes.</p>
<table cellspacing="1" cellpadding="1" width="200" align="center" summary="" border="1">
<tbody>
<tr>
<td bgcolor="#ffffff">
<div align="center"><img alt="" hspace="10" __fcktemplabel="1" src="http://www.realityprime.com/bak/scenegraph_files/image005.gif" />
<p>            ~vs~</p></div>
<div align="center"><img class="" alt="" hspace="10" align="middle" __fcktemplabel="1" src="http://www.realityprime.com/bak/scenegraph_files/image006.gif" /></div>
<p align="center"><sup>(note: here, boxes are states and circles are rendering objects)</sup></p>
</td>
</tr>
</tbody>
</table>
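<p>The two shapes of shading tree can be sketched in a few lines of Python. The capability flag <code>single_pass_bump</code>, the pass labels, and both functions are hypothetical. The sketch assumes the platform can report its features as a set of names.</p>

```python
def bump_map_shading_tree(capabilities):
    """Build a nested (label, children) shading tree for a bump-mapped surface.

    `capabilities` is a hypothetical set of feature names reported by the platform.
    """
    if "single_pass_bump" in capabilities:
        return ("bump_mapped_brick", [])          # hardware does it in one node
    # Fallback: emulate the effect with three simpler passes.
    return ("bump_mapped_brick", [
        ("base_texture_pass", []),
        ("light_areas_pass", []),
        ("dark_areas_pass", []),
    ])


def count_passes(tree):
    """Leaves of the tree correspond to rendering passes."""
    label, children = tree
    return 1 if not children else sum(count_passes(c) for c in children)


print(count_passes(bump_map_shading_tree({"single_pass_bump"})))
print(count_passes(bump_map_shading_tree(set())))
```

<p>The object being shaded is the same in both cases. Only the tree hanging off it changes with the platform.</p>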
<p>Shading trees will also vary from software API to software API. But since we&rsquo;ve separated out the notion of our Spatial View (see below) from the State View, this affords us a good place to handle the interface with whichever underlying graphics API we might want to use (such as OpenGL or DirectX).</p>
<h3>Application View</h3>
<p>The presence of an Application View is not a strict requirement. In fact, it is the least useful view out of the bunch, mainly because compilers are so much better at scheduling code on CPUs. And a big problem with data-driven programs is that they can be very hard to debug, and their pseudo race conditions are hard to stamp out. But, on the other hand, they&rsquo;re very nice for rapid prototyping and platform-neutral abstraction. They&rsquo;re also quite useful for giving game players the ability to dynamically change game behaviors (e.g., mod programming or simple tunability).</p>
<h3>Spatial View</h3>
<p>The Spatial View, on the other hand, is the most important view from a distributed database point of view (and with all of the MMOG pushes out there, who isn&rsquo;t building a distributed database these days?). By organizing the world into spaces and sub-spaces, we can efficiently decide how to route messages, prioritize computations, and cull the database to minimize rendering time and network traffic.</p>
<p>The subdivision of the world into hierarchical spaces is not arbitrary, but there are a number of valid schemes for doing so. What is important is that the subdivision scheme be fairly well tuned to the culling procedure, and that no node has too many or too few children (i.e., that the tree is neither too wide nor too tall). In other words, the same rules that apply to well balanced trees in general.</p>
<p>The choice of whether spaces are static or dynamic is also open. For quad tree schemes, the subdivision is relatively static. If an object moves, it might cause new quad cells to be created or destroyed, but no quad-cells ever move. For spheres or bounding box trees, the bounding volumes will likely move as the objects they contain move. Rules for stretching volumes and forcing children in or out of them are also flexible and fairly easy to implement as iterative solutions (with local, not global optimization). In this scheme, bounding volumes can overlap, but they need not do so.</p>
<p>It&rsquo;s not even a problem for an object to be contained in multiple bounding volumes as long as it&rsquo;s not culled in or out more than once per frame. I&rsquo;ve played with systems with &ldquo;floating&rdquo; spaces that group objects for lighting purposes (e.g., all objects that are affected by a light are in one space). Grouping objects in formation is another useful extension. A group of tanks or fighters can be dynamically gathered by their proximity and culled as a group, even if there isn&rsquo;t a single &ldquo;parent&rdquo; node in the traditional scenegraph sense.</p>
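<p>The once-per-frame rule for objects that sit in several bounding volumes can be sketched as follows. This Python example assumes 2D bounding circles and a circular region of interest in place of a real view frustum. <code>Volume</code>, <code>overlaps</code>, and <code>cull</code> are illustrative names, not part of any existing scenegraph.</p>

```python
import math
from dataclasses import dataclass, field


@dataclass
class Volume:
    center: tuple                 # (x, y) of a bounding circle
    radius: float
    objects: list = field(default_factory=list)    # names of contained objects
    children: list = field(default_factory=list)   # nested Volume instances


def overlaps(volume, center, radius):
    """True when the bounding circle touches the circular region of interest."""
    return volume.radius + radius >= math.dist(volume.center, center)


def cull(root, view_center, view_radius):
    """Return (visible objects in visit order, number of volumes tested)."""
    visible, seen, tested = [], set(), 0
    stack = [root]
    while stack:
        volume = stack.pop()
        tested += 1
        if not overlaps(volume, view_center, view_radius):
            continue              # rejects the whole subtree with one test
        for obj in volume.objects:
            if obj not in seen:   # an object in several volumes is culled in once
                seen.add(obj)
                visible.append(obj)
        stack.extend(volume.children)
    return visible, tested


formation = Volume((10, 0), 5, ["tank1", "tank2"])        # grouped by proximity
lit_by_lamp = Volume((12, 0), 4, ["tank2", "lamp"])       # a floating lighting space
far_inner = Volume((80, 0), 2, ["bunker"])
far_away = Volume((80, 0), 5, children=[far_inner])
world = Volume((0, 0), 100, children=[formation, lit_by_lamp, far_away])

print(cull(world, (10, 0), 3))
```

<p>The second tank belongs to both the formation and the lighting space but is reported only once, and the distant subtree is rejected after a single overlap test on its outer volume.</p>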
<h3>Summary</h3>
<p>I&rsquo;ve covered the basics of scenegraphs, where they came from, where they are, and where I think they&rsquo;re going, at least from one point of view. Much of this work is related to on-going development of a so-called &ldquo;multi-view&rdquo; scenegraph. The ultimate goal of this work is to come up with a simple, light-weight system for optimizing rendering across many platforms. Look for future articles on my progress with this work.</p>
<p>This document intentionally doesn&rsquo;t directly address whether you should or shouldn&rsquo;t use a scenegraph in your 3D app. I trust that given the full facts you&rsquo;ll know best what you need. But for those people who dismiss scenegraphs out of hand, I hope this article does at least shed some light on the likelihood that you <em>are</em> using a scenegraph in one way or another, whether you call it &ldquo;portals,&rdquo; &ldquo;bones,&rdquo; &ldquo;linked matrices,&rdquo; or anything else. Because when it comes down to it, this is all just common sense and experience put to work.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.realityprime.com/articles/scenegraphs-past-present-and-future/feed</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s Virtual World, Redux</title>
		<link>http://www.realityprime.com/articles/googles-virtual-world-redux</link>
		<comments>http://www.realityprime.com/articles/googles-virtual-world-redux#comments</comments>
		<pubDate>Thu, 25 Jan 2007 18:13:02 +0000</pubDate>
		<dc:creator>avi</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>

		<guid isPermaLink="false">http://www.realityprime.com/articles/googles-virtual-world-redux</guid>
		<description><![CDATA[Rumors persist that Google is in the process of turning Google Earth into a virtual world. Well, I hate to burst anyone&#8217;s bubble, but GE already is a virtual world. It&#8217;s a virtual earth. It has all of the features of a virtual world (spatiality, point of view, presence, information modeling), minus a few we&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p><a target="_blank" href="http://www.techcrunch.com/2007/01/24/googles-metaverse/">Rumors persist</a> that Google is in the process of turning Google Earth into a virtual world. Well, I hate to burst anyone&#8217;s bubble, but GE already <em>is </em>a virtual world<em>. </em>It&#8217;s a virtual earth. It has all of the features of a <a href="http://www.brownianemotion.org/2006/09/30/web-3d-part-5/">virtual world</a> (spatiality, point of view, presence, information modeling), minus a few we&#8217;ve come to expect from a game or socially-oriented space (seeing yourself, seeing other people, and directly interacting together).</p>
<p>First, a step back. Regular visitors know that I know a bit about the internal workings of Second Life and Google Earth, though, as I always repeat, I don&#8217;t know <em>any </em>of their current plans. What you don&#8217;t know is that I&#8217;ve also consulted for and even considered roles in some of the newer crop of social MMOs. I think I have a pretty good sense of the field, both good and bad. And I&#8217;ve also heard the rumor that Google wants to get into this space.</p>
<p>I can&#8217;t speak for anyone at Google, but I know this: they&#8217;re certainly capable of it if they invest the bucks. John Hanke was the business/marketing guy behind Meridian59, one of the first 3D online multiplayer games. He&#8217;s now in charge of Google Earth. So I&#8217;m sure he has a passion for this space and could find great designers and technologists to help him pull it off. But the big question I have is one of fit with Google&#8217;s overall mission to organize the world&#8217;s information, especially after their &quot;better products, not more&quot; mandate came down.</p>
<p>The thing about GE is that it&#8217;s a so-called &quot;mirror world.&quot; The whole point was always for GE to accurately and compellingly reflect information about the <em>real world</em>. Opening up 3D content development via SketchUp and COLLADA import allows one to put virtually anything on the planet. That&#8217;s extremely useful, even if the information is speculative (like a new home plan or a proposed stadium). But the point is always to relate even the most speculative information back to the overall context: the real world.</p>
<p>So what happens if/when a purely <em>fictional </em>data layer is intentionally introduced? Does GE become a big open sandbox with a nice, but vestigial picture of the Earth on the floor? Is it SecondLife on a sphere? [edit: in case it's not clear, I think mixing fictional and real content is a mistake, unless it gets its own distinct context, like a game. Right now this separation is regulated by what gets promoted to an official layer. In a free-form SL-like world, perhaps not so much.]</p>
<p>So people talk about the technical challenges a lot. But that&#8217;s the easy part. Adding avatars certainly wouldn&#8217;t be hard. It would require a new server infrastructure. It would require the client to be improved somewhat, mostly to hide communication latency and handle thousands or even millions of active objects (esp. those pesky moving avatars).</p>
<p>Some have said that &quot;resolution&quot; is the limiting factor &#8212; this is true for real-world imagery, though this is more a data-availability problem than a technical limitation (1mm pixels are not out of the question). The system could probably support very detailed 3D models for buildings as well. But relying on users to create these may not produce good results in the near term. It takes <em>a lot </em>of work and it&#8217;s not yet important to do so. Procedural tools, unlike today&#8217;s SketchUp, would be essential. Without paid artists making these things, competition and collaborative rating/filtering of content is also essential.</p>
<p>And so let&#8217;s say Google does add whatever is needed and suddenly you can see and even chat with all of the other users of GE in your virtual proximity. That&#8217;s cool. But what then? I mean, the key thing for any experience is that it must either be fun or useful (or, ideally, both). So what would make it fun or useful? That&#8217;s the hard part. And as history shows, simply having the implicit marketing muscle of Google is only enough to get people in the door.</p>
<p>Here&#8217;s a short list of some good (and bad) applications of this strictly potential technology:</p>
<p><strong>Collaborative editing </strong>&#8211; work together on models in-world. Minimally requires SketchUp functionality to be merged into GE (which is possible, but not at all easy). Initially, it could just be used for guided tours, like for selling real estate. That&#8217;s something, but it&#8217;s still fairly niche.</p>
<p><strong>Socializing </strong>&#8211; To talk to people, you first need to find them. Say you fly to NY and see a hot-looking avatar nearby. What do you say? &quot;Hey, I see you&#8217;re also searching for French restaurants near 42nd st. Ooh, la la.&quot; Second Life (and others) offer the concept of personal spaces, or what I&#8217;d call HomeSpaces (like home pages on the web). Where is yours in a social GE? Is it tied to your real home? Do we invite people to come visit our HomeSpace, full of virtual furniture from Ikea and appliances from Sears? Yawn. Beyond the basic 3D MySpace everyone wants to do, the key to socializing is sharing context and doing (hopefully fun) things together.</p>
<p><strong>Networking. </strong>It might be cool to discover likable real-world neighbors (assuming knocking on doors is too intimidating). But apart from the obvious privacy issues, I&#8217;m not sure you need access to a whole virtual planet to meet the kid next door. The &quot;dating&quot; angle could certainly be made to work, after the privacy issues are solved. Adding a social network like SL has might ultimately allow you to find new friends easily, if Google works on the profiling and discovery tools. But what then? Social Networking by itself is fickle. Again, people need something to do together, or at least a purpose for spending time and forking over their personal information to a big corporation, even Google.</p>
<p><strong>Creative Exploration </strong>&#8211; Ah, here&#8217;s something. Say a group of people get together to turn a bit of empty virtual real estate into a hub of creativity, like Burning Man, where people of similar (or similarly altered) minds know to come. GE then becomes more of a showcase tool. Here, adding scripting to the client would be essential. Just looking at 3D models gets boring. They need to come alive, perhaps even with physical simulation. Go a little further and you have games (some of which are already done as mashups with Google Maps &#8212; but these could live inside the system, not outside). This is what SL seeks to do. So could Google do it better?</p>
<p><strong>Alternate Reality </strong>&#8211; AR is usually about overlaying fictional worlds onto the real one. But why not add fictional places to GE&#8217;s map too? So a group of people take the NY skyline and turn it into a fantasy land (middle ages, futuristic, etc.). That might be fun to build and explore. And there could even be a few games built there. But apart from the interesting juxtaposition of an Elven Forest across the Hudson from Jersey City, why does this need to live on a map of the real world? Certainly, GE wouldn&#8217;t want people searching for French restaurants to wind up with unreal results. (I can just see the mapping directions now: turn left at the big oak tree, down the rabbit hole, and 1.2mi across the swamp of eternal tears&#8230;). They may need some better separation between these two products, without sacrificing fun accidental discoveries. This has always been an issue for the &quot;layers&quot; approach.</p>
<p>Frankly, the most profound thing Google could do with Google Earth <em>right now </em>is like what they did for maps: <strong>enable 3D mashups. </strong>Any and all of the ideas above would get developed, tried and tested by others. But for 3D applications like GE, this is probably the most difficult technical hurdle of any I mentioned. Had Intrinsic Graphics (the makers of the 3D rendering layer inside GE) survived, I imagine it would be easier to have a nice, free Google Earth Toolkit for building new 3D apps, using GE and its powerful servers under the hood. But that didn&#8217;t happen, at least not yet.</p>
<p>On the other hand, two of the founders of Intrinsic Graphics are now at Google. So rather than have Google try to solve all these virtual worldly problems (as they tried with Orkut for social networks), I&#8217;d much rather see them open up the system in the way Microsoft has for its VE offering, as a component that others can build on or integrate, for free, but with ad revenue flowing back to Google, of course.</p>
<p>The risk to Google is much lower. They can still make gobs of money. And the potential wins are much greater than going it alone. At least, that&#8217;s what I would do.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.realityprime.com/articles/googles-virtual-world-redux/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
