Most head mounted VR gear brings me back to my teenage years, where my orthodontist tried to make me wear a night brace to straighten my teeth.
It works while you sleep, he said.
You try sleeping with your head in a vice, I said.
He didn’t care. He got paid $40/month regardless of how long it took. Needless to say, I soon got a new orthodontist. And I’ve kept trying on various VR gear too.
I originally favored the CAVE projection kind of display and built a few variations of my own, including a six-sided one at Disney. The main benefit is zero latency (ignoring stereo parallax changes) — in other words, the image is already there when you turn your head. No blurriness. The main downside, of course, is who has room or money for an 8′ cube in their living room. Not practical until we get digital wallpaper or big flexible roll-up screens.
Even so, I happily bought into the Oculus Rift’s Kickstarter, eager to try again. I love seeing people so enthusiastic about this stuff, especially new blood.
Though I’ve personally used the Rift for many minutes at a time, my own purchased dev kit is still sitting in its box, alas, waiting for me to find time to build something useful. The head-tracking latency was actually very good, but the original display made the world feel like it was made of Lite-Brite. If you’ve never tried it, here’s a good Oculus Rift simulator to try.
I just pre-ordered the 2nd gen dev kit too, which fixes much of the resolution issue, and I’m sure comes in an even nicer box.
I’m hopeful that Carmack can solve some of the rendering latency issues that fast OLED displays alone can’t. I did some research in this area too, fwiw. There’s a lot that can still be done to wring the delays out of various pipelines.
Some friends and I also got to try out the new Sony Morpheus HMD at GDC this week. We had to get in line the moment the expo doors opened, just to get a ticket to stand in line to wait to try. But it was worth it, I keep telling myself.
The resolution was impressive. The persistence of their LCD displays was not as good as the Rift’s. Now, I can’t be sure what they’re using inside, but I would have thought they’d throw some 4k SXRD panels in there, just like they use in their nicest projectors.
I thought those were akin to DLP in terms of super-fast switching time, but I’m not so sure anymore. Maybe there isn’t room for front-reflection in the optical path. I can say that the LCD-like images we saw seemed to be over-driven and washed out a bit, mostly suffering from slow switching times. Brightness was great, but black blacks were in short supply.
In any event, no one got nauseated, which is a small victory for those of us who can’t watch The Blair Witch Project without Dramamine. The Rift also does well on that front. But in both cases, using a simple laptop trackpad or arrow keys to navigate puts me back on the vomit comet.
A nice omni-directional treadmill might do the trick. Someone had one of those on display too, but I didn’t get to try it — seems to need special slippery shoes. But if we’re going through the trouble of a 4′ treadmill, why not just go back to using CAVEs? If it’s just a matter of integrating your furniture, my friends in MSR solved that nicely.
For what it’s worth, I still have my money on real see-through AR displays as the ultimate winner. Let me walk in the real world, augmented with new content. Yes, indeed.
It’s not the $19B price tag of Facebook’s WhatsApp acquisition that’s most amazing to me. That’s just math (lots of math). More users * more engagement * a wealthy/desperate suitor = bigger deals.
It’s not most amazing to me that the deal works out to over $250M per employee, if evenly split. For comparison, Google seems to be worth about $8.5M per employee, last I checked. Apple reportedly pulls in a whopping $2.3M yearly per employee in revenues alone. (Average Apple employees, how much of that do you get paid again?)
WhatsApp just did more and faster with fewer people. That’s what makes a great team. I’m sure everyone was amazing. Dysfunctional teams or people barely earn back their salaries, if they survive at all.
The motivation for the deal is not amazing at all, since Facebook desperately needs to connect the very people who don’t need Facebook to connect, i.e., the people you see every day. It seemed almost inevitable, given the trends.
No. What is most amazing to me is the powerful principle behind this simple note at right — one of the core principles that apparently made WhatsApp so popular with hundreds of millions of users. I found this via the Sequoia blog post about the sale, and I honestly had no idea.
Jan keeps a note from Brian taped to his desk that reads “No Ads! No Games! No Gimmicks!” It serves as a daily reminder of their commitment to stay focused on building a pure messaging experience.
It’s most amazing to me, and most inspiring, because there are so many professional CEOs and advisors out there who try to convince their startups that they need to collect and hoard as much user data as possible, then sell it surreptitiously, while also pushing ads and wringing out every last monetizable cent with games and gimmicks to keep people addicted and virally engaged.
Those same fine folks somehow duck out (or get fired) when the users finally complain, defect and disappear. And they apparently never learn from their mistakes. But they do come back, with the same tired story again and again.
WhatsApp proved them wrong and proved it 19 billion times over.
Build value for users. Give them what they want and need, every day. That’s the recipe for success. This kind of success is something I can truly appreciate and admire.
P.S. I’d like to think that this is the real reason Facebook bought them, given FB’s reliance on those same tired ads, games, etc… Maybe it wasn’t for more European and emerging market penetration. Maybe it wasn’t merely to disarm a growing competitor. Maybe it’s for Facebook to become more like them? Maybe that’s why they put Jan on the board. If so, good for them.
I’ve heard that credit card companies know when you’re cheating or about to get divorced. I’d imagine things like flowers, gifts and hotels would be a tip off, but general expenses probably approach double once a couple is separated.
This data about Facebook posts makes perfect sense too. People using FB will post more flirtatiously until the relationship starts. But after, why post so much? Even in real life, it’s rare for couples to make strong public displays of affection once they’re together. If they do, it may come more from insecurity about the relationship, or insensitivity to others’ feelings, than from some unparalleled eternal flame.
This highlights one of Facebook’s core challenges — how to capture sentiments shared between people who spend a lot of time together.
This idea reportedly comes from a competition that Meta (Space Glasses) is holding. The idea is to project a holographic image of your phone in space (using said glasses) and let you virtually interact with it, instead of taking the phone out of your pocket to do exactly the same.
Why is it a brilliant idea?
It’s simple. People get accustomed to their phones’ UIs. Projecting the phone holographically requires not a single new thought and changes nothing about the core experience. Well, it does lose out on touching that sleek and sexy touch-screen, feeling the nicely balanced weight of the phone in your hand, and of course the key sensors, like accelerometers (to a degree) and cameras (entirely), that certain phone experiences depend on.
So I guess that means you couldn’t run old augmented reality apps on your holographic phone for a recursive experience. Oh well. There goes a nice photo op.
Why is this a stupid idea?
Your head mounted device can [eventually] paint pixels anywhere you look. It can detect touch anywhere it can see your hands. Why would we limit ourselves to drawing a 4″ screen when we have an infinitely large screen on our head?
It’s a lot like saying, “Hey, we got used to small CRT TVs so let’s draw a small TV inside our brand new 60″ flat screen TV so people don’t have to learn something new.”
Interfaces for AR will run the gamut from holographic virtual actors who become your daily assistant, to making every physical surface in the world potentially interactive by touch, sight and sound. Why would we limit ourselves to UI mechanisms that were designed around the limits of small screens and touch?
Just for the experience of not having to take our phone out of our pocket? Are we really that lazy? If so, ask yourself how much you’d be willing to pay to use your phone without taking it out of your pocket. I’d pay maybe $1.
This really comes down to a core question about AR. Is it about being the ultimate hands-free device, principally meant to deliver us from holding our phone in our hands or up to our faces? Or is it about re-imagining the analog world with new digital layer(s) of content on top?
I can see an app like this being very popular, at least in the way the fart app is popular. That’s only because people’s imaginations are presently too limited. They just haven’t seen the best ideas yet.
On the other hand, it’s turning out that the most popular interface for your new 60″ flat-screen TV with billions of streaming video options is not some new fancy Xbox-like natural UI, but rather just your phone.
So what do I know? People may ultimately find ‘stupid’ brilliant.
The idea that most man-made objects can be represented with sweep surfaces (cylinders, tubes, squares, etc.) isn’t that new. Second Life primitives used exactly the same principle, with some interesting extensions for cuts, twists, tapers and so on.
But selecting photographic imagery based on implicit primitives, in-painting (hallucinating) the background and unseen object views, and (occasional) relighting of the object is all extremely clever and very useful. Combine this and a system that can relight virtual objects based on scene shadows and you have a paint program that can revise reality, at least virtually, but in a way that would fool almost anyone.
The end-goal of all this work is something I used to call “parametric 3D video” — which roughly means we take one or more 2D video streams, split out the objects, backgrounds, people into separate and fully adjustable pieces, send them as 3D content vs. pixels, and then re-synthesize the result from any angle at the receiving end, along with any changes you want to make.
3D (color + depth) video capture makes the problem much easier. Techniques like this paper are still needed to finish the job, but they can be much more automatic in terms of finding and cutting objects.
Here’s a great* Verge article on the AWE 2013 conference that wrapped up last week. I had the honor of speaking, along with a number of my co-workers. Although, as Tish points out, we’re not actually doing AR at Syntertainment, we’re very passionate about the field as one of several key enabling technologies.
* yes, of course mentioning me positively supports my opinion of your article, even if you get my name slightly wrong.
I was checking the Oculus VR site today to see when my dev kit will ship (April, apparently) and I noticed this interesting job posting:
- Design and implement techniques for optical flow and structure from motion and camera sensors
- Relevant computer vision research (3d optical flow, structure from motion, feature tracking, SLAM)
What this means to me is that Oculus may be trying to solve AR as well as VR.
Now, it’s possible they’re just trying to use optical flow to create an absolute reference frame for the rendered VR scene. The tiny gyros that track your head rotation tend to drift over time and corresponding magnetometers (basically, compasses) aren’t always reliable enough to help. It’s also desirable to know your absolute lateral translation at any given time, which onboard accelerometers can only guess at. So some amount of computer vision makes sense for VR, implying there will be cameras mounted on the VR glasses, if not already. (I’d certainly encourage them to put cameras facing your eyes too).
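To put rough numbers on that drift, here’s a minimal sketch of a complementary filter, one common way to blend a fast-but-drifting gyro with a slower absolute reference (a camera-derived heading, in this case). The bias, sample rate and blend factor are all made up for illustration, and none of this is a claim about what Oculus actually does.

```python
def integrate_gyro(rates_dps, dt):
    """Dead-reckon yaw by summing gyro rate readings (degrees/second)."""
    yaw = 0.0
    for rate in rates_dps:
        yaw += rate * dt
    return yaw


def complementary_filter(rates_dps, absolute_yaws, dt, alpha=0.98):
    """Blend integrated gyro yaw with an absolute yaw reference.

    alpha near 1.0 trusts the gyro for fast motion; (1 - alpha) slowly
    pulls the estimate back toward the absolute reference.
    """
    yaw = 0.0
    for rate, absolute in zip(rates_dps, absolute_yaws):
        yaw = alpha * (yaw + rate * dt) + (1.0 - alpha) * absolute
    return yaw


# Hypothetical scenario: the head is perfectly still for 60 seconds,
# but the gyro reports a constant 0.5 deg/s bias.
DT = 0.001                      # 1000 Hz sampling (assumed)
N = 60_000
biased_rates = [0.5] * N        # the true rate is 0.0
camera_yaw = [0.0] * N          # assumed drift-free vision reference

gyro_only = integrate_gyro(biased_rates, DT)
fused = complementary_filter(biased_rates, camera_yaw, DT)
print(f"gyro only: {gyro_only:.2f} degrees off")
print(f"fused:     {fused:.4f} degrees off")
```

With these made-up numbers the gyro-only estimate wanders off by about 30 degrees in a minute, while the fused estimate stays within a few hundredths of a degree. A real tracker would fuse all three axes and deal with a noisy, laggy camera, which this sketch ignores.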
However, “structure from motion” and SLAM would be better suited to 3D tracking such that a video camera could overlay virtual 3D objects on video of the real world, like many of the AR demos you see on phones and tablets today. Done well, it gives much more precise 3D transformations than VR might need, and SLAM is quite expensive to compute traditionally.
In fact, if you were doing VR glasses with cameras, tracking your arms and legs would be the first problem I’d solve. Getting 3D input right is very important. Oculus seems to be thinking of that as well. (see this other job posting).
So let’s assume AR uses for this SLAM work and see where that leads. The idea is you’d put a camera (or two) on the goggles to capture the real world at a similar field-of-view to the VR display, pipe that video into your 3D renderer with some depth-per-pixel, and you’d theoretically have a better AR solution than Google Glass or similar HUD approaches.
On the plus side, pixels captured from the real world can be correctly occluded by pixels in the virtual scene. That’s very hard to do with see-through AR, which is plagued by the ghostly transparent images we’ve come to call “holograms,” for their ethereal qualities. They’re transparent because there’s currently no solution on the market to stop the natural light that comes from the real world other than to darken the glasses everywhere, so they go from AR to VR anyway…
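The occlusion advantage is easy to sketch if you assume you really do have depth-per-pixel for both the camera image and the virtual render. What follows is an ordinary per-pixel depth test over a single scan line; the `composite` function and the sample data are hypothetical, just to show the idea.

```python
def composite(camera_rgb, camera_depth, virtual_rgb, virtual_depth):
    """Per-pixel depth test: keep whichever surface is nearer the eye.

    virtual_depth holds None wherever the renderer drew nothing.
    """
    out = []
    for cam_px, cam_z, virt_px, virt_z in zip(
        camera_rgb, camera_depth, virtual_rgb, virtual_depth
    ):
        if virt_z is not None and virt_z < cam_z:
            out.append(virt_px)   # virtual object hides the real world
        else:
            out.append(cam_px)    # real world hides the virtual object
    return out


# One scan line: a real hand (0.5 m) in front of a real wall (3 m),
# and a virtual robot standing 1.5 m away.
camera_rgb = ["wall", "hand", "wall", "wall"]
camera_depth = [3.0, 0.5, 3.0, 3.0]
virtual_rgb = [None, "robot", "robot", None]
virtual_depth = [None, 1.5, 1.5, None]

scanline = composite(camera_rgb, camera_depth, virtual_rgb, virtual_depth)
print(scanline)
```

The hand correctly covers the robot and the robot correctly covers the wall, with no ghostly transparency. The hard part, of course, is getting trustworthy depth for the real world in the first place.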
Of course, latency is the main issue with the camera-based-AR approach. If you turn your head with see-through AR, the real world is visible with zero lag, and the virtual part can be dialed down whenever it would be nauseating. With camera-based AR, you typically have a few frames of latency added to the queue, which can induce nausea and disorientation, especially if the goal is to just walk around with these and not hit or be hit by “things.” Latency has to be extremely low, like 4 ms.
There are tricks to avoid this. Cheaper cameras use what’s called a “rolling shutter” which means they’re effectively only capturing one line of pixels at a time, while that line sweeps across the image at some high rate. This is what causes those funny skewing artifacts when you move your cell phone while taking pictures or video — the picture wasn’t all taken at one point in time like an old film camera or high-end capture device.
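Here’s a toy simulation of that skew, assuming a sensor that reads one row per time step while a thin vertical bar slides sideways. The function name and the numbers are invented for illustration.

```python
def capture(rows, cols, bar_x0, px_per_line):
    """Image a vertical bar that moves px_per_line pixels per row readout.

    px_per_line = 0 models a global shutter (every row sampled at once);
    anything else models a rolling shutter, where row r is sampled r
    line-times later and the bar has moved in the meantime.
    """
    image = []
    for r in range(rows):
        bar_x = bar_x0 + px_per_line * r
        image.append("".join("#" if c == bar_x else "." for c in range(cols)))
    return image


global_frame = capture(rows=6, cols=12, bar_x0=2, px_per_line=0)
rolling_frame = capture(rows=6, cols=12, bar_x0=2, px_per_line=1)

print("\n".join(global_frame))
print()
print("\n".join(rolling_frame))
```

The first frame shows a straight vertical bar. The second shows the same bar leaning over, because each row caught it at a slightly later moment.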
The challenge is how to couple a rolling shutter to a “beam-chasing” rendering algorithm, which is the same idea applied in reverse: to the visual output instead of the input. If you do that right (and I’m sure there is a way), then the distance between the camera’s “scan line” and the currently rendered “raster line” is your actual latency, which would be measured in low milliseconds instead of full frames. Cool stuff.
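As a back-of-the-envelope check on that claim, here’s the arithmetic, assuming the camera and display sweep in lockstep at the same line rate. The resolution, refresh rate and 20-line gap are assumptions, not measurements from any real device.

```python
def line_gap_latency_ms(camera_line, raster_line, lines_per_frame, frame_rate_hz):
    """Time between a line being captured and the same line being displayed.

    The camera is assumed to run ahead of the display by some number of
    lines; the modulo handles the camera wrapping into the next frame.
    """
    line_time_ms = 1000.0 / (frame_rate_hz * lines_per_frame)
    gap_lines = (camera_line - raster_line) % lines_per_frame
    return gap_lines * line_time_ms


full_frame_ms = 1000.0 / 60
chased_ms = line_gap_latency_ms(
    camera_line=520, raster_line=500, lines_per_frame=1080, frame_rate_hz=60
)
print(f"one full frame:   {full_frame_ms:.2f} ms")
print(f"20-line gap:      {chased_ms:.3f} ms")
```

At 1080 lines and 60 Hz, a 20-line gap works out to roughly a third of a millisecond, versus almost 17 ms for a single buffered frame.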
But that’s not a full solution — the 99% of the scan lines we rendered in the past (this “frame”) would be stuck in their old positions while our heads turn. But at least the newest rendered pixels would be more correct, right?
A full solution includes a very low-latency image capture of the physical scene, 3D pose estimation (SLAM or similar) while a high-fidelity render of the virtual scene is conducted, and then a very high-frame-rate (120-240 Hz) continuous re-rendering of that scene based on the latest head rotation and translation measurements from the gyros and accelerometers.
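For the re-rendering step, the simplest version is to slide the already-rendered image sideways by however far the head has turned since the render started. The sketch below assumes a pinhole projection and yaw-only rotation, ignores translation and lens distortion, and uses a hypothetical function name and made-up numbers.

```python
import math


def reprojection_shift_px(yaw_at_render_deg, yaw_now_deg, width_px, hfov_deg):
    """Horizontal shift, at the image center, that compensates for head yaw.

    The sign convention (which way is positive) is an arbitrary choice here.
    """
    focal_px = (width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    delta_rad = math.radians(yaw_now_deg - yaw_at_render_deg)
    return focal_px * math.tan(delta_rad)


# Assumed: the head turns at 100 deg/s and the frame took 16.7 ms to render.
head_speed_dps = 100.0
render_time_s = 0.0167
shift = reprojection_shift_px(
    yaw_at_render_deg=0.0,
    yaw_now_deg=head_speed_dps * render_time_s,
    width_px=1280,
    hfov_deg=90.0,
)
print(f"image is stale by about {shift:.1f} pixels")
```

With these numbers the picture is off by somewhere between 18 and 19 pixels by the time it’s shown, which is why you’d want to correct it at 120-240 Hz with fresh gyro readings rather than wait for the next full render.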
This is indeed a kind of magic. Could Oculus do it? They seem to have raised enough money. So best of luck to them.
TL;DR: an artist is iterating through every combination of pixels to produce every possible digital photograph so as to explore the concept of infinity.
This is less interesting to me as an exploration of infinity, since there is a provably finite number of unique combinations of pixels for any given resolution and color depth.
Of course, for any reasonably sized image, say 640×480 at 24bpp, the number is exceedingly large. That’s 7,372,800 bits per image, so it’s about 2 to the 7.4 million in this case, which would take more time than the age of the universe to merely count, given computers that today can typically count to only 2^64 sometime before you die. That’s not really that interesting, because counting images is no way to find anything interesting in there.
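The arithmetic is quick to check. Python’s integers are arbitrary precision, so the sketch below holds the exact count in memory; it just avoids printing it.

```python
import math

WIDTH, HEIGHT, BITS_PER_PIXEL = 640, 480, 24

total_bits = WIDTH * HEIGHT * BITS_PER_PIXEL   # bits needed for one image
num_images = 2 ** total_bits                   # exact count of distinct images

# Number of decimal digits in num_images, computed from logarithms
# rather than by converting the whole integer to a string.
decimal_digits = math.floor(total_bits * math.log10(2)) + 1

print(f"bits per image:   {total_bits:,}")
print(f"digits in count:  {decimal_digits:,}")
```

So the count of possible 640×480 images is a number with roughly 2.2 million decimal digits. For comparison, 2^64 has only 20.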
What’s more interesting to me is the underlying idea that every image is just a number. If you see an array of colored pixels as the mere bits they are, then it’s more obvious that there is a distinct integer or index matched to each unique image. A paint program, like Photoshop, is not actually helping you “draw” anything, in this sense, but merely changing which of those strings of bits are being displayed at any given time.
Yes, so therefore a paint program is just navigating through a pre-determined space of all possible images of a given size. Painting just one pixel is enough to move a little or a lot in that finite space.
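Here’s a small sketch of that mapping, using a made-up 2×2 grayscale “image” stored as a flat list of 8-bit values. The packing order (first pixel most significant) is an arbitrary choice; any fixed order gives a valid numbering.

```python
def image_to_index(pixels, bits_per_pixel=8):
    """Pack pixel values into one integer; the first pixel is most significant."""
    index = 0
    for value in pixels:
        index = (index << bits_per_pixel) | value
    return index


def index_to_image(index, num_pixels, bits_per_pixel=8):
    """Inverse of image_to_index: unpack an integer back into pixel values."""
    mask = (1 << bits_per_pixel) - 1
    pixels = []
    for _ in range(num_pixels):
        pixels.append(index & mask)
        index >>= bits_per_pixel
    return pixels[::-1]


blank = [0, 0, 0, 0]
nudge_last = [0, 0, 0, 1]    # paint the last pixel one shade brighter
nudge_first = [1, 0, 0, 0]   # paint the first pixel one shade brighter

small_move = image_to_index(nudge_last) - image_to_index(blank)
big_move = image_to_index(nudge_first) - image_to_index(blank)
print(small_move, big_move)
```

The same one-shade brush stroke moves you by 1 in the first case and by 16,777,216 in the second. That’s the “a little or a lot” part: how far you travel depends entirely on which pixel you touch.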
You didn’t make that nice piece of art, you merely steered the computer towards it.
But what’s even more interesting is the idea that all of those images already and provably exist. That’s right, if you know the number, you know the image and vice-versa. Whether anyone’s ever seen the funny picture of George W. Bush lighting his hair on fire isn’t the point — that image definitely exists, and at 640×480 is going to be very clear. Whether it’s a photo of anything real is another story. And since movies are just strings of strings of bits, the same goes for video too.
So, one might ask, what the hell is copyright for digital imagery but a claim of owning a specific number, or set of numbers that represent the same approximate image? This would go for books and music too, btw, but let’s not wander.
I was so fascinated by this concept when I was younger that I wrote an April Fools post for an old computer graphics usenet group back in 1993. Thanks to Google usenet archives, I have recovered the text (yes, I really wrote this and posted it anonymously while I worked at Disney):
October 21, 1993 — [Geneva] Two Swiss scientists announced last week their stunning discovery of a method for generating and storing any conceivable picture using ordinary personal computers. Called The Database of Every Picture Imaginable, or DOEPI, their system is currently seeking patent and copyright protection in virtually every industrialized nation, including the United States.
Other image generation and storage technologies have been introduced in the past to help cope with the incredible demands of Multimedia and Video-Dialtone but, according to co-inventor Dr. Francois La Tete, of the Alpine Institute, a well-respected Swiss mathematical society, DOEPI is the first system which is capable of storing literally every image. “Our proprietary algorithm is the first of its kind,” says La Tete. “It can compress every image into such a compact space that the software can run with less than one megabyte of memory.”
Indeed, the performance of their system is impressive. Independent experts have confirmed that when fed a “bit-index-code” (a string of 1s and 0s which tells the database how to find the proper image), the database can produce the correct image in less than two seconds.
According to La Tete, the original developer of the software, the technique uses an extremely simple principle but one which obviously has eluded the rest of the world so far. Like many inventors, he came up with the idea indirectly. “I was trying to automate the process of collecting images from various FTP sites,” he explains shyly, “when I realized that I could simply create the images myself.” From that point on, his personal computer worked day and night to generate the images he desired.
But Alfonso Marzipani, La Tete’s business partner and former Human Genome Project director, immediately realized the practical side of the invention. He understood that a technique that could generate any picture could be used to create pictures never before seen. “We used [La Tete’s] stuff to make a movie about some scary dinosaurs,” explains Marzipani. “Then we saw the same exact thing in a movie I can’t name for legal reasons. I said, ‘Frankie,’ we’re on to something big.”
One year later, Marzipani claims the database is complete. The energetic team has, in addition to filing for software patents, filed for blanket copyrights on all of the images stored in the database. “If we make the images first, we should own them, right?” claims Marzipani.
Copyright Offices seem to agree. Nearly twenty countries have already granted blanket copyrights to Marzipani’s operating company, La Monde, SpA. Other countries, like the United States, are more cautious. According to US Copyright Chairperson Ingrid Dingot, such a broad copyright must be seriously considered. “A blanket copyright might mean that they might own any image anyone else tries to create,” says Dingot. “That might have an impact on the US economy.”
To conclusively determine if DOEPI actually does contain every image, the USCO has enlisted James Farrel, an independent data retrieval expert. Farrel has begun the laborious process of printing copies of all of the images in the database, one by one. It is estimated that a full dump of the database will require several trillion years and more paper than exists on the planet.
In the meantime, he has used a more direct approach. Seven major entertainment companies have come together to donate their accumulated libraries of images to Farrel’s effort in the hopes of proving the DOEPI database incomplete. “If we can find even one image they don’t have,” says Arturo Nakagawa, CEO of Sony America, “then their claim is false.”
So far, Farrel has searched for nearly ten thousand images with complete success. “It’s incredible,” says Farrel. “After they compute the bit-index from the control image, it takes only a second or two to find the matching image in the database. Every damn time.”
But according to La Tete, the software isn’t perfect. He admits that the size of the bit-index-code, sometimes in excess of one megabyte, or eight million bits, is overly cumbersome and hints that the next generation of the software will reduce this requirement by a hundred times or more. “At that point,” says La Tete, “our software will be used by nearly every person on the planet.”
But it appears as if La Tete and Marzipani may have their way before the improved software is ever released. “We may have to grant the copyrights,” admits Dingot candidly. “Even the one for sequencing a series of still images as a motion picture.” Indeed, the future of ownership of visual imagery seems bleak.
But the battle isn’t over. Michael Eisner, CEO of the Walt Disney Company, has countered threat with threat. “If their database contains every image, then it contains Disney property,” says Eisner. “Those two owe the Disney Company a large amount of money.” This battle may go over to Disney, with a record of success at this kind of clear-cut property claim. But only time will tell who will win the war.
[AP] -end included article. Reprinted without permission.
I had attributed this to some fake users, but now you know.
Here’s a great interview with my former CEO/CTO, the brilliant Michael T. Jones, in the Atlantic magazine. Link goes to the extended version.
[BTW, The Atlantic seems to be on a tear about Google lately (in a good way), with John Hanke and Niantic last month and lots on Glass recently as well. With that and Google getting out of federal antitrust hot water, it seems they're definitely doing something right on the PR front.]
Quoting Michael on the subject of personal maps:
The major change in mapping in the past decade, as opposed to in the previous 6,000 to 10,000 years, is that mapping has become personal.
It’s not the map itself that has changed. You would recognize a 1940 map and the latest, modern Google map as having almost the same look. But the old map was a fixed piece of paper, the same for everybody who looked at it. The new map is different for everyone who uses it. You can drag it where you want to go, you can zoom in as you wish, you can switch modes–traffic, satellite–you can fly across your town, even ask questions about restaurants and directions. So a map has gone from a static, stylized portrait of the Earth to a dynamic, interactive conversation about your use of the Earth.
I think that’s officially the Big Change, and it’s already happened, rather than being ahead.
It’s a great article and interview, but I’m not so sure about the “already happened” bit. I think there’s still a lot more to do. From what I see and can imagine, maps are not that personal yet. Maps are still mostly objective today. Making maps more personal ultimately means making them more subjective, which is quite challenging but not beyond what Google could do.
He’s of course 100% correct that things like layers, dynamic point of view (e.g., 2D pan, 3D zoom) and the like have made maps much more customized and personally useful than a typical 1940s paper map, such that a person can make them more personal on demand. But we also have examples from the 1940s and even the 1640s that are way more personal than today.
For example, consider the classic pirate treasure map at right, or an architectural blueprint of a home, or an X-ray that a surgeon marks up to plan an incision (not to mention the lines drawn ON the patient — can’t get much more personal than that).
Michael is right that maps will become even more personal, but only after one or two likely things happen next IMO: companies like Google know enough about you to truly personalize your world for you automatically, AND/OR someone solves personalization with you, collaboratively, such that you have better control of your personal data and your world.
This last bit goes to the question of the “conversation,” which I’ll get to by the end.
First up, we should always honor the value that Google’s investments in “Ground Truth” have brought us, where other companies have knowingly devolved or otherwise strangled their own mapping projects, despite the efforts of a few brave souls (e.g., to make maps cheaper to source and/or more personal to deliver). But “Ground Truth” is, by its very nature, objective. It’s one truth for everyone, at least thus far.
We might call the more personalized form of truth “Personal Truth” — hopefully not to confuse it with religion or metaphysics about things we can’t readily resolve. It concerns “beliefs” much of the time, but beliefs about the world vs. politics or philosophy. It’s no less grounded in reality than ground truth. It’s just a ton more subjective, using more personal filters to view narrow and more personally-relevant slices of the same [ultimately objective] ground truth. In other words, someone else’s “personal truth” might not be wrong to you, but wrong for you.
Right now, let’s consider what a more personal map might mean in practice.
A theme park map may be one of the best modern (if not cutting edge) examples of a personal map in at least one important sense — not that it’s unique per visitor (yet) — but that it conveys extra personally useful information to one or more people, but certainly not to everyone.
It works like this. You’re at the theme park. You want to know what’s fun and where to go. Well, here’s a simplified depiction of what’s fun and where to go, leaving out the crowds, the lines, the hidden grunge and the entire real world outside the park. It answers your biggest contextual questions without reflecting “ground truth” in any strict sense of the term.
Case in point: the “Indiana Jones” ride above is actually contained in a big square building outside the central ring of the park you see here. But yet you only see the entrance/exit temple. The distance you travel to get to and from the ride is just part of the normal (albeit long) line. So Disney safely elides that seemingly irrelevant fact.
Who wants to bet that the scale of the Knott’s Berry Farm map is anywhere near ground truth?
Now imagine that the map can be dynamically customized to reveal only what you’d like or want to see right now. You have toddlers in tow? Let’s shrink most of the rollercoasters and instead blow up the kiddie land in more detail. You’re hungry? Let’s enhance the images of food pavilions with yummy food photos. For those into optimizing their experience, let’s also show the crowds and queues visually, perhaps in real-time.
A Personal Map of The World is one that similarly shows “your world” — the places and people you care most about or are otherwise relevant to you individually, or at least people like you, collectively.
Why do you need to see the art museum on your map if you don’t like seeing art? Why do you need to see the mall if you’re not going shopping or hanging out?
The answer, I figure, is that Google doesn’t really know what you do or don’t care about today or tomorrow, at least not yet. You might actually want to view fine art or go shopping, or plan an outing with someone else who does. That’s often called “a date.” No one wants to “bubble” you, I hope. So you currently get the most conservative and broadest view possible.
How would Google find out what you plan to do with a friend or spouse unless you searched for it? Well, you could manually turn on a layer: like “art” or “shopping” or “fun stuff.” But a layer is far more like a query than a conversation IMO — “show me all of the places that sell milk or cheese” becomes the published “dairy layer” that’s both quite objective and not much more personal than whether someone picks Google or Bing as their search engine.
Just having more choices about how to get information isn’t what makes something personal. It makes it more customized perhaps… For truly personal experiences, you might think back to the treasure map at the top. It’s about the treasure. The map exists to find it.
Most likely, you want to see places on the map that Google could probably already guess you care about: your home, your friends’ homes, your favorite places to go. You’d probably want to see your work and the best commute options with traffic at the appropriate times, plus what’s interesting near those routes, like places that sell milk or flowers on the way home.
Are those more personal than an art layer or even a dairy layer? Perhaps.
Putting that question aside for a moment, an important and well known information design technique focuses on improving “signal to noise” by not just adding information but more importantly removing things of lesser import. You can’t ever show everything on a map, so best to show what matters and make it clear, right?
City labels, for example, usually deter adjacent labels of less importance (e.g., neighborhoods) to better stand out. You can actually see a ring of “negative space” around an important label if it’s done properly.
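One common way to get that effect (I’m not claiming it’s how any particular map product does it) is a greedy pass: place labels from most to least important, and drop any label that would intrude on the padded box of one already placed. The labels, importance scores and pixel rectangles below are invented.

```python
def overlaps(a, b):
    """True if two axis-aligned rectangles (x, y, width, height) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def pad(rect, margin):
    """Grow a rectangle by margin on every side (the 'negative space')."""
    x, y, w, h = rect
    return (x - margin, y - margin, w + 2 * margin, h + 2 * margin)


def place_labels(labels, margin=4):
    """labels: list of (name, importance, rect). Returns the names that survive."""
    kept = []  # padded rectangles of labels already placed
    names = []
    for name, _importance, rect in sorted(labels, key=lambda item: -item[1]):
        if any(overlaps(rect, padded) for padded in kept):
            continue  # a more important label already claimed this space
        kept.append(pad(rect, margin))
        names.append(name)
    return names


labels = [
    ("Neighborhood A", 2, (105, 52, 60, 10)),   # sits on top of Big City
    ("Big City", 10, (100, 50, 80, 12)),
    ("Other City", 8, (300, 60, 50, 12)),
    ("Neighborhood B", 1, (182, 50, 40, 10)),   # clear of the text, inside the margin
]
visible = place_labels(labels)
print(visible)
```

Neighborhood B doesn’t actually touch the Big City text, but it falls inside the 4-pixel margin, so it gets dropped anyway. That margin is the ring of negative space.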
In the theme park map example, we imagined some places enlarged and stylized to better convey their meaning to you, like with the toddler-friendly version we looked at. That’s another way to enhance signal over noise — make it more personally relevant. Perhaps, in the general case, your house is not just a literal photo of the structure from above, but rather represented by a collage of your family, some great dinners you remember, your comfy bed or big TV, or all of the above — whatever means the most to you.
That’s also more personal, is it not?
Another key set of tools in this quest concerns putting you in charge of your data, so you can edit that map to suit and even pick from among many different contexts.
Google already has a way to edit in their “my maps” feature. But even with the vast amount of information they collect about us, it’s largely a manual or right-click-to-add kind of effort. Why couldn’t they draw an automatic “my maps” based on what they know about us already? Why isn’t that our individual “base layer” whenever we’re signed in, collecting up our searches in an editable visual history of what we seem to care about most?
Consider also, why don’t they show subjective distances instead of objective ones, esp. on your mobile devices? This is another dimension of “one size fits all” vs. the truly personal experience to which we aspire.
A “subjective distance” map also mirrors the theme park examples above. If you’re driving on a highway, places of interest (say gas stations) six miles down the road but near an off-ramp are really much “closer” than something that’s perhaps only 15 feet off the highway, but 20 feet below, behind a sound wall and a maze of local streets and speed bumps.
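One way to make “subjective distance” concrete is to rank places by travel cost over the road network rather than by straight-line distance. The sketch below uses a standard shortest-path search (Dijkstra’s algorithm) over a tiny graph. The node names, the driving minutes, and the mileage figures are all invented to mirror the highway example, not taken from any real map data.

```python
import heapq


def travel_minutes(graph, start):
    """Dijkstra: cheapest travel time from start to every reachable node."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, minutes in graph.get(node, []):
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return best


# Hypothetical road network; edge weights are made-up driving minutes.
graph = {
    "car": [("off_ramp", 6.0)],                      # six miles of open highway
    "off_ramp": [("gas_station", 1.0), ("local_streets", 4.0)],
    "local_streets": [("shop_behind_wall", 8.0)],    # maze and speed bumps
}
# Straight-line distance from the car in miles (15 feet is about 0.003 miles).
straight_line_miles = {"gas_station": 6.0, "shop_behind_wall": 0.003}

minutes = travel_minutes(graph, "car")
by_subjective = sorted(straight_line_miles, key=lambda place: minutes[place])
by_objective = sorted(straight_line_miles, key=lambda place: straight_line_miles[place])
```

Sorted by straight-line miles, the shop behind the sound wall comes first. Sorted by travel minutes, the gas station does, which is the ordering a driver actually experiences.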
How do you depict that visually? Well, for one, you need to start playing more loosely with real world coordinates and scale, as those cartoon maps above already do quite well. Google doesn’t seem to play with scale yet (not counting the coolness of continuous zoom — the third dimension). I’m not saying it’s easy, given how tiled map rendering works today. But it’s certainly possible and likely desirable, especially with “vector” and semantic techniques.
For a practical and well-known example, consider subway maps. They show time-distance and conceptual-distance while typically discarding Cartesian relationships (the usual mode for most maps we use today).
I have no idea where these places (below) are in the real world, and yet I could use this to estimate travel time and get somewhere interesting. And in this case, I don’t even need a translator.
Consider next the role of context. Walking is a very different context than driving when it comes to computing and depicting more personalized distance relationships. If I’m walking, I want to see where I can easily walk and what else is on the way. I almost certainly don’t want to walk two hours past lunch to reach a better restaurant. I’m hungry now. And I took the train to work today, don’t you remember?
Google must certainly know most of that by Now (and by “Now” I mean “Google Now”). So why restrict its presence to impersonal pop up cards?
Similarly, restaurants nearby are not filtered by Cartesian distance, but rather by what’s in this neighborhood, in my interest graph, and near something else I might also want to walk to (e.g., dinner, movie, coffee == date) based on the kinds of places we (my wife and I) might like.
Context is everything in the realm of personal maps. And it seems context must be solicited in some form. It’s extremely hard to capture automatically partly because we often have more than one active context at a time — I’m a husband, a father, a programmer, a designer, a consumer, a commuter, and a friend all at the same time. So what do I want right now?
Think about how many times you have bought a one-time gift on Amazon only to see similar items come up in future recommendations. That’s due to an unfortunate lack of context about why I bought that and what I want right now. On the other hand, when I finish reading a book on my Kindle, Amazon wisely assumes I’m in the mood to buy another one and makes solid recommendations. That’s also using personal context, by design.
The trick, it turns out, is figuring out how to solicit this information in a way that is not creepy, leaky, or invasive. That same “fun factor” Michael talks about that made Google Earth so compelling is very useful for addressing this problem too.
Given what we’ve seen, I think Google is probably destined to go the route of its “Now” product to address this question. Rather than have a direct conversation with users to learn their real-time context and intent and thus truly personalize maps, search, ads, etc., Google will use every signal and machine learning trick they can to more passively sift that information from the cumulative data streams around you: your email, your searches, your location, and so on.
I don’t mean to be crude, but it’s kind of like learning what I like to eat from living in my sewer pipes. Why not just ask me, inspector?
I mean, learning where my house is from watching my phone’s GPS is a nice machine learning trick, but I’m also right there in the phone book. Or again, just ask me if you think you can provide me with better service by using that information. If you promise not to sell it or share it and also delete it when I want you to, I’m more than happy to share, esp if it improves my view of the world.
So why not just figure out how to better ask and get answers from people, like other people do?
If the goal is to make us smarter, then why not start with what WE, the users, already know, individually and collectively?
And more importantly, is it even possible to make more personal maps without making the whole system more personal, more human?
The answer to what Google can and will do probably comes down to a mix of their company culture, science, and the very idea of ground truth. Data is more factual than opinions, by definition. Algorithms are more precise than dialog. It’s hard to gauge, test, and improve based on anyone’s opinions or anything subjective like what someone “means” or “wants” vs. what they “did” based on the glimpses one can collect. Google would need a way of “indexing” people, perhaps in real-time, which is not likely to happen for some time. Or will it?
When it comes to “Personal Truth” vs. “Ground Truth,” the perception and context of users are what matter most. And the best way to learn and represent the information is without a doubt to engage people more directly, more humanely, with personalized information on the way in and on the way out.
This, I think, is what Michael is driving at when he uses the word “conversation.” But with complete respect to Michael, the Geo team, and Google as a whole, I think it’s still quite early days, and I’m looking forward to what comes next.
In 2002, I wrote a sci-fi short story titled Blockbuster that depicted a world in which individual people go IPO. In such a world, investors would want to start young, picking the winners early, funding their education and public launches in exchange for “stock” in the person and, effectively, a cut of their future earnings. I mean, if corporations can be people, then why can’t people be corporations?
It’s a dystopian vision, in which the main character, the most successful such funded individual in history, groomed from childhood for ultimate success, bankrupts himself in every possible way to finally become “fully self-owned.”
We should never do this in real life.
It’s not that there is no value in attending a top kindergarten, a great private school, the best Ivy League university, the best startup incubator, etc. There’s generally a reason why these are vaunted. But when we place so much value on the promise of success vs. the actual evaluation of good ideas, we set ourselves up for a kind of “blockbuster effect” that plagues movies and books, in which only the most fundable ideas get supported, because no one wants to take a risk. Everyone wants a winner.
For example, the funding model sounds reasonable. Instead of a loan, individuals promise a cut of future earnings. Imagine if universities worked this way (ignoring alumni contributions, which are voluntary). Universities would by nature want to accept only students who will likely make a lot of money. And that would often mean picking students who start out with a lot of money or valuable connections, or who attended only the best kindergartens, etc.
And that’s unfortunately a lot like the admission process for the best Ivy League universities already. Do we want all of life to be like that? What do we lose by focusing only on the fat head and losing the long tail?
Of course, there’s something to be said for bringing back the benefactor model, in which the wealthy don’t just invest in people (artists, writers, visionaries) for the sake of profit, but to make the world a richer place.