Netflix is offering $1M to anyone who can improve their automated recommendation system by 10%, plus $50k for just a 1% improvement. It’s an interesting problem. Here they’re limited to, at best, a 1-5 rating per movie per user (assuming people cooperate), and from that they have to tell you what movies you might like. According to the contest FAQ, their Cinematch system doesn’t even take traits of the movie (director, genre, actors) into account. No wonder it sucks.
They give registered contestants (groups, ideally) about 1/10th of their actual collected data, AOL-style, with names anonymized. To better protect privacy, they say they’re also slightly randomizing the data. And they won’t give you the program that tests your accuracy. You have to upload your predictions (basically, filling in the blanks of the ratings they intentionally removed) and wait a week each time to see your score.
The area of research is called collaborative filtering, because just knowing you liked movies A and B tells you nothing about movie C without context. That context is what other people thought about those same movies. For example, identify and group the users who rate movies most similarly. Then, for a given user, find movies that others in his group saw but he didn’t, and use their aggregate ratings as the prediction. It works well for people who don’t see many movies. If you’re the trendsetter, you’re SOL.
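The group-and-aggregate idea above can be sketched in a few lines. This is a toy illustration, not Netflix’s actual algorithm: the data, the similarity measure (mean absolute agreement on co-rated movies), and all names are made up for the example.

```python
# Toy user-based collaborative filtering (hypothetical data and names).
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob":   {"A": 5, "B": 5, "C": 1, "D": 4},
    "carol": {"A": 1, "B": 2, "C": 5, "D": 2},
}

def similarity(u, v):
    """Agreement on co-rated movies; 1.0 means identical ratings."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    mean_diff = sum(abs(ratings[u][m] - ratings[v][m]) for m in common) / len(common)
    return 1.0 / (1.0 + mean_diff)

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        s = similarity(user, other)
        num += s * their[movie]
        den += s
    return num / den if den else None

print(round(predict("alice", "D"), 2))  # → 3.53
```

Alice rates like Bob (who gave D a 4) and unlike Carol (who gave it a 2), so her predicted rating for D lands near 4. The trendsetter failure mode is visible here too: if nobody similar to you has rated a movie, `den` stays 0 and there’s no prediction to make.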
But the real problem is in the 1-5 ratings. Amazon has this same problem with “customers who bought this also bought…” There isn’t enough real context for these decisions, no answer to the question “why?”