In this series I explore three aspects of the Life Grand Challenge. In the first part, I briefly described the Life Grand Challenges, estimated active participation and proposed that the current unfairness in the challenge may be what is holding back active participation. In the second part, I detailed the resources currently available to accuracy challenge participants and the possibility that our current generation is only looking for quick rewards. In this last part, I will describe the successful Netflix challenge and its crowd sourcing model.
Ever used IMDb or Rotten Tomatoes as an indicator of how good a movie is and whether you should watch it? I have, but lately I really need to find out who has been rating crap movies as good. For example, Animal Kingdom is the biggest piece of artsy fartsy crap I’ve ever seen, yet it got 97% on Rotten Tomatoes, while Tekken, the best movie ever, only got 5.0 on IMDb. This highlights the problem: just because someone else likes a movie doesn’t mean I would like it.
Netflix wanted to know what a customer would rate a movie, given that customer’s history of movie ratings. They had an existing implementation called Cinematch but wanted people to improve its accuracy. The competition rules/specifications in point form:
- The competition opened in 2006 and was set to end in 2011.
- A million dollar prize would be awarded to the person/team that improved accuracy by 10%. Once a participant hit this target, other participants were notified and had 30 days to try to beat that submission.
- A progress prize of $50,000 was awarded each year to the person/team that improved accuracy by 1% over the previous progress prize winner. Therefore, for the first year this was simply the submission with the best accuracy.
Netflix Data Set
The challenge participants were given two data sets.
- “The training data set consists of more than 100 million ratings from over 480 thousand randomly-chosen, anonymous customers on nearly 18 thousand movie titles.” Customer details were withheld and a random customer ID was used instead.
- “A qualifying test set is provided containing over 2.8 million customer/movie id pairs with rating dates but with the ratings withheld. These pairs were selected from the most recent ratings from a subset of the same customers in the training data set, over a subset of the same movies.”
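To make the shape of this data concrete, here is a minimal sketch of how the training ratings could be loaded for experimentation. I am assuming a simplified one-record-per-line layout of customer ID, movie ID, rating and date; the actual files Netflix distributed were organised differently, so treat the parsing details as illustrative only.

```python
from collections import defaultdict
import csv

def load_ratings(path):
    """Load ratings from a simplified CSV (illustrative layout only).

    Assumed line format (not the exact Netflix file layout):
    customer_id,movie_id,rating,YYYY-MM-DD
    """
    ratings = {}                      # (customer_id, movie_id) -> (rating, date)
    by_movie = defaultdict(list)      # movie_id -> list of ratings
    with open(path, newline="") as handle:
        for customer_id, movie_id, rating, date in csv.reader(handle):
            rating = int(rating)      # integer score, 1 to 5 stars
            ratings[(customer_id, movie_id)] = (rating, date)
            by_movie[movie_id].append(rating)
    return ratings, by_movie
```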
The movie rating is an integer score between 1 and 5 stars. The goal is to predict the withheld ratings in the test data set. The root mean square error (RMSE) is then calculated from the predicted and actual (i.e. the withheld) ratings. The RMSE achieved by the Cinematch program was 0.9525; therefore, the grand prize winner had to achieve 0.8572 (i.e. a 10% accuracy improvement). This was achieved in September 2009.
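As a quick sanity check on those numbers, here is a minimal sketch of the RMSE calculation between predicted and withheld ratings (the toy arrays are made up purely for illustration):

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error between predicted and withheld ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Toy example: predictions for five withheld customer/movie ratings
predicted = [3.8, 2.1, 4.5, 1.9, 3.0]
actual = [4, 2, 5, 1, 3]
print(rmse(predicted, actual))

# The grand prize target: 10% better than Cinematch's 0.9525
print(0.9525 * 0.90)   # 0.85725, i.e. the 0.8572 threshold
```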
How big was the crowd Netflix was sourcing from?
“There are currently 51051 contestants on 41305 teams from 186 different countries. We have received 44014 valid submissions from 5169 different teams.” This does not make the 500 or so Life Grand Challenge accuracy challenge participants, currently with NO submissions, seem so impressive. Now I will highlight the differences between the two crowd sourcing models.
First and foremost, with the Netflix challenge you are not competing with the company itself. There is no team of Cinematch programmers with access to the whole database competing against you. Second, everyone has the same data set to try to figure out a better algorithm. In massive contrast, for the Ion Torrent, larger sequencing centers will have larger data sets for their employees to work with. I currently have a pathetic TWO whole 314 data sets to work with. Yes, this is better than having ONE, but not much better. Meanwhile, accuracy challengers working at Sanger, Broad and BGI are swimming in this stuff, not to mention preferred and early access customers. This clearly highlights that Life Technologies is NOT democratizing sequencing; it is more like a dictatorship. Don’t be surprised if someone from BGI wins this competition. This unfairness is even worse for the speed and throughput challenge, which also favors the sequencing centers with massive budgets, i.e. Sanger, Broad and BGI!
The target for the Netflix challenge was an RMSE of 0.8572, set in 2006. This target did not change during the period of the competition. In contrast, the target for the accuracy challenge changes every 3 months. This quarter’s target is actually twice as hard to achieve as last quarter’s. If active participation was next to zero last quarter, this isn’t a good way of encouraging people to take up the challenge.
The milestone reward each year for the Netflix challenge was $50,000. In massive contrast, the Life Grand Challenge gives you absolutely nothing for all your hard effort. Kinda makes you wanna code to 4am each night and turn up to work as a zombie 😆 No way, I’m saving that for the release of Diablo 3 😛
There are two simple things that made the Netflix crowd sourcing model successful. These two concepts were taken from a great blog post discussing the Netflix Challenge. I have added my own two points: fairness and the value of ideas and concepts.
Both the Netflix challenge and the Life Grand Challenge have a 1 million dollar grand prize; however, the similarities end there. The Netflix challenge rewards the best submission of the year with $50,000. Imagine all the instant noodle packets you can buy with that money 😆 In addition, the challenge participants are publicly known, so the challenge is free advertisement to the world of their talents. This serves as a good non-monetary reward. At the moment, for the Grand Challenge, all participants, submissions and the leader board are kept hidden from the public.
The value of Ideas and Concepts
My suggestion is to reward ideas/concepts and not just solutions. In a way, this provides small rewards or milestones on the way to the best solution. These ideas and concepts may be valued at $5,000 or $10,000, and of course a solution that smashes the benchmark is still worth $1 million. The likelihood of someone in the public coming up with a full solution is low, but it is much higher for ideas/concepts that appear to have potential.
To expand a little further on the solution vs. idea/concept discussion: a solution must work within a restrictive framework over which computational analysis has no control. For example, most people working on a solution do not have an Ion Torrent and therefore cannot modify the flow order, chemistry, etc. If you view the accuracy challenge as a holistic problem with many factors coming together to achieve the goal, sadly a solution may be difficult to achieve. However, an idea/concept does not have to be limited by a restrictive framework. For example, an idea/concept could become a solution if things upstream were tweaked to take advantage of it.
Besides some people having bigger brains and no life, the Netflix challenge was quite fair. Everyone had the same data set available to them and they were not competing with Netflix employees. The Life Grand Challenge, on the other hand, is as fair as an election in Zimbabwe 🙂 In the Ion Community, the concept of offering a small grant to promising participants was entertained. Also, a 30% discount on a PGM purchase was offered to accuracy challenge participants. I argued that it would be impossible for someone to prove that they were active. It would go something like this: “Thanks for the discounted PGM, I tried improving accuracy but it was too hard so I gave up after 5 minutes.” The offer on Innocentive was promptly removed for other reasons. I do hope they come up with some innovative initiatives to improve an obvious lack of fairness.
The 454 homopolymer problem has existed since that technology was released. Why would Ion Torrent persist with a similarly flawed design? The 454 problem still persists, so it is not going to be solved in one quarter; therefore, it is important to keep the challengers’ motivation up at all times. My suggestion: a publicly available leader board. All programmers love to see their name in lights, particularly if it is at the top. Also a substantial quarterly prize, and I’m not talking about a free T-shirt 😛 Even the PGM users in the Ion Community who show off their best Ion Torrent runs are given a reward. Developers have been offered nothing, so it is little surprise there haven’t been any third-party plugins developed for the Torrent Browser.
This concludes my opinion-based blog series on the Grand Challenge and how it may be a “grand challenge” to improve the current model, which appears more like a public relations stunt than a true, fair competition. The criticism is harsh, but I believe I am voicing the concerns of many, and through feedback I have faith the competition will improve. My next few posts will be purely technical, because there is only so much opinion I can write before I start sounding like Abe Simpson.
My next post will be Part 3 on the Fundamentals of base calling, where I will outline the EXACT computational challenges that must be overcome to improve accuracy. This post will be released at the start of next week along with the source code used to generate the data for that series.
Disclaimer: For the good of all mankind! This is purely my opinion and interpretation. I have tried my best to keep all analyses correct. Opinions are like PGMs, everyone should have one 🙂