Fundamentals of base calling (Prequel)

In the coming days I will write and discuss the following major topics:

  1. Signal Normalization and Signal droop
  2. Correcting for incomplete extension and carry forward
  3. Bias and weakness in these current methods

This will be presented as a three-part series, where I will provide my own code or pseudocode to demonstrate each point.

Before I start posting this series there are a few things I must write. I think Life Technologies has made a great move in establishing the Ion Community, allowing the free flow of information amongst members. Yes, some have criticized it, saying it should be open like SEQAnswers, but the difference is that within this forum a lot of intellectual property is discussed and exchanged, and keeping it closed and requiring registration forces people to accept the terms and conditions. To join the TorrentDev part of the Ion Community, you do not need to own a PGM, and besides, this is the most interesting part of the Ion Community. The PGM Users section is just for people that like to beat their chest to show how great they are to others. I have benefited greatly from my interaction in the Ion Community, asking questions directly to people in R&D and getting an answer in less than 10 minutes in the morning. Yes, these people are bloody amazing! I will share a high level summary of what I have gained from discussing technical details with the active Life Tech contributors on the forum. I would like to thank Mike, Mel and especially Simon for their extremely detailed responses.

Secondly, it is great that Life Tech is willing to release the source code, that is, making it open source to members of the Ion Community. Unfortunately, I will not be discussing any details of the code in this blog; these details are best discussed within the Ion Community. I will only discuss the high level concepts using my own code, which does not rely on the code that was released. Looking at the approximately 20,000-50,000 lines of code, pulling it apart and summarizing the mathematics and modelling has been an excellent learning experience for me. It is quite rare that one has the opportunity to look at the nuts and bolts of commercial software. I think the current members of the Ion Community should take the time to understand and learn from the code. I know they are not currently doing this, as I seem to be the only one making them aware of things missing from the gzipped tarball that prevent the code from compiling, e.g. the missing CUDA development libraries. Some may argue, why look at the code in the first place? And wouldn’t that influence any contributions you can make? My response is… firstly, why reinvent the wheel? Secondly, it is good to know what methods have been attempted. The worst thing that could happen is that you attempt the same thing not knowing, and worse, your attempt is quite lousy and inferior.

Lastly, the 1 million dollar accuracy challenge is a great idea to get the whole community involved and interested. I like to call it the “homopolymer problem”. Life Tech has taken a great approach with this: they are saying yes, there is a bit to go to improve on these homopolymer errors, but with the community’s help we can get there. Compare this with Roche’s approach with the 454: “what homopolymer error, we don’t have any problems” and no, you cannot see how we are currently dealing with it.

I believe the OPEN approach taken by Life Tech, and the Ion Torrent being out there before the MiSeq, makes it an easy target for unfair criticism. Most independent blogs and discussions online are quite negative or critical of Ion Torrent data and accuracy. Where is the faith that this technology will improve? I bet everyone was saying the personal computer was crap when it first came out and that everyone should still be running computers the size of a house! We got an Ion Torrent PGM because we saw the innovation and promise in the technology. Just remember this: no one can criticize the MiSeq because NO ONE currently owns one.

Anyway, enough ranting. Stay tuned for my three-part series on the fundamentals of base calling.

Disclaimer: For the good of all mankind! These are purely my opinions and interpretations. I did not get paid by Life Tech to write this and am not under the influence of drugs while writing this.

13 responses to “Fundamentals of base calling (Prequel)”

  1. Quote: Secondly, it is great that Life Tech is willing to release the source code, that is making it open source to members of the Ion Community.

    That doesn’t sound like open source to me! If it was, you could repost the code. It sounds like “look but don’t touch” under an NDA (the details of the arrangement are not obvious on the Ion Torrent website).

    But I do agree it sounds more open than Roche 454 and the “Newbler” suite of off instrument applications.

    • Thanks for the comments. I fully agree with you that it isn’t as open source as github or sourceforge. The code is GPL and there are also terms and conditions when signing up for the Ion Community. The GPL legal documentation goes for pages and I never read the terms and conditions when joining. Since I’m a poor student, I’ll play it on the safe side and not discuss or put any of the code online. Besides most of the discussion will not require their code, my generic code is sufficient to demonstrate a point.

      • Wow – if IonTorrent really are licensing (some of) their tools as GPL, that would be really impressive. But if so, why do they hide it and sample data behind a strict NDA developer agreement? I looked at joining the “Ion Community” but the pages and pages of T&C put me off.

  2. The source code for Torrent Suite is licensed under a straight GPL v2. Yes, there are terms and conditions associated with the Ion Community, but there are T&Cs with the GPL v2. I encourage everyone to read them if they want the right to complain about them. I’m not certain where the “strict NDA developer agreement” language that Peter mentions is coming from.

    Open source, open comment.

  3. Hi Mike,
    Thanks for visiting my blog site. Also thanks for the comments and clarification. Your support means a lot 🙂

    I apologize to Peter and others if they have misinterpreted my blog post to imply that things are closed. I am just playing on the safe side, as there is a mountain of GPL legal paperwork to read and I don’t have the time or a lawyer. I would rather spend that time understanding the source code 😛

  4. Mike, I would link to the Ion Torrent T&C page if I could, but it is part of the registration process on which is cookie dependent. This is a 3rd party URL, but I notice it redirects to Jive Software so it’s probably OK. Since the T&C doesn’t have a URL of its own, that makes open comment hard. Certainly part of it sounds like an NDA to me, quote:

    Content may include certain restricted information, including, but not limited to, protocols, user guides, user bulletins, application notes, performance updates, tutorials and package inserts or other Content posted in an access restricted area of the site (collectively, the “Restricted Information”). You acknowledge that Life considers such Restricted Information to be of a confidential, proprietary or trade secret nature, and that the Restricted Information, is, therefore, Life’s confidential information. Accordingly, you agree that you (i) shall maintain the Restricted Information in confidence, (ii) shall limit dissemination of the Restricted Information to your directors, officers, employees, agents, and subcontractors who require such Restricted Information in order to perform their essential job functions, (iii) shall not disclose such Restricted Information, or any portion thereof, to any other person, and (iv) shall use such Restricted Information only as and only to the extent expressly authorized in writing by Life.

    Not having seen the community site, it is not clear what fraction of the content is “Restricted Information”.

    I notice you offer a couple of E. coli O104:H4 datasets as freely open downloads on aka but as a visitor to the main Ion Torrent website there is no hint of this. Look at and try searching some keywords like download and software.

    Your main website could be a lot more “open”, and making the GPL open source available easily would be a big plus point.

  5. Thanks for your comments Peter. Wow you did more reading than me on the T&C. Good work.

  6. Thanks Mike for the words of wisdom 🙂

  7. Hi Mike – Your lawyers seem to be hurting the openness drive. I really do think Ion Torrent would benefit in the long run by making more of your resources available directly from your website *without* requiring people to go to all the hassles of registering an email, agreeing to T&C, and so on. If your code is released as GPL, why not just put up a download page on the main Ion Torrent Website (with links to the community for people to follow up on if interested)?

  8. A halfway suggestion is to mention the source code on the main Ion Torrent or Applied Biosystems website and then direct users to the Ion Community if they wish to look at it. I think the biggest shame is that most people aren’t aware of the existence of the source code. As Peter and others on Twitter have mentioned, this gives Ion Torrent a big advantage over its competitors.

  9. James on Twitter has pointed out that hiding code behind an NDA is incompatible with the GPL, which I have expanded into a blog post (linking back to here).

  10. Hi Peter, thanks for posting a link to your blog post. Interesting read and thanks for the mentions 🙂 Hopefully they take some of your feedback on board and move towards a more open model. I guess something like this has not been done before in biotech (perhaps I’m wrong) and it takes time, or perhaps there are people in Life Tech that are still holding on to the past, similar to the dinosaurs in the movie and music industries.

  11. I have just removed comments as the authenticity and identity of the commenter could not be confidently verified. I have also removed comments in the past when it appeared Life Tech employees were advertising competitors’ products. This situation is obvious but sometimes it is not so obvious. It is important for all to note the Internet is full of trolls, and the comment trail at the bottom of blog posts is not immune to these trolls. I sincerely apologize for any inconvenience caused.
