Ion Torrent – Rapid Accuracy Improvements

The periodic public release of data sets by Life Technologies and others in the scientific community has allowed me to perform a “longitudinal study” of the improvements made to the Ion Torrent. In fact, the last few months have been quite exciting, with Ion Torrent engaging the community through public releases of data along with source code. This has made the whole scientific community feel, in some way, part of the action. In this three-part series, which will run in parallel with the Signal Processing series, I will look at three major developmental themes:

  1. Improvements in Accuracy
  2. Homopolymer problem – can’t call it improvements because I haven’t analyzed the data yet 😛
  3. Changes in the ion-Analysis source code. This binary is responsible for most of the data analysis, that is, going from raw voltages (DAT files) to corrected incorporation signals (SFF files). Subsequent base calling from SFF files is quite trivial 🙂

The analysis was performed using Novoalign in the Novocraft package (v2.07.12) according to the instructions detailed on their Ion Torrent support page. The plots were produced using the R scripts provided in the package, with slight modifications to change the look and feel. I used the fastq files as input and did not do any pre-processing, to ensure the results are reproducible. The same command line options were used for every data set and are noted at the bottom of each plot. The only exception is the last plot for the long read data set, where the “-n 300” option was used to inspect quality past the default 150 bases. Kudos to Nick Loman for the help (see Comments below). I quite like the package and the fast support provided on the user forum (kudos to Colin). There is a nice gallery of figures provided on their Facebook page.
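The predicted quality along a read comes from the per-base quality strings in those fastq files. As a rough illustration, here is a minimal sketch of averaging them by read position; it assumes Sanger-style Phred+33 encoding, and the helper names are hypothetical. This is not the Novocraft R script, just the underlying idea.

```python
def decode_qualities(qual_line, offset=33):
    """Convert one fastq quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual_line.rstrip("\n")]


def mean_quality_by_position(qual_lines, offset=33):
    """Average predicted Phred quality at each read position.

    Reads can differ in length, so each position is averaged only over
    the reads long enough to reach it.
    """
    sums, counts = [], []
    for line in qual_lines:
        for pos, q in enumerate(decode_qualities(line, offset)):
            if pos == len(sums):
                sums.append(0)
                counts.append(0)
            sums[pos] += q
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    print(mean_quality_by_position(["5?I", "+5"]))
```

Here `5`, `?` and `I` decode to Q20, Q30 and Q40, and `+` to Q10, so the example prints `[15.0, 25.0, 40.0]`.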

Two things are very obvious from the quality plots. First, the predicted quality is overly conservative: Ion Torrent is underselling itself by an average of 10 Phred points. This was also noted in the Omics Omics, EdgeBio and Pathogenomics blog posts. Second, the predicted quality along reads from the 316 data set (Figure below), as used in the Illumina MiSeq application note, is an unfair and incorrect representation of what is happening.
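The “underselling” can be quantified by comparing the predicted quality at each read position with the observed quality implied by the alignment error rate at that position. A minimal sketch of that comparison, assuming per-position `(predicted_q, errors, total)` tuples have already been tallied from the alignments; the tuple layout and the Q60 cap for error-free positions are assumptions for illustration, not how Novoalign does it.

```python
import math


def observed_phred(errors, total, max_q=60.0):
    """Empirical Phred quality from error counts at one read position.

    Positions with no observed errors are capped at max_q (an assumed
    ceiling) instead of returning infinity.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if errors == 0:
        return max_q
    return min(max_q, -10.0 * math.log10(errors / total))


def mean_underestimate(per_position):
    """Average of (observed - predicted) Phred over read positions.

    per_position -- list of (predicted_q, errors, total) tuples.
    A positive result means the predicted qualities are too conservative.
    """
    diffs = [observed_phred(e, t) - p for p, e, t in per_position]
    return sum(diffs) / len(diffs)
```

For example, a position predicted at Q20 that shows 1 mismatch in 1,000 aligned bases is really Q30, i.e. undersold by 10 Phred points.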

Raw accuracy cheat sheet:

Q10 = 90% accuracy
Q20 = 99% accuracy
Q23 = 99.5% accuracy
Q30 = 99.9% accuracy
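The cheat sheet follows from the Phred definition Q = -10 log10(P), where P is the probability that a base call is wrong. A minimal sketch of the conversion in both directions (the function names are my own):

```python
import math


def phred_to_accuracy(q):
    """Probability that a base call is correct at Phred quality q."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy):
    """Phred quality corresponding to a given call accuracy (0 to 1)."""
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    for q in (10, 20, 23, 30):
        print(f"Q{q} = {100 * phred_to_accuracy(q):.1f}% accuracy")
```

Rounded to one decimal place, this reproduces the table above (Q23 works out to 99.4988%, which rounds to 99.5%).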

In my opinion, actual observed accuracy is more important than predicted accuracy. For example, I predicted network marketing was going to make me a fortune and I would be financially free by now. Unfortunately, my friends didn’t want to buy my stuff 😥 Their loss!! The plots from the long read data set show the massive improvements made in just a few months. This makes me very optimistic for the future 😛

18th May 2011 (Analysis date)

Source: EdgeBio (Project: 0039010CA)
Chip: 314
Run date: 2011-04-07
Library: DH10B
ion-Analysis version: 1.40-0
Flow cycles: 55

8th June 2011 (Analysis date)

Source: Life Technologies (316 data set)
Chip: 316
Run date: 2011-06-06
Library: DH10B
ion-Analysis version: 1.49-3
Flow cycles: 65
PGM: B13


21st July 2011 (Analysis date)

Source: Institute for Neuroscience and Muscle Research (my lab :))
Chip: 314
Run date: 2011-07-21
Library: DH10B
ion-Analysis version: 1.52-11
Flow cycles: 65
PGM: sn11c032809

Source: Institute for Neuroscience and Muscle Research (Run 2)
Chip: 314
Run date: 2011-07-21
Library: DH10B
ion-Analysis version: 1.52-11
Flow cycles: 65
PGM: sn11c032809

28th July 2011 (Analysis date)

Source: Life Technologies (Long Read data set)
Chip: 314
Run date: 2011-07-19
Library: DH10B
ion-Analysis version: 1.55-1
Flow cycles: 130
PGM: B14

Democratizing sequencing?

This wouldn’t be a blog post by me if I wasn’t complaining about something… I mean giving feedback 🙂 Ion Torrent regularly uses the slogan that they are “democratizing” sequencing. In terms of releasing data sets and source code, they are far ahead of their competitors Illumina and Roche. The above analysis would not be possible without the public release of data sets from Life Technologies and also EdgeBio (an Ion Torrent sequencing service provider). Illumina has provided some data sets from their MiSeq. Downloading them killed my bandwidth, as they forgot to compress the fastq files. What n00bs! When will Illumina and Roche provide more data sets for their competing desktop sequencers? In the case of Roche, when will they provide any? Also, when will they learn that people outside the major sequencing centers have brains, and that perhaps they should interact with them every now and then!

Despite the great efforts made by Life Technologies, in my mind there is still a long way to go to truly democratize sequencing. For example, early access to new products should be given to labs that are trying to make a difference in society, not just to their “special customers”. What better way to promote your technology than by showing that a small lab with little experience can get it to work? I am not impressed at all if an experienced sequencing lab can get it to work. Giving these products only to special customers (aka the “big boys”) is NOT democratizing sequencing; it is maintaining the dominance these labs have over high impact publications. Our lab has requested early access to the TargetSeq enrichment system (not to be confused with the Qiagen SeqTarget junk). Having access to this enrichment would allow us to explore the possibility of diagnosing children with muscular dystrophy more efficiently and help parents, families and carers plan for the natural progression of these crippling diseases. Early access would also give us an opportunity to produce preliminary data for the next grant round. How about helping the “little guy” for a change?

In the next blog post of this series, I will provide an independent analysis of homopolymers using the data sets above. This will add to the discussion started by the great post from flxlex comparing Ion Torrent and 454 homopolymers.

Disclaimer: For the good of all mankind! These are purely my opinions and interpretations. This is an independent analysis using Novoalign, kept simple so others can reproduce the results. Despite begging, I have never been treated to a free lunch/dinner or even a free T-shirt by Life Technologies 😥

18 responses to “Ion Torrent – Rapid Accuracy Improvements”

  1. Nice post. Your final plots for the 314 long read dataset would be more informative if you plotted the full length of the read, not just the first 150 bases, though.

  2. In response to a tweet comment regarding amplicon data: there are two reasons why I can’t analyze amplicons. First, there are no public releases, and if there are, they would be too few to compare over time. Second, I would love to show ours, but all our projects have been delayed since we have to wait 3-5 weeks for Ion-specific consumables, i.e. we have to wait until the “big boys” and “special customers” get their orders first 😥 Democracy?

  3. It has always been interesting to see how large companies deal with members of the scientific community, given that companies primarily exist to make money, while people pursue a career in science to uncover new knowledge, but often need to operate like a business with the competition for grants and other funding. Companies that have their own research departments and research groups that enter into the commercial field with the products of their work add a further depth to this as well.

    From what I have seen so far, Life Technologies have been promoting the idea that they are trying to open the field for everyone, with sharing of data and affordable equipment. In my experience, they also seem to be taking the stance of letting the machine and its results do the talking in response to claims by other companies. It will be interesting to see how well this holds up when other companies have their competing platforms available.

    I can see why the double standard develops, with the “big boys” getting first access to a lot of the resources: they have significantly higher throughputs, leading to larger and more consistent business, which is what a company wants. But this can be very frustrating to the rest of us, who are significantly invested in our own projects and can see the next big breakthrough just around the corner when we can use that new tool.

    We’ll just have to wait and see how open and sharing the companies become or remain when the other competing platforms become available and in use. Maybe there will be free t-shirts to persuade people if there is actually another option they can go and buy now instead of needing to wait for it.

  4. Thanks again, Andrew, for taking the time to write your opinions as a detailed response. Just to clarify, I’m not saying don’t bother with the “big boys”, as they will have the capacity to do things fast through their infrastructure and expertise. I’m just saying don’t forget about the “little guys” as well, because they should get a break every now and then.

    We really need to get a custom t-shirt that says “Need some more free t-shirts, baby!!”

  5. If I had a dollar for every great discovery around the corner… Is that like the great miracle of the human genome sequence?

  6. swim, thanks for your comments. I’m not sure what you are trying to say. What I can gather is that you have a lot of dollars and perhaps you can buy a t-shirt? 😆

  7. Regarding Ion Torrent and early access to new kits… It’s important to remember that when any company gives early access to new kits to “special” customers, the company is trying to see how untested kits work outside of the development environment. It is incredibly difficult for a company to give early access to kits to everyone. That would essentially be launching an untested kit, and an irresponsible act by the company. Whether we give it to a big lab or a small lab is just a decision made quickly to find someone willing to test it and give feedback. Releasing half-tested products to the masses shouldn’t be confused with democratizing sequencing. Lowering the costs (both capital and operating) and lowering the technical barrier to results IS democratizing sequencing. Broad release of untested products is just bad business.

  8. Thanks, Mike, for the comments and for giving the business side of the debate. It is a good point that there are other aspects to “democratizing sequencing”, such as initial outlay and operational costs. I guess releasing half-tested products would lead to something like the iPhone4 fiasco that happened in Australia. They found that holding the iPhone4 a particular way with your hand cut out reception. They had to distribute this nice looking cover to make everyone happy. Biotech companies have nothing to worry about when things don’t work… there are enough sloppy RAs to place the blame on 🙂

    I’m confused: if everyone got early access, it wouldn’t be called early access? Then I want access to the thing even before it has been produced, so I can brag I got earlier than early access. Perhaps there’s a Bit Torrent (not to be confused with Ion Torrent) to facilitate that? ROFL 😆

  9. Hi Lek,

    Thanks for the comprehensive blog. I am not going to thank you for the jokes though 😉 On a serious and partially related note, I was wondering if you are experiencing any chip failures. In other words, do you know of any cases where the data is useless and it can’t be attributed to a prep error? Thanks again.

  10. Marko, thanks for the comments. Our lab is small, so we do not go through chips that fast. I have not read any reports on the Ion Community “PGM Users” section regarding chip failures. There are a few things to note. (1) The PGM runs a chip diagnostic test before the run. (2) Some wells do not measure the signal correctly. The chip as a whole may be okay, but there may be some slight imperfections in the wells.

    In some cases it would also be hard to separate what was due to sample prep from what was due to defective wells on the chip. When people in “PGM Users” look at the chip well occupancy heat maps, they believe the result is entirely down to how they physically spun down the Ion Sphere Particles (ISPs). At times I don’t see a pattern between method and percent occupancy. No one seems to have considered inter- and intra-chip batch variation. They are running a competition to see who can get the most output out of each chip. IMHO, given two experienced PGM users, it’s really rewarding luck rather than anything else. It should be based on the median or average of all your runs for that period. Some may argue I’m a sour grapes big cry baby 😥 which I am 😀

    But I digress… the answer to your original question is NO.

  11. Thanks for the input! I am looking forward to your continued postings 😉

  12. For the “predicted” vs. “actual” accuracy question, the point of predicted accuracy is that many tools use it in their calculations. The more accurate these estimates, the happier those tools are. Of course, you can always go and recalibrate everything, but that’s an extra step one would rather avoid.

  13. Pingback: Ion Torrent – QV Prediction Algorithm | BioLektures

  14. Keith, thanks for the comments and expert input. I have updated the section in another blog post where I mention the value of predicted accuracy.
