Signal Processing – Room for improvements?

During the Ion Torrent User Group Meeting at ASHG, Rothberg talked about how he would approach the Accuracy challenge. He said he would seek out people who did well at the mathematics olympiad, get them to work on the problem and pay them $5,000. What a cheapskate 😀 He also said that there should be more focus on the actual raw signal processing. Along the same lines, Yanick asked in the comments of a previous post what contribution raw signal processing made to the accuracy improvements in Torrent Suite 1.5.

With the current data processing pipeline, there is a way to separate the contribution of raw signal processing from that of signal normalization, dephasing and base calling (all three abbreviated to base calling for the rest of this blog). This can be done by using the 1.wells files, as these mark the end of raw signal processing. Therefore, we can use one version of Torrent Suite to do the raw signal processing and a different version to do the base calling using the --from-wells command line option (Figure 1).

Figure 1. Using different versions of Torrent Suite to do signal processing and base calling. Results grouped along the x-axis by which version did the base calling and colored by which 1.wells file version was used as the input (i.e. what version did the signal processing).

The input data is from a control library of DH10B sequenced on a 314 chip in our lab (featured in a few previous blog posts, including the one on rapid software improvements). The same input data was used for every result in Figure 1 to make the analysis comparable. In this bar plot the measure of performance is the number of 100AQ23 reads that result, which is the current measure of accuracy in the Accuracy challenge. The general conclusion is that signal processing changes between versions make little contribution to accuracy. This is evident as there is little difference in height between bars within each of the base calling groups along the x-axis. In contrast, all 1.wells files that were base called with v1.5 showed marked improvements regardless of which version of Torrent Suite was used for signal processing (i.e. which version produced the 1.wells file). Interestingly, the signal processing from v1.4 appears to perform better than that from v1.5. This can be largely explained by an increase in the number of beads categorized as “live” in v1.4 compared to v1.5.

There are two possibilities to explain the small contributions made by signal processing:

  1. This year most effort was dedicated to dephasing and normalization, which are the source of the major improvements in Torrent Suite 1.5. Improvements in signal processing will be the next focus. OR
  2. The current signal processing model has reached its limit and a new model needs to be developed in order to see further improvements.

In this post and the last post on software improvements, only 314 data was featured. To give a more comprehensive picture, a 316 run (STO-409) and a 318 run (C22-169) were used to observe accuracy improvements (Figure 2). Thanks to Matt for supplying the 1.wells files for these publicly released runs.

Figure 2. The improvements in base calling between v1.4 and v1.5 using 1.wells file from v1.5 as input. I couldn’t get v1.3 to run on either the 316 or 318 data 😕

What made the analysis featured in this blog post a little challenging was that the 1.wells format changed to make use of the HDF5 standard in Torrent Suite 1.5. This has allowed the files to be better organized, more easily parsed and compressed approximately 2-fold. Torrent Suite 1.5 is able to read 1.wells files generated by previous versions (i.e. it is backward compatible), but unfortunately the reverse does not apply 😡 I had to write a small program to convert the 1.wells files (from 1.5) back to the legacy format. Kudos to Simon for the tips 🙂 What was a little concerning is that the current implementation loads the whole 1.wells file into memory, which consumes 25-35% and 50-70% of total memory for the 316 and 318 chips, respectively.
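The cross-version comparison and the one-way file compatibility described above can be laid out as a small run plan. This is only a sketch: the version list (v1.3, v1.4, v1.5) and the helper names are assumptions made for illustration, and the code does not invoke Torrent Suite itself. It simply marks which combinations of 1.wells version and base calling version would need the HDF5 file converted back to the legacy format first.

```python
from itertools import product

# Assumed set of versions for illustration; adjust to the versions actually compared.
VERSIONS = ["1.3", "1.4", "1.5"]
# First release that writes 1.wells in HDF5 (stated in the post).
HDF5_SINCE = "1.5"


def version_key(version):
    """Turn '1.5' into (1, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_legacy_conversion(wells_version, caller_version):
    """True when an HDF5 1.wells file is fed to a base caller that predates HDF5."""
    wells_is_hdf5 = version_key(wells_version) >= version_key(HDF5_SINCE)
    caller_reads_hdf5 = version_key(caller_version) >= version_key(HDF5_SINCE)
    return wells_is_hdf5 and not caller_reads_hdf5


def plan_runs(versions=VERSIONS):
    """Every (signal processing version, base calling version) pair with a conversion flag."""
    return [
        (wells, caller, needs_legacy_conversion(wells, caller))
        for wells, caller in product(versions, repeat=2)
    ]


if __name__ == "__main__":
    for wells, caller, convert in plan_runs():
        note = "convert to legacy format first" if convert else "read directly"
        print(f"1.wells from v{wells} -> base calling with v{caller}: {note}")
```

Under these assumptions only the pairs that feed a v1.5 1.wells file to an older base caller are flagged; everything else can be read as is.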

<insert some awesome conclusion here> 😀

Disclaimer: For the good of all mankind! This is purely my opinion and interpretations.

One response to “Signal Processing – Room for improvements?”

  1. A question was posted regarding the performance at higher Q values.

    In this blog I have plotted 100AQ23 because this is the benchmark used for the Accuracy challenge. When comparing base calling between v1.4 and v1.5, the 200AQ47 reads increased from 290,216 to 1,066,187 for the 316 chip and from 1,054,633 to 1,915,634 for the 318 chip. Hence the improvements still look dramatic.
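To put the read counts quoted in this reply on a common scale, the fold change between v1.4 and v1.5 base calling can be worked out directly. The numbers below are copied from the reply; the function and variable names are just illustrative.

```python
# Read counts quoted in the reply: (v1.4 base calling, v1.5 base calling).
READ_COUNTS = {
    "316": (290_216, 1_066_187),
    "318": (1_054_633, 1_915_634),
}


def fold_change(before, after):
    """Ratio of the newer count to the older count."""
    return after / before


if __name__ == "__main__":
    for chip, (v14, v15) in READ_COUNTS.items():
        print(f"{chip} chip: {fold_change(v14, v15):.2f}-fold more reads with v1.5")
```

This works out to roughly a 3.7-fold increase for the 316 run and roughly a 1.8-fold increase for the 318 run.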
