In the first part of this two-part blog post, I highlighted the importance of test fragments and analyzed the results from Test Fragment A. In this second part I will analyze and discuss the accuracy of the remaining test fragments, that is Test Fragments B, C and D.
Test Fragment D
This sequencing run of Test Fragment D is unusually bad. There is an error peak at the A 2mer at base position 60, which accounts for 616 (16%) of the 3,875 50AQ17 reads. You may have noticed that Test Fragment D has A 2mers scattered throughout the sequence. The figure below shows how the signal distribution for these A 2mers evolves over time. Again, the ideal 2mer distribution is a narrow peak centered around 2.0.
In contrast to the Test Fragment A reads, the A 2mer distributions shift towards the right. This eventually results in the 2mer being over called as a 3mer, which happens in the majority of calls at base position 77 (colored green).
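To make the effect of that right shift concrete, here is a minimal sketch. It assumes a naive base caller that simply rounds the normalized signal to the nearest integer, which is my simplification and not the actual Torrent Suite algorithm. The helper names and the signal values are hypothetical, made up purely for illustration and not taken from the run.

```python
import math

def call_homopolymer(signal):
    """Naive call: round the normalized signal to the nearest integer (assumption)."""
    return max(0, math.floor(signal + 0.5))

def overcall_fraction(signals, true_length):
    """Fraction of signals whose naive call exceeds the true homopolymer length."""
    calls = [call_homopolymer(s) for s in signals]
    return sum(c > true_length for c in calls) / len(calls)

# Hypothetical A 2mer signals, NOT measured data.
early_2mer = [1.9, 2.0, 2.1, 2.2, 2.0]   # tight distribution around 2.0
late_2mer = [2.4, 2.6, 2.7, 2.5, 2.3]    # distribution shifted to the right

print(overcall_fraction(early_2mer, 2))  # no over calls
print(overcall_fraction(late_2mer, 2))   # most values cross 2.5 and become 3mers
```

Once the bulk of a 2mer distribution drifts past 2.5, this kind of rounding calls a 3mer more often than a 2mer.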
Test Fragment B
Just to reiterate, Test Fragments B and C are no longer spiked into the sample before a sequencing run. However, they still remain in the Ion Torrent Controls Material Kit. The results I will be analyzing are from the first ever run of our PGM, which contained only test fragments. What a waste of consumables! But it does generate a lot more reads for each test fragment.
The largest error peak occurs at base position 47, just before the G 2mer. Normally the bar would sit on the first base of a 2mer instead of preceding it; however, in this case the G 2mer is under called. The error peak at base position 47 accounts for 2,213 reads. As this occurs before base position 50, it massively reduces the number of 50AQ17 reads. You may have noticed that Test Fragment B has G 2mers scattered throughout the sequence. The figure below shows how the G 2mer signal distribution evolves over time.
The distribution for the G 2mer at base position 80 (colored aquamarine) is quite weird, with another peak at zero. The reason for this, apart from a program bug or me goofing up, is that the majority of the strands are completely out of phase with the flows. Also, the 2mer distribution colored in black explains the G 2mer under call at base position 48. I will explain further what I mean by both anomalies in my next blog post.
Test Fragment C
Last one!!! If you have made it this far you must have no life, like me!! Just kidding :o)
At first glance this looks like the most awesome one of them all. However, you may have noticed the error peak preceding the C 2mer call, which is the first 2mer in Test Fragment C. This accounts for 8,511 (83%) of the 10,216 50AQ17 reads for Test Fragment C. As there is a lack of 2mers in this sequence, the figure below shows how the C 1mer distribution varies over time. In this case, the ideal distribution is a narrow peak centered around 1.0.
The later 1mer calls are above the 0.5 threshold and thus will still be called as a 1mer. However, a large fraction of the distribution is below 0.75 (dotted vertical black line). Why is 0.75 an important threshold? Given the supposedly linear nature of homopolymer signals, a 1mer signal below 0.75 suggests that if there were a 2mer here (i.e. twice the signal), it would have a value below 1.5 and would be incorrectly called as a 1mer.
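The 0.75 reasoning can be sketched in a few lines. This assumes two things that are simplifications on my part: the signal scales linearly with homopolymer length, and calls are made by rounding to the nearest integer. The function names are hypothetical and this is not the real base caller.

```python
import math

def call_homopolymer(signal):
    """Naive call: round the normalized signal to the nearest integer (assumption)."""
    return max(0, math.floor(signal + 0.5))

def implied_2mer_call(one_mer_signal):
    """Call a 2mer would get at this position, assuming a 2mer gives twice the 1mer signal."""
    return call_homopolymer(2 * one_mer_signal)

# Illustrative 1mer signal values, not measured data.
for s in (1.0, 0.8, 0.75, 0.7, 0.55):
    print(f"1mer signal {s}: called as {call_homopolymer(s)}mer, "
          f"a 2mer here would be called as {implied_2mer_call(s)}mer")
```

Every value above still passes the 0.5 threshold for a 1mer, but anything under 0.75 implies a 2mer signal under 1.5, which this rounding would under call as a 1mer.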
Anyways, that wraps up a brief analysis of all 4 of the test fragments. In my next post, I’ll write about signal normalization, droop and phase correction, or in other words, how to get from a raw signal (i.e. 1.wells file) to a corrected signal (SFF file). Most of this knowledge comes from asking questions in the Ion Community and picking apart the 20,000-50,000 lines of code released by Life Technologies. So kudos go out to Simon, Mike and Mel, who have answered the majority of my TorrentDev questions on these topics. And yay for open source code!
Through sharing and caring we can get off this planet when our sun dies!
Disclaimer: For the good of all mankind! These are purely my opinions and interpretations. I have tried my best to keep all analyses correct.