Test fragments are known sequences, so the ideal output can be derived and quality metrics can be calculated against it. The Ion Torrent sequencing platform uses test fragments (slightly less than 100 bp in length) spiked into the experimental sample before the sequencing chip is loaded. The current sequencing kit uses 2 test fragments (A and D), while previous versions used 4 (A-D). Once sequencing and analysis are complete, the test fragment metrics appear in the summary report. Most PGM users either don’t understand this section or don’t care about it, and pay most attention to the library fragment metrics. What they miss is that test fragment sequencing is a good measure of sequencing quality, as each fragment is typically sequenced over 1000 times. In comparison, the E. coli library provided is sequenced at 6-8X average coverage on a 314 chip. Test fragments therefore play an important role in reporting sequencing quality for a particular run.
I will now step through the Test Fragment A metrics for one of our sequencing runs (314 chip, Torrent Suite 1.4).
In this run there were 4,354 wells containing live Ion Sphere Particles (ISPs) with Test Fragment A attached. Q17 is a quality score (roughly 2% error), corresponding to 1 base error allowed per 50 bases. Given that the sequence length is < 100, this allows 1 error at most. Thus, for a given read, the Q17 length is one base before the second error in Test Fragment A. 50AQ17 is the number of test fragment reads with one or zero errors in the first 50 bases read (after alignment with the Test Fragment A sequence). In this run 3,502 (80%) of the Test Fragment A reads have one or zero errors in the first 50 bases. For more details on quality scores read this wiki page.
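To make the Q17 arithmetic concrete, here is a minimal Python sketch. It is my reading of the definition above, not Torrent Suite's actual code: the function names and the example error positions are made up for illustration. The Phred formula gives an error probability of about 1.995% for Q17, and the sketch uses the rounded 2% (1 error per 50 bases) from the text.

```python
def phred_to_error_prob(q):
    """Standard Phred relation: Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def aq_length(error_positions, read_length, max_error_rate=0.02):
    """Longest read prefix whose error rate is at or below max_error_rate.

    error_positions: 1-based base positions of alignment errors (hypothetical input).
    max_error_rate: 0.02 is the rounded Q17 rate used in the text (assumption).
    """
    errors = sorted(error_positions)
    best = 0
    n_err = 0
    idx = 0
    for length in range(1, read_length + 1):
        # Count the errors that fall inside the current prefix.
        while idx < len(errors) and errors[idx] <= length:
            n_err += 1
            idx += 1
        if n_err / length <= max_error_rate:
            best = length
    return best


def count_50aq(reads, min_length=50):
    """Number of reads whose AQ length reaches min_length (the 50AQ17 idea)."""
    return sum(1 for errs, n in reads if aq_length(errs, n) >= min_length)


# Hypothetical reads: (error positions, read length)
reads = [([10, 77], 95), ([], 95), ([10], 40)]
print(round(phred_to_error_prob(17), 5))      # about 0.01995
print([aq_length(e, n) for e, n in reads])    # [76, 95, 9]
print(count_50aq(reads))                      # 2
```

The first read shows the "one base before the second error" rule: one early error is tolerated once the prefix reaches 50 bases, and the second error at base 77 caps the Q17 length at 76.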
Below is the AQ17 histogram. It shows how far each of the 4,354 reads progressed before accumulating more than 2% sequencing error (Q17). Each histogram bar therefore represents a population of sequencing reads with the second error (assuming read length > 50) occurring at the corresponding base position.
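If you want to build this kind of histogram yourself, a rough sketch is to tally one AQ17 length per read and look for the tallest bar. The lengths below are invented for illustration and are not taken from the run.

```python
from collections import Counter

# Hypothetical AQ17 lengths, one per test fragment read (not real run data).
aq17_lengths = [77, 77, 78, 52, 77, 95, 78]

histogram = Counter(aq17_lengths)
peak_length, peak_count = histogram.most_common(1)[0]

# Crude text histogram: one '#' per read at each length.
for length in sorted(histogram):
    print(f"{length:3d} {'#' * histogram[length]}")

print(f"peak at base {peak_length} with {peak_count} reads")
```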
The peaks at 77 bases into the sequence indicate that the majority of reads make it this far before encountering their first error. You may have noticed that between base positions 50 and 75, Test Fragment A does not present a homopolymer (n > 1) to challenge the sequencing. The first challenge comes deep into the sequence, which is where most of the reads fail. At base position 77 (CC call), 1,668 reads (48% of the 50AQ17 reads) are unable to call the homopolymer correctly, with a further 28% failing at the adjacent AA call. Although Test Fragment reads do not appear to be filtered based on poor signal profile, to me this is still a serious concern given that Ion Torrent has 200 bp and 400 bp read milestones.
Taking the population of 1,668 reads that make up the largest peak at base position 77, we can look at how the corrected 2mer (i.e. homopolymer length 2, pronounced how Arnie Schwarzenegger would say “tumor” 😮) signal evolves over the flows, or in other words, over time. Note: The corrected signal is taken from the SFF file and contains the signal after phase and droop correction have been performed. For an ideal 2mer call you would expect a narrow distribution centered around 2.0.
The above figure shows how rapidly the 2mer signal decays over a 100 bp read.
The first three 2mers called (flows 32, 35 and 44) are very close to the ideal distribution. The next two (flows 51 and 75) start to deviate from it. The next 2mer distribution (flow 134, base position 77) drops sharply to zero at 1.5, which is expected because these reads called the CC 2mer incorrectly. The last two 2mers (flows 140 and 141) demonstrate that the problem cannot be fixed trivially by lowering the 2mer threshold, e.g. from 1.5 to 1.4.
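For anyone wondering what moving the 2mer threshold does mechanically, here is a toy sketch. It assumes a simple rule where a corrected signal is called as an n-mer once it reaches (n - 1) plus a fractional threshold, which is what a 1.5 cutoff for 2mers suggests. I am not claiming this is how the Torrent Suite base caller actually works, and the signal values are invented.

```python
import math


def call_homopolymer(signal, threshold=0.5):
    """Call homopolymer length n when (n - 1) + threshold <= signal < n + threshold.

    threshold=0.5 puts the 1mer/2mer boundary at 1.5 (assumed rule, see above).
    """
    return max(0, math.floor(signal - threshold) + 1)


# Hypothetical corrected signals for one flow.
weak_2mer = 1.45   # a true 2mer whose signal has decayed
noisy_1mer = 1.42  # a true 1mer with some noise on top

for t in (0.5, 0.4):
    print(t, call_homopolymer(weak_2mer, t), call_homopolymer(noisy_1mer, t))
```

With the boundary at 1.5 both signals are called as 1mers, so the weak 2mer is undercalled. Dropping it to 1.4 rescues the 2mer but also turns the noisy 1mer into a 2mer, so in this toy setup a lower threshold just swaps one kind of error for another.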
In my next blog post (Part 2), I will discuss results from the other 3 test fragments (B-D).
Disclaimer: For the good of all mankind! These are purely my opinions and interpretations. I have tried my best to keep all analyses correct.
Hooray my first blog!! Flame away!