This post describes a plugin that I have developed for the Torrent Browser. There are currently two plugins that do variant calling: (1) Germ-lineVariantCaller, a general variant caller plugin, and (2) AmpliSeqCancerVariantCaller, which is specific to the AmpliSeq Cancer Kit. The “Annotate” plugin supplements these two variant caller plugins by addressing three important questions in disease genetics.
Novel versus Common Variants
The first question is whether a variant is novel or common in the population. This can be answered by checking whether the variant exists in dbSNP (version 132). A tool that can differentiate between novel and common variants saves time, as novel variants are more likely to be disease causing than common variants. The Genome Analysis Toolkit (GATK) can incorporate annotation from a VCF file through the -D option, but I have decided against using it because the chromosome order in the dbSNP VCF file MUST match that of the reference file used for variant calling. This creates a little dilemma, as the hg19 reference stored on the Torrent Server is ordered differently from the dbSNP VCF file in the GATK 1.2 resource bundle. For this plugin, I have decided to index the dbSNP VCF file using tabix and look up the called variants outside the GATK framework.
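To make the tabix lookup concrete, here is a minimal Python sketch of the idea. It is not the plugin's actual code (the plugin is a shell script), and the function names, file name and KNOWN/NOVEL labels are my own assumptions for illustration. It only relies on the standard `tabix FILE CHR:START-END` query form and on the standard VCF column order (CHROM, POS, ID, REF, ALT). Because each query is keyed by chromosome name, the order of chromosomes in the dbSNP file no longer matters.

```python
import subprocess


def tabix_region(chrom, pos):
    """Build a single-base tabix region string such as 'chr7:100-100'."""
    return f"{chrom}:{pos}-{pos}"


def tabix_command(vcf_gz, chrom, pos):
    """Command line for querying a bgzipped, tabix-indexed VCF at one position."""
    return ["tabix", vcf_gz, tabix_region(chrom, pos)]


def is_known(vcf_lines, pos, ref, alt):
    """Return True if any VCF record matches the position, REF and one ALT allele."""
    for line in vcf_lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        if int(fields[1]) == pos and fields[3] == ref and alt in fields[4].split(","):
            return True
    return False


def classify_variant(vcf_gz, chrom, pos, ref, alt):
    """Query the indexed dbSNP file (hypothetical path) and label the variant."""
    result = subprocess.run(
        tabix_command(vcf_gz, chrom, pos),
        capture_output=True, text=True, check=True,
    )
    return "KNOWN" if is_known(result.stdout.splitlines(), pos, ref, alt) else "NOVEL"
```

A call would look something like `classify_variant("dbsnp_132.hg19.vcf.gz", "chr7", 117199644, "A", "G")`, assuming tabix is on the PATH, the file has been bgzipped and indexed, and the chromosome names in the query match those in the file.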
Functional Consequence of Variant
The second question is whether a variant lies within a gene and, if so, what its functional consequence is. For example, does the variant result in an amino acid change (i.e. is it a non-synonymous variant)? Commonly used tools are SNPEff (latest update on Christmas Day!!) and ANNOVAR. Although SNPEff uses Gencode annotation and is therefore more comprehensive, its output is quite hard to summarize and the majority of transcripts (ENST) are non-coding. For this plugin I have therefore decided to go with ANNOVAR, which uses RefSeq (NM) annotations.
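For readers less familiar with the term, the toy Python sketch below shows what "non-synonymous" means at the level of a single codon. It is not how ANNOVAR or SNPEff work internally: real annotators also have to deal with transcripts, strand, splice sites and indels. The function name and the labels it returns are assumptions for illustration; only the standard genetic code table is a given.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_consequence(codon, offset, alt_base):
    """Classify a single-base change at 0-based offset within a coding codon."""
    mutated = codon[:offset] + alt_base + codon[offset + 1:]
    before = CODON_TABLE[codon]
    after = CODON_TABLE[mutated]
    if before == after:
        return "synonymous"      # same amino acid, protein unchanged
    if after == "*":
        return "stopgain"        # premature stop codon introduced
    if before == "*":
        return "stoploss"        # original stop codon removed
    return "nonsynonymous"       # amino acid changed


# GAG (Glu) -> GTG (Val) changes the amino acid.
print(codon_consequence("GAG", 1, "T"))
# CTT (Leu) -> CTC (Leu) does not.
print(codon_consequence("CTT", 2, "C"))
```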
Functional Impact of Novel Non-Synonymous Variants
The third question is whether a novel non-synonymous variant is likely to have a functional impact on the resulting protein. This can be achieved using functional impact prediction tools. I have decided to use PolyPhen2 and SIFT, as pre-computed values are available as text files on the ANNOVAR download page. I have decided not to use ANNOVAR itself to retrieve the predictions, as its implementation is unusually slow. To speed things up, I sorted the SIFT and PolyPhen2 prediction text files and then indexed them using tabix. This allows variants to be searched much more efficiently within the now sorted text files.
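Why does sorting help so much? The Python sketch below is only an in-memory analogy, not what tabix does on disk (tabix builds an index over a bgzip-compressed file). It shows the same principle though: once the rows are sorted by chromosome and position, a variant can be found with a binary search instead of scanning every line. The row layout (chrom, pos, ref, alt, score) and the function names are assumptions for illustration and may not match the actual ANNOVAR prediction files.

```python
import bisect


def build_index(rows):
    """Sort rows of (chrom, pos, ref, alt, score) and return (keys, table).

    keys holds the sortable (chrom, pos, ref, alt) part of each row,
    in the same order as table, so bisect can search it directly.
    """
    table = sorted(rows, key=lambda r: (r[0], r[1], r[2], r[3]))
    keys = [(r[0], r[1], r[2], r[3]) for r in table]
    return keys, table


def find_score(keys, table, chrom, pos, ref, alt):
    """Binary-search the sorted keys; return the score, or None if absent."""
    key = (chrom, pos, ref, alt)
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return table[i][4]
    return None


# Made-up rows purely for demonstration.
rows = [
    ("chr7", 200, "C", "T", 0.01),
    ("chr1", 100, "A", "G", 0.30),
    ("chr1", 50, "G", "A", 0.90),
]
keys, table = build_index(rows)
print(find_score(keys, table, "chr7", 200, "C", "T"))
```

A linear scan costs time proportional to the number of rows for every variant, while the binary search needs only about log2(n) comparisons, which is where the speed-up on large prediction files comes from.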
Figure 1. Result from the C01-288 run of the AmpliSeq kit available for download in the Ion Community. All GATK variants called are KNOWN.
Figure 2. Result from the BUT-317 run of CFTR amplicon sequencing available for download in the Ion Community. Only one variant was called by GATK, and it was novel. As this is a screenshot, you can’t see the tool tips for PolyPhen2 (PP2) and SIFT. For PP2, D = Damaging; SIFT scores < 0.05 are considered damaging.
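The interpretation rule in the caption above can be written down in a few lines of Python. This is a sketch of the rule as stated there, not code taken from the plugin; the function names are hypothetical, and any PP2 code other than D is simply treated as "not flagged" because only D is described.

```python
def sift_is_damaging(score, threshold=0.05):
    """SIFT scores below the threshold are considered damaging."""
    return score < threshold


def pp2_is_damaging(code):
    """PolyPhen2 (PP2) prediction code D stands for damaging."""
    return code == "D"


print(sift_is_damaging(0.01), pp2_is_damaging("D"))
```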
We will be using this plugin in an upcoming project that uses custom-designed AmpliSeq primers for 10 large muscle-disease-causing genes across our undiagnosed patient cohort. Big thanks to Kelly and Life Technologies for awarding an Application Grant to our lab for this project.
The Annotate plugin is a shell script which calls a collection of tools. It is important for organizations using it to have a look at the licenses and conditions of use for the following tools: ANNOVAR, PolyPhen2, SIFT, GATK, samtools, Picard Tools and tabix. For instance, ANNOVAR may not be free to use for commercial organizations: “ANNOVAR is open-source and free for non-profit use. If you use it for commercial purposes, please contact Ellen Purpus, director of OTT (PURPUS@email.chop.edu) directly for license related issues.”
Thanks to David from EdgeBio for the feedback. EdgeBio created the first community-developed plugin, called SNPEff. It is a neat plugin, and you can check out more details on their blog post.
Disclaimer: For the good of all mankind! This is purely my opinion and interpretations. We sit on the shoulders of giants: this plugin is a script composed of available open source tools and resources.