Best Practice pipeline for WES, including VQSR

I am currently processing ~100 exomes and following the Best Practice recommendations for Pre-processing and Variant Discovery. However, there are a couple of gaps in the documentation, as far as I can tell, regarding exactly how to proceed with VQSR with exome data. I would be grateful for some feedback, particularly regarding VQSR. The issues are similar to those discussed on this thread: http://gatkforums.broadinstitute.org/discussion/4798/vqsr-using-capture-and-padding but my questions aren't fully-addressed there (or elsewhere on the Forum as far as I can see).

Prior Steps:
1) All samples processed with same protocol (~60Mb capture kit) - coverage ~50X-100X
2) Alignment with BWA-MEM (to whole genome)
3) Remove duplicates, indel-realignment, bqsr
4) HC to produce gVCFs (-ERC)
5) Genotype gVCFs

This week I have been investigating VQSR, which has generated some questions.

Q1) Which regions should I use from my data for building the VQSR model?

Here I have tried 3 different input datasets:

a) All my variant positions (11Million positions)
b) Variant positions that are in the capture kit (~326k positions) - i.e. used bedtools intersect to only extract variants from (1)
c) Variant positions that are in the capture kit with padding of 100nt either side (~568k positions) - as above but bed has +/-100 on regions + uniq to remove duplicate variants that are now in more than one bed region

For each of the above, I have produced "sensitive" and "specific" datasets:
"Specific" --ts_filter_level 90.0 \ for both SNPs and INDELs
"Sensitive" --ts_filter_level 99.5 \ for SNPs, and --ts_filter_level 99.0 \ for INDELs (as suggested in the definitive FAQ https://www.broadinstitute.org/gatk/guide/article?id=1259 )

I also wanted to see what effect, if any, the "-tranche" argument has - i.e. does it just allow for ease of filtering, or does it affect the mother generated, since it was not clear to me. I applied either 5 tranches or 6:

5-tranche: -tranche 100.0 -tranche 99.9 -tranche 99.5 -tranche 99.0 -tranche 90.0 \ for both SNPs and INDELs
6-tranche: -tranche 100.0 -tranche 99.9 -tranche 99.5 -tranche 99.0 -tranche 95.0 -tranche 90.0 \ for both SNPs and INDELs

To compare the results I then used bed intersect to get back to the variants that are within the capture kit (~326k, as before). The output is shown in the spreadsheet image below.

http://i58.tinypic.com/25gc4k7.png

What the table appears to show me, is that at the "sensitive" settings (orange background), the results are largely the same - the difference between "PASS" in the set at the bottom where all variants were being used, and the others is mostly accounted for by variants being pushed into the 99.9-100 tranche.

However, when trying to be specific (blue background), the difference between using all variants, or just the capture region/capture+100 is marked. Also surprising (at least for me) is the huge difference in "PASS" in cells E15 and E16, where the only difference was the number of tranches given to the model (note that there is very little difference in the analogous cells in Rows 5/6 andRows 10/11.

Q2) Can somebody explain why there is such a difference in "PASS" rows between All-SPEC and the Capture(s)-Spec
Q3) Can somebody explain why 6 tranches resulted in ~23k more PASSes than 5 tranches for the All-SPEC
Q4) What does "PASS" mean in this context - a score =100? Is it an observation of a variant position in my data that has been observed in the "truth" set? It isn't actually described in the header of the VCF, though presumably the following corresponds:
FILTER=<ID=VQSRTrancheSNP99.90to100.00+,Description="Truth sensitivity tranche level for SNP model at VQS Lod < -38682.8777">
Q5) Similarly, why do no variants fall below my lower tranche threshold of 90? Is it because they are all reliable at least to this level?

Q6) Am I just really confused? :-(

Thanks in advance for your help! :-)

Best Practice pipeline for WES, including VQSR - what is best?

Trending Articles

Avril Lavigne – Let Go (20th Anniversary Edition) [iTunes Plus M4A]

Fushigi no Dungeon – Furai no Shiren 3: Karakuri Yashiki no Nemuri Hime (JPN)

LAG, Lacp configuration on Mellanox switches

Who's been in court? A round up of cases heard by Essex magistrates

Man dies and another in serious condition after A614 crash between Driffield...

Sarah Samis, Emil Bove III

Papa Roach – Greatest Hits Vol.2 The Better Noise Years (2021) [FLAC...

Terry Bon Chaka — Zoozey (Ft. KK Fosu) #ThrowBack

Novel : I Love You, Stupid! 2

Dante Kennybrew, 19

Man charged with July slaying of Jovan Hopkins in Back of the Yards

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

Practice Sheet of Pronoun References for HSC Students

मुख मैथुन से उठाएं सेक्स का भरपूर मज़ा, जानें क्या है इसका सही तरीकामुख मैथुन...

NCERT Solutions for Class 9th Sanskrit Chapter 3 पाथेयम्

99 God Status for Whatsapp, Facebook

The Angry Birds Movie (Tamil Dubbed)

Casualty cut free following three-car collision in Newtown Unthank

David Kushner – The Dichotomy [iTunes Plus M4A]

Three walk