Impact of VQSR variant set size on the model

Hi,

We are evaluating whether to gather a set of 'good reference samples' to serve as additional data in the VQSR step of our WES analysis. We want to do this because we receive trio-based data, and three samples are typically not recommended for VQSR. As a performance measure, we try to recover the Genome in a Bottle (GIAB) variants provided in Illumina's Platinum call set in the same sample, sequenced and called in-house. I was rather surprised to see that sensitivity seems to go down when more samples are included in the VQSR training. How can this be explained, and should we therefore stick to a lower number of samples?

Some more details:

Samples are of mixed ethnicity, and we cannot split them by ethnicity because there are too few samples per group.
All samples are prepped and sequenced using identical kits (SureSelect.V5, HiSeq4000), but not in the same experiment.
We follow the GATK Best Practices, but perform single-sample variant calling, since we are looking for sporadic/de novo mutations.
Sensitivity is defined as matching / (matching + mismatch + not_called + low_quality + non_covered).
'Matching' means exactly the same genotype (not just the same variant) in the PASS tranche.
The 90 VCF files were called once, and for each VQSR training-set size, 10 random sets were drawn from them (plus the GIAB VCF) to train the model.
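The sensitivity definition above can be written as a small function. This is a minimal sketch: the category names are taken from the post, while the counts in the usage lines are hypothetical and only illustrate that moving sites from 'matching' to 'low_quality' lowers sensitivity without changing the denominator.

```python
# Outcome categories used in the post; only 'matching' counts as a hit.
CATEGORIES = ("matching", "mismatch", "not_called", "low_quality", "non_covered")


def sensitivity(counts: dict) -> float:
    """matching / (matching + mismatch + not_called + low_quality + non_covered)."""
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    total = sum(counts.get(c, 0) for c in CATEGORIES)
    if total == 0:
        raise ValueError("no truth-set variants were evaluated")
    return counts.get("matching", 0) / total


# Hypothetical counts for one benchmark run.
baseline = {"matching": 900, "mismatch": 20, "not_called": 30,
            "low_quality": 40, "non_covered": 10}
# Ten sites now fail the VQSR filter: matching -> low_quality.
shifted = dict(baseline, matching=890, low_quality=50)
print(sensitivity(baseline), sensitivity(shifted))  # 0.9 0.89
```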
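The sampling scheme described above (random sets per training size, GIAB VCF always included) might look like the sketch below. It assumes that a "size" counts cohort VCFs only, with GIAB added on top; the file names and the list of sizes are hypothetical, since the post does not list them.

```python
import random


def draw_training_sets(cohort_vcfs, giab_vcf, sizes, n_sets=10, seed=0):
    """Draw n_sets random training sets per size, each with the GIAB VCF appended.

    Assumes `size` counts cohort VCFs only; GIAB is added on top so the
    benchmarked sample is always part of the VQSR input.
    """
    rng = random.Random(seed)  # fixed seed makes the draws reproducible
    training_sets = {}
    for size in sizes:
        if size > len(cohort_vcfs):
            raise ValueError(f"size {size} exceeds cohort of {len(cohort_vcfs)}")
        # rng.sample draws without replacement, so no VCF appears twice in a set
        training_sets[size] = [rng.sample(cohort_vcfs, size) + [giab_vcf]
                               for _ in range(n_sets)]
    return training_sets


# Hypothetical file names and training-set sizes, for illustration only.
cohort = [f"sample_{i:02d}.vcf" for i in range(1, 91)]
sets = draw_training_sets(cohort, "GIAB.vcf", sizes=[3, 10, 30, 90])
print(len(sets[3]), sets[3][0])
```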

=> The variability comes from variants moving from 'matching' to 'low_quality' because they fail to PASS the VQSR filter.
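One way to see which sites drive that shift is to classify every truth-set site under two VQSR models and count the category changes. This is a hypothetical sketch: the 'matching' vs 'low_quality' rule (exact genotype, PASS or not) follows the post, but the rules for 'mismatch', 'not_called' and 'non_covered' are assumptions, and the site keys and genotypes in the test are invented.

```python
from collections import Counter


def normalise_gt(gt):
    """Treat '0|1', '1/0' and '0/1' as the same unphased genotype."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def classify(truth_gt, call, covered=True):
    """Assign one truth site to a category; `call` is None or (genotype, FILTER)."""
    if not covered:            # assumption: coverage is decided upstream
        return "non_covered"
    if call is None:           # assumption: no record at the site
        return "not_called"
    gt, flt = call
    if normalise_gt(gt) != normalise_gt(truth_gt):
        return "mismatch"
    # Exact genotype: counts only if VQSR left the site in the PASS tranche.
    return "matching" if flt == "PASS" else "low_quality"


def transitions(truth, run_a, run_b):
    """Count (category_in_a, category_in_b) changes over all truth sites.

    truth: {site: genotype}; run_a/run_b: {site: (genotype, FILTER)}.
    Coverage is not modelled here, so every site is treated as covered.
    """
    moves = Counter()
    for site, gt in truth.items():
        a = classify(gt, run_a.get(site))
        b = classify(gt, run_b.get(site))
        if a != b:
            moves[(a, b)] += 1
    return moves
```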

SNP VQSR command (followed by the list of -input file.vcf arguments):
java -Djava.io.tmpdir=/tmp -Xmx10g -jar /opt/NGS/binaries/gatk/GATK_3.5.0/GenomeAnalysisTK.jar \
  -nt 1 -S LENIENT -T VariantRecalibrator \
  -R /opt/NGS/References/hg19/samtools/0.1.19/hg19.fasta \
  -mode SNP \
  -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS \
  -recalFile '/home/wesdev/Validatie_Cellijn//VQSR.Models/model_files/SNP.3.samples.set_1.iter_1.vcf' \
  -tranchesFile '/home/wesdev/Validatie_Cellijn//VQSR.Models/model_files/SNP.3.samples.set_1.iter_1.tranches' \
  -rscriptFile '/home/wesdev/Validatie_Cellijn//VQSR.Models/Rscripts/SNP.3.samples.set_1.iter_1.R' \
  -minNumBad 1000 \
  -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /opt/NGS/References/hg19/gatk_bundle/hapmap_3.3.hg19.vcf \
  -resource:omni,known=false,training=true,truth=false,prior=12.0 /opt/NGS/References/hg19/gatk_bundle/1000G_omni2.5.hg19.vcf \
  -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /opt/NGS/References/hg19/gatk_bundle/dbsnp_137.hg19.vcf \
  -resource:1000G,known=false,training=true,truth=false,prior=10.0 /opt/NGS/References/hg19/gatk_bundle/1000G_phase1.snps.high_confidence.hg19.vcf

INDEL VQSR command (followed by the list of -input file.vcf arguments):
java -Djava.io.tmpdir=/tmp -Xmx10g -jar /opt/NGS/binaries/gatk/GATK_3.5.0/GenomeAnalysisTK.jar \
  -nt 1 -S LENIENT -T VariantRecalibrator \
  -R /opt/NGS/References/hg19/samtools/0.1.19/hg19.fasta \
  -mode INDEL \
  -an QD -an MQRankSum -an ReadPosRankSum -an FS \
  -recalFile '/home/wesdev/Validatie_Cellijn//VQSR.Models/model_files/INDEL.3.samples.set_1.iter_1.vcf' \
  -tranchesFile '/home/wesdev/Validatie_Cellijn//VQSR.Models/model_files/INDEL.3.samples.set_1.iter_1.tranches' \
  -rscriptFile '/home/wesdev/Validatie_Cellijn//VQSR.Models/Rscripts/INDEL.3.samples.set_1.iter_1.R' \
  -minNumBad 1000 -mG 4 \
  -resource:mills,known=false,training=true,truth=true,prior=12.0 /opt/NGS/References/hg19/gatk_bundle/Mills_and_1000G_gold_standard.indels.hg19.vcf \
  -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /opt/NGS/References/hg19/gatk_bundle/dbsnp_137.hg19.vcf
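Since every training set needs its own VariantRecalibrator call with a different -input list, the commands can be assembled programmatically. This is a sketch using only the flags that appear in the commands above; the helper `recalibrator_cmd` and the sample VCF names are hypothetical, and the code only builds and prints the command rather than running it.

```python
import shlex

# Paths copied from the commands in this post; adjust to your installation.
GATK_JAR = "/opt/NGS/binaries/gatk/GATK_3.5.0/GenomeAnalysisTK.jar"
REFERENCE = "/opt/NGS/References/hg19/samtools/0.1.19/hg19.fasta"
BUNDLE = "/opt/NGS/References/hg19/gatk_bundle"


def recalibrator_cmd(mode, annotations, resources, input_vcfs, name,
                     model_dir, rscript_dir, extra=()):
    """Build the argument list for one GATK 3.5 VariantRecalibrator run.

    resources: (tag, path) pairs; extra: mode-specific flags such as ("-mG", "4").
    """
    cmd = ["java", "-Djava.io.tmpdir=/tmp", "-Xmx10g", "-jar", GATK_JAR,
           "-nt", "1", "-S", "LENIENT", "-T", "VariantRecalibrator",
           "-R", REFERENCE, "-mode", mode]
    for annotation in annotations:
        cmd += ["-an", annotation]
    cmd += ["-recalFile", f"{model_dir}/{name}.vcf",
            "-tranchesFile", f"{model_dir}/{name}.tranches",
            "-rscriptFile", f"{rscript_dir}/{name}.R",
            "-minNumBad", "1000", *extra]
    for tag, path in resources:
        cmd += [f"-resource:{tag}", path]
    # One -input per VCF in the training set (cohort draw plus GIAB).
    for vcf in input_vcfs:
        cmd += ["-input", vcf]
    return cmd


indel_cmd = recalibrator_cmd(
    mode="INDEL",
    annotations=["QD", "MQRankSum", "ReadPosRankSum", "FS"],
    resources=[
        ("mills,known=false,training=true,truth=true,prior=12.0",
         f"{BUNDLE}/Mills_and_1000G_gold_standard.indels.hg19.vcf"),
        ("dbsnp,known=true,training=false,truth=false,prior=2.0",
         f"{BUNDLE}/dbsnp_137.hg19.vcf"),
    ],
    # Hypothetical training set: three cohort VCFs plus the GIAB VCF.
    input_vcfs=["sample_07.vcf", "sample_42.vcf", "sample_63.vcf", "GIAB.vcf"],
    name="INDEL.3.samples.set_1.iter_1",
    model_dir="/home/wesdev/Validatie_Cellijn/VQSR.Models/model_files",
    rscript_dir="/home/wesdev/Validatie_Cellijn/VQSR.Models/Rscripts",
    extra=("-mG", "4"),
)
print(shlex.join(indel_cmd))
```

The returned list can be passed directly to `subprocess.run` instead of being printed, which avoids shell-quoting issues with the resource tags.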
