Channel: haplotypecaller — GATK-Forum

BaseQRankSum variations with interval size in HaplotypeCaller

While analyzing the same sample with and without Queue, I noticed that a variant was filtered out in one of the runs, with VQSRTrancheSNP99.00to99.90 in the FILTER column.

While debugging the problem, I noticed that the size of the interval passed to HaplotypeCaller with -L can greatly influence both BaseQRankSum and ReadPosRankSum in the g.vcf file.
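For context, my understanding is that the *RankSum annotations are z-scores from a rank sum (Mann-Whitney-Wilcoxon) test comparing the reads supporting the alt allele with those supporting the ref allele. The sketch below is a textbook normal-approximation version, not GATK's implementation: the function name `rank_sum_z`, the sign convention (positive when alt values rank higher), and the lack of a tie correction in the variance are all assumptions made for illustration.

```python
import math

def rank_sum_z(alt, ref):
    """Textbook rank sum z-score for alt vs ref observations.

    Illustrative only (not GATK code): ties get average ranks and the
    variance carries no tie correction.
    """
    values = sorted(list(alt) + list(ref))
    # Map each distinct value to the average of the 1-based ranks it occupies.
    ranks = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        ranks[values[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(alt), len(ref)
    rank_sum_alt = sum(ranks[v] for v in alt)
    u_alt = rank_sum_alt - n1 * (n1 + 1) / 2.0      # Mann-Whitney U for alt
    mean_u = n1 * n2 / 2.0                          # expected U under the null
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u_alt - mean_u) / sd_u

# Made-up base qualities, purely for illustration.
print(rank_sum_z([40, 41, 42], [10, 11, 12]))  # positive: alt ranks higher
print(rank_sum_z([30, 30, 20], [30, 30, 20]))  # identical groups
```

Under this textbook definition the score depends only on the two sets of observations at the site, so I would expect it to stay the same whenever the reads at the site are the same.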

commands:
1)
java -Xmx8g -Djava.io.tmpdir=tmp -jar /com/extra/GATK/3.5/jar-bin/GenomeAnalysisTK.jar -T HaplotypeCaller -I BDD.sorted.markdup.realigned.recal.bam -R ucsc.hg19_chrY_PAR1_PAR2_masked.fasta -L chr5:171333106-177333146 --genotyping_mode DISCOVERY --dbsnp dbsnp_138.hg19.vcf -ERC GVCF -variant_index_type LINEAR -variant_index_parameter 128000 -o BDD.sorted.markdup.realigned.recal.HaplotypeCaller_gVCF_chr5.vcf.gz

2)
java -Xmx8g -Djava.io.tmpdir=tmp -jar /com/extra/GATK/3.5/jar-bin/GenomeAnalysisTK.jar -T HaplotypeCaller -I BDD.sorted.markdup.realigned.recal.bam -R ucsc.hg19_chrY_PAR1_PAR2_masked.fasta -L chr5:175333106-177333146 --genotyping_mode DISCOVERY --dbsnp dbsnp_138.hg19.vcf -ERC GVCF -variant_index_type LINEAR -variant_index_parameter 128000 -o BDD.sorted.markdup.realigned.recal.HaplotypeCaller_gVCF_chr5.vcf.gz

The results for the SNP in question in the g.vcf file:
1)
chr5 176333126 rs2292256 C T,<NON_REF> 5817.77 . BaseQRankSum=0.389;ClippingRankSum=2.280;DB;DP=314;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.360;RAW_MQ=1130400.00;ReadPosRankSum=-1.733 GT:AD:DP:GQ:PGT:PID:PL:SB 0/1:154,160,0:314:99:0|1:176333126_C_T:5846,0,7455,6310,7937,14246:53,101,56,104

2)
chr5 176333126 rs2292256 C T,<NON_REF> 5817.77 . BaseQRankSum=-0.254;ClippingRankSum=0.132;DB;DP=314;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.278;RAW_MQ=1130400.00;ReadPosRankSum=-1.679 GT:AD:DP:GQ:PGT:PID:PL:SB 0/1:154,160,0:314:99:0|1:176333126_C_T:5846,0,7455,6310,7937,14246:53,101,56,104
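
To see exactly which annotations differ between the two records, the INFO fields can be diffed key by key. This is a minimal sketch; `parse_info` and `diff_info` are helper names made up for this post, not part of GATK, and the two strings are the INFO columns copied from the records above.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag keys (no '=') map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out

def diff_info(a, b):
    """Return {key: (value_a, value_b)} for every key whose values differ."""
    ia, ib = parse_info(a), parse_info(b)
    keys = sorted(set(ia) | set(ib))
    return {k: (ia.get(k), ib.get(k)) for k in keys if ia.get(k) != ib.get(k)}

# INFO column from run 1 (the larger -L interval)
RUN1 = ("BaseQRankSum=0.389;ClippingRankSum=2.280;DB;DP=314;ExcessHet=3.0103;"
        "MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.360;RAW_MQ=1130400.00;"
        "ReadPosRankSum=-1.733")
# INFO column from run 2 (the smaller -L interval)
RUN2 = ("BaseQRankSum=-0.254;ClippingRankSum=0.132;DB;DP=314;ExcessHet=3.0103;"
        "MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.278;RAW_MQ=1130400.00;"
        "ReadPosRankSum=-1.679")

DIFF = diff_info(RUN1, RUN2)
for key, (v1, v2) in DIFF.items():
    print(f"{key}: {v1} vs {v2}")
```

Only the four *RankSum annotations (BaseQRankSum, ClippingRankSum, MQRankSum, ReadPosRankSum) differ. QUAL, DP, and the whole sample column (GT, AD, PL, SB) are identical in both runs.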

This is probably why the SNP was filtered in one run (without Queue) and not in the other (with Queue), which leaves me wondering which result is more correct.

But why are these values different?

