We are trying to call variants for Influenza A virus sequenced by MiSeq using HaplotypeCaller following GATK best practices (GATK version 3.7). However, when checking in IGV the called variants with BAM file, we frequently identify snps that are missed by HaplotypeCaller at the beginning or the end of a segment. The missing ones are well supported by the reads, and are called by samtools and UnifiedGenotyper with high confidence.
As one example (showing below), there are three rows of called variants at the top, from top to bottom, called by UnifiedGenotyper, samtools, and HaplotypeCaller. The right most snp is called by first two tools but missed by HaplotypeCaller, although the support reads show consistent snp readouts.
Just to show that this snp is well supported by the reads, here is the vcf record reporting this snp in VCF generated by UnifiedGenotyper:
A-New_Jersey-NHRC_93408-2016-H3N2(KY078630)-HA 15 . A T 166598 . AC=1;AF=1.00;AN=1;DP=3970;Dels=0.00;FS=0.000;HaplotypeScore=26.7856;MLEAC=1;MLEAF=1.00;MQ=59.99;MQ0=0;QD=34.24;SOR=4.823 GT:AD:DP:GQ:PL 1:0,3969:3970:99:166628,0
A close check in the HaplotypeCaller generated BAM file for debugging, we noticed that the variant is consistently missing from the de novo generated Haplotypes.
There are also other cases of missing snps. The similarity is that they are always at the terminal of the segment, well supported by reads, and only HaplotypeCaller misses them. However, for some samples, similar variants at the terminal are called by HaplotypeCaller.
My question is following:
- is this a bug of HaplotypeCaller? If so, has it been fixed?
- if not a bug, is there a parameter of HaplotypeCaller that can be set to guarantee that it will not miss the good quality variants at the terminal?
Many thanks.