There are 68 reads covering my SNP of interest:
samtools view mybam.bam 6:1611802-1611802 | wc -l
68
This matches what I see in IGV (after turning off all the filters in the preferences of course!).
Now I call with GATK:
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I mybam.bam -L 6:1611802-1611802
and I get this output:
6 1611802 . G GGGC 1618.73 . AC=2;AF=1.00;AN=2;DP=58;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.32;QD=34.24;SOR=0.984 GT:AD:DP:GQ:PL 1/1:0,37:37:99:1656,111,0
Take a look at the last column. The depth (DP) at this position is 37. It's not 68 because GATK has filtered out 68-37 = 31 reads apparently. But now check out this output:
INFO 13:01:45,228 MicroScheduler - 44 reads were filtered out during the traversal out of approximately 275 total reads (16.00%)
INFO 13:01:45,228 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 13:01:45,228 MicroScheduler - -> 43 reads (15.64% of total) failing DuplicateReadFilter
INFO 13:01:45,228 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 13:01:45,229 MicroScheduler - -> 1 reads (0.36% of total) failing HCMappingQualityFilter
INFO 13:01:45,229 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 13:01:45,229 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 13:01:45,229 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 13:01:45,229 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
I do not understand this output. 275 total reads?
My expected report would be a total read number of 68 and those 31 filtered out reads being distributed among those filter classes (BadCigarFilter, DuplicateReadFilter, FailsVendorQualityCheckFilter, HCMappingQualityFilter, MalformedReadFilter, MappingQualityUnavailableFilter, NotPrimaryAlignmentFilter, UnmappedReadFilter).