I try to filter both variant non-variant sites together. I see the only reasonable way to do it is to filter by the per-sample DP.
However, I noticed that substantial fraction of called sites (~10%) have missing value in DP field ( . instead of 0 or other value). Although many of sites with missing DP values indeed have low coverage with bad mapping and called ./., some sites have in my view good coverage. When I check the bam file (-bamout result) in IGV, I see many reads mapped to those sites with good quality (MQ=60). The genotype is usually called correctly, but for some unclear for me reason GATK doesn't report DP values. The AD values are usually 0,0 in such sites. Interestingly, the nearby sites have DP values reported.
When I check gVCF files, these sites usually have DP=0. Assuming coverage of 0 I could discard these sites, but I see in a bam file that it is not 0. So, I do not want to throw away 10% of my data.
I notice a tendency that such site is usually homozygot for ALT allele. I provide an example of such a site below. See the 1st sample in the position 13388742. (1/1:0,0:.:3:1|1:13388738_T_C:45,3,0)
I generated the data using HaplotypeCaller in GVCF mode and then GenotypeGVCFs.
Could you please tell me the reason why this is so?
VCF:
scaffold_1 13388742 . A G 8529.41 . AC=35;AF=0.833;AN=42;BaseQRankSum=-1.730e-01;ClippingRankSum=-6.940e-01;DP=247;ExcessHet=0.4083;FS=9.162;InbreedingCoeff=0.1415;MLEAC=37;MLEAF=0.881;MQ=38.83;MQRankSum=2.51;QD=32.26;ReadPosRankSum=1.56;SOR=0.043 GT:AD:DP:GQ:PGT:PID:PL 1/1:0,0:.:3:1|1:13388738_T_C:45,3,0 ./.:0,0:0 ./.:2,0:2 ./.:4,0:4 ./.:0,0:0 ./.:3,0:3 0/0:20,0:20:0:.:.:0,0,577 0/1:0,1:1:11:0|1:13388692_G_T:81,0,11 1/1:0,10:10:33:1|1:13388724_G_A:495,33,0 ./.:0,0:0 1/1:1,19:20:27:1|1:13388742_A_G:979,27,0 1/1:0,20:20:35:1|1:13388724_G_A:944,35,0 1/1:0,1:1:6:.:.:90,6,0 1/1:0,7:7:24:1|1:13388724_G_A:360,24,0 0/1:0,2:2:30:0|1:13388692_G_T:165,0,30 1/1:0,5:5:15:1|1:13388742_A_G:225,15,0 1/1:0,20:20:60:1|1:13388724_G_A:900,60,0 1/1:0,8:8:30:1|1:13388724_G_A:450,30,0 1/1:0,15:15:48:.:.:720,48,0 0/1:3,28:31:42:0|1:13388742_A_G:1167,0,42 ./.:3,0:3 1/1:0,1:1:3:1|1:13388687_C_T:45,3,0 1/1:0,2:2:12:1|1:13388692_G_T:180,12,0 ./.:4,0:4 ./.:6,0:6 1/1:0,6:6:18:1|1:13388724_G_A:270,18,0 ./.:4,0:4 1/1:0,14:14:45:1|1:13388724_G_A:675,45,0 1/1:0,3:3:9:1|1:13388692_G_T:135,9,0 0/0:21,0:21:0:.:.:0,0,533 1/1:0,14:14:45:1|1:13388724_G_A:655,45,0
gVCF:
scaffold_1 13388740 . C <NON_REF> . . END=13388741 GT:DP:GQ:MIN_DP:PL 0/0:4:12:4:0,12,162
scaffold_1 13388742 . A G,<NON_REF> 31.82 . DP=0;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;RAW_MQ=0.00 GT:GQ:PGT:PID:PL:SB 1/1:3:0|1:13388738_T_C:45,3,0,45,3,45:0,0,0,0
scaffold_1 13388743 . A <NON_REF> . . END=13388743 GT:DP:GQ:MIN_DP:PL 0/0:5:15:5:0,15,214
Screenshot of a BAM: