Hi,
Just wondering what the possible reasons could be for HaplotypeCaller (version: 3.5-0-g36282e4) to report a reference genotype quality (RGQ) at or near 0 for positions where the read depth is relatively high, such as in the following region:
SC1 1628 . T . . PASS DP=50;GC=33.33 GT:AD:DP:RGQ 0/0:50:50:99
SC1 1629 . T . . PASS DP=50;GC=38.1 GT:AD:DP:RGQ 0/0:50:50:99
SC1 1630 . T . . FAIL_RGQ DP=50;GC=38.1 GT:AD:DP:RGQ 0/0:45:50:5
SC1 1631 . C . . PASS DP=51;GC=33.33 GT:AD:DP:RGQ 0/0:51:51:99
SC1 1632 . A . . PASS DP=51;GC=33.33 GT:AD:DP:RGQ 0/0:51:51:96
In this instance the RGQ of the failed position above (SC1:1630) was actually 5, which is the threshold I set for filtering in this example, but I have plenty of instances where the read depth and resulting RGQ look like this:
SC1 1630 . T . . FAIL_RGQ DP=50;GC=38.1 GT:AD:DP:RGQ ./.:45:50:5
SC1 1640 . T . . FAIL_RGQ DP=48;GC=38.1 GT:AD:DP:RGQ ./.:34:48:0
SC1 1805 . T . . FAIL_RGQ DP=36;GC=33.33 GT:AD:DP:RGQ ./.:32:36:0
SC1 2046 . A . . FAIL_RGQ DP=37;GC=19.05 GT:AD:DP:RGQ ./.:33:37:2
SC1 2345 . A . . FAIL_RGQ DP=105;GC=23.81 GT:AD:DP:RGQ ./.:90:105:0
SC1 2352 . A . . FAIL_RGQ DP=116;GC=19.05 GT:AD:DP:RGQ ./.:103:116:0
SC1 2356 . C . . FAIL_RGQ DP=112;GC=23.81 GT:AD:DP:RGQ ./.:100:112:0
SC1 2359 . G . . FAIL_RGQ DP=111;GC=28.57 GT:AD:DP:RGQ ./.:99:111:0
It feels like something funny is going on. Should it be possible for RGQ to be so low at such high depth? Also, I thought the AD format tag gave the count of unfiltered reads, whilst the DP format tag gave the filtered read depth (i.e. reads HC finds informative). Therefore, shouldn't the AD count always be at least as high as the DP count?
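
To make the AD versus DP comparison concrete, here is a minimal Python sketch that pulls the sample fields out of records like the ones above and reports how many reads are counted in DP but not in AD. It is not part of GATK; the function name `parse_sample_fields` and the returned keys are my own, and it assumes single-sample records with whitespace-separated columns and a `GT:AD:DP:RGQ` style FORMAT field.

```python
def parse_sample_fields(vcf_line):
    """Parse one single-sample VCF record into a small summary dict.

    Assumes the usual column order:
    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
    """
    cols = vcf_line.split()
    keys = cols[8].split(":")      # FORMAT keys, e.g. GT, AD, DP, RGQ
    values = cols[9].split(":")    # matching values for the sample
    fields = dict(zip(keys, values))

    # AD may be comma-separated (one count per allele); sum all of them.
    ad_total = sum(int(x) for x in fields["AD"].split(","))
    dp = int(fields["DP"])

    return {
        "chrom": cols[0],
        "pos": int(cols[1]),
        "filter": cols[6],
        "ad": ad_total,
        "dp": dp,
        "rgq": int(fields["RGQ"]),
        "dp_minus_ad": dp - ad_total,  # reads in DP that AD does not count
    }


# A few of the records quoted above, copied verbatim.
records = [
    "SC1 1629 . T . . PASS DP=50;GC=38.1 GT:AD:DP:RGQ 0/0:50:50:99",
    "SC1 1640 . T . . FAIL_RGQ DP=48;GC=38.1 GT:AD:DP:RGQ ./.:34:48:0",
    "SC1 2345 . A . . FAIL_RGQ DP=105;GC=23.81 GT:AD:DP:RGQ ./.:90:105:0",
]

for line in records:
    r = parse_sample_fields(line)
    print(r["chrom"], r["pos"], r["filter"],
          "AD=%d DP=%d RGQ=%d DP-AD=%d" % (r["ad"], r["dp"], r["rgq"], r["dp_minus_ad"]))
```

Run over the records in this post, every FAIL_RGQ position has AD below DP (by between 4 and 15 reads), while the PASS positions have AD equal to DP, which is the pattern I am asking about.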