Channel: haplotypecaller — GATK-Forum

All ALT fields are the same after GenomicsDBImport and GenotypeGVCFs

Hi GATK team,

I'm running GATK 4.1.3.0 to perform targeted single-cell genotyping, where each sample in the output VCF is a single cell (10-20k samples total per VCF). I've noticed something strange in the final VCF: within each cell genotyped as WT (0/0) at a given site, every ALT allele is reported with the same depth.

A snippet of the VCF is provided below. In the first sample, for example, all ALTs have a depth of 6. Furthermore, DP is always the sum of the REF and first ALT allele depths (suggesting that the depths of all other ALTs should probably be 0). At the second site (chr1:115256518), there is a HET call with the correct depths listed, so this seems to affect only the 0/0 calls.

```
chr1 115256516 . A G,T,*,C 32261.26 . AC=29,8,1,2;AF=1.566e-03,4.319e-04,5.399e-05,1.080e-04;AN=18522;BaseQRankSum=0.282;DP=4450248;ExcessHet=3.1936;FS=0.000;InbreedingCoeff=0.1634;MLEAC=28,8,1,2;MLEAF=1.512e-03,4.319e-04,5.399e-05,1.080e-04;MQ=41.96;MQRankSum=0.00;QD=2.19;ReadPosRankSum=0.00;SOR=0.291 GT:AD:DP:GQ:PL 0/0:717,6,6,6,6:723:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:841,8,8,8,8:849:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:292,1,1,1,1:293:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:1034,9,9,9,9:1043:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:134,0,0,0,0:134:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800 0/0:130,0,0,0,0:130:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800
...
chr1 115256518 . T C,A 33446.22 . AC=19,11;AF=1.026e-03,5.939e-04;AN=18522;BaseQRankSum=-1.180e+00;DP=4451040;ExcessHet=3.1125;FS=0.000;InbreedingCoeff=0.1986;MLEAC=19,11;MLEAF=1.026e-03,5.939e-04;MQ=41.96;MQRankSum=0.066;QD=2.08;ReadPosRankSum=0.023;SOR=0.021 GT:AD:DP:GQ:PL 0/0:722,1,1:723:99:0,120,1800,120,1800,1800 0/0:839,10,10:849:99:0,120,1800,120,1800,1800 ... 0/1:692,85,1:778:99:965,0,26646,3041,26933,30335 0/0:658,4,4:662:99:0,120,1800,120,1800,1800
```
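In case it helps anyone check their own output, here is a minimal sketch of how the pattern described above could be flagged from a single sample column. The function names are made up for illustration (they are not part of GATK), and the check assumes the FORMAT string contains GT, AD, and DP as in the snippet. It is only a heuristic for spotting suspicious 0/0 calls, not a definitive test.

```
# Hypothetical helpers for illustration only; not part of GATK.

def parse_sample(fmt, sample):
    """Map FORMAT keys (e.g. GT:AD:DP:GQ:PL) to one sample's values."""
    return dict(zip(fmt.split(":"), sample.split(":")))


def looks_repeated(fmt, sample):
    """Return True for a 0/0 call with two or more ALTs where every ALT
    depth is the same non-zero value and DP == REF depth + first ALT depth."""
    fields = parse_sample(fmt, sample)
    if fields.get("GT") != "0/0":
        return False
    ad_values = fields.get("AD", ".").split(",")
    if "." in ad_values or fields.get("DP", ".") == ".":
        return False  # missing values, nothing to check
    ad = [int(x) for x in ad_values]
    ref, alts = ad[0], ad[1:]
    if len(alts) < 2 or alts[0] == 0:
        return False  # all-zero ALT depths are consistent with DP
    return len(set(alts)) == 1 and int(fields["DP"]) == ref + alts[0]


fmt = "GT:AD:DP:GQ:PL"
# Sample columns copied from the two sites shown above.
samples = [
    "0/0:717,6,6,6,6:723:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800",
    "0/0:134,0,0,0,0:134:99:0,120,1800,120,1800,1800,120,1800,1800,1800,120,1800,1800,1800,1800",
    "0/1:692,85,1:778:99:965,0,26646,3041,26933,30335",
]
for s in samples:
    print(s.split(":")[1], looks_repeated(fmt, s))
```

Calls whose ALT depths are all 0 are deliberately not flagged, since those are consistent with the reported DP.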

In terms of the pipeline, I'm using HaplotypeCaller, GenomicsDBImport, and GenotypeGVCFs all from GATK 4.1.3.0. I've observed the same behavior in 4.1.2.0 as well. Strangely, using CombineGVCFs (which is very slow and requires iterative merging) does not produce the repeated ALT depths.

Here are the exact commands used for each of the three programs (I'm copying from my Python script so the string formatting is there):

```
'gatk HaplotypeCaller -R %s -I %s -O %s -L %s ' \
'--emit-ref-confidence BP_RESOLUTION ' \
'--verbosity ERROR ' \
'--native-pair-hmm-threads 1 ' \
'--max-alternate-alleles 2 ' \
'--max-reads-per-alignment-start 0 '

'gatk --java-options "-Xmx4g" GenomicsDBImport ' \
'--genomicsdb-workspace-path %s ' \
'--batch-size 50 ' \
'--reader-threads 2 ' \
'--validate-sample-name-map true ' \
'-L %s ' \
'--sample-name-map %s'

'gatk --java-options "-Xmx4g" GenotypeGVCFs ' \
'-V %s ' \
'-R %s ' \
'-L %s ' \
'-D %s ' \
'-O %s ' \
'--include-non-variant-sites'
```
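To make the string formatting concrete, here is a rough sketch of how the GenotypeGVCFs template might be filled in and split into an argument list. All five values are hypothetical placeholders rather than my real paths, and the `gendb://` prefix is assumed here as the way to point `-V` at a GenomicsDB workspace.

```
import shlex

# Template copied from the GenotypeGVCFs command above.
GENOTYPE_GVCFS = (
    'gatk --java-options "-Xmx4g" GenotypeGVCFs '
    '-V %s '
    '-R %s '
    '-L %s '
    '-D %s '
    '-O %s '
    '--include-non-variant-sites'
)

# Hypothetical placeholder values, in the order the template expects them.
cmd = GENOTYPE_GVCFS % (
    "gendb://my_workspace",  # -V: GenomicsDB workspace (assumed URI form)
    "ref.fasta",             # -R: reference
    "targets.bed",           # -L: intervals
    "dbsnp.vcf.gz",          # -D: dbSNP
    "cells.vcf.gz",          # -O: output VCF
)

# shlex.split honours the quotes around -Xmx4g, giving a list that could be
# passed to subprocess.run without going through a shell.
argv = shlex.split(cmd)
print(argv)
```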

I couldn't find any similar issues on the forum. I am fairly sure the issue lies with GenomicsDB, given that I have no problems when using CombineGVCFs instead (but I could be wrong). Any ideas on what might be going on? Thanks!
