Channel: haplotypecaller — GATK-Forum
How to best minimize variation between runs of HaplotypeCaller in GVCF mode?

I am running the local (non-Spark) HaplotypeCaller in GVCF mode on separate interval chunks, then merging the per-chunk outputs with GatherVcfs, and I get very different calls across runs. I would expect the probabilities/confidence values to change slightly, but not the number of calls. Is this normal?

I'm using the gatk from docker://broadinstitute/gatk:4.beta.6 . My BAM/BAI files pass validation.

I've seen other posts about results being non-deterministic, but I'm not passing the -nt or -nct flags in this case.

I split all my contigs (a BED file) into roughly equal-sized chunks and call HaplotypeCaller on each chunk, like so. The resulting VCF changes a lot between 8 chunks and 128 chunks. I'm not sure whether the chunking itself makes things worse.

# chunk 000
java -jar /gatk/gatk.jar HaplotypeCaller -R ANN0859.bam --emitRefConfidence GVCF -L bed_chunk_000.bed -O ANN0859.bam_000.g.vcf -hets 0.010000
# chunk 001
java -jar /gatk/gatk.jar HaplotypeCaller -R ANN0859.bam --emitRefConfidence GVCF -L bed_chunk_001.bed -O ANN0859.bam_001.g.vcf -hets 0.010000
...
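
For reference, the chunking step described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used: the function name `split_bed` and the greedy strategy (fill a chunk until it reaches the average size, never splitting an interval) are assumptions.

```python
def split_bed(intervals, n_chunks):
    """Split sorted (contig, start, end) BED intervals into at most
    n_chunks consecutive groups of roughly equal total length.

    Hypothetical helper: intervals are never split, so chunk sizes are
    only approximately equal.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    total = sum(end - start for _, start, end in intervals)
    target = total / n_chunks  # average number of bases per chunk
    chunks, current, size = [], [], 0
    for interval in intervals:
        current.append(interval)
        size += interval[2] - interval[1]
        # Close the chunk once it reaches the target, but keep the last
        # chunk open so that it absorbs whatever remains.
        if size >= target and len(chunks) < n_chunks - 1:
            chunks.append(current)
            current, size = [], 0
    if current:
        chunks.append(current)
    return chunks
```

Each returned group would then be written to its own `bed_chunk_NNN.bed` file and passed to HaplotypeCaller with `-L`.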

I merge them like so (passing all the chunks in order):

java -jar /gatk/gatk.jar GatherVcfs -I ANN0859.bam_000.g.vcf -I ANN0859.bam_001.g.vcf ...

The entire bed is sorted, and the chunks are not overlapping. I've made sure that I'm not losing any contigs when I split my bed file.
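
A sanity check along these lines can be sketched as below. It assumes the chunks are lists of whole `(contig, start, end)` intervals taken from the original BED in order; the function name `check_chunks` is hypothetical.

```python
def check_chunks(original, chunks):
    """Verify that chunks, concatenated in order, reproduce the original
    interval list exactly and that no two consecutive intervals on the
    same contig overlap. Returns True or raises ValueError.
    """
    flat = [interval for chunk in chunks for interval in chunk]
    if flat != original:
        raise ValueError("chunks do not reproduce the original BED intervals")
    for prev, cur in zip(flat, flat[1:]):
        # BED intervals are half-open, so cur may start exactly at prev's end.
        if prev[0] == cur[0] and cur[1] < prev[2]:
            raise ValueError(f"overlapping intervals: {prev} and {cur}")
    return True
```

Contig order is compared only against the original list here, because the correct order depends on the reference sequence dictionary rather than on alphabetical sorting.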

As an example of the difference, for one of the contigs I get the following records (with 128 chunks) in the final output gVCF:

HanXRQChr00c0117        2497    .       G       <NON_REF>       .       .       END=2580        GT:DP:GQ:MIN_DP:PL      0/0:0:0:0:0,0,0
HanXRQChr00c0117        10708   .       G       <NON_REF>       .       .       END=25539       GT:DP:GQ:MIN_DP:PL      0/0:0:0:0:0,0,0
(EOF)

And if I divide the work into 8 (longer) chunks, that last block explodes into 1960 different calls:

HanXRQChr00c0117        10708   .       G       <NON_REF>       .       .       END=14265       GT:DP:GQ:MIN_DP:PL      0/0:0:0:0:0,0,0
HanXRQChr00c0117        14266   .       C       <NON_REF>       .       .       END=14267       GT:DP:GQ:MIN_DP:PL      0/0:1:3:1:0,3,42
...
HanXRQChr00c0117        14309   .       T       C,<NON_REF>     0.13    .       DP=2;MLEAC=0,0;MLEAF=nan,nan;RAW_MQ=7200        GT:PGT:PID      ./.:0|1:14309_T_C
HanXRQChr00c0117        14310   .       T       <NON_REF>       .       .       END=14315       GT:DP:GQ:MIN_DP:PL      0/0:1:3:1:0,3,45
HanXRQChr00c0117        14316   .       T       C,<NON_REF>     0.13    .       DP=2;MLEAC=0,0;MLEAF=nan,nan;RAW_MQ=7200        GT:PGT:PID      ./.:0|1:14309_T_C
HanXRQChr00c0117        14317   .       T       <NON_REF>       .       .       END=14321       GT:DP:GQ:MIN_DP:PL      0/0:1:3:1:0,3,45
...
HanXRQChr00c0117        14358   .       T       <NON_REF>       .       .       END=14359       GT:DP:GQ:MIN_DP:PL      0/0:4:12:4:0,12,180
HanXRQChr00c0117        14360   .       A       G,<NON_REF>     30.02   .       DP=4;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.5,0;RAW_MQ=14400        GT:AD:DP:GQ:PL:SB       1/1:0,1,0:1:3:45,3,0,45,3,45:0,0,1,0
...
HanXRQChr00c0117        25479   .       T       <NON_REF>       .       .       END=25484       GT:DP:GQ:MIN_DP:PL      0/0:8:24:8:0,24,296
HanXRQChr00c0117        25485   .       T       <NON_REF>       .       .       END=25485       GT:DP:GQ:MIN_DP:PL      0/0:8:21:8:0,21,315
HanXRQChr00c0117        25486   .       T       <NON_REF>       .       .       END=25521       GT:DP:GQ:MIN_DP:PL      0/0:6:18:6:0,18,217
HanXRQChr00c0117        25522   .       A       <NON_REF>       .       .       END=25524       GT:DP:GQ:MIN_DP:PL      0/0:7:15:7:0,15,225
HanXRQChr00c0117        25525   .       T       <NON_REF>       .       .       END=25539       GT:DP:GQ:MIN_DP:PL      0/0:5:9:3:0,9,133
(EOF)

I thought at first that maybe the chunk boundaries were at play, but those contigs are in the middle of a chunk file.
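
To see which contigs are affected, the two merged gVCFs can be compared by record count per contig. The sketch below is one way to do that in plain Python; the function names are hypothetical, and it only counts lines, so it says nothing about why the records differ.

```python
from collections import Counter


def records_per_contig(vcf_lines):
    """Count non-header records per contig in an iterable of VCF lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        counts[line.split(None, 1)[0]] += 1  # first column is the contig
    return counts


def differing_contigs(counts_a, counts_b):
    """Return {contig: (count_a, count_b)} for contigs whose counts differ."""
    contigs = sorted(set(counts_a) | set(counts_b))
    return {c: (counts_a[c], counts_b[c]) for c in contigs
            if counts_a[c] != counts_b[c]}
```

Passing an open file handle for each merged gVCF to `records_per_contig` and then calling `differing_contigs` on the two results lists every contig where the 8-chunk and 128-chunk runs disagree in record count.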

