A few months ago we ran GATK 3.7 HaplotypeCaller on a sample to produce a .gVCF file. Recently we reran the same sample with the same GATK 3.7 HaplotypeCaller parameters and found that the DP and PL values differ for many variants between the two output .gVCF files.
The command line used for both runs:
java -Xmx32g -Djava.io.tmpdir=Temp/ -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fa -I sample.bam -nct 24 --dbsnp dbsnp138.vcf --genotyping_mode DISCOVERY --minPruning 2 -newQual -stand_call_conf 30 --emitRefConfidence GVCF -variant_index_type LINEAR -variant_index_parameter 128000 -L chr1 -G none -l INFO -log sample.log -o sample_chr1.g.vcf.gz
Sample differences between the two files, extracted with the diff command:
F1 marks lines from the .gVCF file generated a few months ago.
F2 marks lines from the .gVCF file generated recently.
Change 1 observed: DP and PL values differ between the two output .gVCF files.
F1 chr1 1510162 . A <NON_REF> . . END=1510162 GT:DP:GQ:MIN_DP:PL 0/0:46:12:46:0,12,1425
F2 chr1 1510162 . A <NON_REF> . . END=1510162 GT:DP:GQ:MIN_DP:PL 0/0:45:9:45:0,9,1380
F1 chr1 6941045 . C <NON_REF> . . END=6941080 GT:DP:GQ:MIN_DP:PL 0/0:14:0:7:0,0,139
F2 chr1 6941045 . C <NON_REF> . . END=6941080 GT:DP:GQ:MIN_DP:PL 0/0:15:0:7:0,0,139
F1 chr1 45683203 rs34100486 CTTTT C,<NON_REF> 177.60 . DB;MLEAC=1,0;MLEAF=0.500,0.00 GT:GQ:PL:SB 0/1:22:185,0,22,188,37,225:1,0,3,2
F2 chr1 45683203 rs34100486 CTTTT C,<NON_REF> 168.60 . DB;MLEAC=1,0;MLEAF=0.500,0.00 GT:GQ:PL:SB 0/1:22:176,0,22,179,37,215:1,0,3,2
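To compare the FORMAT fields of matching records directly, instead of reading raw diff output, a small Python sketch like the one below could help. It assumes single-sample data lines such as the ones above, and the helper names (`format_values`, `gq_from_pl`, `compare_fields`) are hypothetical. It also encodes the convention, as we understand it from GATK's documentation, that GQ is the gap between the lowest and second-lowest PL, capped at 99; for every line shown here that holds, so the GQ changes appear to follow from the PL changes.

```python
def format_values(line):
    """Map FORMAT keys to sample values for one single-sample gVCF data line.

    VCF columns contain no whitespace, so split() works for both
    tab-separated file lines and the space-separated lines pasted above.
    """
    fields = line.split()
    keys = fields[8].split(":")
    values = fields[9].split(":")
    return dict(zip(keys, values))


def gq_from_pl(pl):
    """GQ as the gap between the two lowest PL values, capped at 99."""
    scores = sorted(int(x) for x in pl.split(","))
    return min(scores[1] - scores[0], 99)


def compare_fields(old_line, new_line, keys=("DP", "GQ", "MIN_DP", "PL")):
    """Return {key: (old, new)} for the FORMAT keys whose values differ."""
    old, new = format_values(old_line), format_values(new_line)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__":
    f1 = "chr1 1510162 . A <NON_REF> . . END=1510162 GT:DP:GQ:MIN_DP:PL 0/0:46:12:46:0,12,1425"
    f2 = "chr1 1510162 . A <NON_REF> . . END=1510162 GT:DP:GQ:MIN_DP:PL 0/0:45:9:45:0,9,1380"
    # {'DP': ('46', '45'), 'GQ': ('12', '9'), 'MIN_DP': ('46', '45'), 'PL': ('0,12,1425', '0,9,1380')}
    print(compare_fields(f1, f2))
    # 12 9
    print(gq_from_pl("0,12,1425"), gq_from_pl("0,9,1380"))
```

Keys missing from both records (for example DP on the variant lines at 45683203) are simply skipped.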
Change 2 observed: 29 records are present in the recent run's .gVCF output file that were not present in the previous run's .gVCF output file.
A few examples of records added in the recent run's .gVCF output file:
F2 chr1 15357649 . G <NON_REF> . . END=15357649 GT:DP:GQ:MIN_DP:PL 0/0:41:94:41:0,94,1235
F2 chr1 15357650 . A <NON_REF> . . END=15357650 GT:DP:GQ:MIN_DP:PL 0/0:39:99:39:0,102,1284
Change 3 observed: 10 records are present in the previous run's .gVCF output file that are not present in the recent run's .gVCF output file.
A few examples of records found only in the previous run's .gVCF output file:
F1 chr1 9282514 . C CTCCCCCTCCTCCTTGTCTCCTCCTCCCTCTCCCCCT,<NON_REF> 274.01 . MLEAC=2,0;MLEAF=1.00,0.00 GT:GQ:PL:SB 1/1:20:288,20,0,289,21,290:0,0,0,3
F1 chr1 9282515 . T <NON_REF> . . END=9282515 GT:DP:GQ:MIN_DP:PL 0/0:37:0:37:0,0,820
F1 chr1 27014608 . T <NON_REF> . . END=27014608 GT:DP:GQ:MIN_DP:PL 0/0:35:91:35:0,91,1388
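Because gVCF mode merges adjacent reference-confidence positions into blocks (the END= key in INFO), a record whose POS appears in only one file may still be covered by a block in the other file, so a line-based diff can overstate Changes 2 and 3. The Python sketch below is one way to check this, assuming plain or gzip-compressed single-sample gVCFs; the helper names (`parse_span`, `read_spans`, `only_in`) are hypothetical, and only END is used to define a record's span.

```python
import gzip


def parse_span(line):
    """Return (chrom, pos, end) for one gVCF data line.

    Reference blocks carry END= in INFO; records without it are treated as
    covering only POS (the reference allele's length is ignored here).
    """
    fields = line.split()
    chrom, pos = fields[0], int(fields[1])
    end = pos
    for item in fields[7].split(";"):
        if item.startswith("END="):
            end = int(item[len("END="):])
    return chrom, pos, end


def read_spans(path):
    """Read all data lines of a plain or gzip-compressed gVCF into a list of spans."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return [parse_span(line) for line in fh
                if line.strip() and not line.startswith("#")]


def only_in(a, b):
    """List ((chrom, pos), covered) for records of a that do not start a record in b.

    covered is True when some record of b spans that position anyway, e.g. a
    reference block whose boundaries were drawn differently. The coverage check
    is a linear scan over b, which is acceptable for a few dozen differences.
    """
    starts_b = {(chrom, pos) for chrom, pos, _ in b}
    result = []
    for chrom, pos, _ in a:
        if (chrom, pos) in starts_b:
            continue
        covered = any(c == chrom and p <= pos <= e for c, p, e in b)
        result.append(((chrom, pos), covered))
    return result
```

For example, `only_in(read_spans("recent.g.vcf.gz"), read_spans("previous.g.vcf.gz"))` would list the records added in the recent run, and swapping the arguments lists the ones that disappeared; both file names are placeholders.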
Could you please explain why two runs of HaplotypeCaller give different results, and what these changed values between the two output .gVCF files mean? Can this affect variant calling (joint genotyping), which will be done later with all samples together?