Quantcast
Channel: haplotypecaller — GATK-Forum
Viewing all articles
Browse latest Browse all 1335

HaplotypeCaller: Alternate allele get called or not depending on -ip option

$
0
0

Hi, I'm currently analyzing some data (exome-seq) using HaplotypeCaller and get what seems to me an odd behaviour:
The problem is that I've got a position which is clearly bi-allelic in IGV and that is said to have only the reference allele in the gVCF I'm generating with HaplotypeCaller.

Here is the command line I used:

nohup java -jar PATH/GenomeAnalysisTK-3.4-46/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R PATH/ucsc.hg19_noHaps.fasta \
-I PATH/JLCL254.realigned.recalibrated.bam \
-L PATH/merged.bed \
-ip 50 \
--emitRefConfidence GVCF \
--variant_index_type LINEAR \
--variant_index_parameter 128000 \
-o JLCL254.vcf 

Here is the gVCF line of the variant of interest:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT JLCL254

chr1 912049 . T NON_REF . . END=912049 GT:DP:GQ:MIN_DP:PL 0/0:68:0:68:0,0,0

The variant of interest is located 19bp away from the captured region but with "-ip 50" it should be detected.

To check what is really analyzed, I output the bamout for all analyzed regions (-L PATH/merged.bed, -ip 50) and saw that the location of the variant is not analyzed (If I'm correct: as there is no coverage, this is not an active region).

Then I forced the bamout at the location +/-20nt around my variant to check whether some reads with the alternate allele are still kept. I used:
-L chr1:912029-912069 \
-forceActive \
-disableOptimizations

Doing so I've being able to see that many reads with the alternate allele are indeed still kept. The gVCF file generated along with the bamout file contains now my variant of interest with the alternate allele:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT JLCL254

chr1 912049 rs9803103 T C,NON_REF 1235.77 . BaseQRankSum=-0.677;ClippingRankSum=-0.942;DB;DP=61;MLEAC=1,0;MLEAF=0.500,0.00;MQ=54.88;MQRankSum=-0.600;ReadPosRankSum=-0.195 GT:AD:DP:GQ:PL:SB 0/1:19,42,0:61:99:1264,0,448,1321,574,1895:3,16,6,36

Please see an IGV screenshot:

Tracks are (from top to bottom):
* the original bam file
* the bamout for all captured regions (known from the file -L PATH/merged.bed)
* the forced bamout (at the location of the variant i.e -L chr1:912029-912069)
* merged.bed is the file used with the -L option.

Finally, I tried to call variants changing the -ip option to 100 and got the alternate allele called.

Please note that:
If I manually add/subtract 50bp to the closest target region boundaries, I've got the same result as with -ip 50.
If I manually add/subtract 100bp to the closest target region boundaries, I've got the same result as with -ip 100.

I tried several versions of GATK (3.46, 3.7, 4.0.4.0) and always got the same results.

I may have miss something but so far I can't explain myself what's happening. Do you see any explanation for what I observe? Do you see any options I should use to overcome this?
Many thanks in advance for your help.

NB: java -version
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-b10)
OpenJDK 64-Bit Server VM (build 25.171-b10, mixed mode)


Viewing all articles
Browse latest Browse all 1335

Trending Articles