I'm running into a HaplotypeCaller issue with the latest release (2.5-2) on
reads aligned with Novoalign. Here's a small reproducible input file:
https://s3.amazonaws.com/chapmanb/gatk_hc_problem_cigar.bam
Running:
    java -Xms750m -Xmx3g -jar GenomeAnalysisTK.jar -R GRCh37.fa \
      -I problem_cigar.bam -L 4:120371315-120371586 -T HaplotypeCaller \
      -o out.vcf --read_filter BadCigar -debug
Errors out with:
    org.broadinstitute.sting.utils.exceptions.ReviewedStingException: START (0) > (-1) STOP -- this should never happen, please check read: HWI-ST1124:106:C15APACXX:1:1107:15450:87092 2/2 58b aligned read. (CIGAR: 38H4D58M)
Looking at the read, its CIGAR string appears to trick the BadCigar filter
because it contains a zero-length match element (0M) between an insertion
and a deletion:
    38M4I0M4D58M
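To illustrate how a 0M element can hide an insertion/deletion pair from an adjacency check, here is a minimal Java sketch. It assumes a simplified rule (flag a read when an insertion or deletion immediately follows another insertion or deletion); it is not the GATK BadCigarFilter source or the code in the linked patch, and other checks the real filter may perform are omitted. The class and method names (BadCigarSketch, parse, hasAdjacentIndels) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BadCigarSketch {
    // One CIGAR element: a length and a single-character operator (M, I, D, ...).
    record Element(int length, char op) {}

    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    // Parse a CIGAR string such as "38M4I0M4D58M" into its elements, in order.
    static List<Element> parse(String cigar) {
        List<Element> elements = new ArrayList<>();
        Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            elements.add(new Element(Integer.parseInt(m.group(1)), m.group(2).charAt(0)));
        }
        return elements;
    }

    // Simplified "bad CIGAR" rule (an assumption, not GATK's exact logic):
    // return true if an I or D element directly follows another I or D element.
    // With skipZeroLength set, zero-length elements such as 0M are ignored,
    // mirroring the idea of only considering elements with non-zero length.
    static boolean hasAdjacentIndels(List<Element> elements, boolean skipZeroLength) {
        boolean previousWasIndel = false;
        for (Element e : elements) {
            if (skipZeroLength && e.length() == 0) {
                continue; // a 0M element consumes no bases, so it should not separate indels
            }
            boolean isIndel = e.op() == 'I' || e.op() == 'D';
            if (isIndel && previousWasIndel) {
                return true;
            }
            previousWasIndel = isIndel;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Element> read = parse("38M4I0M4D58M");
        System.out.println("naive check flags read: " + hasAdjacentIndels(read, false));
        System.out.println("non-zero check flags read: " + hasAdjacentIndels(read, true));
    }
}
```

Running main prints false for the naive check (the 0M sits between 4I and 4D, so they never look adjacent) and true once zero-length elements are skipped.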
This patch fixes the BadCigar filter by only considering CIGAR elements with
non-zero length:
https://gist.github.com/chapmanb/5568411
With this patch applied, the read is properly filtered and HaplotypeCaller
runs without a problem. Hope this helps; please let me know if any other
details about the problem would be helpful.