Hey GATK Devs!
I'm writing to report some unexpected behavior on the part of GATK3.8 HC. I'm trying to use Illumina data to call SNPs and indels on a PacBio assembly and identify loci where assembly polishing has failed to correct the assembly. I was looking through the reads of a particular contig and identified a locus (tig00006168:59182) that GATK failed to call. According the the reads, the locus should have been called homozygous for a deletion at 30X depth. Looking at the gVCF I see it is called homozygous for the reference allele (while reporting 31 reads supporting the alternate allele), reports 0 for GQ and all PLs:
tig00006168 59180 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:31,0:31:15:0,15,225
tig00006168 59181 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:30,0:30:15:0,15,225
tig00006168 59182 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:1,31:32:0:0,0,0
tig00006168 59183 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:1,31:32:0:0,0,0
tig00006168 59184 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:1,31:32:0:0,0,0
tig00006168 59185 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:27,2:29:54:0,54,1005
tig00006168 59186 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:30,1:31:72:0,72,1080
The -bamout
output for this run reports no variant-containing reads at this locus. However, if I include -L tig00006168:59172-59195
in the options, GATK calls the indel:
tig00006168 59182 . CA C 927.73 . AC=2;AF=1.00;AN=2;DP=27;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=34.36;SOR=2.753 GT:AD:DP:GQ:PL 1/1:0,27:27:81:965,81,0
The tview on the -bamout
for this latter run displays:
59141 59151 59161 59171 59181 59191 59201 59211 592
GGAAATGAAGGAGAAGAAAGTGTTTATCAGCCTCGTGGGCACAAACAGGAATGGGCTGCAGGTTGGTACCCCCAATCTCTNNN
..........................................................................
................................. .................................
....................................*.....................................
....................................*.....................................
........................... ,,,,,,,,,,,,,,,,,,,,c,,,,,,,,,,,,
.................. ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
....................................*.....................................
....................................*..............................
....................................*..............................
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,g,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,g,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,t,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,t,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,, .....................
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,g,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,c,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
The mapping quality for all of these reads is over 30 and the base qualities in the 30+ range. In your experience, what might cause this odd behavior? I've tried GATK versions 3.2-2, 3.6-0, 3.7, and they exhibit the same behavior. My initial run log:
INFO 07:46:12,698 HelpFormatter - ---------------------------------------------------------------------------------------
INFO 07:46:12,701 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
INFO 07:46:12,701 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 07:46:12,701 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 07:46:12,701 HelpFormatter - [Fri Oct 13 07:46:12 PDT 2017] Executing on Linux 2.6.32-696.3.2.el6.nersc.x86_64 amd64
INFO 07:46:12,701 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13
INFO 07:46:12,704 HelpFormatter - Program Args: -T HaplotypeCaller --standard_min_confidence_threshold_for_calling 0 -rf BadMate -R ./contigs.fasta -L
tig00006168 -I 10X.bam -mmq 25 -mbq 30 -o tig00006168.trg.vcf.gz -bamout tig00006168.trg.bam
INFO 07:46:12,714 HelpFormatter - Executing as bredeson@hostname on Linux 2.6.32-696.3.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM
1.8.0_31-b13.
INFO 07:46:12,714 HelpFormatter - Date/Time: 2017/10/13 07:46:12
INFO 07:46:12,714 HelpFormatter - ---------------------------------------------------------------------------------------
INFO 07:46:12,714 HelpFormatter - ---------------------------------------------------------------------------------------
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:~bredeson/tools/bin/GATK/3.
8-0-ge9d80683/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
INFO 07:46:12,851 GenomeAnalysisEngine - Deflater: JdkDeflater
INFO 07:46:12,851 GenomeAnalysisEngine - Inflater: JdkInflater
INFO 07:46:12,852 GenomeAnalysisEngine - Strictness is SILENT
INFO 07:46:15,647 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 07:46:15,654 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 07:46:15,879 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.22
INFO 07:46:16,007 HCMappingQualityFilter - Filtering out reads with MAPQ < 25
INFO 07:46:18,047 IntervalUtils - Processing 106407 bp from intervals
INFO 07:46:18,157 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 07:46:18,277 GenomeAnalysisEngine - Done preparing for traversal
INFO 07:46:18,277 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 07:46:18,278 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 07:46:18,278 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 07:46:18,278 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output
INFO 07:46:18,325 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
WARN 07:46:18,325 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
INFO 07:46:18,326 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
INFO 07:46:18,674 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 07:46:19,188 VectorLoglessPairHMM - Using OpenMP multi-threaded AVX-accelerated native PairHMM implementation
[INFO] Available threads: 40
[INFO] Requested threads: 1
[INFO] Using 1 threads
WARN 07:46:19,268 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not HaplotypeCaller
INFO 07:46:32,888 VectorLoglessPairHMM - Time spent in setup for JNI call : 0.012103603000000001
INFO 07:46:32,889 PairHMM - Total compute time in PairHMM computeLikelihoods() : 3.824669993
INFO 07:46:32,889 HaplotypeCaller - Ran local assembly on 82 active regions
INFO 07:46:33,175 ProgressMeter - done 106407.0 14.0 s 2.3 m 100.0% 14.0 s 0.0 s
INFO 07:46:33,175 ProgressMeter - Total runtime 14.90 secs, 0.25 min, 0.00 hours
INFO 07:46:33,175 MicroScheduler - 106744 reads were filtered out during the traversal out of approximately 133359 total reads (80.04%)
INFO 07:46:33,176 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 07:46:33,176 MicroScheduler - -> 6304 reads (4.73% of total) failing BadMateFilter
INFO 07:46:33,176 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 07:46:33,176 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 07:46:33,176 MicroScheduler - -> 99743 reads (74.79% of total) failing HCMappingQualityFilter
INFO 07:46:33,177 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 07:46:33,188 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 07:46:33,188 MicroScheduler - -> 697 reads (0.52% of total) failing NotPrimaryAlignmentFilter
INFO 07:46:33,188 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
------------------------------------------------------------------------------------------
Done. There were 2 WARN messages, the first 2 are repeated below.
WARN 07:46:18,325 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
WARN 07:46:19,268 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not HaplotypeCaller
------------------------------------------------------------------------------------------