Quantcast
Channel: haplotypecaller — GATK-Forum
Viewing all articles
Browse latest Browse all 1335

GATK3 HC bug?

$
0
0

Hey GATK Devs!

I'm writing to report some unexpected behavior on the part of GATK3.8 HC. I'm trying to use Illumina data to call SNPs and indels on a PacBio assembly and identify loci where assembly polishing has failed to correct the assembly. I was looking through the reads of a particular contig and identified a locus (tig00006168:59182) that GATK failed to call. According the the reads, the locus should have been called homozygous for a deletion at 30X depth. Looking at the gVCF I see it is called homozygous for the reference allele (while reporting 31 reads supporting the alternate allele), reports 0 for GQ and all PLs:

tig00006168     59180   .       C       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:31,0:31:15:0,15,225
tig00006168     59181   .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:30,0:30:15:0,15,225
tig00006168     59182   .       C       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:1,31:32:0:0,0,0
tig00006168     59183   .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:1,31:32:0:0,0,0
tig00006168     59184   .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:1,31:32:0:0,0,0
tig00006168     59185   .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:27,2:29:54:0,54,1005
tig00006168     59186   .       C       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  0/0:30,1:31:72:0,72,1080

The -bamout output for this run reports no variant-containing reads at this locus. However, if I include -L tig00006168:59172-59195 in the options, GATK calls the indel:

tig00006168     59182   .       CA      C       927.73  .       AC=2;AF=1.00;AN=2;DP=27;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=34.36;SOR=2.753        GT:AD:DP:GQ:PL  1/1:0,27:27:81:965,81,0

The tview on the -bamout for this latter run displays:

59141     59151     59161     59171     59181     59191     59201     59211     592
GGAAATGAAGGAGAAGAAAGTGTTTATCAGCCTCGTGGGCACAAACAGGAATGGGCTGCAGGTTGGTACCCCCAATCTCTNNN
      ..........................................................................
      .................................        .................................
      ....................................*.....................................
      ....................................*.....................................
      ...........................              ,,,,,,,,,,,,,,,,,,,,c,,,,,,,,,,,,
      ..................                       ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ....................................*.....................................
      ....................................*..............................
      ....................................*..............................
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,       ,,g,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,                    ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,g,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,   ,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,t,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,t,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,    .....................
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,g,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,c,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
         ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
            ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
                 ,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
                 ,,,,,,,,,,,,,,,,,,,,,,,,,*,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

The mapping quality for all of these reads is over 30 and the base qualities in the 30+ range. In your experience, what might cause this odd behavior? I've tried GATK versions 3.2-2, 3.6-0, 3.7, and they exhibit the same behavior. My initial run log:

INFO  07:46:12,698 HelpFormatter - ---------------------------------------------------------------------------------------
INFO  07:46:12,701 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
INFO  07:46:12,701 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO  07:46:12,701 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO  07:46:12,701 HelpFormatter - [Fri Oct 13 07:46:12 PDT 2017] Executing on Linux 2.6.32-696.3.2.el6.nersc.x86_64 amd64
INFO  07:46:12,701 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13
INFO  07:46:12,704 HelpFormatter - Program Args: -T HaplotypeCaller --standard_min_confidence_threshold_for_calling 0 -rf BadMate -R ./contigs.fasta -L
tig00006168 -I 10X.bam -mmq 25 -mbq 30 -o tig00006168.trg.vcf.gz -bamout tig00006168.trg.bam
INFO  07:46:12,714 HelpFormatter - Executing as bredeson@hostname on Linux 2.6.32-696.3.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM
1.8.0_31-b13.
INFO  07:46:12,714 HelpFormatter - Date/Time: 2017/10/13 07:46:12
INFO  07:46:12,714 HelpFormatter - ---------------------------------------------------------------------------------------
INFO  07:46:12,714 HelpFormatter - ---------------------------------------------------------------------------------------
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:~bredeson/tools/bin/GATK/3.
8-0-ge9d80683/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
INFO  07:46:12,851 GenomeAnalysisEngine - Deflater: JdkDeflater
INFO  07:46:12,851 GenomeAnalysisEngine - Inflater: JdkInflater
INFO  07:46:12,852 GenomeAnalysisEngine - Strictness is SILENT
INFO  07:46:15,647 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO  07:46:15,654 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  07:46:15,879 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.22
INFO  07:46:16,007 HCMappingQualityFilter - Filtering out reads with MAPQ < 25
INFO  07:46:18,047 IntervalUtils - Processing 106407 bp from intervals
INFO  07:46:18,157 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO  07:46:18,277 GenomeAnalysisEngine - Done preparing for traversal
INFO  07:46:18,277 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  07:46:18,278 ProgressMeter -                 |      processed |    time |         per 1M |           |   total | remaining
INFO  07:46:18,278 ProgressMeter -        Location | active regions | elapsed | active regions | completed | runtime |   runtime
INFO  07:46:18,278 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output
INFO  07:46:18,325 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
WARN  07:46:18,325 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
INFO  07:46:18,326 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
INFO  07:46:18,674 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO  07:46:19,188 VectorLoglessPairHMM - Using OpenMP multi-threaded AVX-accelerated native PairHMM implementation
[INFO] Available threads: 40
[INFO] Requested threads: 1
[INFO] Using 1 threads
WARN  07:46:19,268 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not HaplotypeCaller
INFO  07:46:32,888 VectorLoglessPairHMM - Time spent in setup for JNI call : 0.012103603000000001
INFO  07:46:32,889 PairHMM - Total compute time in PairHMM computeLikelihoods() : 3.824669993
INFO  07:46:32,889 HaplotypeCaller - Ran local assembly on 82 active regions
INFO  07:46:33,175 ProgressMeter -            done         106407.0    14.0 s            2.3 m      100.0%    14.0 s       0.0 s
INFO  07:46:33,175 ProgressMeter - Total runtime 14.90 secs, 0.25 min, 0.00 hours
INFO  07:46:33,175 MicroScheduler - 106744 reads were filtered out during the traversal out of approximately 133359 total reads (80.04%)
INFO  07:46:33,176 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter
INFO  07:46:33,176 MicroScheduler -   -> 6304 reads (4.73% of total) failing BadMateFilter
INFO  07:46:33,176 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO  07:46:33,176 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO  07:46:33,176 MicroScheduler -   -> 99743 reads (74.79% of total) failing HCMappingQualityFilter
INFO  07:46:33,177 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO  07:46:33,188 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO  07:46:33,188 MicroScheduler -   -> 697 reads (0.52% of total) failing NotPrimaryAlignmentFilter
INFO  07:46:33,188 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter
------------------------------------------------------------------------------------------
Done. There were 2 WARN messages, the first 2 are repeated below.
WARN  07:46:18,325 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
WARN  07:46:19,268 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not HaplotypeCaller
------------------------------------------------------------------------------------------

Viewing all articles
Browse latest Browse all 1335

Trending Articles