Channel: haplotypecaller — GATK-Forum

How to call SNPs without a high-confidence SNP set?

Hello!
I have WGS data for 100 samples. Few people work on my species, so I could not find a high-confidence SNP set to use.
I am following the GATK Best Practices. My question is how to obtain a set of confident SNPs from my own data, since I see that HaplotypeCaller has a --dbsnp argument.
Also, someone told me I can call variants with several different tools and use the calls they share as my known sites. Would that work?
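To make the "take the shared calls" idea concrete, here is a minimal sketch that intersects the variant records of two callers. It assumes both VCFs are plain text, tab-separated, and already normalized so that identical variants have identical CHROM/POS/REF/ALT; the function names and the toy records are hypothetical, and this is not a GATK feature.

```python
from typing import Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def variant_keys(vcf_lines: Iterable[str]) -> Set[Variant]:
    """Collect one key per ALT allele from the data lines of a VCF."""
    keys: Set[Variant] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            if allele not in (".", "<NON_REF>"):
                keys.add((chrom, int(pos), ref, allele))
    return keys


def shared_calls(*call_sets: Set[Variant]) -> Set[Variant]:
    """Keep only the variants reported by every caller."""
    return set.intersection(*call_sets)


# Hypothetical toy records standing in for the output of two callers.
caller_a = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT",
    "chr2\t50\t.\tG\tA,C",
]
caller_b = [
    "chr1\t100\t.\tA\tG",
    "chr2\t50\t.\tG\tC",
    "chr3\t10\t.\tT\tA",
]

consensus = shared_calls(variant_keys(caller_a), variant_keys(caller_b))
print(sorted(consensus))
```

Exact-match intersection misses variants that the two tools represent differently (for example indels that are not left-aligned the same way), so normalizing both files first matters.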


rules for max_alternate_alleles in HaplotypeCaller

Hi,

I can't reach any clear conclusion about how this parameter works. Please help. I ran the same files with exactly the same command except for max_alternate_alleles: I set it to 1 in the first command
(--max_alternate_alleles 1) and to 2 in the second. The outputs differed by 600 SNVs.

a) At some sites, HaplotypeCaller in the second run replaced the variant from the first run with one that has a better score.
e.g.
CSB10A_v1_contig_682 232 ref.: G first: GT (90.75) second: GTT (135.73). Scores in brackets.

b) At some sites the second run, unlike the first, gave no SNVs at all, because there were no mapped reads.

c) I'm not sure about this one, because I can't trace back what I think I saw: the opposite of a), with scores from the second run worse than those from the first.

Could you explain why?

Paul

OpenMP multi-threaded AVX-accelerated native PairHMM in HaplotypeCaller not supported

I'm unable to get a multithreaded instance of PairHMM to work in HaplotypeCaller with JDK 1.8 on my local machine (Intel 4770K 8-core i7 processor) running macOS 10.12.6. I've tried both a pre-built version from the Docker hub as well as one that I built on my local machine, and in both cases I get the warning:
"NativeLibraryLoader - Unable to find native library: native/libgkl_pairhmm_omp.dylib"

I've tried the "-pairHMM AVX_LOGLESS_CACHING_OMP" option, but I then get:
"A USER ERROR has occurred: Machine does not support OpenMP AVX PairHMM.
PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported"

I suspect this might be caused by having a version of clang that doesn't support OpenMP, but I'm not sure. I'm using Homebrew gcc and c++ compilers, and an OpenMP clang (http://openmp.llvm.org) to no avail. Or maybe Intel 4770K can't support OpenMP PairHMM?

Here's my command:
gatk --java-options "-Xmx20g -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller \
-R /Volumes/HighSierra/Users/tschappe/Documents/P.nicotianae_assembly/ASM148301v1/GCA_001483015.1_ASM148301v1_genomic.fna \
-I /Volumes/HighSierra/Users/tschappe/Documents/P.nicotianae_assembly/race0_2_sorted.bam \
-O /Volumes/HighSierra/Users/tschappe/Documents/P.nicotianae_assembly/race0.g.vcf.gz \
-pairHMM AVX_LOGLESS_CACHING_OMP

Here's the entire error stack trace:
16:28:17.652 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Applications/gatk-4.0/gatk/build/libs/gatk-package-4.0.0.0-37-g1316033-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.dylib
16:28:17.731 INFO HaplotypeCaller - ------------------------------------------------------------
16:28:17.731 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.0.0-37-g1316033-SNAPSHOT
16:28:17.731 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
16:28:17.731 INFO HaplotypeCaller - Executing as tschappe@Tylers-iMac.local on Mac OS X v10.12.6 x86_64
16:28:17.731 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_161-b12
16:28:17.731 INFO HaplotypeCaller - Start Date/Time: January 24, 2018 4:28:17 PM EST
16:28:17.731 INFO HaplotypeCaller - ------------------------------------------------------------
16:28:17.731 INFO HaplotypeCaller - ------------------------------------------------------------
16:28:17.732 INFO HaplotypeCaller - HTSJDK Version: 2.14.1
16:28:17.732 INFO HaplotypeCaller - Picard Version: 2.17.2
16:28:17.732 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
16:28:17.732 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:28:17.732 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:28:17.732 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:28:17.732 INFO HaplotypeCaller - Deflater: IntelDeflater
16:28:17.732 INFO HaplotypeCaller - Inflater: IntelInflater
16:28:17.732 INFO HaplotypeCaller - GCS max retries/reopens: 20
16:28:17.732 INFO HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
16:28:17.732 INFO HaplotypeCaller - Initializing engine
16:28:18.287 INFO HaplotypeCaller - Done initializing engine
16:28:18.332 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
16:28:18.877 INFO NativeLibraryLoader - Loading libgkl_utils.dylib from jar:file:/Applications/gatk-4.0/gatk/build/libs/gatk-package-4.0.0.0-37-g1316033-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_utils.dylib
16:28:18.880 WARN NativeLibraryLoader - Unable to find native library: native/libgkl_pairhmm_omp.dylib
16:28:18.880 INFO HaplotypeCaller - Shutting down engine
[January 24, 2018 4:28:18 PM EST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=740294656


A USER ERROR has occurred: Machine does not support OpenMP AVX PairHMM.


org.broadinstitute.hellbender.exceptions.UserException$HardwareFeatureException: Machine does not support OpenMP AVX PairHMM.
at org.broadinstitute.hellbender.utils.pairhmm.VectorLoglessPairHMM.<init>(VectorLoglessPairHMM.java:78)
at org.broadinstitute.hellbender.utils.pairhmm.PairHMM$Implementation.lambda$static$4(PairHMM.java:64)
at org.broadinstitute.hellbender.utils.pairhmm.PairHMM$Implementation.makeNewHMM(PairHMM.java:120)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.<init>(PairHMMLikelihoodCalculationEngine.java:141)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.createLikelihoodCalculationEngine(AssemblyBasedCallerUtils.java:169)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.initialize(HaplotypeCallerEngine.java:191)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.<init>(HaplotypeCallerEngine.java:160)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.<init>(HaplotypeCallerEngine.java:151)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.onTraversalStart(HaplotypeCaller.java:197)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:891)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:152)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
at org.broadinstitute.hellbender.Main.main(Main.java:275)

Core dump with HaplotypeCaller using GATK 3.7

I'm trying to run HaplotypeCaller on a single RNAseq BAM file, but I keep getting a core dump. I've tried allocating more memory (up to 150 GB RAM) and reverting to single-thread mode, but HaplotypeCaller continues to fail. I've run other GATK tools on this sample on this machine and haven't had this issue.

% java -jar GenomeAnalysisTK-3.7-0/GenomeAnalysisTK.jar -T HaplotypeCaller -R GRCh38_full_analysis_set_plus_decoy_hla.fa -I EC11.GRCh38_filtered_split_recal.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -o EC11.GRCh38_filtered_split_recal.vcf -nct 16
Picked up _JAVA_OPTIONS: -Xmx64g -Djava.io.tmpdir=/gpfs/commons/home/hoffmanp-420/ALS/tmp.DpVNrl7ouV
INFO  15:33:39,894 HelpFormatter - ---------------------------------------------------------------------------------
INFO  15:33:39,896 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO  15:33:39,896 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO  15:33:39,896 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO  15:33:39,897 HelpFormatter - [Fri Jan 26 15:33:39 EST 2018] Executing on Linux 3.10.0-693.2.2.el7.x86_64 amd64
INFO  15:33:39,897 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14
INFO  15:33:39,900 HelpFormatter - Program Args: -T HaplotypeCaller -R /gpfs/commons/home/hoffmanp-420/ALS/GRCh38_full_analysis_set_plus_decoy_hla.fa -I /gpfs/commons/home/hoffmanp-420/ALS/Recalibrated/EC11.GRCh38_filtered_split_recal.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -o /gpfs/commons/home/hoffmanp-420/ALS/RawVCF/EC11.GRCh38_filtered_split_recal.vcf -nct 16
INFO  15:33:39,904 HelpFormatter - Executing as hoffmanp-420@pe1cc2-0042.c.nygenome.org on Linux 3.10.0-693.2.2.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14.
INFO  15:33:39,904 HelpFormatter - Date/Time: 2018/01/26 15:33:39
INFO  15:33:39,904 HelpFormatter - ---------------------------------------------------------------------------------
INFO  15:33:39,905 HelpFormatter - ---------------------------------------------------------------------------------
INFO  15:33:39,922 GenomeAnalysisEngine - Strictness is SILENT
INFO  15:33:52,760 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO  15:33:52,767 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  15:33:52,937 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.17
INFO  15:33:53,231 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO  15:33:53,263 MicroScheduler - Running the GATK in parallel mode with 16 total threads, 16 CPU thread(s) for each of 1 data thread(s), of 40 processors available on this machine
INFO  15:33:54,300 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO  15:33:55,538 GenomeAnalysisEngine - Done preparing for traversal
INFO  15:33:55,538 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  15:33:55,538 ProgressMeter -                 |      processed |    time |         per 1M |           |   total | remaining
INFO  15:33:55,539 ProgressMeter -        Location | active regions | elapsed | active regions | completed | runtime |   runtime
INFO  15:33:55,539 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output
INFO  15:33:55,583 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
WARN  15:33:55,584 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
INFO  15:33:55,584 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
INFO  15:33:55,781 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO  15:33:55,782 PairHMM - Performance profiling for PairHMM is disabled because the program is being run with multiple threads (-nct>1) option
Profiling is enabled only when running in single thread mode

Using AVX accelerated implementation of PairHMM
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007faa12753ce9, pid=31376, tid=140368447252224
#
# JRE version: Java(TM) SE Runtime Environment (8.0_45-b14) (build 1.8.0_45-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.45-b02 mixed mode linux-amd64 )
# Problematic frame:
# C  [libVectorLoglessPairHMM4517799834894878366.so+0x1bce9]  LoadTimeInitializer::LoadTimeInitializer()+0x1669
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /gpfs/commons/home/hoffmanp-420/ALS/hs_err_pid31376.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted

I'm using Java 1.8.0_45 on CentOS 7.4.1708. My machine has 40 cores and 256 GB RAM.

default read filters in haplotypecaller in GATK 4.0.0.0

Hi,

Are these still the read filters applied by HaplotypeCaller in GATK 4.0.0.0 (GVCF mode)?

• HCMappingQualityFilter (Default 20)
• MalformedReadFilter
• BadCigarFilter
• UnmappedReadFilter
• NotPrimaryAlignmentFilter
• FailsVendorQualityCheckFilter
• DuplicateReadFilter
• MappingQualityUnavailableFilter

I don't remember where I got this list; it must have been an old GATK document. Is it still valid for GATK4?
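As a rough illustration of what several of the listed filters look at, here is a minimal sketch based only on the SAM FLAG bits and MAPQ field defined in the SAM specification. The mapping from filter names to checks is an assumption made for illustration; it is not GATK's code, it leaves out MalformedReadFilter and BadCigarFilter, and it does not say which filters GATK4 actually applies.

```python
# FLAG bits from the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100   # "not primary alignment"
QC_FAIL = 0x200     # fails platform/vendor quality checks
DUPLICATE = 0x400
MAPQ_UNAVAILABLE = 255  # SAM convention for "mapping quality not available"


def failing_filters(flag: int, mapq: int, min_mapq: int = 20) -> list:
    """Return the names of the illustrative filters this read would fail."""
    reasons = []
    if mapq < min_mapq:
        reasons.append("HCMappingQualityFilter")
    if flag & UNMAPPED:
        reasons.append("UnmappedReadFilter")
    if flag & SECONDARY:
        reasons.append("NotPrimaryAlignmentFilter")
    if flag & QC_FAIL:
        reasons.append("FailsVendorQualityCheckFilter")
    if flag & DUPLICATE:
        reasons.append("DuplicateReadFilter")
    if mapq == MAPQ_UNAVAILABLE:
        reasons.append("MappingQualityUnavailableFilter")
    return reasons


# A properly paired, well-mapped read (flag 99) passes every check.
print(failing_filters(flag=99, mapq=60))
# A duplicate with low mapping quality fails two checks.
print(failing_filters(flag=99 | DUPLICATE, mapq=10))
```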

Ploidy level in HaplotypeCaller in GATK 4.0

Hi,

Thanks for the new version of GATK (GATK4.0).

We have a pool of 64 samples from a diploid organism, so we are using a ploidy of 128 (64 x 2 = 128). When I used HaplotypeCaller for variant calling in older versions of GATK, I got a "not enough memory to run this program" error, so I was unable to run it. With GATK 4.0 I no longer get this error, but I do get the warning below:

12:40:23.159 WARN HaplotypeCallerGenotypingEngine - Removed alt alleles where ploidy is 96 and original allele count is 3, whereas after trimming the allele count becomes 2. Alleles kept are:[T*, C]

The command line which we have used is below

java -jar -Xmx64g gatk-package-4.0.0.0-local.jar HaplotypeCaller -R tilling.fa -I C1_S1.sorted.bam -O C1_S1.vcf -stand-call-conf 20.0 -ploidy 96

Could you please explain what the warning means, and whether the command and options I am using are correct or whether I need additional options for efficient variant calling?
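For context on why high ploidy is expensive, here is a minimal sketch of the standard combinatorial count of unordered genotypes, C(P + A - 1, A - 1) for ploidy P and A alleles. It only illustrates the arithmetic. The cap value in the code is an assumption for illustration and should be checked against the documentation of your GATK version, and the sketch does not reproduce GATK's allele-trimming logic.

```python
from math import comb  # available in Python 3.8+


def pooled_ploidy(n_samples: int, ploidy_per_sample: int = 2) -> int:
    """Total ploidy of a pool, e.g. 64 diploid samples -> 128."""
    return n_samples * ploidy_per_sample


def genotype_count(ploidy: int, n_alleles: int) -> int:
    """Number of unordered genotypes: multisets of size `ploidy` over `n_alleles`."""
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


# Hypothetical cap on genotypes evaluated per site (an assumption, not from this thread).
ASSUMED_GENOTYPE_CAP = 1024

print("pool of 64 diploids ->", pooled_ploidy(64))
for n_alleles in (2, 3):
    n = genotype_count(96, n_alleles)
    status = "over" if n > ASSUMED_GENOTYPE_CAP else "within"
    print(f"ploidy 96, {n_alleles} alleles: {n} genotypes ({status} the assumed cap)")
```

With ploidy 96, going from 2 alleles to 3 raises the count from 97 to 4753 genotypes, which is one plausible reason a caller would drop an alternate allele at such a site.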

Thanks in advance.

Regards,
Prateek

Single-Sample Genotyping: Different Workflow?

Hi,
If I need to independently call and genotype a single sample, is there a different workflow or set of GATK tools and settings that I ought to use instead of using haplotypecaller to generate a GVCF and then using genotypegvcfs to genotype a "batch of 1"?

(In other words, is there another tool or setting that will go directly from BAM to VCF and give better or significantly different results than above.)

I'm aware of the many benefits of the GATK best practices workflow for singly-calling BAMs to GVCFs and then jointly genotyping the batches; however, at the moment I am doing some benchmarking for a project where a joint-calling pipeline may not be feasible and we may need to call each sample independently.

Thanks!
James

Is it useful to call LeftAlignIndels after IndelRealigner?

Hi,

In my pipeline for finding variants in a bacterial genome, I first run IndelRealigner and then call variants using HaplotypeCaller. Since left-aligning indels seems to be an important step, do I need to run LeftAlignIndels after IndelRealigner?

I understand that the documentation for LeftAlignIndels says it is not required with sophisticated tools like HaplotypeCaller, but I may also call variants with other tools such as FreeBayes (just to compare the variants found by different tools).

Thanks for reading and any help.

Cheers,
Ambi.


Is GATK4 HaplotypeCaller in evaluation phase?

Hi GATK team,

Congratulations on the release! I just found this public method in FireCloud that notes that HaplotypeCaller in GATK4 should not be used for production use yet since it is still in evaluation phase. This post was last updated on January 9th, the day of GATK4 release. Is this statement true? Could you provide more details about HaplotypeCaller evaluation?

Thanks!

Forgot to add -ERC GVCF when using haplotypecaller

Hi,
I used HaplotypeCaller to call variants for each of my samples without -ERC GVCF, so I cannot use GenotypeGVCFs to merge them. What should I do to resolve this? Do I have to re-run them? There are a lot of samples. I can merge them using bcftools, but the result does not contain any 0/0 genotypes; there are only three types: ./., 0/1 and 1/1. I want a final merged VCF file that contains 0/0, ./., 0/1 and 1/1.
Can anyone help me?

Cheers,
Jian

Confusion in using gVCF mode

Hi

I have some problems using HaplotypeCaller in gVCF mode (GATK4 Best Practices). Please help me with the following questions:

1- Should we run gVCF mode even when we have only one WES sample?

2- I have 3 WES samples. Should I use gVCF --> Consolidate --> GenotypeGVCFs --> VCF, or is it better to obtain a VCF directly from HaplotypeCaller and skip the later steps?

3- If I have 3-5 WES samples, is it better to run HaplotypeCaller with multiple input BAMs or on each sample separately?

Regards.

HaplotypeCaller and reads mapped to multiple locations

Dear GATK team,

I've been trying to use GATK to call SNPs from RNA-Seq data mapped to a transcriptome assembly. I used Bowtie2 to map the paired-end reads. I apologize if this has already been answered, but I had trouble finding the information, so I hope to get some advice or be pointed to the right place: how does HaplotypeCaller handle reads mapped to multiple places?

Thank you very much for any feedback you might have.

Sincerely,

Xin

Detecting called Indels with low read support on both sides?

Hi,

I am mapping chimpanzee samples to the human reference hg19. I mapped the samples using the standard protocol (BWA mem, remove duplicates, indel realigner) and called variants with GATK 3.7 HaplotypeCaller. After all variant filtering (hard filter plus removal of duplicated and low-mappability regions using external BED files), I found an interesting insertion in one of my samples:

At chr2:48033272 there is a deletion of the sequence TTTTTGTTTTAATTCCT. In the human reference this sequence is duplicated: GCA|TTTTTGTTTTAATTCCTT|TTTTGTTTTAATTCCTT|TG. This sample is called homozygous for the deletion.

A few bp after this, GATK calls an insertion:
chr2 48033352 . C CAACCGATGTTGCTTTTCTGTCCTAGCATTTTTGTTTTAATTCCTT 108.02 PASS

Long story short:

  • There are only 6 reads supporting this insertion.
  • Of them, only 3 have the full "GATK-ALT-insertion". All of these 3 have, at least, 1 bp more.
  • None of these reads have the 3' side of the reference. It should be: CTTTAACAGGAAGAGGTAC ins TGCAACATTTGATGGG
  • I lied. One of these reads does have the full sequence:
    TAACAGGAAGAGGTAC | AACCGATGTTGCTTTTCTGTCCTAGCATTTTTGTTTTAATTCCTTTGAGTTACTTCCTTATGCATATTTTACTTTAACAGGAAGAGGTAC | TGCAACATTTGATGGGACAGCAATAGCAAATGCAGTTGTTAAAGA

It is a duplication of the whole previous sequence, including the deletion 80bp upstream. I want to run functional analysis of the variants detected, and I am changing from a frameshift insertion to a non-frameshift insertion.

Ok, I have detected this wrong indel, but I am calling 1.9M Indels in this dataset, and 1.7M more in another one and I am worried about reporting strong functional annotation to erroneous variants.

Is there any method I can use to detect this kind of indel (not enough reads supporting both ends of the insertion)? Or can I only filter by QD?
My hard filter removes QUAL<50 and QD<2. This particular variant has QUAL=108.02, QD=4.91 and genotype quality (GQ)=99.
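To make the two checks in this post concrete, here is a minimal sketch: the hard filter with the thresholds quoted above, and a naive count of reads that span each junction of an insertion. The junction check uses plain substring matching on toy sequences, so it ignores sequencing errors, reverse-complemented reads and soft clipping. All names and the toy reads are hypothetical, and this is not a GATK annotation.

```python
def hard_filter(qual: float, qd: float, min_qual: float = 50.0, min_qd: float = 2.0) -> list:
    """Return the reasons a call fails the hard filter (empty list = passes)."""
    reasons = []
    if qual < min_qual:
        reasons.append(f"QUAL<{min_qual:g}")
    if qd < min_qd:
        reasons.append(f"QD<{min_qd:g}")
    return reasons


def junction_support(reads, left_flank, insertion, right_flank, anchor=5):
    """Count reads covering the left junction, the right junction, and both.

    A junction is `anchor` bases of flank joined to `anchor` bases of insertion.
    """
    left_junction = left_flank[-anchor:] + insertion[:anchor]
    right_junction = insertion[-anchor:] + right_flank[:anchor]
    left = sum(left_junction in read for read in reads)
    right = sum(right_junction in read for read in reads)
    both = sum(left_junction in read and right_junction in read for read in reads)
    return left, right, both


# The variant from the post passes both thresholds.
print(hard_filter(qual=108.02, qd=4.91))

# Hypothetical toy data: one read spans the whole insertion, two span one end each.
toy_reads = [
    "ACGTACGTACTTTTGGGGCCCATCATCATG",
    "GTACGTACTTTTGGGG",
    "TTGGGGCCCATCATC",
]
print(junction_support(toy_reads, "ACGTACGTAC", "TTTTGGGGCC", "CATCATCATG"))
```

A call where the "both" count is zero or very low could then be flagged for manual review, which is a different signal from QUAL or QD.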

Any advice?

Thanks in advance,

Txema

Testing Intel Arria® 10 GX FPGA implementation of HaplotypeCaller (PairHMM)

Hi,
I am a researcher from the Chinese Academy of Sciences.
I am trying to test the FPGA implementation of the HaplotypeCaller (PairHMM) on GATK 3.8-0-ge9d806836, using an Intel Arria® 10 GX FPGA (Arria 10, 10AX115N2F40E2LG).
I found a very useful link in the GATK forum (https://gatkforums.broadinstitute.org/gatk/discussion/10501/testing-fpga-implementation-of-haplotypecaller-pairhmm), which says "Currently supported cards are the Nallatech 385a and Inspur F10A cards". I also tried the Intel® FPGA PairHMM Accelerator for Genomics Kernel Library from the Intel website (https://downloadcenter.intel.com/download/27342/Intel-FPGA-PairHMM-Accelerator-for-Genomics-Kernel-Library-GKL-).
However, in my tests it doesn't work with the Intel Arria® 10 GX FPGA. Is there any plan to support Intel's own FPGA boards?
Kind regards,
Gong

GATK v4.0.1.1 HaplotypeCaller

I am using GATK v4.0.1.1 HaplotypeCaller for variant analysis. (paired-end DNA sequenced data mapped to the reference using BWA mem).

The command I used:
gatk HaplotypeCaller -R Reference.fna -I input.bam -O output.vcf

It runs for a while (couple of seconds) but does not produce an output. No error message was given. Am I doing this right? Any help is appreciated.


WGS+WES combined discovery/genotyping

Hi GATK team,

Hope you had great holidays!

We're analyzing small families where some individuals have been sequenced by WGS (HiSeqX) and others by WES (HiSeq4000). Could you please advise on the best approach to variant discovery and genotyping for these sets? We would prefer to avoid the difficult normalization of the different VCF representations of identical variants that results when the WES and WGS sets are analyzed separately.

Our best idea so far is to run HC over mostly overlapping intervals (eg GenCode exons) on all individual samples in both sets, then jointly genotype the mixed g.vcfs (GenotypeGVCFs) - accepting that there will be some ./. calls in each set.

Also, could VQSR cope with the mixed variant properties?

We noticed that @Geraldine_VdAuwera has advised against a similar idea earlier this year (http://gatkforums.broadinstitute.org/wdl/discussion/6834/about-gatk-joint-call), but that was more complex (WES+WGS+RNAseq) and of course you may have looked into this since then.

Thanks in advance for your thoughts and advice

Missing or Inconsistent call between single-sample and multi-sample SNP calling

Dear all,

I generated a gVCF file using HaplotypeCaller (v3.7) and searched for a specific variant of interest, which looks like this:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT    N417
 chr1   627033  .   C   <NON_REF>   .   .   END=627033  GT:DP:GQ:MIN_DP:PL  0/0:4:0:4:0,0,0

The same gVCF was then used for joint genotyping (GenotypeGVCFs) across multiple samples, and the site looks like this (the genotype is shown only for this sample):

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  N417
chr1    627033  .   C   T   101.69  .   AC=2;AF=0.011;AN=182;DP=500;ExcessHet=0.0127;FS=0.000;InbreedingCoeff=0.1696;MLEAC=2;MLEAF=0.011;MQ=51.17;QD=12.71;SOR=1.179    GT:AD:DP:GQ:PL  ./.:4,0:4
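To read the two records side by side, here is a minimal sketch that pairs each FORMAT key with the sample's value. It relies only on the VCF convention that trailing sample fields may be omitted; the function name is hypothetical and the strings are copied from the two records above.

```python
def sample_fields(format_col: str, sample_col: str) -> dict:
    """Map FORMAT keys to sample values, padding dropped trailing fields with '.'."""
    keys = format_col.split(":")
    values = sample_col.split(":")
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


# Sample N417 in the single-sample gVCF record.
gvcf = sample_fields("GT:DP:GQ:MIN_DP:PL", "0/0:4:0:4:0,0,0")
# Sample N417 in the jointly genotyped record.
joint = sample_fields("GT:AD:DP:GQ:PL", "./.:4,0:4")

print(gvcf)
print(joint)
```

In the gVCF record GQ is 0 and all three PLs are 0, meaning the record expresses no preference between genotypes at this site with DP=4. In the joint record the sample is a no-call with AD=4,0, so no alternate-allele reads were counted for it.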

A bamout was generated (shown below) that supports the alternate allele "T".

The IGV or bamout result is supported by PCR. However, the call is not made by GATK. Could someone comment on this behaviour and on best practices for rescuing such variants?

HaplotypeCaller warnings DepthPerSampleHC

Hi, I'm trying to do multi-sample variant calling on several BAM files with the following command:

/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk HaplotypeCaller -R /mnt/fastdata/md1jale/reference/hs37d5.fa -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24150_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24144_2#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24712_6#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_2#1.bam -O /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/raw_variants.vcf

Using GATK jar /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -jar /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar HaplotypeCaller -R /mnt/fastdata/md1jale/reference/hs37d5.fa -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24150_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24144_2#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24712_6#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_2#1.bam -O /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/mshef7_wt_vs_ips_raw_variants.vcf
10:26:29.719 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
10:26:29.935 INFO HaplotypeCaller - ------------------------------------------------------------
10:26:29.935 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.1.0
10:26:29.935 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
10:26:29.935 INFO HaplotypeCaller - Executing as md1jale@sharc-node122.shef.ac.uk on Linux v3.10.0-693.11.6.el7.x86_64 amd64
10:26:29.936 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_102-b14
10:26:29.936 INFO HaplotypeCaller - Start Date/Time: 14 February 2018 10:26:29 GMT
10:26:29.936 INFO HaplotypeCaller - ------------------------------------------------------------
10:26:29.936 INFO HaplotypeCaller - ------------------------------------------------------------
10:26:29.936 INFO HaplotypeCaller - HTSJDK Version: 2.14.1
10:26:29.936 INFO HaplotypeCaller - Picard Version: 2.17.2
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:26:29.937 INFO HaplotypeCaller - Deflater: IntelDeflater
10:26:29.937 INFO HaplotypeCaller - Inflater: IntelInflater
10:26:29.937 INFO HaplotypeCaller - GCS max retries/reopens: 20
10:26:29.937 INFO HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
10:26:29.937 INFO HaplotypeCaller - Initializing engine
10:26:30.520 INFO HaplotypeCaller - Done initializing engine
10:26:30.528 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
10:26:31.119 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
10:26:31.154 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
10:26:31.259 WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
10:26:31.259 INFO IntelPairHmm - Available threads: 16
10:26:31.259 INFO IntelPairHmm - Requested threads: 4
10:26:31.259 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
10:26:31.298 INFO ProgressMeter - Starting traversal
10:26:31.298 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
10:26:33.832 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:33.865 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:33.880 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:33.911 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:34.733 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:41.497 INFO ProgressMeter - 1:15485 0.2 80 470.6

After some memory issues, the command above now runs when given a large amount of memory, although I do get lots of WARN DepthPerSampleHC messages. Is this normal?

HaplotypeCaller may fail to detect variant with the same reads with a different composition.

I have run into a confusing variant detection issue. The attached PNG file shows results from the exact same NextSeq experiment; only the range used to extract the reads differs.

NextSeq2_point.bam: contains only the reads that cover position chr16:89100686.

NextSeq2_region.bam: contains the reads that cover the region chr16:89100686 +/- 100 bp.
At position chr16:89100686 I would expect T>C to be detected, but HaplotypeCaller failed to detect the variant with NextSeq2_region.bam.

NextSeq2_point.vcf:
chr16 89100686 . T C,<NON_REF> 7397.77 . DP=199;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQ=716400.00 GT:AD:DP:GQ:PL:SB 1/1:0,199,0:199:99:7426,599,0,7426,599,7426:0,0,155,44

NextSeq2_region.vcf:
chr16 89100686 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:0,199:199:0:0,0,0
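To show how the PL values in the two records relate to the reported genotype and GQ, here is a minimal sketch. It uses the genotype ordering from the VCF specification (genotype j/k with j <= k sits at index k(k+1)/2 + j) and assumes the common convention that GQ is the gap between the two lowest PLs, capped at 99. The function names are hypothetical and this is not GATK's implementation.

```python
def genotype_order(n_alleles: int) -> list:
    """Diploid genotypes in VCF PL order: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ..."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def call_from_pl(pls: list, n_alleles: int):
    """Return the most likely genotype and a GQ derived from the PLs."""
    order = genotype_order(n_alleles)
    if len(pls) != len(order):
        raise ValueError("PL count does not match the number of genotypes")
    best = min(range(len(pls)), key=pls.__getitem__)  # first index of the lowest PL
    ranked = sorted(pls)
    gq = min(99, ranked[1] - ranked[0])
    return order[best], gq


# NextSeq2_point.vcf: alleles T, C, <NON_REF>
print(call_from_pl([7426, 599, 0, 7426, 599, 7426], n_alleles=3))
# NextSeq2_region.vcf: alleles T, <NON_REF>
print(call_from_pl([0, 0, 0], n_alleles=2))
```

The first record resolves to genotype 1/1 with GQ 99, matching the point VCF. The second has three equal PLs, so GQ is 0 and the 0/0 label carries no confidence. That is consistent with the region VCF line showing GQ 0.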

What causes the difference and why?

--- GATK Version (Docker latest)
    Using GATK wrapper script /gatk/build/install/gatk/bin/gatk
    Running:
        /gatk/build/install/gatk/bin/gatk HaplotypeCaller --version
    Version:4.0.1.2
---

--- Command used
gatk HaplotypeCaller -I /temp/NextSeq2_region.bam -O /temp/NextSeq2_region.vcf -R /temp/genome.fa -L /temp/only16.bed --debug true --output-mode EMIT_ALL_SITES --all-site-pls true --dont-trim-active-regions true --emit-ref-confidence BP_RESOLUTION
---
--- Genome Version: hg38
--- bed
chr16   89100681    89101347    NM_174917.4_cds_2_0_chr16_89100682_f    0   +
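As a quick sanity check on the interval, here is a minimal sketch that tests whether a 1-based position falls inside a BED record, using the BED convention of 0-based, half-open coordinates. The function name is hypothetical and whitespace-separated fields are assumed.

```python
def bed_contains(bed_line: str, chrom: str, pos_1based: int) -> bool:
    """True if the 1-based position lies within the BED interval [start, end)."""
    fields = bed_line.split()
    bed_chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # 0-based start S and exclusive end E cover 1-based positions S+1 .. E.
    return bed_chrom == chrom and start < pos_1based <= end


BED = "chr16   89100681    89101347    NM_174917.4_cds_2_0_chr16_89100682_f    0   +"
SITE = 89100686

print("site inside interval:", bed_contains(BED, "chr16", SITE))
print("first base of interval (1-based):", int(BED.split()[1]) + 1)
# The +/- 100 bp read window starts upstream of the interval.
print("window start inside interval:", bed_contains(BED, "chr16", SITE - 100))
print("window end inside interval:", bed_contains(BED, "chr16", SITE + 100))
```

The site is only five bases from the start of the interval, so the upstream half of the +/- 100 bp window lies outside -L.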

If you need the bams and vcfs, I can post them here.

Seeing multiple calls for the same position

Hello,

I am running a trio through the best practices pipeline for 3.7 and finding an unexpected result. I am seeing some positions that are being called multiple times with different calls. First, as output from HaplotypeCaller, I am seeing these results for position 78790153:

In each sample, the position 78790153 is represented by at least two lines: its own line and at least one deletion starting upstream. Of particular concern is that in the bottom case, the call at this position varies depending on which line you look at. It is heterozygous for a deletion on one line and homozygous reference on the others.

After Joint Genotyping, the variants come through in the result with that same position appearing twice still. (Order of samples top to bottom above is order of samples listed below first, second, third)

But, a new concern arises after applying family priors:

Here we see the first sample has her call from the deletion beginning at 78790143 changed from reference to heterozygous with no supporting evidence. This is obviously a very challenging spot because of the repeat, but is it deliberate for a position to be called multiple times and with different called alleles?

Thanks,
Scott
