Channel: haplotypecaller — GATK-Forum

WARN messages with Haplotype Caller


Hi,
I began a HaplotypeCaller run with --emitRefConfidence GVCF, asking for a .g.vcf file to be output for a single sample, and it is running (yay)!
So far it has thrown up several warning messages. Two of them I understand, but I am not sure about the ones below. Can anyone shed some light on these for me, please?

WARN 18:42:38,322 AnnotationUtils - Annotation will not be calculated, genotype is not called
WARN 18:42:38,342 AnnotationUtils - Annotation will not be calculated, genotype is not called
WARN 18:42:38,358 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not HaplotypeCaller

Many thanks,
Jenni


Using GATK: create an F0 SNP library and then genotype F2 samples using it


Hello GATK community,

I would like your comments/suggestions for my strategy.

I have F0 samples with two different phenotypes.
I have F2 samples with unknown phenotypes.
I would like to create a library of the F0 genotypes and then genotype my F2 samples using that library.

STRATEGY:
I have already pre-processed the BAM files (I have all the raw data if required).

Create genotype library with F0 samples:

  • GATK HaplotypeCaller for both F0 phenotype samples : java -Xmx30g -jar GenomeAnalysisTK_3-8.jar -nct 16 -T HaplotypeCaller -R GENOME --emitRefConfidence GVCF -I INPUT.bam -o OUTPUT.g.vcf

  • Merge the results: java -Xmx16g -jar GenomeAnalysisTK_3-8.jar -nt 16 -T GenotypeGVCFs -R GENOME --variant F0Variant1.g.vcf --variant F0Variant2.g.vcf -o Results_Merge_F0.vcf

  • Then I used a homemade script to select only positions with a homozygous genotype that differs between the two F0 phenotype samples (e.g. 1/1 for one F0 sample and 0/0 for the other): Results_Merge_F0_filtered.vcf

Genotype F2 sample with the library:

  • GATK HaplotypeCaller : java -Xmx30g -jar GenomeAnalysisTK_3-8.jar -nct 16 -T HaplotypeCaller -R GENOME --emitRefConfidence GVCF -I INPUT.bam -o OUTPUT.g.vcf -L $4 Results_Merge_F0_filtered.vcf

  • Then I used a homemade script to assign each F2 genotype to one (or the other) F0 phenotype (a sketch of the intermediate genotyping step I am unsure about follows below).
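For reference, the analogous per-sample genotyping step I ran for the F0 samples, applied to one F2 gVCF and restricted to the filtered library sites, would look something like this (file names are hypothetical and I have not actually run this step):

    # Sketch only: joint-genotype one F2 gVCF at the filtered F0 library positions.
    # -L with a VCF restricts processing to those sites; -allSites (if supported by this
    # GATK 3.8 build) would also emit hom-ref genotypes there.
    java -Xmx16g -jar GenomeAnalysisTK_3-8.jar -T GenotypeGVCFs -R GENOME \
        --variant F2Sample1.g.vcf \
        -L Results_Merge_F0_filtered.vcf \
        -allSites \
        -o F2Sample1_genotyped.vcf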

But here is the problem: at this last step I mostly got homozygous SNPs for my F2 samples, whereas I should get roughly 25% phenotype 1, 25% phenotype 2, and 50% heterozygous (phenotype 1/2).
I am missing something, but I don't know where.

HC Tag


Hello,
I'm looking at the HC tag in the BAM file produced by the --bamOutput option of HaplotypeCaller. I've noticed that some real reads (i.e. ones not labeled as "ArtificialHaplotype") do not have an HC tag (it shows as NA). What does this mean? That the read was not used and discarded, or that no local re-assembly was performed? Many thanks.

Several Annotations not working in GATK Haplotype Caller


I am using Genotype Given Alleles (GGA) mode with HaplotypeCaller.
I am trying to explicitly request all annotations that the documentation says are compatible with HaplotypeCaller (and that make sense for a single sample, e.g. no Hardy-Weinberg).

The following annotations all come back as "NA": GCContent (GC), HomopolymerRun (HRun), TandemRepeatAnnotator (STR, RU, RPA).
Yet they appear to be valid requests, because GATK reports no errors.

This is the command I ran (all on one line)

java -Xmx40g -jar /data5/bsi/bictools/alignment/gatk/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller --input_file /data2/external_data/[...]/s115343.beauty/Paired_analysis/secondary/Paired_10192014/IGV_BAM/pair_EX167687/s_EX167687_DNA_Blood.igv-sorted.bam --alleles:vcf /data2/external_data/[...]m026645/s109575.ez/Sequencing_2016/OMNI.vcf --phone_home NO_ET --gatk_key /projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/3.1-1/Hossain.Asif_mayo.edu.key --reference_sequence /data2/bsi/reference/sequence/human/ncbi/hg19/allchr.fa --minReadsPerAlignmentStart 1 --disableOptimizations --dontTrimActiveRegions --forceActive --out /data2/external_data/[...]m026645/s109575.ez/Sequencing_2016/EX167687.0.0375.chr22.vcf --logging_level INFO -L chr22 --downsample_to_fraction 0.0375 --downsampling_type BY_SAMPLE --genotyping_mode GENOTYPE_GIVEN_ALLELES --standard_min_confidence_threshold_for_calling 20.0 --standard_min_confidence_threshold_for_emitting 0.0 --annotateNDA --annotation BaseQualityRankSumTest --annotation ClippingRankSumTest --annotation Coverage --annotation FisherStrand --annotation GCContent --annotation HomopolymerRun --annotation LikelihoodRankSumTest --annotation MappingQualityRankSumTest --annotation NBaseCount --annotation QualByDepth --annotation RMSMappingQuality --annotation ReadPosRankSumTest --annotation StrandOddsRatio --annotation TandemRepeatAnnotator --annotation DepthPerAlleleBySample --annotation DepthPerSampleHC --annotation StrandAlleleCountsBySample --annotation StrandBiasBySample --excludeAnnotation HaplotypeScore --excludeAnnotation InbreedingCoeff

The log file is below. Notice the "weird" WARNings about "StrandBiasBySample annotation exists in input VCF header", which make no sense because the header is empty apart from the barebone fields.

This is the barebone VCF
head /data2/external_data/[...]_m026645/s109575.ez/Sequencing_2016/OMNI.vcf

##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO

chr1 723918 rs144434834 G A 30 PASS .
chr1 729632 rs116720794 C T 30 PASS .
chr1 752566 rs3094315 G A 30 PASS .
chr1 752721 rs3131972 A G 30 PASS .
chr1 754063 rs12184312 G T 30 PASS .
chr1 757691 rs74045212 T C 30 PASS .
chr1 759036 rs114525117 G A 30 PASS .
chr1 761764 rs144708130 G A 30 PASS .

This is the output

INFO 10:03:06,080 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:03:06,082 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
INFO 10:03:06,083 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:03:06,083 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 10:03:06,086 HelpFormatter - Program Args: -T HaplotypeCaller --input_file /data2/external_data/[...]/s115343.beauty/Paired_analysis/secondary/Paired_10192014/IGV_BAM/pair_EX167687/s_EX167687_DNA_Blood.igv-sorted.bam --alleles:vcf /data2/external_data/[...]m026645/s109575.ez/Sequencing_2016/OMNI.vcf --phone_home NO_ET --gatk_key /projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/3.1-1/Hossain.Asif_mayo.edu.key --reference_sequence /data2/bsi/reference/sequence/human/ncbi/hg19/allchr.fa --minReadsPerAlignmentStart 1 --disableOptimizations --dontTrimActiveRegions --forceActive --out /data2/external_data/[...]m026645/s109575.ez/Sequencing_2016/EX167687.0.0375.chr22.vcf --logging_level INFO -L chr22 --downsample_to_fraction 0.0375 --downsampling_type BY_SAMPLE --genotyping_mode GENOTYPE_GIVEN_ALLELES --standard_min_confidence_threshold_for_calling 20.0 --standard_min_confidence_threshold_for_emitting 0.0 --annotateNDA --annotation BaseQualityRankSumTest --annotation ClippingRankSumTest --annotation Coverage --annotation FisherStrand --annotation GCContent --annotation HomopolymerRun --annotation LikelihoodRankSumTest --annotation MappingQualityRankSumTest --annotation NBaseCount --annotation QualByDepth --annotation RMSMappingQuality --annotation ReadPosRankSumTest --annotation StrandOddsRatio --annotation TandemRepeatAnnotator --annotation DepthPerAlleleBySample --annotation DepthPerSampleHC --annotation StrandAlleleCountsBySample --annotation StrandBiasBySample --excludeAnnotation HaplotypeScore --excludeAnnotation InbreedingCoeff
INFO 10:03:06,093 HelpFormatter - Executing as m037385@franklin04-213 on Linux 2.6.32-573.8.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26.
INFO 10:03:06,094 HelpFormatter - Date/Time: 2016/01/19 10:03:06
INFO 10:03:06,094 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:03:06,094 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:03:06,545 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:03:06,657 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Fraction: 0.04
INFO 10:03:06,666 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 10:03:07,012 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.35
INFO 10:03:07,031 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 10:03:07,170 IntervalUtils - Processing 51304566 bp from intervals
INFO 10:03:07,256 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 10:03:07,595 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:03:07,595 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:03:07,595 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:03:07,596 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 10:03:07,596 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output
WARN 10:03:07,709 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
WARN 10:03:07,709 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
INFO 10:03:07,719 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 10:03:37,599 ProgressMeter - chr22:5344011 0.0 30.0 s 49.6 w 10.4% 4.8 m 4.3 m
INFO 10:04:07,600 ProgressMeter - chr22:11875880 0.0 60.0 s 99.2 w 23.1% 4.3 m 3.3 m
Using AVX accelerated implementation of PairHMM
INFO 10:04:29,924 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
INFO 10:04:29,925 VectorLoglessPairHMM - Using vectorized implementation of PairHMM
WARN 10:04:29,938 AnnotationUtils - Annotation will not be calculated, genotype is not called
WARN 10:04:29,938 AnnotationUtils - Annotation will not be calculated, genotype is not called
WARN 10:04:29,939 AnnotationUtils - Annotation will not be calculated, genotype is not called
INFO 10:04:37,601 ProgressMeter - chr22:17412465 0.0 90.0 s 148.8 w 33.9% 4.4 m 2.9 m
INFO 10:05:07,602 ProgressMeter - chr22:18643131 0.0 120.0 s 198.4 w 36.3% 5.5 m 3.5 m
INFO 10:05:37,603 ProgressMeter - chr22:20133744 0.0 2.5 m 248.0 w 39.2% 6.4 m 3.9 m
INFO 10:06:07,604 ProgressMeter - chr22:22062452 0.0 3.0 m 297.6 w 43.0% 7.0 m 4.0 m
INFO 10:06:37,605 ProgressMeter - chr22:23818297 0.0 3.5 m 347.2 w 46.4% 7.5 m 4.0 m
INFO 10:07:07,606 ProgressMeter - chr22:25491290 0.0 4.0 m 396.8 w 49.7% 8.1 m 4.1 m
INFO 10:07:37,607 ProgressMeter - chr22:27044271 0.0 4.5 m 446.4 w 52.7% 8.5 m 4.0 m
INFO 10:08:07,608 ProgressMeter - chr22:28494980 0.0 5.0 m 496.1 w 55.5% 9.0 m 4.0 m
INFO 10:08:47,609 ProgressMeter - chr22:30866786 0.0 5.7 m 562.2 w 60.2% 9.4 m 3.8 m
INFO 10:09:27,610 ProgressMeter - chr22:32908950 0.0 6.3 m 628.3 w 64.1% 9.9 m 3.5 m
INFO 10:09:57,610 ProgressMeter - chr22:34451306 0.0 6.8 m 677.9 w 67.2% 10.2 m 3.3 m
INFO 10:10:27,611 ProgressMeter - chr22:36013343 0.0 7.3 m 727.5 w 70.2% 10.4 m 3.1 m
INFO 10:10:57,613 ProgressMeter - chr22:37387478 0.0 7.8 m 777.1 w 72.9% 10.7 m 2.9 m
INFO 10:11:27,614 ProgressMeter - chr22:38534891 0.0 8.3 m 826.8 w 75.1% 11.1 m 2.8 m
INFO 10:11:57,615 ProgressMeter - chr22:39910054 0.0 8.8 m 876.4 w 77.8% 11.4 m 2.5 m
INFO 10:12:27,616 ProgressMeter - chr22:41738463 0.0 9.3 m 926.0 w 81.4% 11.5 m 2.1 m
INFO 10:12:57,617 ProgressMeter - chr22:43113306 0.0 9.8 m 975.6 w 84.0% 11.7 m 112.0 s
INFO 10:13:27,618 ProgressMeter - chr22:44456937 0.0 10.3 m 1025.2 w 86.7% 11.9 m 95.0 s
INFO 10:13:57,619 ProgressMeter - chr22:45448656 0.0 10.8 m 1074.8 w 88.6% 12.2 m 83.0 s
INFO 10:14:27,620 ProgressMeter - chr22:46689073 0.0 11.3 m 1124.4 w 91.0% 12.5 m 67.0 s
INFO 10:14:57,621 ProgressMeter - chr22:48062438 0.0 11.8 m 1174.0 w 93.7% 12.6 m 47.0 s
INFO 10:15:27,622 ProgressMeter - chr22:49363910 0.0 12.3 m 1223.6 w 96.2% 12.8 m 29.0 s
INFO 10:15:57,623 ProgressMeter - chr22:50688233 0.0 12.8 m 1273.2 w 98.8% 13.0 m 9.0 s
INFO 10:16:12,379 VectorLoglessPairHMM - Time spent in setup for JNI call : 0.061128124000000006
INFO 10:16:12,379 PairHMM - Total compute time in PairHMM computeLikelihoods() : 22.846350295
INFO 10:16:12,380 HaplotypeCaller - Ran local assembly on 25679 active regions
INFO 10:16:12,434 ProgressMeter - done 5.1304566E7 13.1 m 15.0 s 100.0% 13.1 m 0.0 s
INFO 10:16:12,435 ProgressMeter - Total runtime 784.84 secs, 13.08 min, 0.22 hours
INFO 10:16:12,435 MicroScheduler - 727347 reads were filtered out during the traversal out of approximately 4410423 total reads (16.49%)
INFO 10:16:12,435 MicroScheduler - -> 2 reads (0.00% of total) failing BadCigarFilter
INFO 10:16:12,436 MicroScheduler - -> 669763 reads (15.19% of total) failing DuplicateReadFilter
INFO 10:16:12,436 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 10:16:12,436 MicroScheduler - -> 57582 reads (1.31% of total) failing HCMappingQualityFilter
INFO 10:16:12,437 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 10:16:12,437 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 10:16:12,437 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 10:16:12,438 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter

Does GATK HaplotypeCaller have a resume-analysis feature?


Hello there,

I am calling variants on 800 exome samples using HaplotypeCaller. For some reason the caller stopped the analysis at a certain location on chromosome 5 (after 5 weeks, with about 10 weeks to go). The error is below (one of the samples was malformed):
INFO 02:04:51,540 ProgressMeter - chr5:180670788 1.05846818873E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:06:01,736 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:07:01,737 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:08:11,738 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:09:11,739 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:10:11,740 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:11:11,741 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.8-1-0-gf15c1c3ef):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: File /media/daruma/sea/sample483.fastq.gz.mdup.realigned.fixed.recal.bai is malformed: Premature end-of-file while reading BAM index file sample483.fastq.gz.mdup.realigned.fixed.recal.bai. It's likely that this file is truncated or corrupt -- Please try re-indexing the corresponding BAM file.

I will re-process that sample, and I want to resume the analysis to speed things up, so I was wondering whether HaplotypeCaller has a resume function. If it doesn't, what is the best way to work around the issue?
- Shall I modify the exome target file? If yes, shall I start from the beginning of chromosome 5, or from the position closest to where the analysis stopped?

  • Why not include a resume feature that can be fed the VCF produced up to the point where the analysis stopped?
    I would also like to hear other suggestions not mentioned here (the kind of manual workaround I have in mind is sketched below).
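For illustration, a manual resume might look something like this (the interval file and paths are hypothetical, and the CatVariants invocation is from memory, so it may need checking):

    # Sketch only: re-run HaplotypeCaller on the targets that were not yet processed
    # (remaining_targets.interval_list would hold the exome targets from chr5:180687847 onward),
    # then concatenate the partial VCFs.
    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta \
        -I sample1.bam [... the other -I arguments ...] \
        -L remaining_targets.interval_list \
        -o calls.part2.vcf
    java -cp GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.CatVariants -R ref.fasta \
        -V calls.part1.vcf -V calls.part2.vcf -out calls.merged.vcf -assumeSorted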

Thanks in advance :)

GATK3.8 vs GATK4 HaplotypeCaller


Hello, maybe I'm asking a naive question that has been answered somewhere else, but, as the title states, are there differences in the HaplotypeCaller algorithm between the GATK3.8 release and GATK4?
I'm calling variants in two exomes, using the same BAM files as input to the variant caller, and I compared the variants at the end of the pipeline. I found that GATK4 calls a slightly higher number of variants than GATK3.8, and that GATK3.8, in the final step (after filtering and selecting variants), retains variants that were filtered away by the GATK4 HaplotypeCaller. Is this expected, or do I have to go through my pipeline and check for errors? The kind of comparison I ran is sketched below.
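For reference, the comparison was essentially along these lines (assuming bgzipped and indexed VCFs; file names are hypothetical):

    # Sketch: write records private to each callset, plus shared records, into isec_out/
    bcftools isec -p isec_out gatk3.8.filtered.vcf.gz gatk4.filtered.vcf.gz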
Thank you

Best strategy to "fix" the Haplotype Caller - GenotypeGVCF "missing DP field" bug??


Hi,

I've run into the already-reported bug (http://gatkforums.broadinstitute.org/dsde/discussion/5598/missing-depth-dp-after-haplotypecaller) of the missing DP FORMAT field in my calls.

I've run the following (relevant) commands:

Haplotype Caller -> Generate GVCF:

    java -Xmx${xmx} ${gct} -Djava.io.tmpdir=${NEWTMPDIR} -jar ${gatkpath}/GenomeAnalysisTK.jar \
       -T HaplotypeCaller \
       -R ${ref} \
       -I ${NEWTMPDIR}/${prefix}.realigned.fixed.recal.bam \
       -L ${reg} \
       -ERC GVCF \
       -nct ${nct} \
       --genotyping_mode DISCOVERY \
       -stand_emit_conf 10 \
       -stand_call_conf 30  \
       -o ${prefix}.raw_variants.annotated.g.vcf \
       -A QualByDepth -A RMSMappingQuality -A MappingQualityRankSumTest -A ReadPosRankSumTest -A FisherStrand -A StrandOddsRatio -A Coverage

That generates GVCF files that DO HAVE the DP field for all reference positions, but DO NOT HAVE the DP format field for any called variant (but still keep the DP in the INFO field):

18      11255   .       T       <NON_REF>       .       .       END=11256       GT:DP:GQ:MIN_DP:PL      0/0:18:48:18:0,48,720
18      11257   .       C       G,<NON_REF>     229.77  .       BaseQRankSum=1.999;DP=20;MLEAC=1,0;MLEAF=0.500,0.00;MQ=60.00;MQRankSum=-1.377;ReadPosRankSum=0.489      GT:AD:GQ:PL:SB  0/1:10,8,0:99:258,0,308,288
18      11258   .       G       <NON_REF>       .       .       END=11260       GT:DP:GQ:MIN_DP:PL      0/0:17:48:16:0,48,530

Later, I ran GenotypeGVCFs, joining all the samples, with the following command:

java -Xmx${xmx} ${gct} -Djava.io.tmpdir=${NEWTMPDIR} -jar ${gatkpath}/GenomeAnalysisTK.jar \
   -T GenotypeGVCFs \
   -R ${ref} \
   -L ${pos} \
   -o ${prefix}.raw_variants.annotated.vcf \
   --variant ${variant} [...]

This generated VCF files where the DP field is present in the FORMAT description; it IS present for homozygous-REF samples, but IS MISSING for any heterozygous or homozygous-ALT samples.

22  17280388    .   T   C   18459.8 PASS    AC=34;AF=0.340;AN=100;BaseQRankSum=-2.179e+00;DP=1593;FS=2.526;InbreedingCoeff=0.0196;MLEAC=34;MLEAF=0.340;MQ=60.00;MQRankSum=0.196;QD=19.76;ReadPosRankSum=-9.400e-02;SOR=0.523    GT:AD:DP:GQ:PL  0/0:29,0:29:81:0,81,1118    0/1:20,22:.:99:688,0,682    1/1:0,27:.:81:1018,81,0 0/0:22,0:22:60:0,60,869 0/1:20,10:.:99:286,0,664    0/1:11,17:.:99:532,0,330    0/1:14,14:.:99:431,0,458    0/0:28,0:28:81:0,81,1092    0/0:35,0:35:99:0,99,1326    0/1:14,20:.:99:631,0,453    0/1:13,16:.:99:511,0,423    0/1:38,29:.:99:845,0,1231   0/1:20,10:.:99:282,0,671    0/0:22,0:22:63:0,63,837 0/1:8,15:.:99:497,0,248 0/0:32,0:32:90:0,90,1350    0/1:12,12:.:99:378,0,391    0/1:14,26:.:99:865,0,433    0/0:37,0:37:99:0,105,1406   0/0:44,0:44:99:0,120,1800   0/0:24,0:24:72:0,72,877 0/0:30,0:30:84:0,84,1250    0/0:31,0:31:90:0,90,1350    0/1:15,25:.:99:827,0,462    0/0:35,0:35:99:0,99,1445    0/0:29,0:29:72:0,72,1089    1/1:0,32:.:96:1164,96,0 0/0:21,0:21:63:0,63,809 0/1:21,15:.:99:450,0,718    1/1:0,40:.:99:1539,120,0    0/0:20,0:20:60:0,60,765 0/1:11,9:.:99:293,0,381 1/1:0,35:.:99:1306,105,0    0/1:18,14:.:99:428,0,606    0/0:32,0:32:90:0,90,1158    0/1:24,22:.:99:652,0,816    0/0:20,0:20:60:0,60,740 1/1:0,30:.:90:1120,90,0 0/1:15,13:.:99:415,0,501    0/0:31,0:31:90:0,90,1350    0/1:15,18:.:99:570,0,480    0/1:22,13:.:99:384,0,742    0/1:19,11:.:99:318,0,632    0/0:28,0:28:75:0,75,1125    0/0:20,0:20:60:0,60,785 1/1:0,27:.:81:1030,81,0 0/0:30,0:30:90:0,90,1108    0/1:16,16:.:99:479,0,493    0/1:14,22:.:99:745,0,439    0/0:31,0:31:90:0,90,1252
22  17280822    .   G   A   5491.56 PASS    AC=8;AF=0.080;AN=100;BaseQRankSum=1.21;DP=1651;FS=0.000;InbreedingCoeff=-0.0870;MLEAC=8;MLEAF=0.080;MQ=60.00;MQRankSum=0.453;QD=17.89;ReadPosRankSum=-1.380e-01;SOR=0.695   GT:AD:DP:GQ:PL  0/0:27,0:27:72:0,72,1080    0/0:34,0:34:90:0,90,1350    0/1:15,16:.:99:528,0,491    0/0:27,0:27:60:0,60,900 0/1:15,22:.:99:699,0,453    0/0:32,0:32:90:0,90,1350    0/0:37,0:37:99:0,99,1485    0/0:31,0:31:87:0,87,1305    0/0:40,0:40:99:0,108,1620   0/1:20,9:.:99:258,0,652 0/0:26,0:26:72:0,72,954 0/1:16,29:.:99:943,0,476    0/0:27,0:27:69:0,69,1035    0/0:19,0:19:48:0,48,720 0/0:32,0:32:81:0,81,1215    0/0:36,0:36:99:0,99,1435    0/0:34,0:34:99:0,99,1299    0/0:35,0:35:99:0,102,1339   0/0:38,0:38:99:0,102,1520   0/0:36,0:36:99:0,99,1476    0/0:31,0:31:81:0,81,1215    0/0:31,0:31:75:0,75,1125    0/0:35,0:35:99:0,99,1485    0/0:37,0:37:99:0,99,1485    0/0:35,0:35:90:0,90,1350    0/0:20,0:20:28:0,28,708 0/1:16,22:.:99:733,0,474    0/0:32,0:32:90:0,90,1350    0/0:35,0:35:99:0,99,1467    0/1:27,36:.:99:1169,0,831   0/0:28,0:28:75:0,75,1125    0/0:36,0:36:81:0,81,1215    0/0:35,0:35:90:0,90,1350    0/0:28,0:28:72:0,72,1080    0/0:31,0:31:81:0,81,1215    0/0:37,0:37:99:0,99,1485    0/0:31,0:31:84:0,84,1260    0/0:39,0:39:99:0,101,1575   0/0:37,0:37:96:0,96,1440    0/0:34,0:34:99:0,99,1269    0/0:30,0:30:81:0,81,1215    0/0:36,0:36:99:0,99,1485    0/1:17,17:.:99:567,0,530    0/0:26,0:26:72:0,72,1008    0/0:18,0:18:45:0,45,675 0/0:33,0:33:84:0,84,1260    0/0:25,0:25:61:0,61,877 0/1:9,21:.:99:706,0,243 0/0:35,0:35:81:0,81,1215    0/0:35,0:35:99:0,99,1485

I've just discovered this issue, and I need to run an analysis of the differential depth of coverage across regions, and of whether there is a DP bias between called and not-called samples.

I have thousands of files and I've spent almost a year generating all these calls, so redoing the calling is not an option.

What would be the best/fastest strategy to either fix my final VCFs using the DP data present in all the intermediate gVCF files (preferably) or, at least, extract this data for all SNPs and samples?

Thanks in advance,

Txema

PS: Re-calling the individual samples from the BAM files is not an option. Fixing the individual gVCFs and redoing the joint GenotypeGVCFs could be. (The per-sample extraction I mention above is sketched below.)
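In case it clarifies the "extract this data" option, the kind of per-sample dump from a gVCF I have in mind would be something like this (assuming bcftools; file names are hypothetical):

    # Sketch only: dump per-sample FORMAT/DP and AD (whose sum approximates the missing DP)
    # for every record of one gVCF into a tab-separated table.
    bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE:DP=%DP:AD=%AD]\n' \
        sample1.raw_variants.annotated.g.vcf > sample1.DP.tsv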

In which step exactly is the allele depth / frequency used during variant calling?


I was thinking about how allele depth / frequency affects variant calling. I know that the HC uses Bayes' theorem to call variants. The question is: how is the allele depth / frequency information used during variant calling?

Based on Bayes, the most probable genotype is G = argmax_G P(G) * P(D|G). I initially thought that P(D|G) is the allele frequency of the data, but that doesn't feel right.
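To make the question concrete, the simple per-read model I have in mind is the classic base-quality likelihood (I understand HC actually computes read-vs-haplotype likelihoods with the PairHMM, so this is only a sketch):

    P(D | G = A_1 A_2) = \prod_{i=1}^{n} [ (1/2) * P(b_i | A_1) + (1/2) * P(b_i | A_2) ]
    P(b_i | A) = 1 - eps_i  if b_i = A,  else eps_i / 3   (eps_i from the base quality)

In this model the allele depth / frequency never appears as an explicit factor: it enters only implicitly, through how many of the n reads support each allele.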

Could you please share some comments? Thanks.


Memory error when using scatter-gather (HaplotypeCaller + GenotypeGVCFs) on 100 WES samples?


Dear GATK team,
I have prepared 100 processed WES BAMs and am trying to call variants and output them in a single VCF. I used scatter-gather WDL scripts following https://software.broadinstitute.org/wdl/documentation/article?id=7614 .
I submit the job to one node with 40 cores. I tried more cores (from 2 or more nodes), but it seems HaplotypeCaller only runs on one node. One node has about 126 GB of memory.
The problem is that the job seems to need too much memory, and the node becomes unreachable.
Does scatter-gather need a lot of memory? (A sketch of what each shard runs is below.)
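For reference, each scattered HaplotypeCallerERC shard essentially runs a command like the following (GATK 3.7 syntax; paths and the -Xmx value are illustrative, not my exact task), and with many shards running concurrently on one node the Java heaps add up quickly:

    # Sketch of one shard's command (one shard per sample in my scatter): e.g. 40 shards running
    # concurrently at -Xmx4g each would already ask for ~160 GB, more than the node's 126 GB.
    java -Xmx4g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta \
        -I sampleN.bam -ERC GVCF -o sampleN.g.vcf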

Part of my log file:
2018-03-28 21:20:13,921 INFO - BackgroundConfigAsyncJobExecutionActor [UUID(d71d6520)jointCallingGenotypes.HaplotypeCallerERC:11:1]: Status change from - to WaitingForReturnCodeFile
2018-03-28 21:20:13,921 INFO - BackgroundConfigAsyncJobExecutionActor [UUID(d71d6520)jointCallingGenotypes.HaplotypeCallerERC:53:1]: job id: 4800
2018-03-28 21:20:13,922 INFO - BackgroundConfigAsyncJobExecutionActor [UUID(d71d6520)jointCallingGenotypes.HaplotypeCallerERC:63:1]: Status change from - to WaitingForReturnCodeFile
2018-03-28 21:20:13,923 INFO - BackgroundConfigAsyncJobExecutionActor [UUID(d71d6520)jointCallingGenotypes.HaplotypeCallerERC:25:1]: Status change from - to WaitingForReturnCodeFile
2018-03-28 21:20:13,924 INFO - BackgroundConfigAsyncJobExecutionActor [UUID(d71d6520)jointCallingGenotypes.HaplotypeCallerERC:53:1]: Status change from - to WaitingForReturnCodeFile
2018-03-28 21:20:13,945 INFO - BackgroundConfigAsyncJobExecutionActor [UUID(d71d6520)jointCallingGenotypes.HaplotypeCallerERC:84:1]: job id: 4937
2018-03-28 21:20:13,955 INFO - BackgroundConfigAsyncJobExecutionActor [UUID(d71d6520)jointCallingGenotypes.HaplotypeCallerERC:84:1]: Status change from - to WaitingForReturnCodeFile

stderr of one shard-0/execution:
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/work/home/jiahuan/Heart_WES/HTX/cromwell-executions/jointCallingGenotypes/d71d6520-d369-43c8-bfce-f0521358f8e9/call-HaplotypeCallerERC/shard-0/execution/tmp.2jWDLA

note: GATK3.7

Thanks!

Confusion in using gVCF mode


Hi

I have a problem using HaplotypeCaller's gVCF mode (GATK4 Best Practices). Please help me with the following questions:

1- Should we run gVCF mode even when we have only one WES sample?

2- I have 3 WES samples. Should I use gVCF --> Consolidate --> GenotypeGVCFs --> VCF (sketched after this list), or is it better to obtain a VCF directly from HaplotypeCaller and skip the subsequent steps?

3- If I have 3-5 WES samples, is it better to run HaplotypeCaller with multiple input BAMs or on each sample separately?
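For question 2, this is the workflow I mean by gVCF --> Consolidate --> GenotypeGVCFs, sketched in GATK4 syntax (file names are hypothetical):

    # Per-sample gVCFs
    gatk HaplotypeCaller -R ref.fasta -I sample1.bam -ERC GVCF -O sample1.g.vcf.gz
    # Consolidate the three gVCFs (CombineGVCFs here; GenomicsDBImport is the alternative)
    gatk CombineGVCFs -R ref.fasta -V sample1.g.vcf.gz -V sample2.g.vcf.gz -V sample3.g.vcf.gz -O combined.g.vcf.gz
    # Joint genotyping
    gatk GenotypeGVCFs -R ref.fasta -V combined.g.vcf.gz -O cohort.vcf.gz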

Regards.

HaplotypeCaller ERC BP_RESOLUTION with non-ref bases explicitly detailed


Hi GATK Team! I was wondering if there is a way to get HaplotypeCaller in ERC BP_RESOLUTION mode to emit details for every base at every position. That is, for each position, rather than just emitting <NON_REF> in the ALT column (or, in some cases, one or more alternate alleles plus <NON_REF>), is it possible to get it to emit every single base that had support in any of the reads, i.e. samtools mpileup-like information where for every position there is a count of reads supporting A/C/T/G?

In particular, I am hoping to do this for a given set of positions I am interested in; it would be great if there is an option for that as well. (The kind of invocation I have in mind is sketched below.)
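Something along these lines (GATK4-style syntax; paths hypothetical), which restricts the output to my positions but, as far as I can tell, does not give per-base A/C/T/G counts:

    # Sketch: per-base reference confidence restricted to a list of positions
    gatk HaplotypeCaller -R ref.fasta -I sample.bam \
        -ERC BP_RESOLUTION -L positions.bed -O sample.bp_res.g.vcf.gz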

I was hoping to use this rather than an mpileup, since local reassembly is performed on these sites.

Thanks a lot!

Alon

HaplotypeCaller haploid GVCF format error?


Hi all,

I prepared a clean BAM file following the GATK Best Practices and used GATK4 HaplotypeCaller to create a gVCF with the ploidy 1 option:

gatk-4.0.2.1/gatk HaplotypeCaller --native-pair-hmm-threads 24 -I KU_filtered_sorted_mdup.bam -O HC.KU.raw.snps.indels.g.vcf -R ref.fasta -ploidy 1 --emit-ref-confidence GVCF

When I validated the gvcf, ValidateVariants threw errors at the end:

11:27:55.681 INFO  ProgressMeter - Traversal complete. Processed 124689522 total variants in 3.8 minutes.
11:27:55.681 INFO  ValidateVariants - Shutting down engine
[April 10, 2018 11:27:55 AM JST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 3.82 minutes.
Runtime.totalMemory()=4682940416
java.lang.IllegalArgumentException: Illegal character in path at index 15:HC.KU.raw.snps.indels.g.vcf
    at java.net.URI.create(URI.java:852)
    at org.broadinstitute.hellbender.engine.FeatureInput.makeIntoAbsolutePath(FeatureInput.java:242)
    at org.broadinstitute.hellbender.engine.FeatureInput.toString(FeatureInput.java:314)
    at java.util.Formatter$FormatSpecifier.printString(Formatter.java:2886)
    at java.util.Formatter$FormatSpecifier.print(Formatter.java:2763)
    at java.util.Formatter.format(Formatter.java:2520)
    at java.util.Formatter.format(Formatter.java:2455)
    at java.lang.String.format(String.java:2940)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.close(FeatureDataSource.java:589)
    at org.broadinstitute.hellbender.engine.FeatureManager.lambda$close$9(FeatureManager.java:505)
    at java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608)
    at org.broadinstitute.hellbender.engine.FeatureManager.close(FeatureManager.java:505)
    at org.broadinstitute.hellbender.engine.GATKTool.onShutdown(GATKTool.java:857)
    at org.broadinstitute.hellbender.engine.VariantWalker.onShutdown(VariantWalker.java:95)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:159)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:202)
    at org.broadinstitute.hellbender.Main.main(Main.java:288)
Caused by: java.net.URISyntaxException: Illegal character in path at index 15: /media/yoshi/My Book/Aet_v4.0_ChrSeqSplit/HC.KU-2103.raw.snps.indels.g.vcf
    at java.net.URI$Parser.fail(URI.java:2848)
    at java.net.URI$Parser.checkChars(URI.java:3021)
    at java.net.URI$Parser.parseHierarchical(URI.java:3105)
    at java.net.URI$Parser.parse(URI.java:3063)
    at java.net.URI.<init>(URI.java:588)
    at java.net.URI.create(URI.java:850)
    ... 19 more

When I ran GenotypeGVCFs using the gvcf, it ran to completion, but threw the same "java.lang.IllegalArgumentException" errors at the end.

My Java runtime is OpenJDK 64-Bit Server VM v1.8.0_162-8u162-b12-0ubuntu0.16.04.2-b12.

Could you please help me with this problem?
Many thanks.

GATK4beta6 annotation incompatibility between HaplotypeCaller and GenomicsDBImport


Happy New Year!

I'm attempting to joint genotype ~1000 exomes using GATK4. I've run HC per sample with the following command:

java -Xmx7g -jar gatk-package-4.beta.6-local.jar HaplotypeCaller -ERC GVCF -G StandardAnnotation -G AS_StandardAnnotation --maxReadsPerAlignmentStart 0 -GQB 5 -GQB 10 -GQB 15 -GQB 20 -GQB 25 -GQB 30 -GQB 35 -GQB 40 -GQB 45 -GQB 50 -GQB 55 -GQB 60 -GQB 65 -GQB 70 -GQB 75 -GQB 80 -GQB 85 -GQB 90 -GQB 95 -GQB 99 -I example.bam -O example.g.vcf.gz -R /path/to/GRCh38.d1.vd1.fa

And then attempted to create a GenomicsDB per chromosome with the following command:

java -Xmx70g -jar gatk-package-4.beta.6-local.jar GenomicsDBImport -genomicsDBWorkspace chrX_db --overwriteExistingGenomicsDBWorkspace true --intervals chrX -V gvcfs.list

I get the following error:

Exception: [January 2, 2018 9:36:26 AM EST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.09 minutes. Runtime.totalMemory()=2238185472
htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: Discordant field size detected for field AS_RAW_ReadPosRankSum at chrX:251751. Field had 4 values but the header says this should have 1 values based on header record INFO=<ID=AS_RAW_ReadPosRankSum,Number=1,Type=String,Description="allele specific raw data for rank sum test of read position bias">
    at htsjdk.variant.variantcontext.VariantContext.fullyDecodeAttributes(VariantContext.java:1571)
    at htsjdk.variant.variantcontext.VariantContext.fullyDecodeInfo(VariantContext.java:1546)
    at htsjdk.variant.variantcontext.VariantContext.fullyDecode(VariantContext.java:1530)
    at htsjdk.variant.variantcontext.writer.BCF2Writer.add(BCF2Writer.java:176)
    at com.intel.genomicsdb.GenomicsDBImporter.add(GenomicsDBImporter.java:1232)
    at com.intel.genomicsdb.GenomicsDBImporter.importBatch(GenomicsDBImporter.java:1282)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.traverse(GenomicsDBImport.java:443)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:838)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:119)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:176)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:137)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:158)
    at org.broadinstitute.hellbender.Main.main(Main.java:239)

This refers to the following line in one of the gVCFs:

chrX 251751 . G A,<NON_REF> 46.56 . AS_RAW_BaseQRankSum=|30,1,33,1|;AS_RAW_MQ=0.00|7200.00|0.00;AS_RAW_MQRankSum=|60,2|;AS_RAW_ReadPosRankSum=|5,1,20,1|;AS_SB_TABLE=0,0|0,0|0,0;DP=2;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQ=7200.00 GT:AD:GQ:PL:SB 1/1:0,2,0:6:73,6,0,73,6,73:0,0,1,1

I haven't found a way to get past this error. I found this post from a while back with a very similar error:

https://gatkforums.broadinstitute.org/gatk/discussion/comment/43382#Comment_43382

But they seemed to indicate that it was fixed for them in GATK4beta6.

Any help/insight into how to resolve it (or, if it's an unimportant annotation, how to ignore it) would be greatly appreciated. Thanks!

Ben

HaplotypeCaller crash expected haplotypes.size() >= eventsAtThisLoc.size() + 1


I'm running HaplotypeCaller (GATK 4) with the --alleles argument, and it is crashing on a particular --alleles VCF record, producing a stack trace with the error "expected haplotypes.size() >= eventsAtThisLoc.size() + 1".

My command is:

java -jar $GATK4 HaplotypeCaller -R /share/carvajal-archive/REFERENCE_DATA/genomes/GRCh38_decoy_LCCpanel/Homo_sapiens_assembly38_LCCpanel.fasta -L chr16:58920000-58930000 -I HapCrash.bam -O HapCrashOut.vcf.gz --alleles HapCrash.vcf.gz --genotyping-mode GENOTYPE_GIVEN_ALLELES --verbosity DEBUG 2>&1 | tee HapCrashLog.txt

I'm attaching a log file of the output, as well as a copy of the HapCrash.bam file and HapCrash.vcf.gz file.

GenotypeGVCFs Estimated Runtime 5.9 YEARS!!!


Hello,

I have 3 de novo transcriptomes for which I am trying to genotype all SNPs. Originally, I asked a question about whether the joint genotyping pipeline will correctly identify SNPs fixed in one sample (e.g. A/A, A/A, T/T). That question is posted here, and though I'm still unclear about the answer, I've encountered a much bigger problem.

Using GATK 4.0.1.2, my pipeline was this one:
Pre-processing BAM file using best practices -->
HaplotypeCaller (-ERC GVCF) on each sample separately -->
CombineGVCFs -->
GenotypeGVCFs -->
VariantFiltration

However, when I got to CombineGVCFs (v4.0), the program didn't work at all. It would read the vcf files in ("Using codec VCFCodec to read file...") and then freeze forever, even with huge amounts of memory.

I considered using GenomicsDBImport instead of CombineGVCFs, but could not find precise instructions on how to split by interval with -L and then concatenate (remember these are transcriptomes, and there are 250,000+ contigs in the reference, so processing contigs separately is not trivial). There does not seem to be an established pipeline for doing this, although several threads (e.g. this one) have mentioned CatVariants and GatherVcfs. I tested GenomicsDBImport on the first contig using -L TRINITY_DN32849_c0_g1 (the name of that contig), but received an error. (The per-contig loop I was attempting is sketched below.)
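To be concrete, the per-contig approach I was attempting looks roughly like this (a sketch only; the contig list and paths are hypothetical, and I may well be misusing the tools):

    # Sketch: one GenomicsDB workspace per contig, genotype each, then gather the per-contig VCFs
    # (GatherVcfs expects its inputs in reference-dictionary order).
    while read contig; do
        gatk GenomicsDBImport \
            -V output.229.g.vcf -V output.230.g.vcf -V output.231.g.vcf \
            --genomicsdb-workspace-path db_${contig} \
            -L ${contig}
        gatk GenotypeGVCFs -R reference.fasta -V gendb://db_${contig} -O genotyped_${contig}.vcf
    done < contig_names.txt
    gatk GatherVcfs -I genotyped_TRINITY_DN32849_c0_g1.vcf -I genotyped_<next_contig>.vcf -O cohort.vcf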

Based on this post, I decided instead to skip CombineGVCFs altogether, and try GenotypeGVCFs v3.8 directly, by importing all 3 samples there (-V).
java -jar $GATK_HOME \
-T GenotypeGVCFs \
-R $fa_file \
--variant output.229.g.vcf \
--variant output.230.g.vcf \
--variant output.231.g.vcf \
--useNewAFCalculator \
-nt $threads \
-o cohort.vcf

Despite using 32 threads, the estimated runtime is 307 weeks, or 6 YEARS! Obviously this won't work.

Is there any version of GATK that is capable of genotyping my samples, in a reasonable amount of time? I'm completely stuck, and ready to give up. Any help would be very much appreciated!

Paul


HaplotypeCaller Error: SAM/BAM/CRAM Invalid GZIP header


These are my GATK (3.5-0-g36282e4) program arguments:

-T HaplotypeCaller 
-R human_g1k_v37.22.fasta
-nct 16
-I ref.22.500x.bwamem.sorted.bqsr.bam
-I somatic_sim_af20_500x.bwamem.bqsr.bam
-I somatic_sim_het_500x.bwamem.sorted.bqsr.bam
-D All_20170403.vcf
-L 22
-o somatic_sim.hpcaller.22.vcf

This is the error message:

##### ERROR MESSAGE: SAM/BAM/CRAM file somatic_sim_het_500x.bwamem.sorted.bqsr.bam is malformed. .... Error details: Invalid GZIP header

The command worked fine the first time I ran it. However, I goofed and used the same ID in the read group for the het and af20 BAM files.

I ran

samtools addreplacerg ...
samtools view -H new_rg.bam >header.txt

Then I manually removed the old read group, since I've noticed in the past that GATK will omit null genotypes for that sample:

samtools reheader -P header.txt new_rg.bam >het.bam

I re-indexed it from scratch, and now I get this error.

I've used addreplacerg and HaplotypeCaller successfully in the past; however, I never removed the older read groups before.

samtools quickcheck -v het.bam 

This returns no error, so I'm at a loss here. Do BAM files typically have GZIP headers? (A quick check of the file's magic bytes is sketched below.)
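As far as I understand, BAM is BGZF, i.e. a series of gzip-compatible blocks, so the file should start with the gzip magic bytes; a quick check would be something like:

    # The first two bytes of a valid BAM/BGZF file should be the gzip magic 1f 8b.
    hexdump -C het.bam | head -n 1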

Thanks for any help or insight.

HaplotypeCaller crash with long allele in --alleles VCF


I am running HaplotypeCaller with the --alleles option and it is crashing. I've traced it to a very long line in the --alleles VCF file, with a very long allele. The line is:

chr4    1394536 .       AATGTGGAGTGCCCGCCTGCTCACACGTGCCCATGTGGAGTGCCCGCCTGCTCATGTGCCCATGTGGAGTGCCCGCCTGCTCACACATGTCGATGCGGAGTGCCCGCCTGCTCACACATGCCC     A,AATGTGGAGTGCCCGCCTGCTCATGTGCCCATGTGGAGTGCCCGCCTGCTCACACATGTCGATGCGGAGTGCCCGCCTGCTCACACATGCCC,CATGTGGAGTGCCCGCCTGCTCACACGTGCCCATGTGGAGTGCCCGCCTGCTCATGTGCCCATGTGGAGTGCCCGCCTGCTCACACATGTCGATGCGGAGTGCCCGCCTGCTCACACATGCCC       .       .       .

This VCF was generated by Mutect2 for creation of a panel-of-normals.

The command and output were:

twtoal@carcinos>          env -i LD_LIBRARY_PATH=/share/carvajal-archive/PACKAGES/local/miniconda/miniconda3_perl/envs/R_pipeline/lib/R/library/rJava/libs:/share/carvajal-archive/PACKAGES/local/miniconda/miniconda3_perl/envs/R_pipeline/jre/lib/amd64/server PATH=/bin:/share/carvajal-archive/PACKAGES/local/miniconda/miniconda3_perl/envs/R_pipeline/bin /share/carvajal-archive/PACKAGES/src/java/jdk8_latest/bin/java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Djava.io.tmpdir=/share/carvajal-archive/tmp -jar /share/carvajal-archive/PACKAGES/src/GATK/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar HaplotypeCaller --QUIET             -R /share/carvajal-archive/REFERENCE_DATA/genomes/GRCh38_decoy_LCCpanel/Homo_sapiens_assembly38_LCCpanel.fasta             -I DATA/VO-56/N2/VO-56N2.recal.bam             -O DATA/VO-56/N2/VO-56N2.PONdepths.vcf.gz             -L GLOBAL/PON4.bed             --alleles GLOBAL/PON6.vcf.gz             --genotyping-mode GENOTYPE_GIVEN_ALLELES             --annotation-group StandardAnnotation             --seconds-between-progress-updates 1             --verbosity INFO
14:23:07.181 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/carvajal-archive/PACKAGES/src/GATK/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
14:23:07.438 INFO  HaplotypeCaller - Initializing engine
14:23:08.919 INFO  FeatureManager - Using codec VCFCodec to read file file:///share/carvajal-archive/SEQ_DATA/PANELS/MSEQ_GC_Panel_03_28_2017/GLOBAL/PON6.vcf.gz
14:23:09.003 INFO  FeatureManager - Using codec BEDCodec to read file file:///share/carvajal-archive/SEQ_DATA/PANELS/MSEQ_GC_Panel_03_28_2017/GLOBAL/PON4.bed
14:23:09.014 INFO  IntervalArgumentCollection - Processing 10 bp from intervals
14:23:09.046 INFO  HaplotypeCaller - Done initializing engine
14:23:09.163 INFO  HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
14:23:11.216 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/share/carvajal-archive/PACKAGES/src/GATK/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
14:23:11.230 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/share/carvajal-archive/PACKAGES/src/GATK/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
14:23:11.547 WARN  NativeLibraryLoader - Unable to load libgkl_pairhmm_omp.so from native/libgkl_pairhmm_omp.so (/share/carvajal-archive/tmp/twtoal/libgkl_pairhmm_omp3184435069396731524.so: /usr/lib/x86_64-linux-gnu/libgomp.so.1: version `GOMP_4.0' not found (required by /share/carvajal-archive/tmp/twtoal/libgkl_pairhmm_omp3184435069396731524.so))
14:23:11.547 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
14:23:11.547 INFO  NativeLibraryLoader - Loading libgkl_pairhmm.so from jar:file:/share/carvajal-archive/PACKAGES/src/GATK/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm.so
14:23:11.786 WARN  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
14:23:11.786 WARN  IntelPairHmm - Ignoring request for 4 threads; not using OpenMP implementation
14:23:11.787 INFO  PairHMM - Using the AVX-accelerated native PairHMM implementation
14:23:11.941 INFO  ProgressMeter - Starting traversal
14:23:11.941 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
14:23:13.127 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 9.267520000000001E-4
14:23:13.127 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.008127312000000001
14:23:13.128 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.10 sec
14:23:13.128 INFO  HaplotypeCaller - Shutting down engine
java.lang.IllegalArgumentException: expected haplotypes.size() >= eventsAtThisLoc.size() + 1
        at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:681)
        at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerGenotypingEngine.createAlleleMapper(AssemblyBasedCallerGenotypingEngine.java:152)
        at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:131)
        at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:565)
        at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:218)
        at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:295)
        at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:271)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

Does HaplotypeCaller assign genotypes for each allele separately?


Hi,
I read how HaplotypeCaller assigns genotypes in the link below; it says that for each position HaplotypeCaller will calculate the best genotype. So I wonder: if two positions are close together, will HaplotypeCaller calculate their genotypes separately? If so, can the selected haplotypes for these two positions be different?
https://software.broadinstitute.org/gatk/documentation/article.php?id=4442

Does GenotypeGVCFs call SNPs fixed in only 1 sample?


Apologies if this was addressed elsewhere, but I have looked carefully through the documentation. Simply put: using the joint-analysis pipeline of GATK's HaplotypeCaller (-ERC GVCF) -> CombineGVCFs -> GenotypeGVCFs, are SNPs called when they are fixed in only one sample? For example, imagine 3 samples total, where the correct genotypes should be A/A, A/A, T/T. Will this variant be called correctly, given sufficient coverage/quality? This is very important to what I am trying to accomplish, as I only have 3 samples and would like to analyze SNPs that are fixed between samples. It's unclear to me that this pipeline will work, since HaplotypeCaller will not find any of these variants within the samples individually. But maybe GenotypeGVCFs finds them? This is not clearly documented.

GATK 3.8: Allele-specific Annotations


I am using GATK 3.8.1 for HaplotypeCaller (gVCF mode followed by GenotypeGVCFs) and I noticed that the final VCF output from GenotypeGVCFs has missing DP values.

I found a workaround: if I skip the -G StandardAnnotation -G AS_StandardAnnotation parameters during gVCF calling, but keep those parameters during the GenotypeGVCFs step, the DP values are NOT missing anymore (sketched below).
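To be explicit, the workaround looks like this (paths are hypothetical and other arguments are omitted):

    # gVCF calling WITHOUT the -G annotation groups (DP then survives to the final VCF)
    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta -I sample.bam \
        -ERC GVCF -o sample.g.vcf
    # joint genotyping WITH the -G annotation groups
    java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R ref.fasta \
        -G StandardAnnotation -G AS_StandardAnnotation \
        --variant sample.g.vcf -o final.vcf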

My question is: will the values of the allele-specific annotations in the final VCF be any different if you do not include those -G parameters during gVCF calling but do include them during the GenotypeGVCFs step, compared to including them during both steps (gVCF and GenotypeGVCFs)?

*I am not able to upgrade to GATK4 yet, since we still want to use the UnifiedGenotyper.
